Engineering · Agentic Systems

Agentic Capabilities

Most agentic demos work once. Making them work reliably in production, across real users, edge cases and tool failures, is an engineering discipline. This page describes how LMXAI designs agentic systems and the real tool ecosystem they operate in.

Orchestration: LangGraph · LangChain
Tool protocol: MCP integration
Tool reliability: LoRA fine-tune · BFCL-v3
Deployed in: Savion · mdGPT

The approach

LMXAI builds agentic systems on LangGraph and LangChain, with external capabilities integrated through the Model Context Protocol (MCP). But orchestration is only half the problem — the harder half is making the underlying model reliable at calling those tools.

That reliability is engineered, not assumed. The base model is fine-tuned for tool use (LoRA / QLoRA) and validated on BFCL-v3, HumanEval and GSM8K with evalscope before it enters an agent loop. This is the same process that took mdAgent-Hermes-32B to 82.3% single-turn accuracy. Production hardening then adds structured retries, human-in-the-loop checkpoints and end-to-end tracing with OpenTelemetry and Phoenix.

The failure modes we design against

  • Tool-use hallucination — the model calls a real tool with wrong or invented arguments.
  • Unrecoverable state — a failed step has no retry logic or fallback path.
  • Silent errors — the agent continues past a failure without surfacing it to the user or the trace.
  • Runaway loops — no termination condition, no budget for steps or tokens.
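The last failure mode, runaway loops, is the easiest to guard against mechanically. The following is a minimal sketch in plain Python, not LMXAI's implementation: the `Budget` class, the `run_loop` helper and the limit values are all hypothetical names chosen for illustration, and the per-step token count is assumed to be reported by the caller.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when an agent run spends more steps or tokens than allowed."""


@dataclass
class Budget:
    # Hypothetical limits; real values depend on the workflow.
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens_used: int) -> None:
        """Record one step and its token cost; raise once a limit is passed."""
        self.steps += 1
        self.tokens += tokens_used
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} exceeded")


def run_loop(step_fn, budget: Budget) -> Budget:
    """Call step_fn until it reports completion or the budget runs out.

    step_fn returns (done, tokens_used) and stands in for one agent step.
    """
    while True:
        done, tokens_used = step_fn()
        budget.charge(tokens_used)
        if done:
            return budget


def never_finishes():
    # A stand-in step that would loop forever without a budget.
    return False, 100


try:
    run_loop(never_finishes, Budget(max_steps=5, max_tokens=10_000))
    outcome = "finished"
except BudgetExceeded as exc:
    outcome = str(exc)  # the loop is cut off on the sixth step
```

Raising a dedicated exception, rather than quietly returning, means the stop is visible to the caller and to the trace instead of becoming a silent error.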

Design patterns

Six patterns applied consistently across LMXAI's production agentic systems.

Pattern 01

Graph-based orchestration

Agents are modelled as directed graphs in LangGraph rather than free-form loops. Every state transition is explicit, conditional and inspectable at each step.

Why: Debugging a graph trace is a diff; debugging a free-form loop is archaeology.
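To illustrate what "explicit, conditional and inspectable" means in practice, here is a minimal sketch of the idea in plain Python. It deliberately does not use LangGraph's API; the `Graph` class, the `END` marker and the node names are hypothetical stand-ins, and LangGraph provides a far more complete version of the same concept.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
END = "__end__"  # hypothetical marker for the terminal state


class Graph:
    """A tiny directed graph: nodes update state, routers pick the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.routers: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, source: str, router: Callable[[State], str]) -> None:
        # Every transition out of `source` is decided by one explicit function.
        self.routers[source] = router

    def run(self, start: str, state: State, max_steps: int = 20):
        trace: List[Tuple[str, str]] = []
        current = start
        for _ in range(max_steps):
            state = {**state, **self.nodes[current](state)}
            nxt = self.routers[current](state)
            trace.append((current, nxt))  # record each transition as it happens
            if nxt == END:
                return state, trace
            current = nxt
        raise RuntimeError("max_steps reached without hitting END")


def plan(state: State) -> State:
    return {"todo": ["fetch", "summarise"]}


def act(state: State) -> State:
    todo = list(state["todo"])
    done = list(state.get("done", [])) + [todo.pop(0)]
    return {"todo": todo, "done": done}


def route_after_act(state: State) -> str:
    # Conditional edge: loop back while work remains, otherwise finish.
    return "act" if state["todo"] else END


graph = Graph()
graph.add_node("plan", plan)
graph.add_node("act", act)
graph.add_edge("plan", lambda state: "act")
graph.add_edge("act", route_after_act)

final_state, trace = graph.run("plan", {})
```

The `trace` list is the inspectable part: two runs of the same graph can be compared transition by transition.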
Pattern 02

Tool reliability via fine-tuning

Off-the-shelf models aren't reliable enough at function calling for production. The base model is fine-tuned on curated tool-use data and validated against BFCL-v3 before deployment.

Why: mdAgent-Hermes-32B reached 82.3% single-turn accuracy through exactly this process.
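The validation step can be pictured as a gate between fine-tuning and deployment. The sketch below is a simplified illustration only: it scores tool calls by exact match on name and arguments, whereas BFCL-v3's own scoring is more involved, and the function names, sample calls and threshold are all hypothetical.

```python
from typing import Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, object]]  # (tool name, arguments)


def single_turn_accuracy(predicted: List[ToolCall], expected: List[ToolCall]) -> float:
    """Fraction of cases where the predicted call matches the expected one exactly."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("at least one test case is required")
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


def passes_gate(accuracy: float, threshold: float) -> bool:
    """Decide whether a model is allowed into the agent loop."""
    return accuracy >= threshold


# Hypothetical evaluation cases, invented for this example.
expected = [
    ("get_weather", {"city": "Oslo"}),
    ("convert_currency", {"amount": 10, "src": "EUR", "dst": "USD"}),
    ("list_tables", {}),
    ("search", {"query": "lora"}),
]
predicted = [
    ("get_weather", {"city": "Oslo"}),
    ("convert_currency", {"amount": 10, "src": "EUR", "dst": "GBP"}),  # wrong argument
    ("list_tables", {}),
    ("search", {"query": "lora"}),
]

accuracy = single_turn_accuracy(predicted, expected)  # 3 of 4 calls match
deployable = passes_gate(accuracy, threshold=0.8)  # threshold is an assumption
```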
Pattern 03

MCP tool integration

External capabilities are exposed over the Model Context Protocol — a consistent, schema-validated interface the model uses to discover and call tools without hardcoded glue code.

Why: Adding or swapping a tool becomes a schema change, not a code change.
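As a rough picture of what a schema-validated tool interface buys, the sketch below registers a tool with a name, a description and a JSON-Schema-style `inputSchema` (the three fields an MCP tool definition carries) and rejects calls whose arguments do not fit. It is plain Python, not an MCP SDK: `ToolRegistry`, `validate_args` and the `add_vat` tool are hypothetical, and the validator covers only a small subset of JSON Schema.

```python
from typing import Any, Callable, Dict, List

# Maps the JSON Schema type names handled by this sketch to Python types.
_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}


def validate_args(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments are valid."""
    errors: List[str] = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required argument: {name}")
    for name, value in args.items():
        if name not in props:
            errors.append(f"unknown argument: {name}")
            continue
        kind = props[name]["type"]
        ok = isinstance(value, _TYPES[kind])
        if isinstance(value, bool) and kind != "boolean":
            ok = False  # bool is a subclass of int in Python
        if not ok:
            errors.append(f"argument {name} should be {kind}")
    return errors


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, description: str,
                 input_schema: Dict[str, Any], handler: Callable[..., Any]) -> None:
        self._tools[name] = {"description": description,
                             "inputSchema": input_schema, "handler": handler}

    def list_tools(self) -> List[Dict[str, Any]]:
        """What the model sees: names, descriptions and schemas, no code."""
        return [{"name": n, "description": t["description"],
                 "inputSchema": t["inputSchema"]} for n, t in self._tools.items()]

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        errors = validate_args(self._tools[name]["inputSchema"], args)
        if errors:
            raise ValueError("; ".join(errors))
        return self._tools[name]["handler"](**args)


registry = ToolRegistry()
registry.register(
    "add_vat",
    "Add VAT to a net amount.",
    {"type": "object",
     "properties": {"net": {"type": "number"}, "rate": {"type": "number"}},
     "required": ["net", "rate"]},
    lambda net, rate: round(net * (1 + rate), 2),
)

gross = registry.call("add_vat", {"net": 100, "rate": 0.2})
```

A call with invented or mistyped arguments, such as `{"net": "100"}`, raises `ValueError` before the handler runs, which is the behaviour that keeps tool-use hallucination from reaching real systems.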
Pattern 04

Human-in-the-loop checkpoints

High-stakes workflows surface explicit approval gates before consequential actions, using LangGraph's interrupt/resume to keep a human in control without breaking the execution graph.

Why: In healthcare and finance, an agent that can't be paused is a liability.
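The control flow behind an approval gate can be sketched without any framework. The example below is an illustration under stated assumptions, not LangGraph's interrupt/resume API: `Step`, `Checkpoint`, `execute` and `resume` are hypothetical names, and the checkpoint is held in memory rather than persisted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[Dict], Dict]
    needs_approval: bool = False  # True for consequential actions


@dataclass
class Checkpoint:
    next_index: int  # the step that will run next
    state: Dict
    status: str  # "paused", "done" or "rejected"


def execute(steps: List[Step], state: Dict, start: int = 0,
            approved_index: Optional[int] = None) -> Checkpoint:
    """Run steps in order, pausing before any unapproved gated step."""
    for i in range(start, len(steps)):
        step = steps[i]
        if step.needs_approval and i != approved_index:
            return Checkpoint(i, state, "paused")
        state = {**state, **step.run(state)}
    return Checkpoint(len(steps), state, "done")


def resume(steps: List[Step], checkpoint: Checkpoint, approved: bool) -> Checkpoint:
    """Continue from a paused checkpoint once a human has decided."""
    if checkpoint.status != "paused":
        raise ValueError("only a paused run can be resumed")
    if not approved:
        return Checkpoint(checkpoint.next_index, checkpoint.state, "rejected")
    return execute(steps, checkpoint.state, start=checkpoint.next_index,
                   approved_index=checkpoint.next_index)


steps = [
    Step("draft_plan", lambda state: {"plan": "draft v1"}),
    Step("deliver_plan", lambda state: {"delivered": True}, needs_approval=True),
]

paused = execute(steps, {})  # stops before deliver_plan
finished = resume(steps, paused, approved=True)
declined = resume(steps, paused, approved=False)
```

Because the paused run is just data (an index plus state), the human decision can arrive minutes or days later without the agent process staying alive in between.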
Pattern 05

Structured retry & fallback

Every tool-calling node has a retry budget with backoff, output-schema validation, and a fallback branch that degrades gracefully instead of propagating a corrupt state.

Why: A transient API timeout shouldn't abort a 10-step clinical workflow.
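The three ingredients named above (retry budget with backoff, output validation, fallback branch) fit in one small wrapper. This is a minimal sketch rather than the production code: `call_with_retry` and the example tool are hypothetical, the default delays are arbitrary, and catching every `Exception` is a simplification a real node would narrow.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def call_with_retry(
    tool: Callable[[], Dict[str, Any]],
    validate: Callable[[Dict[str, Any]], bool],
    fallback: Callable[[Optional[Exception]], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call a tool with exponential backoff; fall back if the budget runs out."""
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            result = tool()
            if validate(result):
                return result
            # Treat a malformed output like a failure so it is never passed on.
            last_error = ValueError("output failed schema validation")
        except Exception as exc:  # broad on purpose in this sketch
            last_error = exc
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return fallback(last_error)


attempts = {"count": 0}


def flaky_tool() -> Dict[str, Any]:
    # Simulates two transient timeouts followed by a good response.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("upstream timed out")
    return {"status": "ok", "rows": 3}


def has_rows(output: Dict[str, Any]) -> bool:
    return isinstance(output.get("rows"), int)


def degraded(error: Optional[Exception]) -> Dict[str, Any]:
    # The fallback returns a well-formed, clearly marked partial result.
    return {"status": "degraded", "reason": str(error)}


delays: List[float] = []
result = call_with_retry(flaky_tool, has_rows, degraded, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the example fast to run and makes the backoff schedule easy to assert in tests.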
Pattern 06

End-to-end observability

Every step emits a structured trace via OpenTelemetry and Phoenix: which node ran, the model input and output, which tool was called with what arguments, and the step latency.

Why: You can't improve what you can't see — and regulators increasingly require audit trails.
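The shape of such a per-step record can be shown with the standard library alone. The sketch below is a stand-in for illustration and does not use the OpenTelemetry or Phoenix APIs: `traced_step`, the in-memory `SPANS` list and the attribute names are all hypothetical.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

SPANS: List[Dict[str, Any]] = []  # stands in for a real trace exporter


@contextmanager
def traced_step(node: str, **attributes: Any) -> Iterator[Dict[str, Any]]:
    """Record one step: node name, caller-supplied attributes, status, latency."""
    span: Dict[str, Any] = {"node": node, "status": "ok", **attributes}
    start = time.perf_counter()
    try:
        yield span  # the step body can attach more fields, e.g. model output
    except Exception as exc:
        # Failures are written to the trace before being re-raised.
        span["status"] = "error"
        span["error"] = repr(exc)
        raise
    finally:
        span["latency_ms"] = (time.perf_counter() - start) * 1000
        SPANS.append(span)


with traced_step("lookup", tool="list_tables", tool_args={"schema": "public"}) as span:
    span["model_output"] = "3 tables found"

try:
    with traced_step("query", tool="execute_sql", tool_args={"sql": "SELECT 1"}):
        raise TimeoutError("warehouse unreachable")
except TimeoutError:
    pass  # the caller still decides how to recover; the span is already recorded
```

Recording the span in `finally` means a failed step always leaves a trace entry, which is what prevents the silent errors described earlier.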

The tool ecosystem

A concrete example: the workspace toolset exposed to mdGPT's agents over MCP spans far more than chat — it lets the model act across documents, data, knowledge systems and the web.

Document processing

  • Read PDF / DOCX / PPTX
  • OCR pipeline (PyMuPDF + RapidOCR)
  • Create & edit documents
  • Export to DOCX / XLSX

Tabular data analysis

  • Schema, preview & summary (Excel/CSV)
  • SQL-like filtering & group-by
  • Value distributions
  • Correlation analysis

Search & retrieval

  • Semantic / vector search
  • Knowledge-base querying
  • Workspace-wide grep
  • In-file search

Knowledge & notes

  • Notion pages & databases
  • Create / append pages
  • Notes management
  • Chat history search

Calendar & tasks

  • Search / create events
  • Update & delete events
  • Todo lists
  • Scheduled automations

Database (Snowflake)

  • List tables
  • Inspect table schema
  • Execute SQL queries
  • Structured result handling

Financial calculators

  • ROI, VAT, discounts
  • Loan & compound interest
  • Percentages
  • Currency conversion

Web

  • Fetch URL content
  • Web search
  • Source-grounded responses
  • Live information retrieval

File & workspace management

  • Browse / list workspace
  • Move, copy, rename
  • Create directories
  • Download links

Applied in production

Clinical AI

Savion

LangGraph diet-planning agent with human approval before clinical plan delivery — 2× clinician productivity.

LLM Gateway

mdGPT Gateway

SSE-streaming multi-step agent over the full workspace toolset, with per-user isolation and OTel tracing on every step.

Model · Fine-tune

mdAgent-Hermes-32B

LoRA fine-tune validated at 82.3% single-turn tool accuracy — the reliable foundation under every agent.


Stack

LangGraph · LangChain · MCP · FastAPI · LoRA / QLoRA · BFCL-v3 · HumanEval · GSM8K · evalscope · OpenTelemetry · Phoenix