Engineering · Agentic Systems

Agentic Capabilities

Most agentic demos work once. Making them work reliably in production, across real users, edge cases and tool failures, is an engineering discipline. This page covers how LMXAI designs agentic systems and the tool ecosystem those systems operate in.

Orchestration: LangGraph · LangChain

Tool protocol: MCP integration

Tool reliability: LoRA fine-tune · BFCL-v3

Deployed in: Savion · mdGPT

The approach

LMXAI builds agentic systems on LangGraph and LangChain, with external capabilities integrated through the Model Context Protocol (MCP). But orchestration is only half the problem — the harder half is making the underlying model reliable at calling those tools.

That reliability is engineered, not assumed. The base model is fine-tuned for tool use (LoRA / QLoRA) and validated on BFCL-v3, HumanEval and GSM8K with evalscope before it enters an agent loop. This is the same process that took mdAgent-Hermes-32B to 82.3% single-turn tool accuracy. Production hardening then adds structured retries, human-in-the-loop checkpoints and end-to-end tracing with OpenTelemetry and Phoenix.

The failure modes we design against

Tool-use hallucination — the model calls a real tool with wrong or invented arguments.
Unrecoverable state — a failed step has no retry logic or fallback path.
Silent errors — the agent continues past a failure without surfacing it to the user or the trace.
Runaway loops — no termination condition, no budget for steps or tokens.
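The last failure mode, runaway loops, is the simplest to illustrate in isolation. Below is a minimal sketch of a step and token budget that stops a loop and reports why. The names `Budget`, `BudgetExceeded` and `run_loop` are hypothetical and not taken from LMXAI's codebase, and `step_fn` is assumed to return a `(done, tokens_used)` pair.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when the loop has used up its step or token allowance."""


@dataclass
class Budget:
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def start_step(self) -> None:
        # Refuse to start a new step once the step allowance is used up.
        if self.steps >= self.max_steps:
            raise BudgetExceeded(f"step budget of {self.max_steps} exhausted")
        self.steps += 1

    def add_tokens(self, used: int) -> None:
        # Token usage is only known after the step has run.
        self.tokens += used
        if self.tokens > self.max_tokens:
            raise BudgetExceeded(f"token budget of {self.max_tokens} exceeded")


def run_loop(step_fn: Callable[[], Tuple[bool, int]], budget: Budget) -> str:
    """Run step_fn until it reports completion or the budget stops it."""
    try:
        while True:
            budget.start_step()
            done, tokens_used = step_fn()
            budget.add_tokens(tokens_used)
            if done:
                return "completed"
    except BudgetExceeded as exc:
        # Surface the reason instead of failing silently.
        return f"stopped: {exc}"
```

Returning the stop reason as a value, rather than swallowing it, also addresses the silent-error case: the caller always learns why the loop ended.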

Design patterns

Six patterns applied consistently across LMXAI's production agentic systems.

Pattern 01

Graph-based orchestration

Agents are modelled as directed graphs in LangGraph rather than free-form loops. Every state transition is explicit, conditional and inspectable at each step.

Why: Debugging a graph trace is a diff; debugging a free-form loop is archaeology.
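To make the idea of explicit, inspectable transitions concrete, here is a small stand-alone sketch. It deliberately does not use LangGraph's API; `TinyGraph`, the node functions and the `attempts` / `ok` state keys are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
END = "END"  # sentinel node name that terminates the run


class TinyGraph:
    """Nodes transform state; routers pick the next node from the new state."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.routers: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Callable[[State], State],
                 router: Callable[[State], str]) -> None:
        self.nodes[name] = fn
        self.routers[name] = router

    def run(self, start: str, state: State,
            max_steps: int = 20) -> Tuple[State, List[str]]:
        trace: List[str] = []
        current = start
        while current != END:
            if len(trace) >= max_steps:
                raise RuntimeError("step limit reached before END")
            state = self.nodes[current](state)
            trace.append(current)  # every executed node is recorded
            current = self.routers[current](state)
        return state, trace


def build_demo_graph() -> TinyGraph:
    graph = TinyGraph()
    graph.add_node("plan", lambda s: {**s, "attempts": 0}, lambda s: "act")
    graph.add_node("act", lambda s: {**s, "attempts": s["attempts"] + 1},
                   lambda s: "check")
    # Conditional edge: loop back to "act" until two attempts were made.
    graph.add_node("check", lambda s: {**s, "ok": s["attempts"] >= 2},
                   lambda s: END if s["ok"] else "act")
    return graph


final_state, trace = build_demo_graph().run("plan", {})
```

Because the run returns the ordered list of visited nodes, two executions can be compared node by node, which is what makes a graph trace behave like a diff.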

Pattern 02

Tool reliability via fine-tuning

Off-the-shelf models aren't reliable enough at function calling for production. The base model is fine-tuned on curated tool-use data and validated against BFCL-v3 before deployment.

Why: mdAgent-Hermes-32B reached 82.3% single-turn accuracy through exactly this process.
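As a rough illustration of what a single-turn tool-call accuracy figure measures, the sketch below scores model outputs by exact match on function name and arguments. This is a simplified stand-in, not BFCL-v3's own evaluation logic or the evalscope harness; the function names and the JSON output shape (`name`, `arguments`) are assumptions made for the example.

```python
import json
from typing import Any, Dict, List


def call_matches(predicted: Dict[str, Any], expected: Dict[str, Any]) -> bool:
    """Exact match on tool name and on the full argument dictionary."""
    return (predicted.get("name") == expected["name"]
            and predicted.get("arguments") == expected["arguments"])


def single_turn_accuracy(raw_outputs: List[str],
                         expected_calls: List[Dict[str, Any]]) -> float:
    """Fraction of cases where the model emitted exactly the expected call."""
    if len(raw_outputs) != len(expected_calls):
        raise ValueError("outputs and expectations must have the same length")
    if not expected_calls:
        raise ValueError("at least one test case is required")
    correct = 0
    for raw, expected in zip(raw_outputs, expected_calls):
        try:
            predicted = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output counts as a miss
        if isinstance(predicted, dict) and call_matches(predicted, expected):
            correct += 1
    return correct / len(expected_calls)
```

Counting malformed JSON as a miss rather than skipping it keeps the score honest: an unparseable call is a failed call in production too.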

Pattern 03

MCP tool integration

External capabilities are exposed over the Model Context Protocol — a consistent, schema-validated interface the model uses to discover and call tools without hardcoded glue code.

Why: Adding or swapping a tool becomes a schema change, not a code change.
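In MCP, a tool is described by a name, a description and a JSON Schema for its input. The sketch below shows how such a descriptor can be used to reject wrong or invented arguments before a call is made. It covers only a small subset of JSON Schema, uses no MCP SDK, and the tool `inspect_table_schema` with its parameters is a hypothetical example.

```python
from typing import Any, Dict, List

# Hypothetical tool descriptor in the shape MCP uses for tool listings.
INSPECT_TABLE_TOOL: Dict[str, Any] = {
    "name": "inspect_table_schema",
    "description": "Return column names and types for one table.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "table": {"type": "string"},
            "sample_rows": {"type": "integer"},
        },
        "required": ["table"],
    },
}

_PYTHON_TYPES = {"string": str, "integer": int,
                 "number": (int, float), "boolean": bool}


def validate_arguments(tool: Dict[str, Any],
                       arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call is acceptable."""
    schema = tool["inputSchema"]
    properties = schema.get("properties", {})
    errors: List[str] = []
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        if name not in properties:
            # Stricter than JSON Schema's default, chosen here on purpose
            # so that invented argument names are caught.
            errors.append(f"unknown argument: {name}")
            continue
        declared = properties[name].get("type")
        expected = _PYTHON_TYPES.get(declared)
        if expected is None:
            continue  # type not covered by this sketch
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_bool_mismatch = isinstance(value, bool) and declared != "boolean"
        if is_bool_mismatch or not isinstance(value, expected):
            errors.append(f"argument {name} should be of type {declared}")
    return errors
```

With validation driven by the descriptor, swapping a tool means publishing a different schema; the checking code stays the same.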

Pattern 04

Human-in-the-loop checkpoints

High-stakes workflows surface explicit approval gates before consequential actions, using LangGraph's interrupt/resume to keep a human in control without breaking the execution graph.

Why: In healthcare and finance, an agent that can't be paused is a liability.
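The pause-and-resume idea can be sketched without any framework. The example below is not LangGraph's interrupt API; `Step`, `Checkpoint` and `execute` are hypothetical names, and the two-step workflow is invented for illustration. Execution stops in front of a gated step, hands back a checkpoint, and continues only when called again with approval.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]
    needs_approval: bool = False


@dataclass
class Checkpoint:
    state: dict = field(default_factory=dict)
    next_index: int = 0
    pending: Optional[str] = None  # step awaiting approval; None when finished


def execute(steps: List[Step], checkpoint: Checkpoint,
            approved: bool = False) -> Checkpoint:
    """Run from the checkpoint until the workflow ends or a gate is hit."""
    state = dict(checkpoint.state)
    index = checkpoint.next_index
    while index < len(steps):
        step = steps[index]
        if step.needs_approval and not approved:
            # Pause: nothing consequential has run yet.
            return Checkpoint(state, index, pending=step.name)
        state = step.run(state)
        approved = False  # an approval covers one step only
        index += 1
    return Checkpoint(state, index, pending=None)


workflow = [
    Step("draft_plan", lambda s: {**s, "plan": "draft"}),
    Step("deliver_plan", lambda s: {**s, "delivered": True},
         needs_approval=True),
]

paused = execute(workflow, Checkpoint())
resumed = execute(workflow, paused, approved=True)
```

Because the checkpoint carries the full state and the next step index, the approval can arrive minutes or days later without re-running earlier steps.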

Pattern 05

Structured retry & fallback

Every tool-calling node has a retry budget with backoff, output-schema validation, and a fallback branch that degrades gracefully instead of propagating a corrupt state.

Why: A transient API timeout shouldn't abort a 10-step clinical workflow.
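A minimal sketch of the three ingredients named above (retry budget with backoff, output validation, fallback branch) could look like the following. The function name, its parameters and the result dictionary shape are assumptions for illustration, and production code would normally catch narrower exception types than `Exception`.

```python
import time
from typing import Any, Callable, Dict


def call_with_retry(
    tool: Callable[[], Any],
    validate: Callable[[Any], bool],
    fallback: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call a tool with exponential backoff; degrade instead of crashing."""
    last_error: Exception = RuntimeError("tool was never called")
    for attempt in range(max_attempts):
        try:
            result = tool()
            if validate(result):
                return {"status": "ok", "result": result,
                        "attempts": attempt + 1}
            # A well-formed failure: the tool answered, but not usefully.
            last_error = ValueError("output failed validation")
        except Exception as exc:  # broad on purpose in this sketch
            last_error = exc
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    # Retry budget spent: take the fallback branch and record why.
    return {"status": "degraded", "result": fallback(),
            "attempts": max_attempts, "error": repr(last_error)}
```

The `sleep` parameter is injectable so the backoff schedule can be tested without waiting, and the `degraded` status lets downstream nodes decide whether to continue or stop.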

Pattern 06

End-to-end observability

Every step emits a structured trace via OpenTelemetry and Phoenix: which node ran, the model input and output, which tool was called with what arguments, and step latency.

Why: You can't improve what you can't see — and regulators increasingly require audit trails.
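To show the shape of such a per-step record, here is a stdlib-only sketch. It does not use the OpenTelemetry or Phoenix APIs; `traced_step`, the `sink` list and the field names are hypothetical and stand in for a real span exporter.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List, Optional


@contextmanager
def traced_step(sink: List[Dict[str, Any]], node: str,
                tool: Optional[str] = None,
                arguments: Optional[Dict[str, Any]] = None
                ) -> Iterator[Dict[str, Any]]:
    """Record one step: node, tool call, status and latency."""
    record: Dict[str, Any] = {
        "node": node,
        "tool": tool,
        "arguments": arguments,
        "status": "ok",
        "error": None,
    }
    start = time.perf_counter()
    try:
        yield record  # the step body can attach model input/output here
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise  # never hide the failure from the caller
    finally:
        record["latency_ms"] = (time.perf_counter() - start) * 1000
        sink.append(record)
```

A step body would wrap its work in `with traced_step(sink, "lookup", tool="list_tables", arguments={}) as rec:` and attach outputs to `rec`. Failed steps are still appended to the sink, so errors appear in the trace as well as in the exception.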

The tool ecosystem

A concrete example: the workspace toolset exposed to mdGPT's agents over MCP spans far more than chat — it lets the model act across documents, data, knowledge systems and the web.

Document processing

Read PDF / DOCX / PPTX
OCR pipeline (PyMuPDF + RapidOCR)
Create & edit documents
Export to DOCX / XLSX

Tabular data analysis

Schema, preview & summary (Excel/CSV)
SQL-like filtering & group-by
Value distributions
Correlation analysis

Search & retrieval

Semantic / vector search
Knowledge-base querying
Workspace-wide grep
In-file search

Knowledge & notes

Notion pages & databases
Create / append pages
Notes management
Chat history search

Calendar & tasks

Search / create events
Update & delete events
Todo lists
Scheduled automations

Database (Snowflake)

List tables
Inspect table schema
Execute SQL queries
Structured result handling

Financial calculators

ROI, VAT, discounts
Loan & compound interest
Percentages
Currency conversion

Web

Fetch URL content
Web search
Source-grounded responses
Live information retrieval

File & workspace management

Browse / list workspace
Move, copy, rename
Create directories
Download links

Applied in production

Clinical AI

Savion

LangGraph diet-planning agent with human approval before clinical plan delivery — 2× clinician productivity.


LLM Gateway

mdGPT Gateway

SSE-streaming multi-step agent over the full workspace toolset, with per-user isolation and OTel tracing on every step.


Model · Fine-tune

mdAgent-Hermes-32B

LoRA fine-tune validated at 82.3% single-turn tool accuracy — the reliable foundation under every agent.


Stack

LangGraph · LangChain · MCP · FastAPI · LoRA / QLoRA · BFCL-v3 · HumanEval · GSM8K · evalscope · OpenTelemetry · Phoenix
