Most agentic demos work once. Making them work reliably in production, across real users, edge cases and tool failures, is an engineering discipline. This is how LMXAI designs agentic systems, and the tool ecosystem they operate in.
LMXAI builds agentic systems on LangGraph and LangChain, with external capabilities integrated through the Model Context Protocol (MCP). But orchestration is only half the problem. The harder half is making the underlying model reliable at calling those tools.
That reliability is engineered, not assumed. Before the base model enters an agent loop, it is fine-tuned for tool use (LoRA / QLoRA) and validated with evalscope on BFCL-v3, HumanEval and GSM8K. The same process took mdAgent-Hermes-32B to 82.3% single-turn tool-calling accuracy. Production hardening then adds structured retries, human-in-the-loop checkpoints and end-to-end tracing with OpenTelemetry and Phoenix.
Six patterns applied consistently across LMXAI's production agentic systems.
Agents are modelled as directed graphs in LangGraph rather than as free-form loops. Every state transition is explicit, can be conditioned on the current state, and can be inspected as it happens.
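As a rough illustration of what explicit, inspectable transitions buy you, here is a minimal stdlib Python sketch of a graph-shaped agent loop. It is not LangGraph's API. The nodes (`plan`, `call_tool`, `respond`), the `State` fields and the routing rule are hypothetical, and the tool call is stubbed out. Each node returns the updated state, a router picks the next node from that state, and the visited path is recorded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

END = "__end__"

@dataclass
class State:
    question: str
    tool_result: Optional[str] = None
    answer: Optional[str] = None
    history: List[str] = field(default_factory=list)  # nodes visited, in order

# Hypothetical nodes: each receives the state and returns it updated.
def plan(state: State) -> State:
    return state  # a real node would ask the model what to do next

def call_tool(state: State) -> State:
    state.tool_result = f"lookup({state.question!r})"  # stubbed tool call
    return state

def respond(state: State) -> State:
    state.answer = f"Answer based on {state.tool_result}"
    return state

# Conditional edge: after planning, call a tool unless a result is already in the state.
def route_after_plan(state: State) -> str:
    return "respond" if state.tool_result else "call_tool"

NODES: Dict[str, Callable[[State], State]] = {
    "plan": plan,
    "call_tool": call_tool,
    "respond": respond,
}
EDGES: Dict[str, Callable[[State], str]] = {
    "plan": route_after_plan,
    "call_tool": lambda state: "plan",
    "respond": lambda state: END,
}

def run(state: State, start: str = "plan", max_steps: int = 10) -> State:
    node = start
    for _ in range(max_steps):
        state = NODES[node](state)
        state.history.append(node)  # every transition is recorded for inspection
        node = EDGES[node](state)
        if node == END:
            return state
    raise RuntimeError(f"no END after {max_steps} steps; last node: {node}")
```

Running `run(State("weather in Paris"))` visits `plan`, `call_tool`, `plan`, `respond` in that order, and the path is available in `history`. LangGraph declares the same shape with `StateGraph`, `add_node` and `add_conditional_edges`. The `max_steps` guard plays a role similar to its recursion limit.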
Off-the-shelf models aren't reliable enough at function calling for production. The base model is fine-tuned on curated tool-use data and validated against BFCL-v3 before deployment.
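To make validation before deployment concrete, the sketch below scores single-turn tool calls. It assumes the model emits a JSON object with `name` and `arguments`, which is an assumed output format. It is also deliberately simpler than BFCL-v3's own AST-based checker, which, for example, accepts several valid values for an argument. The function names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class ToolCall:
    name: str
    arguments: Dict[str, Any]

def parse_call(raw: str) -> Optional[ToolCall]:
    """Parse model output assumed to look like {"name": ..., "arguments": {...}}."""
    try:
        obj = json.loads(raw)
        if not isinstance(obj, dict) or not isinstance(obj.get("arguments"), dict):
            return None
        return ToolCall(obj["name"], obj["arguments"])
    except (json.JSONDecodeError, KeyError):
        return None  # unparseable output counts as a miss

def is_correct(pred: Optional[ToolCall], gold: ToolCall) -> bool:
    # Simplified rule: exact match on the function name and on every argument value.
    return pred is not None and pred.name == gold.name and pred.arguments == gold.arguments

def single_turn_accuracy(outputs: List[str], golds: List[ToolCall]) -> float:
    if len(outputs) != len(golds) or not golds:
        raise ValueError("need one model output per gold call, and at least one case")
    hits = sum(is_correct(parse_call(out), gold) for out, gold in zip(outputs, golds))
    return hits / len(golds)
```

Under this scorer, free-text answers and malformed JSON count as misses rather than crashing the evaluation run.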
External capabilities are exposed over the Model Context Protocol. It provides a consistent, schema-validated interface through which the model discovers and calls tools, with no hardcoded glue code.
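The sketch below shows the shape of a schema-validated MCP tool call using only the standard library. MCP servers advertise each tool with a `name`, a `description` and an `inputSchema` (JSON Schema), and clients invoke a tool with a JSON-RPC 2.0 `tools/call` request. The `search_documents` tool, the helper names and the hand-rolled validator are assumptions for illustration. The validator covers only a small subset of JSON Schema, so a production client would use an MCP SDK and a full validator instead.

```python
import json
from typing import Any, Dict, List

# A tool definition in the shape MCP servers advertise via `tools/list`.
# The tool itself is hypothetical.
SEARCH_DOCS_TOOL: Dict[str, Any] = {
    "name": "search_documents",
    "description": "Full-text search over the user's workspace documents.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

_JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
               "boolean": bool, "object": dict, "array": list}

def validate_arguments(schema: Dict[str, Any], args: Dict[str, Any]) -> List[str]:
    """Check required keys, unknown keys and primitive types (a small JSON Schema subset)."""
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument '{key}'")
    for key, value in args.items():
        if key not in props:
            # Stricter than JSON Schema's default, as if additionalProperties were false.
            errors.append(f"unexpected argument '{key}'")
            continue
        expected_type = props[key]["type"]
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        is_bad_bool = isinstance(value, bool) and expected_type != "boolean"
        if not isinstance(value, _JSON_TYPES[expected_type]) or is_bad_bool:
            errors.append(f"argument '{key}' should be {expected_type}")
    return errors

def build_tools_call(request_id: int, tool: Dict[str, Any], args: Dict[str, Any]) -> str:
    """Validate the arguments, then serialise a JSON-RPC 2.0 `tools/call` request."""
    errors = validate_arguments(tool["inputSchema"], args)
    if errors:
        raise ValueError("; ".join(errors))
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool["name"], "arguments": args},
    })
```

Validating before the request leaves the client means a malformed call from the model fails fast with a readable error. That error can be fed back to the model instead of surfacing as an opaque server failure.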
High-stakes workflows surface explicit approval gates before consequential actions, using LangGraph's interrupt/resume to keep a human in control without breaking the execution graph.
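Here is a minimal stdlib sketch of the interrupt/resume idea. It assumes an in-memory checkpoint store keyed by a thread id and mimics the behaviour of LangGraph's `interrupt` with a checkpointer, but it does not use that API. `PlanState`, `ApprovalRequired`, `approval_gate` and `deliver` are hypothetical names. The run pauses at the gate and persists its state, then continues only when a human decision is supplied.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

@dataclass
class PlanState:
    draft_plan: str
    approved: Optional[bool] = None  # None means no human decision yet
    delivered: bool = False

class ApprovalRequired(Exception):
    """Raised by a node to pause the run and ask a human for a decision."""
    def __init__(self, payload: Dict[str, Any]):
        super().__init__("approval required")
        self.payload = payload

# Hypothetical in-memory checkpoint store keyed by thread id.
CHECKPOINTS: Dict[str, PlanState] = {}

def approval_gate(state: PlanState) -> PlanState:
    if state.approved is None:
        raise ApprovalRequired({"plan": state.draft_plan})
    return state

def deliver(state: PlanState) -> PlanState:
    # Consequential action: only performed once a human has approved.
    if state.approved:
        state.delivered = True
    return state

def run(thread_id: str, state: PlanState) -> Dict[str, Any]:
    try:
        state = deliver(approval_gate(state))
    except ApprovalRequired as pause:
        CHECKPOINTS[thread_id] = state  # persist so the run can resume later
        return {"status": "interrupted", "request": pause.payload}
    CHECKPOINTS.pop(thread_id, None)
    return {"status": "done", "delivered": state.delivered}

def resume(thread_id: str, approved: bool) -> Dict[str, Any]:
    state = CHECKPOINTS[thread_id]
    state.approved = approved
    return run(thread_id, state)
```

In this sketch the paused node runs again from the top on resume, which matches how LangGraph re-executes an interrupted node. Any work before the gate should therefore be safe to repeat.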
Every tool-calling node has a retry budget with backoff, output-schema validation, and a fallback branch that degrades gracefully instead of propagating corrupt state.
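The three safeguards can be sketched in one helper:

- a bounded retry loop with exponential backoff and full jitter;
- validation of the tool's output against an expected shape;
- a fallback once the budget is spent.

The output schema (`status` plus `items`), the exception types treated as retryable and the parameter defaults are assumptions. The injectable `sleep` parameter exists only so the helper can be tested without waiting.

```python
import random
import time
from typing import Any, Callable, Dict, Optional

class ToolOutputError(ValueError):
    """Raised when a tool's output does not match the expected schema."""

# Failures worth retrying; anything else propagates immediately.
RETRYABLE = (ToolOutputError, TimeoutError, ConnectionError)

def validate_output(output: Any) -> Dict[str, Any]:
    # Hypothetical schema: {"status": str, "items": list}.
    if not (isinstance(output, dict)
            and isinstance(output.get("status"), str)
            and isinstance(output.get("items"), list)):
        raise ToolOutputError(f"unexpected tool output: {output!r}")
    return output

def call_with_retries(
    tool: Callable[[], Any],
    fallback: Callable[[Exception], Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return validate_output(tool())
        except RETRYABLE as err:
            last_error = err
            if attempt < max_attempts - 1:
                # Exponential backoff with full jitter: wait a random time in [0, base_delay * 2**attempt].
                sleep(random.uniform(0, base_delay * 2 ** attempt))
    # Budget exhausted: return a degraded result instead of passing corrupt output downstream.
    return fallback(last_error)
```

Errors outside the retryable set, such as a `KeyError` from a bug, are not caught. Retrying a deterministic failure would only burn the budget.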
Every step emits a structured trace via OpenTelemetry and Phoenix. Each trace records which node ran, the model's input and output, which tool was called with what arguments, and the step latency.
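Here is a stdlib sketch of the kind of per-step record described above. A context manager captures the node, arbitrary attributes (such as model input and output, or tool name and arguments), the outcome and the latency. In a real deployment these would be OpenTelemetry spans exported to Phoenix. Here an in-memory list stands in for the exporter, and the field names are hypothetical rather than any standard semantic convention.

```python
import json
import time
import uuid
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

TRACE: List[Dict[str, Any]] = []  # in-memory stand-in for a span exporter

@contextmanager
def traced_step(trace_id: str, node: str, **attributes: Any) -> Iterator[Dict[str, Any]]:
    """Record one agent step: node name, caller-supplied attributes, status and latency."""
    span: Dict[str, Any] = {
        "trace_id": trace_id,
        "span_id": uuid.uuid4().hex[:16],
        "node": node,
        **attributes,
    }
    start = time.perf_counter()
    try:
        yield span  # the step can attach outputs (e.g. model_output) while it runs
        span["status"] = "ok"
    except Exception as err:
        span["status"] = "error"
        span["error"] = repr(err)
        raise  # tracing records the failure but never swallows it
    finally:
        span["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
        TRACE.append(span)

def export_jsonl() -> str:
    """Serialise recorded spans as JSON lines, one span per line."""
    return "\n".join(json.dumps(span) for span in TRACE)
```

A node would wrap its work in `with traced_step(trace_id, "call_tool", tool="search_documents", tool_args=args) as span:` and attach its result to `span`. Failed steps are recorded with their error before the exception continues to the retry or fallback logic.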
A concrete example: the workspace toolset exposed to mdGPT's agents over MCP spans far more than chat, letting the model act across documents, data, knowledge systems and the web.
LangGraph diet-planning agent with human approval before clinical plan delivery, for 2× clinician productivity.
SSE-streaming multi-step agent over the full workspace toolset, with per-user isolation and OTel tracing on every step.
LoRA fine-tune validated at 82.3% single-turn tool accuracy: the reliable foundation under every agent.