Infrastructure · Gateway

mdGPT Gateway

A FastAPI-based LLM gateway with token streaming, full observability and per-user namespace isolation — the production backbone behind multi-tenant chat and deep-research products.

Role: Backend architecture & build
Domain: LLM infrastructure
Core stack: FastAPI · Redis · SSE
Status: In production

Overview

Shipping LLM products to real users means solving the unglamorous parts: streaming responses with low latency, isolating each tenant's data, and seeing exactly what the model did when something goes wrong. mdGPT Gateway is the production layer that handles all three.

It sits between applications and the inference stack, exposing a clean streaming API while enforcing per-user isolation and emitting full traces for every request — so chat assistants and long-running deep-research agents stay observable and safe to operate.
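To make the streaming path concrete, here is a minimal sketch of how tokens could be framed as Server-Sent Events. It uses only the standard library; `sse_event`, `fake_model_tokens`, `stream_completion`, the `token`/`done` event names, and the `[DONE]` sentinel are all hypothetical names chosen for illustration, not the gateway's actual code.

```python
import asyncio
import json
from typing import AsyncIterator, List, Optional


def sse_event(data: str, event: Optional[str] = None) -> str:
    """Frame one payload as a Server-Sent Events message."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # A payload containing newlines must be sent as several "data:" lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


async def fake_model_tokens(prompt: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for the inference backend: echoes the prompt."""
    for token in prompt.split():
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield token


async def stream_completion(prompt: str) -> AsyncIterator[str]:
    """Emit one SSE message per token, then a final 'done' event."""
    async for token in fake_model_tokens(prompt):
        yield sse_event(json.dumps({"token": token}), event="token")
    yield sse_event("[DONE]", event="done")


async def collect(prompt: str) -> List[str]:
    """Drain the stream into a list, for inspection in this example."""
    return [chunk async for chunk in stream_completion(prompt)]


if __name__ == "__main__":
    for chunk in asyncio.run(collect("hello world")):
        print(chunk, end="")
```

In a FastAPI application, a generator like `stream_completion` could be handed to `StreamingResponse` with `media_type="text/event-stream"`, so each token reaches the client as soon as it is produced rather than after the full completion.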

What it provides

  • Token streaming over Server-Sent Events for responsive, real-time output.
  • Per-user namespace isolation — strict multi-tenant data separation.
  • Full observability via OpenTelemetry traces and metrics on every call.
  • Async throughput with asyncpg and Redis for high concurrency.
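The isolation point above can be pictured as a key-prefixing scheme: every read and write goes through a helper that pins the key to one user's namespace. The sketch below is an assumption about how this might look, not the gateway's implementation. A plain dict stands in for a shared store such as Redis, and `namespaced_key`, `NamespacedStore`, the `user:` prefix, and the allowed ID format are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed format for IDs and key parts; the ":" separator is never allowed.
_SAFE_PART = re.compile(r"[A-Za-z0-9_-]{1,64}")


def namespaced_key(user_id: str, *parts: str) -> str:
    """Build a storage key that always lives under one user's namespace."""
    for piece in (user_id, *parts):
        # Rejecting the separator stops one tenant from crafting a key
        # that lands inside another tenant's namespace.
        if not _SAFE_PART.fullmatch(piece):
            raise ValueError(f"unsafe key component: {piece!r}")
    return ":".join(("user", user_id, *parts))


class NamespacedStore:
    """Dict-backed stand-in for a shared store such as Redis."""

    def __init__(self, backend: Dict[str, str], user_id: str) -> None:
        self._backend = backend
        self._user_id = user_id

    def set(self, name: str, value: str) -> None:
        self._backend[namespaced_key(self._user_id, name)] = value

    def get(self, name: str) -> Optional[str]:
        return self._backend.get(namespaced_key(self._user_id, name))


# Two tenants share one backend but cannot see each other's data.
shared: Dict[str, str] = {}
alice = NamespacedStore(shared, "alice")
bob = NamespacedStore(shared, "bob")
alice.set("history", "alice's chat")
```

Because callers only ever hold a store bound to their own user ID, a handler cannot address another tenant's keys by accident; the validation step covers the deliberate case.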

Highlights

  • SSE: Low-latency streaming
  • Multi-tenant: Per-user isolation
  • OTel: End-to-end observability
  • Async: High-concurrency core

In production

A deep-research assistant powered by the gateway, streaming a long-form, source-backed report end to end.

Stack

Python · FastAPI · asyncpg · Redis · SSE · OpenTelemetry · Phoenix · vLLM