mdAgent-Hermes-32B

Overview

Off-the-shelf models are rarely reliable enough at tool use for production agents: they hallucinate arguments, ignore the declared tool schemas and break on multimodal input. mdAgent-Hermes-32B is a targeted fine-tune that makes a 32B vision-language model dependable at calling the right tool with the right arguments.
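To make "the right tool with the right arguments" concrete, here is a minimal sketch of the kind of check an agent runtime or evaluation harness might run on each emitted call. It flags a wrong tool name, missing required arguments, hallucinated (undeclared) arguments and wrong primitive types. The `get_weather` tool, the `check_call` helper and its messages are hypothetical illustrations, not part of mdAgent-Hermes-32B. The check also covers only a small subset of JSON Schema.

```python
from typing import Any

# Hypothetical tool declaration in the JSON-Schema style common to
# function-calling APIs (a name plus a "parameters" object schema).
GET_WEATHER = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "days": {"type": "integer"},
        },
        "required": ["city"],
    },
}

# JSON-Schema primitive types supported by this sketch, mapped to Python types.
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def check_call(call: dict[str, Any], tool: dict[str, Any]) -> list[str]:
    """Return the problems found in `call`; an empty list means it passes."""
    problems: list[str] = []
    if call.get("name") != tool["name"]:
        problems.append(f"wrong tool: {call.get('name')!r}")
        return problems
    schema = tool["parameters"]
    args = call.get("arguments", {})
    # Every required argument must be present.
    for key in schema.get("required", []):
        if key not in args:
            problems.append(f"missing argument: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            # The model invented an argument the tool does not declare.
            problems.append(f"hallucinated argument: {key}")
            continue
        # bool is a subclass of int in Python, so reject it unless "boolean" is declared.
        is_bool = isinstance(value, bool)
        if (is_bool and spec["type"] != "boolean") or not isinstance(value, _TYPES[spec["type"]]):
            problems.append(f"wrong type for {key}")
    return problems


print(check_call({"name": "get_weather", "arguments": {"days": "3", "units": "C"}}, GET_WEATHER))
# ['missing argument: city', 'wrong type for days', 'hallucinated argument: units']
```

A well-formed call such as `{"name": "get_weather", "arguments": {"city": "Oslo", "days": 3}}` returns an empty list.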

Trained with LoRA on curated multimodal tool-use data and then quantized for efficient serving, the model keeps strong function-calling accuracy while fitting a realistic on-prem hardware budget.
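A quick back-of-envelope calculation shows why quantization matters for the hardware budget. The sketch below makes several assumptions that the source does not state. It takes a nominal 32e9 parameters and counts weights only. It ignores the KV cache, activations, and the per-group scales that AWQ stores alongside the 4-bit weights.

```python
# Back-of-envelope weight memory. Assumptions (not from the source): a nominal
# 32e9 parameters, weights only (no KV cache, activations, or AWQ group scales).
PARAMS = 32e9

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16-bit weights, a typical unquantized serving format
    "int4": 0.5,  # 4-bit weights, as in AWQ INT4
}


def weight_gb(params: float, dtype: str) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_gb(PARAMS, dtype):.0f} GB")
# bf16: 64 GB
# int4: 16 GB
```

Under these assumptions, the INT4 weights take roughly a quarter of the BF16 footprint. That leaves far more of each GPU's memory for the KV cache and concurrent requests.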

Approach

LoRA fine-tuning of Qwen3-VL-32B on multimodal tool-use traces.
Quantization (AWQ, INT4) for low-memory, high-throughput inference.
Rigorous evaluation with evalscope across function-calling and reasoning suites.
vLLM serving for production deployment on an A100 cluster.
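The LoRA step can be illustrated with a small sketch of the standard low-rank update. A frozen weight W (d x k) is adapted as W + (alpha / r) * B @ A, where B is d x r and A is r x k, so only r * (d + k) values are trained instead of d * k. The helper names are hypothetical, as are the 5120 x 5120, rank-16 shapes in the parameter count. They do not describe mdAgent-Hermes-32B's actual adapter configuration.

```python
# Minimal LoRA sketch in pure Python (lists of rows as matrices).
# Shapes and values are hypothetical illustrations.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, i.e. the adapter folded into the frozen weight."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def trainable_params(d, k, r):
    """(full fine-tuning count d*k, LoRA count r*(d+k)) for one d x k matrix."""
    return d * k, r * (d + k)


# Tiny worked example: 2x2 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # r x k = 1 x 2
B = [[0.5], [0.0]]  # d x r = 2 x 1
print(lora_merge(W, A, B, alpha=2.0, r=1))
# [[2.0, 2.0], [0.0, 1.0]]

print(trainable_params(5120, 5120, 16))
# (26214400, 163840)
```

For the hypothetical 5120 x 5120 projection, a rank-16 adapter trains about 0.6% of the values that full fine-tuning would update. Merging the adapter into W before quantization keeps the serving path a plain dense model.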
