Base model
Qwen3-VL-32B
A multimodal tool-use LoRA fine-tune of Qwen3-VL-32B, optimized for reliable function calling and document understanding, and quantized to run on a single on-prem GPU node.
Off-the-shelf models are rarely reliable enough at tool use for production agents: they hallucinate arguments, deviate from tool schemas, and break on multimodal input. mdAgent-Hermes-32B is a targeted fine-tune that makes a 32B vision-language model dependable at calling the right tool with the right arguments.
Trained with LoRA on curated multimodal tool-use data and then quantized for efficient serving, the model keeps strong function-calling accuracy while fitting a realistic on-prem hardware budget.
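To make the failure modes named above concrete (hallucinated arguments, calls that do not match the tool schema), here is a minimal sketch of the kind of check an agent runtime might apply to a model-emitted tool call. It is an illustration only, not the evaluation or serving code behind this model: the `TOOLS` registry, the `get_invoice_total` tool, and the `{"name": ..., "arguments": ...}` call shape are all hypothetical assumptions.

```python
import json

# Hypothetical tool registry (an assumption for illustration):
# each tool lists its required and optional arguments with expected Python types.
TOOLS = {
    "get_invoice_total": {
        "required": {"invoice_id": str},
        "optional": {"currency": str},
    },
}


def check_tool_call(raw: str) -> list[str]:
    """Return the problems found in a model-emitted tool call (empty list if valid)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]

    # Wrong or invented tool name.
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return [f"unknown tool: {name!r}"]

    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]

    spec = TOOLS[name]
    allowed = {**spec["required"], **spec["optional"]}
    problems = []

    # Schema miss: a required argument is absent.
    for key in spec["required"]:
        if key not in args:
            problems.append(f"missing required argument: {key}")

    # Hallucinated argument, or a known argument with the wrong type.
    for key, value in args.items():
        if key not in allowed:
            problems.append(f"unexpected argument: {key}")
        elif not isinstance(value, allowed[key]):
            problems.append(
                f"wrong type for {key}: expected {allowed[key].__name__}"
            )
    return problems
```

A well-formed call such as `{"name": "get_invoice_total", "arguments": {"invoice_id": "INV-1"}}` yields an empty list, while a call that passes `invoice` instead of `invoice_id` is reported as both a missing required argument and an unexpected one. A production system would more likely validate against full JSON Schema definitions; the hand-rolled check here only keeps the sketch dependency-free.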