Medical imaging workflows integrate radiology images with
their corresponding free-text reports. Large language
models (LLMs) and large vision–language models (LVLMs)
achieve strong results but face deployment barriers in
hospitals due to computational demands, privacy risks, and
infrastructure needs. Small language models (SLMs) and
small vision–language models (SVLMs), typically under 10B
parameters, provide a more efficient and auditable
alternative for on-premise, privacy-preserving applications
in radiology. Recent systems such as CheXzero, MedCLIP,
XrayGPT, LLaVA-Med, MedFILIP, and MedBridge show that
smaller multimodal models can support classification,
retrieval, and report generation. Complementary baselines
from lightweight text-only SLMs such as DistilBERT,
TinyBERT, BioClinicalBERT, and T5-Small highlight parallel
opportunities in radiology report understanding.
Building on these efforts, we propose a reproducible
evaluation framework anchored in MIMIC-CXR, with potential
extensions to CT, MRI, and ophthalmology datasets. Our
framework combines task metrics such as AUROC, ROUGE, and
F1 with efficiency measures including VRAM usage, latency,
and model size, and with trust dimensions such as
factuality, calibration, and robustness; a minimal
measurement sketch appears below.
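The following sketch illustrates how such task and
efficiency measures could be logged in a single evaluation
pass. The `model` and `loader` interfaces, the multi-label
sigmoid head, and the 0.5 decision threshold are
illustrative assumptions, not the exact pipeline described
here.

```python
# Minimal sketch: task metrics (AUROC, F1) plus efficiency measures
# (per-batch latency, peak VRAM) collected in one evaluation pass.
# `model` and `loader` are assumed interfaces: the model maps a batch
# of images to multi-label finding logits; 0.5 is an illustrative
# decision threshold.
import time

import torch
from sklearn.metrics import f1_score, roc_auc_score


@torch.no_grad()
def evaluate(model, loader, device="cuda"):
    model.eval().to(device)
    torch.cuda.reset_peak_memory_stats(device)
    probs, labels, latencies = [], [], []
    for images, targets in loader:
        start = time.perf_counter()
        logits = model(images.to(device))   # (batch, num_findings)
        torch.cuda.synchronize(device)      # include GPU time in latency
        latencies.append(time.perf_counter() - start)
        probs.append(torch.sigmoid(logits).cpu())
        labels.append(targets)
    y_score = torch.cat(probs).numpy()
    y_true = torch.cat(labels).numpy()
    return {
        "auroc_macro": roc_auc_score(y_true, y_score, average="macro"),
        "f1_macro": f1_score(y_true, y_score > 0.5, average="macro"),
        "latency_s_per_batch": sum(latencies) / len(latencies),
        "peak_vram_gb": torch.cuda.max_memory_allocated(device) / 1e9,
    }
```

Trust dimensions such as calibration (e.g., expected
calibration error computed from the same predicted
probabilities) would extend this loop rather than require a
separate pass.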
We also conduct ablation studies on model architecture,
tokenizers, and parameter-efficient fine-tuning (e.g.,
QLoRA), and analyze trade-offs between accuracy,
efficiency, and stability; a hedged configuration sketch
follows this paragraph. This work establishes reproducible
baselines and practical guidance for deploying radiology
AI.
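For the PEFT ablation, one plausible QLoRA-style setup on
the Hugging Face peft/bitsandbytes stack is sketched below;
the checkpoint identifier and all hyperparameter values are
placeholders, not the settings evaluated in this work.

```python
# Hedged sketch of a QLoRA-style PEFT configuration for the ablation
# axis named above; checkpoint id and hyperparameters are placeholders.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit NF4 base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "example/radiology-slm-3b",             # placeholder checkpoint id
    quantization_config=bnb_config,
)
lora_config = LoraConfig(
    r=16,                                   # adapter rank (ablation axis)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)  # only adapter weights train
model.print_trainable_parameters()
```

Varying the adapter rank, the target modules, or the 4-bit
quantization options yields natural ablation axes for the
accuracy, efficiency, and stability trade-offs discussed
above.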