September 18, 2025
Large Language Models Fail to Reliably Diagnose Pediatric Pneumonia on Chest Radiographs
Pneumonia remains a leading cause of illness and death among children globally. Chest radiographs are commonly used to support diagnosis, but differentiating bacterial from viral pneumonia remains a challenge. A recent study assessed whether large language models (LLMs) with vision capabilities could accurately classify pediatric chest radiographs as bacterial, viral, or normal. Four publicly available LLMs were evaluated on 44 pediatric cases. Each model produced multiple readings, yielding 352 total interpretations. Average diagnostic accuracy was only 31%, close to the roughly 33% expected from random guessing in a three-class task. Viral pneumonia was most frequently identified, while normal images were least reliably classified. Internal consistency and concordance with human experts were both low, and the study was terminated early for futility. These findings indicate that current general-purpose LLMs are not reliable diagnostic tools in pediatric radiology and highlight the need for purpose-built systems with rigorous clinical validation.
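The arithmetic behind the "chance performance" reading can be checked with a short sketch. It uses only the figures quoted above (4 models, 44 cases, 352 interpretations, 31% accuracy, 3 classes). Everything else is an assumption made for illustration and is not taken from the study: that readings were spread evenly across models and cases, that the three classes are equally likely under random guessing, and that the 352 readings can be treated as independent for a rough normal-approximation interval. Repeated readings of the same image are not truly independent, so the interval is indicative only and is not the authors' analysis.

```python
import math

# Figures quoted in the summary above.
N_MODELS = 4
N_CASES = 44
N_READINGS_TOTAL = 352
ACCURACY = 0.31
N_CLASSES = 3

# Assumption: readings were spread evenly over every model-case pair.
readings_per_model_case = N_READINGS_TOTAL / (N_MODELS * N_CASES)

# Assumption: random guessing picks each of the three classes equally often.
chance = 1 / N_CLASSES

# Rough 95% interval (normal approximation), treating all readings as
# independent. This is an illustrative simplification, not the study's method.
se = math.sqrt(ACCURACY * (1 - ACCURACY) / N_READINGS_TOTAL)
ci_low = ACCURACY - 1.96 * se
ci_high = ACCURACY + 1.96 * se
chance_within_ci = ci_low <= chance <= ci_high

print(f"Readings per model per case: {readings_per_model_case:g}")
print(f"Chance accuracy: {chance:.3f}")
print(f"Approximate 95% interval for observed accuracy: {ci_low:.3f} to {ci_high:.3f}")
print(f"Chance level inside interval: {chance_within_ci}")
```

Under these assumptions, the chance level of about 33% falls inside the approximate interval around the observed 31%, which is what "consistent with chance performance" means in practice.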
Citation: Gillette J, Lu M, Heston TF. Large language models perform at chance level in the diagnosis of pediatric pneumonia using chest radiographs. Cureus. 2025;17(9):e92596. doi:10.7759/cureus.92596