Comparative Study
AUTOMATING FRACTURE DETECTION: BENCHMARKING LANGUAGE MODELS AGAINST SPECIALIZED AI IN PLAIN RADIOGRAPHS
1 Vita-Salute University, IRCCS San Raffaele Hospital, Milan, Italy
2 IRCCS San Raffaele Hospital, Milan, Italy
Correspondence to:
IRCCS Ospedale San Raffaele,
Unità Clinica di Ortopedia e Traumatologia
Via Olgettina 60,
20132, Milano, Italy
Journal of Orthopedics 2024 September-December; 16(3): 118-125
Received: 13 August 2024 Accepted: 18 September 2024
Copyright © by LAB srl 2024 ISSN 1973-6401 (print) / 3035-2916 (online)
Abstract
This study compares the diagnostic performance of ChatGPT, an emerging large language model, with that of Qure.ai, an established AI model specialized in radiograph analysis and used here as the reference AI, in classifying plain radiographs as fractured or normal. This diagnostic accuracy study used a retrospective cross-sectional design and was conducted in the Orthopedic Department of IRCCS San Raffaele Hospital, Milan. The sample comprised 200 de-identified anteroposterior and lateral femur radiographs, equally divided between fractured and normal. The two AI models independently classified each radiograph as fractured or normal. Their outputs were compared against the radiologist reports, which served as the reference standard. The reference AI, Qure.ai, showed higher sensitivity (0.89 vs 0.73, p<0.01) and overall accuracy (0.92 vs 0.84) than ChatGPT. Both models demonstrated high specificity (>0.90), and the reference AI achieved better diagnostic discrimination (AUC 0.92 vs 0.84). Accuracy decreased as fracture complexity increased, and inter-model concordance was strong. Both AI models exceeded established clinical benchmarks, with the reference AI model outperforming ChatGPT. The study's methodological framework provides insights relevant to the clinical application of AI in radiographic fracture diagnosis. Further studies, particularly larger multi-center trials, are recommended to validate these findings.
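The abstract reports sensitivity, accuracy, and AUC but not the underlying counts. The sketch below shows how these metrics relate, assuming two things the abstract does not state explicitly. First, the "equally divided" sample means exactly 100 fractured and 100 normal radiographs. Second, each model outputs only a binary label, so the ROC curve has a single operating point and its trapezoidal AUC reduces to (sensitivity + specificity) / 2. Under these assumptions, the reported figures imply 95 true negatives and a specificity of 0.95 for both models. This is consistent with the stated ">0.90" and reproduces the reported AUCs. The helper names `Confusion` and `counts_from_reported` are hypothetical and for illustration only. They are not the authors' analysis code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts, with 'fractured' as the positive class."""
    tp: int  # fractured radiographs classified as fractured
    fn: int  # fractured radiographs classified as normal
    tn: int  # normal radiographs classified as normal
    fp: int  # normal radiographs classified as fractured

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)

    @property
    def single_point_auc(self) -> float:
        # Assumption: binary outputs only. The trapezoidal area under the ROC
        # polyline (0,0) -> (FPR, TPR) -> (1,1) simplifies to (sens + spec) / 2.
        return (self.sensitivity + self.specificity) / 2


def counts_from_reported(sensitivity: float, accuracy: float,
                         n_pos: int = 100, n_neg: int = 100) -> Confusion:
    """Hypothetical helper: back-calculate counts from reported sensitivity and
    accuracy, assuming a 100 fractured / 100 normal split."""
    tp = round(sensitivity * n_pos)
    correct = round(accuracy * (n_pos + n_neg))
    tn = correct - tp
    return Confusion(tp=tp, fn=n_pos - tp, tn=tn, fp=n_neg - tn)


for name, sens, acc in [("Qure.ai", 0.89, 0.92), ("ChatGPT", 0.73, 0.84)]:
    c = counts_from_reported(sens, acc)
    print(f"{name}: {c}, specificity={c.specificity:.2f}, "
          f"AUC={c.single_point_auc:.2f}")
```

If the study instead derived AUC from continuous confidence scores, the single-point formula would not apply. In that case the ROC curve would need to be computed across all thresholds.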
Keywords: AI fracture detection, artificial intelligence, ChatGPT, femur fracture, LLM