Academics
October 6, 2025
Fragility Metrics as Legal Standards in Gun Control Research
In this article, I propose two statistical fragility diagnostics for evaluating gun policy evidence in legal proceedings. The Percent Fragility Index (PFI) indicates the minimum percentage of outcomes that would need to change to reverse a study’s statistical significance. In contrast, the Risk Quotient (RQ) indicates the percentage that would need to change to eliminate…
October 2, 2025
The Relative Risk Index: A Complementary Metric for Assessing Statistical Fragility in Orthopaedic Surgery Research
My recent letter to the editor in the Journal of the American Academy of Orthopaedic Surgeons addresses an important gap in how we evaluate the robustness of medical research findings. The fragility index has become a popular tool for assessing statistical fragility—it identifies the minimum number of outcome changes needed to flip a study’s statistical…
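The fragility index described above — the minimum number of outcome changes needed to flip a study's statistical significance — can be sketched in a few lines. This is a minimal illustration, not the letter's method: it assumes a 2×2 trial tested with a two-sided Fisher exact test (built here from the standard library), flips outcomes in the first group only, and assumes that group is the lower-event arm so each flip moves the table toward the null. Function names are illustrative.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    denom = comb(n, col1)

    def prob(x):  # hypergeometric probability of x events in group 1
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # sum all tables as or less probable than the observed one
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

def fragility_index(a, b, c, d, alpha=0.05):
    """a/b = events/non-events in group 1, c/d likewise in group 2.
    Returns the minimum number of outcome flips in group 1 that pushes
    the p-value to alpha or above, or None if the result is not
    significant to begin with (or never crosses alpha)."""
    if fisher_exact_p(a, b, c, d) >= alpha:
        return None
    for flips in range(1, b + 1):
        # convert one more non-event to an event in group 1
        if fisher_exact_p(a + flips, b - flips, c, d) >= alpha:
            return flips
    return None
```

For example, a trial with 0/100 events in one arm and 20/80 in the other is highly significant, and the index counts how many flipped outcomes would erase that significance.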
September 20, 2025
A Unified Framework for Fragility Metrics in 2×2 Trials
My new article, “A Unified Framework for Fragility Metrics in 2×2 Trials,” is now online (https://doi.org/10.5281/zenodo.17167247). It introduces standardized definitions and computational rules for seven measures of robustness in binary outcome studies: the Fragility Index (FI), Fragility Quotient (FQ), Intervention Fragility Quotient (IFQ), normalized IFQ, Percent Fragility Index (PFI), Relative Risk Index (RRI), and Robustness Index…
September 18, 2025
Large Language Models Fail to Reliably Diagnose Pediatric Pneumonia on Chest Radiographs
Pneumonia remains a leading cause of illness and death among children globally. Chest radiographs are commonly used to support diagnosis, but differentiating bacterial from viral pneumonia remains a challenge. A recent study assessed whether large language models (LLMs) with vision capabilities could accurately classify pediatric chest radiographs as bacterial, viral, or normal. Four publicly available…
September 9, 2025
Co-Editing Special Collection: Large Language Models for Medical Applications
I served as co-editor for a special collection in Frontiers in Medicine focused on large language models in medical applications. Our editorial team curated research examining the intersection of AI technology and healthcare practice. The introductory editorial, published May 2025, establishes the framework for this special collection by addressing critical implementation challenges facing…
July 26, 2025
Large Language Models Display Distinct Personality Traits in Clinical Testing
Researchers conducted the first comprehensive psychometric evaluation of four leading large language models (LLMs) to assess their personality profiles using validated psychological instruments. The study tested ChatGPT-3.5, Gemini Advanced, Claude 3 Opus, and Grok-Regular using the Open Extended Jungian Type Scales and Big Five Personality Test. Results revealed statistically significant differences between models, with Claude…
April 12, 2025
The Relative Risk Index: Advancing Statistical Fragility Assessment in Orthopaedic Research
Statistical fragility remains a critical concern in medical research reliability. Dr. Thomas F. Heston proposes the Relative Risk Index (RRI) as a complementary metric to the established Fragility Index (FI) for assessing statistical robustness in orthopaedic surgery studies. While the FI identifies how many changed outcomes flip statistical significance, the RRI measures the percentage change…
April 25, 2024
Integrating Large Language Models and Blockchain Technology for Enhanced Telemedicine
The integration of large language models (LLMs) and blockchain technology holds significant potential for transforming telemedicine. LLMs can rapidly analyze vast amounts of medical data, providing personalized recommendations and augmenting diagnostic processes. Blockchain technology enables secure, decentralized storage and sharing of patient records, ensuring privacy and interoperability. The synergistic combination of these technologies can facilitate…
April 17, 2024
ChatGPT Provides Inconsistent Risk Stratification for Patients with Atraumatic Chest Pain
A recent study investigated ChatGPT-4’s ability to risk-stratify patients with atraumatic chest pain. The researchers found that while ChatGPT-4’s mean risk scores correlated well with established tools like TIMI and HEART, the AI model provided inconsistent results when presented with identical patient data on multiple occasions. This variability raises concerns about the reliability of using…
April 2, 2024
Critical Gaps in Medical Research Reporting by Online News Media
This study reveals significant shortcomings in how online news outlets report medical research. Crucial information such as conflicts of interest, study limitations, and inferential statistics was frequently omitted from news reports. While research conclusions were generally conveyed accurately, the underreporting of these key elements raises concerns about the transparency and credibility of medical…