Overview
Primary Goal:
This study aims to evaluate the diagnostic and therapeutic accuracy of GPT-4 (an advanced AI language model) compared to three orthopedic surgeons with varying experience levels in cases of failed or painful total hip arthroplasty.
Key Research Questions:
Diagnostic Accuracy:
Does GPT-4 provide correct, partially correct, or incorrect diagnoses compared to human orthopedic surgeons?
Diagnostic Completeness:
Are GPT-4's diagnostic suggestions complete, partially complete, or incomplete compared to those of orthopedic surgeons?
Treatment Accuracy:
Does GPT-4 recommend correct, partially correct, or incorrect treatments for failed hip arthroplasty?
Treatment Completeness:
Are GPT-4's treatment recommendations fully comprehensive, partially complete, or incomplete compared to those of orthopedic surgeons?
Study Design:
- Participants
20 anonymized patient cases (ages 18–80) with failed or painful total hip arthroplasties, treated at IRCCS Istituto Ortopedico Rizzoli (Bologna, Italy) between 2004 and 2024.
Cases were selected based on clear diagnostic and treatment records (no ambiguous or incomplete data).
Comparison Groups:
GPT-4 (via ChatGPT interface)
Three orthopedic surgeons with different levels of experience (resident, specialist, senior surgeon)
- Method
Each case (clinical summary + X-ray image) is presented to GPT-4 and the three doctors.
They must provide a diagnosis and treatment recommendations.
Two independent evaluators (principal investigator + department head), blinded to the source of each response, assess correctness and completeness on a 3-point scale (0 = incorrect/incomplete, 1 = partially correct/partially complete, 2 = correct/complete).
Statistical analysis compares GPT-4's performance with that of the three surgeons.
Expected Outcomes:
Determine whether GPT-4 can match or outperform orthopedic surgeons in diagnosing and treating failed or painful total hip arthroplasty.
Assess whether GPT-4 could serve as a supplementary tool in orthopedic decision-making.
Ethical & Privacy Considerations:
No real-time patient data are used; only anonymized past cases are included.
No personal or sensitive data are shared with OpenAI (GPT-4 is accessed through the standard web interface using anonymized case material only).
Study complies with GDPR, HIPAA, and ethical AI guidelines.
Timeline:
Study duration: ~8 months (from ethics approval to final analysis).
Results will be published regardless of outcome.
Why This Study Matters:
First study evaluating GPT-4's role in complex orthopedic diagnostics.
Could influence future AI-assisted clinical decision-making in joint replacement surgeries.
Eligibility
Inclusion Criteria:
- Adults (≥18 and ≤80 years old).
- Documented painful or failed total hip arthroplasty requiring clinical/radiological evaluation (2004-2024).
- Complete pre-operative clinical history, imaging (X-ray/tomography), and surgical reports.
- Clear diagnosis of failure mode (e.g., aseptic loosening, infection, fracture, wear).
- Treatment and outcomes fully documented in the institutional database.
- "Exemplary" cases with minimal diagnostic ambiguity (e.g., per the Engh or Musculoskeletal Infection Society (MSIS) criteria).
Exclusion Criteria:
- Total hip arthroplasty with no documented failure or pain (well-functioning implants).
- Incomplete clinical/radiological records (e.g., missing pre-operative imaging or surgical notes).
- Complex/multifactorial failures (e.g., concurrent infection + loosening + fracture).
- Radiographs/images non-interpretable (poor quality, missing views).
- Cases with conflicting diagnoses/treatments in original records.