Overview
We conducted a single-center, retrospective observational study to evaluate large language models (ChatGPT 4o, GPT-5, DeepSeek) for automated interpretation of de-identified IOLMaster 700 reports provided as raster images. Models produced structured biometric extraction, toric IOL recommendation, and refractive predictions (sphere, cylinder, axis). Primary outcomes included parameter-level agreement and refractive error metrics; secondary outcomes included decision-support performance for toric IOL selection and agreement on ordered T-codes. No clinical intervention was performed.
Description
This study compares three large language models accessed in their native configurations, without fine-tuning or external tools. For each examination, the original IOLMaster 700 report image was supplied without manual annotation or pre-processing. A standardized instruction required: (i) structured extraction of AL, ACD, LT, WTW, K1/K2 and axes, ΔK, TK1/TK2 and axes, and ΔTK; (ii) binary toric candidacy and T-code according to institutional ALCON mapping; and (iii) refractive recommendations (sphere, cylinder, implantation axis). Each model generated three independent outputs per case. De-identification and IRB oversight (waiver of consent) were implemented according to institutional policy. The unit of enrollment is participants (n=54), with outcomes analyzed per eye (162 eyes) and per model generation where applicable.
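As an illustration only, the structured extraction and parameter-level agreement described above could be sketched as follows. This is a minimal sketch under stated assumptions, not the study's actual pipeline: the `BiometryExtraction` class, its field names, the `agreement_rate` helper, and the 0.01 tolerance are all hypothetical, total keratometry (TK) fields are omitted for brevity, and ΔK is assumed to be K2 minus K1.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BiometryExtraction:
    """Hypothetical container for one model output on one report."""
    al_mm: float        # axial length
    acd_mm: float       # anterior chamber depth
    lt_mm: float        # lens thickness
    wtw_mm: float       # white-to-white
    k1_d: float         # flat keratometry (diopters)
    k1_axis_deg: int
    k2_d: float         # steep keratometry (diopters)
    k2_axis_deg: int

    @property
    def delta_k_d(self) -> float:
        # Assumption: delta K is the steep minus the flat keratometry value.
        return round(self.k2_d - self.k1_d, 2)


def agreement_rate(outputs, reference, tol=0.01):
    """Fraction of (output, field) pairs that match the reference within tol.

    Values are compared as transcribed, so axes are not treated as circular.
    """
    names = [f.name for f in fields(BiometryExtraction)]
    hits = sum(
        abs(getattr(out, name) - getattr(reference, name)) <= tol
        for out in outputs
        for name in names
    )
    return hits / (len(outputs) * len(names))
```

With three generations per case, `outputs` would hold three such records for one report, and the returned fraction summarizes how many extracted fields match the reference values printed on the report.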
Eligibility
Inclusion Criteria:
- postoperative corrected distance visual acuity (CDVA) of 0.10 logMAR or better;
- an absolute IOL rotational stability of less than 10° at the 1-month follow-up examination.
Exclusion Criteria:
- incomplete biometric data on the examination report;
- a history of previous ocular surgery or ocular trauma;
- the occurrence of intraoperative complications, such as an anterior capsular tear or posterior capsular rupture;
- the development of significant postoperative complications, including but not limited to severe intraocular infection or inadequate pupillary dilation.
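For readers applying these criteria to a dataset, the inclusion and exclusion rules above can be written as a single predicate. The sketch below is illustrative only: the `EyeRecord` class, its field names, and the `is_eligible` function are hypothetical and do not come from the study. It relies on the fact that a lower logMAR value means better acuity, so "0.10 logMAR or better" is read as a value of at most 0.10.

```python
from dataclasses import dataclass


@dataclass
class EyeRecord:
    """Hypothetical per-eye record; field names are assumptions."""
    cdva_logmar: float                     # postoperative CDVA
    iol_rotation_deg: float                # signed rotation at 1 month
    biometry_complete: bool
    prior_surgery_or_trauma: bool
    intraoperative_complication: bool
    significant_postop_complication: bool


def is_eligible(eye: EyeRecord) -> bool:
    """Return True when both inclusion criteria hold and no exclusion applies."""
    meets_inclusion = (
        eye.cdva_logmar <= 0.10                 # lower logMAR is better acuity
        and abs(eye.iol_rotation_deg) < 10.0    # absolute rotation under 10 degrees
    )
    excluded = (
        not eye.biometry_complete
        or eye.prior_surgery_or_trauma
        or eye.intraoperative_complication
        or eye.significant_postop_complication
    )
    return meets_inclusion and not excluded
```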