Overview
Precise treatment of primary hepatocellular carcinoma (HCC) depends heavily on accurate disease staging (CNLC, TNM, BCLC) and sound treatment decision-making, both of which require integrating imaging with baseline clinical data. This study prospectively recruits HCC patients and clinical physicians from different hospital tiers to evaluate the clinical value of a self-developed artificial intelligence (AI) model in assisting multi-dimensional comprehensive assessment and treatment decision-making. Using a balanced crossover Multi-Rater Multi-Case (MRMC) design, the study compares the accuracy of physicians' clinical evaluations under "unassisted (without AI)" and "AI-assisted" conditions. A key aim is to determine whether AI assistance significantly improves the comprehensive assessment performance of physicians in primary/secondary care hospitals, thereby reducing diagnostic and therapeutic heterogeneity across institutional levels.
Description
1. Study Description
Gold Standard (Reference Standard): The reference standard (ground truth) for all prospectively enrolled cases is established by an independent panel of 3 authoritative experts. The panel determines the final reference answers for the four classification tasks through blinded independent evaluation followed by joint discussion (voting system), drawing on complete prospective imaging data, clinical baseline data, multidisciplinary team (MDT) consensus, and final pathological or clinical follow-up results.
2. Eligibility Criteria
2.1 Evaluator Eligibility:
- Senior Physicians in Tertiary Hospitals: Employed in the department of hepato-pancreato-biliary surgery, oncology, or related departments in Class III Grade A (tertiary) hospitals, with the professional title of attending physician or above.
- Junior Physicians in Tertiary Hospitals: Employed in related departments in Class III Grade A (tertiary) hospitals, with the professional title of resident physician.
- Physicians in Primary/Secondary Care Hospitals: Clinical physicians employed in county-level or Class II general hospitals.
- Informed Consent: Must voluntarily agree to participate in the assessment and sign the informed consent form.
2.2 Patient/Case Eligibility:
Inclusion Criteria:
- Age > 18 years.
- Patients prospectively presenting with suspected or newly diagnosed primary hepatocellular carcinoma (HCC) later confirmed by pathology or meeting the China Liver Cancer (CNLC) guidelines.
- Complete baseline clinical data acquired during the prospective enrollment period, including complete history of present/past illness, ECOG PS score, comprehensive laboratory tests (liver function, coagulation, tumor markers such as AFP, etc.), and baseline abdominal contrast-enhanced CT.
- Patients (or their legal representatives) must provide written informed consent for their clinical data to be used in this trial.
Exclusion Criteria:
- Patients with secondary (metastatic) liver cancer or concurrent severe malignancies of other systems.
- Patients who fail to complete the required baseline imaging or laboratory tests, preventing accurate staging calculation (e.g., missing data for Child-Pugh score).
- Patients who received anti-tumor therapy for liver cancer before enrollment.
3. Study Design
Intervention Model: Crossover Assignment
Masking: Single Blind. Participating evaluators are blinded to the gold standard answers of the cases and to the evaluation results of other participating physicians.
Arms and Interventions:
Case Set Partition: 108 prospectively and consecutively enrolled eligible HCC cases are batched and randomly divided into Set A (54 cases) and Set B (54 cases). The partition is checked to confirm that the two sets show no statistically significant differences in tumor burden, liver function grading, or staging distribution.
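The random split described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the case IDs 1 to 108, the fixed seed, and the function name `partition_cases` are all assumptions, and the balance check on tumor burden, liver function grading, and staging distribution would be a separate step not shown here.

```python
import random


def partition_cases(case_ids, seed=2024):
    """Randomly split an even-sized collection of case IDs into two equal sets."""
    shuffled = list(case_ids)  # copy so the caller's data is left untouched
    if len(shuffled) % 2 != 0:
        raise ValueError("expected an even number of cases")
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Hypothetical case IDs 1..108 standing in for the enrolled HCC cases.
set_a, set_b = partition_cases(range(1, 109))
print(len(set_a), len(set_b))  # 54 54
```

If the balance check fails, one option is to redraw the split with a different seed and document the rule for doing so in advance.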
Evaluator Grouping: A total of 12 prospectively recruited clinical physicians are included, comprising 4 in the tertiary hospital senior group, 4 in the tertiary hospital junior group, and 4 in the primary/secondary hospital group. They are divided into two evaluation groups based on stratified randomization:
Group A (6 evaluators): 2 tertiary senior, 2 tertiary junior, 2 primary/secondary hospital.
Group B (6 evaluators): 2 tertiary senior, 2 tertiary junior, 2 primary/secondary hospital.
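A minimal sketch of the stratified randomization above, assuming each of the three strata holds exactly 4 physicians and is split 2/2. The evaluator labels (S1, J1, P1, and so on), the seed, and the function name are hypothetical placeholders.

```python
import random


def stratified_groups(evaluators_by_stratum, seed=7):
    """Within each stratum, shuffle the evaluators and send half to each group."""
    rng = random.Random(seed)
    group_a, group_b = [], []
    for stratum, members in evaluators_by_stratum.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        group_a += [(stratum, name) for name in shuffled[:half]]
        group_b += [(stratum, name) for name in shuffled[half:]]
    return group_a, group_b


# Hypothetical evaluator labels; the protocol only specifies the counts.
evaluators = {
    "tertiary_senior": ["S1", "S2", "S3", "S4"],
    "tertiary_junior": ["J1", "J2", "J3", "J4"],
    "primary_secondary": ["P1", "P2", "P3", "P4"],
}
group_a, group_b = stratified_groups(evaluators)
print(len(group_a), len(group_b))  # 6 6
```

Randomizing within strata guarantees that each group receives exactly 2 physicians of each level, which a simple 12-way shuffle would not.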
Arm 1 - Group A Evaluators:
Phase 1 Intervention (Control): Independent evaluation of Set A (54 cases) combining clinical texts and imaging data, recording 4 classification results, without AI assistance.
Phase 2 Intervention (Experimental): Evaluation of Set B (54 cases). The system presents the AI model's 4 prediction results and related evidence; physicians give their final judgment after reviewing this information.
Arm 2 - Group B Evaluators:
Phase 1 Intervention (Control): Independent evaluation of Set B (54 cases) combining clinical texts and imaging data, recording 4 classification results, without AI assistance.
Phase 2 Intervention (Experimental): Evaluation of Set A (54 cases). Physicians provide the final judgment after referencing the AI model's results.
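The two arms form a balanced crossover: every evaluator reads each case once, and every case is read under both conditions, but by different groups. The sketch below enumerates the readings implied by the arms above; the identifiers (`SCHEDULE`, `set_a`, `unassisted`, and so on) are illustrative names, not part of the protocol.

```python
from itertools import product

# Phase order for each evaluator group: (case set, condition).
SCHEDULE = {
    "A": [("set_a", "unassisted"), ("set_b", "ai_assisted")],
    "B": [("set_b", "unassisted"), ("set_a", "ai_assisted")],
}
EVALUATORS_PER_GROUP = 6
CASES_PER_SET = 54


def build_readings():
    """List every (group, evaluator, case_set, case, condition) reading."""
    readings = []
    for group, phases in SCHEDULE.items():
        for case_set, condition in phases:
            for evaluator, case in product(
                range(EVALUATORS_PER_GROUP), range(CASES_PER_SET)
            ):
                readings.append((group, evaluator, case_set, case, condition))
    return readings


readings = build_readings()
per_condition = {}
for *_, condition in readings:
    per_condition[condition] = per_condition.get(condition, 0) + 1
print(per_condition)  # {'unassisted': 648, 'ai_assisted': 648}
```

Because no evaluator sees the same case in both phases, the design avoids recall bias at the cost of comparing conditions across different case sets for a given physician.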
4. Outcome Measures
Primary Outcome:
Improvement in Overall Accuracy: The difference in average accuracy across the 4 classification tasks between AI-assisted evaluation (experimental group) and independent evaluation (control group).
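One way to compute this outcome is sketched below. It assumes the "average accuracy" is the unweighted mean of the per-task accuracies, which the protocol does not spell out, and it uses hypothetical task names (`cnlc`, `tnm`, `bclc`, `treatment`) and toy evaluations purely for illustration.

```python
def mean_accuracy(records):
    """Unweighted mean of per-task accuracies.

    Each record maps a task name to a (prediction, reference) pair.
    """
    per_task = {}
    for record in records:
        for task, (predicted, reference) in record.items():
            hits, total = per_task.get(task, (0, 0))
            per_task[task] = (hits + (predicted == reference), total + 1)
    accuracies = [hits / total for hits, total in per_task.values()]
    return sum(accuracies) / len(accuracies)


# Toy data (not study results): two evaluations per condition.
unassisted = [
    {"cnlc": ("Ib", "Ib"), "tnm": ("II", "II"),
     "bclc": ("A", "B"), "treatment": ("resection", "resection")},
    {"cnlc": ("IIa", "IIb"), "tnm": ("III", "III"),
     "bclc": ("B", "B"), "treatment": ("TACE", "resection")},
]
assisted = [
    {"cnlc": ("Ib", "Ib"), "tnm": ("II", "II"),
     "bclc": ("B", "B"), "treatment": ("resection", "resection")},
    {"cnlc": ("IIb", "IIb"), "tnm": ("III", "III"),
     "bclc": ("B", "B"), "treatment": ("TACE", "resection")},
]

improvement = mean_accuracy(assisted) - mean_accuracy(unassisted)
print(improvement)  # 0.25
```

A formal MRMC analysis would additionally model reader and case variability; this sketch only shows the point estimate.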
Secondary Outcomes:
Homogenization Effect: Assessment of whether the difference in clinical evaluation accuracy between physicians in the primary/secondary hospital group and the tertiary hospital groups is significantly reduced under AI assistance.
Evaluation Efficiency: Comparison of the average evaluation time per case between physicians with and without AI assistance.
Inter-rater Agreement: Comparison of the consistency of evaluation results among physicians (e.g., using Kappa statistics), with and without AI assistance.
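The protocol mentions Kappa statistics without naming a variant. With more than two raters per case, Fleiss' kappa is one common choice; the sketch below is a plain implementation under that assumption, with a hypothetical function name and toy labels. It would be computed separately for the unassisted and AI-assisted readings and the two values compared.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of cases, each a list of one label per rater.

    Every case must be rated by the same number of raters (at least 2).
    """
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if any(len(case) != n_raters for case in ratings):
        raise ValueError("every case needs the same number of raters")
    categories = sorted({label for case in ratings for label in case})
    # counts[i][j]: number of raters who assigned case i to category j
    counts = [[case.count(c) for c in categories] for case in ratings]
    # Observed agreement, averaged over cases
    p_bar = sum(
        (sum(k * k for k in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_cases
    # Chance agreement from the overall category proportions
    p_e = sum(
        (sum(row[j] for row in counts) / (n_cases * n_raters)) ** 2
        for j in range(len(categories))
    )
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1 - p_e)


# Toy example: 3 raters assigning a BCLC-like stage to 2 cases.
print(fleiss_kappa([["A", "A", "A"], ["B", "B", "B"]]))  # 1.0
```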
5. Statistical Analysis Plan & Sample Size
Sample Size Justification:
The sample size calculation for this study is based on the expected change in overall average accuracy across all levels of prospectively recruited physicians. The overall average accuracy is assumed to be 0.60 without AI assistance and 0.70 with AI assistance.
With a two-sided significance level of 0.05 (corresponding to a Z-value of approximately 1.96) and statistical power of 0.80 (corresponding to a Z-value of approximately 0.84), the sample size is determined using the standard statistical method for comparing two independent proportions. Assuming no clustering effect from multiple case evaluations by the same physician, this calculation indicates that each intervention group requires at least 353 independent evaluations.
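The figure of 353 can be reproduced with the usual unpooled formula for two independent proportions, n = (Z_alpha + Z_beta)^2 × [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)^2. The sketch below assumes this is the formula used, since the protocol only refers to "the standard statistical method"; the function name is hypothetical.

```python
import math


def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Evaluations needed per condition to compare two independent proportions.

    Uses the unpooled-variance formula with the rounded Z-values quoted above.
    """
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


print(n_per_group(0.60, 0.70))  # 353
```

The unrounded value is about 352.8 with these Z-values. Using unrounded normal quantiles (about 1.95996 and 0.84162) gives roughly 353.2, which would round up to 354, so the result is sensitive to rounding at the last digit.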
Power Verification:
In the actual configuration of this study, there are 12 physicians in total.
Total independent evaluations for the control group (without AI) = Group A (6 evaluators) x Set A (54 cases) + Group B (6 evaluators) x Set B (54 cases) = 648 independent evaluations.
Total independent evaluations for the experimental group (with AI) also = 648 independent evaluations.
Since 648 evaluations exceeds the required base of 353 evaluations, the current configuration of cases and physicians meets the calculated sample size requirement. The surplus (approximately 1.8 times the base requirement) provides a margin to accommodate the clustering effect (intra-class correlation) that arises when the same physician evaluates multiple cases in this MRMC design.
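To give a sense of how much clustering this margin can absorb, the sketch below applies the common design-effect approximation, DE = 1 + (m − 1) × ICC, treating each physician's 54 evaluations per condition as one cluster. This approximation is an assumption introduced here, not a method stated in the protocol, and it ignores the case-level correlation that a full MRMC analysis would also model.

```python
def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def max_tolerable_icc(available, required, cluster_size):
    """Largest ICC for which the available sample still covers the requirement."""
    return (available / required - 1) / (cluster_size - 1)


available = 12 * 54        # evaluations per condition (6 x 54 + 6 x 54)
required = 353             # base requirement assuming independent evaluations
cluster_size = 54          # evaluations per physician per condition

margin = available / required
icc_limit = max_tolerable_icc(available, required, cluster_size)
print(round(margin, 2), round(icc_limit, 4))  # 1.84 0.0158
```

Under this approximation, the 1.8-fold margin corresponds to an intra-class correlation of up to roughly 0.016; a larger within-physician correlation would call for more readers or cases.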
Eligibility
Inclusion Criteria:
- Age >= 18 years.
- Patients prospectively presenting with suspected or newly diagnosed primary hepatocellular carcinoma (HCC) later confirmed by pathology or meeting the China Liver Cancer (CNLC) guidelines.
- Complete baseline clinical data acquired during the prospective enrollment period, including complete history of present/past illness, ECOG PS score, comprehensive laboratory tests (liver function, coagulation, tumor markers such as AFP, etc.), and baseline abdominal contrast-enhanced CT.
- Patients (or their legal representatives) must provide written informed consent for their clinical data to be used in this trial.
Exclusion Criteria:
- Patients with secondary (metastatic) liver cancer or concurrent severe malignancies of other systems.
- Patients who fail to complete the required baseline imaging or laboratory tests, preventing accurate staging calculation (e.g., missing data for Child-Pugh score).
- Patients who received anti-tumor therapy for liver cancer before enrollment.


