Overview
This study aims to develop and validate an artificial intelligence (AI)-based predictive model to estimate the risk of incident onset of five major diseases or conditions (cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis) in adults aged 30 to 60 years.
For each participant, an index date will be defined as the date of a prior health screening or another protocol-defined baseline clinical date. Incident disease status for each target disease or condition will be ascertained by retrospective review of electronic medical records for up to 10 years after the index date.
The study integrates retrospective clinical, health screening, laboratory, imaging, and electronic medical record data with prospectively collected biospecimen, proteomic, genomic, questionnaire, lifestyle, and digital health data. Prospective study procedures will be completed over approximately 1 week, with up to 2 additional weeks if needed.
By combining multimodal data, this study seeks to improve disease risk prediction and to identify clinical and biological factors associated with disease onset, ultimately supporting personalized risk stratification and preventive healthcare strategies.
Description
This observational study aims to develop and validate an artificial intelligence (AI)-based predictive model for assessing the risk of incident onset of five major diseases or conditions (cardiovascular disease, type 2 diabetes mellitus, breast cancer, low back pain, and osteoarthritis) in adults aged 30 to 60 years.
The study uses a hybrid retrospective and prospective data collection design. Retrospective clinical, health screening, laboratory, imaging, and electronic medical record data will be combined with prospectively collected biospecimen, proteomic, genomic, questionnaire, lifestyle, and digital health data.
For disease-onset analyses, an index date will be defined for each participant as the date of a prior health screening or another protocol-defined baseline clinical date. For each target disease or condition, participants free of that disease or condition at the index date will be classified as incident cases if a new diagnosis is recorded in the electronic medical records up to 10 years after the index date. Participants with no recorded diagnosis of that disease or condition throughout the available observation period (up to 10 years after the index date) will be classified as persistent controls. Disease occurrence will be ascertained through retrospective review of electronic medical records rather than through new prospective long-term follow-up.
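The case/control rule above can be expressed as a short labeling function applied once per participant and target condition. The following is a minimal Python sketch, not the study's actual pipeline. The helper names `classify_onset` and `add_years`, the `last_record_date` input, and the choice to treat a diagnosis dated on the index date as prevalent are illustrative assumptions. A diagnosis recorded after the 10-year window is treated as outside the review, so the participant remains a persistent control.

```python
from datetime import date
from typing import Optional

FOLLOW_UP_YEARS = 10  # EMR review window after the index date, per the protocol text


def add_years(d: date, years: int) -> date:
    """Shift a date by whole calendar years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def classify_onset(index_date: date,
                   diagnosis_date: Optional[date],
                   last_record_date: date) -> str:
    """Label one participant for one target condition (hypothetical helper).

    Returns 'prevalent' (already diagnosed at the index date, so not at risk),
    'incident' (new diagnosis within the review window), or
    'persistent_control' (no diagnosis recorded during the observed window).
    """
    # Review window: up to 10 years, truncated at the end of available records.
    window_end = min(add_years(index_date, FOLLOW_UP_YEARS), last_record_date)
    if diagnosis_date is not None and diagnosis_date <= index_date:
        # Assumption: a diagnosis dated on the index date counts as prevalent.
        return "prevalent"
    if diagnosis_date is not None and diagnosis_date <= window_end:
        return "incident"
    # No diagnosis inside the window; any later diagnosis falls outside the review.
    return "persistent_control"
```

For example, a participant with an index date of 1 March 2014, a first diagnosis on 15 June 2019, and records through 31 December 2024 would be labeled `incident`.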
A total of approximately 1,000 participants will be enrolled. The disease group will include approximately 880 adults aged 30 to 60 years with a confirmed diagnosis of one or more of the five target diseases or conditions. The healthy control group will include approximately 120 adults aged 30 to 60 years without a prior diagnosis of any of the five target diseases or conditions.
Retrospective data collection will include medical records, health screening results, laboratory results, and imaging-related data. Prospective data collection will include blood samples for proteomic and genomic analyses, questionnaires, lifestyle and behavioral data, and digital health assessments. App-based questionnaires and digital assessments will be performed at home over approximately 7 days. If app-based sleep assessment or other digital assessments are not completed within this period, up to 2 additional weeks may be provided.
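The at-home assessment schedule can likewise be sketched as a simple status check. This Python example assumes the base window ends 7 days after the start date and that the extension ends a further 14 days later; the helper name `assessment_status` and its status labels are hypothetical and are not defined by the protocol.

```python
from datetime import date, timedelta
from typing import Optional

BASE_WINDOW_DAYS = 7      # at-home app-based assessments, approximately 7 days
MAX_EXTENSION_DAYS = 14   # up to 2 additional weeks if assessments are incomplete


def assessment_status(start: date, completed_on: Optional[date], today: date) -> str:
    """Classify a participant's digital-assessment progress (hypothetical helper)."""
    base_end = start + timedelta(days=BASE_WINDOW_DAYS)
    final_end = base_end + timedelta(days=MAX_EXTENSION_DAYS)
    if completed_on is not None:
        if completed_on <= base_end:
            return "completed_in_base_window"
        if completed_on <= final_end:
            return "completed_in_extension"
        return "completed_after_deadline"
    # Not yet completed: report which window the participant is currently in.
    if today <= base_end:
        return "in_progress"
    if today <= final_end:
        return "extension"
    return "incomplete"
```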
These multimodal data will be integrated to create a high-dimensional phenomic and omics dataset for AI model development. Machine learning and deep learning approaches will be applied to predict disease risk for each target disease or condition. Model performance will be evaluated using discrimination, diagnostic performance, and calibration metrics. Reclassification metrics will be evaluated only if a prespecified comparator risk score is available for the relevant target disease or condition.
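As an illustration of the evaluation plan, the sketch below computes one common metric from each named family using only the Python standard library: the area under the ROC curve (AUROC) for discrimination, sensitivity and specificity at a risk threshold for diagnostic performance, the Brier score and calibration-in-the-large for calibration, and a two-category net reclassification improvement (NRI) that is computed only when comparator risk scores are supplied. The specific metrics, the 0.5 default threshold, and all function names are assumptions chosen for illustration; the protocol does not specify them.

```python
from typing import Dict, Optional, Sequence


def auroc(y: Sequence[int], p: Sequence[float]) -> float:
    """AUROC as the probability a random event outranks a random non-event (ties count 0.5)."""
    pos = [s for s, t in zip(p, y) if t == 1]
    neg = [s for s, t in zip(p, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted risk and the observed 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def calibration_in_the_large(y: Sequence[int], p: Sequence[float]) -> float:
    """Observed event rate minus mean predicted risk; 0 means no overall bias."""
    return sum(y) / len(y) - sum(p) / len(p)


def sensitivity_specificity(y: Sequence[int], p: Sequence[float], threshold: float):
    """Sensitivity and specificity when risk >= threshold is called positive."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    n_pos = sum(y)
    return tp / n_pos, tn / (len(y) - n_pos)


def categorical_nri(y, p_new, p_ref, threshold: float) -> float:
    """Two-category NRI of a new model versus a comparator score at one threshold."""
    def moved_up(new: float, ref: float) -> bool:
        return new >= threshold > ref

    def moved_down(new: float, ref: float) -> bool:
        return ref >= threshold > new

    events = [(a, b) for yi, a, b in zip(y, p_new, p_ref) if yi == 1]
    nonevents = [(a, b) for yi, a, b in zip(y, p_new, p_ref) if yi == 0]
    # Events should move up; non-events should move down.
    nri_events = (sum(moved_up(a, b) for a, b in events)
                  - sum(moved_down(a, b) for a, b in events)) / len(events)
    nri_nonevents = (sum(moved_down(a, b) for a, b in nonevents)
                     - sum(moved_up(a, b) for a, b in nonevents)) / len(nonevents)
    return nri_events + nri_nonevents


def evaluate(y: Sequence[int], p: Sequence[float],
             p_comparator: Optional[Sequence[float]] = None,
             threshold: float = 0.5) -> Dict[str, float]:
    """Collect the metrics for one target condition."""
    sens, spec = sensitivity_specificity(y, p, threshold)
    results = {
        "auroc": auroc(y, p),
        "sensitivity": sens,
        "specificity": spec,
        "brier": brier_score(y, p),
        "calibration_in_the_large": calibration_in_the_large(y, p),
    }
    # Reclassification only when a prespecified comparator risk score exists.
    if p_comparator is not None:
        results["nri"] = categorical_nri(y, p, p_comparator, threshold)
    return results
```

With outcomes `[0, 0, 1, 1]` and predicted risks `[0.1, 0.4, 0.35, 0.8]`, `auroc` returns 0.75. In practice, an established library implementation with confidence intervals would likely be preferred over hand-written functions.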
The study aims to improve prediction of disease onset and to enhance understanding of biological and clinical factors associated with disease risk. The resulting model is expected to support personalized risk stratification and preventive healthcare strategies.
Eligibility
Inclusion Criteria:
- Adults aged 30 to 60 years.
- Disease group: Participants with a confirmed diagnosis of at least one of the following conditions: type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain.
- Healthy control group: Participants with no prior diagnosis of type 2 diabetes mellitus, breast cancer, cardiovascular disease, osteoarthritis, or low back pain.
- No history or current diagnosis of major medical conditions that may affect study outcomes, including but not limited to chronic kidney disease and liver cirrhosis.
- Ability to understand the study procedures and provision of written informed consent prior to participation.
Exclusion Criteria:
- Participants with incomplete or insufficient clinical or health screening data.
- Participants considered inappropriate for study participation by the investigator.


