OBJECTIVE To establish a machine learning (ML)-based model for predicting coronary heart disease (CHD) in elderly patients with diabetes mellitus (DM).
METHODS Using data from the Medical Big Data Research Centre of the Chinese PLA General Hospital in Beijing, China, we identified a cohort of elderly inpatients (≥ 60 years) admitted between January 2008 and December 2017, comprising 10,533 patients with DM complicated by CHD and 12,634 patients with DM without CHD. We collected demographic characteristics and clinical data. After feature selection, we established five ML models: extreme gradient boosting (XGBoost), random forest (RF), decision tree (DT), adaptive boosting (Adaboost) and logistic regression (LR). We compared the models' receiver operating characteristic curves, areas under the curve (AUCs) and other performance metrics to identify the optimal classification model. The selected model was then applied to 7,447 elderly patients with DM admitted between January 2018 and December 2019 to further validate its performance.
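The model comparison described in METHODS ranks classifiers by AUC. As a minimal sketch (not the authors' code), the AUC can be computed directly as the probability that a randomly chosen positive case (DM with CHD) receives a higher predicted score than a randomly chosen negative case (DM without CHD), with ties counted as one half; this pairwise statistic equals the area under the empirical ROC curve. The function names `auc_pairwise` and `pick_best_model`, the model names and the toy labels and scores are hypothetical illustrations, not study data.

```python
from itertools import product


def auc_pairwise(labels, scores):
    """Return the ROC AUC via pairwise comparison of positive vs negative scores.

    labels: sequence of 0/1 outcomes (1 = DM with CHD, 0 = DM without CHD).
    scores: predicted probabilities or risk scores, same length as labels.
    Runs in O(n_pos * n_neg) time: fine for illustration, slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # concordant pair: the CHD case is ranked higher
        elif p == n:
            wins += 0.5  # tied pair contributes one half
    return wins / (len(pos) * len(neg))


def pick_best_model(labels, scores_by_model):
    """Return (name, auc) of the model with the highest AUC on one labelled set."""
    aucs = {name: auc_pairwise(labels, s) for name, s in scores_by_model.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Hypothetical toy data: 3 positives, 3 negatives, two made-up models.
y = [1, 1, 1, 0, 0, 0]
scores = {
    "model_a": [0.9, 0.8, 0.4, 0.5, 0.2, 0.1],
    "model_b": [0.6, 0.5, 0.4, 0.7, 0.3, 0.2],
}
print(pick_best_model(y, scores))
```

Here `model_a` ranks the positive case higher in 8 of 9 positive-negative pairs (AUC 8/9 ≈ 0.889), versus 6 of 9 for `model_b`, so `model_a` is selected. In practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity more efficiently.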
RESULTS Fifteen features were selected and included in the ML models. In the test set, the classification precision of the XGBoost, RF, DT, Adaboost and LR models was 0.778, 0.789, 0.753, 0.750 and 0.689, respectively, and their AUCs were 0.851, 0.845, 0.823, 0.833 and 0.731, respectively. When the XGBoost model, which had the highest AUC, was applied to the newly recruited validation dataset, its sensitivity, specificity, precision and AUC were 0.792, 0.808, 0.748 and 0.880, respectively.
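The validation metrics reported in RESULTS follow the standard confusion-matrix definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and precision = TP/(TP+FP), where a positive is a patient with DM complicated by CHD. The sketch below computes them from counts at a single decision threshold; the `ConfusionCounts` class, the `diagnostic_metrics` function and the counts themselves are hypothetical, since the abstract does not report the study's confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier at one decision threshold (1 = CHD)."""
    tp: int  # CHD cases predicted as CHD
    fp: int  # non-CHD cases predicted as CHD
    tn: int  # non-CHD cases predicted as non-CHD
    fn: int  # CHD cases predicted as non-CHD


def _ratio(num, den):
    # An empty denominator makes the metric undefined; report NaN instead of raising.
    return num / den if den else float("nan")


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity and precision for one confusion matrix."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true-positive rate (recall)
        "specificity": _ratio(c.tn, c.tn + c.fp),  # true-negative rate
        "precision": _ratio(c.tp, c.tp + c.fp),    # positive predictive value
    }


# Hypothetical counts for illustration only.
counts = ConfusionCounts(tp=80, fp=20, tn=90, fn=10)
print(diagnostic_metrics(counts))
```

With these made-up counts, sensitivity is 80/90, specificity is 90/110 and precision is 80/100. Unlike sensitivity and specificity, precision also depends on the proportion of CHD cases in the evaluated set, so precision values from datasets with different case mixes are not directly comparable.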
CONCLUSIONS The XGBoost model established in this study showed some predictive value for CHD in elderly patients with DM.