Using machine learning models to discover new laboratory markers of age-related diseases
https://doi.org/10.37586/2949-4745-4-2024-208-213
Abstract
BACKGROUND. Global population aging is increasing the prevalence of age-related diseases such as arterial hypertension. Existing diagnostic methods are limited by the insufficient sensitivity of clinical and laboratory markers and by the constraints of conventional statistical methods, which highlights the need for new approaches to medical data analysis. This study aimed to identify new laboratory markers of arterial hypertension using machine learning models and to compare their effectiveness with that of conventional methods. It analyzed clinical and laboratory data from 2,228 patients over 65 years of age who sought medical advice at the clinic of the Medical Research and Educational Center of Lomonosov Moscow State University. Logistic regression with regularization, random forest, and gradient boosting were employed. Effectiveness was assessed using the AUC ROC, and attribute contributions were assessed by SHAP analysis. Machine learning models outperformed conventional logistic regression: gradient boosting achieved an AUC ROC of 0.85 compared to 0.78 for logistic regression. The key attributes associated with arterial hypertension were age, RDW-SD, and creatinine level, whereas the conventional method relied mainly on age. The findings support the potential of machine learning methods in diagnosing age-related diseases. These models achieve higher accuracy and reveal complex interrelationships between indicators, which can improve early diagnosis and optimize diagnostic algorithms.
Population aging is a global demographic trend that is increasing the incidence of age-related diseases [1]. Early and accurate diagnosis of these conditions is becoming particularly important for effective treatment and for minimizing the risk of complications in elderly patients.
However, current diagnostic algorithms are often limited by the insufficient sensitivity and specificity of clinical and laboratory markers, while standard data analysis methods may not always detect hidden or nonlinear dependencies that are critical for diagnosis and prognosis [2]. This calls for new approaches to interpreting medical data and for more precise diagnostic criteria. Advances in machine learning (ML) and artificial intelligence (AI) offer new possibilities for the analysis of large and complex medical datasets. These methods can uncover complex and nonlinear relationships between parameters, which may deepen our understanding of pathogenesis, improve diagnostic accuracy, and make the treatment of age-related diseases more effective as the global population ages [3, 4].
AIM. To identify new laboratory markers associated with arterial hypertension using ML models and to evaluate their effectiveness in comparison with conventional statistical methods.
MATERIALS AND METHODS. A retrospective analysis was performed on a database containing clinical and laboratory indicators of 2,228 patients over 65 years of age who were affiliated with the Medical Research and Educational Center of Lomonosov Moscow State University and sought medical advice or underwent preventive examinations from 2020 to 2021. Data preprocessing included handling of missing values, treatment of outliers, and normalization of attributes. In this pilot study, conducted to refine the analytical tools, the presence of arterial hypertension was the target variable. Two groups of methods were compared: binary logistic regression, a standard and widely accepted method for identifying disease markers, and ML models, namely logistic regression with regularization, random forest, and gradient boosting. Performance was assessed using the area under the ROC curve (AUC ROC). The contribution of each attribute was interpreted using SHAP (SHapley Additive exPlanations) analysis.
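The AUC ROC used to compare the models can be read as the probability that a randomly chosen patient with the condition receives a higher predicted score than a randomly chosen patient without it. The sketch below illustrates that definition in plain Python. It is not the authors' code: the function name `auc_roc` and the toy labels and scores are hypothetical, chosen only to show how two sets of model scores would be compared.

```python
def auc_roc(labels, scores):
    """AUC ROC via pairwise comparison: the share of (positive, negative)
    pairs in which the positive case has the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = hypertension, 0 = no hypertension.
labels = [1, 1, 1, 0, 0, 0]
scores_model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]  # scores from one illustrative model
scores_model_b = [0.9, 0.3, 0.4, 0.5, 0.6, 0.2]  # scores from another illustrative model

print(f"model A: {auc_roc(labels, scores_model_a):.3f}")  # model A: 0.889
print(f"model B: {auc_roc(labels, scores_model_b):.3f}")  # model B: 0.556
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration; on a dataset of a few thousand records a rank-based implementation from a standard library would normally be used instead.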
RESULTS. The ML models outperformed conventional logistic regression. Gradient boosting achieved an AUC ROC of 0.85, whereas logistic regression showed an AUC ROC of 0.78 (Fig. 1). The most important attributes associated with the presence of arterial hypertension were age, red blood cell distribution width (RDW-SD), and creatinine level, whereas the standard regression analysis relied primarily on age (Figs. 2 and 3). Age and creatinine level were expected to be significant factors. RDW-SD, which is typically evaluated in the context of anemia and reflects the state of erythropoiesis, was also identified as a significant marker of arterial hypertension. This finding may indicate a broader link between inflammatory processes, the state of erythropoiesis, and the development of arterial hypertension.
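Attribute contributions of the kind reported above rest on Shapley values: each attribute is credited with its average marginal effect on the prediction over all orders in which attributes could be added. The sketch below computes exact Shapley values for one patient in plain Python. Everything in it is an assumption made for illustration: the toy `risk_model`, its coefficients, the patient values, and the single reference ("baseline") patient are hypothetical and are not the study's model or data. Practical SHAP implementations typically use a background dataset and faster approximations rather than this brute-force enumeration.

```python
from itertools import permutations
from math import exp


def risk_model(x):
    """Hypothetical toy model (not the study's): a logistic score with an
    age x creatinine interaction term. Coefficients are invented."""
    z = (0.06 * (x["age"] - 70)
         + 0.08 * (x["rdw_sd"] - 45)
         + 0.01 * (x["creatinine"] - 90)
         + 0.002 * (x["age"] - 70) * (x["creatinine"] - 90))
    return 1.0 / (1.0 + exp(-z))


def shapley_values(model, x, baseline):
    """Exact Shapley values for one case relative to one baseline case.
    For every ordering of the attributes, switch them from the baseline
    value to the patient's value one at a time and record each change."""
    names = list(x)
    contrib = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]
            updated = model(current)
            contrib[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in contrib.items()}


# Hypothetical values: age in years, RDW-SD in fL, creatinine in umol/L.
baseline = {"age": 70, "rdw_sd": 45, "creatinine": 90}
patient = {"age": 80, "rdw_sd": 50, "creatinine": 110}

phi = shapley_values(risk_model, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.4f}")
```

By construction the contributions add up to the difference between the patient's prediction and the baseline prediction, which is what makes a per-attribute breakdown of a nonlinear model interpretable.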
CONCLUSIONS. Our findings reveal the high potential of ML models in medical diagnostics for identification of new laboratory markers of age-related diseases. ML methods not only provide higher predictive accuracy but also uncover complex, previously unknown relationships between clinical and laboratory indicators and the onset and progression of diseases. Unlike conventional statistical methods, they account for nonlinear and multidimensional interactions between features. These findings can serve as a basis for further clinical research aimed at confirming the associations found and developing new targets for intervention, potentially leading to changes in diagnostic and treatment approaches. Implementation of these models in clinical practice may improve early diagnosis of age-related diseases, reduce morbidity and mortality, and ultimately enhance patient quality of life.
About the Authors
S. A. Zakharchuk
Russian Federation
Moscow
N. A. Mironov
Russian Federation
Moscow
A. G. Plisyuk
Russian Federation
Moscow
Ya. A. Orlova
Russian Federation
Moscow
References
1. World Health Organization. Global Report on Ageism. Geneva: WHO; 2021.
2. Farrag M.A., Al-Harthi R., Saeed M., et al. Machine Learning Applications in Geriatrics: A Systematic Review. Journal of the American Medical Directors Association. 2021;22(8):1621–1627.e1.
3. Wang F., Preininger A. AI in Health: State-of-the-Art, Challenges, and Future Directions. Yearbook of Medical Informatics. 2020;29(1):16–26.
4. Mesko B., Gorog M. A Short Guide for Medical Professionals in the Era of Artificial Intelligence. NPJ Digital Medicine. 2020;3:126.
For citations:
Zakharchuk S.A., Mironov N.A., Plisyuk A.G., Orlova Ya.A. Using machine learning models to discover new laboratory markers of age-related diseases. Problems of Geroscience. 2024;(4):208-213. (In Russ.) https://doi.org/10.37586/2949-4745-4-2024-208-213