TY - JOUR
T1 - Machine Learning Framework for Early Detection of Chronic Kidney Disease Stages Using Optimized Estimated Glomerular Filtration Rate
AU - Ghosh, Samit
AU - Widatalla, Namareq
AU - Khandoker, Ahsan H.
N1 - Publisher Copyright:
© 2013 IEEE.
PY - 2025
Y1 - 2025
N2 - Chronic Kidney Disease (CKD) is a progressive condition that requires accurate diagnosis and staging for effective clinical management. Conventional CKD diagnosis relies on the estimated Glomerular Filtration Rate (eGFR), a measure of kidney function derived from serum biomarkers such as serum creatinine (SCr) and cystatin C (SCysC). However, eGFR calculations may be inaccurate when applied to diverse patient populations. This study proposes a machine learning (ML) system that integrates regression-based eGFR estimation, metaheuristic optimization using the Grey Wolf Optimizer (GWO), and multi-class classification with various ML models to enhance CKD staging and classification. The model estimates eGFR using three established CKD Epidemiology Collaboration (CKD-EPI) equations incorporating SCr, SCysC, and their combined values. Predictive performance is assessed with two regression models, Linear Regression (LR) and Support Vector Regression (SVR). SVR outperforms LR; for the combined CKD-EPI SCr-SCysC equation, it achieves a root mean squared error (RMSE) of 3.03, a mean absolute percentage error (MAPE) of 2.97%, and a coefficient of determination (R2) of 0.97. Applying GWO for hyperparameter tuning yields a 37.3% reduction in RMSE, a 37.4% reduction in MAPE, and a 2.06% improvement in R2, improving prediction precision. The fine-tuned eGFR estimates are then fed into several algorithms for CKD stage classification: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Among these, XGBoost achieves the highest classification accuracy of 97.76%, along with an F1-score of 97.45%, demonstrating its effectiveness in CKD staging. Shapley Additive Explanations (SHAP) provide global and local feature importance insights, supporting clinical decision-making and model transparency.
Future research will validate the model on larger and more diverse datasets. It will also incorporate additional clinical parameters, including biomarkers and genetic data, to improve the precision of CKD risk prediction. This research advances AI-driven nephrology by providing a scalable, interpretable, and highly accurate solution for diagnosing and managing CKD.
AB - Chronic Kidney Disease (CKD) is a progressive condition that requires accurate diagnosis and staging for effective clinical management. Conventional CKD diagnosis relies on the estimated Glomerular Filtration Rate (eGFR), a measure of kidney function derived from serum biomarkers such as serum creatinine (SCr) and cystatin C (SCysC). However, eGFR calculations may be inaccurate when applied to diverse patient populations. This study proposes a machine learning (ML) system that integrates regression-based eGFR estimation, metaheuristic optimization using the Grey Wolf Optimizer (GWO), and multi-class classification with various ML models to enhance CKD staging and classification. The model estimates eGFR using three established CKD Epidemiology Collaboration (CKD-EPI) equations incorporating SCr, SCysC, and their combined values. Predictive performance is assessed with two regression models, Linear Regression (LR) and Support Vector Regression (SVR). SVR outperforms LR; for the combined CKD-EPI SCr-SCysC equation, it achieves a root mean squared error (RMSE) of 3.03, a mean absolute percentage error (MAPE) of 2.97%, and a coefficient of determination (R2) of 0.97. Applying GWO for hyperparameter tuning yields a 37.3% reduction in RMSE, a 37.4% reduction in MAPE, and a 2.06% improvement in R2, improving prediction precision. The fine-tuned eGFR estimates are then fed into several algorithms for CKD stage classification: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). Among these, XGBoost achieves the highest classification accuracy of 97.76%, along with an F1-score of 97.45%, demonstrating its effectiveness in CKD staging. Shapley Additive Explanations (SHAP) provide global and local feature importance insights, supporting clinical decision-making and model transparency.
Future research will validate the model on larger and more diverse datasets. It will also incorporate additional clinical parameters, including biomarkers and genetic data, to improve the precision of CKD risk prediction. This research advances AI-driven nephrology by providing a scalable, interpretable, and highly accurate solution for diagnosing and managing CKD.
KW - Chronic Kidney Diseases
KW - CKD-EPI Equation
KW - Cystatin C
KW - Explainable AI
KW - Glomerular Filtration Rate
KW - Machine Learning
KW - Serum Creatinine
UR - https://www.scopus.com/pages/publications/105004058709
U2 - 10.1109/ACCESS.2025.3565549
DO - 10.1109/ACCESS.2025.3565549
M3 - Article
AN - SCOPUS:105004058709
SN - 2169-3536
JO - IEEE Access
JF - IEEE Access
ER -