Hybrid Ensemble Learning with Active Feature Selection for Early-Stage Cardiovascular Risk Stratification: A Multi-Modal Approach Using UCI Clinical Biomarkers
Abstract
Background: Cardiovascular disease (CVD) remains the leading cause of global mortality, making accurate early-stage risk stratification crucial for optimal patient management. Traditional risk assessment methods often lack precision and fail to effectively integrate various clinical biomarkers.
Objective: This study aims to develop a hybrid ensemble learning framework with active feature selection for early-stage cardiovascular risk stratification using multi-modal clinical biomarkers.
Method: We used the UCI Heart Disease dataset (n=303) with 17 clinical features. The methodology combined active feature selection, nine base models (Random Forest, XGBoost, LightGBM, Logistic Regression, SVM, KNN, Naive Bayes, Extra Trees, Gradient Boosting), and two hybrid ensemble techniques (soft voting and stacking). Models were evaluated with 5-fold cross-validation, with SMOTE oversampling used to address class imbalance.
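To make the soft-voting step concrete, the following minimal sketch averages the positive-class probabilities of several base models and thresholds the result. It is illustrative only: the function name `soft_vote`, the equal default weights, the 0.5 decision threshold, and all probability values are assumptions made for this example, not details reported by the study.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the weighted mean positive-class probability per patient.

    model_probs[m][i] is model m's predicted probability for patient i.
    With weights=None every model contributes equally (plain soft voting).
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total_weight = sum(weights)
    n_patients = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total_weight
        for i in range(n_patients)
    ]


# Hypothetical probabilities from three base models for four patients.
rf_probs = [0.90, 0.20, 0.60, 0.10]
lr_probs = [0.80, 0.30, 0.40, 0.05]
svm_probs = [0.70, 0.10, 0.80, 0.15]

ensemble_probs = soft_vote([rf_probs, lr_probs, svm_probs])

# Assumed decision threshold of 0.5 for the binary label.
labels = [int(p >= 0.5) for p in ensemble_probs]
```

A stacked ensemble differs in that a second-level model is trained on the base models' out-of-fold predictions rather than taking a fixed average.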
Results: The Logistic Regression model achieved the highest performance, with an AUC of 0.9600 and an F1-score of 0.8667. The hybrid ensemble framework divided 61 patients into three risk categories: high risk (28 patients, 45.9%; observed positive rate 89.29%), moderate risk (10 patients, 16.4%; observed positive rate 30.00%), and low risk (23 patients, 37.7%; observed positive rate 0.00%). Cross-validation demonstrated strong performance (AUC: 0.8962 ± 0.0142, 95% CI: 0.8684–0.9240).
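The arithmetic behind these figures can be checked with a short sketch. The tier counts and AUC values are taken from the Results above. The probability cut-offs in `risk_tier` (0.3 and 0.7) are hypothetical, since the abstract does not state the thresholds used, and the normal-approximation formula (mean ± 1.96 × 0.0142) is an assumption that happens to reproduce the reported interval.

```python
def risk_tier(prob: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a predicted probability to a risk tier.

    The cut-offs are hypothetical placeholders, not the study's thresholds.
    """
    if prob >= high:
        return "high"
    if prob >= low:
        return "moderate"
    return "low"


# Tier counts as reported in the Results.
tier_counts = {"high": 28, "moderate": 10, "low": 23}
total = sum(tier_counts.values())

# Share of patients in each tier, in percent, rounded to one decimal place.
shares = {tier: round(100 * n / total, 1) for tier, n in tier_counts.items()}

# Cross-validated AUC and its reported spread.
mean_auc, spread = 0.8962, 0.0142
z = 1.96  # two-sided 95% normal quantile (assumed method)
ci = (round(mean_auc - z * spread, 4), round(mean_auc + z * spread, 4))
```

Under these assumptions the shares come out to 45.9%, 16.4%, and 37.7% of 61 patients, and the interval to 0.8684–0.9240, matching the reported values.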
Conclusion: The hybrid ensemble learning approach with active feature selection stratifies cardiovascular risk accurately and shows strong potential to support clinical decision-making and early intervention strategies.