Comparative Analysis of Machine Learning Models for Heart Disease Classification: Evaluation of Random Forest, XGBoost, and Logistic Regression with Hyperparameter Optimization
Abstract
Background: Heart disease is the leading cause of death globally, which makes accurate early detection with machine learning an important goal.
Method: This study used the UCI Heart Disease dataset with 303 samples and 14 features to compare the performance of three classification models: Random Forest, XGBoost, and Logistic Regression. Hyperparameter optimization was performed using GridSearchCV to improve model accuracy.
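The tuning step described in the Method can be sketched roughly as follows. This is a minimal illustration, not the study's actual pipeline: the data is a synthetic stand-in generated with `make_classification` (303 samples, 13 predictors, assuming the 14 features include the target column), and the 80/20 split, the scaling step, the grid over `C`, and the 5-fold cross-validation are all assumptions, since the abstract does not state them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the dataset described above.
# This is NOT the UCI Heart Disease data.
X, y = make_classification(
    n_samples=303, n_features=13, n_informative=6, random_state=42
)

# Assumed 80/20 stratified split (the abstract does not state the split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit Logistic Regression.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hypothetical grid over the regularization strength C.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

# Exhaustive search with 5-fold cross-validation, scored on accuracy.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 3))
```

The same pattern would apply to Random Forest or XGBoost by swapping the estimator and the parameter grid.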
Results: Logistic Regression with hyperparameter optimization showed the best recall-oriented performance, with an accuracy of 86.9%, precision of 81.3%, recall of 92.9%, and F1-score of 86.7%. Random Forest achieved a higher accuracy of 88.5%, and XGBoost achieved 85.2%.
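As a quick consistency check on the reported Logistic Regression figures, the F1-score is the harmonic mean of precision and recall. The short sketch below uses only the rounded values from the Results, so small rounding differences are possible.

```python
# Reported Logistic Regression metrics (rounded values from the Results).
precision = 0.813
recall = 0.929

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 3))  # 0.867
```

The result agrees with the reported F1-score of 86.7%.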
Conclusion: Logistic Regression with hyperparameter optimization proved effective for heart disease classification, achieving the highest recall (92.9%), a key property for early detection of heart disease. The main predictors were thalach (maximum heart rate) and oldpeak (ST depression).