In a recent study published in Scientific Reports, researchers developed a machine learning-based heart disease prediction model (ML-HDPM) that uses various combinations of data and several established classification methods.
Study: Comprehensive evaluation and performance analysis of machine learning in heart disease prediction. Image credit: Summit Art Creations/Shutterstock.com
Background
Heart disease is a global health risk that healthcare professionals must assess and treat with medical tests, advanced imaging techniques and diagnostic procedures. Promoting heart-healthy practices and early diagnosis can help minimize the incidence of cardiovascular disease and improve overall health.
Current approaches such as machine learning, deep learning, and sensor-based data collection produce promising findings, but have limitations such as uneven diagnostic accuracy and overfitting.
The proposed approaches use modern technology and feature selection procedures to improve heart disease diagnosis and prognosis.
About the study
In the current study, the researchers constructed the ML-HDPM model for accurate heart disease prediction.
The researchers used the Cleveland database, the Swiss database, the Long Beach database, and the Hungarian database to obtain cardiovascular data. Clinical data were preprocessed followed by feature selection, feature extraction, cluster-based oversampling and classification.
They fitted the model to the training data using the current feature set, calculated feature importance scores, and removed the lowest-scoring features until the desired number of features remained.
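The elimination loop described here can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function names, the toy data, and the use of absolute correlation with the label as a stand-in importance score are all assumptions made for the example. A real pipeline would refit a model on the remaining features each round and read the importances from it.

```python
from statistics import mean

def abs_correlation(column, labels):
    """Absolute Pearson correlation between one feature column and the labels."""
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    var_x = sum((x - mx) ** 2 for x in column)
    var_y = sum((y - my) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information
    return abs(cov) / (var_x * var_y) ** 0.5

def score_features(rows, labels, kept):
    """Stand-in importance scorer (assumption): one score per remaining feature."""
    return {j: abs_correlation([r[j] for r in rows], labels) for j in kept}

def recursive_feature_elimination(rows, labels, n_keep, scorer=score_features):
    """Repeatedly rescore the remaining features and drop the weakest one."""
    kept = list(range(len(rows[0])))
    while len(kept) > n_keep:
        scores = scorer(rows, labels, kept)
        worst = min(kept, key=lambda j: scores[j])
        kept.remove(worst)
    return kept

# Hypothetical toy data: feature 0 tracks the label, feature 1 is constant,
# feature 2 is only weakly related to the label.
rows = [
    [1.0, 5.0, 3.0], [2.0, 5.0, 7.0], [1.0, 5.0, 4.0],
    [8.0, 5.0, 6.0], [9.0, 5.0, 2.0], [8.0, 5.0, 5.0],
]
labels = [0, 0, 0, 1, 1, 1]
selected = recursive_feature_elimination(rows, labels, n_keep=2)
```

Because the stand-in scorer is univariate, its ranking does not change between rounds; with model-derived importances the ranking can shift after each removal, which is the reason for rescoring inside the loop.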
The genetic algorithm (GA) involved population initialization followed by selection, crossover, and mutation, repeated until the termination criterion was met.
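A minimal sketch of such a loop for feature-subset search follows. Everything specific in it is an assumption for illustration: the function names, tournament selection, one-point crossover, the elitism step, the parameter values, and above all the toy fitness function, which stands in for whatever model-based score the study used.

```python
import random

def genetic_feature_search(fitness, n_features, pop_size=20, generations=30,
                           mutation_rate=0.05, target=None, seed=0):
    """Search for a 0/1 feature mask that maximises `fitness`."""
    rng = random.Random(seed)
    # Population initialization: random masks, 1 means the feature is kept.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        best = max(population, key=fitness)
        history.append(fitness(best))
        # Termination criterion: stop early once the target score is reached.
        if target is not None and history[-1] >= target:
            break
        next_population = [best[:]]  # elitism: carry the best mask over unchanged
        while len(next_population) < pop_size:
            # Selection: the fittest of three random individuals becomes a parent.
            parent_a = max(rng.sample(population, 3), key=fitness)
            parent_b = max(rng.sample(population, 3), key=fitness)
            # Crossover: splice the parents at a random cut point.
            cut = rng.randint(1, n_features - 1)
            child = parent_a[:cut] + parent_b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness), history

# Hypothetical fitness: reward three "informative" features, penalise mask size.
INFORMATIVE = (0, 2, 5)

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

best_mask, history = genetic_feature_search(toy_fitness, n_features=8)
```

In a realistic setting the fitness would typically be something like the validation score of a classifier trained on the masked features, which makes each evaluation far more expensive than in this toy.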
To balance the classes, the researchers separated the majority-labeled samples from the raw data, clustered the minority-labeled samples, applied the synthetic minority oversampling technique (SMOTE), and merged the results into the training set used to generate the model output.
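The core of SMOTE is interpolation between a minority sample and one of its nearest minority neighbours. The sketch below shows only that step, under stated assumptions: the function name, the choice of k, and the toy points are hypothetical, and the clustering stage mentioned above is left out.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample (itself excluded).
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class samples with two features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=5)
```

Each synthetic point lies on a line segment between two real minority samples, so the new data stays inside the region the minority class already occupies rather than duplicating existing rows.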
The model selects relevant features using the recursive feature elimination method (RFEM) and the genetic algorithm (GA), which improves its robustness. The undersampling clustering oversampling method (USCOM) corrects for data imbalance.
The classification work uses multi-layer deep convolutional neural networks (MLDCNN) and the adaptive elephant herd optimization method (AEHOM).
The model classifiers were principal component analysis (PCA), support vector machine (SVM), linear discriminant analysis (LDA), decision tree (DT), random forest (RF), and naive Bayes (NB).
The model combines supervised infinite feature selection with an upgraded weighted random forest algorithm. The ML-HDPM preprocessing step ensures data integrity and model efficiency. Extensive feature selection reveals important properties for predictive modeling.
A scaler technique keeps feature values on a consistent scale, while SMOTE corrects for class imbalance. The genetic algorithm applies principles of natural selection, evaluating many candidate solutions in each generation.
The performance of the strategy is evaluated through simulation tests and compared with existing models. The training, test, and validation datasets comprised 80%, 10%, and 10% of the data, respectively.
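An 80/10/10 partition of this kind can be sketched as follows. The function name and the fixed seed are assumptions for illustration, and the sketch assumes the largest share goes to training; the study's own splitting procedure is not described here.

```python
import random

def split_80_10_10(records, seed=0):
    """Shuffle the records and cut them into 80% / 10% / 10% portions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = len(shuffled) * 8 // 10
    n_test = len(shuffled) // 10
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation

# Hypothetical data: 100 record identifiers.
train, test, validation = split_80_10_10(range(100))
```

Shuffling before slicing matters when records are stored in a meaningful order, for example grouped by source database, because otherwise one portion could end up containing a single source.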
Results
ML-HDPM performed well across a wide range of evaluation criteria in comprehensive testing. Using training data, the ML-HDPM model predicted cardiovascular disease with 96% accuracy and 95% precision.
The system's sensitivity (recall) was 96%, while an F-score of 92% reflects its balanced performance. The model's specificity was 90%.
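All of these measures derive from the four counts in a confusion matrix. The sketch below uses hypothetical counts chosen only for illustration; they are not the study's data, and the function name is an assumption.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)     # share of predicted positives that are real
    recall = tp / (tp + fn)        # sensitivity, also the true positive rate (TPR)
    specificity = tn / (tn + fp)   # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "fpr": fp / (fp + tn),     # false positive rate, equal to 1 - specificity
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical counts for 100 patients, 50 with and 50 without heart disease.
metrics = confusion_metrics(tp=48, fp=5, tn=45, fn=2)
```

With these counts, recall is 48/50 and specificity is 45/50, so the false positive rate is 5/50. The F-score is the harmonic mean of precision and recall, which is why it falls between the two.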
ML-HDPM provides accurate and reliable results. It incorporates complex technologies such as feature selection, data balancing, deep learning, and the adaptive elephant herd optimization method (AEHOM). These strategies allow the model to reliably predict heart disease, which improves clinical decisions and patient outcomes.
ML-HDPM outperforms other algorithms in training (95%) and testing (88%). Success is due to the combination of complex feature extraction, data imbalance corrections and machine learning.
Feature selection algorithms identify the properties most closely associated with cardiovascular health, allowing the model to detect distinctive patterns indicative of cardiovascular disease.
Correcting data imbalance with effective balancing techniques ensures that the model is trained on representative datasets, while deep learning with the MLDCNN approach and AEHOM optimization improves model accuracy.
ML-HDPM, a deep learning model, has lower false positive rates (FPR) in training (8.20%) and testing (15%) than other approaches, owing to its feature selection, data balancing, and improved machine learning components.
The model had high true positive rates (TPR) on the training (96%) and testing (91%) datasets due to feature recognition, data balance, and deep learning improvements. The approach improves the model’s ability to detect true positives.
Conclusion
The study presents a unique ML-HDPM approach that integrates feature selection, data balancing, and machine learning to improve cardiovascular disease (CVD) prediction.
Balanced F-scores, high precision and accuracy, and low false positive rates in the training and test datasets highlight the model's potential in cardiovascular diagnostic applications.
The findings show that the ML-HDPM model can increase the accuracy and speed of cardiovascular disease identification, thereby improving the standard of care.
However, further investigation is needed to improve model optimization and data quality and to investigate its use by healthcare professionals in real-world settings.