Comparison Model of Optimal Machine Learning Models With Feature Extraction for Heart Attack Disease Classification

Desmalia, Salsa (2024) Comparison Model of Optimal Machine Learning Models With Feature Extraction for Heart Attack Disease Classification. Scientific Journal of Informatics, 11 (2). ISSN 2460-0040

1. FILE JUDUL_240010_20416255201093_Salsa Desmalia.pdf (Text, 1MB)
2. DAFTAR ISI_240010_20416255201093_Salsa Desmalia.pdf (Text, 217kB)
3. ARTIKEL UTAMA_240010_20416255201093_Salsa Desmalia.pdf (Text, 858kB, restricted to registered users only)
4. DAFTAR PUSTAKA_240010_20416255201093_Salsa Desmalia.pdf (Text, 355kB)
5. LAMPIRAN_240010_20416255201093_Salsa Desmalia.pdf (Text, 521kB, restricted to registered users only)

Abstract

Purpose: This study classifies people as affected or not affected by heart disease, based on various categories of heart attack causes. It aims to urge people to take better care of their health and to serve as a reference for doctors when educating patients about the dangers of heart attacks.
Methods: This research uses three machine learning algorithms: Support Vector Machine, Random Forest, and K-Nearest Neighbor. Principal Component Analysis (PCA) is applied to the dataset for feature extraction and dimension reduction before the data are modeled.
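PCA-based dimension reduction of this kind is commonly driven by the cumulative explained variance ratio: components are kept until they account for a chosen share of the total variance. The sketch below illustrates that selection rule only. The eigenvalues and the 0.90 threshold are hypothetical and are not taken from the study, and the helper name `components_for_variance` is an illustrative assumption.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold=0.95):
    """Return (k, cumulative) where k is the smallest number of leading
    principal components whose cumulative explained variance ratio
    reaches `threshold`."""
    if not eigenvalues:
        raise ValueError("eigenvalues must not be empty")
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")

    # Principal components are ordered by decreasing variance.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)

    # Cumulative share of total variance after each component.
    cumulative = [running / total for running in accumulate(ordered)]

    for k, ratio in enumerate(cumulative, start=1):
        # Small tolerance guards against floating-point round-off.
        if ratio >= threshold - 1e-12:
            return k, cumulative
    return len(ordered), cumulative


# Hypothetical covariance eigenvalues, for illustration only.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]
k, cumulative = components_for_variance(eigenvalues, threshold=0.90)
print(k, [round(c, 3) for c in cumulative])
```

With these made-up eigenvalues, four of the six components are enough to pass the 0.90 threshold. In practice the eigenvalues would come from the covariance matrix of the standardized training data.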
Results: The datasets used in this study tend to be high-dimensional, which would make modeling slow. Cumulative explained variance is therefore a fundamental aspect of PCA in this study: PCA is a dimensionality reduction technique from multivariate data analysis that makes modeling faster and more optimal. The dimension-reduced training and testing data were then used for modeling. Under confusion-matrix evaluation, the best algorithm is K-Nearest Neighbor (KNN), with accuracy of 86%, precision of 86%, recall of 91%, and F1-score of 88%. Meanwhile, under ROC evaluation, the best algorithm is again KNN, with an area under the curve of 0.85.
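The confusion-matrix metrics reported above follow from four counts: true positives, false positives, false negatives, and true negatives. The sketch below shows the standard formulas. The counts passed to `classification_metrics` are hypothetical and are not the study's confusion matrix. The final lines only check that the reported precision (86%) and recall (91%) are consistent with the reported F1-score.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
print({name: round(value, 3) for name, value in metrics.items()})

# Consistency check on the figures quoted in the abstract.
reported_f1 = f1_from(0.86, 0.91)
print(round(reported_f1, 2))
```

The harmonic mean of 0.86 and 0.91 is about 0.884, which rounds to the 88% F1-score quoted in the abstract.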
Novelty: The researchers used data from 1,190 patients sourced from Kaggle. Before modeling, they carried out EDA and preprocessing, which included a missing-value check to find records lacking information and a duplicate check to find repeated records. There were 270 duplicate records, which were deleted, leaving 737 records, and PCA was then applied. PCA reduced the number of features automatically without changing the data.
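The preprocessing steps described above (a missing-value check followed by removal of duplicate records) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: the column names and sample rows are hypothetical, and the helpers `find_missing` and `drop_duplicates` are assumptions introduced here.

```python
def find_missing(rows):
    """Return indices of rows that contain at least one missing value
    (None or an empty string)."""
    return [
        index
        for index, row in enumerate(rows)
        if any(value is None or value == "" for value in row.values())
    ]


def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping the first
    occurrence and preserving the original order."""
    seen = set()
    unique = []
    for row in rows:
        # Column names are unique within a row, so sorting the items
        # orders them by name and yields a stable, hashable key.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical patient records, for illustration only.
rows = [
    {"age": 54, "cholesterol": 239, "target": 1},
    {"age": 61, "cholesterol": None, "target": 0},
    {"age": 54, "cholesterol": 239, "target": 1},  # exact duplicate of row 0
    {"age": 47, "cholesterol": 204, "target": 0},
]

missing_rows = find_missing(rows)
deduplicated = drop_duplicates(rows)
print(missing_rows, len(rows) - len(deduplicated), len(deduplicated))
```

With a tabular library such as pandas, `DataFrame.isna` and `DataFrame.drop_duplicates` cover the same two steps.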

Keywords: Heart Attack, Classification, Machine Learning, PCA, ROC Curve

Item Type: Article
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User: Pustakawan UBP Karawang
Date Deposited: 29 Sep 2025 02:31
Last Modified: 29 Sep 2025 02:31
URI: http://repository.ubpkarawang.ac.id/id/eprint/4259
