Optimizing Machine Learning Performance with Feature Selection for Breast Cancer Classification

Koirunnisa, Koirunnisa (2023) Optimalisasi Performa Machine Learning Dengan Seleksi Fitur Untuk Klasifikasi Penyakit Kanker Payudara. Jurnal Ilmiah Teknik Elektro Komputer dan Informatika (JITEKI), 9 (4). ISSN 2338-3070

Abstract

Breast cancer is highly prevalent among adults worldwide. In Indonesia, according to the latest data from the World Health Organization (WHO), breast cancer accounts for 1.41% of all deaths, and the figure continues to rise. Addressing this growing problem requires a proactive approach. The objective of this study is therefore to classify breast cancer diagnoses into two categories: Benign and Malignant. The resulting classification pattern can serve as a benchmark for early detection and is expected to help reduce breast cancer incidence and mortality. The dataset was obtained from Kaggle and consists of 569 rows with 32 attributes. Several machine learning algorithms are used for the classification analysis, including Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), K-Nearest Neighbors (KNN), and Naïve Bayes (NB). Before modeling, Principal Component Analysis (PCA) is applied to the dataset as a feature selection technique with dimension reduction. The most accurate model is the SVM with an RBF kernel and a selected C value. The LR model achieves an accuracy of 97.3%. The precision and recall of the SVM model, however, are both 100%. The Receiver Operating Characteristic (ROC) curve of the SVM model also lies above that of the LR model, which is consistent with the confusion matrix calculation, where the False Positive Rate is found to be 0. Consequently, the overall performance of the SVM model with an RBF kernel and C-value selection is superior, primarily because the SVM model never classifies a case as positive when it is actually negative.
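The relationship the abstract draws between the confusion matrix, the False Positive Rate, and precision can be illustrated with a minimal sketch. The `ConfusionMatrix` class and the counts below are hypothetical and are not the study's results; the sketch also assumes Malignant is treated as the positive class, which the abstract does not state. It only shows that zero false positives force both a False Positive Rate of 0 and a precision of 100%.

```python
from dataclasses import dataclass


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix (hypothetical helper, not from the article)."""
    tp: int  # positive cases predicted positive
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)

    def precision(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def recall(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def false_positive_rate(self) -> float:
        return _ratio(self.fp, self.fp + self.tn)


# Made-up counts for illustration only: no false positives, a few false negatives.
cm = ConfusionMatrix(tp=40, fp=0, tn=70, fn=3)

print(f"accuracy  = {cm.accuracy():.3f}")
print(f"precision = {cm.precision():.3f}")
print(f"recall    = {cm.recall():.3f}")
print(f"FPR       = {cm.false_positive_rate():.3f}")
```

With `fp = 0`, precision and the False Positive Rate are fixed at 1 and 0 regardless of the other counts, whereas recall and accuracy still depend on the number of false negatives.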

Keywords:
Breast Cancer; Classification; Machine Learning; Supervised Algorithm; PCA

Item Type: Article
Subjects: T Technology > T Technology (General)
Divisions: Faculty of Engineering, Science and Mathematics > School of Electronics and Computer Science
Depositing User: Pustakawan UBP Karawang
Date Deposited: 29 Sep 2025 02:29
Last Modified: 29 Sep 2025 02:29
URI: http://repository.ubpkarawang.ac.id/id/eprint/4253
