Open Access

Examining the Performance of the Bagging Method in Breast Cancer Classification

1 Batman University, Department of Medical Documentation, Secretariat Program Health Services Vocational School, Batman

Abstract

This study classifies breast cancer using the Bagging classifier, an ensemble method. The breast cancer dataset available on Kaggle was used. It consists of 569 observations and 32 variables, with 357 (62.7 %) benign and 212 (37.3 %) malignant cases. First, the gain ratio feature selection method was used to identify the important variables. The performance of the method was then examined under 2-fold, 5-fold, and 10-fold cross-validation for different numbers of variables. The analyses were performed in WEKA. Both with all variables included and after removing the insignificant variables, accuracy was 95.0791 %, precision, recall, and F-measure were each 0.951, and the ROC area was 0.988. Performance with all variables and with the insignificant variables removed was therefore similar, except in the time required, and was better than with the other numbers of variables tested. In addition, 2-fold cross-validation gave slightly better classification performance on all metrics except the ROC area. The Bagging method is recommended for the classification of other diseases.
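The evaluation protocol described in the abstract can be sketched outside WEKA as follows. This is a minimal illustration under stated assumptions, not the study's implementation. It assumes scikit-learn and uses the Wisconsin diagnostic data bundled with scikit-learn (30 features, no ID column), which is assumed to correspond to the Kaggle file. It also substitutes mutual information for gain ratio, because scikit-learn does not provide a gain ratio scorer. The helper name `evaluate`, the choice of 10 retained features, and the random seed are hypothetical.

```python
from functools import partial

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline

# Assumption: scikit-learn's bundled copy of the Wisconsin diagnostic data
# stands in for the Kaggle file used in the study.
X, y = load_breast_cancer(return_X_y=True)

METRICS = ("accuracy", "precision_weighted", "recall_weighted",
           "f1_weighted", "roc_auc")


def evaluate(k_features, n_folds, seed=0):
    """Mean cross-validated metrics for Bagging with optional feature selection.

    k_features=None keeps all variables; otherwise the top k_features are
    selected inside each training fold, so the held-out fold is never used
    for ranking.
    """
    steps = []
    if k_features is not None:
        # Stand-in for gain ratio: mutual information (an assumption).
        score_fn = partial(mutual_info_classif, random_state=seed)
        steps.append(SelectKBest(score_fn, k=k_features))
    # Bagging over decision trees (scikit-learn's default base learner).
    steps.append(BaggingClassifier(n_estimators=10, random_state=seed))
    model = make_pipeline(*steps)

    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    scores = cross_validate(model, X, y, cv=cv, scoring=METRICS)
    return {m: scores[f"test_{m}"].mean() for m in METRICS}


# Compare all variables against a reduced set under 2-, 5-, and 10-fold CV.
for n_folds in (2, 5, 10):
    for k in (None, 10):
        result = evaluate(k, n_folds)
        label = "all" if k is None else str(k)
        print(f"folds={n_folds:2d} features={label:>3} "
              f"acc={result['accuracy']:.4f} auc={result['roc_auc']:.4f}")
```

Because the base learner, the feature scorer, and the fold assignment all differ from the WEKA setup, the printed figures should not be expected to match the values reported above. The sketch only shows how the comparison across fold counts and variable subsets is organised.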

How to Cite

BEZEK GÜRE, Özlem. (2024). Examining the Performance of the Bagging Method in Breast Cancer Classification. MAS Journal of Applied Sciences, 9(3), 711–720. https://doi.org/10.5281/zenodo.13335497
