Examining the Performance of the Bagging Method in Breast Cancer Classification

Keywords: Breast cancer, classification, bagging, machine learning

Abstract
The aim of this study is to classify breast cancer using the Bagging classifier, an ensemble method. To this end, the breast cancer dataset available on Kaggle was used. The dataset consists of 569 observations and 32 variables; 357 observations (62.7 %) are benign and 212 (37.3 %) are malignant. First, the gain ratio feature selection method was used to identify the important variables. The performance of the method was then examined under 2-fold, 5-fold, and 10-fold cross-validation for different numbers of variables. The analyses were performed in WEKA. Both with all variables included and after removing the insignificant variables, the performance metrics were as follows: accuracy was 95.0791 %, precision, recall, and F-measure were each 0.951, and the ROC area was 0.988. Performance with all variables and with the insignificant variables removed was similar apart from the time taken, and both configurations outperformed the other variable counts tested. In addition, 2-fold cross-validation gave slightly better classification performance on every metric except the ROC area. The Bagging method is recommended for the classification of other diseases as well.
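The three steps named in the abstract (gain ratio scoring, bagging, and k-fold cross-validation) can be illustrated with a minimal pure-Python sketch. This is not the WEKA workflow used in the study and does not reproduce its results: the data are synthetic, the base learner is assumed to be a single-threshold decision stump, and every function name (`gain_ratio`, `fit_stump`, `bagging_fit`, `bagging_predict`, `cross_val_accuracy`) is hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)
    return (entropy(labels) - conditional) / split_info if split_info > 0 else 0.0


def fit_stump(X, y):
    """Assumed base learner: the rule 'predict 1 if row[j] > t' with the best training accuracy."""
    best_correct, best_rule = -1, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            correct = sum((row[j] > t) == (label == 1) for row, label in zip(X, y))
            if correct > best_correct:
                best_correct, best_rule = correct, (j, t)
    return best_rule


def bagging_fit(X, y, n_estimators, rng):
    """Train each stump on a bootstrap sample (n rows drawn with replacement)."""
    n = len(X)
    models = []
    for _ in range(n_estimators):
        picks = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in picks], [y[i] for i in picks]))
    return models


def bagging_predict(models, row):
    """Combine the stumps by majority vote."""
    votes = sum(1 if row[j] > t else 0 for j, t in models)
    return 1 if 2 * votes > len(models) else 0


def cross_val_accuracy(X, y, k, n_estimators=15, seed=0):
    """Accuracy pooled over k roughly stratified folds."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    idx.sort(key=lambda i: y[i])  # stable sort: groups by class, keeps the shuffle within each class
    correct = 0
    for f in range(k):
        test = set(idx[f::k])  # every k-th index, so each fold samples both classes
        train = [i for i in idx if i not in test]
        models = bagging_fit([X[i] for i in train], [y[i] for i in train], n_estimators, rng)
        correct += sum(bagging_predict(models, X[i]) == y[i] for i in test)
    return correct / len(X)


# Synthetic two-class data (an assumption for illustration, not the Kaggle dataset).
data_rng = random.Random(42)
X = [[data_rng.gauss(3.0 * label, 1.0), data_rng.gauss(3.0 * label, 1.0)]
     for label in (0, 1) for _ in range(40)]
y = [label for label in (0, 1) for _ in range(40)]

# Gain ratio needs discrete values, so the first feature is binarized at an arbitrary cut point.
binned = [int(row[0] > 1.5) for row in X]
print("gain ratio of binarized feature 0:", round(gain_ratio(binned, y), 3))

results = {k: cross_val_accuracy(X, y, k) for k in (2, 5, 10)}
for k, acc in results.items():
    print(f"{k}-fold accuracy: {acc:.3f}")
```

The sketch scores one binarized feature rather than ranking and removing features, and it reports accuracy only. Precision, recall, F-measure, and ROC area would need a confusion matrix and vote-based scores on top of this.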
