Comparative Study of Machine Learning Models on Multiple Breast Cancer Datasets

Md. Arman Hussain Sujon; Hossen Mustafa

doi:10.62527/ijasce.5.1.105

Md. Arman Hussain Sujon ⁽¹⁾, Hossen Mustafa ⁽²⁾

(1) Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh

(2) Institute of Information and Communication Technology, Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh

How to cite (IJASEIT) :

Hussain Sujon, M. A., & Mustafa, H. (2023). Comparative Study of Machine Learning Models on Multiple Breast Cancer Datasets. International Journal of Advanced Science Computing and Engineering, 5(1), 15–24. https://doi.org/10.62527/ijasce.5.1.105

Breast cancer is one of the most feared and most frequently occurring cancers among women. It affects roughly 10% of women worldwide at some point in their lives. Although treatment is available, it is not effective enough if the disease is not identified at an early stage. Breast cancer is usually identified through medical tests such as X-ray imaging (roentgenogram), breast ultrasound, and biopsy. As an alternative, researchers are exploring machine learning techniques for classifying tumours, e.g., as benign or malignant. Classification and data processing strategies can be effective mechanisms for predicting cancer. In this paper, we analyze six classification models (Decision Tree, K-Nearest Neighbours, Random Forest, Logistic Regression, Extra Trees, and Support Vector Machine) on three different datasets. We applied principal component analysis (PCA) to reduce the dimensionality of the datasets. Experimental results show that Random Forest obtained the best accuracy, recall, and F1 score among the six classification techniques on all three datasets. We also find that the data attributes and their values are important for accurate classification.
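The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, uses the library's bundled Wisconsin diagnostic dataset as a stand-in because the abstract does not name the three datasets, and picks the number of PCA components, the train/test split, and all hyperparameters arbitrarily. The function name `evaluate_models` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def evaluate_models(n_components=5, seed=0):
    """Fit six classifiers on PCA-reduced features and collect test metrics.

    Assumptions (not from the paper): the dataset, n_components, the 80/20
    split, and default hyperparameters.
    """
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )

    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "K Nearest Neighbours": KNeighborsClassifier(),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Extra Trees": ExtraTreesClassifier(random_state=seed),
        "Support Vector Machine": SVC(),
    }

    results = {}
    for name, model in models.items():
        # Standardize, project onto the leading principal components, classify.
        # Scaling and PCA are fitted on the training split only.
        pipeline = make_pipeline(StandardScaler(), PCA(n_components=n_components), model)
        pipeline.fit(X_train, y_train)
        pred = pipeline.predict(X_test)
        # In this dataset label 0 means malignant, so it is used as the
        # positive class for recall and F1.
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "recall": recall_score(y_test, pred, pos_label=0),
            "f1": f1_score(y_test, pred, pos_label=0),
        }
    return results


results = evaluate_models()
for name, scores in results.items():
    print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
```

Which model ranks first in a sketch like this depends on the dataset, the split, and the number of retained components, so the output should not be read as a reproduction of the reported results.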

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.