Comparative Study of Machine Learning Models on Multiple Breast Cancer Datasets
Abstract
Breast carcinoma is one of the most feared and most frequently occurring cancers among women today. It affects roughly 10% of women worldwide at some point in their lives. Although treatment for this cancer is available, it is far less effective if the disease is not identified at an early stage. Breast cancer is generally identified with medical tests such as a roentgenogram, breast ultrasound, or biopsy. As an alternative, researchers are exploring machine learning techniques for classifying tumours, e.g., as benign or malignant. Classification and data processing strategies can be effective mechanisms for the prediction of cancer. In this paper, we analyze six classification models on three different datasets: Decision Tree, K Nearest Neighbours, Random Forest, Logistic Regression, Extra Trees, and Support Vector Machine. We applied principal component analysis (PCA) to reduce the dimensionality of the datasets. Experimental results show that Random Forest obtained the best accuracy, recall, and F1 score among the six classification techniques on all three datasets. We also find that the attributes present in a dataset, and their values, are important for accurate classification.
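For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how accuracy, recall, and F1 score can be computed for a binary benign/malignant classifier. It is not the authors' code: the function name `binary_metrics`, the choice of malignant (label 1) as the positive class, and the ten example labels are all assumptions made for illustration, and the abstract does not state how the metrics were computed.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, recall, and F1 with one class treated as positive.

    Hypothetical helper for illustration; `positive=1` (malignant) is an
    assumption, not something the abstract specifies.
    """
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts relative to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Recall: share of truly positive cases that the model caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


# Invented labels (1 = malignant, 0 = benign), not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In a screening setting recall is often watched most closely, since a missed malignant tumour (a false negative) is usually costlier than a false alarm; that may be one reason the abstract reports recall and F1 alongside accuracy.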
Keywords
Classification; breast cancer prediction; data science
DOI: https://doi.org/10.30630/ijasce.5.1.105

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Organized / Collaboration: Soft Computing and Data Mining Centre, UTHM, Malaysia and Department of Information Technology - Society of Visual Informatics, Indonesia