Feature Extraction and Classification On Single Nucleotide Polymorphism

Nur Fatihah Kamarudin, Zuraini Ali Shah, Mohd Farhan Md Fudzee, Shahreen Kasim


The Malay population of Peninsular Malaysia can be divided into eight sub-ethnic groups: Malay Bugis, Malay, Malay Champa, Malay Jawa, Malay Kelantan, Malay Kedah, Malay Minang and Malay Pattani. Ancestry informative markers (AIMs) can be used to represent these eight sub-ethnic groups. In this research, single nucleotide polymorphism (SNP) datasets of the eight sub-ethnic groups are analysed in order to obtain AIMs for the Malay population in Peninsular Malaysia. However, the datasets may contain outliers, missing data and redundancy that can affect the accuracy of the results, so data pre-processing is an important step for addressing these problems. Iterative pruning principal component analysis (ipPCA) is a technique commonly used in the analysis of genome datasets to extract information. It can be applied to highly structured data, can improve the resolution of the data, and is also used to resolve sub-population structure. Random Forest and Hidden Naïve Bayes are used to classify the SNPs that can be used as AIMs. Information Gain Ratio is then used to rank the chosen AIMs based on the value of each attribute.
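The ranking step can be illustrated with a minimal sketch of the standard gain ratio (information gain divided by the entropy of the attribute itself). This is not the authors' implementation: the function names, the 0/1/2 genotype coding, the group labels and the toy values are all assumptions made up for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the feature's own entropy."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels once the feature value is known.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A feature with a single value carries no information; avoid dividing by zero.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data, invented for illustration: one genotype code (0/1/2) per individual.
labels = ["A", "A", "B", "B"]  # hypothetical group labels
snps = {
    "snp_1": [0, 0, 2, 2],  # separates the two groups perfectly
    "snp_2": [0, 1, 0, 1],  # carries no information about the group
}

# Rank SNPs from most to least informative.
ranking = sorted(snps, key=lambda name: gain_ratio(snps[name], labels), reverse=True)
```

In this toy case `snp_1` ranks above `snp_2`, because knowing its genotype fully determines the group while `snp_2` tells nothing about it. Real SNP data would first need the pre-processing described above (handling missing genotypes, outliers and redundant markers).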



DOI: https://doi.org/10.30630/ijasce.1.2.6




Organized / Collaboration

- Soft Computing and Data Mining Centre, UTHM, Malaysia and Department of Information Technology

- Society of Visual Informatics, Indonesia