Online ISSN: 2515-8260

Author: Alias, Muhammad Syafiq Alza bin

Improved Sampling Data Workflow Using SMTmk To Increase The Classification Accuracy Of Imbalanced Dataset

Muhammad Syafiq Alza bin Alias; Norazlin Binti Ibrahim; Zalhan Bin Mohd Zin

European Journal of Molecular & Clinical Medicine, 2021, Volume 8, Issue 2, Pages 91-99

One of the main challenges in machine learning classification is handling imbalanced
data, because imbalanced data can bias results towards the majority class and lead to
poor classification performance. Therefore, this paper introduces an improved workflow
to address this issue. After the combination of Synthetic Minority Oversampling
Technique (SMOTE) and Tomek Links, known as the SMTmk method, is performed, an
additional step is required to further increase the performance of machine learning
classification, especially in terms of Specificity. This step reduces the number of
majority-class samples according to a ratio relative to the minority class. Three machine
learning algorithms, namely Extreme Gradient Boosting, Random Forest and Logistic
Regression, are used to test the classification results. The results recorded in this
research show that the ratio of 7 to 1 performs better than the established methods,
namely SMOTE and the hybrid method of SMOTE and Tomek Links.
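The abstract describes the workflow only in words. Below is a minimal pure-Python sketch of one possible reading. It is not the authors' implementation. It runs basic SMOTE interpolation, then Tomek-link cleaning, then random undersampling of the majority class. Several choices are assumptions and not stated in the abstract. Each Tomek link loses only its majority-class member, which is one common variant. The reduction step uses random undersampling. "7 to 1" is read as at most seven majority samples per minority sample. The function names are hypothetical. A `specificity` helper (TN / (TN + FP)) is included because the abstract singles out that metric.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Basic SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours.
    Needs at least two minority samples."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base` (brute force, excluding itself)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic


def tomek_links(X, y):
    """Index pairs (i, j), i < j, with opposite labels where each point is
    the other's nearest neighbour in the whole dataset."""
    nearest = [
        min((j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(X[i], X[j]))
        for i in range(len(X))
    ]
    return [
        (i, j)
        for i, j in enumerate(nearest)
        if i < j and nearest[j] == i and y[i] != y[j]
    ]


def remove_tomek_majority(X, y, majority):
    """Drop the majority-class member of every Tomek link (assumed variant;
    some implementations drop both members). Binary labels assumed."""
    drop = {i if y[i] == majority else j for i, j in tomek_links(X, y)}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def reduce_majority(X, y, majority, ratio, rng=None):
    """Hypothetical extra step: randomly undersample the majority class so it
    keeps at most `ratio` samples per minority sample (binary labels assumed)."""
    if rng is None:
        rng = random.Random(0)
    maj = [i for i, label in enumerate(y) if label == majority]
    cap = int(ratio * (len(y) - len(maj)))
    dropped = set(rng.sample(maj, len(maj) - cap)) if len(maj) > cap else set()
    keep = [i for i in range(len(y)) if i not in dropped]
    return [X[i] for i in keep], [y[i] for i in keep]


def smtmk_with_reduction(X, y, majority, minority, n_synthetic, ratio, k=5, seed=0):
    """SMOTE -> Tomek-link cleaning -> majority reduction (sketch only)."""
    rng = random.Random(seed)
    minority_pts = [x for x, label in zip(X, y) if label == minority]
    new_pts = smote(minority_pts, n_synthetic, k, rng)
    X2, y2 = list(X) + new_pts, list(y) + [minority] * len(new_pts)
    X3, y3 = remove_tomek_majority(X2, y2, majority)
    return reduce_majority(X3, y3, majority, ratio, rng)


def specificity(y_true, y_pred, negative):
    """TN / (TN + FP), treating `negative` as the negative class."""
    tn = sum(t == negative and p == negative for t, p in zip(y_true, y_pred))
    fp = sum(t == negative and p != negative for t, p in zip(y_true, y_pred))
    return tn / (tn + fp)


if __name__ == "__main__":
    majority_pts = [(float(i), float(j)) for i in range(10) for j in range(8)]
    minority_pts = [(20.0, 20.0), (20.5, 20.0), (20.0, 20.5), (20.5, 20.5)]
    X = majority_pts + minority_pts
    y = [0] * len(majority_pts) + [1] * len(minority_pts)
    Xr, yr = smtmk_with_reduction(X, y, majority=0, minority=1,
                                  n_synthetic=4, ratio=7, k=3)
    print(yr.count(0), yr.count(1))  # prints "56 8": 80 majority capped at 7 * 8
```

The brute-force nearest-neighbour search is O(n²) and only suits small illustrations. For real data, imbalanced-learn provides `SMOTE`, `TomekLinks`, `SMOTETomek` and `RandomUnderSampler`, which could be chained in a similar order.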