Friday, January 1, 2021

PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATA

Author :  Shaukat Ali Shahee

Affiliation :  Indian Institute of Technology Bombay, Mumbai, India

Country :  India

Category :  Computer Science & Information Technology

Volume, Issue, Month, Year :  8, 6, April, 2018

Abstract :

In many data mining applications, class imbalance arises when examples of one class are overrepresented. Traditional classifiers yield poor accuracy on the minority class because of this imbalance. Further, within-class imbalance, where a class is composed of multiple sub-concepts with different numbers of examples, also affects classifier performance. In this paper, we propose an oversampling technique that handles between-class and within-class imbalance simultaneously and also takes into consideration the generalization ability in data space. The proposed method has two steps: performing model-based clustering with respect to classes to identify the sub-concepts, and then computing the separating hyperplane based on equal posterior probability between the classes. The proposed method is tested on 10 publicly available data sets, and the results show that it is statistically superior to other existing oversampling methods.
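The idea of an equal-posterior boundary can be illustrated with a small sketch. This is not the paper's algorithm: it assumes the clustering step has already produced one minority and one majority sub-concept, each summarized as a hypothetical (weight, mean, standard deviation) triple of a 1-D Gaussian, and it uses simple rejection sampling (an assumption made here for illustration) to keep only synthetic minority points whose minority posterior is at least 0.5. In one dimension the separating hyperplane reduces to a single threshold.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def minority_posterior(x, minority, majority):
    """P(minority | x) for two weighted Gaussian components.

    Each component is a hypothetical (weight, mean, std) triple,
    standing in for the output of a clustering step.
    """
    w_min, m_min, s_min = minority
    w_maj, m_maj, s_maj = majority
    p_min = w_min * gaussian_pdf(x, m_min, s_min)
    p_maj = w_maj * gaussian_pdf(x, m_maj, s_maj)
    return p_min / (p_min + p_maj)


def oversample(minority, majority, n_new, seed=0):
    """Draw n_new synthetic minority points on the minority side of
    the equal-posterior boundary (illustrative rejection sampling)."""
    rng = random.Random(seed)
    _, mean, std = minority
    accepted = []
    while len(accepted) < n_new:
        x = rng.gauss(mean, std)
        # Keep the point only if the minority class is at least as
        # probable as the majority class at x.
        if minority_posterior(x, minority, majority) >= 0.5:
            accepted.append(x)
    return accepted


# Hypothetical sub-concepts: a small minority cluster near 0 and a
# large majority cluster near 4, both with unit standard deviation.
minority = (0.1, 0.0, 1.0)
majority = (0.9, 4.0, 1.0)
synthetic = oversample(minority, majority, n_new=5)
```

With equal standard deviations, as in these made-up parameters, the two weighted densities are equal at x = 2 - ln(9)/4, so every accepted point lies at or below that threshold. A real data set would need multivariate components and one such boundary per pair of neighbouring sub-concepts.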

Keywords :  Supervised learning, Class Imbalance, Oversampling, Posterior Distribution

For More Details :  https://airccj.org/CSCP/vol8/csit88607.pdf
