Dynamic Classifier Selection for Effective Mining from Noisy Data Streams

 

Xingquan Zhu

 

Abstract

Mining data streams has recently become an important and challenging task in many real-world applications, including credit card fraud protection and sensor networks. The most popular solution is to separate the stream data into chunks, learn a classifier from each chunk, and then integrate all base classifiers for classification. In this talk, I will present a new dynamic classifier selection (DCS) mechanism for integrating base classifiers, which dynamically selects a single “best” classifier to classify each test instance at run time. Our scheme uses statistical information from attribute values: each attribute partitions the evaluation set into disjoint subsets, and the classification accuracy of each base classifier is then evaluated on these subsets. Given a test instance, its attribute values determine the subsets into which similar instances in the evaluation set fall, and the classifier with the highest classification accuracy on those subsets is selected to classify the test instance. Experimental results and comparative studies demonstrate the efficiency and effectiveness of our method. Such a DCS scheme appears promising for mining data streams with dramatic concept change (drift) or a significant amount of noise, because in these situations the base classifiers are likely to conflict with one another or to have low confidence.
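The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation, and several details are assumptions made only for illustration: attributes are taken to be discrete, base classifiers are plain callables, a classifier's score for a test instance is the sum of its accuracies over the subsets the instance's attribute values match, unseen attribute values are skipped, and ties go to the lowest classifier index. The names `build_accuracy_table` and `select_classifier` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Instance = Sequence[Hashable]
Classifier = Callable[[Instance], Hashable]
SubsetKey = Tuple[int, Hashable]  # (attribute index, attribute value)


def build_accuracy_table(
    classifiers: List[Classifier],
    eval_X: List[Instance],
    eval_y: List[Hashable],
) -> Dict[SubsetKey, List[float]]:
    """Partition the evaluation set by each attribute value and record
    every base classifier's accuracy on each resulting subset."""
    totals: Dict[SubsetKey, int] = defaultdict(int)
    correct: Dict[SubsetKey, List[int]] = defaultdict(
        lambda: [0] * len(classifiers)
    )
    for x, y in zip(eval_X, eval_y):
        hits = [int(clf(x) == y) for clf in classifiers]
        # Each instance belongs to one subset per attribute.
        for attr, value in enumerate(x):
            key = (attr, value)
            totals[key] += 1
            for i, hit in enumerate(hits):
                correct[key][i] += hit
    return {key: [c / totals[key] for c in correct[key]] for key in totals}


def select_classifier(
    table: Dict[SubsetKey, List[float]],
    n_classifiers: int,
    x: Instance,
) -> int:
    """Return the index of the classifier with the highest summed accuracy
    over the subsets matched by x (summing is an assumption of this sketch)."""
    scores = [0.0] * n_classifiers
    for attr, value in enumerate(x):
        accuracies = table.get((attr, value))
        if accuracies is None:
            continue  # assumption: an unseen value contributes nothing
        for i, acc in enumerate(accuracies):
            scores[i] += acc
    # max() returns the first maximum, so ties go to the lowest index.
    return max(range(n_classifiers), key=scores.__getitem__)


# Toy example with two deliberately conflicting base classifiers.
classifiers = [lambda x: "pos", lambda x: "neg"]
eval_X = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
eval_y = ["pos", "pos", "neg", "neg"]

table = build_accuracy_table(classifiers, eval_X, eval_y)
chosen = select_classifier(table, len(classifiers), ("b", "y"))
prediction = classifiers[chosen](("b", "y"))
```

In this toy setting the first attribute separates the two classes, so the accuracy table favors a different base classifier depending on whether the test instance has value "a" or "b" for that attribute, while the second attribute is uninformative and contributes equally to both scores.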