Boosting Discrimination Information Based Document Clustering Using Consensus and Classification

December 06, 2019
Vast amounts of data continuously accumulate in repositories from various sources, both deliberately and as a by-product of growing Internet penetration in society. In the latter case especially, little classification information is available, which has given rise to unsupervised analysis techniques for unlabeled data, such as data clustering. Data clustering exploits unstructured information in the data and exposes underlying relations. It is widely acknowledged as an essential component of data mining, machine learning, computer vision, computational biology, clinical diagnostics and pattern recognition. Information in personal repositories and public forums is usually available as text, in the form of social networking posts, news, articles, discussions, educational material, emails, etc. In the study (Ahmad Muqeem Sheri, 2019), the authors propose a consensus-building strategy for document clustering. The proposed approach, referred to as document clustering by consensus and classification (DCCC), uses classification tools as a consensus measure among cluster solutions generated by different data clustering tools (Figure 1). The authors demonstrate through extensive empirical evaluation that DCCC achieves an improvement in accuracy ranging from 4.92% to 22.43% on eight textual datasets. DCCC is a general consensus method that can also be applied to domains other than document clustering.
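
To make the idea of "classification as a consensus measure" more concrete, here is a minimal Python sketch. It is not the authors' DCCC algorithm, whose details are not given in this summary. It assumes a simplified scheme: two cluster solutions are aligned, the documents on which they agree are treated as labeled training data, and a nearest-centroid classifier (a stand-in chosen for illustration) assigns the disputed documents. The function names, the toy documents and the bag-of-words representation are all hypothetical.

```python
from collections import Counter
from itertools import permutations


def align_labels(reference, other, k):
    """Relabel `other` so it agrees with `reference` on as many items as possible.

    Cluster ids are arbitrary, so two solutions must be aligned before they can
    be compared. Brute force over permutations is fine for small k.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, o in zip(reference, other) if r == perm[o])
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[o] for o in other]


def bag_of_words(text):
    """Term-frequency vector for a whitespace-tokenized document."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def consensus_by_classification(docs, solution_a, solution_b, k):
    """Assumed scheme: agreed documents train a classifier for disputed ones."""
    aligned_b = align_labels(solution_a, solution_b, k)
    vectors = [bag_of_words(d) for d in docs]

    # Build one centroid per cluster from documents both solutions agree on.
    centroids = [Counter() for _ in range(k)]
    for vec, a, b in zip(vectors, solution_a, aligned_b):
        if a == b:
            centroids[a].update(vec)

    # Keep agreed labels; classify disputed documents by nearest centroid.
    final = []
    for vec, a, b in zip(vectors, solution_a, aligned_b):
        if a == b:
            final.append(a)
        else:
            final.append(max(range(k), key=lambda c: cosine(vec, centroids[c])))
    return final


# Toy data (hypothetical): three sports documents, three finance documents.
docs = [
    "football match goal team",
    "team wins football league",
    "goal scored in the match",
    "stock market shares fall",
    "market investors buy shares",
    "investors sell stock",
]
solution_a = [0, 0, 0, 1, 1, 1]  # output of a first clustering tool
solution_b = [1, 1, 0, 0, 0, 0]  # second tool: swapped ids, disagrees on doc 2

print(consensus_by_classification(docs, solution_a, solution_b, k=2))
# → [0, 0, 0, 1, 1, 1]
```

In this toy run the two solutions disagree only on the third document; the classifier trained on the agreed documents places it with the sports cluster because it shares the terms "goal" and "match" with that centroid. A real system would likely use stronger text features and classifiers, and more than two cluster solutions.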
