Document Classification Using Expectation Maximization with Semi Supervised Learning
DOI:
https://doi.org/10.53075/Ijmsirq/145756854374585
Keywords:
Data mining, semi-supervised learning, supervised learning, expectation maximization, document classification
Abstract
As the number of online documents grows, so does the demand for document classification to support their analysis and management. Text is cheap, but information, in the form of knowing which classes a document belongs to, is expensive. The main purpose of this paper is to explain how the expectation-maximization technique from data mining can be used to classify documents, and to examine how accuracy can be improved using a semi-supervised approach. The expectation-maximization algorithm is applied with both supervised and semi-supervised approaches. The semi-supervised approach is found to be more accurate and effective. Its main advantage is the dynamic generation of new classes. The algorithm first trains a classifier using the labeled documents and then probabilistically classifies the unlabeled documents. The car dataset used for evaluation was collected from the UCI repository, with some modifications made by the authors.
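The train-then-probabilistically-classify loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not name the base classifier, so a multinomial naive Bayes model is assumed here, and the function names (`train_nb`, `posterior`, `semi_supervised_em`, `classify`) and the toy documents are hypothetical. The sketch keeps the set of classes fixed and does not attempt the dynamic generation of new classes.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step: estimate priors and word probabilities from (soft) class weights."""
    prior, word_prob = {}, {}
    total_weight = sum(sum(w.values()) for w in weights)
    for c in classes:
        class_weight = sum(w[c] for w in weights)
        prior[c] = (class_weight + alpha) / (total_weight + alpha * len(classes))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for token in doc:
                counts[token] += w[c]  # fractional count for soft labels
        denom = sum(counts.values()) + alpha * len(vocab)
        word_prob[c] = {t: (counts[t] + alpha) / denom for t in vocab}
    return prior, word_prob

def posterior(doc, prior, word_prob):
    """E-step for one document: P(class | doc) under the current model."""
    log_scores = {}
    for c in prior:
        score = math.log(prior[c])
        for token in doc:
            if token in word_prob[c]:  # ignore out-of-vocabulary tokens
                score += math.log(word_prob[c][token])
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp.values())
    return {c: v / norm for c, v in exp.items()}

def semi_supervised_em(labeled, unlabeled, n_iter=10):
    """Train on labeled documents, then alternate E- and M-steps over all documents."""
    classes = sorted({y for _, y in labeled})
    l_docs = [d for d, _ in labeled]
    vocab = sorted({t for d in l_docs + unlabeled for t in d})
    # Labeled documents keep hard (0/1) class weights throughout.
    l_weights = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, word_prob = train_nb(l_docs, l_weights, classes, vocab)
    for _ in range(n_iter):
        # E-step: probabilistically label the unlabeled documents.
        u_weights = [posterior(d, prior, word_prob) for d in unlabeled]
        # M-step: retrain on labeled plus softly labeled documents.
        prior, word_prob = train_nb(l_docs + unlabeled, l_weights + u_weights,
                                    classes, vocab)
    return prior, word_prob

def classify(doc, prior, word_prob):
    """Return the most probable class for a tokenized document."""
    post = posterior(doc, prior, word_prob)
    return max(post, key=post.get)

# Hypothetical toy data (not the UCI car dataset used in the paper).
labeled = [(["engine", "wheel", "fuel"], "car"),
           (["goal", "team", "match"], "sport")]
unlabeled = [["engine", "brake", "fuel"], ["team", "coach", "match"],
             ["brake", "wheel"], ["coach", "goal"]]

prior, word_prob = semi_supervised_em(labeled, unlabeled)
print(classify(["brake"], prior, word_prob))
print(classify(["coach"], prior, word_prob))
```

In this toy data the words `brake` and `coach` never occur in a labeled document, so a classifier trained on the labeled documents alone has no evidence for either class; only the co-occurrences in the unlabeled documents can tie them to a class, which is the effect the semi-supervised approach relies on.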
License
Copyright (c) 2021 Scholars Journal of Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.