An approach for document pre-processing and K Means algorithm implementation

Title: An approach for document pre-processing and K Means algorithm implementation
Creator: Gowtham S.; Goswami M.; Balachandran K.; Purkayastha B.S.
Description: Web mining is a cutting-edge technology that covers gathering and classifying information on the web. This paper presents the concepts of document pre-processing: keywords are extracted from documents fetched from the web and processed to generate a term-document matrix and TF-IDF (term frequency-inverse document frequency) weights for each document, using several TF-IDF variants. The final step clusters these results with the K Means algorithm and compares the performance of each variant. The algorithm is implemented on an x64 architecture and coded in Java and MATLAB. The results are tabulated. © 2014 IEEE.
Source: Proceedings - 2014 4th International Conference on Advances in Computing and Communications, ICACC 2014, pp. 162-166.
Date: 2014-01-01
Publisher: Institute of Electrical and Electronics Engineers Inc.
Subject: augmented; frequency; K Means clustering; logarithmic; stemming; Stop words; term-document matrix; tf-idf
Coverage: Gowtham S., Christ University, Faculty of Engineering, Bangalore, India; Goswami M., Christ University, Faculty of Engineering, Bangalore, India; Balachandran K., Christ University, Faculty of Engineering, Bangalore, India; Purkayastha B.S., Assam Central University, Silchar, India
Rights: Restricted Access
Relation: ISBN: 978-147994364-7
Format: Online
Language: English
Type: Conference paper
Identifier: https://doi.org/10.1109/ICACC.2014.46

https://www.scopus.com/inward/record.uri?eid=2-s2.0-84908701370&doi=10.1109%2fICACC.2014.46&partnerID=40&md5=cb62427cd1b749e2f499429f73f7a008
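
The pipeline summarized in the Description (stop-word removal, term-document matrix, TF-IDF variants, K Means clustering) can be sketched roughly as below. This is a minimal illustration in Python, not the authors' Java/MATLAB implementation: the stop-word list, the toy documents, and all function names are hypothetical, stemming is omitted, absent terms are given zero weight under every scheme, and the centroids are seeded from the first k documents purely to keep the run reproducible.

```python
import math
from collections import Counter

# Hypothetical stop-word list, chosen only for illustration.
STOP_WORDS = {"the", "is", "a", "of", "and", "on", "in"}

def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stop words (no stemming)."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (vocab, counts) where counts[d][t] is the raw count of term t in doc d."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    counts = []
    for tokens in token_lists:
        c = Counter(tokens)
        counts.append([c[t] for t in vocab])
    return vocab, counts

def tf_weight(count, max_count, scheme):
    """Raw, logarithmic, and augmented term-frequency variants."""
    if count == 0:
        return 0.0  # assumption: absent terms get zero weight in every scheme
    if scheme == "raw":
        return float(count)
    if scheme == "log":
        return 1.0 + math.log(count)
    if scheme == "augmented":
        return 0.5 + 0.5 * count / max_count
    raise ValueError(f"unknown scheme: {scheme}")

def tf_idf(counts, scheme="raw"):
    """Weight each count by idf = ln(N / document frequency)."""
    n_docs, n_terms = len(counts), len(counts[0])
    df = [sum(1 for row in counts if row[t] > 0) for t in range(n_terms)]
    idf = [math.log(n_docs / df[t]) for t in range(n_terms)]
    return [
        [tf_weight(row[t], max(row), scheme) * idf[t] for t in range(n_terms)]
        for row in counts
    ]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(vectors, k, iterations=20):
    """Plain Lloyd iteration; centroids start at the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy corpus (hypothetical): two documents about cats, two about markets.
docs = [
    "The cat is a cat and a kitten.",
    "Stock market and the stock",
    "A kitten is a cat",
    "Trade on the stock market",
]
vocab, counts = term_document_matrix(docs)
labels_by_scheme = {
    scheme: k_means(tf_idf(counts, scheme), k=2)
    for scheme in ("raw", "log", "augmented")
}
print(vocab)
print(labels_by_scheme)
```

Running the three schemes side by side on the same counts mirrors the comparison the abstract describes; on a corpus this small the cluster assignments are unlikely to differ, so any real comparison would need a larger document set and a quality measure.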

Citation

Gowtham S.; Goswami M.; Balachandran K.; Purkayastha B.S., “An approach for document pre-processing and K Means algorithm implementation,” CHRIST (Deemed To Be University) Institutional Repository, accessed July 22, 2026, https://archives.christuniversity.in/items/show/21042.
