An approach for document pre-processing and K Means algorithm implementation
- Title
- An approach for document pre-processing and K Means algorithm implementation
- Creator
- Gowtham S.; Goswami M.; Balachandran K.; Purkayastha B.S.
- Description
- Web mining is a cutting-edge technology that covers gathering and classifying information over the web. This paper puts forth the concepts of document pre-processing: keywords are extracted from documents fetched from the web and processed, and a term-document matrix and TF-IDF (term frequency-inverse document frequency) weights, under several TF-IDF approaches, are generated for each document. The last step clusters these results with the K Means algorithm and compares the performance of each approach. The algorithm is realized on an x64 architecture and coded in Java and MATLAB. The results are tabulated. © 2014 IEEE.
- Source
- Proceedings - 2014 4th International Conference on Advances in Computing and Communications, ICACC 2014, pp. 162-166.
- Date
- 2014-01-01
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Subject
- augmented; frequency; K Means clustering; logarithmic; stemming; Stop words; term-document matrix; tf-idf
- Coverage
- Gowtham S., Christ University, Faculty of Engineering, Bangalore, India; Goswami M., Christ University, Faculty of Engineering, Bangalore, India; Balachandran K., Christ University, Faculty of Engineering, Bangalore, India; Purkayastha B.S., Assam Central University, Silchar, India
- Rights
- Restricted Access
- Relation
- ISBN: 978-147994364-7
- Format
- Online
- Language
- English
- Type
- Conference paper
Citation
Gowtham S.; Goswami M.; Balachandran K.; Purkayastha B.S., “An approach for document pre-processing and K Means algorithm implementation,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 23, 2025, https://archives.christuniversity.in/items/show/21042.
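Illustrative sketch
The pipeline outlined in the Description (stop-word removal, stemming, term-document matrix, TF-IDF variants, K Means) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' Java/MATLAB implementation. The stop-word list, the crude `stem` helper, the three term-frequency formulas (common textbook definitions matching the "frequency", "logarithmic" and "augmented" subject keywords), the first-k centroid initialization, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Illustrative subset only; a real run would use a full stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "on", "to", "for", "from", "with"}

def stem(word):
    # Crude suffix stripping, a stand-in for a real stemmer such as Porter's.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Lowercase, keep alphabetic tokens, drop stop words, then stem.
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]

def term_document_matrix(docs):
    # One row per document, one column per vocabulary term (raw counts).
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]

def tf_weight(count, max_count, scheme):
    if count == 0:
        return 0.0  # assumption: absent terms get zero weight under every scheme
    if scheme == "frequency":
        return float(count)
    if scheme == "logarithmic":
        return 1.0 + math.log(count)
    if scheme == "augmented":
        return 0.5 + 0.5 * count / max_count
    raise ValueError(f"unknown scheme: {scheme}")

def tfidf(matrix, scheme="frequency"):
    n_docs, n_terms = len(matrix), len(matrix[0])
    # Document frequency is at least 1 because the vocabulary comes from the documents.
    df = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / d) for d in df]
    return [[tf_weight(row[j], max(row), scheme) * idf[j] for j in range(n_terms)]
            for row in matrix]

def kmeans(points, k, iterations=20):
    # Deterministic initialization (first k points) keeps the example reproducible.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "Web mining gathers information from the web",
    "Clustering groups documents with K Means",
    "Mining the web for information",
    "K Means clustering of documents",
]
vocab, matrix = term_document_matrix(docs)
for scheme in ("frequency", "logarithmic", "augmented"):
    print(scheme, kmeans(tfidf(matrix, scheme), k=2))
```

Running the same clustering over each weighting scheme mirrors, in spirit, the comparison of approaches mentioned in the Description; the four toy documents are made up for the example and carry no results from the paper.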