An empirical analysis of similarity measures for unstructured data
- Title
- An empirical analysis of similarity measures for unstructured data
- Creator
- Goswami M.; Purkayastha B.S.
- Description
- With the rapid growth of digital text on the internet and in digital repositories, the pool of digital documents grows day by day. This growth calls for efficient and effective techniques to handle such an enormous amount of data. Mining documents requires understanding them properly, and text similarity measurement plays a major role in finding coherence among documents. The goal of similarity computation is to identify cohesion among text documents and to prepare the text for applications such as document organization, plagiarism detection, and query matching. It is one of the most fundamental tasks in information retrieval, information extraction, document organization, plagiarism detection, and text mining, and the effectiveness of document clustering in particular depends heavily on it. In this paper, four similarity measures are implemented and their descriptive statistics are compared. The results are found to be satisfactory. Graphs are drawn to visualize the results. © 2019 COMPUSOFT, An international journal of advanced computer technology.
- Source
- Compusoft, Vol-8, No. 8, pp. 3302-3306.
- Date
- 2019-01-01
- Publisher
- National Institute of Science Communication and Information Resources (NISCAIR)
- Subject
- Commonality; Cosine similarity; Jaccard similarity; Pearson; Similarity; Spearman's correlation
- Coverage
- Goswami M., CHRIST (Deemed to be University), Bangalore, 560074, India; Purkayastha B.S., Assam University, Silchar, Assam, India
- Rights
- Restricted Access
- Relation
- ISSN: 2320-0790
- Format
- Online
- Language
- English
- Type
- Article
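The record does not include the paper's implementation. As a rough illustration of the four measures named under Subject (cosine, Jaccard, Pearson, Spearman), the sketch below uses their standard textbook definitions. The whitespace tokenization, the term-frequency representation, and all function names are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def term_vectors(doc_a, doc_b):
    """Build aligned term-frequency vectors over the combined vocabulary.

    Assumption: lowercase whitespace tokenization; the paper may differ.
    """
    ca, cb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    return [ca[t] for t in vocab], [cb[t] for t in vocab]


def cosine(x, y):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def jaccard(doc_a, doc_b):
    """Size of the token-set intersection divided by the size of the union."""
    sa, sb = set(doc_a.lower().split()), set(doc_b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation; 0.0 for empty or constant input."""
    n = len(x)
    if n == 0:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))
```

For two documents, `x, y = term_vectors(doc_a, doc_b)` produces the vectors that `cosine`, `pearson`, and `spearman` expect, while `jaccard` works on the raw strings directly.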
Citation
Goswami M.; Purkayastha B.S., “An empirical analysis of similarity measures for unstructured data,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 24, 2025, https://archives.christuniversity.in/items/show/16809.