Understanding document semantics from summaries: A case study on Hindi texts

Title: Understanding document semantics from summaries: A case study on Hindi texts
Creator: Krishnamurthi K.; Panuganti V.R.; Bulusu V.V.
Description: The summary of a document contains the words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model used to understand document semantics by deriving a semantic structure from patterns of word correlations in the document. When LSA is used to capture semantics from summaries, it performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. Taking advantage of the model's language independence, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. The first stage of remodeling provides supplementary information, such as document category and domain information. The second stage uses a supervised term weighting measure in the process. The remodeled LSA's performance is empirically evaluated in a document classification application by comparing its classification accuracies to those of plain LSA. The remodeling improves the performance of LSA by 4.7% to 6.2% compared to the plain model. The results suggest that summaries of documents efficiently capture the semantic structure of documents and are an alternative to full-length documents for understanding document semantics. © 2016 ACM.
Source: ACM Transactions on Asian and Low-Resource Language Information Processing, Vol-16, No. 1
Date: 2016-01-01
Publisher: Association for Computing Machinery
Subject: Dimensionality reduction; Document classification; Experimentation; Extractive summary; H.3.3 [information storage and retrieval]: information search and retrieval - retrieval models; I.2.7 [artificial intelligence]: natural language processing - language models; I.5.1 [pattern recognition]: models - statistical; Performance; Semantic structure; Singular value decomposition; Supervised term weighting; Supplemented latent semantic analysis; Text analysis
Coverage: Krishnamurthi K., Department of Computer Science, Christ University, Hosur Road, Bangalore, 560029, Karnataka, India; Panuganti V.R., Department of Computer Science and Engineering, Gokaraju Rangaraju Institute of Engineering and Technology (GRIET), Nizampet Road, Hyderabad, 500090, Telangana, India; Bulusu V.V., Department of Information Technology, Jawaharlal Nehru Technological University Hyderabad College of Engineering Jagityal (JNTUHCEJ), Nachupally, Karimnagar, 505501, Telangana, India
Rights: Restricted Access
Relation: ISSN: 23754699
Format: Online
Language: English
Type: Article
Identifier: https://doi.org/10.1145/2956236

https://www.scopus.com/inward/record.uri?eid=2-s2.0-84997294901&doi=10.1145%2f2956236&partnerID=40&md5=57d07c8fd4df79cf159885b17f8b1119

Citation

Krishnamurthi K.; Panuganti V.R.; Bulusu V.V., “Understanding document semantics from summaries: A case study on Hindi texts,” CHRIST (Deemed To Be University) Institutional Repository, accessed July 17, 2026, https://archives.christuniversity.in/items/show/17159.
