Understanding document semantics from summaries: A case study on Hindi texts
- Title
- Understanding document semantics from summaries: A case study on Hindi texts
- Creator
- Krishnamurthi K.; Panuganti V.R.; Bulusu V.V.
- Description
- A summary of a document contains the words that actually contribute to the semantics of the document. Latent Semantic Analysis (LSA) is a mathematical model used to understand document semantics by deriving a semantic structure from patterns of word correlations in the document. When LSA is used to capture semantics from summaries, it is observed to perform quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. Taking advantage of the model's language independence, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. The first stage of remodeling provides supplementary information, such as document category and domain information. The second stage uses a supervised term weighting measure in the process. The remodeled LSA's performance is empirically evaluated in a document classification application by comparing its classification accuracy with that of plain LSA. The remodeled LSA improves on the plain model by 4.7% to 6.2%. The results suggest that summaries of documents efficiently capture the semantic structure of documents and are an alternative to full-length documents for understanding document semantics. © 2016 ACM.
- Source
- ACM Transactions on Asian and Low-Resource Language Information Processing, Vol. 16, No. 1
- Date
- 2016-01-01
- Publisher
- Association for Computing Machinery
- Subject
- Dimensionality reduction; Document classification; Experimentation; Extractive summary; H.3.3 [information storage and retrieval]: information search and retrieval - retrieval models; I.2.7 [artificial intelligence]: natural language processing - language models; I.5.1 [pattern recognition]: models - statistical; Performance; Semantic structure; Singular value decomposition; Supervised term weighting; Supplemented latent semantic analysis; Text analysis
- Coverage
- Krishnamurthi K., Department of Computer Science, Christ University, Hosur Road, Bangalore, 560029, Karnataka, India; Panuganti V.R., Department of Computer Science and Engineering, Gokaraju Rangaraju Institute of Engineering and Technology (GRIET), Nizampet Road, Hyderabad, 500090, Telangana, India; Bulusu V.V., Department of Information Technology, Jawaharlal Nehru Technological University Hyderabad College of Engineering Jagityal (JNTUHCEJ), Nachupally, Karimnagar, 505501, Telangana, India
- Rights
- Restricted Access
- Relation
- ISSN: 2375-4699
- Format
- Online
- Language
- English
- Type
- Article
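The Description above refers to plain LSA, which derives a semantic structure from word correlations and is evaluated through document classification. The following is only a minimal sketch of that general idea, not the authors' implementation. It assumes NumPy, raw term counts, a truncated singular value decomposition with k = 2, and nearest-neighbour classification by cosine similarity. The toy corpus, its English placeholder words (standing in for Hindi summaries), the labels, and all function names are hypothetical. The article's two remodeling stages (supplementary category and domain information, supervised term weighting) are not reproduced here.

```python
import numpy as np

# Hypothetical toy corpus: English placeholders standing in for Hindi summaries.
summaries = [
    "cricket match team score",
    "team score match win",
    "election vote party minister",
    "party minister vote parliament budget",
]
labels = ["sports", "sports", "politics", "politics"]


def build_term_document_matrix(texts):
    """Return (vocabulary, matrix) of raw counts; rows are terms, columns are documents."""
    vocabulary = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = np.zeros((len(vocabulary), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[index[word], j] += 1.0
    return vocabulary, matrix


def lsa(matrix, k):
    """Truncated SVD: keep only the k largest singular triplets."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


vocabulary, A = build_term_document_matrix(summaries)
U_k, s_k, Vt_k = lsa(A, k=2)

# Document coordinates in the latent space: columns of diag(s_k) @ Vt_k,
# which equal U_k.T @ A, so folded-in queries share the same coordinates.
doc_vectors = (s_k[:, None] * Vt_k).T


def classify(text):
    """Fold a new summary into the latent space and return the nearest document's label."""
    index = {word: i for i, word in enumerate(vocabulary)}
    q = np.zeros(len(vocabulary))
    for word in text.split():
        if word in index:  # words outside the vocabulary are ignored
            q[index[word]] += 1.0
    q_latent = U_k.T @ q  # projection onto the k latent dimensions
    similarities = [cosine(q_latent, d) for d in doc_vectors]
    return labels[int(np.argmax(similarities))]


print(classify("team win cricket"))
print(classify("vote parliament"))
```

In this sketch a new summary is classified without sharing every word with any single training summary, because the comparison happens in the reduced latent space rather than on raw word overlap. A different weighting scheme could be applied to `A` before the decomposition, but which measure the article uses is not stated in this record.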
Citation
Krishnamurthi K.; Panuganti V.R.; Bulusu V.V., “Understanding document semantics from summaries: A case study on Hindi texts,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 22, 2025, https://archives.christuniversity.in/items/show/17159.