NLP and Topic Modeling in Healthcare: Identifying Diseases from Patient Histories
- Title
- NLP and Topic Modeling in Healthcare: Identifying Diseases from Patient Histories
- Creator
- Kokatnoor, Sujatha Arun; Shukla, Samiksha
- Description
  - Topic modeling and Natural Language Processing (NLP) have shown significant promise in the healthcare industry for extracting meaningful information from unstructured patient histories, which can help diagnose diseases and improve clinical decisions. In this study, patient histories are grouped into ten clusters using advanced K-Means clustering, with the Dunn Index used to validate clustering performance. Once the clusters are formed, topic modeling is applied to each cluster. Four topic modeling approaches are examined: Latent Dirichlet Allocation (LDA), Hierarchical Dirichlet Process (HDP), Latent Semantic Indexing (LSI), and Non-negative Matrix Factorization (NMF). These techniques are used to identify disease-related terms in patient histories. The models are evaluated on coherence scores, which reflect the semantic significance of the generated terms, and on execution times, which reflect the computational efficiency required for real-time healthcare applications. Experimental results on the USMLE Step 2 Clinical Skills exam dataset show that NMF and HDP generated the most coherent terms, and NMF's faster execution time (1.67 s) makes it suitable for widespread healthcare applications. LDA and LSI, by contrast, offer a reasonable balance between coherence and computational demands. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
- Source
  - Smart Innovation, Systems and Technologies; Volume 454 SIST; pp. 501-516
- Date
- 01-01-2026
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Subject
  - Dunn index; K-means clustering; Latent Dirichlet allocation (LDA); Latent semantic indexing (LSI); Natural Language Processing (NLP); Non-negative matrix factorization (NMF); Personalized healthcare; Topic modeling
- Coverage
- Kokatnoor S.A., Department of Computer Science and Engineering, School of Engineering and Technology, Christ University, Karnataka, Bengaluru, India; Shukla S., Department of Computer Science and Engineering, School of Engineering and Technology, Christ University, Karnataka, Bengaluru, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
  - ISSN: 2190-3018; ISBN: 978-303207836-0
- Format
- online
- Language
- English
- Type
- Conference paper
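The abstract names the Dunn Index as the check on the K-Means clusters but does not state which variant of the formula was used. The following is a minimal, standard-library Python sketch of the classic definition, assuming Euclidean distance, single-linkage separation (closest pair across clusters) and maximum pairwise diameter. The helper name `dunn_index` and the toy 2-D points, which stand in for vectorized patient histories, are hypothetical and illustrative only; they are not the authors' code or data.

```python
import math
from itertools import combinations

# A point is a feature vector; toy 2-D coordinates stand in for
# vectorized patient histories (an illustrative assumption).
Point = tuple[float, ...]


def dunn_index(clusters: list[list[Point]]) -> float:
    """Classic Dunn index with Euclidean distance (hypothetical helper).

    numerator:   smallest distance between two points in different clusters
    denominator: largest distance between two points in the same cluster
    Higher values mean compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Separation: closest pair of points drawn from two different clusters.
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )

    # Diameter: farthest pair of points inside any single cluster.
    # Singleton clusters contribute no pairs; default covers all-singleton input.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )

    if max_diameter == 0.0:
        return math.inf  # every cluster collapses to a single location
    return min_separation / max_diameter


# Two labelings of the same four points.
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
loose = [[(0, 0), (0, 1), (10, 0)], [(10, 1)]]

print(dunn_index(tight))            # → 10.0
print(round(dunn_index(loose), 4))  # → 0.0995
```

The well-separated labeling scores 10.0 while the poor one scores about 0.0995, which is why a higher Dunn Index is read as better clustering. The all-pairs computation is quadratic in the number of documents, so for a large corpus of patient histories a sampled or library-based computation, and possibly a text-appropriate distance such as cosine distance, may be preferable.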
Citation
Kokatnoor, Sujatha Arun; Shukla, Samiksha, “NLP and Topic Modeling in Healthcare: Identifying Diseases from Patient Histories,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 19, 2026, https://archives.christuniversity.in/items/show/25369.
