A two-stepped feature engineering process for topic modeling using batchwise LDA with stochastic variational inference model
- Title
- A two-stepped feature engineering process for topic modeling using batchwise LDA with stochastic variational inference model
- Creator
- Kokatnoor S.A.; Krishnan B.
- Description
- Online ratings and customer feedback on hotel booking websites support customers' decision-making, because reviews provide a deeper understanding of all aspects of a hotel. Consequently, review and rating analyses are of great interest to consumers and hotel owners who use hotel-related social media services. The key challenge, however, is to make this wide variety of information accessible in a simple, fast and relevant way; the proposed solution is Topic Modeling and Opinion Mining. Common approaches such as Latent Semantic Analysis (LSA) and Hierarchical Dirichlet Process (HDP) suffer from order effects: if the input dataset is shuffled, different topics are generated, leading to misleading results. To overcome this, a two-stepped feature engineering process is used. The first step applies TF-IDF with a modified trigram calculation; the second step removes weak features from the corpus, thereby reducing the dimensionality of the Vector Space Model (VSM) for efficient Topic Modeling and sentiment analysis of the corpus. Sentiment scores are calculated using the VADER tool, and Topic Modeling is done with Batchwise Latent Dirichlet Allocation (LDA) using a Stochastic Variational Inference (SVI) model. The modified trigrams calculate word probabilities not only in the backward direction but also for the next two words after the target word, thereby retaining its context information. The proposed method, Batchwise LDA with SVI combined with the two-stepped feature engineering process, performed considerably better than the LSA and HDP models because it identifies hidden and relevant topics in terms of their optimized posterior distribution in the hotel reviews dataset. The two-stepped feature engineering process improved the coherence values of Batchwise LDA with SVI by 3%, and the method achieved coherence gains of 9% and 4% over the LSA and HDP models respectively.
© 2020, Intelligent Network and Systems Society.
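The two feature engineering steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain word trigrams rather than the paper's modified trigram probabilities, a smoothed IDF formula chosen for illustration, and a hypothetical `min_weight` threshold for deciding which features count as weak. The function names `trigrams`, `tfidf` and `prune_weak` are likewise assumptions.

```python
import math
from collections import Counter


def trigrams(tokens):
    """Return word trigrams as space-joined strings."""
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def tfidf(docs):
    """Step 1 (assumed form): TF-IDF weights over plain trigram features."""
    feats = [Counter(trigrams(d.lower().split())) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each trigram occurs.
    df = Counter()
    for counts in feats:
        df.update(counts.keys())

    weights = []
    for counts in feats:
        total = sum(counts.values()) or 1
        weights.append({
            # Relative term frequency times a smoothed IDF (an assumption).
            f: (c / total) * (math.log((1 + n_docs) / (1 + df[f])) + 1)
            for f, c in counts.items()
        })
    return weights


def prune_weak(weights, min_weight):
    """Step 2 (assumed form): drop features whose highest weight in any
    document is below min_weight, shrinking the vector space."""
    best = {}
    for doc_weights in weights:
        for f, v in doc_weights.items():
            best[f] = max(best.get(f, 0.0), v)
    keep = {f for f, v in best.items() if v >= min_weight}
    pruned = [{f: v for f, v in w.items() if f in keep} for w in weights]
    return pruned, keep
```

Calling `tfidf` on a list of review strings and passing the result to `prune_weak` yields a smaller feature set that could then be fed to a topic model; the threshold would need tuning on the actual corpus.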
- Source
- International Journal of Intelligent Engineering and Systems, Vol-13, No. 4, pp. 333-345.
- Date
- 2020-01-01
- Publisher
- Intelligent Network and Systems Society
- Subject
- Feature engineering; Hierarchical Dirichlet process; Latent Dirichlet allocation; Latent semantic analysis; Sentiment analysis; Stochastic variational inference; Topic modeling; Vector space model
- Coverage
- Kokatnoor S.A., Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, India; Krishnan B., Department of Computer Science and Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Bangalore, India
- Rights
- All Open Access; Bronze Open Access
- Relation
- ISSN: 2185310X
- Format
- Online
- Language
- English
- Type
- Article
Citation
Kokatnoor S.A.; Krishnan B., “A two-stepped feature engineering process for topic modeling using batchwise LDA with stochastic variational inference model,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 24, 2025, https://archives.christuniversity.in/items/show/16404.