Tri-projection gated cross-modal fusion for robust multilingual emotion recognition
- Title
- Tri-projection gated cross-modal fusion for robust multilingual emotion recognition
- Creator
- Nair, Suja C.; S., Asha; K.V., Priya; Shaji, Kavya Clare P.; C., Balakrishnan; Kokilavani, T.
- Description
- Existing multimodal approaches to emotion recognition (ER) rely on static or pairwise fusion strategies. Such systems do not adequately address the challenges of real-world conversational settings, which demand resilience both to multilingual code-switching and to the variable reliability of individual modalities. We propose a transformer-based tri-modal emotion recognition framework built around a novel Tri-Projection Gated Cross-Modal Fusion (T-GCMF) module, the first multimodal emotion recognition architecture explicitly designed for code-switched conversational input. T-GCMF models tri-modal interactions by explicitly computing modality-specific confidence and cross-modal consistency, which allows unreliable modalities to be dynamically suppressed during inference. Acoustic and visual features are extracted with CNN-LSTM and deep CNN encoders, respectively, and textual representations are generated with XLM-RoBERTa to handle code-switched language reliably. We also introduce Hinglish-MELD, the first multimodal emotion recognition dataset with aligned text, audio, and visual streams containing code-switched conversational content, filling a critical gap in the literature. With an accuracy of 88.3% and an F1-score of 87.0, the proposed confidence-aware fusion technique substantially outperforms unimodal, monolingual, and non-gated multimodal baselines. These findings establish T-GCMF as an effective approach to emotion recognition in linguistically heterogeneous, real-world interactive systems and underscore the importance of confidence-driven tri-modal integration.
- Source
- Knowledge-Based Systems, Volume 344, Article No. 116079
- Date
- 01-01-2026
- Publisher
- Elsevier B.V.
- Subject
- Affective computing; Code-switched emotion recognition; Confidence-aware multimodal fusion; Cross-modal modeling; Multilingual analysis
- Coverage
- Nair S.C., Department of Computer Science and Engineering, Muthoot Institute of Technology and Science, Kochi, India; S. A., Department of Computer Science and Engineering, Sahrdaya College of Engineering and Technology, Thrissur, India; K.V. P., Department of Computer Science and Engineering, School of Engineering and Sciences, SRM University-AP, Andhra Pradesh, Amaravati, 522240, India; Shaji K.C.P., Department of Computer Science and Engineering, Sahrdaya College of Engineering and Technology, Thrissur, India; C. B., Department of Computer Science, CHRIST University, Bengaluru, India; Kokilavani T., Department of Computer Science, CHRIST University, Bengaluru, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISSN: 0950-7051; CODEN: KNSYE
- Format
- online
- Language
- English
- Type
- Article
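The Description outlines T-GCMF only in terms of modality-specific confidence, cross-modal consistency, and dynamic suppression of unreliable modalities; the record does not give the module's equations. The following is a minimal, hypothetical Python sketch of that general idea, not the authors' implementation. It assumes the three modality vectors have already been projected into a shared dimension, scores confidence with a hypothetical sigmoid linear probe (`conf_weights`, `conf_bias`), measures consistency as mean cosine agreement with the other modalities, and zeroes any modality whose normalized gate falls below an assumed threshold `min_gate`.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity; a zero vector is treated as showing no agreement.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def gated_trimodal_fusion(
    feats: Dict[str, Vector],
    conf_weights: Dict[str, Vector],
    conf_bias: Dict[str, float],
    min_gate: float = 0.05,
) -> Tuple[Dict[str, float], Vector]:
    """Hypothetical confidence- and consistency-gated fusion (a sketch).

    `feats` maps each modality name to a vector assumed to be already
    projected into a shared d-dimensional space.
    """
    names = list(feats)
    raw: Dict[str, float] = {}
    for m in names:
        # Modality-specific confidence from a hypothetical linear probe.
        conf = sigmoid(dot(conf_weights[m], feats[m]) + conf_bias[m])
        # Cross-modal consistency: mean cosine agreement with the other
        # modalities, rescaled from [-1, 1] to [0, 1].
        others = [n for n in names if n != m]
        agree = sum(cosine(feats[m], feats[n]) for n in others) / len(others)
        raw[m] = conf * (agree + 1.0) / 2.0

    # Normalize the raw scores into gates that sum to 1.
    total = sum(raw.values())
    gates = {m: (raw[m] / total if total > 0 else 1.0 / len(names)) for m in names}

    # Dynamic suppression: drop modalities whose gate is below min_gate,
    # always keeping at least the strongest one, then renormalize.
    kept = {m: g for m, g in gates.items() if g >= min_gate}
    if not kept:
        best = max(gates, key=gates.get)
        kept = {best: gates[best]}
    kept_total = sum(kept.values())
    final = {m: kept.get(m, 0.0) / kept_total for m in names}

    # Fused representation: gate-weighted sum of the modality vectors.
    dim = len(feats[names[0]])
    fused = [sum(final[m] * feats[m][i] for m in names) for i in range(dim)]
    return final, fused
```

Multiplying confidence by consistency down-weights a modality either when its own probe is unsure or when it disagrees with the other two. In a trained model the probe parameters would presumably be learned; here they are passed in by hand purely for illustration.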
Citation
Nair, Suja C.; S., Asha; K.V., Priya; Shaji, Kavya Clare P.; C., Balakrishnan; Kokilavani, T., “Tri-projection gated cross-modal fusion for robust multilingual emotion recognition,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 18, 2026, https://archives.christuniversity.in/items/show/22379.
