Tri-projection gated cross-modal fusion for robust multilingual emotion recognition

Title: Tri-projection gated cross-modal fusion for robust multilingual emotion recognition
Creator: Nair, Suja C.; S., Asha; K.V., Priya; Shaji, Kavya Clare P.; C., Balakrishnan; Kokilavani, T.
Description: Existing multimodal approaches to emotion recognition (ER) rely on static or pairwise fusion strategies. Such systems do not adequately address the challenges of real-world conversational settings, which require resilience to both multilingual code-switching and the variable reliability of individual modalities. We propose a transformer-based tri-modal emotion recognition framework with a novel Tri-Projection Gated Cross-Modal Fusion (T-GCMF) module, the first multimodal emotion recognition architecture explicitly designed for code-switched conversational input. T-GCMF models tri-modal interactions by explicitly computing modality-specific confidence and cross-modal consistency, which allows unreliable modalities to be dynamically suppressed during inference. Acoustic and visual cues are extracted using CNN-LSTM and deep CNN encoders, respectively. Textual representations are generated using XLM-RoBERTa to handle code-switched language reliably. We introduce Hinglish-MELD, the first multimodal emotion recognition dataset with aligned text, audio, and visual streams containing code-switched conversational content, filling a critical gap in the literature. With an accuracy of 88.3% and an F1-score of 87.0, the proposed confidence-aware fusion technique substantially outperforms unimodal, monolingual, and non-gated multimodal baselines. These findings establish T-GCMF as an effective approach to emotion recognition in linguistically heterogeneous, real-world interactive systems and underscore the importance of confidence-driven tri-modal integration.
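The abstract states only that T-GCMF computes modality-specific confidence and cross-modal consistency and uses them to suppress unreliable modalities; it does not give the equations. The following is a minimal hypothetical sketch of that idea in Python with NumPy, not the authors' implementation. Assumptions: "tri-projection" is read as one linear projection per modality into a shared space; confidence is a sigmoid score per modality; consistency is the mean cosine similarity with the other two projected modalities, rescaled to [0, 1]; and the gate is their product, normalized across modalities. All function and parameter names (gated_trimodal_fusion, proj, conf_w, conf_b) are illustrative, and in a trained model the projections and scoring vectors would be learned from encoder outputs (e.g. XLM-RoBERTa, CNN-LSTM, CNN features) rather than fixed by hand.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors (eps avoids division by zero)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def gated_trimodal_fusion(feats, proj, conf_w, conf_b):
    """Hypothetical confidence- and consistency-gated fusion of three modalities.

    feats:  modality name -> 1-D encoder output (dimension may differ per modality)
    proj:   modality name -> (d, dim_m) matrix projecting into a shared d-dim space
    conf_w: modality name -> (d,) vector scoring that modality's reliability
    conf_b: modality name -> scalar bias for the reliability score
    Returns the fused (d,) vector and the per-modality gate weights.
    """
    mods = list(feats)
    # Step 1: one projection per modality into the shared space (assumed "tri-projection").
    z = {m: proj[m] @ feats[m] for m in mods}
    # Step 2: modality-specific confidence in (0, 1).
    conf = {m: sigmoid(conf_w[m] @ z[m] + conf_b[m]) for m in mods}
    # Step 3: cross-modal consistency = mean cosine similarity to the other
    # modalities, rescaled from [-1, 1] to [0, 1].
    cons = {}
    for m in mods:
        sims = [cosine(z[m], z[o]) for o in mods if o != m]
        cons[m] = (np.mean(sims) + 1.0) / 2.0
    # Step 4: gate = confidence * consistency, normalized to sum to one, so a
    # modality that is unconfident or disagrees with the others is down-weighted.
    raw = np.array([conf[m] * cons[m] for m in mods])
    gates = raw / (raw.sum() + 1e-8)
    fused = sum(g * z[m] for g, m in zip(gates, mods))
    return fused, dict(zip(mods, gates))


# Toy usage (hypothetical data): text and audio agree, the visual stream contradicts them.
d = 4
feats = {
    "text": np.array([1.0, 1.0, 0.0, 0.0]),
    "audio": np.array([1.0, 0.9, 0.1, 0.0]),
    "visual": np.array([-1.0, -1.0, 0.0, 0.0]),
}
proj = {m: np.eye(d) for m in feats}        # identity projections for the demo
conf_w = {m: np.zeros(d) for m in feats}    # equal confidence: sigmoid(0) = 0.5
conf_b = {m: 0.0 for m in feats}

fused, gates = gated_trimodal_fusion(feats, proj, conf_w, conf_b)
print({m: round(float(g), 3) for m, g in gates.items()})
# → {'text': 0.499, 'audio': 0.5, 'visual': 0.001}
```

In this toy input the visual vector contradicts the text and audio vectors, so its consistency score, and therefore its gate weight, falls close to zero, which mirrors the dynamic suppression the abstract describes. Multiplying confidence by consistency is only one plausible choice; a softmax over summed scores or a small learned gating network would be equally reasonable assumptions.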
Source: Knowledge-Based Systems, Volume 344, Article No. 116079
Date: 01-01-2026
Publisher: Elsevier B.V.
Subject: Affective computing; Code-switched emotion recognition; Confidence-aware multimodal fusion; Cross-modal modeling; Multilingual analysis
Coverage: Nair S.C., Department of Computer Science and Engineering, Muthoot Institute of Technology and Science, Kochi, India; S. A., Department of Computer Science and Engineering, Sahrdaya College of Engineering and Technology, Thrissur, India; K.V. P., Department of Computer Science and Engineering, School of Engineering and Sciences, SRM University-AP, Andhra Pradesh, Amaravati, 522240, India; Shaji K.C.P., Department of Computer Science and Engineering, Sahrdaya College of Engineering and Technology, Thrissur, India; C. B., Department of Computer Science, CHRIST University, Bengaluru, India; Kokilavani T., Department of Computer Science, CHRIST University, Bengaluru, India
Rights: Restricted Access; Hardcopy may be available in the library
Relation: ISSN: 0950-7051; CODEN: KNSYE
Format: online
Language: English
Type: Article
Identifier: https://doi.org/10.1016/j.knosys.2026.116079

https://www.scopus.com/pages/publications/105037454999?origin=resultslist

Citation

Nair, Suja C.; S., Asha; K.V., Priya; Shaji, Kavya Clare P.; C., Balakrishnan; Kokilavani, T., “Tri-projection gated cross-modal fusion for robust multilingual emotion recognition,” CHRIST (Deemed To Be University) Institutional Repository, accessed August 2, 2026, https://archives.christuniversity.in/items/show/22379.
