Efficient Scene Text Recognition in Noisy Environments Using Fusion-Based Adaptation and Triple-Level Confidence Modeling

Title: Efficient Scene Text Recognition in Noisy Environments Using Fusion-Based Adaptation and Triple-Level Confidence Modeling
Creator: Binjumah, Weam M.; Veerasamy, Bala Dhandayuthapani; Kavitha, S.; Mishra, Suchi; Pandit, Shraddha Viraj; Ameerbakhsh, Omair
Description: Scene Text Recognition (STR) involves deciphering textual content embedded within complex, natural scene images, often following detection stages or integrated into end-to-end pipelines. Addressing the challenge of STR in noisy target domains, characterized by inter-domain and intra-domain noise, cluttered backgrounds, and irregular text shapes, this study proposes a robust and understandable framework titled Fusion-Based Adaptation for Scene Text Recognition (FASTR). The framework integrates a primary classifier with an epistemically aware auxiliary classifier to model uncertainty, supported by a novel Adaptive Scale Feature Module (ASFM) that enhances localisation through pixel-level mask prediction and multi-scale fusion. A Triple-Level Confidence (TLC) strategy, categorized into high, medium, and low consistency thresholds, is introduced to enforce consistency loss and improve generalisation across domains. Additionally, a pseudo-labelling scheme refines the adaptation process through self-training under structured domain noise. FASTR is trained and evaluated on both synthetic (SynthText, MJSynth) and real-world (ICDAR 2013, SVT, and IIIT5K) datasets. It achieves a word recognition accuracy of 92.4% on IIIT5K, 89.7% on SVT, and 93.1% on ICDAR 2013, outperforming state-of-the-art baselines by an average margin of 2.8%. On cross-domain benchmarks with added noise, FASTR maintains high performance, achieving an average F1-score of 90.5%, with precision and recall values of 91.2% and 89.9%, respectively. Hyperparameters, training configurations, and evaluation metrics are transparently documented to ensure reproducibility. The findings demonstrate superior scale robustness, effective domain adaptation, and resilience to cluttered backgrounds, with explainability preserved through interpretable confidence maps and visual cues. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2025.
Source: SN Computer Science; Volume 6; Issue 8; Article No. 946
Date: 01-01-2025
Publisher: Springer
Subject: Adaptive scale feature module (ASFM); Domain adaptation; Noise-robust recognition; Scene text recognition (STR); Triple-level confidence (TLC)
Coverage: Binjumah W.M., Applied College, Taibah University, Madina, 42353, Saudi Arabia; Veerasamy B.D., Department of Computing and Information Sciences, University of Technology and Applied Sciences, Shinas Campus, Shinas, Oman; Kavitha S., Department of Computer Science, Christ University, Karnataka, Bengaluru, 560029, India; Mishra S., Department of Electronics Engineering, Samrat Ashok Technological Institute, Madhya Pradesh, Vidisha, 464001, India; Pandit S.V., PES Modern College of Engineering, Shivajinagar, Maharashtra, Pune, 411005, India; Ameerbakhsh O., College of Computer Science and Engineering, Information Systems Department, Taibah University, Medina, Saudi Arabia
Rights: Restricted Access; Hardcopy may be available in the library
Relation: ISSN: 2662-995X
Format: online
Language: English
Type: Article
Identifier: https://doi.org/10.1007/s42979-025-04489-x

https://www.scopus.com/pages/publications/105021120662?origin=resultslist

Collection

Citation

Binjumah, Weam M.; Veerasamy, Bala Dhandayuthapani; Kavitha, S.; Mishra, Suchi; Pandit, Shraddha Viraj; Ameerbakhsh, Omair, “Efficient Scene Text Recognition in Noisy Environments Using Fusion-Based Adaptation and Triple-Level Confidence Modeling,” CHRIST (Deemed To Be University) Institutional Repository, accessed July 8, 2026, https://archives.christuniversity.in/items/show/22141.
