Multi-stage spatial temporal ensemble model with integrated learning methods for robust deepfake detection
- Title
- Multi-stage spatial temporal ensemble model with integrated learning methods for robust deepfake detection
- Creator
- Yassin, Warusia; Abdollah, Mohd Faizal; Ismail, Anuar; Kamis, Noor Hisham; Razak, Siti Fatimah Abdul; Joy, Helen K.
- Description
- Deepfake detection remains a significant challenge as modern generative models increasingly minimize visible artefacts, and many existing approaches rely solely on either spatial or temporal cues, which limits their robustness and generalization. Many existing hybrid approaches integrate mature learning models in linear or stacked pipelines, which often suffer from error propagation, reduced interpretability, and suboptimal generalization. Unlike prior hybrid approaches that primarily stack spatial-temporal learners, the proposed multi-stage hybrid Integrated Learning Method (ILM) introduces a validation-aware dual-detection mechanism, an independent dual-path spatial-temporal learning design, and a decision-level nonlinear ensemble fusion strategy, explicitly mitigating face mislocalization, temporal dilution, and false-positive propagation observed in existing deepfake detection pipelines. The ILM framework structurally coordinates facial region localization and validation using YOLOv5 and Haar Cascade, deep spatial feature extraction using ResNet-50, frame-level spatial classification via LightGBM, and temporal sequence modeling using LSTM networks. The outputs from the spatial and temporal pathways are subsequently fused using a Random Forest classifier, enabling nonlinear aggregation of complementary evidence while preserving interpretability. Experimental results on the FaceForensics++ and Celeb-DF (v2) benchmark datasets show that ILM achieves 98.30% accuracy, 97.90% precision, and 98.70% recall, outperforming recent state-of-the-art CNN-LSTM, ViT-based, and CNN-Transformer models by 16%. Ablation studies confirm that each module contributes incrementally to performance stability and false-positive reduction, demonstrating the importance of ILM's multi-stage architecture rather than the individual algorithms alone.
Overall, ILM provides a modular, accurate, and computationally efficient solution suitable for deployment in digital forensics, media authentication, and AI governance. Future work will extend ILM with transformer-based global encoders and explainable AI techniques to further improve interpretability and robustness against emerging deepfake models. © The Author(s) 2026.
- Source
- Discover Computing; Volume 29; Issue 1; Article No. 291
- Date
- 01-01-2026
- Publisher
- Springer Science and Business Media B.V.
- Subject
- Deepfake detection; Ensemble learning; Integrated learning; LightGBM; LSTM; Multi-stage framework; ResNet-50; Spatial-temporal features; Video forensics; YOLOv5
- Coverage
- Yassin W., Faculty of Artificial Intelligence and Cyber Security, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia; Abdollah M.F., Faculty of Artificial Intelligence and Cyber Security, Universiti Teknikal Malaysia Melaka, Melaka, Malaysia; Ismail A., Ask-Pentest Sdn Bhd, Kuala Lumpur, Malaysia; Kamis N.H., Faculty of Information Science and Technology, Multimedia University, Melaka, Malaysia; Razak S.F.A., Faculty of Information Science and Technology, Multimedia University, Melaka, Malaysia; Joy H.K., Department of Computer Science, Christ University, Bangalore, India
- Rights
- All Open Access; Gold Open Access; Green Open Access
- Relation
- ISSN: 2948-2992
- Format
- online
- Language
- English
- Type
- Article
Citation
Yassin, Warusia; Abdollah, Mohd Faizal; Ismail, Anuar; Kamis, Noor Hisham; Razak, Siti Fatimah Abdul; Joy, Helen K., “Multi-stage spatial temporal ensemble model with integrated learning methods for robust deepfake detection,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 19, 2026, https://archives.christuniversity.in/items/show/21896.
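
Note on the reported metrics
The Description field reports accuracy, precision, and recall. As a reading aid, the sketch below shows the standard definitions of those three metrics, computed from confusion counts with "fake" treated as the positive class. The class name, the function names, and the counts are hypothetical and chosen only for illustration; they are not taken from the article and do not reproduce its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Detection outcomes, treating 'fake' as the positive class."""
    tp: int  # fake samples flagged as fake
    fp: int  # real samples flagged as fake
    tn: int  # real samples passed as real
    fn: int  # fake samples passed as real


def accuracy(c: ConfusionCounts) -> float:
    """Share of all samples that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of samples flagged as fake that really were fake."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c: ConfusionCounts) -> float:
    """Share of fake samples that the detector caught."""
    fakes = c.tp + c.fn
    return c.tp / fakes if fakes else 0.0


# Hypothetical counts for illustration only; not data from the article.
counts = ConfusionCounts(tp=90, fp=10, tn=80, fn=20)

if __name__ == "__main__":
    print(f"accuracy:  {accuracy(counts):.4f}")
    print(f"precision: {precision(counts):.4f}")
    print(f"recall:    {recall(counts):.4f}")
```

Under these definitions, a lower false-positive count raises precision and a lower false-negative count raises recall, so the two figures reported in the abstract describe different kinds of error.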
