A robust explainable machine learning pipeline for transformer health index prediction addressing data pathologies and redundancy

Title: A robust explainable machine learning pipeline for transformer health index prediction addressing data pathologies and redundancy
Creator: Chandramohan, J.; Karthick, K.; K, Aruna S; Ponkumar, G.
Description: Power transformers are critical infrastructure assets where unexpected failures incur severe technical and economic penalties. This study proposes a robust, explainable machine-learning (ML) pipeline for predicting the transformer Health Index (HI) using routinely collected dissolved gas analysis (DGA) and dielectric measurements. To ensure model reliability, the pipeline specifically addresses data pathologies, namely extreme skewness and heavy tails, using Yeo–Johnson transformations, while mitigating multicollinearity through hierarchical correlation clustering (|r| ≥ 0.85) followed by Variance Inflation Factor (VIF) screening (VIF ≤ 5). Four high-performance ensembles (Random Forest, XGBoost, LightGBM, and CatBoost) were optimized via randomized cross-validation. Experimental results on a dataset of 470 records demonstrate consistent generalization across all models (RMSE ≈ 0.022), with Random Forest providing superior accuracy (MAPE ≈ 1.24%). A Taylor diagram confirmed consistent generalization (correlation ≈ 0.73–0.78 and matched variance), while residual analysis showed minimal bias. SHAP explanations indicated that dibenzyl disulfide (DBDS) and interfacial tension (Interfacial V) were the most influential positive drivers of HI; water content tended to depress HI; and several gases (e.g., methane, hydrogen, acetylene, CO) contributed positively at higher concentrations. The proposed workflow was robust to skew/heavy tails and multicollinearity, required no feature scaling, and produced transparent, practitioner-ready insights that support condition-based maintenance at fleet scale. © 2026 Elsevier B.V.
Source: Electric Power Systems Research, Volume 259, Article No. 113275
Date: 01-01-2026
Publisher: Elsevier Ltd
Subject: Dissolved gas analysis; Health index prediction; Machine learning; Multicollinearity; Power transformer
Coverage: Chandramohan J., Department of Electrical and Electronics Engineering, Gnanamani College of Technology, Pachal, Namakkal, 637018, India; Karthick K., Department of Electrical and Electronics Engineering, GMR Institute of Technology (GMRIT) (Deemed to be University), Andhra Pradesh, Rajam, 532127, India; K A.S., Department of AI and Data Science Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Kengeri Campus, Bangalore, 560074, India; Ponkumar G., Department of Electrical and Electronics Engineering, Panimalar Engineering College, Chennai, 600123, India
Rights: Restricted Access; Hardcopy may be available in the library
Relation: ISSN: 0378-7796; CODEN: EPSRD
Format: online
Language: English
Type: Article
Identifier: https://doi.org/10.1016/j.epsr.2026.113275

https://www.scopus.com/pages/publications/105038222544?origin=resultslist
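The description above names the redundancy-handling stages only in words. The following is a minimal sketch, assuming NumPy, of how Yeo–Johnson transformation, correlation clustering at |r| ≥ 0.85 and VIF screening at 5 could be chained; it is not the authors' implementation. Several choices are assumptions not stated in this record: the transform is applied before clustering, clustering uses single linkage (connected components of the |r| ≥ 0.85 graph), each cluster keeps the feature most correlated with HI, VIF screening is backward elimination, and λ is chosen by grid search over [-3, 3]. All function names (`yeo_johnson`, `fit_yeo_johnson`, `correlation_cluster_prune`, `vif`, `vif_screen`, `preprocess`) and the synthetic data are hypothetical. The ensemble training and SHAP stages are not shown.

```python
# Hypothetical sketch of skew handling + redundancy pruning; not the authors' code.
import numpy as np


def yeo_johnson(x, lmbda):
    """Yeo-Johnson power transform of a 1-D array for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    if abs(lmbda) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:  # lambda == 0 limit for non-negative values
        out[pos] = np.log1p(x[pos])
    if abs(lmbda - 2.0) > 1e-12:
        out[~pos] = -((1.0 - x[~pos]) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)
    else:  # lambda == 2 limit for negative values
        out[~pos] = -np.log1p(-x[~pos])
    return out


def fit_yeo_johnson(x, grid=np.linspace(-3.0, 3.0, 601)):
    """Pick lambda by maximising the Yeo-Johnson profile log-likelihood over a grid (assumed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    log_jac = np.sum(np.sign(x) * np.log1p(np.abs(x)))  # multiplied by (lambda - 1) below
    best_lmbda, best_llf = 1.0, -np.inf  # lambda = 1 is the identity transform
    for lmbda in grid:
        var = yeo_johnson(x, lmbda).var()
        if not np.isfinite(var) or var <= 0:
            continue
        llf = -0.5 * n * np.log(var) + (lmbda - 1.0) * log_jac
        if llf > best_llf:
            best_lmbda, best_llf = lmbda, llf
    return best_lmbda


def correlation_cluster_prune(X, y, threshold=0.85):
    """Single-linkage grouping of features with |r| >= threshold (union-find);
    keep the member of each group most correlated with the target y (assumed rule)."""
    p = X.shape[1]
    r = np.abs(np.corrcoef(X, rowvar=False))
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if r[i, j] >= threshold:
                parent[find(i)] = find(j)
    target_r = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(p)]
    groups = {}
    for j in range(p):
        groups.setdefault(find(j), []).append(j)
    return sorted(max(g, key=lambda j: target_r[j]) for g in groups.values())


def vif(X):
    """VIF of each column: 1 / (1 - R^2) from an OLS fit (with intercept) on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        r2 = 1.0 - (X[:, j] - A @ coef).var() / X[:, j].var()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def vif_screen(X, cols, limit=5.0):
    """Backward elimination: drop the highest-VIF column until all VIFs are <= limit."""
    cols = list(cols)
    while len(cols) > 1:
        v = vif(X[:, cols])
        worst = int(np.argmax(v))
        if v[worst] <= limit:
            break
        cols.pop(worst)
    return cols


def preprocess(X, y):
    """Assumed order: per-column Yeo-Johnson -> correlation pruning -> VIF screen."""
    Xt = np.column_stack([yeo_johnson(col, fit_yeo_johnson(col)) for col in X.T])
    return Xt, vif_screen(Xt, correlation_cluster_prune(Xt, y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.lognormal(size=400)                      # skewed, heavy-tailed synthetic feature
    b = 1.02 * a + rng.normal(scale=0.01, size=400)  # near-duplicate of a
    c = rng.lognormal(size=400)
    hi = 0.5 * np.log1p(a) + 0.3 * np.log1p(c) + rng.normal(scale=0.05, size=400)
    Xt, kept = preprocess(np.column_stack([a, b, c]), hi)
    print("retained columns:", kept)
```

The grid search only keeps the sketch dependency-light; `scipy.stats.yeojohnson` or scikit-learn's `PowerTransformer(method="yeo-johnson")` estimate λ by maximum likelihood directly. Tree ensembles such as those listed in the record are insensitive to monotone rescaling, which is consistent with the record's note that no feature scaling was required.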

Citation

Chandramohan, J.; Karthick, K.; K, Aruna S; Ponkumar, G., “A robust explainable machine learning pipeline for transformer health index prediction addressing data pathologies and redundancy,” CHRIST (Deemed To Be University) Institutional Repository, accessed July 11, 2026, https://archives.christuniversity.in/items/show/22246.