A robust explainable machine learning pipeline for transformer health index prediction addressing data pathologies and redundancy
- Title
- A robust explainable machine learning pipeline for transformer health index prediction addressing data pathologies and redundancy
- Creator
- Chandramohan, J.; Karthick, K.; K, Aruna S; Ponkumar, G.
- Description
- Power transformers are critical infrastructure assets where unexpected failures incur severe technical and economic penalties. This study proposes a robust, explainable machine-learning (ML) pipeline for predicting the transformer Health Index (HI) using routinely collected dissolved gas analysis (DGA) and dielectric measurements. To ensure model reliability, the pipeline specifically addresses data pathologies, namely extreme skewness and heavy tails, using Yeo-Johnson transformations, while mitigating multicollinearity through hierarchical correlation clustering (|r| threshold of 0.85) followed by Variance Inflation Factor (VIF) screening (VIF threshold of 5). Four high-performance ensembles (Random Forest, XGBoost, LightGBM, and CatBoost) were optimized via randomized cross-validation. Experimental results on a dataset of 470 records demonstrate consistent generalization across all models (RMSE ≈ 0.022), with Random Forest providing superior accuracy (MAPE ≈ 1.24%). A Taylor diagram confirmed consistent generalization (correlation ≈ 0.73 to 0.78 and matched variance), while residual analysis showed minimal bias. SHAP explanations indicated that dibenzyl disulfide (DBDS) and interfacial tension (Interfacial V) were the most influential positive drivers of HI; water content tended to depress HI; and several gases (e.g., methane, hydrogen, acetylene, CO) contributed positively at higher concentrations. The proposed workflow was robust to skew/heavy tails and multicollinearity, required no feature scaling, and produced transparent, practitioner-ready insights that support condition-based maintenance at fleet scale. © 2026 Elsevier B.V.
- Source
- Electric Power Systems Research; Volume 259; Article No. 113275
- Date
- 01-01-2026
- Publisher
- Elsevier Ltd
- Subject
- Dissolved gas analysis; Health index prediction; Machine learning; Multicollinearity; Power transformer
- Coverage
- Chandramohan J., Department of Electrical and Electronics Engineering, Gnanamani College of Technology, Pachal, Namakkal, 637018, India; Karthick K., Department of Electrical and Electronics Engineering, GMR Institute of Technology (GMRIT) (Deemed to be University), Andhra Pradesh, Rajam, 532127, India; K A.S., Department of AI and Data Science Engineering, School of Engineering and Technology, CHRIST (Deemed to be University), Kengeri Campus, Bangalore, 560074, India; Ponkumar G., Department of Electrical and Electronics Engineering, Panimalar Engineering College, Chennai, 600123, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISSN: 0378-7796; CODEN: EPSRD
- Format
- online
- Language
- English
- Type
- Article
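
The preprocessing steps named in the Description (Yeo-Johnson transformation, then grouping of highly correlated features) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the fixed `lam` value, and the sample data are all hypothetical, and the grouping uses simple single-linkage merging at the |r| threshold as a stand-in for the hierarchical correlation clustering the record mentions. The VIF screening step is omitted.

```python
import math


def yeo_johnson(y, lam):
    """Yeo-Johnson power transform of one value for a fixed lambda."""
    if y >= 0:
        if abs(lam) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lam - 1.0) / lam
    if abs(lam - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -(((-y + 1.0) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


def skewness(values):
    """Population skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_groups(columns, threshold=0.85):
    """Group features whose pairwise |r| reaches the threshold.

    Assumption: single-linkage merging via union-find, a simplification
    of hierarchical correlation clustering.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())


# Hypothetical, made-up data: one heavy-tailed column and three features.
raw = [1.0, 2.0, 3.0, 4.0, 100.0]
transformed = [yeo_johnson(v, 0.0) for v in raw]  # lam fixed at 0 for illustration

features = {
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "feature_b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "feature_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}
groups = correlated_groups(features, threshold=0.85)
```

In practice lambda would be estimated per feature (for example by maximum likelihood) rather than fixed, and one representative per group would be kept before any VIF screening.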
Citation
Chandramohan, J.; Karthick, K.; K, Aruna S; Ponkumar, G., “A robust explainable machine learning pipeline for transformer health index prediction addressing data pathologies and redundancy,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 21, 2026, https://archives.christuniversity.in/items/show/22246.
