A shap-enhanced PCA-DBSCAN framework for interpretable retail customer segmentation and strategic insight
- Title
- A shap-enhanced PCA-DBSCAN framework for interpretable retail customer segmentation and strategic insight
- Creator
- Sucharitha, M. Martha; Kumar, J. P. Senthil; Sateesha, G.; Prasad, M. V. Ram; Basha, Md. Shaik Amzad
- Description
- The rapid expansion of online retail underscores the critical need for precise customer segmentation to drive personalized marketing, reduce churn, and boost lifetime value. This study develops an end-to-end, highly interpretable segmentation pipeline encompassing advanced feature engineering, dimensionality reduction, exhaustive hyperparameter tuning, and robust validation to reveal stable, actionable customer groups in a large, real-world UK online-retail dataset (541,909 records). We augment the classic RFM (Recency, Frequency, Monetary) framework with two feature groups: TPAC (TF-IDF embeddings of item descriptions, holiday-purchase flags, and exponential recency decay) and CACV (net monetary value and cancellation ratios). After outlier filtering on RFM scores, we apply PCA (230 dimensions) and compare ten clustering methods, selected to represent major algorithmic paradigms: centroid-based [K-Means], probabilistic [GMM], hierarchical [BIRCH, Agglomerative], density-based [DBSCAN, OPTICS, HDBSCAN], graph-based [Spectral], message-passing [Affinity Propagation], and mode-seeking [Mean Shift]. We perform a full grid search per algorithm using a 'safe' silhouette scorer (ignoring noise) and also report Davies-Bouldin and Calinski-Harabasz indices. Temporal stability is assessed via adjusted Rand indices across time splits, and cluster interpretability is enhanced through SHAP-based feature importance analyses. By integrating textual, temporal, and cancellation behaviors into segmentation, followed by systematic tuning and multi-metric validation, our pipeline delivers superior cluster quality and actionable business insights compared to prior work. Segments directly enable strategic interventions: 'High-Decay Loyalists' (precision = 0.92) receive VIP retention offers yielding 2231% ROI lift, while 'At-Risk Cancellers' (recall = 0.89) trigger targeted win-back campaigns. We also demonstrate a reproducible framework for selecting both model and feature set. DBSCAN (ε = 0.3, min_samples = 3 on 10 PCA components) achieved the best silhouette score (0.986), markedly exceeding the 0.72 benchmark in the literature. Agglomerative clustering (average linkage, 2 clusters) scored 0.776, while OPTICS and Spectral Clustering also outperformed classical Gaussian- or centroid-based models. A temporal ARI above 0.8 confirms cluster stability. © The Author(s), under exclusive licence to The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Lulea University of Technology, Sweden 2025.
- Source
- International Journal of System Assurance Engineering and Management; Volume 17; Issue 2; pp. 543-607
- Date
- 01-01-2026
- Publisher
- Springer
- Subject
- Customer segmentation; DBSCAN; Hyperparameter tuning; PCA; RFM + TPAC + CACV; SHAP interpretability; Silhouette score; Temporal stability
- Coverage
- Sucharitha M.M., Christ (Deemed to Be University), Bangalore, India; Kumar J.P.S., GITAM School of Business, Gandhi Institute of Technology and Management (Deemed to Be University), Bangalore, India; Sateesha G., International Institute of Business Study, Bangalore, India; Prasad M.V.R., GITAM School of Business, Gandhi Institute of Technology and Management (Deemed to Be University), Bangalore, India; Basha M.S.A., GITAM School of Business, Gandhi Institute of Technology and Management (Deemed to Be University), Bangalore, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISSN: 0975-6809
- Format
- online
- Language
- English
- Type
- Article
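The description mentions a 'safe' silhouette scorer that ignores noise but does not define it. The following is a minimal sketch of one plausible reading, not the authors' code. It assumes noise points carry the label -1 (the convention used by common DBSCAN implementations) and that a labelling with fewer than two clusters receives a fallback score. The function name `safe_silhouette`, the `fallback` parameter, and the toy data are illustrative assumptions.

```python
import math
from collections import defaultdict

NOISE = -1  # assumed noise label, as produced by common DBSCAN implementations


def safe_silhouette(points, labels, fallback=-1.0):
    """Mean silhouette over non-noise points; `fallback` when it is undefined."""
    # Drop noise points before scoring.
    kept = [(p, lab) for p, lab in zip(points, labels) if lab != NOISE]
    clusters = defaultdict(list)
    for p, lab in kept:
        clusters[lab].append(p)
    # The silhouette is only defined for two or more clusters.
    if len(clusters) < 2:
        return fallback

    total = 0.0
    for p, lab in kept:
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the point's own cluster
        # (the distance from p to itself is 0, so dividing by len - 1 is correct)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(kept)


# Toy data: two tight groups plus one point flagged as noise.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0), (50.0, 50.0)]
labels = [0, 0, 1, 1, NOISE]
score = safe_silhouette(points, labels)
single_cluster_score = safe_silhouette(points, [0, 0, 0, 0, NOISE])
```

Returning a fallback instead of raising lets a grid search keep running when a parameter setting collapses everything into one cluster or into noise. Note that excluding noise can inflate the score when most points are discarded, so the share of points labelled as noise is worth reporting alongside it.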
Collection
Citation
Sucharitha, M. Martha; Kumar, J. P. Senthil; Sateesha, G.; Prasad, M. V. Ram; Basha, Md. Shaik Amzad, “A shap-enhanced PCA-DBSCAN framework for interpretable retail customer segmentation and strategic insight,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 19, 2026, https://archives.christuniversity.in/items/show/22031.
