Balancing the Cart: Evaluating Imbalance-Aware Machine-Learning Pipelines for Predicting E-Commerce Purchases
- Title
- Balancing the Cart: Evaluating Imbalance-Aware Machine-Learning Pipelines for Predicting E-Commerce Purchases
- Creator
- Mishra, Anchal; Chandan; Shetty, Chetan; Malhotra, Amit; Maheshwari, Abhishek; Basha, Md Shaik Amzad
- Description
- We present a comprehensive investigation into predicting purchase conversions in e-commerce sessions, addressing the challenges of severe class imbalance and complex user behavior signals. Using a real-world dataset of 12,330 user sessions described by 24 features (interaction counts, durations, bounce/exit rates, page values, temporal and device metadata), we first conduct exploratory analysis to reveal seasonal peaks in conversion and strong correlations between page value metrics and purchase likelihood. To mitigate the low positive-class rate (10.8%), we embed SMOTE oversampling within our training pipelines, ensuring balanced learning for all classifiers. We then perform a head-to-head comparison of twelve algorithms, ranging from linear and generative methods (Logistic Regression, LDA, Gaussian NB), instance-based learners (KNN, SVM), bagging ensembles (Random Forest, Extra Trees, AdaBoost), gradient boosters (XGBoost, LightGBM, CatBoost), to a feed-forward neural network (MLP). Evaluation on a stratified 80/20 holdout set uses overall accuracy plus precision, recall, and F1-score for the purchase class, alongside ROC AUC. Our results demonstrate that ensemble tree methods dramatically outperform simpler models: LightGBM achieves the highest F1 (0.694) and ROC AUC (0.924), with Extra Trees closely following (F1 0.678, AUC 0.926). Simpler classifiers, despite SMOTE, lag markedly in recall and F1, underscoring the importance of powerful nonlinear learners. These findings establish a new benchmark for imbalance-aware conversion prediction and recommend SMOTE-augmented gradient boosting and randomized tree ensembles as the methods of choice for future research and practical deployments. © 2025 IEEE.
- Source
- 2025 IEEE 6th Global Conference for Advancement in Technology, GCAT 2025;
- Date
- 01-01-2025
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Subject
- E-Commerce Conversion Prediction; Ensemble Learning; Gradient Boosting (LightGBM); Imbalanced Classification; Purchase Intent Modeling; Randomized Trees (Extra Trees); SMOTE Oversampling
- Coverage
- Mishra A., Institute of Management Studies, Ghaziabad, India; Chandan, Symbiosis International (Deemed University), Symbiosis Centre for Management Studies, Bengaluru, India; Shetty C., Dayanand Sagar College of Arts, Science and Commerce, Bengaluru, India; Malhotra A., School of Commerce, Finance and Accountancy, Christ (Deemed to be University), Ghaziabad, India; Maheshwari A., School of Commerce, Finance and Accountancy, Christ (Deemed to be University), Ghaziabad, India; Basha M.S.A., Gandhi Institute of Technology and Management (Deemed to be University), Gitam School of Business, Bangalore, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISBN: 979-833151458-7;
- Format
- online
- Language
- English
- Type
- Conference paper
Collection
Citation
Mishra, Anchal; Chandan; Shetty, Chetan; Malhotra, Amit; Maheshwari, Abhishek; Basha, Md Shaik Amzad, “Balancing the Cart: Evaluating Imbalance-Aware Machine-Learning Pipelines for Predicting E-Commerce Purchases,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 18, 2026, https://archives.christuniversity.in/items/show/25841.
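
Illustrative note on the abstract: the description mentions SMOTE oversampling embedded in the training pipeline and evaluation on a stratified 80/20 holdout. The record does not say which implementation the authors used, so the following is only a minimal standard-library sketch of the general idea, not the paper's code. The toy data, the two features, and the helper names `smote` and `stratified_split` are all hypothetical; only the 10.8% positive rate and the 80/20 split come from the abstract. The point it shows is that synthetic minority rows are generated from the training split alone, so the holdout keeps its original class ratio.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic rows, each interpolated between a minority
    row and one of its k nearest minority neighbours (basic SMOTE idea).
    Assumes at least two minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def stratified_split(y, test_frac=0.2, seed=0):
    """Split row indices so each class keeps roughly the same share
    in the training and test parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        cut = round(len(idx) * test_frac)
        test_idx += idx[:cut]
        train_idx += idx[cut:]
    return train_idx, test_idx


# Hypothetical toy data: 1000 sessions, 108 purchases (10.8% positives).
data_rng = random.Random(42)
X, y = [], []
for i in range(1000):
    label = 1 if i < 108 else 0
    centre = 2.0 if label else 0.0
    X.append([data_rng.gauss(centre, 1.0), data_rng.gauss(centre, 1.0)])
    y.append(label)

train_idx, test_idx = stratified_split(y)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
y_test = [y[i] for i in test_idx]

# Oversample the purchase class using training rows only.
minority = [row for row, label in zip(X_train, y_train) if label == 1]
n_majority = sum(1 for label in y_train if label == 0)
new_rows = smote(minority, n_majority - len(minority))

X_balanced = X_train + new_rows
y_balanced = y_train + [1] * len(new_rows)

print("train positives before:", sum(y_train), "after:", sum(y_balanced))
print("test positives (untouched):", sum(y_test), "of", len(y_test))
```

Any classifier would then be fitted on `X_balanced` and `y_balanced` and scored on the untouched test rows. Oversampling before the split would let synthetic points derived from test sessions leak into training, which is the usual reason for keeping the resampling step inside the training pipeline.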
