Data-Driven Drug Discovery Optimization for Breast Cancer Using Interpretable Machine Learning Models
- Title
- Data-Driven Drug Discovery Optimization for Breast Cancer Using Interpretable Machine Learning Models
- Creator
- Banerjee, Dyuti; Krishnan, Sivaneasan Bala; Upreti, Kamal; Tharewal, Sumegh Shrikant; Shankar, Uma; Kshirsagar, Pravin; Kumar, Manoj
- Description
- Breast cancer remains one of the most prevalent malignancies worldwide, posing significant therapeutic challenges due to tumor heterogeneity and drug resistance. This study presents a reproducible, data-driven machine learning protocol for predicting drug sensitivity in breast cancer cell lines, with the dual objective of identifying potent single agents and synergistic drug combinations. Using curated datasets from the Genomics of Drug Sensitivity in Cancer (GDSC), two predictive approaches were implemented: a standalone XGBoost regressor and a hybrid Autoencoder-XGBoost pipeline. Preprocessing included label encoding, one-hot encoding, Z-score standardization, missing value imputation, and dimensionality reduction via PCA. Model evaluation demonstrated that XGBoost achieved superior performance (MSE = 1.3789, R² = 0.8145) compared to the hybrid model (MSE = 4.0322, R² = 0.4577). Interpretability was addressed using SHapley Additive exPlanations (SHAP), which identified TARGET_PATHWAY, DRUG_ID, TARGET, and CELL_LINE_NAME as key predictive features, aligning with established pharmacological mechanisms. Predicted synergy scores, derived from combining model outputs with DrugComb and SynergyDB data, highlighted promising drug pairs such as Bortezomib + Romidepsin and Paclitaxel + Bortezomib. These findings were further supported by PCA-based pharmacological clustering, revealing biologically relevant groupings of drugs with similar mechanisms of action. The proposed protocol provides a transparent and adaptable framework for precision oncology research, enabling both predictive accuracy and biological interpretability. By integrating rigorous preprocessing, model validation, explainability, and drug synergy analysis, this workflow offers a scalable foundation for translational drug discovery and repurposing in breast cancer treatment. © 2025 JoVE Journal of Visualized Experiments.
- Source
- Journal of Visualized Experiments, Volume 2025, Issue 223
- Date
- 01-01-2025
- Publisher
- MyJoVE Corporation
- Coverage
- Banerjee D., Koneru Lakshmaiah Education Foundation, India; Krishnan S.B., Singapore Institute of Technology, Singapore; Upreti K., Christ University, India; Tharewal S.S., DBS Global University, India; Shankar U., Qaiwan International University, Iraq; Kshirsagar P., J D College of Engineering & Management, India; Kumar M., Gurukula Kangri University, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISSN: 1940-087X
- Format
- online
- Language
- English
- Type
- Article
Citation
Banerjee, Dyuti; Krishnan, Sivaneasan Bala; Upreti, Kamal; Tharewal, Sumegh Shrikant; Shankar, Uma; Kshirsagar, Pravin; Kumar, Manoj, “Data-Driven Drug Discovery Optimization for Breast Cancer Using Interpretable Machine Learning Models,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 18, 2026, https://archives.christuniversity.in/items/show/23570.
