Browse Items (14421 total)
-
Stride Insights: AI-Powered Field Position Forecasting System
This research develops an AI-based predictive model that uses a variety of machine learning algorithms to predict the top five finishers in a race. Horse racing presents a challenging dataset against which race outcomes can be predicted, because numerous variables (horse performance metrics, race conditions, and the jockey) together decide the outcome of a race. To tackle this complexity, we test several algorithms that can incorporate both categorical and continuous data, including CatBoost, Random Forest, k-Nearest Neighbors, Logistic Regression, Decision Trees, Support Vector Machines, Linear Regression, Naive Bayes, and Gradient Boosting. Our experiments demonstrate that the highest accuracy was achieved with CatBoost, which handles categorical features well and is resistant to overfitting. A game theory component models the strategic interaction between competing horses, further increasing predictive accuracy. Performance metrics (accuracy, precision, and recall) were used to evaluate each model. The accuracy of CatBoost was found to be 74.1, while the other models were less accurate. This research provides an important resource for racing stakeholders, from trainers to punters, and will be valuable in shaping race strategy and identifying which horses are likely to win. It is an advancement in horse racing analytics and lays the foundation for predictive modeling to be explored in similar competitive environments in the future. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
CMSFE: Cross-Model SSL Feature Extraction for Enhanced Remote Sensing Data Representation
Automatic labeling of remote sensing data speeds up analysis in applications such as environmental monitoring, urban planning, and disaster management. Supervised machine learning approaches rely on labeled datasets created through time-consuming processes. Creating labeled datasets is resource-intensive, and such datasets are hard to obtain in most domains, especially in remote sensing. This study proposes Cross-Model Self-Supervised Feature Extraction (CMSFE), a novel approach that enhances representation learning on unlabeled remote sensing datasets by integrating features from multiple pre-trained models and refining them through self-supervised learning (SSL). The extracted features are integrated to form a comprehensive and robust feature set that aids in separating different clusters of imagery. Experimental results on the EuroSAT dataset demonstrate the quality of the feature extraction in separating various classes without any manual intervention or labeling. Dimensionality reduction and manifold learning are applied for visual interpretation of the extracted feature space. These features can be reused for further analysis or modeling, highlighting the potential of SSL-based feature extraction methods in remote sensing to enhance representation learning and reduce dependency on labeled data. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Machine Learning and Ensemble Models for Hazardous Asteroids Prediction
The prediction of hazardous asteroids near Earth is critical for planetary defense and avoiding any possible impacts. This study investigates the use of five ensemble models, XGBoost, Gradient Boost, CatBoost, Voting Classifier, and Random Forest, as well as four standalone machine learning models, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Logistic Regression, and Decision Tree, to improve the prediction accuracy of identifying potentially hazardous asteroids. With 92% accuracy and 91% precision, Random Forest performed better than other models. It was the preferred choice for predicting hazardous asteroids because of its capacity to handle the huge dataset with efficiency and its ability to manage non-linear data patterns. Additionally, XGBoost and CatBoost provided high accuracy at low computational costs, making them suitable for real-time monitoring. KNN, on the other hand, did not perform well, and SVM's high processing time made it less useful. In particular, the Random Forest ensemble model performed better at predicting hazardous asteroids. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Optimizing Car Recommendations: Power Analysis of Machine Learning Algorithms
The growing demand for efficient automobile recommendation systems calls for algorithms that can proficiently assess and predict user preferences. This research assesses several machine learning algorithms: K-Nearest Neighbors (KNN), Decision Trees, Linear Regression, Weighted Scoring, and Content-Based Filtering. One of the main aims of this study is to identify which recommendation algorithm is best suited for vehicle suggestions from an application perspective, based on cost, mileage, engine size, fuel category, and user reviews. A dataset of 100 records was used for preliminary analyses to test the algorithms. Preprocessing involved handling missing data, normalizing numerical features, and encoding categorical variables so that predictions were as precise as possible. The algorithms were evaluated in terms of accuracy, scalability, and computational efficiency. The highest accuracy was achieved by Decision Trees at 85%, followed by Weighted Scoring at 82% and Linear Regression at 78%. Although KNN achieved an accuracy of 74%, it is less scalable for the very large datasets that an automobile recommendation system needs. The experimental results of this paper add to the evolving knowledge on the application of machine learning in the automobile domain, reinforcing the adequacy of Decision Trees as a valid technique for car recommendation systems. Recommendations for future studies include expanding the database and exploring contemporary approaches to improve the accuracy of recommendations. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Boosting Competitiveness Through Data: How Online Procurement Drives Data-Driven Decision-Making in Traditional Kirana Shops
Integrating traditional and modern elements presents significant challenges, yet when successful, the synergy can be immense. Onboarding Indian Kirana shops (small, unorganized mom-and-pop stores) into a comprehensive digital infrastructure is crucial given the current retail landscape and evolving consumer demands. These shops are vital to the country's food and grocery ecosystem but face disruption from the rise of e-commerce and organized retail. By adopting new business models and technologies, Kirana shops can enhance their competitiveness. This study highlights the critical role of digitalization and data-driven decision-making in small-scale retail formats. Researchers collected primary data from Kirana shops doing online procurement and from those relying on traditional methods such as purchasing from distributors. Analysis of the primary data shows that shops using online procurement platforms demonstrate superior performance, attributed to factors like competitive pricing and timely delivery. Most importantly, the insights and analytics provided by eB2B platforms are game changers. Data emerges as the key differentiator; digitalization enables access to critical analytics, allowing for informed business decisions that improve success rates and provide a competitive edge. Consequently, this study proposes an ideal digital end-to-end model designed to enhance operational efficiency and drive growth for unorganized Kirana shops. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
A Novel Ridge Estimator for the Liu-Type Logistic Regression Model and Its Application to Demographic Data from Urban Slums in Karnataka
This study introduces new ridge estimators for the Liu-type logistic regression model, which help to improve model performance when multicollinearity is present among the independent variables. Logistic regression models binary outcomes, but it yields inaccurate and unstable regression coefficients in the presence of multicollinearity. As a result, the variance of the estimates may increase and the predictive accuracy of the model is reduced. To overcome this issue, Liu-type logistic regression is used, which employs ridge and Liu parameters to provide stable and accurate regression coefficients. Several ridge estimators based on the Liu-type logistic model are proposed in this study; they can handle multicollinearity and give better predictive performance. The proposed estimators have been tested on a demographic dataset from urban slums in Karnataka, and the empirical analysis shows that one of the new ridge estimators gives the lowest Mean Square Error (MSE) compared to the existing ridge estimators. The results show the usefulness of the new estimators in improving model performance and contribute to the advancement of logistic regression techniques. This work highlights the critical need to handle multicollinearity in regression analysis and sets the path for researchers to further improve the estimators in the future. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
A Study on the Efficacy of Homogeneous and Heterogeneous Stacking in Machine Learning
This study addresses the crucial issue of early and accurate plant disease diagnosis by comparing the performance of homogeneous and heterogeneous stacking models. The study introduces a novel homogeneous multi-layered stacking model that combines the Light Gradient Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost) for plant disease detection and compares it with a heterogeneous stacking model that employs diverse classifiers. While traditional methods typically use basic stacking techniques, this research explores the complexities of various model architectures. By leveraging the strengths of both LGBM and XGB classifiers, the approach aims to deliver a highly accurate and efficient disease detection system. A comprehensive evaluation reveals that the homogeneous stacking model achieves superior performance, with a ROC AUC of 85.12%, compared to 83.09% for the single LGBM model. The study uses metrics such as AUC-ROC curves, accuracy, and precision-recall curves to assess performance. Future work will focus on integrating these models with real-time monitoring systems and extending their applications to a wider range of crops and environmental conditions. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Robust Regression Approaches for the Fama-French 5-Factor Model: A Real Data Study
The Fama-French five-factor model (FF-5) is one of the advancements of the capital asset pricing model (CAPM). Along with other FF models, it aims to understand companies and portfolios over a period, analyzing return capacity over factors such as SMB (business size), HML (the spread between high and low book-to-market ratios), RMW (robustness in operating profitability), and CMA (conservative versus aggressive investment style). The FF-5 regression model widely uses the Ordinary Least Squares (OLS) estimator to estimate its parameters. However, due to market volatility over the years and non-normal periods, OLS estimators face setbacks because the assumptions they require are violated. This article presents an effort to improve the performance of the FF-5 model using the Robust Dawoud-Kibria estimator. Its performance is compared with other robust estimators such as M, MM, and MMS under the MSE criterion. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Optimizing Fake News Classification Using Data Fusion and NLP-Based Machine Learning Techniques
This research evaluates the performance of different machine learning algorithms for identifying fake news using a dataset of news articles labeled as fake or real. The dataset was preprocessed to remove stop words, punctuation, digits, and special characters, and text normalization was applied. Two feature extraction methods, BOW (Bag-of-Words) and TF-IDF, were used to convert text data into numerical features. The dataset was split into training and testing sets to train and evaluate models, including Support Vector Classifier, Logistic Regression, Decision Trees, Gradient Boosting Classifier, Random Forest, and Multinomial Naive Bayes. Ensemble models combining various classifiers were also tested. Performance metrics, including precision, recall, and F1-score, were assessed, and confusion matrices were analyzed. Results showed that TF-IDF generally outperformed BOW. The Random Forest model achieved the highest precision (93%) but had a lower recall (83%). The SVC model showed a balanced performance with a precision of 90%, recall of 87%, and an F1-score of 86%. Ensemble models like GB + RF exhibited high precision (99%) but lower recall. These findings highlight the strengths of different algorithms in fake news detection and inform the development of practical classification tools. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
A New Approach to Robust Weighted Support Vector Regression and Its Applications in Medical MRI Image Processing
In recent years, the field of machine learning has experienced significant growth, with the emergence of various advanced technologies leveraging its principles. Among these, Support Vector Regression (SVR) has established itself as a widely recognized and robust regression technique. This article introduces a novel approach, Robust Hampel Weight-Based Support Vector Regression (RH-SVR), designed to enhance the resilience and efficiency of traditional SVR. The study investigates and compares several regression methods, including the Robust Linear Model (RLM), SVR, RH-SVR, and Least Squares Regression (LS). An experimental analysis was conducted using MRI images of the human heart and brain, both in their original form and with added noise at varying levels (10, 20, and 30%). Performance metrics such as Mean Square Error (MSE), Median Absolute Error (MDAE), Relative Standard Error (RSE), and Peak Signal-to-Noise Ratio (PSNR) were evaluated. The results consistently demonstrate that the proposed RH-SVR method achieves lower error rates and higher PSNR values, showcasing superior accuracy and robustness, particularly when processing noisy images. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Enhancing Diagnostic Accuracy in Familial Alzheimer's Disease Through Gene Expression Profiling and Optimized Machine Learning Algorithms
Early and accurate diagnosis of Familial Alzheimer's Disease (FAD) is critical for effective treatment of this genetically inherited form of Alzheimer's disease. This study investigates the prediction of FAD from gene expression data and evaluates the performance of various machine learning models on the discovered patterns. We compare the output of Linear Regression, Ridge Regression, and a LightGBM model on data from the Gene Expression Omnibus. The LightGBM model is hyperparameter tuned to better capture the non-linear complexity of the data. Predictive performance is evaluated using MSE, R squared, and accuracy. The results show that the LightGBM model achieves lower MSE, higher R squared, and better accuracy than the traditional models. These results show that, when dealing with high-dimensional gene expression data, sophisticated machine learning models such as LightGBM perform better than other approaches and show higher diagnostic accuracy in FAD. This research shows that machine learning is a powerful tool for the predictive modeling of Alzheimer's Disease, as well as for possible early detection and personalized treatment. Future work might aim to further improve model performance with other, more complex genetic datasets. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Leveraging Agentic RAG to Reduce Hallucinations in Large Language Models
Large Language Models (LLMs) have revolutionized language generation and comprehension. However, a notable issue remains: their susceptibility to hallucination, which may lead them to generate inaccurate or irrelevant content. Context dependency, or the capacity to use and understand one's environment, is crucial for avoiding hallucinations. The agentic RAG framework offers a feasible solution, leveraging intelligent agents to strengthen contextual knowledge. Through the evaluation of entity roles and relationships, agentic RAG aids LLMs in understanding variations in context, spotting inconsistencies, and generating more precise and balanced replies. This research explores the integration of agentic RAG into LLMs to improve their reliability and efficiency by reducing hallucinations and raising contextual awareness. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Comprehensive Maintenance of Data Quality with Software Design and Analysis
Software development activities rely heavily on the maintenance of data quality to ensure that decisions are made efficiently and without glitches. This paper takes an in-depth approach to ensuring data accuracy through the structured design of quality-assured software. Techniques such as cross-functional collaboration, problem resolution, and data validation highlight the importance of having multiple quality checkpoints at every stage of the software development lifecycle. The methodology is illustrated by use case and sequence diagrams that explain the interaction between system components within data validation workflows. The paper presents a case study within a startup environment, showing how the approach can be applied in practice to resolve data inconsistencies and improve operational processes more generally. Solutions developed through this approach to common data management challenges include error detection automation and real-time feedback mechanisms. The paper also discusses the role of QA in maintaining data integrity and how that concept may be applied in real-world settings. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Impact of ESG Index on the Stock Return: Empirical Evidence from CRIP Sector
In modern times, investment decisions are significantly influenced by a range of metrics. One widely embraced investment strategy in both developed and developing economies is investing on the basis of Environmental, Social Responsibility, and Governance (ESG) factors. Investors rely on ESG scores as a valuable resource to pinpoint companies that are more likely to maintain their growth trajectory while reducing the possibility of negative occurrences such as legal complications, controversies, and unfavourable public attention. This, in turn, facilitates more effective risk management and enhances returns on investment. However, research on the influence of ESG factors on stock returns within the Construction, Real Estate, Infrastructure, and Project (CRIP) sector is relatively limited. Consequently, the main aim of this study is to assess how ESG aspects influence the returns of stocks in companies operating in the CRIP sector. To conduct this analysis, we employed the Crisil ESG database, which provides comprehensive data on ESG metrics and stock returns. A sample of 35 companies from the CRIP industry was chosen for investigation. To quantify the influence of ESG aspects on stock returns within the CRIP sector, a Fixed Effect Panel Regression Model was applied. The results suggest a favourable and considerable relationship between ESG ratings and the closing stock price. Furthermore, the analysis demonstrates a large and beneficial influence of ESG ratings on stock returns. These results have substantial implications for investors and stakeholders with a vested interest in making well-informed investment choices within the CRIP industry. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
Advancing Brain Tumor Detection with Deep Learning and Machine Learning: A Performance Analysis of Different Deep Learning Models
The current study examines the difficulty of employing a deep learning architecture to diagnose brain tumors quickly and effectively. Our study is built upon a dataset of 253 MRI pictures that have been carefully categorized by medical experts as either positive (Yes) or negative (No) for brain tumors. To guarantee the robustness of model performance, the dataset is carefully divided into training and validation subsets, with 70% set aside for training and 30% for validation. We analyze the diagnostic performance of several machine learning models, including K-Nearest Neighbors (KNNs), Recurrent Neural Networks (RNNs), Support Vector Machines (SVMs), Convolutional Neural Networks (CNNs), and Artificial Neural Networks (ANNs). When these algorithms are applied to MRI scans, brain tumors can be quickly detected, and the increased accuracy makes patient treatment easier. The findings of this study could lead to a rapid and accurate diagnosis of brain tumors, which would greatly enhance patient care and treatment. The results also show how deep learning frameworks can transform medical image processing and diagnosis. This work offers a thorough review of recent findings and techniques for MRI scan-based deep learning-based brain tumor detection. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026. -
Integrating Simple Temporal Attention for Improved Video Summarization
Simple Temporal Attention (STA) in video summarization can improve deep learning model performance while tackling complexity and multi-view dependency problems. Many current models are too complex and too dependent on multi-view setups to be scalable in single-camera settings. The suggested STA mechanism reduces model complexity without sacrificing accuracy, making it easier to recognize important moments in videos. To further increase the efficacy of summarization, a spatio-temporal mechanism is also introduced to capture crucial dynamics between video frames. The approach is evaluated on two benchmark datasets, UCF50 and TVSum, demonstrating significant improvements in model performance. Through a comparison of different deep learning models, this study provides a scalable solution for video summarization and highlights the practical advantages of integrating STA for producing succinct and informative video summaries. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026. -
Optimizing the Performance of Naïve Bayesian Classification Using RELSUn, a Bi-stage Feature Selection Algorithm
Feature Selection (FS) is an ideal pre-processing stage to make supervised learning more effective and efficient. RELIEF_NCM, a variant of Relief (a non-parametric feature weighting algorithm), was developed in the literature to overcome the limitations of RELIEF_DISC. It is designed to handle nominal and continuous features and to support multi-class problems. The RELIEF_NCM algorithm removes irrelevant features from the dataset, but redundant features may remain and hurt the performance of classifiers. RedunSUn, a method that removes redundant features using Symmetric Uncertainty (SU), is introduced in this paper. Combining RELIEF_NCM and RedunSUn yields RELSUn, a bi-stage FS algorithm that removes both redundant and irrelevant features from the dataset. This hybrid approach has been examined using eight real-time datasets from the UCI machine learning repository. The experimental outcomes reveal that RELSUn outperforms RELIEF_NCM and state-of-the-art methods in the classification accuracy, precision, and speed of the Naïve Bayesian Classifier (NBC) with a minimum of selected attributes. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026. -
The Smart Detection of Ovarian Cancer in Complex Medical Images Using Deep Learning
Ovarian cancer is a challenging disease to detect and diagnose, especially in complex medical images where the cancerous lesions may be small and difficult to differentiate from surrounding healthy tissue. The use of deep learning algorithms has shown promising results in computer-aided diagnosis of various cancers. This study aims to develop a smart detection system for ovarian cancer in complex medical images using deep learning techniques. The proposed system will have the ability to accurately and efficiently identify cancerous lesions, leading to earlier detection and improved treatment outcomes. Through the use of advanced computer vision and machine learning methods, the system will be able to learn from a large dataset of medical images and make accurate predictions. This research has the potential to significantly improve the diagnosis and treatment of ovarian cancer, ultimately saving lives. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026. -
An Enhanced RFM Customer Value-Based Customer Segmentation and Evaluation
Machine learning algorithms are widely used today to address the challenges of the business environment and to support the services a firm needs to run successfully in the intensely competitive e-commerce sector. Recently, clustering and classification mechanisms that group both existing and new clients into clusters have produced positive outcomes. Recency, Frequency, and Monetary (RFM) measures are now widely used for these tasks. In this study, individual one-dimensional clustering was performed on the Recency, Frequency, and Monetary columns, and a weighted average (or preferred linear combination) of the three features, combining the results of the three individual clusterings, was then used to calculate an overall score. Finally, all of the distinct clients were divided into three segments based on the overall score. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025. -
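The scoring procedure described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the clustering algorithm, the weights, or the segment cut-offs, so the sketch assumes a plain one-dimensional k-means with three clusters, equal weights, and three equal-width score bands. The function names `kmeans_1d` and `rfm_segments` and the sample customers are hypothetical.

```python
from statistics import mean


def kmeans_1d(values, k=3, iters=50):
    """Cluster scalars with Lloyd's algorithm (assumed clustering method).

    Returns one label per value in 0..k-1; a higher label means a cluster
    with a larger centre, because the centres start sorted and stay sorted.
    """
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly over the observed range.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centre (ties go to the lower one).
        labels = [min(range(k), key=lambda j: abs(v - centres[j])) for v in values]
        new_centres = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # An empty cluster keeps its previous centre.
            new_centres.append(mean(members) if members else centres[j])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def rfm_segments(customers, weights=(1, 1, 1), k=3):
    """Map {customer_id: (recency_days, frequency, monetary)} to a segment name."""
    ids = list(customers)
    recency, frequency, monetary = zip(*(customers[c] for c in ids))
    # Recency is inverted: fewer days since the last purchase is better.
    r = [(k - 1) - lab for lab in kmeans_1d(recency, k)]
    f = kmeans_1d(frequency, k)
    m = kmeans_1d(monetary, k)
    w_r, w_f, w_m = weights
    top = k - 1  # highest possible overall score
    segments = {}
    for cid, rs, fs, ms in zip(ids, r, f, m):
        # Weighted average of the three per-column cluster ranks.
        score = (w_r * rs + w_f * fs + w_m * ms) / (w_r + w_f + w_m)
        # Assumed cut-offs: three equal-width bands over [0, top].
        if score < top / 3:
            segments[cid] = "low"
        elif score < 2 * top / 3:
            segments[cid] = "mid"
        else:
            segments[cid] = "high"
    return segments


# Hypothetical sample data: (days since last order, order count, total spend).
sample = {
    "A": (5, 20, 1000), "B": (10, 18, 900),
    "C": (100, 10, 500), "D": (110, 11, 520),
    "E": (200, 1, 20), "F": (210, 2, 30),
}
print(rfm_segments(sample))
```

Changing `weights` shifts how strongly each of the three columns influences the overall score, which is one way to express the "preferred linear combination" the abstract mentions.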
Key-Based Message Transmission to Avoid Broadcast Storm in VANET
A VANET communicates traffic information to neighboring vehicles through low-cost wireless communication technologies. ITS major task is to share road information with vehicles in time to minimize the threat of road accidents. A vehicle that receives a communication from its neighbor becomes part of the VANET and controls and forwards the received information to neighboring vehicles. In this paper, a design to reduce the broadcast storm is proposed. The approach, named key-based message broadcast for VANET (KMB-V), groups a small number of nodes (vehicles) into a cluster with a Cluster Head (CH) and creates a novel unique key that is transmitted before message transmission to avoid a broadcast storm. The approach shows better performance in PDR, network life, and throughput than previous works. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
