Browse Items (11809 total)
Decoding Big Data: The Essential Elements Shaping Business Intelligence
In today's Business Intelligence (BI) world, Big Data Analytics integration has become critical, transforming company strategy and decision-making processes. This study investigates the complex influence of Big Data on business intelligence, focusing on important drivers of this transition. It investigates how Big Data's improved data processing capabilities, integration of advanced analytics techniques such as machine learning, and real-time data insights enable businesses to make more informed decisions and achieve a competitive advantage. Furthermore, the paper emphasizes the importance of personalized consumer insights, operational savings, and strategic benefits obtained from predictive analytics when adopting Big Data for BI. 2024 IEEE. -
Harnessing Machine Learning for Mental Health: A Study on Classifying Depression-Related Social Media Posts
This study identifies depression-related content on social media using machine learning models that classify posts and comments. The dataset, comprising around 6500 entries from various platforms including Facebook, was rigorously annotated by four proficient English-speaking undergraduate students, with the final label established by majority voting. Preprocessing consisted of initial cleaning, normalization, and TF-IDF feature creation through vectorization of the POS-tagged output. The machine learning models trained and tested were Logistic Regression, Random Forest, SVM (Support Vector Machine), Naive Bayes, Gradient Boosting, K-NN (K-Nearest Neighbors), AdaBoost, and Decision Tree. The authors evaluated the models on accuracy, precision, recall (also known as sensitivity), and F1-score. Gradient Boosting, Random Forest, and SVM were the top performers, with Gradient Boosting the best overall at almost 98.5%. The results show that machine learning models can predict the label of social media posts and so help identify depression from text data. The detailed performance evaluation clarifies what each approach does well and poorly and whether it is suitable for real-world applications, which should guide future work and practical implementations of real-time mental health monitoring. The study aims to facilitate timely identification of depression-related posts, ultimately supporting mental health awareness and intervention efforts on social media platforms. 2024 IEEE. -
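The majority-voting step used to fix the final label can be sketched as follows. This is a minimal illustration, not the authors' code: the label names are hypothetical, and because the abstract does not say how a 2-2 split among four annotators was resolved, the sketch simply returns None for ties.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(votes: Sequence[str]) -> Optional[str]:
    """Return the label chosen by most annotators, or None if there is no single winner."""
    if not votes:
        return None
    counts = Counter(votes).most_common()
    # A tie for first place means no majority; tie handling is an assumption here.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical label names, used only for illustration.
print(majority_label(["depressed", "depressed", "not_depressed", "depressed"]))      # depressed
print(majority_label(["depressed", "depressed", "not_depressed", "not_depressed"]))  # None
```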
Diabetes Mellitus Classification Using Machine Learning Algorithms with Hyperparameter Tuning
Diabetes Mellitus is a prevalent condition globally, marked by elevated blood sugar levels resulting from either insufficient production of insulin or the body cells' inability to respond appropriately to released insulin. For people with diabetes to lead healthy, normal lives, early identification and treatment of the condition are essential. Given the need to move from traditional procedures towards a noninvasive methodology, machine learning and data mining technologies can be very useful in the classification of diabetes. The primary goal of this research was to create an effective machine learning model for the classification of diabetes mellitus. The work is carried out on a combination of the Pima Indian diabetes dataset and the German Frankfurt diabetes dataset. The class imbalance issue was resolved using the Synthetic Minority Oversampling Technique (SMOTE). One-hot encoding was applied to convert categorical features to numerical ones, and various single and ensemble classifiers, with the best hyperparameters obtained using the GridSearchCV method, were applied to the pre-processed dataset. According to the experimental results, the Random Forest ensemble technique outperformed the other models, with an AUC of 0.98 and a maximum accuracy of 98.79%. As a result, the algorithm might be used to predict diabetes and alert doctors to serious cases that call for emergency care. 2024 IEEE. -
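The core idea behind the oversampling step can be sketched in plain Python: a synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration of the SMOTE interpolation idea, not the library implementation the authors used; the function name, the toy data, and the choice k = 2 are assumptions.

```python
import random
from typing import List, Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority: List[Point], n_new: int,
                          k: int = 2, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority-class samples (hypothetical values, two features each).
minority_samples = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_like_oversample(minority_samples, n_new=3)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region the minority class already occupies.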
Advanced Sentiment Analysis: From Lexicon-Enhanced BERT to Dimensionality Reduction Using NLP
Social media platforms serve as vital channels for communication, generating massive quantities of data that represent an array of perspectives. Efficient sentiment analysis is necessary for understanding public opinion, particularly in domains such as product reviews and socio-political discussion. This paper develops a novel sentiment analysis model customized for social media data by integrating machine learning algorithms, language processing techniques with part-of-speech tagging, and dimensionality reduction methods. The model aims to improve sentiment analysis performance by tackling challenges such as noise and data domain variations. To further improve sentiment representation, it includes convolutional neural networks (CNNs), BERT embeddings, N-grams, and sentiment lexicons. The model's effectiveness is evaluated on a variety of datasets. This paper goes beyond sentiment analysis in code-mixed, multilingual text and highlights the importance of careful data preprocessing and a wide variety of ML algorithms. This study attempts to explain the nuances of sentiment analysis and its use in social media discussions through methodical research. 2024 IEEE. -
Predicting and Analyzing Early Onset of Stroke Using Advanced Machine Learning Classification Technique
Around the world, stroke is a leading cause of death. A stroke can occur when blood vessels in the brain rupture and cause damage, or when a blockage forms in a blood vessel that supplies oxygen and other nutrients. This study uses various machine learning models to predict whether someone will have a stroke. Different physiological features were taken into account, using Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naive Bayes, and XGBoost classifiers; these were used for six different models to ensure accurate predictions. The best accuracy, 92.87%, was achieved with Bayes search CV, a hyperparameter tuning method. This study can be extended in future work by applying data augmentation and feature engineering to the dataset. The work is limited to textual data, so it might not always be accurate in predicting stroke; future work could therefore use datasets that contain images. 2024 IEEE. -
Leveraging Ensemble Methods for Accurate Prediction of Customer Spending Scores in Retail
This study primarily aims to estimate consumer spending trends in a retail context. The goal is to identify the best model for predicting Spending Scores, which indicate customer loyalty and potential income, using demographic and financial data. The dataset included information about customers' age, gender, and annual income, and the objective was to analyze their Spending Scores. Several regression models were tested, including Linear Regression, Random Forest, Gradient Boosting, K-Nearest Neighbors (KNN), and Lasso Regression. To improve the models, we engineered features such as Age Squared, Income per Age, and Spending Score per Income. Each model was trained and tested using 3-fold cross-validation. We evaluated performance with Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R2) metrics. The results showed significant differences in model performance. The Random Forest model stood out, with the lowest MAE of 0.33, an RMSE of 0.52, and the highest R2 score of 0.9997. Gradient Boosting also performed well, achieving an MAE of 1.77, an RMSE of 2.41, and an R2 score of 0.9930. While Linear Regression showed moderate accuracy, KNN and Lasso Regression had higher errors and lower R2 values, indicating less reliable predictions. The findings suggest that ensemble methods, particularly Random Forest, excel at predicting customer Spending Scores. The high accuracy and reliability of this model point to its potential for customer segmentation and targeted marketing strategies, ultimately enhancing customer relationship management and boosting business value. Refinement and exploration of additional features could further improve these prediction capabilities. 2024 IEEE. -
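For reference, the three metrics reported above can be computed as in the following sketch. The formulas are the standard definitions; the sample values are made up for illustration and are not taken from the study's dataset.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R-squared: 1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical spending scores and model predictions.
actual = [40, 60, 80, 20]
predicted = [42, 58, 79, 23]
print(mae(actual, predicted), rmse(actual, predicted), r_squared(actual, predicted))
```

MAE and RMSE are in the units of the target, while R2 is unitless, so an R2 close to 1 alongside small errors indicates that almost all variance in the scores is explained.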
Cross-Modal Ingredient Recognition and Recipe Suggestion using Computer Vision and Predictive Modeling
This paper presents the development of a novel system known as 'IngredEye.' It combines several approaches: computer vision, including YOLOv8; a KNN prediction model; and a Flutter framework that hosts all of them in a mobile application environment. Previous studies have analyzed the application of computer vision and OpenCV recognition in cooking and shown that such approaches could enhance convenience in the culinary field. This paper addresses issues such as changes in lighting, occlusions, and other factors that the algorithms must handle in real applications. The objective of this paper is to integrate the OpenCV object detection method with comprehensive machine learning techniques specialized for the culinary field, presenting the end user with recipe recommendations based on the visual input they have given. 2024 IEEE. -
Predicting Player Engagement in Online Gaming: A Machine Learning Approach
The aim of this research is to make precise forecasts of player participation in online games using state-of-the-art machine learning algorithms. Player engagement is a crucial element in determining the success of online games because it affects player retention, satisfaction, and monetization. By understanding and predicting engagement levels, game developers and marketers can enhance the gaming experience and develop strategies to keep players invested. This research involves a comprehensive analysis of player behavior data from an online gaming platform. The dataset includes various demographic and behavioral features such as age, gender, location, game genre, playtime hours, in-game purchases, game difficulty, sessions per week, average session duration, player level, achievements unlocked, and engagement level. The data was preprocessed by handling missing values, normalizing numerical features, and encoding categorical variables. Exploratory Data Analysis (EDA) was conducted to understand the distribution of and relationships between different features. Multiple machine learning models were evaluated to predict player engagement levels, including Random Forest, Gradient Boosting, XGBoost, and Support Vector Machine (SVM). These models were compared on accuracy, precision, recall, and F1-score. In the comparison, XGBoost emerged as the best model, achieving the highest accuracy of 91% and demonstrating superior precision, recall, and F1-scores across all engagement levels (High, Medium, Low). As the best-performing model, it can be used in a next step for feature importance analysis to identify the strongest predictors of engagement. Ensemble methods such as XGBoost, Gradient Boosting, and Random Forest outperformed the SVM model, highlighting their effectiveness in handling complex datasets. 2024 IEEE. -
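Per-class precision, recall, and F1-score for the three engagement levels can be computed as in this sketch. The label lists are invented for illustration and do not come from the study; the function name is likewise an assumption.

```python
from typing import Dict, Sequence


def per_class_scores(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Compute precision, recall, and F1 for each label, treating it as the positive class."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Hypothetical true and predicted engagement levels for six players.
true_levels = ["High", "High", "Medium", "Medium", "Low", "Low"]
pred_levels = ["High", "Medium", "Medium", "Medium", "Low", "High"]
engagement_scores = per_class_scores(true_levels, pred_levels)
for level, values in engagement_scores.items():
    print(level, values)
```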
Comparative Study on GANs and VAEs in Credit Card Fraud Detection
In today's world, fraud is the major issue the credit card sector encounters. This comparative study examines how GANs and VAEs detect fraudulent transactions. The dataset comprised 284807 transactions, of which 492 were fraudulent. Both models are trained on this dataset, and during training they learn to deal with its imbalance. VAEs are trained so that fraudulent transactions are treated as anomalies, with only legitimate transactions passed to the model for training. Conversely, GANs address the data imbalance by generating synthetic fraud data, which is passed on to the ML model for classification. Both models have very good AUC-ROC scores of around 96%, which indicates their ability to distinguish between the classes. In all other aspects, GANs outperformed VAEs, which makes GANs a better option for fraud detection. 2024 IEEE. -
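The degree of imbalance follows directly from the two counts given in the abstract. The short calculation below uses only those figures; the "always legitimate" baseline is a hypothetical comparison added for illustration, not something the study reports.

```python
# Counts taken from the abstract.
total_transactions = 284807
fraud_transactions = 492

legit_transactions = total_transactions - fraud_transactions
fraud_rate = fraud_transactions / total_transactions

# Accuracy of a hypothetical classifier that labels every transaction as legitimate.
baseline_accuracy = legit_transactions / total_transactions

print(f"Legitimate transactions: {legit_transactions}")
print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Accuracy of an always-legitimate baseline: {baseline_accuracy:.4%}")
```

With fraud at roughly 0.17% of transactions, such a baseline would score about 99.83% accuracy while detecting no fraud at all, which is why a threshold-independent measure such as AUC-ROC is more informative here.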
Sub-Optimization based Random Forest Algorithm for Accurate and Efficient Land use and Land Cover Classification using Landsat Time Series Data
Land use and land cover (LULC) play an essential role in investigating the impacts of environmental factors and socio-economic development on the Earth's surface. Extracting hidden information from remote sensing images of the observed Earth environment is a challenging process. This research implements a model that uses Landsat data to investigate LULC changes. Landsat 5, 7, and 8 imagery from 1985 to 2019 is used as input, and Google Earth Engine (GEE) is applied for robust classification. The paper proposes a Sub-forest optimization based Random forest (SO-RF) classifier with faster diagnosis speed for LULC classification. Moreover, a pan-sharpening algorithm is used to increase the resolution of the multispectral Landsat bands from 30 m to 15 m. In addition, the effect on final classification accuracy of various image configurations, based on numerous spectral indices and other supplementary data such as land surface temperature (LST) and a digital elevation model (DEM), is analyzed. The proposed SO-RF produced higher accuracy (kappa of 0.97, overall accuracy (OA) of 96.78%, F1-score of 0.94) than the Copernicus Global Land Cover Layers (CGLCL) map and state-of-the-art methods such as K-Nearest Neighbor (KNN), Decision Tree (DT), and Multi-class Support Vector Machine (MSVM). 2024 IEEE. -
Depth Wise Separable Convolutional Neural Network with Context Axial Reverse Attention Based Sentiment Analysis on Movie Reviews
Sentiment Analysis (SA) in movie reviews involves using natural language processing techniques to determine the sentiment expressed in reviews. This analysis helps in understanding the overall audience sentiment towards a movie, categorizing reviews as positive, negative, or neutral. It is useful for filmmakers, marketers, and audiences. Existing methods do not provide sufficient accuracy and suffer from increased error rate and complexity. To overcome this problem, Depth Wise Separable Convolutional Neural Networks with Context Axial Reverse Attention Network (DWSCNN-CARAN) is proposed for accurately classifying SA in movie reviews. The input data are taken from two datasets, the IMDB dataset and the Polarity dataset. Pre-processing is done in six steps, namely Cleaning, Tokenization, Case Folding, Normalization, Stop Word Elimination, and Stemming, to remove noise. Feature extraction is then done using Bag-Of-Words and Term Frequency-Inverse Document Frequency (BOW-TF-IDF). After that, classification is done using DWSCNN-CARAN to classify the SA in movie reviews. The efficiency of the proposed DWSCNN-CARAN-BOA is analyzed using a dataset; it attains 99.94% accuracy and 98.76% recall, better results than the existing methods. In the future, this approach will use the adversarial instances it generates to conduct adversarial training and assess the potential improvement in classification performance. It will also look into the possibility of creating adversarial examples at the word and sentence levels by combining structured knowledge from high-quality knowledge bases. 2024 IEEE. -
Improving Groundwater Forecasting Accuracy with a Hybrid ARIMA-XGBoost Approach.
In addressing the critical challenge of accurate groundwater level prediction, this study explores the comparative performance of various machine learning models. We implement a novel hybrid model combining ARIMA and Extreme Gradient Boosting (XGB) for the prediction of groundwater levels, and compare it against traditional models including ARIMA, XGBoost, LightGBM, Random Forest, and Decision Trees. Traditional approaches often rely on single models; however, our research seeks to delve into the intricacies of hybrid model architectures. Combining the strengths of ARIMA and XGB, we aim to build a highly accurate and efficient groundwater level prediction system. Comprehensive evaluations were conducted using metrics such as Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The future scope of machine learning in water resource management includes integrating such models with real-time monitoring systems and expanding their applications to diverse environmental conditions and regions. 2024 IEEE. -
Seismic Performance Assessment of Reinforced Concrete Frames: Insights from Pushover Analysis
This paper offers a comprehensive exploration of the seismic response of Reinforced Concrete (RC) frames examined through pushover analysis. The frames analyzed are designed as per IS 13920 and IS 456 for different levels of earthquake intensities and different levels of axial loads. Nonlinear analysis techniques have gained prominence in assessing the response of RC frames, especially when subjected to extreme loading events or when accurate predictions of structural behavior are required beyond the linear elastic range. The study aims to delve into the structural behavior of RC frames under seismic influences, employing pushover analysis as the principal analytical tool. With a focus on assessing the effectiveness and reliability of pushover analysis, the research endeavors to elucidate the seismic performance of RC frames while considering their response to different seismic zones and axial loading scenarios. The methodology involves conducting a series of pushover analyses on RC frames using advanced structural analysis software. The results obtained are meticulously analyzed to discern the shear capacities and ultimate displacements of the frames, by investigating the displacement versus shear capacity relationship across varying seismic zones and axial loading scenarios. Through this comprehensive investigation, the paper aims to enhance our understanding of the seismic behavior of RC frames and will provide valuable insights for seismic design. The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. -
An Innovative Method for Brain Stroke Prediction based on Parallel RELM Model
Strokes occur when blood supply to the brain is suddenly cut off or severely impaired. Stroke victims may experience cell death as a result of oxygen and nutrient shortages. The effectiveness of various predictive data mining algorithms in illness prediction has been the subject of numerous studies. The proposed method consists of three stages: preprocessing, feature selection, and model training. Data preparation comprises missing value management, numeric value conversion, imbalanced dataset handling, and data scaling. The chi-square and RFE methods are used in feature selection. The former assesses feature correlation, while the latter selects features by recursively searching ever-smaller feature sets. A Parallel RELM was used throughout model training. This new method outperforms both ELM and RELM, achieving an average accuracy of 95.84%. 2024 IEEE. -
Automated Single Responsibility Principle Enforcement: A Step Toward Reusable and Maintainable Code
In this study, we examine automated code analysis, specifically compliance with the single responsibility principle (SRP), a key principle in software architecture. The SRP proposes that a class should have a single reason to change, thereby enhancing code cohesion and facilitating its maintenance and reusability. The study presents a system that uses a holistic strategy to ascertain SRP compliance within code. This system rigorously inspects code interfaces, the interaction points among various software components. Through this process, we extract critical insights into the code's maintainability and reusability. An optimally designed interface can significantly improve code management and foster its reuse, leading to superior software design efficiency. Beyond interface inspection, our system also explores complexity metrics such as cyclomatic complexity and Halstead volume. Cyclomatic complexity is the number of linearly independent paths through a program's source code and serves as a measure of code complexity. Halstead volume is an additional metric that can quantify code complexity. Moreover, our system employs code smell detection methodologies to identify instances of high interdependence between classes, often a sign of SRP breaches. High interdependence, or tight coupling, complicates code modification and maintenance. The system integrates the conclusions from these varied analyses to determine SRP compliance. The outcomes of this investigation point to a promising path toward automated SRP detection. This could provide developers with tools that proactively foster the development of well-organized and maintainable code, thereby enhancing software design quality. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. -
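As an illustration of the cyclomatic complexity metric described above, the following sketch approximates it for Python source by counting decision points in the syntax tree and adding one. The abstract does not state which language or tool the system analyzes, so the use of Python's ast module, the set of node types counted, and the sample function are all assumptions made for this example.

```python
import ast

# Node types treated as decision points in this approximation.
_DECISION_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def cyclomatic_complexity(source: str) -> int:
    """Approximate cyclomatic complexity: one plus the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # Each extra operand of `and` / `or` adds one more branch.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(n):
    if n < 0 and n % 2 == 0:
        return "negative even"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))  # 5: base 1 + if + and + for + if
```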
Towards an Improved Model for Stability Score Prediction: Harnessing Machine Learning in National Stability Forecasting
In our increasingly interconnected world, national stability holds immense significance, impacting global economics, politics, and security. This study leverages machine learning to forecast stability scores, essential for understanding the intricate dynamics of country stability. By evaluating various regression models, our research aims to identify the most effective methods for predicting these scores, thus deepening our insight into the determinants of national stability. The field of machine learning has seen remarkable progress, with regression models ranging from conventional Linear Regression (LR) to more complex algorithms like Support Vector Regression (SVR), Random Forest (RF), and Gradient Boosting (GB). Each model has distinct strengths and weaknesses, necessitating a comparative analysis to determine the most suitable model for specific predictive tasks. Our methodology involves a comparative examination of models such as LR, Polynomial Regression (PR), Lasso, Ridge, Elastic Net (ENR), Decision Tree (DT), RF, GB, K-Nearest Neighbors (KNN), and SVR. Performance metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Squared Error (MSE), and R-squared (R2) assess each model's predictive accuracy using a diverse dataset of country stability indicators. This study's comprehensive model comparison adds novelty to predictive analytics literature. Our findings reveal significant variations in the performance of different regression models, with certain models exhibiting exceptional predictive accuracy, as indicated by high R2 values and low error metrics. Notably, models such as LR, SVR, and Elastic Net demonstrate outstanding performance, suggesting their suitability for stability score prediction. 2024 IEEE. -
Unveiling Powerful Machine Learning Strategies for Detecting Malware in Modern Digital Environment
Machine learning has emerged as a formidable instrument in the realm of malware detection, exhibiting the capacity to adapt dynamically to the ever-shifting landscape of digital hazards. This study presents an exhaustive comparative analysis of four machine learning algorithms, namely XGBoost Classifier, K-Nearest Neighbors (KNN) Classifier, Binomial Logistic Regression, and Random Forest, with the primary objective of assessing their effectiveness in malware detection. Conventional signature-based detection methodologies have struggled to keep pace with the rapid mutations exhibited by malware variants. In sharp contrast, machine learning algorithms offer a data-centric approach adept at unraveling intricate data patterns, thereby enabling identification of both well-known and previously unseen threats. To appraise the efficacy of these machine learning models, we employ a stringent set of evaluation metrics. Precision, recall, F1 score, testing accuracy, and training accuracy are scrutinized to ascertain the distinctive strengths and weaknesses of these algorithms. By providing a comparative analysis of machine learning algorithms for malware detection, this research contributes to the ongoing endeavor of fortifying cybersecurity. The analysis shows that each algorithm has its own strengths. XGBoost Classifier shows remarkable precision (Benign files: 99%, Malicious files: 99%), recall (Benign files: 97%, Malicious files: 99%), and F1 score (Benign files: 98%, Malicious files: 99%), indicating its aptitude for precise malware identification. KNN Classifier performs well in discerning benign software, with precision (Benign files: 90%) and recall (Benign files: 91%) that reduce the likelihood of false positives. The Author(s), under exclusive license to Springer Nature Switzerland AG 2024. -
Application of XAI in Integrating Democratic and Servant Leadership to Enhance the Performance of Manufacturing Industries in Ethiopia
This study tests the conceptual model theorizing democratic leadership, servant leadership, learning organization, and performance of manufacturing industries using Structural Equation Modeling (SEM). The impact of democratic and servant leadership on learning organizations and the performance of manufacturing industries in Ethiopia is analyzed, and the role of learning organizations as a mediating variable is examined. Confirmatory Factor Analysis was performed, which includes a well-established Chi-square test, the ratio of Chi-square to degrees of freedom, the goodness-of-fit index, the Tucker-Lewis index, the comparative fit index, the adjusted goodness-of-fit index, and the root mean square error of approximation. Further, the performance of manufacturing industries has been assessed using XAI, which helps clarify the complexities in production. Based on linear regression, two methods, SHAP and LIME, have been used for precise predictions and forecasts for future production plans in the manufacturing industry. This research contributes to the existing body of knowledge by dissecting the nuanced relationships between the two leadership styles and the learning organization and, further, their implications for an organization's performance. The findings of the study would provide insights for policymakers and practitioners to improve the performance of manufacturing industries. The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024. -
Advancing Image Security Through Deep Learning and Cryptography in Healthcare and Industry
Securing electronic health records (EHRs) in the Internet of Medical Things (IoMT) ecosystem is a key concern in healthcare due to the sector's diverse environment. As technology continues to evolve, ensuring the confidentiality, integrity, and availability of EHRs becomes more and more challenging. To enhance the confidentiality of healthcare image data, this study explores the combined use of deep learning and cryptography methods. It explores how deep neural networks can be adapted for use in encryption, through weight analysis to improve encryption strength and the combination of chaotic systems to generate undetectable encryption patterns. It also provides a survey of the current state of deep learning-based image anomaly detection methods in working environments, covering network typologies, supervision levels, and evaluation criteria. Cryptographic techniques provide an effective means to protect confidential medical image data while it is being transmitted and stored. Deep learning, on the other hand, has the potential to transform cryptography by providing robust encryption, resolution augmentation, and detection capabilities for medical image security. The paper addresses the opportunities and obstacles in medical image cryptography and industrial image anomaly detection and outlines future research approaches to overcome these problems. Through this work, image privacy in the healthcare and industrial sectors is advanced, opening the door to enhanced privacy, integrity, and availability of vital image data by bridging the gap between deep learning and cryptography. 2024 IEEE. -
Alpha-Bit: An Android App for Enhancing Pattern Recognition using CNN and Sequential Deep Learning
This research paper introduces Alpha-Bit, an Android application pioneering Optical Character Recognition (OCR) through cutting-edge deep learning models, including Convolutional Neural Networks (CNNs) and Sequential networks. With a core focus on enhancing educational accessibility and quality, Alpha-Bit specifically targets foundational elements of the English language - alphabets and numbers. Beyond conventional OCR applications, Alpha-Bit distinguishes itself by offering guided instruction and individual progress reports, providing a nuanced and tailored educational experience. Significantly, this work extends beyond technological innovation; Alpha-Bit's potential impact encompasses addressing educational inequalities, contributing to sustainability goals, and advancing the achievement of Sustainable Development Goal 4 (SDG 4). By democratizing education through innovative OCR technologies, Alpha-Bit emerges as a transformative force with the capacity to revolutionize learning experiences, making quality education universally accessible and empowering learners across diverse socio-economic backgrounds. 2024 ITU.