The medical sector has advanced remarkably and is producing lifesaving models and wearable devices for disease prediction and patient monitoring. These prediction models and wearable devices collect immense amounts of data, which leads to high dimensionality, overfitting, and inaccurate results. From the pool of data used for a prediction model, we should be able to identify the information and parameters that contribute positively to the decision-making model, because datasets with many parameters and high dimensionality tend to overfit. Here, we use a dataset of demented and non-demented patients with five conventional features and other physical parameters. To these we add three new predictor parameters, glyhb (glycated hemoglobin), BMI, and cholesterol, to demonstrate the association between diabetes and dementia. After these additions the dataset has thirty parameters, so dimensionality reduction is applied to avoid overfitting. The work uses Principal Component Analysis (PCA) to reduce dimensionality, t-SNE for visualization, and K-means clustering to cluster the target variable. The cluster mean of each variable is used to understand how each variable behaves in each cluster. Finally, a basic feature-ranking method is implemented, which can later be used in the prediction model. The performance metrics used in this work are the silhouette score, inertia, and an inter-cluster distance map. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
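The pipeline described above (standardize the parameters, reduce them with PCA, cluster with K-means, then inspect cluster means and score the clustering) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the synthetic data, the placeholder parameter names `x0` to `x29`, the choice of two components and two clusters, and the feature-ranking rule (ordering parameters by the spread of their standardized cluster means) are all hypothetical, and the centre-to-centre distance matrix is only a numeric stand-in for an inter-cluster distance map. t-SNE is omitted here; scikit-learn's `sklearn.manifold.TSNE` is one common option for that visualization step.

```python
import numpy as np


def standardize(X):
    """Scale each column to zero mean and unit variance so no parameter dominates PCA."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mu) / sd


def pca(X, n_components):
    """Project centred data onto its top principal axes, computed via SVD."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_ratio = S**2 / np.sum(S**2)  # share of total variance per component
    return Xc @ Vt[:n_components].T, explained_ratio[:n_components]


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Lloyd's K-means with random restarts; returns (labels, centers, inertia)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        inertia = float(np.sum((X - centers[labels]) ** 2))  # within-cluster sum of squares
        if best is None or inertia < best[2]:
            best = (labels, centers, inertia)
    return best


def silhouette_score(X, labels):
    """Mean of (b - a) / max(a, b) over all points; needs at least two clusters."""
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters score 0 by convention
        # D[i, i] is 0, so dividing by (size - 1) averages over the other members
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(scores.mean())


# Hypothetical synthetic data: 200 records x 30 parameters; the second half is
# shifted on the first five columns so that two groups exist.
rng = np.random.default_rng(42)
X_raw = rng.normal(size=(200, 30))
X_raw[100:, :5] += 6.0
feature_names = [f"x{j}" for j in range(30)]  # hypothetical placeholder names

Z = standardize(X_raw)
X_pca, explained = pca(Z, n_components=2)
labels, centers, inertia = kmeans(X_pca, k=2)
sil = silhouette_score(X_pca, labels)
center_dist = np.linalg.norm(centers[:, None] - centers[None], axis=2)  # inter-cluster distances

# Cluster mean of every standardized parameter, then a simple (assumed) ranking
# by how far apart each parameter's cluster means are.
cluster_means = np.array([Z[labels == c].mean(axis=0) for c in np.unique(labels)])
spread = cluster_means.max(axis=0) - cluster_means.min(axis=0)
ranking = [feature_names[j] for j in np.argsort(spread)[::-1]]

print("explained variance (PC1, PC2):", explained.round(3))
print(f"inertia={inertia:.2f}  silhouette={sil:.3f}")
print("top-ranked parameters:", ranking[:5])
```

On real data, the number of clusters and components would typically be chosen by comparing the silhouette score and inertia across several candidate values rather than being fixed in advance as in this sketch.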
The rise of online platforms has led to a growing trend of people expressing their thoughts and emotions in their native languages. Movies have been a predominant topic of discussion on these platforms, where people comment on various aspects of films. Aspect-based Sentiment Analysis (ABSA) is a computational technique that helps uncover the sentiments expressed toward specific aspects in such discussions. Two challenges arise when applying ABSA to movie reviews written in Tamil, an Indian regional language: first, the lack of suitable Tamil movie review datasets; second, the difficulty caused by the agglutinative nature of the Tamil language. This work addresses the first challenge by curating an annotated Tamil movie review dataset, MADTRAS (Dataset for Aspect-based Sentiment Analysis of Movie Reviews in Tamil). The quality of the dataset is ensured through content and annotation evaluation. To demonstrate the effectiveness of the dataset, multilingual BERT (mBERT) was applied, and its performance was compared with that of other deep learning (DL) models. © 2025 The Authors.
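One common way to frame ABSA for a BERT-style classifier is to expand each annotated review into (review text, aspect) pairs, each labelled with a polarity; a sentence-pair model such as mBERT then predicts the polarity for the given aspect. The sketch below shows only that data-preparation step. The record layout, the aspect categories `story` and `music`, the three-way label mapping, and the example review are hypothetical illustrations, not the documented MADTRAS schema.

```python
from dataclasses import dataclass


@dataclass
class AspectAnnotation:
    aspect: str    # hypothetical aspect category, e.g. "story" or "music"
    polarity: str  # "positive", "negative", or "neutral"


@dataclass
class Review:
    text: str
    annotations: list[AspectAnnotation]


# Hypothetical label mapping for a three-way polarity classifier
LABELS = {"negative": 0, "neutral": 1, "positive": 2}


def to_pair_examples(reviews, aspect_categories):
    """Expand each review into one (text, aspect, label) example per annotated aspect."""
    examples = []
    for review in reviews:
        for ann in review.annotations:
            if ann.aspect in aspect_categories:
                examples.append((review.text, ann.aspect, LABELS[ann.polarity]))
    return examples


# Hypothetical example: "The film's story is excellent, but the music is bad."
text = "படத்தின் கதை அருமை, ஆனால் இசை மோசம்."
review = Review(text, [AspectAnnotation("story", "positive"),
                       AspectAnnotation("music", "negative")])

for t, aspect, label in to_pair_examples([review], {"story", "music"}):
    print(aspect, label)
```

Each resulting pair would then be tokenized as a text pair (review, aspect) before being passed to the classifier; keeping the aspect as a separate segment lets a single review yield different labels for different aspects.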