Challenges in preprocessing and normalization of heterogenous data
- Title
- Challenges in preprocessing and normalization of heterogenous data
- Creator
- Nanjundan, Preethi; Singh, Supreet; George, Jossy; Eslamian, Saeid
- Description
- In today's data-driven landscape, organizations are exposed to data from a wide range of sources, such as social media, financial transactions, healthcare records, and Internet of Things (IoT) sensors. This variety, while providing valuable insights, also brings significant challenges. Heterogeneous data, which varies in format, structure, and scale, requires careful management to ensure it can be used effectively in analytics and machine learning. Key steps in this process are preprocessing and normalization. Preprocessing involves cleaning and preparing the data by handling missing values, identifying and eliminating noise (inaccurate or irrelevant data), and resolving inconsistencies. Normalization, in contrast, converts the data into a uniform format and scale, making comparison and analysis easier. However, managing diverse data effectively comes with several issues. Missing data is a frequent problem, and accurately filling in these gaps can be complicated, especially for complex data types. Noise and inconsistencies can greatly reduce the accuracy and reliability of any analysis that follows. Additionally, merging data from various sources with differing formats and structures can be a challenging and time-consuming task. This chapter explores the specific issues faced when preprocessing and normalizing different data types, including numerical, categorical, textual, and image data. Real-world examples from India, such as the Aadhaar database and IoT-enabled smart cities, highlight the practical implications of these issues. By understanding best practices and emerging AI-driven trends, organizations can improve data reliability and enhance decision-making. © 2026 Elsevier Inc. All rights reserved.
- Source
- Multimodal Learning Using Heterogeneous Data;pp.75-88
- Date
- 01-01-2025
- Publisher
- Elsevier
- Subject
- data normalization techniques; Data preprocessing; data quality challenges; feature scaling; heterogeneous data integration; multisource data handling
- Coverage
- Nanjundan P., Department of Commerce, CHRIST University, Karnataka, Bengaluru, India; Singh S., Department of Data Science, CHRIST University, Lavasa Campus, Maharashtra, Pune, India; George J., Department of Computer Science, CHRIST University, Karnataka, Bengaluru, India; Eslamian S., Department of Water Science and Engineering, College of Agriculture, Isfahan University of Technology, Isfahan, Iran
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISBN: 978-044327528-9; 978-044327529-6
- Format
- online
- Language
- English
- Type
- Book chapter
Citation
Nanjundan, Preethi; Singh, Supreet; George, Jossy; Eslamian, Saeid, “Challenges in preprocessing and normalization of heterogenous data,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 17, 2026, https://archives.christuniversity.in/items/show/24208.
