Preprocessing Big Data using Partitioning Method for Efficient Analysis
- Title
- Preprocessing Big Data using Partitioning Method for Efficient Analysis
- Creator
- Reena M.J.
- Description
- Big data collection is the process of gathering unprocessed and unstructured data from disparate sources. In the data deluge, the large volume of data collected and integrated contains missing values, outliers, and redundant records. This makes the big dataset unsuitable for processing and mining knowledge. It also consumes a large amount of valuable storage on redundant and meaningless data. Applying mining techniques to such data leads to wrong inferences. Preprocessing is therefore essential in order to store and process a big dataset effectively and to draw correct inferences. When data is preprocessed before analytics, storage consumption is lower and computation and communication complexity are reduced. The analytics results are of higher quality, and the time needed for processing is considerably reduced. Preprocessing is a prerequisite for applying any analytics algorithm to obtain valuable patterns. The quality of knowledge mined from a large volume of big data depends on the quality of the input data used for processing. The major steps in big data preprocessing include data integration from disparate sources, missing value imputation, outlier detection and treatment, and handling of redundant data. The integration process includes the steps of extraction, transformation, and loading. The extraction step gathers the useful data to be used for analytics, and the transformation step organizes the collected data into a structured format suitable for analytics. The role of the load step is to store the transformed data in secure storage so that it can be retrieved and processed effectively in the future. This work provides preprocessing techniques for big data that deal with missing values and outliers and result in quality data partitions. © 2023 IEEE.
- Source
- Proceedings of IEEE InC4 2023 - 2023 IEEE International Conference on Contemporary Computing and Communications
- Date
- 2023-01-01
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Subject
- Data Analytics; Partitioning; Preprocessing; Smart Data
- Coverage
- Reena M.J., CHRIST (Deemed to Be University), Department of Computer Science and Engineering, Bangalore, India
- Rights
- Restricted Access
- Relation
- ISBN: 979-835033577-4
- Format
- Online
- Language
- English
- Type
- Conference paper
Citation
Reena M.J., “Preprocessing Big Data using Partitioning Method for Efficient Analysis,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 25, 2025, https://archives.christuniversity.in/items/show/19801.
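
The Description above names four preprocessing concerns (redundant records, missing values, outliers, and partitioning) without stating which techniques the paper uses. The following is a minimal sketch of such a pipeline under stated assumptions, not the author's method: median imputation, the 1.5 × IQR outlier rule, fixed-size partitions, and all function and field names are illustrative choices introduced here.

```python
from statistics import median, quantiles


def deduplicate(rows):
    """Drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Replace None in `field` with the median of the observed values (assumed strategy)."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = median(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def drop_iqr_outliers(rows, field, k=1.5):
    """Remove records whose `field` lies outside [Q1 - k*IQR, Q3 + k*IQR] (assumed rule)."""
    values = [r[field] for r in rows]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [r for r in rows if low <= r[field] <= high]


def partition(rows, size):
    """Split the cleaned records into consecutive partitions of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def preprocess(rows, field, partition_size):
    """Run the hypothetical pipeline: deduplicate, impute, filter outliers, partition."""
    rows = deduplicate(rows)
    rows = impute_median(rows, field)
    rows = drop_iqr_outliers(rows, field)
    return partition(rows, partition_size)
```

Calling `preprocess(records, "temp", 2)` on a list of dictionaries that share a numeric `temp` field would return a list of cleaned partitions. Dropping outliers rather than capping or correcting them is only one possible treatment, chosen here for brevity.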