File Validation in the Data Ingestion Process Using Apache NiFi
- Title
- File Validation in the Data Ingestion Process Using Apache NiFi
- Creator
- Irfan M.; Gangadhar A.; George J.
- Description
- In the industries of today, development and maintenance of data pipelines is of paramount importance. With large volumes of data being generated across industries on a continuous basis, there is a growing need to process and store this ingested data in a fast and efficient manner. Apache NiFi is one such tool which possesses crucial capabilities that can be used to enhance, modify, and automate data pipelines. However, automation of the ingestion process creates certain inherent issues which, if left unresolved, tend to be detrimental to the entire ingestion process. These issues vary in nature, ranging from corrupted data to changes in the file schema, to name a few. In this paper, a solution to this problem is proposed. By exploiting Apache NiFi's custom processor development capabilities, problem-specific processors can be designed and deployed which can ensure accurate validation of the ingestion process on a real-time basis. To demonstrate this, two processors were developed as a proof-of-concept, which tackle specific file-related validation issues in the ingestion process: that of the file size and that of the ingestion frequency. These custom-built processors are designed to be inserted into the pipeline at key points to ensure that the ingested data is validated against certain standards and requirements. Having successfully demonstrated its capabilities, the paper presents the exploitation of Apache NiFi's custom processor capabilities as a potential way forward to resolve the plethora of ingestion issues in industry today. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
- Source
- Lecture Notes in Networks and Systems, Vol-922 LNNS, pp. 299-310.
- Date
- 2024-01-01
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Subject
- Apache NiFi; Custom processor; Data ingestion; File validation; Frequency validation
- Coverage
- Irfan M., CHRIST (Deemed to be University), Bangalore, 560029, India; Gangadhar A., Binghamton University, State University of New York, Binghamton, 13902, NY, United States; George J., CHRIST (Deemed to be University), Bangalore, 560029, India
- Rights
- Restricted Access
- Relation
- ISSN: 2367-3370; ISBN: 978-981970974-8
- Format
- Online
- Language
- English
- Type
- Conference paper
Citation
Irfan M.; Gangadhar A.; George J., “File Validation in the Data Ingestion Process Using Apache NiFi,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 24, 2025, https://archives.christuniversity.in/items/show/19424.