IIRM: Intelligent Information Retrieval Model for Structured Documents by One-Shot Training Using Computer Vision
- Title
- IIRM: Intelligent Information Retrieval Model for Structured Documents by One-Shot Training Using Computer Vision
- Creator
- Guha A.; Samanta D.; Islam S.H.
- Description
- Various information retrieval algorithms have matured in recent years to facilitate data extraction from structured (i.e., following a predefined template) digital document images, primarily to manage and automate organizations' invoice and bill reimbursement processes. These algorithms are either rule-based or machine-learning-based, and each approach has its own advantages and disadvantages. Rule-based algorithms struggle to generalize and need periodic adjustment, whereas supervised machine-learning approaches need extensive training data and substantial time and effort for manual annotation. The proposed system attempts to address both problems with a one-shot training approach that uses image processing, template matching, and optical character recognition. Besides invoices, the model extends to any structured document, such as closing disclosures, bills, and tax receipts. The model is validated against six structured document types obtained from a reputed title insurance (TI) company. A comprehensive analysis of the experimental results shows entity-wise extraction accuracy between 73.91% and 100% and a straight-through pass rate of 81.81%, which is within business-acceptable precision for a live environment. Of the 32 entities tested, 17 outperformed all state-of-the-art techniques, whose maximum accuracy has been 93%, and only on invoices or sales receipts. Based on the experimental results, the system has been put into operation to support the robotic process automation of the TI company mentioned above. © 2022, King Fahd University of Petroleum & Minerals.
- Source
- Arabian Journal for Science and Engineering, Vol. 48, No. 2, pp. 1285-1301.
- Date
- 2023-01-01
- Publisher
- Institute for Ionics
- Subject
- Best match region; Digital image processing; Information extraction; One-shot training; Structured document; Template matching; Title insurance
- Coverage
- Guha A., Department of Computer Science, Christ (Deemed to be) University, Karnataka, Bangalore, 560029, India, First American India Private Limited, Karnataka, Bangalore, 560038, India; Samanta D., Department of Computer Science, Christ (Deemed to be) University, Karnataka, Bangalore, 560029, India; Islam S.H., Department of Computer Science and Engineering, Indian Institute of Information Technology Kalyani, West Bengal, Kalyani, 741235, India
- Rights
- Restricted Access
- Relation
- ISSN: 2193-567X
- Format
- Online
- Language
- English
- Type
- Article
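The description names image processing, template matching, and optical character recognition as the ingredients of the one-shot approach but does not show how they fit together. The Python sketch below is an assumption for illustration, not the authors' implementation. It treats a single annotated sample document as the "training" step, storing for each entity a small anchor patch plus the offset and size of the value box relative to it (`EntityTemplate`, `best_match`, and `extract_region` are hypothetical names). On a new document, it locates the best match region for the anchor by exhaustive zero-mean normalized cross-correlation and crops the value box. The OCR step that would read the crop is omitted.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def ncc(image: Image, patch: Image, top: int, left: int) -> float:
    """Zero-mean normalized cross-correlation between `patch` and the
    same-sized window of `image` whose top-left corner is (top, left)."""
    h, w = len(patch), len(patch[0])
    win = [image[top + r][left + c] for r in range(h) for c in range(w)]
    tpl = [patch[r][c] for r in range(h) for c in range(w)]
    mw, mt = sum(win) / len(win), sum(tpl) / len(tpl)
    num = sum((a - mw) * (b - mt) for a, b in zip(win, tpl))
    den = sqrt(sum((a - mw) ** 2 for a in win) * sum((b - mt) ** 2 for b in tpl))
    return num / den if den else 0.0  # flat windows carry no signal


def best_match(image: Image, patch: Image) -> Tuple[int, int, float]:
    """Slide `patch` over every position and keep the highest-scoring one
    (the first one found, if several tie)."""
    h, w = len(patch), len(patch[0])
    best = (0, 0, -2.0)  # NCC lies in [-1, 1], so any window beats this
    for top in range(len(image) - h + 1):
        for left in range(len(image[0]) - w + 1):
            score = ncc(image, patch, top, left)
            if score > best[2]:
                best = (top, left, score)
    return best


@dataclass
class EntityTemplate:
    """Hypothetical one-shot annotation taken from a single sample document."""
    anchor: Image  # e.g. the printed label next to the field
    d_row: int     # value box offset from the anchor's top-left corner
    d_col: int
    height: int    # value box size in pixels
    width: int


def extract_region(image: Image, tpl: EntityTemplate) -> Image:
    """Find the anchor, then crop the value box that would be passed to OCR."""
    top, left, _ = best_match(image, tpl.anchor)
    r0, c0 = top + tpl.d_row, left + tpl.d_col
    return [row[c0:c0 + tpl.width] for row in image[r0:r0 + tpl.height]]


# Toy 6x8 "document": a diagonal label stroke with two value pixels to its right.
doc = [[0] * 8 for _ in range(6)]
doc[2][1], doc[3][2] = 9, 9
doc[2][4] = doc[2][5] = 5
tpl = EntityTemplate(anchor=[[9, 0], [0, 9]], d_row=0, d_col=3, height=1, width=2)
print(extract_region(doc, tpl))  # [[5, 5]]
```

Normalized cross-correlation is used here because it tolerates uniform brightness and contrast changes between scans. A production system would more likely rely on an optimized routine such as OpenCV's `matchTemplate` rather than this pure-Python loop.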
Citation
Guha A.; Samanta D.; Islam S.H., “IIRM: Intelligent Information Retrieval Model for Structured Documents by One-Shot Training Using Computer Vision,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 26, 2025, https://archives.christuniversity.in/items/show/14397.