Dublin Core
The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.

Title: A Multi-Modal Approach to Digital Document Stream Segmentation for Title Insurance Domain

Subject: BERT; binary classification; multi modal training; Page stream segmentation; title insurance; VGG16

Description: In the twenty-first century, storing and managing digital documents has become commonplace across corporate and public sectors worldwide. Physical documents are scanned in batches and stored in a digital archive as a heterogeneous document stream, referred to as a digital package. To facilitate Robotic Process Automation (RPA), this stream must be automatically segmented into independent, coherent multi-page documents by detecting the document boundaries. This is a common requirement of the Automated Document Management Systems (ADMS) of a title insurance company, where business operations are automated using RPA and the goal is to extract information from digital documents with minimal user intervention. This study proposes a multi-modal binary classification network that combines the textual and visual features of digital document pages, and evaluates and compares it against state-of-the-art baseline methods.
Image and textual features are extracted simultaneously from each input document page image through transfer learning, using a Visual Geometry Group 16 Convolutional Neural Network (VGG16-CNN) for the image features and a pre-trained Bidirectional Encoder Representations from Transformers model (Legal-BERT_base) for the textual features. The two feature sets are then fused and passed through a fully connected Multi-Layer Perceptron (MLP) layer, which classifies each page as either a First Page (FP) or an Other Page (OP). Real-time document image streams from a production business-process archive were obtained from a reputable Title Insurance (TI) company for the study. The obtained F1 scores of 97.37% and 97.15% are significantly higher than those of the two baseline models considered and well above the expected Straight Through Pass (STP) threshold defined by the process administrator. © 2013 IEEE.

Creator: Guha A.; Alahmadi A.; Samanta D.; Khan M.Z.; Alahmadi A.H.

Source: IEEE Access, Vol. 10, pp. 11341-11353.

Publisher: Institute of Electrical and Electronics Engineers Inc.
Date: 2022-01-01

Identifier: https://doi.org/10.1109/ACCESS.2022.3144185
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85123371503&doi=10.1109%2FACCESS.2022.3144185&partnerID=40&md5=6998ef3231722b8bc2dc89ce6c6fba1f

Rights: All Open Access; Gold Open Access

Relation: ISSN: 2169-3536

Format: Online

Language: English

Type: Article

Coverage: Guha A., Department of Data Science, CHRIST (Deemed to be University), Karnataka, Bengaluru, 560029, India, First American India Private Ltd., Karnataka, Bengaluru, 560038, India; Alahmadi A., Department of Computer Science and Information, Taibah University, Medina, 42353, Saudi Arabia; Samanta D., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bengaluru, 560029, India; Khan M.Z., Department of Computer Science and Information, Taibah University, Medina, 42353, Saudi Arabia; Alahmadi A.H., Department of Computer Science and Information, Taibah University, Medina, 42353, Saudi Arabia
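The abstract describes a two-stage idea: each page of a digital package is classified as First Page (FP) or Other Page (OP), and document boundaries are then read off that label sequence. The Python sketch below illustrates the idea under stated assumptions; it is not the authors' implementation. The fusion step is assumed to be plain concatenation (the abstract says only that the features are "fused"), a single logistic unit with hypothetical weights stands in for the MLP head, and the rule that every FP label opens a new document is an assumption about how the labels are consumed. The names fuse, classify_page and segment_stream are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

FIRST_PAGE = "FP"  # page opens a new document
OTHER_PAGE = "OP"  # page continues the current document


def fuse(image_features: Sequence[float], text_features: Sequence[float]) -> List[float]:
    """Hypothetical late fusion: concatenate the image (VGG16) vector and the
    text (Legal-BERT) vector. Concatenation is an assumption, not from the source."""
    return list(image_features) + list(text_features)


def classify_page(fused: Sequence[float], weights: Sequence[float],
                  bias: float, threshold: float = 0.5) -> str:
    """Stand-in for the MLP head: one logistic unit with hypothetical weights."""
    if len(fused) != len(weights):
        raise ValueError("feature and weight vectors must have the same length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    p_first = 1.0 / (1.0 + math.exp(-z))  # estimated probability of a first page
    return FIRST_PAGE if p_first >= threshold else OTHER_PAGE


def segment_stream(labels: Sequence[str]) -> List[Tuple[int, int]]:
    """Turn per-page FP/OP labels into (first, last) page-index ranges,
    assuming every FP label marks a document boundary."""
    documents: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, label in enumerate(labels):
        if label == FIRST_PAGE and start is not None:
            documents.append((start, i - 1))  # close the previous document
            start = i
        elif start is None:
            start = i  # the first page of the stream always opens a document
    if start is not None:
        documents.append((start, len(labels) - 1))
    return documents


if __name__ == "__main__":
    # Toy six-page digital package; labels invented for illustration only.
    labels = ["FP", "OP", "OP", "FP", "FP", "OP"]
    print(segment_stream(labels))  # [(0, 2), (3, 3), (4, 5)]
```

In practice the labels would come from applying a trained classifier to each page's fused features. The abstract does not say how a stream whose first page is predicted OP is handled, so the sketch simply opens a document at that page.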