<?xml version="1.0" encoding="UTF-8"?>
<item xmlns="http://omeka.org/schemas/omeka-xml/v5" itemId="15490" public="1" featured="0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://omeka.org/schemas/omeka-xml/v5 http://omeka.org/schemas/omeka-xml/v5/omeka-xml-5-0.xsd" uri="https://archives.christuniversity.in/items/show/15490?output=omeka-xml" accessDate="2026-05-01T19:42:16+00:00">
  <collection collectionId="5">
    <elementSetContainer>
      <elementSet elementSetId="1">
        <name>Dublin Core</name>
        <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.</description>
        <elementContainer>
          <element elementId="50">
            <name>Title</name>
            <description>A name given to the resource</description>
            <elementTextContainer>
              <elementText elementTextId="64">
                <text>Articles</text>
              </elementText>
            </elementTextContainer>
          </element>
        </elementContainer>
      </elementSet>
    </elementSetContainer>
  </collection>
  <itemType itemTypeId="19">
    <name>Article</name>
    <description>Faculty Publications - Articles</description>
  </itemType>
  <elementSetContainer>
    <elementSet elementSetId="1">
      <name>Dublin Core</name>
      <description>The Dublin Core metadata element set is common to all Omeka records, including items, files, and collections. For more information, see http://dublincore.org/documents/dces/.</description>
      <elementContainer>
        <element elementId="50">
          <name>Title</name>
          <description>A name given to the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113609">
              <text>A Multi-Modal Approach to Digital Document Stream Segmentation for Title Insurance Domain</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="49">
          <name>Subject</name>
          <description>The topic of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113610">
              <text>BERT; binary classification; multi modal training; Page stream segmentation; title insurance; VGG16</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="41">
          <name>Description</name>
          <description>An account of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113611">
              <text>In the twenty-first century, storing and managing digital documents has become commonplace in corporate and public sectors around the world. Physical documents are scanned in batches and stored in a digital archive as a heterogeneous document stream, referred to as a digital package. To facilitate Robotic Process Automation (RPA), the document stream must be automatically segmented into a set of independent, coherent multi-page documents by detecting the appropriate document boundaries. This is a common requirement of a Title Insurance (TI) company's Automated Document Management Systems (ADMS), where business operations are automated using RPA and the goal is to extract information from digital documents with minimal user intervention. The current study proposes and evaluates a multi-modal binary classification network that incorporates the textual and visual features of digital document pages, and compares it with state-of-the-art baseline methodologies. Image and textual features are extracted simultaneously from the input document image by passing it through a Visual Geometry Group 16 - Convolutional Neural Network (VGG16-CNN) and a pre-trained Bidirectional Encoder Representations from Transformers (Legal-BERT_{base}) model, respectively, using transfer learning. The two feature sets are then fused and passed through a fully connected layer of a Multi-Layer Perceptron (MLP) to classify each page as either First Page (FP) or Other Page (OP). Real-time document image streams from a production business process archive were obtained from a reputed TI company for the study. The obtained F_{1} scores of 97.37% and 97.15% are significantly higher than the accuracies of the two baseline models considered and well above the expected Straight Through Pass (STP) threshold defined by the process admin. © 2013 IEEE.</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="39">
          <name>Creator</name>
          <description>An entity primarily responsible for making the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113612">
              <text>Guha A.; Alahmadi A.; Samanta D.; Khan M.Z.; Alahmadi A.H.</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="48">
          <name>Source</name>
          <description>A related resource from which the described resource is derived</description>
          <elementTextContainer>
            <elementText elementTextId="113613">
              <text>IEEE Access, Vol-10, pp. 11341-11353.</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="45">
          <name>Publisher</name>
          <description>An entity responsible for making the resource available</description>
          <elementTextContainer>
            <elementText elementTextId="113614">
              <text>Institute of Electrical and Electronics Engineers Inc.</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="40">
          <name>Date</name>
          <description>A point or period of time associated with an event in the lifecycle of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113615">
              <text>2022-01-01</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="43">
          <name>Identifier</name>
          <description>An unambiguous reference to the resource within a given context</description>
          <elementTextContainer>
            <elementText elementTextId="113616">
              <text>&lt;a href="https://doi.org/10.1109/ACCESS.2022.3144185" target="_blank" rel="noreferrer noopener"&gt;https://doi.org/10.1109/ACCESS.2022.3144185&lt;/a&gt;
&lt;br /&gt;&lt;br /&gt;&lt;a href="https://www.scopus.com/inward/record.uri?eid=2-s2.0-85123371503&amp;amp;doi=10.1109%2FACCESS.2022.3144185&amp;amp;partnerID=40&amp;amp;md5=6998ef3231722b8bc2dc89ce6c6fba1f" target="_blank" rel="noreferrer noopener"&gt;https://www.scopus.com/inward/record.uri?eid=2-s2.0-85123371503&amp;amp;doi=10.1109%2fACCESS.2022.3144185&amp;amp;partnerID=40&amp;amp;md5=6998ef3231722b8bc2dc89ce6c6fba1f&lt;/a&gt;</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="47">
          <name>Rights</name>
          <description>Information about rights held in and over the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113617">
              <text>All Open Access; Gold Open Access</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="46">
          <name>Relation</name>
          <description>A related resource</description>
          <elementTextContainer>
            <elementText elementTextId="113618">
              <text>ISSN: 21693536</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="42">
          <name>Format</name>
          <description>The file format, physical medium, or dimensions of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113619">
              <text>Online</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="44">
          <name>Language</name>
          <description>A language of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113620">
              <text>English</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="51">
          <name>Type</name>
          <description>The nature or genre of the resource</description>
          <elementTextContainer>
            <elementText elementTextId="113621">
              <text>Article</text>
            </elementText>
          </elementTextContainer>
        </element>
        <element elementId="38">
          <name>Coverage</name>
          <description>The spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant</description>
          <elementTextContainer>
            <elementText elementTextId="113622">
              <text>Guha A., Department of Data Science, CHRIST (Deemed to be University), Karnataka, Bengaluru, 560029, India, First American India Private Ltd., Karnataka, Bengaluru, 560038, India; Alahmadi A., Department of Computer Science and Information, Taibah University, Medina, 42353, Saudi Arabia; Samanta D., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bengaluru, 560029, India; Khan M.Z., Department of Computer Science and Information, Taibah University, Medina, 42353, Saudi Arabia; Alahmadi A.H., Department of Computer Science and Information, Taibah University, Medina, 42353, Saudi Arabia</text>
            </elementText>
          </elementTextContainer>
        </element>
      </elementContainer>
    </elementSet>
  </elementSetContainer>
</item>
