Siamese-Based Architecture for Cross-Lingual Plagiarism Detection in English-Hindi Language Pairs
- Title
- Siamese-Based Architecture for Cross-Lingual Plagiarism Detection in English-Hindi Language Pairs
- Creator
- Agarwal B.; Gupta M.K.; Sharma H.; Poonia R.C.
- Description
- Cross-lingual plagiarism detection (CLPD) is a challenging problem in natural language processing. Cross-lingual plagiarism occurs when text is translated from another language and reused without proper acknowledgment. Most existing methods give good results for monolingual plagiarism detection, but their performance on CLPD is limited, because it is difficult to represent text from two different languages in a common semantic space. In this article, a novel Siamese architecture-based model is proposed to detect cross-lingual plagiarism in English-Hindi language pairs. The proposed model combines a convolutional neural network (CNN) and a bidirectional long short-term memory (Bi-LSTM) network to learn the semantic similarity between cross-lingual sentences. The CNN learns the local context of words, whereas the Bi-LSTM learns the global context of sentences in the forward and backward directions. The proposed model is evaluated on a benchmark data set, the Microsoft paraphrase corpus, converted into English-Hindi language pairs. It outperforms the other models, achieving weighted average precision, recall, and F1-measure scores of 67%, 72%, and 67%, respectively. The experimental results show the effectiveness of the proposed model over the baseline models, which is attributed to its efficient representation of cross-lingual text. Copyright 2023, Mary Ann Liebert, Inc., publishers.
- Source
- Big Data, Vol. 11, No. 1, pp. 48-58.
- Date
- 2023-01-01
- Publisher
- Mary Ann Liebert Inc.
- Subject
- Bi-LSTM; CNN; cross-lingual plagiarism detection; deep learning; Siamese architecture
- Coverage
- Agarwal B., Department of Computer Science and Engineering, Indian Institute of Information Technology Kota (IIIT Kota), Rajasthan, Jaipur, India; Gupta M.K., Department of Computer Science and Engineering, Swami Keshvanand Institute of Technology, Management and Gramothan, Rajasthan, Jaipur, India; Sharma H., Department of Computer Science and Engineering, Rajasthan Technical University, Rajasthan, Kota, India; Poonia R.C., Department of Computer Science, Christ (Deemed to Be University), Karnataka, Bangalore, India
- Rights
- Restricted Access
- Relation
- ISSN: 2167-6461; PubMed ID: 36260373
- Format
- Online
- Language
- English
- Type
- Article
Citation
Agarwal B.; Gupta M.K.; Sharma H.; Poonia R.C., “Siamese-Based Architecture for Cross-Lingual Plagiarism Detection in English-Hindi Language Pairs,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 25, 2025, https://archives.christuniversity.in/items/show/14438.