Enhancing Regional Language Proficiency in Large Language Models Through Translated Datasets
- Title
- Enhancing Regional Language Proficiency in Large Language Models Through Translated Datasets
- Creator
- Tiwari, Satyam; Kujur, Lawrence; Shanbhog, Manjula
- Description
- Although Large Language Models (LLMs) have made significant progress in Natural Language Processing, the lack of high-quality training data frequently limits their performance in regional languages. To improve LLM competency, this study methodically translates an English dataset into Bhojpuri, a low-resource language. We build this new dataset using a structured translation methodology and then fine-tune a pretrained LLM on it. A comparison of the model's performance before and after fine-tuning shows a significant improvement in its capacity to produce contextually relevant and culturally appropriate responses in Bhojpuri. Our findings show that this translation-centric approach offers a practical and affordable way to enhance the usefulness and inclusivity of LLMs, increasing the effectiveness and accessibility of these powerful AI tools for underrepresented and marginalized linguistic groups globally. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
- Source
- Lecture Notes in Networks and Systems; Volume 1741 LNNS; pp. 332-347
- Date
- 01-01-2026
- Publisher
- Springer Science and Business Media Deutschland GmbH
- Subject
- Dataset Translation; Fine-tuning; Large Language Models; Machine Translation; Natural Language Processing; Regional Languages
- Coverage
- Tiwari S., Christ (Deemed to Be University), School of Science, NCR, Delhi, India; Kujur L., Christ (Deemed to Be University), School of Science, NCR, Delhi, India; Shanbhog M., Christ (Deemed to Be University), School of Science, NCR, Delhi, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISSN: 2367-3370; ISBN: 978-303212826-3
- Format
- online
- Language
- English
- Type
- Conference paper
Citation
Tiwari, Satyam; Kujur, Lawrence; Shanbhog, Manjula, “Enhancing Regional Language Proficiency in Large Language Models Through Translated Datasets,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 17, 2026, https://archives.christuniversity.in/items/show/25378.
