InclusiVision: Exploring Deep Learning Techniques for Enhanced Audio Description Generation
- Title
- InclusiVision: Exploring Deep Learning Techniques for Enhanced Audio Description Generation
- Creator
- Chettri, Saiyam; Kerketta, Abhay Charles; Nizar, Banu P. K.; Goyal, Akash
- Description
- The rise of technology has widened access to entertainment media in formats such as audio, images, videos, and memes. This diverse multimedia landscape, however, poses challenges for visually impaired individuals, who rely primarily on auditory means and cannot consume the visual content freely available today. InclusiVision addresses this challenge by introducing audio descriptions (AD) for images and short videos, generated automatically with deep learning models. These narrated verbal descriptions provide details about visual elements such as people, objects, colors, and settings, making the content more accessible and comprehensible for the visually impaired. InclusiVision comprises two phases: the Image Description phase, which generates short audio descriptions for images, and the Video Description phase, which narrates key visual aspects of short videos. Both phases generate short captions that explain the key points of the visuals, and both use a basic encoder-decoder model to do so. The primary objective of InclusiVision is to promote accessibility and inclusivity in entertainment and educational media by providing contextually relevant audio descriptions. © 2026, Bentham Books imprint.
- Source
- Recent Advancements in Computational Intelligence: Concepts, Methodologies and Applications (Part 2); pp. 98-119
- Date
- 01-01-2026
- Publisher
- Bentham Science Publishers Ltd
- Subject
- CNN; Image captioning; LSTM; Video captioning; Visually-impaired
- Coverage
- Chettri S., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India; Kerketta A.C., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India; Nizar B.P.K., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India; Goyal A., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India
- Rights
- Restricted Access; Hardcopy may be available in the library
- Relation
- ISBN: 979-889881288-1; 979-889881289-8
- Format
- online
- Language
- English
- Type
- Book chapter
Collection
Citation
Chettri, Saiyam; Kerketta, Abhay Charles; Nizar, Banu P. K.; Goyal, Akash, “InclusiVision: Exploring Deep Learning Techniques for Enhanced Audio Description Generation,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 18, 2026, https://archives.christuniversity.in/items/show/24494.
