InclusiVision: Exploring Deep Learning Techniques for Enhanced Audio Description Generation

Title: InclusiVision: Exploring Deep Learning Techniques for Enhanced Audio Description Generation
Creator: Chettri, Saiyam; Kerketta, Abhay Charles; Nizar, Banu P. K.; Goyal, Akash
Description: The rise of technology has facilitated access to entertainment media in formats such as audio, images, videos, and memes. This diverse multimedia landscape, however, poses challenges for visually impaired individuals, who rely primarily on auditory means and cannot consume the visual content freely available today. InclusiVision addresses this challenge by generating audio descriptions (AD) for images and short videos using deep learning. These narrated verbal descriptions provide details about visual elements such as people, objects, colors, and settings, making the content more accessible and comprehensible for the visually impaired. InclusiVision comprises two phases: the Image Description phase, which generates short audio descriptions for images, and the Video Description phase, which narrates key visual aspects of short videos. Both phases generate short captions that explain the key points of the visuals, and both rely on a basic encoder-decoder model. The primary objective of InclusiVision is to promote accessibility and inclusivity in entertainment and educational media by providing contextually relevant audio descriptions. © 2026, Bentham Books imprint.
Source: Recent Advancements in Computational Intelligence: Concepts, Methodologies and Applications (Part 2); pp. 98-119
Date: 01-01-2026
Publisher: Bentham Science Publishers Ltd
Subject: CNN; Image captioning; LSTM; Video captioning; Visually-impaired
Coverage: Chettri S., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India; Kerketta A.C., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India; Nizar B.P.K., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India; Goyal A., Department of Computer Science, CHRIST (Deemed to be University), Karnataka, Bangalore, India
Rights: Restricted Access; Hardcopy may be available in the library
Relation: ISBN: 979-889881288-1; 979-889881289-8
Format: online
Language: English
Type: Book chapter
Identifier: https://doi.org/10.2174/9798898812881126010007

https://www.scopus.com/pages/publications/105033315352?origin=resultslist

Citation

Chettri, Saiyam; Kerketta, Abhay Charles; Nizar, Banu P. K.; Goyal, Akash, “InclusiVision: Exploring Deep Learning Techniques for Enhanced Audio Description Generation,” CHRIST (Deemed To Be University) Institutional Repository, accessed June 18, 2026, https://archives.christuniversity.in/items/show/24494.
