Abstract

Speech impediments affect verbal and nonverbal communication, leading individuals to rely on sign language and other alternative means of communication. However, non-signers often struggle to communicate with signers because they lack knowledge of sign language. Recent advances in deep learning and computer vision have improved gesture recognition, enabling innovative solutions for sign language translation. This project proposes a computer vision-based deep learning application that translates sign language gestures into text, enhancing communication between signers and non-signers. It extracts spatial and temporal information from video sequences, employing a Convolutional Neural Network (CNN) to process depth and point data and a Gated Recurrent Unit (GRU) to improve temporal feature extraction. Temporal tokenization further refines the feature representation and promotes efficient resource utilization. The system is trained on the Word-Level American Sign Language (WLASL) dataset, the largest publicly available ASL dataset, containing over 2,000 words signed by more than 100 individuals. The model recognizes 20 gestures with 94% accuracy. The final implementation is a web application that delivers real-time text translation, fostering seamless communication between signers and non-signers and addressing accessibility challenges for individuals with speech impairments.
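
The abstract describes a per-frame CNN feeding a GRU over the frame sequence. The following is a minimal sketch of that kind of pipeline, written in PyTorch; the layer sizes, frame resolution, class names, and the `SignGestureClassifier` module itself are illustrative assumptions, since the paper does not publish its hyperparameters or code.

```python
# Hypothetical sketch of a CNN + GRU video classifier in the spirit of the
# architecture described in the abstract. All sizes are assumed, not taken
# from the paper.
import torch
import torch.nn as nn


class SignGestureClassifier(nn.Module):
    def __init__(self, num_classes=20, hidden_size=256):
        super().__init__()
        # Per-frame CNN: extracts spatial features from each video frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> (batch * frames, 64, 1, 1)
        )
        # GRU: models temporal dependencies across the sequence of frame features.
        self.gru = nn.GRU(input_size=64, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, video):
        # video: (batch, frames, channels, height, width)
        b, t, c, h, w = video.shape
        feats = self.cnn(video.view(b * t, c, h, w)).view(b, t, -1)
        _, last_hidden = self.gru(feats)      # last_hidden: (1, batch, hidden)
        return self.fc(last_hidden.squeeze(0))


# Example: a batch of 2 clips, 16 frames each, 112x112 RGB.
logits = SignGestureClassifier()(torch.randn(2, 16, 3, 112, 112))
print(logits.shape)  # torch.Size([2, 20])
```

In such a design the CNN handles spatial structure within each frame while the GRU aggregates evidence over time; the paper's temporal-tokenization step would further reduce the number of frame features passed to the recurrent stage, though its exact formulation is not given in the abstract.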


Copyright Kohat University of Science and Technology (KUST) 2025