Abstract
Sign language serves as a crucial mode of communication for individuals with hearing impairments, and its principles extend to the hand gesture systems used in technological applications such as machine operation and virtual reality. This dissertation presents the development and evaluation of a static American Sign Language (ASL) recognition system based on Convolutional Neural Networks (CNNs), also known as ConvNets. The accuracy of such a system depends heavily on the availability of labeled training samples, and this study addresses that challenge by proposing and evaluating an effective method for generating labeled images for training ASL recognition models.
Initially, CNN models were trained and tested on a benchmark dataset to establish baseline performance metrics. Subsequently, an approach using MediaPipe hand-tracking technology was developed to generate labeled samples, and the impact of incorporating these samples alongside the benchmark data on model performance was evaluated. Furthermore, the study explored the potential benefits of applying existing data augmentation methods and the Non-Local Means (NLM) denoising algorithm to enhance the combined dataset.
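To make the sample-generation step concrete, the sketch below shows one plausible way to use MediaPipe Hands to detect a hand in a frame, crop around it, apply OpenCV's NLM denoising, and save the result under a class label. The crop margin, output size, and directory layout are illustrative assumptions, not the dissertation's exact pipeline.

```python
# A minimal sketch, assuming OpenCV and MediaPipe are installed; the margin,
# output size, and file layout are assumptions, not the dissertation's pipeline.
import os
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def save_labeled_sample(image_bgr, label, out_dir, index, margin=20, size=64):
    """Detect a hand with MediaPipe, crop around it, denoise, and save it."""
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        results = hands.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return False  # no hand detected; skip this frame
    h, w, _ = image_bgr.shape
    landmarks = results.multi_hand_landmarks[0].landmark
    xs = [lm.x * w for lm in landmarks]
    ys = [lm.y * h for lm in landmarks]
    # Bounding box around all 21 hand landmarks, padded by a small margin.
    x0, x1 = max(int(min(xs)) - margin, 0), min(int(max(xs)) + margin, w)
    y0, y1 = max(int(min(ys)) - margin, 0), min(int(max(ys)) + margin, h)
    crop = cv2.resize(image_bgr[y0:y1, x0:x1], (size, size))
    # Non-Local Means denoising of the cropped sample (OpenCV implementation).
    crop = cv2.fastNlMeansDenoisingColored(crop, None, 10, 10, 7, 21)
    label_dir = os.path.join(out_dir, label)
    os.makedirs(label_dir, exist_ok=True)
    cv2.imwrite(os.path.join(label_dir, f"{label}_{index}.png"), crop)
    return True
```

Running this over webcam frames while the signer holds each letter yields a directory of labeled crops that can be merged with the benchmark dataset for training.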
This dissertation makes three main contributions to the literature: (1) it identifies a CNN model that outperforms existing models for ASL recognition; (2) it demonstrates how labeled training samples can be generated with MediaPipe; and (3) it identifies a data augmentation method that further improves the model's performance. The top-performing CNN model achieved 100% accuracy when trained on the base dataset and maintained 99.92% accuracy when trained on the enhanced dataset.
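For contribution (3), the abstract does not name the specific augmentation method identified, so the sketch below is only a generic, label-preserving augmentation pipeline of the kind such a study would evaluate, using Keras' ImageDataGenerator; the transforms and parameter values are assumptions.

```python
# A minimal sketch of a label-preserving augmentation pipeline; the specific
# method the dissertation identifies is not named in the abstract, so these
# transforms and parameters are illustrative assumptions. Horizontal flips are
# deliberately omitted because mirroring changes the handedness of an ASL sign.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=10,             # small rotations only
    width_shift_range=0.1,         # slight horizontal translation
    height_shift_range=0.1,        # slight vertical translation
    zoom_range=0.1,                # mild zoom in/out
    brightness_range=(0.8, 1.2),   # modest lighting variation
)

# x_train: (N, H, W, 3) image array; y_train: labels (placeholder names).
# for x_batch, y_batch in datagen.flow(x_train, y_train, batch_size=32):
#     ...  # train the CNN on augmented batches
```

Transforms are kept mild because aggressive geometric distortion can turn one static ASL hand shape into another and corrupt the labels.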