Vivek Kumar

Date of Award


First Advisor

Harold Hastings

Second Advisor

Michael Bergman


People with speech impediments continue to feel the effects of their inability to use speech technologies such as dialogue systems, transcription services, and virtual assistants. The first step toward improving these technologies is establishing better paradigms for identifying and removing speech impediments in audio files. In this study, we explore various machine learning and neural network-based approaches for automatic stutter recognition and correction. Unlike previous studies, we use a larger initial data set and augment our data with "real-world" background noise to test model robustness and performance in everyday settings. We tested a variety of basic machine learning models, all of which achieved test detection accuracies above 69%; the best accuracy, 72%, came from a neural network. However, when we introduced real-world noise into the input of our best classifier, its accuracy dropped to about 65%, indicating the need for training and testing approaches focused on real-world conditions. We then applied several frame size optimization techniques using our detection model and found the optimal analysis frame size to be 500 ms. Building on this, we implemented a full stutter removal algorithm and successfully removed stutters from an audio clip. Our corrections reduced human-recognized stutters by 80 to 87.5% and improved digital assistant translation by 30%, a significant improvement.
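The abstract does not spell out the augmentation, framing, or removal procedures, so the following is only a rough illustration of how those three steps could fit together, not the thesis's actual method. It assumes mono audio as a list of floats, assumes noise is mixed in at a chosen signal-to-noise ratio (SNR, in dB), splits audio into non-overlapping 500 ms frames, and drops every frame a classifier flags as stuttered. The function names `frame_signal`, `mix_noise`, and `remove_stutters`, and the toy energy-threshold classifier, are hypothetical stand-ins for the trained models described above.

```python
import math
from typing import Callable, List, Sequence


def frame_signal(samples: Sequence[float], sample_rate: int, frame_ms: int = 500) -> List[List[float]]:
    """Split a mono signal into consecutive, non-overlapping frames of frame_ms milliseconds.

    The last frame may be shorter than frame_ms so that no audio is discarded.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")
    return [list(samples[i:i + frame_len]) for i in range(0, len(samples), frame_len)]


def mix_noise(clean: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Add background noise to a clean signal at a target SNR in dB (assumed augmentation scheme).

    The noise is repeated or truncated to the clean signal's length, then scaled so that
    10 * log10(P_clean / P_added_noise) equals snr_db, where P is mean squared amplitude.
    """
    if not noise:
        raise ValueError("noise must be non-empty")
    if not clean:
        return []
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(x * x for x in tiled) / len(tiled)
    if p_noise == 0:
        return list(clean)  # silent noise: nothing to add
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, tiled)]


def remove_stutters(
    samples: Sequence[float],
    sample_rate: int,
    is_stutter: Callable[[List[float]], bool],
    frame_ms: int = 500,
) -> List[float]:
    """Keep only the frames that the (hypothetical) classifier does not flag as stuttered."""
    kept: List[float] = []
    for frame in frame_signal(samples, sample_rate, frame_ms):
        if not is_stutter(frame):
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    sr = 1000  # toy sample rate, so one 500 ms frame is 500 samples
    signal = [0.1] * 500 + [0.9] * 500 + [0.1] * 1000

    def toy_is_stutter(frame: List[float]) -> bool:
        # Hypothetical placeholder for a trained detector: flag unusually loud frames.
        return sum(abs(x) for x in frame) / len(frame) > 0.5

    cleaned = remove_stutters(signal, sr, toy_is_stutter)
    print(len(signal), len(cleaned))  # 2000 1500
    noisy = mix_noise(signal, [1.0, -1.0], snr_db=10.0)
```

Dropping whole frames is the simplest possible removal strategy; a practical system would likely need finer boundaries or short crossfades at the cut points to avoid audible clicks.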


Ask at the Simon's Rock Alumni Library circulation desk for the companion piece to this thesis.
