Academic Programs and Concentrations: Computer Science; Philosophy
Senior Project submitted to The Division of Science, Mathematics and Computing of Bard College.
Emotion recognition in speech using deep learning begins with the problem of translating raw audio into an informationally rich feature set on which a neural network can be trained, ideally yielding a machine learning system that accurately classifies the paralinguistic content of speech. We performed feature extraction with Praat, a tool for phonetic analysis, and obtained a variety of harmonic, intensity, and spectral characteristics that together formed the training vectors for our machine learning system.
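To make the idea of turning raw audio into a fixed-length training vector concrete, here is a minimal sketch that does not use Praat. It computes three simplified stand-ins per frame: RMS intensity in dBFS, zero-crossing rate (a crude noisiness cue, not Praat's pitch or harmonicity analysis), and spectral centroid from a naive DFT. It then summarises each measure over the utterance as a mean and standard deviation. All function names, the frame and hop sizes, and the mean/std aggregation are hypothetical choices for illustration, not the project's actual pipeline.

```python
import math
from statistics import mean, pstdev


def frames(signal, frame_len, hop):
    """Split a list of samples into overlapping frames (hypothetical helper)."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, hop)]


def intensity_db(frame, eps=1e-12):
    """Frame RMS in dB relative to full scale (not Praat's calibrated intensity)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms + eps)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign changes."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency over DFT bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    num = den = 0.0
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mag = math.hypot(re, im)
        num += mag * k * sample_rate / n  # bin k corresponds to k * fs / N Hz
        den += mag
    return num / den if den > 0 else 0.0


def utterance_vector(signal, sample_rate, frame_len=256, hop=128):
    """Summarise frame-level measures as [mean, std] per measure.

    Output order: intensity mean/std, ZCR mean/std, centroid mean/std.
    """
    fr = frames(signal, frame_len, hop)
    per_measure = [
        [intensity_db(f) for f in fr],
        [zero_crossing_rate(f) for f in fr],
        [spectral_centroid(f, sample_rate) for f in fr],
    ]
    vec = []
    for values in per_measure:
        vec.extend([mean(values), pstdev(values)])
    return vec


if __name__ == "__main__":
    # Toy input: a 1000 Hz sine sampled at 8000 Hz for 0.1 s.
    tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(800)]
    print(utterance_vector(tone, 8000))
```

For that toy tone the intensity mean is about -3.01 dBFS (the RMS of a unit sine is 1/√2) and the centroid mean is about 1000 Hz, because each 256-sample frame holds a whole number of periods and all spectral energy falls in one bin. A real system would swap these stand-ins for Praat's own measurements and many more of them.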
While a number of different machine learning approaches have proved successful, there has been a strong resurgence in the application of so-called ‘deep learning’ systems to machine learning problems, owing to the striking success they have achieved on modern hardware. After empirically validating a network architecture and a set of learning parameters, we trained six neural networks for emotion classification. All six networks shared the same architecture and learning parameters, so that differences in their performance could be attributed to the training features alone; this gave us empirical data about the relative efficacy of different features. Our six training sets represent the pairwise extraction of different subsets from a larger pool of features, ranging from spectral and energy characteristics to periodicity. This allowed us to paint a more general picture of which facets of human speech are most relevant to, and indicative of, emotionality.
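The experimental design, with one fixed architecture and several feature subsets, can be sketched as a plan that pairs each subset with an identical copy of the configuration. The abstract does not say exactly how the six sets were formed. One reading that yields exactly six is every unordered pair drawn from four feature groups, since C(4, 2) = 6, and this sketch assumes that reading. The group names, their column indices, and the configuration values are all hypothetical placeholders, and the training step itself is omitted.

```python
from itertools import combinations

# Hypothetical feature groups: names and column indices are illustrative only.
FEATURE_GROUPS = {
    "spectral": [0, 1, 2],
    "energy": [3, 4],
    "periodicity": [5, 6],
    "harmonic": [7, 8],
}

# One fixed architecture/learning configuration shared by every network.
# The values are placeholders, not the project's validated settings.
SHARED_CONFIG = {"hidden_layers": (64, 32), "learning_rate": 0.01, "epochs": 50}


def pairwise_training_sets(groups):
    """Map every unordered pair of group names to the union of their column indices."""
    return {
        (a, b): sorted(groups[a] + groups[b])
        for a, b in combinations(sorted(groups), 2)
    }


def select_columns(rows, columns):
    """Project each feature vector onto the chosen columns."""
    return [[row[c] for c in columns] for row in rows]


def experiment_plan(groups, config):
    """Pair each feature subset with its own copy of the shared config,
    so that only the input features differ between runs."""
    return [
        {"features": pair, "columns": cols, "config": dict(config)}
        for pair, cols in pairwise_training_sets(groups).items()
    ]


if __name__ == "__main__":
    for run in experiment_plan(FEATURE_GROUPS, SHARED_CONFIG):
        print(run["features"], run["columns"])
```

With four groups the plan contains six runs, the first being `("energy", "harmonic")` with columns `[3, 4, 7, 8]`. Copying the configuration per run keeps any later per-run bookkeeping from leaking into the shared settings, which matters when the point of the comparison is that everything except the features is held constant.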
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Segal, Eli Ridley, "Paralinguistic Speech Recognition: Classifying Emotion in Speech with Deep Learning Neural Networks" (2016). Senior Projects Spring 2016. 363.