Date of Submission
Spring 2016
Academic Programs and Concentrations
Computer Science; Philosophy
Project Advisor 1
Sven Anderson
Abstract/Artist's Statement
Senior Project submitted to The Division of Science, Mathematics and Computing of Bard College.
Abstract
Emotion recognition in speech using deep learning begins with the problem of translating raw audio into an informationally rich feature set on which a neural network can be trained, ideally yielding a machine learning system that accurately classifies the paralinguistic content of speech. We performed feature extraction using Praat, a tool for phonetic analysis, and obtained a variety of harmonic, intensity, and spectral characteristics that together formed the basis for the training vectors in our machine learning system.
While a number of different machine learning approaches have proved successful, there has been a strong resurgence in the application of so-called ‘deep learning’ systems to machine learning problems, owing to the striking success achieved with them on modern hardware [8]. After empirically validating a network architecture and learning parameters, we trained six neural networks on the problem of emotion classification. We used six independent networks, each with the same architecture and learning parameters, because this allowed us to obtain empirical data about the relative efficacy of different training features. Each of our six training sets is a pairwise combination of feature subsets drawn from a larger pool of features ranging from spectral and energy characteristics to periodicity. This allowed us to paint a more general picture of which facets of human speech are most relevant to and indicative of emotionality in speech.
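The pairwise construction of the six training sets can be illustrated with a minimal sketch. It assumes, purely for illustration, that the feature pool is split into four groups, since choosing two of four groups gives exactly six pairs; the abstract does not state the actual grouping. The group names, the feature names, and the helper `pairwise_training_sets` are all hypothetical and are not taken from the project.

```python
from itertools import combinations

# Hypothetical feature groups. The abstract mentions spectral, energy
# (intensity), harmonic, and periodicity characteristics, but the exact
# pool and grouping used in the project are not given here.
FEATURE_GROUPS = {
    "spectral": ["spectral_centroid", "spectral_slope"],
    "intensity": ["mean_intensity", "intensity_range"],
    "harmonic": ["harmonics_to_noise_ratio"],
    "periodicity": ["mean_pitch", "jitter"],
}


def pairwise_training_sets(groups):
    """Return one combined feature list per unordered pair of groups."""
    sets = {}
    for a, b in combinations(sorted(groups), 2):
        sets[(a, b)] = groups[a] + groups[b]
    return sets


def select_columns(row, columns):
    """Project one utterance's feature dictionary onto the chosen columns."""
    return [row[c] for c in columns]


training_sets = pairwise_training_sets(FEATURE_GROUPS)

# Each pair would feed its own network; the architecture and learning
# parameters stay fixed so that only the input features differ.
for pair, columns in training_sets.items():
    print(pair, columns)
```

Holding the architecture and learning parameters constant across all six networks means that differences in classification accuracy can be attributed to the input features rather than to the model.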
Access Agreement
On-Campus only
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Recommended Citation
Segal, Eli Ridley, "Paralinguistic Speech Recognition: Classifying Emotion in Speech with Deep Learning Neural Networks" (2016). Senior Projects Spring 2016. 363.
https://digitalcommons.bard.edu/senproj_s2016/363
This work is protected by a Creative Commons license. Any use not permitted under that license is prohibited.