Date of Submission

Spring 2022

Academic Program

Computer Science

Project Advisor 1

Sven Anderson

Abstract/Artist's Statement

While natural language processing (NLP) techniques have made great strides in the last few decades, comparatively little research has applied NLP to fiction. This project seeks to address that gap by applying NLP techniques to the summarization of European fairy tales. This subgenre of fiction is an appropriate starting point for investigation because of its archetypal characters and relatively simple story arcs. My approach is to extract the main characters of each text, along with key descriptors: the adjectives that modify the characters and the verbs describing the actions in which they take part. Through this method, I suggest how characters may be sorted into Proppian archetypes by tracking their probabilistic association with certain linguistic occurrences. This classification schema in turn makes possible the broader classification of fairy tales into types. The model achieves an overall F1 score of 0.77; its components score 0.89, 0.75, and 0.66 for character retrieval, adjective extraction, and verb extraction, respectively. The project also lays key groundwork for further automating the classification of characters and, ultimately, of stories themselves.
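As a reading aid for the scores reported above, the short sketch below shows how an F1 score is computed from precision and recall, and checks one possible interpretation of the overall figure. The abstract does not say how the overall score was derived; the assumption here is that it is the unweighted mean of the three component scores, which is consistent with the reported numbers. The function and variable names are illustrative only and are not taken from the project.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def unweighted_mean(scores):
    """Average a list of scores, giving each one equal weight."""
    return sum(scores) / len(scores)


# Component F1 scores as reported in the abstract.
component_f1 = {
    "character retrieval": 0.89,
    "adjective extraction": 0.75,
    "verb extraction": 0.66,
}

# Assumption: the overall score is the unweighted mean of the components.
overall = unweighted_mean(list(component_f1.values()))
print(round(overall, 2))
```

Under this assumption the mean rounds to the reported overall value of 0.77. A weighted average, for example by the number of items in each task, would generally give a different figure.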

Open Access Agreement

Open Access

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.

This work is protected by a Creative Commons license. Any use not permitted under that license is prohibited.