Author

Sayed Srosh

Date of Award

2025

First Advisor

Prof. Myo Thida

Second Advisor

Prof. Daniel H. Neilson

Abstract

This thesis investigates the development of a sentiment analysis model for Dari-language news content using both machine learning (ML) and deep learning (DL) methodologies. Although Dari is spoken by over 50 million people across Afghanistan, Iran, and Pakistan, it remains a low-resource language in the field of artificial intelligence. The goals of this project are twofold: first, to build a reliable, manually labeled dataset for future research; and second, to develop an AI-based sentiment analysis model specifically tailored to Dari-language news articles. Titles and content from 161,000 news articles across 152 categories were collected from five major Afghan media outlets and manually annotated by a team of seven Kabul University students using a 10-point sentiment scoring system.

Following thorough preprocessing, the data were used to train and evaluate four machine learning models (Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine) and two deep learning architectures (Convolutional Neural Networks and Recurrent Neural Networks with LSTM layers). The experimental results indicate that the Support Vector Machine (SVM) model outperformed all others, achieving 78% accuracy, which suggests strong potential for further research, optimization, and deployment in low-resource language sentiment analysis.
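
To make the classical-model side of this pipeline concrete, the following is a minimal sketch of one of the four model families named above, a multinomial Naïve Bayes text classifier, written with the Python standard library only. It is not the thesis's implementation. The function names (train_nb, predict_nb), the whitespace tokenizer, the two-class labels, and the English placeholder headlines are all assumptions made for illustration; the thesis itself works with Dari text and a 10-point sentiment scale.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized texts.

    Hypothetical helper for illustration; alpha is the Laplace smoothing term.
    """
    class_counts = Counter(labels)          # number of documents per class
    word_counts = defaultdict(Counter)      # token counts per class
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict_nb(model, text):
    """Return the class with the highest log-posterior for the given text."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + alpha * vocab_size
        score = math.log(n_docs / total_docs)   # log prior
        for tok in text.split():
            if tok in model["vocab"]:           # ignore tokens unseen in training
                score += math.log((counts[tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy placeholder data (assumption): English stand-ins for annotated headlines.
train_docs = [
    "peace talks bring hope",
    "new school opens in city",
    "attack kills civilians",
    "flood destroys homes",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_nb(train_docs, train_labels)
prediction = predict_nb(model, "flood kills civilians")
```

In practice, a comparison like the one reported here would use a held-out test split and a feature representation suited to Dari, neither of which this sketch attempts to reproduce.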

