Date of Award
2025
First Advisor
Prof. Myo Thida
Second Advisor
Prof. Daniel H. Neilson
Abstract
This thesis investigates the development of a sentiment analysis model for Dari-language news content using both machine learning (ML) and deep learning (DL) methodologies. Dari, despite being spoken by over 50 million people across Afghanistan, Iran, and Pakistan, remains a low-resource language in the field of artificial intelligence. The goals of this project are twofold: first, to build a reliable, manually labeled dataset for future research; and second, to develop an AI-based sentiment analysis model specifically tailored to Dari-language news articles. Titles and content from 161,000 news articles across 152 categories were collected from five major Afghan media outlets and manually annotated by a team of seven Kabul University students using a 10-point sentiment scoring system.
Following thorough preprocessing, the data was used to train and evaluate four machine learning models (Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine) and two deep learning architectures (Convolutional Neural Networks and Recurrent Neural Networks with LSTM layers). The experimental results indicate that the Support Vector Machine (SVM) model outperformed all others, achieving 78% accuracy, which suggests strong potential for further research, optimization, and deployment in low-resource language sentiment analysis.
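To make the classical-ML side of the abstract concrete, the following is a minimal sketch of one of the four model families named above, a multinomial Naïve Bayes classifier with Laplace smoothing, written in plain Python. It is illustrative only and is not the thesis's implementation: the abstract does not state the features, tokenization, label mapping, or hyperparameters used, so the function names (`train_nb`, `predict_nb`), the two-class label set, and the English placeholder tokens standing in for Dari tokens are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model (hypothetical helper).

    docs   : list of token lists
    labels : list of class labels, one per document
    alpha  : Laplace smoothing constant (assumed value, not from the thesis)
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # per-class token frequencies
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)

    total_docs = len(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "word_counts": word_counts,
        # log P(class), estimated from class frequencies
        "log_prior": {c: math.log(n / total_docs) for c, n in class_counts.items()},
        # total number of tokens observed in each class
        "totals": {c: sum(word_counts[c].values()) for c in class_counts},
    }


def predict_nb(model, tokens):
    """Return the class with the highest log-posterior score."""
    alpha, vocab = model["alpha"], model["vocab"]
    best_class, best_score = None, -math.inf
    for c, log_prior in model["log_prior"].items():
        denom = model["totals"][c] + alpha * len(vocab)
        score = log_prior
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                score += math.log((model["word_counts"][c][t] + alpha) / denom)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data: English placeholder tokens, NOT drawn from the thesis dataset.
train_docs = [
    ["good", "growth", "peace"],
    ["peace", "agreement", "good"],
    ["attack", "loss", "crisis"],
    ["crisis", "attack", "bad"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = train_nb(train_docs, train_labels)
prediction_a = predict_nb(model, ["good", "peace"])
prediction_b = predict_nb(model, ["attack", "crisis"])
```

On real data, the same train-then-predict shape would apply to each of the models compared in the thesis, with accuracy computed as the fraction of held-out articles whose predicted label matches the annotated one.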
Recommended Citation
Srosh, Sayed, "Dari Language News Sentiment Analysis Using Machine Learning and Deep Learning" (2025). Senior Theses. 1703.
https://digitalcommons.bard.edu/sr-theses/1703