Iman Anwarzai

Date of Award


First Advisor

Michael Bergman

Second Advisor

Amanda Landi


This thesis reports on research I conducted in collaboration with the Financial Instruments Sector Team at Columbia University. The study provides a spatio-temporal analysis of drought patterns in the Amhara region of Ethiopia. To identify whether a specific village is experiencing drought, rainfall data can be aggregated over that village using satellite rainfall estimates from the Climate Hazards Group InfraRed Precipitation with Station (CHIRPS) dataset. Because these estimates are noisy and imprecise, it may be more beneficial to aggregate over a larger area, though not one so large that it overlooks important variations in weather patterns between places. Unsupervised learning can be used to determine which places have historically had similar precipitation patterns, while also finding the optimal scale to cluster over. This study uses the X-means algorithm, which builds on k-means by searching for the optimal value of k for the data. In addition, I use the G-means algorithm, which stops splitting clusters once the data assigned to each center appear Gaussian-distributed around it. I found that X-means can produce many clusters when the dimensionality of the data is low, whereas the number of G-means clusters increases rapidly with dimensionality. Additionally, G-means has an optimal number of clusters, but because of the scoring method used, X-means scores best when the number of clusters is largest.
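The idea of letting the data choose k can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a spherical-Gaussian BIC score (the score usually associated with X-means; the abstract does not say which score was used), it scans k directly rather than recursively splitting centroids as X-means does, and it runs on synthetic 2-D points rather than CHIRPS rainfall. The names `kmeans`, `bic`, and `choose_k` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100):
    """Plain k-means with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


def bic(points, centers, clusters):
    """BIC under an assumed spherical-Gaussian model; higher is better."""
    n, d, k = len(points), len(points[0]), len(centers)
    sse = sum(math.dist(p, c) ** 2 for c, cl in zip(centers, clusters) for p in cl)
    sigma2 = max(sse / (d * (n - k)), 1e-12)  # pooled per-dimension variance
    loglik = sum(len(cl) * math.log(len(cl) / n) for cl in clusters if cl)
    loglik -= 0.5 * n * d * math.log(2 * math.pi * sigma2)
    loglik -= 0.5 * d * (n - k)
    n_params = (k - 1) + k * d + 1  # mixing weights, centers, shared variance
    return loglik - 0.5 * n_params * math.log(n)


def choose_k(points, k_max):
    """Score every k from 1 to k_max and return the best k with all scores."""
    scores = {k: bic(points, *kmeans(points, k)) for k in range(1, k_max + 1)}
    return max(scores, key=scores.get), scores


# Synthetic stand-in data: three well-separated blobs of 40 points each.
rng = random.Random(0)
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for _ in range(40)
]
best_k, scores = choose_k(points, k_max=6)
print("selected k:", best_k)
```

The complexity penalty in `bic` is what keeps the score from always rewarding more clusters; a score without such a penalty tends to favor the largest k it is offered.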
