Assessing the drivers of PM2.5 concentration: A case of Kampala district
Abstract
Air pollution is considered the world's largest environmental threat. It is the fourth leading risk factor for cardiovascular, metabolic and respiratory diseases such as lung cancer, asthma, heart disease, diabetes, pneumonia and Chronic Obstructive Pulmonary Disease (COPD). According to the WHO, 4.2 million deaths occur every year due to ambient air pollution, while 3.8 million deaths occur annually due to household exposure. Kampala ranks as the 26th most polluted city in the world and the 5th most polluted city in Africa. Uganda has some of the worst air quality in Africa, with 95% of households using charcoal and firewood for cooking. Furthermore, many vehicles are well over 15 years old, and vehicle traffic is considered one of the main sources of urban pollution. Recent studies have used low-cost air quality sensors, but they have mainly measured pollution concentrations without identifying the pollution sources or the relative influence of these sources on overall pollution exposure. This study aimed to assess the drivers, or predictors, of PM2.5 concentration using Land Use Regression with machine learning, specifically the Random Forest algorithm. Air quality data from 26 AirQo stations located within Kampala were obtained for the entire year of 2020. Monthly averaged PM2.5 data were used to build a regression model using the Random Forest algorithm, and 10-fold cross-validation was used to evaluate the resulting model. The annual average PM2.5 concentration in 2020 was found to be 36.1 µg/m³. The R², RMSE and MAE of the Random Forest model were 0.541, 10.286 and 7.288 respectively. The most significant spatial predictors, or drivers, of PM2.5 were identified as latitude, relative humidity, precipitation, soil temperature and elevation. The model's performance is moderate, and there is still room for improvement. Kampala is experiencing high PM2.5 pollution levels. The inclusion of more predictor variables, such as traffic data, could boost model performance in future studies.
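
The modelling workflow summarised in the abstract (a Random Forest regression evaluated with 10-fold cross-validation and scored by R², RMSE and MAE, followed by a ranking of predictors) can be illustrated with a minimal sketch. This is not the study's code or data. It assumes scikit-learn, uses synthetic values in place of the AirQo measurements, assumes one row per station per month (26 × 12 = 312 rows), and picks hypothetical hyperparameters. The feature names are taken from the predictors listed in the abstract, and scoring pooled out-of-fold predictions is an assumption, since the study may have averaged scores per fold instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)

# Synthetic stand-in for the monthly station data (ASSUMPTION: 26 sites x 12 months).
n_rows = 26 * 12
feature_names = [
    "latitude",
    "relative_humidity",
    "precipitation",
    "soil_temperature",
    "elevation",
]
X = rng.normal(size=(n_rows, len(feature_names)))

# Hypothetical response: an invented relationship plus noise, NOT real PM2.5 values.
y = 36.0 + 5.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=8.0, size=n_rows)

# Hyperparameters are illustrative assumptions, not those used in the study.
model = RandomForestRegressor(n_estimators=200, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Each row is predicted by a model that never saw it during training.
y_pred = cross_val_predict(model, X, y, cv=cv)

r2 = r2_score(y, y_pred)
rmse = float(np.sqrt(mean_squared_error(y, y_pred)))
mae = float(mean_absolute_error(y, y_pred))
print(f"R2={r2:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")

# Refit on all rows to rank predictors by impurity-based importance.
model.fit(X, y)
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranked:
    print(f"{name:>18}: {score:.3f}")
```

Impurity-based importances are only one way to rank drivers. Permutation importance is a common alternative when predictors are correlated, as meteorological variables often are.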