Transcript for:
Forecasting Air Pollution with Weather Data

Hello everyone, I'm Karthik, and my partner for this project is Jack. We are going to walk you through our project, which is forecasting PM2.5 pollution using weather data. The problem we're dealing with has grown significantly in scale over the previous few years: the World Bank estimates about 7 million deaths globally attributed to air pollution, and it's estimated to cost the global economy about 225 billion dollars annually. Most of these air-pollution-related health hazards are a consequence of PM2.5, which includes all particulate matter with a diameter of less than 2.5 microns. Data-driven solutions are inevitable for dealing with a problem of this scale, and a really well-trained algorithm that makes robust predictions can be cost-effective and also provide granular data, which can be a game changer in the battle against air pollution.

We found in our background research that meteorological parameters are closely linked with PM2.5 levels because of their role in dispersion and dilution. So we downloaded the weather data sets and the PM2.5 data sets, and our input features in the weather data set included relative humidity, temperature, air pressure, wind speed, wind run, and precipitation. As you can see from the map, we had three locations, namely Sebastopol, San Rafael, and Santa Cruz, as the training sites, and the test sites were Oakland, Richmond, and Napa. We picked the test sites to be significantly separated from the training sites to see the efficacy of our algorithm.

Thanks, Karthik. Now I'll talk about the methods and the results. For our model, we went with an LSTM architecture, which is a version of a recurrent neural network, for its ability to process sequential data such as air pollution. For the loss function, we went with mean squared error, just to penalize the wrong predictions greatly and train faster. For our experiment, we saved four versions of our model after tuning the hyperparameters and the structure of the model, so you can see here, for each version, which hyperparameters and structures were updated.

Looking at the results, we have the root mean squared error as our evaluation metric. After version 2, in the Oakland prediction, we saw that the red line, which is the predicted values, compared with the green line, which is the true values, seemed to be picking up on the general trend of the data, but it didn't quite match up with the true value points. It looked like there was a bias problem, as if the red line was just shifted up from the true values. From our lecture, we knew that we needed to either add data or deepen the model architecture to address this bias problem, and you can see in version 4, after making the model deeper, we were able to remove some of the bias. Then we applied the same version 4 final model to the other test sites and received similar results.

So what's next for this model? We think that there's great capability for using more data to expand the prediction area, for including other features for prediction, such as emission source data, and then for developing some type of prediction map. Thank you so much.
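
[Editor's illustration, not part of the spoken presentation] The talk uses root mean squared error as the evaluation metric and describes predictions that follow the trend but sit above the true values. The sketch below shows one way such a constant upward shift could be separated from the rest of the error. It is only a sketch: the function names `rmse` and `mean_bias` and all the numbers are hypothetical and do not come from the speakers' model or data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_bias(y_true, y_pred):
    """Average signed error; positive means predictions sit above the truth."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical PM2.5 readings (micrograms per cubic metre), invented for
# illustration only. The predictions follow the trend but are shifted up.
y_true = [12.0, 15.0, 11.0, 18.0, 14.0]
y_pred = [16.0, 19.5, 14.5, 22.0, 18.0]

overall_rmse = rmse(y_true, y_pred)
bias = mean_bias(y_true, y_pred)

# Diagnostic only: subtract the average shift to see how much error remains
# once the constant offset is ignored.
shifted_back = [p - bias for p in y_pred]
residual_rmse = rmse(y_true, shifted_back)

print(f"RMSE: {overall_rmse:.3f}")
print(f"Mean bias: {bias:.3f}")
print(f"RMSE after removing the mean bias: {residual_rmse:.3f}")
```

If the RMSE drops sharply once the mean bias is removed, most of the error is a constant offset rather than a failure to track the trend. This uses the true values, so it is a diagnostic and not a correction that could be applied at prediction time; the speakers addressed the shift by making the model deeper.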