Forecasting Air Pollution with Weather Data

Aug 11, 2024

Forecasting PM 2.5 Pollution Using Weather Data

Introduction

  • Presenter: Karthik
  • Partner: Jack
  • Focus: Forecasting PM 2.5 pollution using weather data

Background

  • Air pollution is a significant global issue:
    • An estimated 7 million deaths per year attributed to air pollution (World Bank estimate)
    • An estimated $225 billion in annual costs to the global economy
  • PM 2.5 is fine particulate matter with a diameter of 2.5 microns (µm) or less
  • Need for data-driven solutions to combat air pollution
  • A well-trained algorithm can provide robust predictions and more granular (finer-resolution) pollution data

Data Collection

  • Meteorological parameters are linked to PM 2.5 levels:
    • They govern how pollutants disperse and dilute
  • Data sets collected:
    • Weather data: Relative humidity, temperature, air pressure, wind speed, wind run, precipitation
    • PM 2.5 concentration measurements
  • Locations:
    • Training sites: Sebastopol, San Rafael, Santa Cruz
    • Test sites: Oakland, Richmond, Napa
    • Test sites were chosen to be geographically separate from the training sites, to assess how well the model generalizes to unseen locations
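The setup above implies turning each site's time-ordered weather and PM 2.5 records into supervised (input sequence, target) pairs for a sequence model, then splitting by location rather than by time. A minimal sketch in pure Python, assuming time-ordered records (for example, hourly) stored as one dictionary per time step; the column names, the window length, the one-step-ahead horizon, and the helper names `make_windows` and `build_split` are all hypothetical, since the source does not describe the preprocessing.

```python
# Hypothetical column names: the source lists these weather variables but not how they were stored.
FEATURES = ["rel_humidity", "temperature", "pressure", "wind_speed", "wind_run", "precipitation"]

TRAIN_SITES = {"Sebastopol", "San Rafael", "Santa Cruz"}
TEST_SITES = {"Oakland", "Richmond", "Napa"}


def make_windows(records, window, horizon=1):
    """Turn one site's time-ordered records into (input sequence, target) pairs.

    Each input is `window` consecutive rows of weather features; the target is
    the PM 2.5 value `horizon` steps after the last row of the window.
    """
    xs, ys = [], []
    for start in range(len(records) - window - horizon + 1):
        rows = records[start:start + window]
        xs.append([[row[f] for f in FEATURES] for row in rows])
        ys.append(records[start + window + horizon - 1]["pm25"])
    return xs, ys


def build_split(data_by_site, window, horizon=1):
    """Window each site separately, then pool the windows by train/test site.

    Windowing per site keeps any sequence from spanning two locations, and
    splitting by site (not by time) means test locations are never seen in training.
    """
    split = {"train": ([], []), "test": ([], [])}
    for site, records in data_by_site.items():
        if site in TRAIN_SITES:
            key = "train"
        elif site in TEST_SITES:
            key = "test"
        else:
            continue  # ignore sites that are in neither list
        xs, ys = make_windows(records, window, horizon)
        split[key][0].extend(xs)
        split[key][1].extend(ys)
    return split


# Toy usage with made-up values (not real measurements).
toy = [
    dict(zip(FEATURES, [60 + i, 15.0, 1013.0, 2.0, 1.5, 0.0]), pm25=10.0 + i)
    for i in range(5)
]
split = build_split({"Sebastopol": toy, "Oakland": toy}, window=3)
train_x, train_y = split["train"]
print(len(train_x), len(train_x[0]), len(train_x[0][0]), train_y)
```

In practice the features would usually also be standardized using statistics computed from the training sites only, so that nothing about the test locations leaks into training.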

Methods and Results

  • Model architecture: LSTM (Long Short-Term Memory)
    • A recurrent neural network (RNN) suited to sequential data such as air pollution time series
  • Loss function: Mean Squared Error (MSE)
    • Squares each prediction error, so large errors are penalized much more heavily than small ones
  • Experiment:
    • Saved four versions of the model while tuning hyperparameters and model structure
  • Evaluation metric: Root Mean Squared Error (RMSE)
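For reference, a standard LSTM cell (the common variant with a forget gate) updates its state at each time step t as shown below, where σ is the logistic sigmoid and ⊙ is element-wise multiplication; the MSE loss and RMSE metric named above are included on the last line. Treating x_t as the vector of weather features at step t, reading the forecast from the last hidden state h_T through a linear layer (w_y, b_y), and averaging over N training windows are assumptions for illustration; the source does not describe the network's layer layout.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{cell state} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{hidden state} \\
\hat{y} &= w_y^\top h_T + b_y && \text{assumed linear output layer} \\
\mathrm{MSE} &= \frac{1}{N} \sum_{n=1}^{N} \left(y_n - \hat{y}_n\right)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\end{aligned}
```

Because each error is squared, a single miss of 10 µg/m³ adds 100 to the sum while a miss of 2 µg/m³ adds only 4. Taking the square root for RMSE returns the metric to the units of PM 2.5 concentration, which makes it easier to interpret.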

Version Analysis

  • Version 2:
    • Oakland predictions followed the overall trend but were biased: the predicted curve was systematically offset from the observed values
  • Need for model improvement:
    • Two options for reducing the bias: add more training data or deepen the model architecture
  • Version 4:
    • Deepened the model architecture, which reduced the bias
    • Applied the final model to the other test sites (Richmond, Napa) with similar results
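The offset seen in Version 2 is a systematic bias, and RMSE alone does not distinguish it from random scatter. A minimal sketch that separates the two using the identity RMSE² = bias² + (standard deviation of errors)², where the standard deviation is the population one; the function name `error_summary` and the toy numbers are hypothetical, not the project's code or data.

```python
from math import isclose, sqrt
from statistics import fmean, pvariance


def error_summary(y_true, y_pred):
    """Return MSE, RMSE, mean bias, and error spread for paired observations and forecasts."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mse = fmean(e * e for e in errors)
    bias = fmean(errors)          # > 0 means forecasts run high on average
    err_var = pvariance(errors)   # spread of the errors around the bias
    return {"mse": mse, "rmse": sqrt(mse), "bias": bias, "err_std": sqrt(err_var)}


# Toy usage with made-up values (not real measurements).
y_true = [12.0, 15.0, 20.0, 18.0, 10.0]
shifted = [t + 4.0 for t in y_true]           # right shape, constant offset (the kind of bias described for Version 2)
scattered = [16.0, 11.0, 24.0, 14.0, 10.0]    # no offset, errors of +4, -4, or 0
print(error_summary(y_true, shifted))
print(error_summary(y_true, scattered))

# The decomposition holds up to floating-point rounding.
s = error_summary(y_true, scattered)
assert isclose(s["rmse"] ** 2, s["bias"] ** 2 + s["err_std"] ** 2)
```

Here the shifted forecast has the larger RMSE (4.0 versus about 3.58) even though its shape is perfect, and all of that error is bias; reporting the bias next to RMSE makes an improvement like the one in Version 4 easier to see.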

Future Work

  • Explore using more data to expand the prediction area
  • Include other features such as emission source data
  • Develop a prediction map for better visualization and analysis

Conclusion

  • Emphasis on the importance of data-driven approaches in tackling air pollution
  • Acknowledgment of potential improvements and next steps for the model
  • Thank you for your attention!