Air Quality Index Prediction Using Machine Learning Algorithms on the Beijing PM2.5 Dataset

Authors

  • Tuti Susilawati, Universitas Mahakarya Asia
  • Bustomi Bustomi, Institut Pertanian Bogor

Keywords:

PM2.5, Air Quality Index, Machine Learning, LSTM, Beijing.

Abstract

Accurate prediction of the Air Quality Index (AQI) is critical for mitigating the public health risks of urban air pollution. This study presents an empirical analysis of PM2.5 concentration forecasting in Beijing using machine learning algorithms that combine high-resolution atmospheric data with meteorological variables. A multi-stage pipeline was implemented, comprising data preprocessing, feature selection, and model training with Random Forest, Gradient Boosting, Support Vector Regression (SVR), and Long Short-Term Memory (LSTM) networks. Predictive performance was evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²), while spatial and temporal fidelity were assessed across districts, diurnal cycles, and seasons. LSTM models consistently achieved the highest accuracy, capturing short-term pollution spikes, seasonal variability, and spatial heterogeneity, whereas the ensemble methods provided stable baseline predictions with only moderate sensitivity to extreme events. Sensitivity analysis identified wind speed, humidity, and neighboring PM2.5 measurements as key predictors. The results show that combining recurrent neural networks with ensemble approaches yields reliable, operationally relevant AQI forecasts, offering both theoretical support for sequential modeling of urban air quality and practical guidance for environmental monitoring, public health interventions, and city-level policy.
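The three evaluation metrics named above have standard definitions. As a minimal sketch in plain Python (the function name `regression_metrics` and the sample values are illustrative assumptions, not the authors' code or data), they can be computed from paired observed and predicted PM2.5 concentrations as follows:

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE and R^2 for paired observed/predicted values.

    Hypothetical helper for illustration only; not taken from the study.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    # RMSE: square root of the mean squared residual; large misses
    # (e.g. an under-predicted pollution spike) are penalized quadratically.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # MAE: mean absolute residual, in the same units as the target.
    mae = sum(abs(r) for r in residuals) / n

    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread of the
    # observations around their own mean. Undefined for a constant series.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Made-up values for illustration, not measurements from the Beijing dataset.
    observed = [35.0, 80.0, 150.0, 60.0]
    predicted = [40.0, 75.0, 130.0, 65.0]
    print(regression_metrics(observed, predicted))
```

On these made-up values the MAE is 8.75 while the RMSE is about 10.9, because the single 20-unit miss on the 150 observation weighs more heavily in RMSE; the R² is about 0.935. This gap is why RMSE and MAE are usually reported together when extreme events matter.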

Published

2026-03-03

How to Cite

Air Quality Index Prediction Using Machine Learning Algorithms on the Beijing PM2.5 Dataset. (2026). Technema: Journal of Intelligent Engineering and Computing, 1(1), 20-29. https://sovereignresearch.org/technema/article/view/33