Miniature air quality sensors are widely used in urban grid-based monitoring because they are inexpensive and flexible to deploy. However, the raw data they collect often suffer from low accuracy caused by environmental interference and sensor drift, so effective calibration is needed before the data can be relied upon. This study proposes a data correction method based on Bayesian Optimization Support Vector Regression (BO-SVR), which combines the nonlinear modeling capability of Support Vector Regression (SVR) with the efficient global hyperparameter search of Bayesian Optimization. The method takes the cross-validation loss as the optimization objective, models it with a Gaussian process, and selects each new candidate with an Expected Improvement acquisition strategy, so that the SVR hyperparameters used for pollutant concentration prediction are tuned automatically. Experiments on real-world micro-sensor datasets demonstrate that BO-SVR outperforms traditional SVR, grid search SVR, and random forest (RF) models across multiple pollutants, including PM2.5, PM10, CO, NO2, SO2, and O3. The proposed method achieves lower prediction residuals, higher fitting accuracy, and better generalization, offering an efficient and practical solution for improving the quality of micro-sensor air monitoring data.
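To make the Expected Improvement step concrete, the following is a minimal sketch of the standard closed-form acquisition for a minimization objective such as cross-validation loss. It is an illustration under stated assumptions, not the study's implementation: the function names, the exploration margin `xi`, the candidate (C, gamma) pairs, and the Gaussian process posterior values are all hypothetical.

```python
import math


def expected_improvement(mu, sigma, best_loss, xi=0.0):
    """Closed-form Expected Improvement for a loss that is being minimized.

    mu, sigma : Gaussian process posterior mean and standard deviation of the
                cross-validation loss at one candidate hyperparameter setting.
    best_loss : lowest cross-validation loss observed so far.
    xi        : optional exploration margin (an assumption of this sketch).
    """
    improvement = best_loss - mu - xi
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return improvement * cdf + sigma * pdf


def pick_next(candidates, posterior, best_loss):
    """Return the candidate whose Expected Improvement is largest."""
    scores = [expected_improvement(mu, sigma, best_loss) for mu, sigma in posterior]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


# Hypothetical (C, gamma) candidates for an RBF-kernel SVR, with made-up
# posterior (mean, std) values of the cross-validation loss at each one.
candidates = [(1.0, 0.1), (10.0, 0.01), (100.0, 0.001)]
posterior = [(0.30, 0.01), (0.32, 0.10), (0.40, 0.02)]
next_candidate = pick_next(candidates, posterior, best_loss=0.31)
print(next_candidate)
```

In this made-up case the second candidate is selected even though its predicted loss is slightly worse than the first, because its larger posterior uncertainty leaves more room for improvement. This trade-off between exploitation and exploration is what lets the search use far fewer model fits than an exhaustive grid.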