Chinese Journal of Vector Biology and Control ›› 2024, Vol. 35 ›› Issue (1): 49-55. DOI: 10.11853/j.issn.1003.8280.2024.01.009

• Vector-borne Infectious Diseases •

Predictive performance of LASSO-SARIMAX model for the incidence of hemorrhagic fever with renal syndrome in Guangzhou, China

QI Juan, KANG Yan, CHEN Hai-yan, XU Cong-hui, WEI Yue-hong

  1. Department of Chronic and Non-communicable Diseases Prevention and Control/Department of Immunization Planning/Department of Parasitic Diseases and Endemic Diseases Prevention and Control, Guangzhou Center for Disease Control and Prevention, Guangzhou, Guangdong 510440, China
  • Received: 2023-07-05 Online: 2024-02-20 Published: 2024-03-05
  • Corresponding author: WEI Yue-hong, E-mail: wei_yh0928@163.com
  • About the author: QI Juan, female, physician-in-charge, engaged in research on disease prediction and early warning, E-mail: qijuan717@126.com
  • Supported by:
    Guangzhou Health Science and Technology Project (No. 20221A011067)

Abstract: Objective To compare the performance of three time series models in predicting the incidence of hemorrhagic fever with renal syndrome (HFRS), and to explore the predictive performance of a seasonal autoregressive integrated moving average model with independent variables (SARIMAX) in which the independent variables are introduced through least absolute shrinkage and selection operator (LASSO) regression. Methods Data on HFRS case counts, rodent density, meteorological factors, and socio-economic indicators in Guangzhou, China from 2006 to 2022 were systematically collected. Exponential smoothing (ETS), SARIMAX, and LASSO-SARIMAX models were constructed to predict the incidence of HFRS. The autocorrelation function (ACF), mean percentage error (MPE), and mean absolute percentage error (MAPE) were used to evaluate the predictive performance of the models, and MAPE was used to compare the three models across different prediction lengths. Results The mean annual incidence rate of HFRS in Guangzhou from 2006 to 2022 was 0.06/100 000. The training-set MAPE was 45.066 for the ETS model, 51.403 for the SARIMA model, and 39.466 for the LASSO-SARIMAX model. The LASSO-SARIMAX model had the lowest MAPE both on the training set and at a prediction length of 12 months; only at a prediction length of 24 months did it fall behind the ETS model. Conclusion The LASSO-SARIMAX model shows good performance in predicting the incidence of HFRS in Guangzhou in the short and medium term.
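The two error measures named in the abstract can be illustrated with a minimal Python sketch. It assumes the common convention that the error is the observed value minus the predicted value, expressed as a percentage of the observed value; the abstract does not state which convention or software the authors used. The function names and the case counts below are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def percentage_errors(actual: Sequence[float], forecast: Sequence[float]) -> List[float]:
    """Per-period percentage errors, 100 * (actual - forecast) / actual (assumed convention)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        # A percentage error is undefined when the observed value is zero.
        raise ValueError("actual values must be non-zero")
    return [100.0 * (a - f) / a for a, f in zip(actual, forecast)]


def mpe(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean percentage error: signed, so it shows systematic over- or under-prediction."""
    errors = percentage_errors(actual, forecast)
    return sum(errors) / len(errors)


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error: the average size of the error, ignoring its sign."""
    errors = percentage_errors(actual, forecast)
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical monthly case counts, chosen for illustration only.
observed = [10, 20, 40, 50]
predicted = [12, 18, 40, 45]

print(f"MPE  = {mpe(observed, predicted):.2f}%")
print(f"MAPE = {mape(observed, predicted):.2f}%")
```

In this made-up example the over- and under-predictions cancel, so the MPE is 0 while the MAPE is 10. This is one reason a signed measure and an absolute measure are often reported together. Months with zero observed cases make both measures undefined; the sketch raises an error in that case, which is a design choice of the example rather than something described in the abstract.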

Key words: Hemorrhagic fever with renal syndrome (HFRS), Prediction, Exponential smoothing (ETS) method, Seasonal autoregressive integrated moving average (SARIMA) model, Least absolute shrinkage and selection operator (LASSO)

CLC number: