Chinese Journal of Vector Biology and Control

• Original Article •

The prediction of hemorrhagic fever with renal syndrome based on support vector machine

HUANG De-sheng¹; SHEN Tie-feng²; WU Wei³; GUAN Peng³; ZHOU Bao-sen³

  1. Department of Mathematics, College of Basic Medical Sciences, China Medical University, Shenyang 110001, China
  2. Huludao Center for Disease Control and Prevention, Liaoning Province, China
  3. Department of Epidemiology, School of Public Health, China Medical University, China
  • Online: 2008-12-20 Published: 2008-12-20

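Abstract: Objective To explore the advantages and application prospects of the support vector machine (SVM) for predicting the incidence of hemorrhagic fever with renal syndrome (HFRS). Methods First, eight indicators for Huludao City, Liaoning Province, from 1984 to 2006 were used as explanatory variables: meteorological data (mean air pressure, mean temperature, mean rainfall, relative humidity, sunshine hours and percentage of sunshine) and animal epidemic data (rodent density and rodent virus-carrying rate). All variables were normalized to the interval [0, 1]. The data set was divided into a training set and a test set: one third of the samples (rounded to an integer) were drawn at random to form the test set, and the remaining samples formed the training set. Second, an SVM model for predicting HFRS incidence was built in R 2.6.0 and its sum of squared errors (SSE) was obtained. Finally, its predictions were compared with those of back-propagation (BP) and radial basis function (RBF) neural network models. Results On the training set, the SSE (x̄ ± s) of the SVM fit was 0.031 ± 0.009, versus 0.074 ± 0.030 and 0.082 ± 0.018 for the BP and RBF neural networks, respectively. On the test set, the SSE of the SVM predictions was 0.067 ± 0.021, versus 0.073 ± 0.022 and 0.089 ± 0.036 for the BP and RBF neural networks, respectively. Conclusion SVM, a pattern recognition method developed in recent years on the basis of statistical learning theory, offers high prediction accuracy and strong generalization ability on small-sample, nonlinear and high-dimensional pattern recognition problems. The model predicts incidence reliably and can serve as a reference method for forecasting HFRS epidemics.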

Keywords: support vector machine, hemorrhagic fever with renal syndrome, prediction
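The preprocessing described in the Methods (scaling every variable to [0, 1], then holding out one third of the samples at random) can be sketched as follows. This is a minimal Python sketch, not the authors' R code: it assumes min-max scaling is the normalization used and that "rounded to an integer" means truncation, as the English abstract's "(trunc)" suggests. All function names are hypothetical.

```python
import random


def min_max_normalize(column):
    """Rescale a list of numbers linearly onto [0, 1] (min-max scaling; assumed method)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: map every value to 0 to avoid dividing by zero
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def normalize_columns(rows):
    """Apply min-max scaling to every column of a row-major data table."""
    columns = list(zip(*rows))
    scaled = [min_max_normalize(list(col)) for col in columns]
    return [list(r) for r in zip(*scaled)]


def split_train_test(n, rng):
    """Randomly draw trunc(n/3) sample indices as the test set; the rest are training."""
    test_size = n // 3  # one third of the samples, truncated to an integer
    test_idx = sorted(rng.sample(range(n), test_size))
    held_out = set(test_idx)
    train_idx = [i for i in range(n) if i not in held_out]
    return train_idx, test_idx
```

If the data are one record per year for 1984 to 2006 (an assumption; the abstract does not state the unit of observation), n = 23 and `split_train_test(23, random.Random(seed))` holds out 7 samples and trains on 16. Following the abstract, scaling is applied to the whole data set before splitting; computing the minimum and maximum on the training set alone would keep test-set information out of the preprocessing.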

Abstract: Objective To study the advantages and application prospects of the support vector machine (SVM) for forecasting the incidence of hemorrhagic fever with renal syndrome (HFRS). Methods First, eight indicators for Huludao City from 1984 to 2006 were used as explanatory variables: routine meteorological data (average air pressure, average temperature, average precipitation, relative humidity, sunshine hours and percentage of sunshine) and animal epidemic data (rodent density and rodent virus-carrying rate). All variables were normalized to the range [0, 1]. The whole data set was divided into a training set and a test set: the test set consisted of one third of the samples (truncated to an integer) drawn at random, and the remaining samples formed the training set. Second, an SVM model for predicting HFRS incidence was constructed in R 2.6.0. Finally, the performance of the SVM was compared with that of back-propagation (BP) and radial basis function (RBF) neural networks by computing the sum of squared errors (SSE). This procedure was repeated 10 times. Results On the training set, the mean ± standard deviation of the SSE was 0.031 ± 0.009 for the SVM, versus 0.074 ± 0.030 and 0.082 ± 0.018 for the BP and RBF neural networks, respectively. On the test set, the SSE was 0.067 ± 0.021 for the SVM, versus 0.073 ± 0.022 and 0.089 ± 0.036 for the BP and RBF neural networks, respectively. Conclusion As a pattern recognition method developed in recent years on the basis of statistical learning theory, SVM offers higher prediction accuracy and stronger generalization ability for small-sample, nonlinear and high-dimensional pattern recognition problems. It proved reliable for predicting HFRS incidence and could serve as a reference method for HFRS forecasting.
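The evaluation protocol (10 random train/test splits, with the SSE on each set summarized as mean ± standard deviation) can be sketched as a small replication harness. It assumes each model is exposed through a hypothetical `fit(X_train, y_train)` callable that returns a predictor, and that the reported s is the sample standard deviation. An SVM or BP/RBF network is beyond a dependency-free sketch, so the demo plugs in a trivial mean-predictor baseline instead; it is a placeholder, not any of the models in the paper, and the demo data are synthetic.

```python
import random
import statistics


def sse(y_true, y_pred):
    """Sum of squared errors between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def replicate(X, y, fit, n_reps=10, seed=0):
    """Run n_reps random train/test splits and summarize SSE as (mean, SD).

    `fit(X_train, y_train)` is a hypothetical interface: it must return a
    function that maps one feature row to a predicted value.
    """
    rng = random.Random(seed)
    n = len(y)
    train_sse, test_sse = [], []
    for _ in range(n_reps):
        test_idx = sorted(rng.sample(range(n), n // 3))  # trunc(n/3) samples held out
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        train_sse.append(sse([y[i] for i in train_idx], [model(X[i]) for i in train_idx]))
        test_sse.append(sse([y[i] for i in test_idx], [model(X[i]) for i in test_idx]))
    # statistics.stdev uses the n - 1 denominator (sample standard deviation)
    return ((statistics.mean(train_sse), statistics.stdev(train_sse)),
            (statistics.mean(test_sse), statistics.stdev(test_sse)))


def fit_mean_baseline(X_train, y_train):
    """Placeholder model that always predicts the training mean (NOT an SVM)."""
    mu = statistics.mean(y_train)
    return lambda row: mu


if __name__ == "__main__":
    # Synthetic stand-in data: 23 samples x 8 features in [0, 1], not the study data
    rng = random.Random(42)
    X = [[rng.random() for _ in range(8)] for _ in range(23)]
    y = [rng.random() for _ in range(23)]
    (tr_m, tr_s), (te_m, te_s) = replicate(X, y, fit_mean_baseline)
    print(f"train SSE {tr_m:.3f} ± {tr_s:.3f}; test SSE {te_m:.3f} ± {te_s:.3f}")
```

Passing the same `seed` for every model gives all models identical splits, so their SSEs are compared on the same data; the abstract does not say whether the original study paired the splits this way.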