原创,转载注明出处!
本教程目的:
提供一个ARMA方法预测时间序列的demo,可直接运行,为初学者提供一个直观的认识。通过本教程你可以学会:
1、时间序列建模基本步骤
2、时间序列相关画图操作
3、对时间序列预测有一个感性的认识
4、ARMA预测是dynamic参数的影响通过本教程你还不能掌握:
1、ARMA模型详细优化
2、ARMA模型预测的内部实现细节
话不多说,直接上代码
导入数据
from __future__ import print_function
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
# 数据
dta=[-612,
277,
377,
-3266,
-950,
2336,
1459,
-829,
1191,
238,
-2965,
-1764,
-85,
5312,
3,
-1342,
1733,
-4106,
-1461,
3186,
-92,
411,
-650,
118,
-2676,
-469,
3051,
1122,
-329,
247,
866,
-2548,
-1414,
3125,
371,
274,
533,
-175,
-2332,
-1388,
3060,
1369,
676,
-806,
522,
-2199,
-2511,
3901,
-36,
920,
-1108,
2175,
-2333,
-1105,
3029,
-31,
2305,
1302,
2761,
-4775,
-3201,
7769,
-1214,
1817,
-5271,
971,
-2446,
-3705,
3329,
229,
1952,
-2434,
1130,
-3521,
-503,
5004,
-2211,
2046,
521,
-363,
-2723,
-2609,
4091,
1314,
1050,
-574,
-585,
-3632,
-659,
]
dta=pd.Series(dta)
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('2002','2090'))
dta.plot(figsize=(12,8))
plt.show()
画ACF(自相关函数)和PACF(偏自相关函数)
这两个函数用来经验性的确定ARMA(p, q)中的p和q
# plot acf and pacf
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(dta.values,lags=40,ax=ax1) # lags 表示滞后的阶数
ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(dta.values,lags=40,ax=ax2)
plt.show()
建模
# 模型1
arma_mod70 = sm.tsa.ARMA(dta, (7, 0)).fit(disp=False)
print(arma_mod70.aic, arma_mod70.bic, arma_mod70.hqic)
# 模型2
arma_mod01 = sm.tsa.ARMA(dta, (0, 1)).fit(disp=False)
print(arma_mod01.aic, arma_mod01.bic, arma_mod01.hqic)
# 模型3
arma_mod71 = sm.tsa.ARMA(dta, (7, 1)).fit(disp=False)
print(arma_mod71.aic, arma_mod71.bic, arma_mod71.hqic)
# 模型4
arma_mod80 = sm.tsa.ARMA(dta, (8, 0)).fit(disp=False)
print(arma_mod80.aic, arma_mod80.bic, arma_mod80.hqic)
# 模型5(只是为了示范,p和q随便取的)
arma_mod32 = sm.tsa.ARMA(dta, (3, 2)).fit(disp=False)
print(arma_mod32.aic, arma_mod32.bic, arma_mod32.hqic)
# 最佳模型可以取aic最小的一个
输出为:
1579.7025547891606 1602.10028211675 1588.7304359212178
1632.3203733166176 1639.786282425814 1635.3296670273035
1581.0916055987627 1605.9779692960842 1591.122584634382
1581.3957835886288 1606.2821472859503 1591.426762624248
模型检验
# 模型检验
# 在指数平滑模型下,观察ARIMA模型的残差是否是平均值为0且方差为常数的正态分布(服从零均值、方差不变的正态分布),
# 同时也要观察连续残差是否(自)相关。
# 对ARMA(7,0)模型所产生的残差做自相关图
resid = arma_mod70.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=40, ax=ax2)
plt.show()
观察残差是否符合正态分布
# 观察残差是否符合正态分布
# 这里使用QQ图,它用于直观验证一组数据是否来自某个分布,或者验证某两组数据是否来自同一(族)分布。
# 在教学和软件中常用的是检验数据是否来自于正态分布。
resid = arma_mod70.resid # 残差
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid, line='q', ax=ax, fit=True)
plt.show()
# Ljung-Box检验
# 略
模型预测-arma_mod70模型画图
模型确定之后,就可以开始进行预测了。
包含训练数据一起预测,dynamic=True:
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=True, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
print(arma_mod70.summary())
ARMA Model Results
==============================================================================
Dep. Variable: y No. Observations: 89
Model: ARMA(7, 0) Log Likelihood -780.851
Method: css-mle S.D. of innovations 1524.852
Date: Tue, 27 Aug 2019 AIC 1579.703
Time: 16:10:20 BIC 1602.100
Sample: 12-31-2002 HQIC 1588.730
- 12-31-2090
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 42.5366 72.269 0.589 0.558 -99.108 184.181
ar.L1.y -0.2975 0.093 -3.213 0.002 -0.479 -0.116
ar.L2.y -0.3757 0.094 -3.976 0.000 -0.561 -0.191
ar.L3.y -0.2901 0.100 -2.889 0.005 -0.487 -0.093
ar.L4.y -0.3143 0.099 -3.169 0.002 -0.509 -0.120
ar.L5.y -0.2210 0.101 -2.192 0.031 -0.419 -0.023
ar.L6.y -0.2347 0.096 -2.451 0.016 -0.422 -0.047
ar.L7.y 0.4670 0.093 5.024 0.000 0.285 0.649
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 -0.9414 -0.4833j 1.0583 -0.4245
AR.2 -0.9414 +0.4833j 1.0583 0.4245
AR.3 -0.2297 -1.0284j 1.0537 -0.2850
AR.4 -0.2297 +1.0284j 1.0537 0.2850
AR.5 0.6358 -0.8308j 1.0461 -0.1460
AR.6 0.6358 +0.8308j 1.0461 0.1460
AR.7 1.5733 -0.0000j 1.5733 -0.0000
-----------------------------------------------------------------------------
包含训练数据一起预测,dynamic=False:
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=False, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
注意:
dynamic
参数只影响in-sample(可以理解为训练数据,或内插)的预测,不影响out-of-sample(可以理解为测试数据,或外插)的预测。如果预测的数据全部为out-of-sample类型,则dynamic
参数不会影响预测结果。如果预测数据含有in-sample类型(如上面两图,预测是从2009开始的),则dynamic
参数影响预测结果。
在 arima_model.py
中的解释为
"""
...
dynamic : bool, optional
The `dynamic` keyword affects in-sample prediction. If dynamic
is False, then the in-sample lagged values are used for
prediction. If `dynamic` is True, then in-sample forecasts are
used in place of lagged dependent variables. The first forecasted
value is `start`.
...
"""
另外从上面两图可以看出,当时间向后推移时,预测的值将趋于收敛(一个常数),所以ARMA或ARIMA等方法只适用于短时间的预测,不适用于长时间的预测!
arma_mod32模型画图
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2091', '2100', dynamic=True, ax=ax, plot_insample=False)
plt.show()
print(arma_mod32.summary())
ARMA Model Results
==============================================================================
Dep. Variable: y No. Observations: 89
Model: ARMA(3, 2) Log Likelihood -804.123
Method: css-mle S.D. of innovations 2016.787
Date: Tue, 27 Aug 2019 AIC 1622.245
Time: 15:02:58 BIC 1639.666
Sample: 12-31-2002 HQIC 1629.267
- 12-31-2090
==============================================================================
coef std err z P>|z| [0.025 0.975]
------------------------------------------------------------------------------
const 44.5054 46.003 0.967 0.336 -45.660 134.670
ar.L1.y -0.3571 0.162 -2.209 0.030 -0.674 -0.040
ar.L2.y 0.2441 0.181 1.349 0.181 -0.111 0.599
ar.L3.y -0.0450 0.135 -0.334 0.739 -0.309 0.219
ma.L1.y 0.0228 0.130 0.175 0.862 -0.233 0.278
ma.L2.y -0.8108 0.109 -7.439 0.000 -1.024 -0.597
Roots
=============================================================================
Real Imaginary Modulus Frequency
-----------------------------------------------------------------------------
AR.1 -1.3198 -0.0000j 1.3198 -0.5000
AR.2 3.3714 -2.3381j 4.1028 -0.0965
AR.3 3.3714 +2.3381j 4.1028 0.0965
MA.1 -1.0966 +0.0000j 1.0966 0.5000
MA.2 1.1247 +0.0000j 1.1247 0.0000
-----------------------------------------------------------------------------
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2002', '2100', dynamic=False, ax=ax, plot_insample=False)
plt.show()
完整代码如下
from __future__ import print_function
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.api import qqplot
# 数据
dta=[-612,
277,
377,
-3266,
-950,
2336,
1459,
-829,
1191,
238,
-2965,
-1764,
-85,
5312,
3,
-1342,
1733,
-4106,
-1461,
3186,
-92,
411,
-650,
118,
-2676,
-469,
3051,
1122,
-329,
247,
866,
-2548,
-1414,
3125,
371,
274,
533,
-175,
-2332,
-1388,
3060,
1369,
676,
-806,
522,
-2199,
-2511,
3901,
-36,
920,
-1108,
2175,
-2333,
-1105,
3029,
-31,
2305,
1302,
2761,
-4775,
-3201,
7769,
-1214,
1817,
-5271,
971,
-2446,
-3705,
3329,
229,
1952,
-2434,
1130,
-3521,
-503,
5004,
-2211,
2046,
521,
-363,
-2723,
-2609,
4091,
1314,
1050,
-574,
-585,
-3632,
-659,
]
dta=pd.Series(dta)
dta.index = pd.Index(sm.tsa.datetools.dates_from_range('2002','2090'))
dta.plot(figsize=(12,8))
plt.show()
# plot acf and pacf
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
sm.graphics.tsa.plot_acf(dta.values,lags=40,ax=ax1) # lags 表示滞后的阶数
ax2 = fig.add_subplot(212)
sm.graphics.tsa.plot_pacf(dta.values,lags=40,ax=ax2)
plt.show()
# 模型1
arma_mod70 = sm.tsa.ARMA(dta, (7, 0)).fit(disp=False)
print(arma_mod70.aic, arma_mod70.bic, arma_mod70.hqic)
# 模型2
arma_mod01 = sm.tsa.ARMA(dta, (0, 1)).fit(disp=False)
print(arma_mod01.aic, arma_mod01.bic, arma_mod01.hqic)
# 模型3
arma_mod71 = sm.tsa.ARMA(dta, (7, 1)).fit(disp=False)
print(arma_mod71.aic, arma_mod71.bic, arma_mod71.hqic)
# 模型4
arma_mod80 = sm.tsa.ARMA(dta, (8, 0)).fit(disp=False)
print(arma_mod80.aic, arma_mod80.bic, arma_mod80.hqic)
# 模型5(只是为了示范,p和q随便取的)
arma_mod32 = sm.tsa.ARMA(dta, (3, 2)).fit(disp=False)
print(arma_mod32.aic, arma_mod32.bic, arma_mod32.hqic)
# 最佳模型可以取aic最小的一个
# 模型检验
# 在指数平滑模型下,观察ARIMA模型的残差是否是平均值为0且方差为常数的正态分布(服从零均值、方差不变的正态分布),
# 同时也要观察连续残差是否(自)相关。
# 对ARMA(7,0)模型所产生的残差做自相关图
resid = arma_mod70.resid
fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(resid.values.squeeze(), lags=40, ax=ax1)
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(resid, lags=40, ax=ax2)
plt.show()
# 观察残差是否符合正态分布
# 这里使用QQ图,它用于直观验证一组数据是否来自某个分布,或者验证某两组数据是否来自同一(族)分布。
# 在教学和软件中常用的是检验数据是否来自于正态分布。
resid = arma_mod70.resid # 残差
fig = plt.figure(figsize=(12,8))
ax = fig.add_subplot(111)
fig = qqplot(resid, line='q', ax=ax, fit=True)
plt.show()
# Ljung-Box检验
# 略
# 模型预测
# 模型确定之后,就可以开始进行预测了,我们对未来十年的数据进行预测。
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=True, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
print(arma_mod70.summary())
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod70.plot_predict('2009', '2250', dynamic=False, ax=ax, plot_insample=False, alpha=0.05)
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2091', '2100', dynamic=True, ax=ax, plot_insample=False)
print(arma_mod32.summary())
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2091', '2100', dynamic=False, ax=ax, plot_insample=False)
plt.show()
fig, ax = plt.subplots(figsize=(12, 8))
ax = dta.loc['2001':].plot(ax=ax)
fig = arma_mod32.plot_predict('2002', '2100', dynamic=False, ax=ax, plot_insample=False)
plt.show()
欢迎留言、指正、吐槽。。。