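Background:
The sales of a certain brand vary over time in a periodic, regular pattern. To forecast future sales, we build a time series model. The forecasts can serve as a baseline for store performance targets, as a reference for inventory allocation, and as a guide for choosing when to run marketing campaigns. When a campaign lifts sales, the model's forecast, which assumes no marketing event, can also serve as the baseline, so the campaign's actual effect can be measured more accurately.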
Data:
Daily sales figures for five consecutive years, broken down by store.
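Trend:
Our initial assumption is that sales exhibit an upward trend, seasonality, and non-constant variance (the size of the fluctuations changes over time).
The figure above shows sales for each of ten stores. Each store follows the same pattern as total sales, and the pattern is consistent across stores. The time series model can therefore be built on total sales and then applied at the store level.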
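Modeling:
- Data transformation
SARIMA models assume that the series is stationary after differencing, so we first transform the data and check which steps are needed.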
library(dplyr)  # provides %>%, group_by() and summarise()

# Aggregate daily sales to monthly totals (assumes dat$date is of class Date)
dat$month <- format(dat$date, "%Y-%m")
month <- dat %>% group_by(month) %>% summarise(sales = sum(sales)) %>% ungroup()
month <- ts(month$sales, start = c(2013, 1), frequency = 12)
# Line plot of monthly sales
plot(month, main = 'Monthly Total Sales', ylab = 'Sales ($)', col = 'blue', type = 'o')
# Stabilize the variance with a log transform
plot(log(month), main = 'Log-transformed monthly sales', ylab = '', col = 'blue', type = 'o')
# Remove the overall trend with a first difference
plot(diff(log(month)), main = 'Differenced log-transform of monthly sales', ylab = '', col = 'blue', type = 'o')
# Remove seasonality with a lag-12 difference
plot(diff(diff(log(month)), 12), main = 'Differenced log-transform of monthly sales without seasonality', ylab = '', col = 'blue', type = 'o')
# Store the transformed series
data <- diff(diff(log(month)), 12)
The plots from the steps above confirm that the upward trend, seasonality, and non-constant variance are all present.
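In backshift notation, the stored series can be written compactly. The symbols here are introduced for illustration only: x_t is the monthly sales total, y_t is the transformed series, and B is the backshift operator with B x_t = x_{t-1}.

```latex
y_t = (1 - B)(1 - B^{12}) \log x_t
    = \log x_t - \log x_{t-1} - \log x_{t-12} + \log x_{t-13}
```

The two difference operators commute, so taking the lag-1 difference first or the lag-12 difference first gives the same series.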
- Parameter selection
Examine the ACF and PACF of the differenced data.
acf(diff(log(month)), 48)
pacf(diff(log(month)),48)
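The ACF is significant at lags 5, 6, 7, and 9, and the PACF at lags 5 and 9. As a rule of thumb, this pattern suggests trying p and q equal to 0 or 1; to be thorough, we test p ≤ 9 and q ≤ 9. The PACF also shows a seasonal pattern that cuts off after one significant seasonal period, so we test P = 0 or 1 and Q = 0 or 1. Guided by the plots, we fit multiple models and choose among them using AIC, BIC, and residual diagnostics.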
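For reference, the family of models being searched can be written in the general seasonal ARIMA form with d = D = 1 and period 12. The notation is an assumption added for illustration: x_t is monthly sales, B is the backshift operator, ε_t is white noise, and the moving-average polynomials use plus signs, which matches the sign convention of R's arima().

```latex
\phi_p(B)\,\Phi_P(B^{12})\,(1 - B)(1 - B^{12}) \log x_t
  = \theta_q(B)\,\Theta_Q(B^{12})\,\varepsilon_t,
\qquad
\begin{aligned}
\phi_p(B)      &= 1 - \phi_1 B - \dots - \phi_p B^{p}, &
\theta_q(B)    &= 1 + \theta_1 B + \dots + \theta_q B^{q},\\
\Phi_P(B^{12}) &= 1 - \Phi_1 B^{12} - \dots - \Phi_P B^{12P}, &
\Theta_Q(B^{12}) &= 1 + \Theta_1 B^{12} + \dots + \Theta_Q B^{12Q}.
\end{aligned}
```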
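d <- 1     # non-seasonal differencing order
DD <- 1    # seasonal differencing order
per <- 12  # seasonal period in months
# Loop indices are offset by one: p - 1 and q - 1 run over 0..9, i - 1 and j - 1 over 0..1
for (p in 1:10) {
  for (q in 1:10) {
    for (i in 1:2) {
      for (j in 1:2) {
        # Keep models small: this condition limits (p-1) + (q-1) + (i-1) + (j-1) to at most 4
        if (p + d + q + i + DD + j <= 10) {
          model <- arima(x = log(month), order = c((p - 1), d, (q - 1)), seasonal = list(order = c((i - 1), DD, (j - 1)), period = per))
          # Box-Pierce test (the Box.test default) on the residuals; a large p-value means no evidence of leftover autocorrelation
          pval <- Box.test(model$residuals, lag = log(length(model$residuals)))
          sse <- sum(model$residuals^2)
          cat(p - 1, d, q - 1, i - 1, DD, j - 1, per, 'AIC=', model$aic, ' SSE=', sse, ' p-VALUE=', pval$p.value, '\n')
        }
      }
    }
  }
}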
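According to the results, (0,1,1,1,1,1) and (1,1,0,1,1,1) have the lowest AIC and SSE, and neither has a significant p-value (no evidence of autocorrelation left in the residuals), so both are acceptable. Because (0,1,1,1,1,1) has the lower AIC, we select that model.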
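Comparing candidates is easier when each fit is a row in a data frame that can be sorted, rather than a printed line. The following is a minimal sketch under stated assumptions, not the code used for the results above: it uses the built-in AirPassengers series as a stand-in for `month`, a smaller grid (all orders 0 or 1), a hypothetical helper `fit_one`, a Ljung-Box test at lag 12, and it adds BIC alongside AIC.

```r
# Stand-in data: AirPassengers ships with base R; replace with log(month) in practice.
y <- log(AirPassengers)

# Candidate orders (assumption: a small grid, each order 0 or 1).
grid <- expand.grid(p = 0:1, q = 0:1, P = 0:1, Q = 0:1)

# Hypothetical helper: fit one SARIMA(p,1,q)(P,1,Q)[12] and return a one-row summary.
fit_one <- function(p, q, P, Q) {
  fit <- tryCatch(
    arima(y, order = c(p, 1, q),
          seasonal = list(order = c(P, 1, Q), period = 12)),
    error = function(e) NULL  # skip specifications that fail to fit
  )
  if (is.null(fit)) return(NULL)
  res <- residuals(fit)
  lb <- Box.test(res, lag = 12, type = "Ljung-Box", fitdf = p + q + P + Q)
  data.frame(p = p, q = q, P = P, Q = Q,
             aic = AIC(fit), bic = BIC(fit),
             sse = sum(res^2), lb_p = lb$p.value)
}

# Fit every row of the grid, stack the summaries, and sort by AIC.
results <- do.call(rbind, Map(fit_one, grid$p, grid$q, grid$P, grid$Q))
results <- results[order(results$aic), ]
print(head(results))
```

Sorting by `aic` puts the preferred candidates first, and `lb_p` shows at a glance whether their residuals still look autocorrelated.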
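Model results and forecast:
Final model: SARIMA(0,1,1,1,1,1), period = 12. Note that the call below specifies order = c(1, 1, 0), so the output shown is for the second candidate, SARIMA(1,1,0,1,1,1).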
> library(forecast)  # provides forecast() and the error measures printed by summary()
> model <- arima(x = log(month), order = c(1, 1, 0), seasonal = list(order = c(1, 1, 1), period = 12))
> summary(model)
Call:
arima(x = log(month), order = c(1, 1, 0), seasonal = list(order = c(1, 1, 1),
period = 12))
Coefficients:
          ar1     sar1    sma1
      -0.4300  -0.9978  0.8933
s.e.   0.1419   0.0242  0.5515
sigma^2 estimated as 0.0001707: log likelihood = 124.57, aic = -241.14
Training set error measures:
                       ME       RMSE         MAE         MPE       MAPE       MASE        ACF1
Training set -0.002019815 0.01309709 0.009138861 -0.01531318 0.06755693 0.08382672 -0.03892862
# Forecast with the model (values are on the log scale, because the model was fit to log(month))
> forecast(model)
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
Jan 2018       13.43510 13.41698 13.45321 13.40740 13.46279
Feb 2018       13.45151 13.43077 13.47226 13.41979 13.48324
Mar 2018       13.71681 13.69213 13.74149 13.67906 13.75456
Apr 2018       13.85279 13.82536 13.88022 13.81084 13.89475
May 2018       13.93156 13.90138 13.96174 13.88540 13.97771
Jun 2018       13.97723 13.94463 14.00982 13.92737 14.02708
Jul 2018       14.07672 14.04183 14.11161 14.02337 14.13007
Aug 2018       13.93595 13.89893 13.97297 13.87934 13.99257
Sep 2018       13.84321 13.80417 13.88225 13.78350 13.90292
Oct 2018       13.79931 13.75834 13.84027 13.73666 13.86196
Nov 2018       13.84058 13.79779 13.88337 13.77514 13.90602
Dec 2018       13.54646 13.50189 13.59103 13.47830 13.61462
Jan 2019       13.47399 13.42220 13.52577 13.39479 13.55319
Feb 2019       13.48081 13.42515 13.53647 13.39569 13.56594
Mar 2019       13.76130 13.70109 13.82151 13.66921 13.85339
Apr 2019       13.89347 13.82941 13.95753 13.79550 13.99144
May 2019       13.97700 13.90916 14.04485 13.87324 14.08076
Jun 2019       14.01916 13.94780 14.09053 13.91002 14.12831
Jul 2019       14.11474 14.04000 14.18949 14.00043 14.22905
Aug 2019       13.98259 13.90462 14.06056 13.86335 14.10183
Sep 2019       13.88960 13.80854 13.97067 13.76562 14.01358
Oct 2019       13.84131 13.75726 13.92536 13.71277 13.96985
Nov 2019       13.88272 13.79579 13.96965 13.74977 14.01567
Dec 2019       13.59293 13.50321 13.68265 13.45571 13.73015
>
> plot(forecast(model))
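Because the model was fit to log(month), the forecasts above are log sales, and exponentiating returns them to the original units. The following is a minimal sketch of that back-transformation under stated assumptions: it uses base R's predict() rather than forecast(), the built-in AirPassengers series as a stand-in for `month`, and illustrative orders (0,1,1)(0,1,1) with period 12; the name `sales_forecast` is hypothetical.

```r
# Stand-in data: AirPassengers ships with base R; replace with `month` in practice.
y <- AirPassengers

# Fit on the log scale (orders here are illustrative, not the article's final model).
fit <- arima(log(y), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# predict() returns point forecasts ($pred) and standard errors ($se) on the log scale.
pred <- predict(fit, n.ahead = 24)
z <- qnorm(0.975)  # normal quantile for a 95% interval

# Exponentiate the point forecast and the interval endpoints to get original units.
sales_forecast <- data.frame(
  point   = as.numeric(exp(pred$pred)),
  lower95 = as.numeric(exp(pred$pred - z * pred$se)),
  upper95 = as.numeric(exp(pred$pred + z * pred$se))
)
print(head(sales_forecast))
```

Under the model's normality assumption, the exponentiated point forecast is the median of the predicted distribution rather than its mean, and the back-transformed interval is no longer symmetric around it.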