学习自中国大学MOOC TensorFlow学习课程
一、目的
时间序列是数据科学中最常用的技术之一,它具有广泛的应用——天气预报、销售预测、趋势分析等。本部分主要介绍时间序列的生成,并主要使用RNN、双向LSTM对时间序列进行预测。
二、模拟生成一个时间序列
时序信号
= 周期信号
+ 趋势信号
+ 随机信号(随机噪声)
2.1 趋势线
#生成时间序列数据D:
#绘制趋势线函数
def plot_series(time, series):
plt.figure(figsize=(10, 6))
plt.plot(time, series)
plt.xlabel("time")
plt.ylabel("value")
plt.grid(True)
plt.show()
#生成趋势线
def trend(time, slope=0):
return slope * time #slope为斜率
time = np.arange(4 * 365 + 1, dtype='float32')
baseline = 10
series = trend(time, 0.1)
plot_series(time, series)
2.2 周期信号
# 生成季节性的时间序列
# 分段函数
def seasonal_pattern(season_time):
return np.where(season_time < 0.4,
np.cos(season_time * 2 * np.pi),
1 / np.exp(3 * season_time))
#振幅amplitude,相位phase
def seasonality(time, period, amplitude=1, phase=0):
season_time = ((time + phase) % period) / period
return amplitude * seasonal_pattern(season_time)
baseline = 10
amplitude = 40
series = seasonality(time, period=365, amplitude=amplitude)
plot_series(time, series)
2.3 周期信号加上趋势线信号
slope = 0.05
series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude)
plot_series(time, series)
2.4 加入白噪声
#叠加白噪音
def noise(time, noise_level=1):
return np.random.randn(len(time)) * noise_level
noise_level = 15
noisy_series = series + noise(time, noise_level)
plot_series(time, noisy_series)
增加噪声水平
趋势变得更加不明显了
noise_level = 40
noisy_series = series + noise(time, noise_level)
plot_series(time, noisy_series)
三、噪声平滑处理(非深度学习的方法)
3.1 平滑噪音的方法1
用加权和来替代原本的取值
两个权重:
- rho1 = 0.5
- rho2 = -0.1
def autocorrelation(time, amplitude):
rho1 = 0.5
rho2 = -0.1
#生成一个正态分布的
ar = np.random.randn(len(time) + 50)
ar[:50] = 100
for step in range(50, len(time) + 50):
ar[step] += rho1 * ar[step - 50] #该位置点的加上(前面50个位置的点*权重rho1 的值)
ar[step] += rho2 * ar[step - 33] #该位置点的加上(前面33个位置的点*权重rho2的值)
return ar[50:] * amplitude #再乘强度amplitude,使得这个信号和原来的信号振幅尽量一样
series = autocorrelation(time, 10)
plot_series(time[:200], series[:200])
3.1 平滑噪音的方法2
某个时间点的取值 = 前一个点的值 + 前一个点的值*rho权重
#平滑噪音方法2
def autocorrelation(time, amplitude):
rho = 0.8
ar = np.random.randn(len(time) + 1)
for step in range(1, len(time) + 1):
ar[step] += rho * ar[step - 1]
return ar[1:] * amplitude
series = autocorrelation(time, 10)
plot_series(time[:200], series[:200])
测试
生成噪声
series = noise(time)
plot_series(time[:200], series[:200])
将平滑函数曲线和趋势线进行叠加
series = autocorrelation(time, 10) + trend(time, 2)
plot_series(time[:200], series[:200])
再加上周期信号
series = autocorrelation(time, 10) + seasonality(time, period=50, amplitude=150) + trend(time, 2)
plot_series(time[:200], series[:200])
四、时间序列的非深度学习的预测方法
try:
# %tensorflow_version only exists in Colab.
%tensorflow_version 2.x
except Exception:
pass
import tensorflow as tf
print(tf.__version__)
2.4.0
The next code block will set up the time series with seasonality, trend and a bit of noise.
# 生成有季节成分和趋势成分的时间序列
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
def plot_series(time, series, format="-", start=0, end=None):
plt.plot(time[start:end], series[start:end], format)
plt.xlabel("Time")
plt.ylabel("Value")
plt.grid(True)
def trend(time, slope=0):
return slope * time
def seasonal_pattern(season_time):
"""Just an arbitrary pattern, you can change it if you wish"""
return np.where(season_time < 0.4,
np.cos(season_time * 2 * np.pi),
1 / np.exp(3 * season_time))
def seasonality(time, period, amplitude=1, phase=0):
"""Repeats the same pattern at each period"""
season_time = ((time + phase) % period) / period
return amplitude * seasonal_pattern(season_time)
def noise(time, noise_level=1, seed=None):
rnd = np.random.RandomState(seed)
return rnd.randn(len(time)) * noise_level
time = np.arange(4 * 365 + 1, dtype="float32") #1461
baseline = 10
series = trend(time, 0.1)
baseline = 10
amplitude = 40
slope = 0.05
noise_level = 5
# Create the series
series = baseline + trend(time, slope) + seasonality(time, period=365, amplitude=amplitude)
# Update with noise
series += noise(time, noise_level, seed=42)
plt.figure(figsize=(10, 6))
plot_series(time, series)
plt.show()
Now that we have the time series, let's split it so we can start forecasting
将序列前1000个切分作为训练集,后1001~1600作为测试集
#前100个时序为训练集,后面的为测试集
split_time = 1000
time_train = time[:split_time]
x_train = series[:split_time]
time_valid = time[split_time:]
x_valid = series[split_time:]
plt.figure(figsize=(10, 6))
plot_series(time_train, x_train)
plt.show()
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plt.show()
4.1 Naive Forecast 朴素预测法预测时间序列
# 使用朴素预测法进行预测
#即取上一个时间点的值,作为下一个时间点的预测值,
naive_forecast = series[split_time - 1:-1]
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, naive_forecast) #相当于平移了一个单位
Let's zoom in on the start of the validation period:
#对图进行扩展
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid, start=0, end=150)
plot_series(time_valid, naive_forecast, start=1, end=151) #在垂直和横向方向改动比例
You can see that the naive forecast lags 1 step behind the time series.
Now let's compute the mean squared error and the mean absolute error between the forecasts and the predictions in the validation period:
# 计算误差
tf.compat.v1.enable_eager_execution()
print(keras.metrics.mean_squared_error(x_valid, naive_forecast).numpy()) #均方误差
print(keras.metrics.mean_absolute_error(x_valid, naive_forecast).numpy()) #平均绝对误差
61.827538
5.9379086
That's our baseline, now let's try a moving average:
4.2 使用移动平均法进行预测
移动平均法是用最近一段区间内的时刻值的均值来预测未来时刻的值。
def moving_average_forecast(series, window_size):
"""Forecasts the mean of the last few values.
If window_size=1, then this is equivalent to naive forecast"""
forecast = []
#窗口每次往后移动一个步长
for time in range(len(series) - window_size):
forecast.append(series[time:time + window_size].mean()) #计算窗口里包含的序列值得平均值,作为这个时间点的取值
return np.array(forecast)
#从分割点前30个时间(训练集)值的平均值开始,作为预测值
moving_avg = moving_average_forecast(series, 30)[split_time - 30:]
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, moving_avg)
# 计算误差
print(keras.metrics.mean_squared_error(x_valid, moving_avg).numpy())
print(keras.metrics.mean_absolute_error(x_valid, moving_avg).numpy())
106.674576
7.142419
移动平均时,周期信号波动很大,如果窗口比较大,用它的平均值来预测某一个时刻的值,误差会非常大
问题根源是,用移动平均做预测去预测周期性信号误差会非常大
That's worse than naive forecast! The moving average does not anticipate trend or seasonality, so let's try to remove them by using differencing. Since the seasonality period is 365 days, we will subtract the value at time t – 365 from the value at time t.
解决方法:把正态分布的随机信号,从周期性信号中提取出来,只针对误差做移动平均的预测,周期性信号不需要做平均;
然后分别对这两个信号做预测并且相加,得到新的预测值。
让校验集的每一个时刻都减掉同一个周期,也就是减去周期365,只保留随机误差信号
优化
去除趋势或周期性
# 去除趋势或周期性
diff_series = (series[365:] - series[:-365]) #提取出随机误差信号
diff_time = time[365:]
plt.figure(figsize=(10, 6))
plot_series(diff_time, diff_series)
plt.show()
Great, the trend and seasonality seem to be gone, so now we can use the moving average:
#只对随机误差做移动平均
#时间窗放大了些,是50个时刻
diff_moving_avg = moving_average_forecast(diff_series, 50)[split_time - 365 - 50:]
plt.figure(figsize=(10, 6))
plot_series(time_valid, diff_series[split_time - 365:])
plot_series(time_valid, diff_moving_avg)
plt.show()
Now let's bring back the trend and seasonality by adding the past values from t – 365:
计算周期信号的预测值 = 用上一个周期的周期值
预测值 = 用上一个周期的周期值 + 平滑后的误差值
diff_moving_avg_plus_past = series[split_time - 365:-365] + diff_moving_avg
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, diff_moving_avg_plus_past)
plt.show()
# 计算误差
print(keras.metrics.mean_squared_error(x_valid, diff_moving_avg_plus_past).numpy())
print(keras.metrics.mean_absolute_error(x_valid, diff_moving_avg_plus_past).numpy())
52.973656
5.8393106
进一步优化
Better than naive forecast, good. However the forecasts look a bit too random, because we're just adding past values, which were noisy. Let's use a moving averaging on past values to remove some of the noise:
对原本的周期信号进行移动平均 但是窗口不能太大
# 使用0均法来消除部分噪声
# 对上一个周期进行移动平均,但是窗口不能太大
diff_moving_avg_plus_smooth_past = moving_average_forecast(series[split_time - 370:-360], 10) + diff_moving_avg
plt.figure(figsize=(10, 6))
plot_series(time_valid, x_valid)
plot_series(time_valid, diff_moving_avg_plus_smooth_past)
plt.show()
print(keras.metrics.mean_squared_error(x_valid, diff_moving_avg_plus_smooth_past).numpy())
print(keras.metrics.mean_absolute_error(x_valid, diff_moving_avg_plus_smooth_past).numpy())
33.45226
4.569442