Notes and solutions for the time series tutorial: Linear Regression With Time Series.
Takeaways
This is not a very interesting tutorial, because it is quite simple, but I'll list some takeaways.
Components of time series
- Time-dependent components: trend and seasonality
  - Moving average for trend
  - Fourier transform for seasonality
- Serially dependent component: cycles
  - Autocorrelation for cycles
Winners of Kaggle forecasting competitions have often included moving averages and other rolling statistics in their feature sets. Such features seem to be especially useful when used with GBDT algorithms like XGBoost.
Many time series can be closely described by an additive model of just these three components plus some essentially unpredictable, entirely random error:
series = trend + seasons + cycles + error
Learn a hybrid model
We could imagine learning the components of a time series as an iterative process: first learn the trend and subtract it from the series, then learn the seasonality from the detrended residuals and subtract the seasons out, then learn the cycles and subtract them out, until finally only the unpredictable error remains.
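Here's a rough sketch of that idea in pandas (assuming y is a daily pd.Series; the yearly window and the weekly grouping are illustrative choices of mine, not the tutorial's code):
trend = y.rolling(365, center=True, min_periods=183).mean()   # learn the trend
y_detrended = y - trend                                       # subtract it out
# learn a weekly seasonal pattern from the detrended residuals
seasons = y_detrended.groupby(y_detrended.index.dayofweek).transform("mean")
y_deseasoned = y_detrended - seasons                          # subtract the seasons out
# cycles would next be modeled on y_deseasoned (e.g., with lag features);
# whatever remains afterwards is the unpredictable error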
There are generally two ways a regression algorithm can make predictions: either by transforming the features or by transforming the target.
- Feature-transforming algorithms learn some mathematical function that takes features as input, then combines and transforms them to produce an output that matches the target values in the training set.
  - Examples: linear regression and neural networks.
  - Can extrapolate target values beyond the training set.
- Target-transforming algorithms use the features to group the target values in the training set and make predictions by averaging values within a group; a set of features just indicates which group to average.
  - Examples: decision trees and nearest neighbors.
  - Cannot extrapolate, so cannot predict the trend.
As a result, we need a "hybrid" model, which uses a feature-transforming algorithm (e.g., linear regression) to extrapolate the trend, and then applies a target-transforming algorithm (e.g., XGBoost) to model the detrended residuals.
1 Linear Regression With Time Series
Use two features unique to time series: lags and time steps.
1.1 Interpret linear regression with the time dummy
The linear regression line has an equation of (approximately) Hardcover = 3.33 * Time + 150.5. Over 6 days, how much on average would you expect hardcover sales to change?
Easy: it should be 3.33 * 6 = 19.98.
1.2 Interpret linear regression with a lag feature
One of these series has the equation target = 0.95 * lag_1 + error and the other has the equation target = -0.95 * lag_1 + error, differing only by the sign on the lag feature. Can you tell which equation goes with each series?
In the first graph, neighboring points tend to have the same sign, which is exactly what a positive coefficient on lag_1 produces, so it should be target = 0.95 * lag_1 + error.
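To see why, we can simulate both processes (a quick check of my own, not part of the exercise):
import numpy as np

rng = np.random.default_rng(0)

def simulate(coef, n=200):
    # target = coef * lag_1 + error
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = coef * y[t - 1] + rng.normal()
    return y

y_pos = simulate(0.95)   # smooth: neighboring points tend to share a sign
y_neg = simulate(-0.95)  # jagged: neighboring points tend to alternate signs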
1.3 Fit a time-step feature
Complete the code below to create a linear regression model with a time-step feature on the series of average product sales. The target is in a column called 'sales'.
import pandas as pd
from sklearn.linear_model import LinearRegression
df = average_sales.to_frame()
# YOUR CODE HERE: Create a time dummy
time = range(0, len(average_sales))
df['time'] = time
# YOUR CODE HERE: Create training data
X = df[['time']] # features
y = df.sales # target
# Train the model
model = LinearRegression()
model.fit(X, y)
# Store the fitted values as a time series with the same time index as
# the training data
y_pred = pd.Series(model.predict(X), index=X.index)
# Check your answer
q_3.check()
First, we create a time dummy with range, and then use it as the feature passed to LinearRegression.
Note that the feature must be a DataFrame, not a Series, so use df[['time']] instead of df.time.
1.4 Fit a lag feature to Store Sales
Complete the code below to create a linear regression model with a lag feature on the series of average product sales. The target is in a column of df called 'sales'.
df = average_sales.to_frame()
# YOUR CODE HERE: Create a lag feature from the target 'sales'
lag_1 = df.sales.shift(1)
df['lag_1'] = lag_1 # add to dataframe
X = df.loc[:, ['lag_1']].dropna() # features
y = df.loc[:, 'sales'] # target
y, X = y.align(X, join='inner') # drop corresponding values in target
# YOUR CODE HERE: Create a LinearRegression instance and fit it to X and y.
model = LinearRegression()
model.fit(X, y)
# YOUR CODE HERE: Store the fitted values as a time series with
# the same time index as the training data
y_pred = pd.Series(model.predict(X), index=X.index)
# Check your answer
q_4.check()
Easy: it complements 1.3, emphasizing the usage of LinearRegression, now with a lag feature.
2 Trend
Model long-term changes with moving averages and the time dummy.
2.1 Determine trend with a moving average plot
The US Retail Sales dataset contains monthly sales data for a number of retail industries in the United States. Make a moving average plot to estimate the trend for this series.
We need to use the pandas.DataFrame.rolling method to get the "moving average". It has 3 important arguments:
- window: size of the moving window. Since this is monthly data, we can set the window to a year (12 months).
- center: whether to set the labels at the center of the window.
- min_periods: minimum number of observations in the window required to have a value (otherwise the result is NA). With center set to True, only 6 data points fall inside the window at the beginning and end of the series, so min_periods=6 avoids NAs there.
# YOUR CODE HERE: Add methods to `food_sales` to compute a moving
# average with appropriate parameters for trend estimation.
trend = food_sales.rolling(12, center=True, min_periods=6).mean()
# Check your answer
q_1.check()
# Make a plot
ax = food_sales.plot(**plot_params, alpha=0.5)
ax = trend.plot(ax=ax, linewidth=3)
2.2 Identify trend
What order polynomial trend might be appropriate for the Food and Beverage Sales series? Can you think of a non-polynomial curve that might work even better?
Intuitively, because of the upwards bend, an order 2 polynomial might be appropriate.
(If you've worked with economic time series before, you might guess that the growth rate in Food and Beverage Sales is best expressed as a percent change. Percent change can often be modeled using an exponential curve; see the sketch below.)
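Here's a minimal sketch of that idea, assuming food_sales is a positive-valued series and X_time is a time-dummy feature like the one built in 1.3 (both names are mine, not the exercise's):
import numpy as np
from sklearn.linear_model import LinearRegression

# A linear trend in log-sales corresponds to an exponential curve
# (constant percent growth) in the original units.
log_trend_model = LinearRegression().fit(X_time, np.log(food_sales))
y_exp_trend = np.exp(log_trend_model.predict(X_time))  # back to sales units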
2.3 Create a Trend Feature
Use DeterministicProcess to create a feature set for a cubic trend model. Also create features for a 90-day forecast.
from statsmodels.tsa.deterministic import DeterministicProcess
y = average_sales.copy() # the target
# YOUR CODE HERE: Instantiate `DeterministicProcess` with arguments
# appropriate for a cubic trend model
dp = DeterministicProcess(
    index=average_sales.index,  # dates from the training data
    order=3,                    # cubic trend: time dummy, its square, and its cube
    drop=True,                  # drop terms if necessary to avoid collinearity
)
# YOUR CODE HERE: Create the feature set for the dates given in y.index
X = dp.in_sample()
# YOUR CODE HERE: Create features for a 90-day forecast.
X_fore = dp.out_of_sample(steps=90)
# Check your answer
q_3.check()
A deterministic process is a technical term for a time series that is non-random or completely determined, like the const and trend series are. Features derived from the time index will generally be deterministic.
To make a forecast, we apply our model to the out_of_sample features. "Out of sample" refers to times outside the observation period of the training data.
Note that this is not the model. The comment makes it clear: "Create features for...". X is just the features:
- trend = time dummy
- trend_squared = (time dummy)^2
- trend_cubed = (time dummy)^3
We can use LinearRegression after DeterministicProcess to obtain the model:
model = LinearRegression()
model.fit(X, y)
y_pred = pd.Series(model.predict(X), index=X.index)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)
ax = y.plot(**plot_params, alpha=0.5, title="Average Sales", ylabel="items sold")
ax = y_pred.plot(ax=ax, linewidth=3, label="Trend", color='C0')
ax = y_fore.plot(ax=ax, linewidth=3, label="Trend Forecast", color='C3')
ax.legend();
2.4 Understand risks of forecasting with high-order polynomials
Overfitting: high-order terms (e.g., t^11 for an order-11 trend) tend to diverge rapidly outside the training period, making forecasts very unreliable.
3 Seasonality
Create indicators and Fourier features to capture periodic change.
3.1 Determine seasonality
What kind of seasonality do you see evidence of?
It suggests a strong weekly seasonality, and there appear to be some biweekly components as well. The context: this is store sales data, and wages in the public sector are paid out biweekly.
The periodogram tells you the strength of the frequencies in a time series. Specifically, the value on the y-axis of the graph is (a^2 + b^2) / 2, where a and b are the coefficients of the sine and cosine at that frequency after a Fourier transform.
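A periodogram can be computed with scipy.signal.periodogram; a sketch for an assumed daily series ts, with fs=365 so frequencies are in cycles per year:
from scipy.signal import periodogram

frequencies, spectrum = periodogram(
    ts,
    fs=365,            # daily observations, so 365 per year
    detrend="linear",  # remove the trend first
    window="boxcar",
    scaling="spectrum",
)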
3.2 Create seasonal features
Use DeterministicProcess and CalendarFourier to create:
- indicators for weekly seasons, and
- Fourier features of order 4 for monthly seasons.
from statsmodels.tsa.deterministic import CalendarFourier, DeterministicProcess

y = average_sales.copy()
# YOUR CODE HERE
fourier = CalendarFourier(freq="M", order=4)
dp = DeterministicProcess(
    index=y.index,
    constant=True,
    order=1,
    # YOUR CODE HERE
    seasonal=True,               # weekly seasonal indicators
    additional_terms=[fourier],  # monthly Fourier features
    drop=True,
)
X = dp.in_sample()
# Check your answer
q_2.check()
The task breaks into two parts: seasonal=True generates the "indicators for weekly seasons" in the features (X), and CalendarFourier(freq="M", order=4) generates the "Fourier features of order 4 for monthly seasons".
The resulting feature columns are:
['const', 'trend',
's(2,7)', 's(3,7)', 's(4,7)', 's(5,7)', 's(6,7)', 's(7,7)',
'sin(1,freq=M)', 'cos(1,freq=M)', 'sin(2,freq=M)', 'cos(2,freq=M)', 'sin(3,freq=M)', 'cos(3,freq=M)', 'sin(4,freq=M)', 'cos(4,freq=M)']
s(?, 7) is an encoding of the weekday.
For each i = 1, ..., order, sin(i, freq=M) is computed as sin(2 * pi * i * tau), where tau is the proportion of the month that has elapsed. E.g., the first day of a 31-day month is the origin (tau = 0), and the second day has tau = 1/31.
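A sketch of that computation for freq="M" (assuming tau is measured from the start of each calendar month, per the description above; not taken from the statsmodels source):
import numpy as np
import pandas as pd

index = pd.date_range("2017-01-01", "2017-01-31", freq="D")
tau = (index.day - 1) / index.daysinmonth  # proportion of the month elapsed
i = 1  # harmonic order
sin_i = np.sin(2 * np.pi * i * tau)  # the sin(1,freq=M) column
cos_i = np.cos(2 * np.pi * i * tau)  # the cos(1,freq=M) column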
With these features, we can fit our linear model with code below:
model = LinearRegression().fit(X, y)
y_pred = pd.Series(
model.predict(X),
index=X.index,
name='Fitted',
)
We can remove the seasonal components from the sales by subtracting the fitted values, which is called deseasonalizing.
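Concretely (using y and the fitted y_pred from above):
# Subtract the fitted seasonal component to deseasonalize the series
y_deseason = y - y_pred
ax = y_deseason.plot(title="Average Sales (deseasonalized)")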
3.3 Check for remaining seasonality
Based on these periodograms, how effectively does it appear your model captured the seasonality in Average Sales? Does the periodogram agree with the time plot of the deseasonalized series?
The model was able to capture the seasonal variation in average sales, but it misses the "peak" values (which are holidays; we'll discuss them next).
If we plot the deseasonalized series with the holidays marked (plot omitted here), the remaining spikes line up with the holidays. This means the seasonal features cannot capture the holiday effects.
3.4 Create holiday features
What kind of features could you create to help your model make use of holiday information?
# YOUR CODE HERE
X_holidays = pd.get_dummies(holidays)
X2 = X.join(X_holidays, on='date').fillna(0.0)
# Check your answer
q_4.check()
We use pd.get_dummies to one-hot encode the holidays, and use them as extra features by joining them on date.
After that, we train the linear model with the extra holiday features:
model = LinearRegression().fit(X2, y)
y_pred = pd.Series(
model.predict(X2),
index=X2.index,
name='Fitted',
)
4 Time Series as Features
Predict the future from the past with a lag embedding.
Not every product family has sales showing cyclic behavior, and neither does the series of average sales. Sales of magazines, however, show patterns of growth and decay not well characterized by trend or seasons. In this question and the next, you'll model cycles in magazine sales using lag features.
Trend and seasonality will both create serial dependence that shows up in correlograms and lag plots. To isolate any purely cyclic behavior, we'll start by deseasonalizing the series.
4.1 Plotting cycles
Does this deseasonalized series show cyclic patterns? To confirm our intuition, we can try to isolate cyclic behavior using a moving-average plot just like we did with trend. The idea is to choose a window long enough to smooth over short-term seasonality, but short enough to still preserve the cycles.
Create a seven-day moving average from y, the series of magazine sales. Use a centered window, but don't set the min_periods argument.
# YOUR CODE HERE
y_ma = y.rolling(7, center=True).mean()
# Plot
ax = y_ma.plot()
ax.set_title("Seven-Day Moving Average");
4.2 Examine serial dependence in Store Sales
Are any of the lags significant according to the correlogram? Does the lag plot suggest any relationships that weren't apparent from the correlogram?
We can see that lag 1 and lag 6 are significant. The lag plot suggests some non-linear effects.
4.3 Examine the time series features
Recall from the tutorial that a leading indicator is a series whose values at one time can be used to predict the target at a future time -- a leading indicator provides "advance notice" of changes in the target.
The competition dataset includes a time series that could potentially be useful as a leading indicator -- the onpromotion
series, which contains the number of items on a special promotion that day. Since the company itself decides when to do a promotion, there's no worry about "lookahead leakage"; we could use Tuesday's onpromotion
value to forecast sales on Monday, for instance.
Does it appear that either leading or lagging values of onpromotion could be useful as a feature?
The lag plot indicates that both leading and lagged values of onpromotion are correlated with magazine sales.
TODO: explain how the lag plot determines whether the onpromotion values are useful.
4.4 Create time series features
Create the features indicated in the solution to Question 3. If no features from that series would be useful, use an empty dataframe pd.DataFrame() as your answer.
# YOUR CODE HERE: Make features from `y_deseason`
X_lags = make_lags(y_deseason, lags=1)
# YOUR CODE HERE: Make features from `onpromotion`
# You may want to use `pd.concat`
X_promo = pd.concat(
    [
        make_lags(onpromotion, lags=1),
        onpromotion,
        make_leads(onpromotion, leads=1),
    ],
    axis=1,
)
# YOUR CODE HERE: Make features from `oil`
X_oil = pd.DataFrame()
# pd.read_csv(comp_dir / 'oil.csv', parse_dates=['date'], infer_datetime_format=True).set_index('date')
X = pd.concat([X_lags, X_promo, X_oil], axis=1).dropna()
y, X = y.align(X, join='inner')
# Check your answer
q_4.check()
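The make_lags and make_leads helpers are defined earlier in the notebook; a plausible reconstruction (my sketch, not the exact course code):
import pandas as pd

def make_lags(ts, lags):
    # one column per backward shift: y_lag_1, ..., y_lag_{lags}
    return pd.concat(
        {f"y_lag_{i}": ts.shift(i) for i in range(1, lags + 1)},
        axis=1,
    )

def make_leads(ts, leads):
    # one column per forward shift: y_lead_1, ..., y_lead_{leads}
    return pd.concat(
        {f"y_lead_{i}": ts.shift(-i) for i in range(1, leads + 1)},
        axis=1,
    )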
4.5 Create statistical features
Computing rolling statistics to be used as features is similar, except we need to take care to avoid lookahead leakage:
- First, the result should be set at the right end of the window instead of the center -- that is, we should use center=False (the default) in the rolling method.
- Second, the target should be lagged a step.
Edit the code in the next cell to create the following features:
- 14-day rolling median (median) of lagged target
- 7-day rolling standard deviation (std) of lagged target
- 7-day sum (sum) of items "on promotion", with centered window
y_lag = mag_sales.loc[:, 'sales'].shift(1)
onpromo = mag_sales.loc[:, 'onpromotion']
# 7-day mean of lagged target
mean_7 = y_lag.rolling(7).mean()
# YOUR CODE HERE: 14-day median of lagged target
median_14 = y_lag.rolling(14).median()
# YOUR CODE HERE: 7-day rolling standard deviation of lagged target
std_7 = y_lag.rolling(7).std()
# YOUR CODE HERE: 7-day sum of promotions with centered window
promo_7 = onpromo.rolling(7, center=True).sum()
# Check your answer
q_5.check()
We don't lag onpromo because the company itself decides when to do a promotion, so there's no worry about "lookahead leakage".
5 Hybrid Models
Combine the strengths of two forecasters with this powerful technique.
# You'll add fit and predict methods to this minimal class
class BoostedHybrid:
    def __init__(self, model_1, model_2):
        self.model_1 = model_1
        self.model_2 = model_2
        self.y_columns = None  # store column names from fit method
5.1 Define fit method for boosted hybrid
Complete the fit definition for the BoostedHybrid class.
def fit(self, X_1, X_2, y):
    # YOUR CODE HERE: fit self.model_1
    self.model_1.fit(X_1, y)

    y_fit = pd.DataFrame(
        # YOUR CODE HERE: make predictions with self.model_1
        self.model_1.predict(X_1),
        index=X_1.index, columns=y.columns,
    )

    # YOUR CODE HERE: compute residuals
    y_resid = y - y_fit
    y_resid = y_resid.stack().squeeze()  # wide to long

    # YOUR CODE HERE: fit self.model_2 on residuals
    self.model_2.fit(X_2, y_resid)

    # Save column names for predict method
    self.y_columns = y.columns
    # Save data for question checking
    self.y_fit = y_fit
    self.y_resid = y_resid

# Add method to class
BoostedHybrid.fit = fit
# Check your answer
q_1.check()
Follow the steps below:
- train model_1 with features X_1 and target y
- make predictions y_fit with model_1
- compute the residuals y - y_fit
- train model_2 with features X_2 and the residuals y - y_fit as target
5.2 Define predict method for boosted hybrid
Now define the predict method for the BoostedHybrid class.
def predict(self, X_1, X_2):
    y_pred = pd.DataFrame(
        # YOUR CODE HERE: predict with self.model_1
        self.model_1.predict(X_1),
        index=X_1.index, columns=self.y_columns,
    )
    y_pred = y_pred.stack().squeeze()  # wide to long

    # YOUR CODE HERE: add self.model_2 predictions to y_pred
    y_pred += self.model_2.predict(X_2)

    return y_pred.unstack()  # long to wide

# Add method to class
BoostedHybrid.predict = predict
# Check your answer
q_2.check()
Follow the steps below:
- make predictions y_1 with model_1 on features X_1
- make predictions y_2 with model_2 on features X_2
- add y_1 and y_2
5.3 Train boosted hybrid
Create the hybrid model by initializing a BoostedHybrid class with LinearRegression() and XGBRegressor() instances.
from sklearn.linear_model import LinearRegression
from xgboost import XGBRegressor

# YOUR CODE HERE: Create LinearRegression + XGBRegressor hybrid with BoostedHybrid
model = BoostedHybrid(LinearRegression(), XGBRegressor())
# YOUR CODE HERE: Fit and predict
model.fit(X_1, X_2, y)
y_pred = model.predict(X_1, X_2)
y_pred = y_pred.clip(0.0)
# Check your answer
q_3.check()
6 Forecasting With Machine Learning
Apply ML to any forecasting task with these four strategies.
6.1 Match description to dataset
Consider the following three forecasting tasks:
a. 3-step forecast using 4 lag features with a 2-step lead time
b. 1-step forecast using 3 lag features with a 1-step lead time
c. 3-step forecast using 4 lag features with a 1-step lead time
Answer: 2-1-3.
- The number of forecasting steps is the number of columns under Targets.
- The number of lags is the number of columns under Features.
- The lead time is the smallest lag step possible.
6.2 Identify the forecasting task for Store Sales competition
Try to identify the forecast origin and the forecast horizon. How many steps are within the forecast horizon? What is the lead time for the forecast?
The "forecast origin" is time at which you are making a forecast. You may consider it to be the last timestamp of your training data.
The "forecast horizon" is the time range for which you are making a forecast.
The time between the origin and the horizon is the "lead time".
16 steps are within the forecast horizon, leading time is 1.
The training dataset ends on 2017/08/15
(the forecast origin), the test set comprises the dates 2017/08/16
to 2017/08/31
, which includes 16 days (the forecast horizon). There is 1 day between the origin and horizon, so the lead time is 1 day.
6.3 Create multistep dataset for store sales
Create targets suitable for the Store Sales forecasting task. Use 4 days of lag features. Drop any missing values from both targets and features.
# YOUR CODE HERE
y = family_sales.loc[:, 'sales']
# YOUR CODE HERE: Make 4 lag features
X = make_lags(y, lags=4).dropna()
# YOUR CODE HERE: Make multistep target
y = make_multistep_target(y, steps=16).dropna()
y, X = y.align(X, join='inner', axis=0)
# Check your answer
q_3.check()
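The make_multistep_target helper is provided by the notebook; reconstructed from its usage, it plausibly looks like this (my sketch, not the exact course code):
import pandas as pd

def make_multistep_target(ts, steps):
    # one column per forecast step: y_step_1 is the current day,
    # y_step_2 the next day, and so on
    return pd.concat(
        {f"y_step_{i + 1}": ts.shift(-i) for i in range(steps)},
        axis=1,
    )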
6.4 Forecast with the DirRec strategy
Instantiate a model that applies the DirRec strategy to XGBoost.
The DirRec strategy is a combination of the direct and recursive strategies: train a model for each step, and use forecasts from previous steps as new lag features.
from sklearn.multioutput import RegressorChain
# YOUR CODE HERE
model = RegressorChain(XGBRegressor())
# Check your answer
q_4.check()
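Usage then follows the usual fit/predict pattern (a sketch; X and y are the features and multistep targets from 6.3):
# RegressorChain fits one regressor per target column and feeds each model's
# predictions to the next model as extra features -- the DirRec idea.
model.fit(X, y)
y_pred = pd.DataFrame(model.predict(X), index=y.index, columns=y.columns)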