{ "cells": [ { "cell_type": "markdown", "id": "42a0f481", "metadata": {}, "source": [ "# Forecasting With Exogenous Regressors\n", "Consider a multivariate time series $X^{(1)}, \\ldots, X^{(t)}$, where each $X^{(i)} \\in \\mathbb{R}^d$ is a $d$-dimensional vector. In multivariate forecasting, our goal is to predict the future values of the $k$-th univariate, $X_k^{(t+1)}, \\ldots, X_k^{(t+h)}$.\n", "\n", "Exogenous regressors $Y^{(i)}$ are a set of additional variables whose values we know a priori. The task of forecasting with exogenous regressors is to predict our target univariate $X_k^{(t+1)}, \\ldots, X_k^{(t+h)}$, conditioned on\n", "- The past values of the time series $X^{(1)}, \\ldots, X^{(t)}$\n", "- The past values of the exogenous regressors $Y^{(1)}, \\ldots, Y^{(t)}$\n", "- The *future* values of the exogenous regressors $Y^{(t+1)}, \\ldots, Y^{(t+h)}$\n", "\n", "For example, consider the task of predicting the sales of a specific item at a store. Endogenous variables $X^{(i)} \\in \\mathbb{R}^4$ may contain the number of units sold (the target univariate), the temperature outside, the consumer price index, and the current unemployment rate. Exogenous variables $Y^{(i)} \\in \\mathbb{R}^6$ are variables that the store has control over or prior knowledge of. They may include whether a particular day is a holiday, and information about the markdowns the store is running.\n", "\n", "To be more concrete, let's show this with some real data." ] }, { "cell_type": "code", "execution_count": 1, "id": "509b77ea", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| Date | Weekly_Sales | Temperature | Fuel_Price | MarkDown1 | MarkDown2 | MarkDown3 | MarkDown4 | MarkDown5 | CPI | Unemployment | IsHoliday |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-02-05 | 39602.47 | 40.19 | 2.572 | NaN | NaN | NaN | NaN | NaN | 210.752605 | 8.324 | False |
| 2010-02-12 | 37984.44 | 38.49 | 2.548 | NaN | NaN | NaN | NaN | NaN | 210.897994 | 8.324 | True |
| 2010-02-19 | 38889.43 | 39.69 | 2.514 | NaN | NaN | NaN | NaN | NaN | 210.945160 | 8.324 | False |
| 2010-02-26 | 41137.74 | 46.10 | 2.561 | NaN | NaN | NaN | NaN | NaN | 210.975957 | 8.324 | False |
| 2010-03-05 | 39883.50 | 47.17 | 2.625 | NaN | NaN | NaN | NaN | NaN | 211.006754 | 8.324 | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2012-09-28 | 37104.67 | 79.45 | 3.666 | 7106.05 | 1.91 | 1.65 | 1549.10 | 3946.03 | 222.616433 | 6.565 | False |
| 2012-10-05 | 36361.28 | 70.27 | 3.617 | 6037.76 | NaN | 10.04 | 3027.37 | 3853.40 | 222.815930 | 6.170 | False |
| 2012-10-12 | 35332.34 | 60.97 | 3.601 | 2145.50 | NaN | 33.31 | 586.83 | 10421.01 | 223.015426 | 6.170 | False |
| 2012-10-19 | 35721.09 | 68.08 | 3.594 | 4461.89 | NaN | 1.14 | 1579.67 | 2642.29 | 223.059808 | 6.170 | False |
| 2012-10-26 | 34260.76 | 69.79 | 3.506 | 6152.59 | 129.77 | 200.00 | 272.29 | 2924.15 | 223.078337 | 6.170 | False |

143 rows × 11 columns
| Date | IsHoliday | MarkDown1 | MarkDown2 | MarkDown3 | MarkDown4 | MarkDown5 |
|---|---|---|---|---|---|---|
| 2010-02-05 | False | NaN | NaN | NaN | NaN | NaN |
| 2010-02-12 | True | NaN | NaN | NaN | NaN | NaN |
| 2010-02-19 | False | NaN | NaN | NaN | NaN | NaN |
| 2010-02-26 | False | NaN | NaN | NaN | NaN | NaN |
| 2010-03-05 | False | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... |
| 2012-09-28 | False | 7106.05 | 1.91 | 1.65 | 1549.10 | 3946.03 |
| 2012-10-05 | False | 6037.76 | NaN | 10.04 | 3027.37 | 3853.40 |
| 2012-10-12 | False | 2145.50 | NaN | 33.31 | 586.83 | 10421.01 |
| 2012-10-19 | False | 4461.89 | NaN | 1.14 | 1579.67 | 2642.29 |
| 2012-10-26 | False | 6152.59 | 129.77 | 200.00 | 272.29 | 2924.15 |

143 rows × 6 columns
\n", "