{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "42a0f481",
   "metadata": {},
   "source": [
    "# Forecasting With Exogenous Regressors\n",
    "Consider a multivariate time series $X^{(1)}, \\ldots, X^{(t)}$, where each $X^{(i)} \\in \\mathbb{R}^d$ is a $d$-dimensional vector. In multivariate forecasting, our goal is to predict the future values of the $k$-th univariate, $X_k^{(t+1)}, \\ldots, X_k^{(t+h)}$.\n",
    "\n",
    "Exogenous regressors $Y^{(i)}$ are a set of additional variables whose values we know a priori. The task of forecasting with exogenous regressors is to predict our target univariate $X_k^{(t+1)}, \\ldots, X_k^{(t+h)}$, conditioned on:\n",
    "\n",
    "- The past values of the time series $X^{(1)}, \\ldots, X^{(t)}$\n",
    "- The past values of the exogenous regressors $Y^{(1)}, \\ldots, Y^{(t)}$\n",
    "- The *future* values of the exogenous regressors $Y^{(t+1)}, \\ldots, Y^{(t+h)}$\n",
    "\n",
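    "As a minimal sketch of this setup, the toy snippet below slices made-up arrays into the three inputs listed above. The sizes `t`, `h`, `d`, `m`, the index `k`, and the naive repeat-the-last-value forecast are illustrative assumptions, not part of Merlion.\n",
    "\n",
    "```python\n",
    "# Toy sizes: t observed steps, horizon h, d endogenous and m exogenous variables.\n",
    "t, h, d, m, k = 5, 2, 4, 6, 0\n",
    "\n",
    "# X is observed only up to time t, while Y is known through time t + h.\n",
    "X = [[float(10 * i + j) for j in range(d)] for i in range(t)]\n",
    "Y = [[float(i == 3)] * m for i in range(t + h)]\n",
    "\n",
    "past_X = X[:t]           # X^(1), ..., X^(t)\n",
    "past_Y = Y[:t]           # Y^(1), ..., Y^(t)\n",
    "future_Y = Y[t:t + h]    # Y^(t+1), ..., Y^(t+h)\n",
    "\n",
    "# A forecaster maps (past_X, past_Y, future_Y) to h predictions of the k-th univariate.\n",
    "# Placeholder only: repeat the last observed value of the target.\n",
    "forecast = [past_X[-1][k]] * len(future_Y)\n",
    "print(forecast)\n",
    "```\n",
    "\n",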
    "For example, consider the task of predicting the sales of a specific item at a store. Endogenous variables $X^{(i)} \\in \\mathbb{R}^4$ may contain the number of units sold (the target univariate), the temperature outside, the consumer price index, and the current unemployment rate. Exogenous variables $Y^{(i)} \\in \\mathbb{R}^6$ are variables that the store has control over or prior knowledge of. They may include whether a particular day is a holiday, as well as details of the markdowns the store is running.\n",
    "\n",
    "To be more concrete, let's show this with some real data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "509b77ea",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Weekly_Sales</th>\n",
       "      <th>Temperature</th>\n",
       "      <th>Fuel_Price</th>\n",
       "      <th>MarkDown1</th>\n",
       "      <th>MarkDown2</th>\n",
       "      <th>MarkDown3</th>\n",
       "      <th>MarkDown4</th>\n",
       "      <th>MarkDown5</th>\n",
       "      <th>CPI</th>\n",
       "      <th>Unemployment</th>\n",
       "      <th>IsHoliday</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2010-02-05</th>\n",
       "      <td>39602.47</td>\n",
       "      <td>40.19</td>\n",
       "      <td>2.572</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>210.752605</td>\n",
       "      <td>8.324</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-02-12</th>\n",
       "      <td>37984.44</td>\n",
       "      <td>38.49</td>\n",
       "      <td>2.548</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>210.897994</td>\n",
       "      <td>8.324</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-02-19</th>\n",
       "      <td>38889.43</td>\n",
       "      <td>39.69</td>\n",
       "      <td>2.514</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>210.945160</td>\n",
       "      <td>8.324</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-02-26</th>\n",
       "      <td>41137.74</td>\n",
       "      <td>46.10</td>\n",
       "      <td>2.561</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>210.975957</td>\n",
       "      <td>8.324</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-03-05</th>\n",
       "      <td>39883.50</td>\n",
       "      <td>47.17</td>\n",
       "      <td>2.625</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>211.006754</td>\n",
       "      <td>8.324</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-09-28</th>\n",
       "      <td>37104.67</td>\n",
       "      <td>79.45</td>\n",
       "      <td>3.666</td>\n",
       "      <td>7106.05</td>\n",
       "      <td>1.91</td>\n",
       "      <td>1.65</td>\n",
       "      <td>1549.10</td>\n",
       "      <td>3946.03</td>\n",
       "      <td>222.616433</td>\n",
       "      <td>6.565</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-05</th>\n",
       "      <td>36361.28</td>\n",
       "      <td>70.27</td>\n",
       "      <td>3.617</td>\n",
       "      <td>6037.76</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10.04</td>\n",
       "      <td>3027.37</td>\n",
       "      <td>3853.40</td>\n",
       "      <td>222.815930</td>\n",
       "      <td>6.170</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-12</th>\n",
       "      <td>35332.34</td>\n",
       "      <td>60.97</td>\n",
       "      <td>3.601</td>\n",
       "      <td>2145.50</td>\n",
       "      <td>NaN</td>\n",
       "      <td>33.31</td>\n",
       "      <td>586.83</td>\n",
       "      <td>10421.01</td>\n",
       "      <td>223.015426</td>\n",
       "      <td>6.170</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-19</th>\n",
       "      <td>35721.09</td>\n",
       "      <td>68.08</td>\n",
       "      <td>3.594</td>\n",
       "      <td>4461.89</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.14</td>\n",
       "      <td>1579.67</td>\n",
       "      <td>2642.29</td>\n",
       "      <td>223.059808</td>\n",
       "      <td>6.170</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-26</th>\n",
       "      <td>34260.76</td>\n",
       "      <td>69.79</td>\n",
       "      <td>3.506</td>\n",
       "      <td>6152.59</td>\n",
       "      <td>129.77</td>\n",
       "      <td>200.00</td>\n",
       "      <td>272.29</td>\n",
       "      <td>2924.15</td>\n",
       "      <td>223.078337</td>\n",
       "      <td>6.170</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>143 rows × 11 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            Weekly_Sales  Temperature  Fuel_Price  MarkDown1  MarkDown2  \\\n",
       "Date                                                                      \n",
       "2010-02-05      39602.47        40.19       2.572        NaN        NaN   \n",
       "2010-02-12      37984.44        38.49       2.548        NaN        NaN   \n",
       "2010-02-19      38889.43        39.69       2.514        NaN        NaN   \n",
       "2010-02-26      41137.74        46.10       2.561        NaN        NaN   \n",
       "2010-03-05      39883.50        47.17       2.625        NaN        NaN   \n",
       "...                  ...          ...         ...        ...        ...   \n",
       "2012-09-28      37104.67        79.45       3.666    7106.05       1.91   \n",
       "2012-10-05      36361.28        70.27       3.617    6037.76        NaN   \n",
       "2012-10-12      35332.34        60.97       3.601    2145.50        NaN   \n",
       "2012-10-19      35721.09        68.08       3.594    4461.89        NaN   \n",
       "2012-10-26      34260.76        69.79       3.506    6152.59     129.77   \n",
       "\n",
       "            MarkDown3  MarkDown4  MarkDown5         CPI  Unemployment  \\\n",
       "Date                                                                    \n",
       "2010-02-05        NaN        NaN        NaN  210.752605         8.324   \n",
       "2010-02-12        NaN        NaN        NaN  210.897994         8.324   \n",
       "2010-02-19        NaN        NaN        NaN  210.945160         8.324   \n",
       "2010-02-26        NaN        NaN        NaN  210.975957         8.324   \n",
       "2010-03-05        NaN        NaN        NaN  211.006754         8.324   \n",
       "...               ...        ...        ...         ...           ...   \n",
       "2012-09-28       1.65    1549.10    3946.03  222.616433         6.565   \n",
       "2012-10-05      10.04    3027.37    3853.40  222.815930         6.170   \n",
       "2012-10-12      33.31     586.83   10421.01  223.015426         6.170   \n",
       "2012-10-19       1.14    1579.67    2642.29  223.059808         6.170   \n",
       "2012-10-26     200.00     272.29    2924.15  223.078337         6.170   \n",
       "\n",
       "            IsHoliday  \n",
       "Date                   \n",
       "2010-02-05      False  \n",
       "2010-02-12       True  \n",
       "2010-02-19      False  \n",
       "2010-02-26      False  \n",
       "2010-03-05      False  \n",
       "...               ...  \n",
       "2012-09-28      False  \n",
       "2012-10-05      False  \n",
       "2012-10-12      False  \n",
       "2012-10-19      False  \n",
       "2012-10-26      False  \n",
       "\n",
       "[143 rows x 11 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# This is the same dataset used in the custom dataset tutorial\n",
    "import os\n",
    "from ts_datasets.forecast import CustomDataset\n",
    "csv = os.path.join(\"..\", \"..\", \"data\", \"walmart\", \"walmart_mini.csv\")\n",
    "dataset = CustomDataset(rootdir=csv, index_cols=[\"Store\", \"Dept\"], test_frac=0.10)\n",
    "ts, md = dataset[-1]\n",
    "display(ts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f2ea8bed",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "The earliest univariate starts at 2010-02-05 00:00:00, but the latest univariate starts at 2011-11-11 00:00:00, a difference of 644 days 00:00:00. This is more than 10% of the length of the shortest univariate (350 days 00:00:00). You may want to check that the univariates cover the same window of time.\n"
     ]
    }
   ],
   "source": [
    "from merlion.utils import TimeSeries\n",
    "\n",
    "# Get the endogenous variables X and split them into train & test\n",
    "endog = ts[[\"Weekly_Sales\", \"Temperature\", \"CPI\", \"Unemployment\"]]\n",
    "train = TimeSeries.from_pd(endog[md.trainval])\n",
    "test = TimeSeries.from_pd(endog[~md.trainval])\n",
    "\n",
    "# Get the exogenous variables Y\n",
    "exog = TimeSeries.from_pd(ts[[\"IsHoliday\", \"MarkDown1\", \"MarkDown2\", \"MarkDown3\", \"MarkDown4\", \"MarkDown5\"]])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1b01c639",
   "metadata": {},
   "source": [
    "Here, our task is to predict the weekly sales. We would like our model to also account for variables which may have an impact on consumer demand (i.e. temperature, consumer price index, and unemployment), as knowledge of these variables could improve the quality of our sales forecast. This would be a multivariate forecasting problem, covered [here](2_ForecastMultivariate.ipynb).\n",
    "\n",
    "In principle, we could add markdowns and holidays to the multivariate model. However, as a retailer, we know a priori which days are holidays, and we ourselves control the markdowns. In many cases, we can get better forecasts by providing the future values of these variables in addition to the past values. Moreover, we may wish to model how changing the future markdowns would change the future sales. This is why we should model these variables as exogenous regressors instead.\n",
    "\n",
    "All Merlion forecasters expose an API that accepts exogenous regressors at both training and inference time, though only some models actually make use of them. Using the feature is as easy as passing the optional argument `exog_data` to both `train()` and `forecast()`. We show how to use the feature for the popular `Prophet` model below, and demonstrate that adding exogenous regressors can improve the quality of the forecast."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "36f106f6",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "17:50:59 - cmdstanpy - INFO - Chain [1] start processing\n",
      "17:50:59 - cmdstanpy - INFO - Chain [1] done processing\n",
      "17:50:59 - cmdstanpy - INFO - Chain [1] start processing\n",
      "17:50:59 - cmdstanpy - INFO - Chain [1] done processing\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sMAPE (w/o exog) = 3.98\n",
      "sMAPE (w/ exog)  = 3.18\n"
     ]
    }
   ],
   "source": [
    "from merlion.evaluate.forecast import ForecastMetric\n",
    "from merlion.models.forecast.prophet import Prophet, ProphetConfig\n",
    "\n",
    "# Train a model without exogenous data\n",
    "model = Prophet(ProphetConfig(target_seq_index=0))\n",
    "model.train(train)\n",
    "pred, err = model.forecast(test.time_stamps)\n",
    "smape = ForecastMetric.sMAPE.value(test, pred, target_seq_index=model.target_seq_index)\n",
    "print(f\"sMAPE (w/o exog) = {smape:.2f}\")\n",
    "\n",
    "# Train a model with exogenous data\n",
    "exog_model = Prophet(ProphetConfig(target_seq_index=0))\n",
    "exog_model.train(train, exog_data=exog)\n",
    "exog_pred, exog_err = exog_model.forecast(test.time_stamps, exog_data=exog)\n",
    "exog_smape = ForecastMetric.sMAPE.value(test, exog_pred, target_seq_index=exog_model.target_seq_index)\n",
    "print(f\"sMAPE (w/ exog)  = {exog_smape:.2f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39749b73",
   "metadata": {},
   "source": [
    "Before we wrap up this tutorial, we note that the exogenous variables contain a lot of missing data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "5f2690f8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>IsHoliday</th>\n",
       "      <th>MarkDown1</th>\n",
       "      <th>MarkDown2</th>\n",
       "      <th>MarkDown3</th>\n",
       "      <th>MarkDown4</th>\n",
       "      <th>MarkDown5</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2010-02-05</th>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-02-12</th>\n",
       "      <td>True</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-02-19</th>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-02-26</th>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2010-03-05</th>\n",
       "      <td>False</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-09-28</th>\n",
       "      <td>False</td>\n",
       "      <td>7106.05</td>\n",
       "      <td>1.91</td>\n",
       "      <td>1.65</td>\n",
       "      <td>1549.10</td>\n",
       "      <td>3946.03</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-05</th>\n",
       "      <td>False</td>\n",
       "      <td>6037.76</td>\n",
       "      <td>NaN</td>\n",
       "      <td>10.04</td>\n",
       "      <td>3027.37</td>\n",
       "      <td>3853.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-12</th>\n",
       "      <td>False</td>\n",
       "      <td>2145.50</td>\n",
       "      <td>NaN</td>\n",
       "      <td>33.31</td>\n",
       "      <td>586.83</td>\n",
       "      <td>10421.01</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-19</th>\n",
       "      <td>False</td>\n",
       "      <td>4461.89</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1.14</td>\n",
       "      <td>1579.67</td>\n",
       "      <td>2642.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2012-10-26</th>\n",
       "      <td>False</td>\n",
       "      <td>6152.59</td>\n",
       "      <td>129.77</td>\n",
       "      <td>200.00</td>\n",
       "      <td>272.29</td>\n",
       "      <td>2924.15</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>143 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            IsHoliday  MarkDown1  MarkDown2  MarkDown3  MarkDown4  MarkDown5\n",
       "Date                                                                        \n",
       "2010-02-05      False        NaN        NaN        NaN        NaN        NaN\n",
       "2010-02-12       True        NaN        NaN        NaN        NaN        NaN\n",
       "2010-02-19      False        NaN        NaN        NaN        NaN        NaN\n",
       "2010-02-26      False        NaN        NaN        NaN        NaN        NaN\n",
       "2010-03-05      False        NaN        NaN        NaN        NaN        NaN\n",
       "...               ...        ...        ...        ...        ...        ...\n",
       "2012-09-28      False    7106.05       1.91       1.65    1549.10    3946.03\n",
       "2012-10-05      False    6037.76        NaN      10.04    3027.37    3853.40\n",
       "2012-10-12      False    2145.50        NaN      33.31     586.83   10421.01\n",
       "2012-10-19      False    4461.89        NaN       1.14    1579.67    2642.29\n",
       "2012-10-26      False    6152.59     129.77     200.00     272.29    2924.15\n",
       "\n",
       "[143 rows x 6 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(exog.to_pd())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3c44fa9",
   "metadata": {},
   "source": [
    "Behind the scenes, Merlion models will apply an optional `exog_transform` to the exogenous variables, and they will then resample the exogenous variables to the same timestamps as the endogenous variables. This resampling is achieved using the `exog_missing_value_policy` and `exog_aggregation_policy`, which can be specified in the config of any model which accepts exogenous regressors. We can see the default values for each of these parameters by inspecting the config:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "f5a3707e",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Default exog_transform:            MeanVarNormalize\n",
      "Default exog_missing_value_policy: MissingValuePolicy.ZFill\n",
      "Default exog_aggregation_policy:   AggregationPolicy.Mean\n"
     ]
    }
   ],
   "source": [
    "print(f\"Default exog_transform:            {type(exog_model.config.exog_transform).__name__}\")\n",
    "print(f\"Default exog_missing_value_policy: {exog_model.config.exog_missing_value_policy}\")\n",
    "print(f\"Default exog_aggregation_policy:   {exog_model.config.exog_aggregation_policy}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25b42747",
   "metadata": {},
   "source": [
    "So in this case, we first apply mean-variance normalization to the exogenous data. Then, we impute missing values by filling them with zeros (`ZFill`), and we downsample the exogenous data by taking the `Mean` of any relevant windows."
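    ,
    "\n",
    "\n",
    "As a rough illustration of these three steps, the sketch below reproduces them in plain Python on a made-up column. It is not Merlion's implementation: the toy values, the choice of population standard deviation, the use of observed values only when computing statistics, and the fixed window of two samples are all assumptions made for the example.\n",
    "\n",
    "```python\n",
    "from statistics import mean, pstdev\n",
    "\n",
    "# Toy exogenous column with two samples per target period; None marks a missing value.\n",
    "markdown = [None, 2.0, 4.0, 8.0, None, 6.0]\n",
    "\n",
    "# 1) Mean-variance normalization, computed from the observed values only (assumption).\n",
    "observed = [v for v in markdown if v is not None]\n",
    "mu, sigma = mean(observed), pstdev(observed)\n",
    "normalized = [None if v is None else (v - mu) / sigma for v in markdown]\n",
    "\n",
    "# 2) Zero-fill the missing entries of the normalized series.\n",
    "filled = [0.0 if v is None else v for v in normalized]\n",
    "\n",
    "# 3) Mean aggregation: average each window of two samples into one value per period.\n",
    "aggregated = [mean(filled[i:i + 2]) for i in range(0, len(filled), 2)]\n",
    "print(aggregated)\n",
    "```\n",
    "\n",
    "Because the fill happens after normalization in this sketch, a filled-in zero stands for the column's observed mean rather than a literal markdown of zero."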
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}