{ "cells": [ { "cell_type": "markdown", "id": "f32100be", "metadata": {}, "source": [ "# Loading Custom Datasets\n", "\n", "This notebook will explain how to load custom datasets saved to CSV files, for either anomaly detection or forecasting." ] }, { "cell_type": "markdown", "id": "91095c9b", "metadata": {}, "source": [ "## Anomaly Detection Datasets\n", "\n", "Let's first look at a synthetic anomaly detection dataset. Note that this section just provides an alternative implementation of the dataset `ts_datasets.anomaly.Synthetic`. We begin by listing all the CSV files in the relevant directory. " ] }, { "cell_type": "code", "execution_count": 1, "id": "b4886d69", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "../data/synthetic_anomaly/horizontal.csv\n", "../data/synthetic_anomaly/horizontal_dip_anomaly.csv\n", "../data/synthetic_anomaly/horizontal_level_anomaly.csv\n", "../data/synthetic_anomaly/horizontal_shock_anomaly.csv\n", "../data/synthetic_anomaly/horizontal_spike_anomaly.csv\n", "../data/synthetic_anomaly/horizontal_trend_anomaly.csv\n", "../data/synthetic_anomaly/seasonal.csv\n", "../data/synthetic_anomaly/seasonal_dip_anomaly.csv\n", "../data/synthetic_anomaly/seasonal_level_anomaly.csv\n", "../data/synthetic_anomaly/seasonal_shock_anomaly.csv\n", "../data/synthetic_anomaly/seasonal_spike_anomaly.csv\n", "../data/synthetic_anomaly/seasonal_trend_anomaly.csv\n", "../data/synthetic_anomaly/upward_downward.csv\n", "../data/synthetic_anomaly/upward_downward_dip_anomaly.csv\n", "../data/synthetic_anomaly/upward_downward_level_anomaly.csv\n", "../data/synthetic_anomaly/upward_downward_shock_anomaly.csv\n", "../data/synthetic_anomaly/upward_downward_spike_anomaly.csv\n", "../data/synthetic_anomaly/upward_downward_trend_anomaly.csv\n" ] } ], "source": [ "import glob\n", "import os\n", "anom_dir = os.path.join(\"..\", \"data\", \"synthetic_anomaly\")\n", "csvs = sorted(glob.glob(f\"{anom_dir}/*.csv\"))\n", "for csv in csvs:\n", " print(csv)" ] }, { "cell_type": "markdown", "id": "9d319673", "metadata": {}, "source": [ "Let's visualize what a couple of these CSVs look like." ] }, { "cell_type": "code", "execution_count": 2, "id": "3151334c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "../data/synthetic_anomaly/horizontal.csv\n" ] }, { "data": { "text/html": [ "
10000 rows × 2 columns
" ], "text/plain": [ " timestamp horizontal\n", "0 0 1.928031\n", "1 300 -1.156620\n", "2 600 -0.390650\n", "3 900 0.400804\n", "4 1200 -0.874490\n", "... ... ...\n", "9995 2998500 0.362724\n", "9996 2998800 2.657373\n", "9997 2999100 1.472341\n", "9998 2999400 1.033154\n", "9999 2999700 2.950466\n", "\n", "[10000 rows x 2 columns]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "../data/synthetic_anomaly/seasonal_level_anomaly.csv\n" ] }, { "data": { "text/html": [ "
10000 rows × 3 columns
" ], "text/plain": [ " timestamp seasonal anomaly\n", "0 0 -0.577883 0.0\n", "1 300 1.059779 0.0\n", "2 600 1.137609 0.0\n", "3 900 0.743360 0.0\n", "4 1200 1.998400 0.0\n", "... ... ... ...\n", "9995 2998500 -5.388685 0.0\n", "9996 2998800 -5.017828 0.0\n", "9997 2999100 -4.196791 0.0\n", "9998 2999400 -4.234555 0.0\n", "9999 2999700 -3.111685 0.0\n", "\n", "[10000 rows x 3 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import pandas as pd\n", "from IPython.display import display\n", "\n", "for csv in [csvs[0], csvs[8]]:\n", " print(csv)\n", " display(pd.read_csv(csv))" ] }, { "cell_type": "markdown", "id": "4dd0360b", "metadata": {}, "source": [ "Each CSV in the dataset has the following important characteristics:\n", "\n", "- a time column `timestamp` (here, a Unix timestamp expressed in units of seconds);\n", "- a column `anomaly` indicating whether a timestamp is anomalous or not (though this is absent for time series which don't contain any anomalies);\n", "- one or more columns for the actual data values\n", "\n", "We can create a data loader for all the CSV files in this dataset as follows:" ] }, { "cell_type": "code", "execution_count": 3, "id": "69bbc96d", "metadata": {}, "outputs": [], "source": [ "from ts_datasets.anomaly import CustomAnomalyDataset\n", "dataset = CustomAnomalyDataset(\n", " rootdir=anom_dir, # where the data is stored\n", " test_frac=0.75, # use 75% of each time series for testing. \n", " # overridden if the column `trainval` is in the actual CSV.\n", " time_unit=\"s\", # the timestamp column (automatically detected) is in units of seconds\n", " assume_no_anomaly=True # if a CSV doesn't have the \"anomaly\" column, assume it has no anomalies\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "id": "bc2d0778", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 18 time series in this dataset.\n" ] } ], "source": [ "print(f\"There are {len(dataset)} time series in this dataset.\")\n", "time_series, metadata = dataset[3]" ] }, { "cell_type": "markdown", "id": "9d1f1568", "metadata": {}, "source": [ "This particular time series is univariate. Its variable is named \"horizontal\". " ] }, { "cell_type": "code", "execution_count": 5, "id": "c2a87bf9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
10000 rows × 1 columns
" ], "text/plain": [ " horizontal\n", "timestamp \n", "1970-01-01 00:00:00 1.928031\n", "1970-01-01 00:05:00 -1.156620\n", "1970-01-01 00:10:00 -0.390650\n", "1970-01-01 00:15:00 0.400804\n", "1970-01-01 00:20:00 -0.874490\n", "... ...\n", "1970-02-04 16:55:00 0.362724\n", "1970-02-04 17:00:00 2.657373\n", "1970-02-04 17:05:00 1.472341\n", "1970-02-04 17:10:00 1.033154\n", "1970-02-04 17:15:00 2.950466\n", "\n", "[10000 rows x 1 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(time_series)" ] }, { "cell_type": "markdown", "id": "ec03b3f1", "metadata": {}, "source": [ "The metadata has the same timestamps as the time series. It contains \"anomaly\" and \"trainval\" columns. These respectively indicate whether each timestamp is anomalous, and whether each timestamp is for training/validation or testing." ] }, { "cell_type": "code", "execution_count": 6, "id": "3e5eb1d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
10000 rows × 2 columns
" ], "text/plain": [ " anomaly trainval\n", "timestamp \n", "1970-01-01 00:00:00 False True\n", "1970-01-01 00:05:00 False True\n", "1970-01-01 00:10:00 False True\n", "1970-01-01 00:15:00 False True\n", "1970-01-01 00:20:00 False True\n", "... ... ...\n", "1970-02-04 16:55:00 False False\n", "1970-02-04 17:00:00 False False\n", "1970-02-04 17:05:00 False False\n", "1970-02-04 17:10:00 False False\n", "1970-02-04 17:15:00 False False\n", "\n", "[10000 rows x 2 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(metadata)" ] }, { "cell_type": "code", "execution_count": 7, "id": "a911fea8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "75.0% of the time series is for testing.\n", "19.57% of the time series is anomalous.\n" ] } ], "source": [ "print(f\"{100 - metadata.trainval.mean() * 100}% of the time series is for testing.\")\n", "print(f\"{metadata.anomaly.mean() * 100}% of the time series is anomalous.\")" ] }, { "cell_type": "markdown", "id": "63a181a3", "metadata": {}, "source": [ "## General Purpose (Forecasting) Datasets\n", "\n", "Next, let's load a more general-purpose dataset for forecasting. We will use this opportunity to show some of the more advanced features as well. Here, our dataset consists of a single CSV file which contains many multivariate time series. These time series are collected from a large retailer, and each individual time series corresonds to a different department within a different store. Let's have a look at the data." ] }, { "cell_type": "code", "execution_count": 8, "id": "2d0809ae", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
2860 rows × 14 columns
" ], "text/plain": [ " Store Dept Date Weekly_Sales Temperature Fuel_Price \\\n", "0 1 1 2010-02-05 24924.50 42.31 2.572 \n", "1 1 1 2010-02-12 46039.49 38.51 2.548 \n", "2 1 1 2010-02-19 41595.55 39.93 2.514 \n", "3 1 1 2010-02-26 19403.54 46.63 2.561 \n", "4 1 1 2010-03-05 21827.90 46.50 2.625 \n", "... ... ... ... ... ... ... \n", "2855 2 10 2012-09-28 37104.67 79.45 3.666 \n", "2856 2 10 2012-10-05 36361.28 70.27 3.617 \n", "2857 2 10 2012-10-12 35332.34 60.97 3.601 \n", "2858 2 10 2012-10-19 35721.09 68.08 3.594 \n", "2859 2 10 2012-10-26 34260.76 69.79 3.506 \n", "\n", " MarkDown1 MarkDown2 MarkDown3 MarkDown4 MarkDown5 CPI \\\n", "0 NaN NaN NaN NaN NaN 211.096358 \n", "1 NaN NaN NaN NaN NaN 211.242170 \n", "2 NaN NaN NaN NaN NaN 211.289143 \n", "3 NaN NaN NaN NaN NaN 211.319643 \n", "4 NaN NaN NaN NaN NaN 211.350143 \n", "... ... ... ... ... ... ... \n", "2855 7106.05 1.91 1.65 1549.10 3946.03 222.616433 \n", "2856 6037.76 NaN 10.04 3027.37 3853.40 222.815930 \n", "2857 2145.50 NaN 33.31 586.83 10421.01 223.015426 \n", "2858 4461.89 NaN 1.14 1579.67 2642.29 223.059808 \n", "2859 6152.59 129.77 200.00 272.29 2924.15 223.078337 \n", "\n", " Unemployment IsHoliday \n", "0 8.106 False \n", "1 8.106 True \n", "2 8.106 False \n", "3 8.106 False \n", "4 8.106 False \n", "... ... ... \n", "2855 6.565 False \n", "2856 6.170 False \n", "2857 6.170 False \n", "2858 6.170 False \n", "2859 6.170 False \n", "\n", "[2860 rows x 14 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "csv = os.path.join(\"..\", \"data\", \"walmart\", \"walmart_mini.csv\")\n", "display(pd.read_csv(csv))" ] }, { "cell_type": "markdown", "id": "5fde813d", "metadata": {}, "source": [ "As before, we have a column `Date` indicating the time. Note that in this case, we have a string rather than a timestamp; this is also okay. However, we now also have some index columns `Store` and `Dept` which are used to distinguish between different time series. We specify these to the data loader." ] }, { "cell_type": "code", "execution_count": 9, "id": "fe500896", "metadata": {}, "outputs": [], "source": [ "from ts_datasets.forecast import CustomDataset\n", "dataset = CustomDataset(\n", " rootdir=csv, # where the data is stored\n", " index_cols=[\"Store\", \"Dept\"], # Individual time series are indexed by store & department\n", " test_frac=0.75, # use 25% of each time series for testing. \n", " # overridden if the column `trainval` is in the actual CSV.\n", ")" ] }, { "cell_type": "code", "execution_count": 10, "id": "8ca5296f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "There are 20 time series in this dataset.\n" ] } ], "source": [ "print(f\"There are {len(dataset)} time series in this dataset.\")\n", "time_series, metadata = dataset[17]" ] }, { "cell_type": "markdown", "id": "7cfc92a8", "metadata": {}, "source": [ "This particular time series is multivariate." ] }, { "cell_type": "code", "execution_count": 11, "id": "301d9344", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
143 rows × 11 columns
" ], "text/plain": [ " Weekly_Sales Temperature Fuel_Price MarkDown1 MarkDown2 \\\n", "Date \n", "2010-02-05 69634.80 40.19 2.572 NaN NaN \n", "2010-02-12 63393.29 38.49 2.548 NaN NaN \n", "2010-02-19 66589.27 39.69 2.514 NaN NaN \n", "2010-02-26 61875.48 46.10 2.561 NaN NaN \n", "2010-03-05 67041.18 47.17 2.625 NaN NaN \n", "... ... ... ... ... ... \n", "2012-09-28 57424.00 79.45 3.666 7106.05 1.91 \n", "2012-10-05 62955.51 70.27 3.617 6037.76 NaN \n", "2012-10-12 63083.63 60.97 3.601 2145.50 NaN \n", "2012-10-19 60502.97 68.08 3.594 4461.89 NaN \n", "2012-10-26 63992.36 69.79 3.506 6152.59 129.77 \n", "\n", " MarkDown3 MarkDown4 MarkDown5 CPI Unemployment \\\n", "Date \n", "2010-02-05 NaN NaN NaN 210.752605 8.324 \n", "2010-02-12 NaN NaN NaN 210.897994 8.324 \n", "2010-02-19 NaN NaN NaN 210.945160 8.324 \n", "2010-02-26 NaN NaN NaN 210.975957 8.324 \n", "2010-03-05 NaN NaN NaN 211.006754 8.324 \n", "... ... ... ... ... ... \n", "2012-09-28 1.65 1549.10 3946.03 222.616433 6.565 \n", "2012-10-05 10.04 3027.37 3853.40 222.815930 6.170 \n", "2012-10-12 33.31 586.83 10421.01 223.015426 6.170 \n", "2012-10-19 1.14 1579.67 2642.29 223.059808 6.170 \n", "2012-10-26 200.00 272.29 2924.15 223.078337 6.170 \n", "\n", " IsHoliday \n", "Date \n", "2010-02-05 False \n", "2010-02-12 True \n", "2010-02-19 False \n", "2010-02-26 False \n", "2010-03-05 False \n", "... ... \n", "2012-09-28 False \n", "2012-10-05 False \n", "2012-10-12 False \n", "2012-10-19 False \n", "2012-10-26 False \n", "\n", "[143 rows x 11 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(time_series)" ] }, { "cell_type": "markdown", "id": "33926c81", "metadata": {}, "source": [ "The metadata has the same timestamps as the time series. It has a \"trainval\" column as before, plus index columns \"Store\" and \"Dept\"." ] }, { "cell_type": "code", "execution_count": 12, "id": "4d3cd301", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
143 rows × 3 columns
" ], "text/plain": [ " trainval Store Dept\n", "Date \n", "2010-02-05 True 2 8\n", "2010-02-12 True 2 8\n", "2010-02-19 True 2 8\n", "2010-02-26 True 2 8\n", "2010-03-05 True 2 8\n", "... ... ... ...\n", "2012-09-28 False 2 8\n", "2012-10-05 False 2 8\n", "2012-10-12 False 2 8\n", "2012-10-19 False 2 8\n", "2012-10-26 False 2 8\n", "\n", "[143 rows x 3 columns]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(metadata)" ] }, { "cell_type": "markdown", "id": "19562928", "metadata": {}, "source": [ "## Broader Takeaways\n", "\n", "In general, a dataset can contain any number of CSVs stored under a single root directory. Each CSV can contain one or more time series, where the different time series within a single file are indicated by different values of the index column. Note that this works for anomaly detection as well! You just need to make sure that your CSVs all contain the `anomaly` column. In general, all features supported by `CustomDataset` are also supported by `CustomAnomalyDataset`, as long as your CSV files have the `anomaly` column.\n", "\n", "If you want to either of the above custom datasets for benchmarking, you can call\n", "\n", "```\n", "python benchmark_anomaly.py --model IsolationForest --retrain_freq 7d \\\n", " --dataset CustomAnomalyDataset --data_root data/synthetic_anomaly \\\n", " --data_kwargs '{\"assume_no_anomaly\": true, \"test_frac\": 0.75}'\n", "```\n", "\n", "or \n", "\n", "```\n", "python benchmark_forecast.py --model AutoETS \\\n", " --dataset CustomDataset --data_root data/walmart/walmart_mini.csv \\\n", " --data_kwargs '{\"test_frac\": 0.25, \\\n", " \"index_cols\": [\"Store\", \"Dept\"], \\\n", " \"data_cols\": [\"Weekly_Sales\"]}'\n", "```\n", "\n", "Note in the example above, we specify \"data_cols\" as \"Weekly_Sales\". This indicates that the only column we are modeling is Weekly_Sales. If you wanted to do multivariate prediction, you could also add \"Temperature\", \"Fuel_Price\", \"CPI\", etc. We treat the first of the data columns as the target univariate whose value you wish to forecast." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }