{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Merlion's Data Format\n",
    "\n",
    "This notebook will explain how to use Merlion's `UnivariateTimeSeries` and `TimeSeries` classes. These classes are the core data format used throughout the repo. In general, you may think of each `TimeSeries` as being a collection of `UnivariateTimeSeries` objects, one for each variable. \n",
    "\n",
    "Let's start by loading some data using `pandas`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "       timestamp_millis       kpi  kpi_label\n",
      "0         1583140320000   667.118          0\n",
      "1         1583140380000   611.751          0\n",
      "2         1583140440000   599.456          0\n",
      "3         1583140500000   621.446          0\n",
      "4         1583140560000  1418.234          0\n",
      "...                 ...       ...        ...\n",
      "86802     1588376760000   874.214          0\n",
      "86803     1588376820000   937.929          0\n",
      "86804     1588376880000  1031.279          0\n",
      "86805     1588376940000  1099.698          0\n",
      "86806     1588377000000   935.405          0\n",
      "\n",
      "[86807 rows x 3 columns]\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.read_csv(os.path.join(\"..\", \"data\", \"example.csv\"))\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The column `timestamp_millis` consists of Unix timestamps (in units of milliseconds), and the column `kpi` contains the value of the time series metric at each of those timestamps. We will also create a version of this dataframe that is indexed by time:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                          kpi  kpi_label\n",
      "timestamp_millis                        \n",
      "2020-03-02 09:12:00   667.118          0\n",
      "2020-03-02 09:13:00   611.751          0\n",
      "2020-03-02 09:14:00   599.456          0\n",
      "2020-03-02 09:15:00   621.446          0\n",
      "2020-03-02 09:16:00  1418.234          0\n",
      "...                       ...        ...\n",
      "2020-05-01 23:46:00   874.214          0\n",
      "2020-05-01 23:47:00   937.929          0\n",
      "2020-05-01 23:48:00  1031.279          0\n",
      "2020-05-01 23:49:00  1099.698          0\n",
      "2020-05-01 23:50:00   935.405          0\n",
      "\n",
      "[86807 rows x 2 columns]\n"
     ]
    }
   ],
   "source": [
    "time_idx_df = df.copy()\n",
    "time_idx_df[\"timestamp_millis\"] = pd.to_datetime(time_idx_df[\"timestamp_millis\"], unit=\"ms\")\n",
    "time_idx_df = time_idx_df.set_index(\"timestamp_millis\")\n",
    "print(time_idx_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## UnivariateTimeSeries: The Basic Building Block\n",
    "\n",
    "The most transparent way to initialize a `UnivariateTimeSeries` is to use its constructor. The constructor takes two arguments: `time_stamps`, a list of Unix timestamps (in units of seconds) or datetime objects, and `values`, a list of the actual time series values. You may optionally provide a name as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from merlion.utils import UnivariateTimeSeries\n",
    "\n",
    "kpi = UnivariateTimeSeries(\n",
    "    time_stamps=df.timestamp_millis/1000,  # timestamps in units of seconds\n",
    "    values=df.kpi,                         # time series values\n",
    "    name=\"kpi\"                             # optional: a name for this univariate\n",
    ")\n",
    "\n",
    "kpi_label = UnivariateTimeSeries(\n",
    "    time_stamps=df.timestamp_millis/1000,  # timestamps in units of seconds\n",
    "    values=df.kpi_label                    # time series values\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you may initialize a `UnivariateTimeSeries` directly from a time-indexed `pd.Series`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Are the two UnivariateTimeSeries equal? True\n"
     ]
    }
   ],
   "source": [
    "kpi_equivalent = UnivariateTimeSeries.from_pd(time_idx_df.kpi)\n",
    "print(f\"Are the two UnivariateTimeSeries equal? {(kpi == kpi_equivalent).all()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We implement the `UnivariateTimeSeries` as a `pd.Series` with a `DatetimeIndex`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Is UnivariateTimeSeries an instance of pd.Series? True\n"
     ]
    }
   ],
   "source": [
    "print(f\"Is {type(kpi).__name__} an instance of pd.Series? \"\n",
    "      f\"{isinstance(kpi, pd.Series)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-03-02 09:12:00     667.118\n",
      "2020-03-02 09:13:00     611.751\n",
      "2020-03-02 09:14:00     599.456\n",
      "2020-03-02 09:15:00     621.446\n",
      "2020-03-02 09:16:00    1418.234\n",
      "                         ...   \n",
      "2020-05-01 23:46:00     874.214\n",
      "2020-05-01 23:47:00     937.929\n",
      "2020-05-01 23:48:00    1031.279\n",
      "2020-05-01 23:49:00    1099.698\n",
      "2020-05-01 23:50:00     935.405\n",
      "Name: kpi, Length: 86807, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(kpi)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also convert a `UnivariateTimeSeries` back to a regular `pd.Series` as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "type(kpi.to_pd()) = <class 'pandas.core.series.Series'>\n"
     ]
    }
   ],
   "source": [
    "print(f\"type(kpi.to_pd()) = {type(kpi.to_pd())}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access the timestamps (either as timestamps or datetime objects) and values independently:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[1583140320.0, 1583140380.0, 1583140440.0, 1583140500.0, 1583140560.0]\n"
     ]
    }
   ],
   "source": [
    "# Get the Unix timestamps (first 5 for brevity)\n",
    "print(kpi.time_stamps[:5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DatetimeIndex(['2020-03-02 09:12:00', '2020-03-02 09:13:00',\n",
      "               '2020-03-02 09:14:00', '2020-03-02 09:15:00',\n",
      "               '2020-03-02 09:16:00'],\n",
      "              dtype='datetime64[ns]', freq=None)\n"
     ]
    }
   ],
   "source": [
    "# Get the datetimes (this is just the index of the UnivariateTimeSeries,\n",
    "# since we inherit from pd.Series)\n",
    "print(kpi.index[:5])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[667.118, 611.751, 599.456, 621.446, 1418.234]\n"
     ]
    }
   ],
   "source": [
    "# Get the values\n",
    "print(kpi.values[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may index into a `UnivariateTimeSeries` to obtain a tuple of `(timestamp, value)`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "kpi[0] = (1583140320.0, 667.118)\n"
     ]
    }
   ],
   "source": [
    "print(f\"kpi[0] = {kpi[0]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If you instead use a slice index, you will obtain a new `UnivariateTimeSeries`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "type(kpi[1:5]) = <class 'merlion.utils.time_series.UnivariateTimeSeries'>\n",
      "\n",
      "kpi[1:5] = \n",
      "\n",
      "2020-03-02 09:13:00     611.751\n",
      "2020-03-02 09:14:00     599.456\n",
      "2020-03-02 09:15:00     621.446\n",
      "2020-03-02 09:16:00    1418.234\n",
      "Name: kpi, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(f\"type(kpi[1:5]) = {type(kpi[1:5])}\\n\")\n",
    "print(f\"kpi[1:5] = \\n\\n{kpi[1:5]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterating over a `UnivaraiateTimeSeries` will iterate over tuples of `(timestamp, value)`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1583140320.0, 667.118)\n",
      "(1583140380.0, 611.751)\n",
      "(1583140440.0, 599.456)\n",
      "(1583140500.0, 621.446)\n",
      "(1583140560.0, 1418.234)\n"
     ]
    }
   ],
   "source": [
    "for t, x in kpi[:5]:\n",
    "    print((t, x))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TimeSeries: Merlion's Standard Data Class\n",
    "\n",
    "Because Merlion is a general-purpose library that handles both univariate and multivariate time series, our standard data class is `TimeSeries`. This class acts as a wrapper around a collection of `UnivariateTimeSeries`. We choose this format rather than a vector-based approach because this approach is much more robust to missing values, or different univariates being sampled at different rates.\n",
    "\n",
    "The most transparent way to initialize a `TimeSeries` is with its constructor, which takes a collection (list or (ordered) dictionary) of `UnivariateTimeSeries` its only argument:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import OrderedDict\n",
    "from merlion.utils import TimeSeries\n",
    "\n",
    "time_series_list = TimeSeries(univariates=[kpi.copy(), kpi_label.copy()])\n",
    "time_series_dict = TimeSeries(\n",
    "    univariates=OrderedDict([(\"kpi_renamed\", kpi.copy()),\n",
    "                             (\"kpi_label\", kpi_label.copy())]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Alternatively, you may initialize a `TimeSeries` from a `pd.DataFrame` and convert a `TimeSeries` to a `pd.DataFrame` as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "type(TimeSeries.from_pd(time_idx_df)) = <class 'merlion.utils.time_series.TimeSeries'>\n",
      "\n",
      "(recovered_time_idx_df == time_idx_df).all()\n",
      "kpi          True\n",
      "kpi_label    True\n",
      "dtype: bool\n"
     ]
    }
   ],
   "source": [
    "time_series = TimeSeries.from_pd(time_idx_df)\n",
    "print(f\"type(TimeSeries.from_pd(time_idx_df)) = {type(time_series)}\\n\")\n",
    "\n",
    "recovered_time_idx_df = time_series.to_pd()\n",
    "print(\"(recovered_time_idx_df == time_idx_df).all()\")\n",
    "print((recovered_time_idx_df == time_idx_df).all())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We may access the names of the individual univariates with `time_series.names`, access a specific univariate via `time_series.univariates[name]`, and iterate over univariates by iterating `for univariate in time_series.univariates`. Concretely:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['kpi', 'kpi_label']\n"
     ]
    }
   ],
   "source": [
    "# When we use a list of univariates, we retain the names of the univariates\n",
    "# where possible. If a univariate is unnamed, we set its name to its integer\n",
    "# index in the list of all univariates given. Here, kpi_label was\n",
    "# originally unnamed, so we set its name to 1\n",
    "print(time_series_list.names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['kpi_renamed', 'kpi_label']\n"
     ]
    }
   ],
   "source": [
    "# If we pass a dictionary instead of a list, all univariates will have\n",
    "# their specified names. The order is retained from the OrderedDict.\n",
    "print(time_series_dict.names)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "# We can access the KPI like so:\n",
    "kpi1 = time_series_list.univariates[\"kpi\"]\n",
    "kpi2 = time_series_dict.univariates[\"kpi_renamed\"]\n",
    "\n",
    "# kpi1 and kpi2 are the same univariate, just with different names\n",
    "assert (kpi1 == kpi2).all()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2020-03-02 09:12:00     667.118\n",
      "2020-03-02 09:13:00     611.751\n",
      "2020-03-02 09:14:00     599.456\n",
      "2020-03-02 09:15:00     621.446\n",
      "2020-03-02 09:16:00    1418.234\n",
      "                         ...   \n",
      "2020-05-01 23:46:00     874.214\n",
      "2020-05-01 23:47:00     937.929\n",
      "2020-05-01 23:48:00    1031.279\n",
      "2020-05-01 23:49:00    1099.698\n",
      "2020-05-01 23:50:00     935.405\n",
      "Name: kpi_renamed, Length: 86807, dtype: float64\n",
      "\n",
      "2020-03-02 09:12:00    0.0\n",
      "2020-03-02 09:13:00    0.0\n",
      "2020-03-02 09:14:00    0.0\n",
      "2020-03-02 09:15:00    0.0\n",
      "2020-03-02 09:16:00    0.0\n",
      "                      ... \n",
      "2020-05-01 23:46:00    0.0\n",
      "2020-05-01 23:47:00    0.0\n",
      "2020-05-01 23:48:00    0.0\n",
      "2020-05-01 23:49:00    0.0\n",
      "2020-05-01 23:50:00    0.0\n",
      "Name: kpi_label, Length: 86807, dtype: float64\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# We can iterate over all univariates like so:\n",
    "for univariate in time_series_dict.univariates:\n",
    "    print(univariate)\n",
    "    print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Univariate kpi_renamed\n",
      "2020-03-02 09:12:00     667.118\n",
      "2020-03-02 09:13:00     611.751\n",
      "2020-03-02 09:14:00     599.456\n",
      "2020-03-02 09:15:00     621.446\n",
      "2020-03-02 09:16:00    1418.234\n",
      "                         ...   \n",
      "2020-05-01 23:46:00     874.214\n",
      "2020-05-01 23:47:00     937.929\n",
      "2020-05-01 23:48:00    1031.279\n",
      "2020-05-01 23:49:00    1099.698\n",
      "2020-05-01 23:50:00     935.405\n",
      "Name: kpi_renamed, Length: 86807, dtype: float64\n",
      "\n",
      "Univariate kpi_label\n",
      "2020-03-02 09:12:00    0.0\n",
      "2020-03-02 09:13:00    0.0\n",
      "2020-03-02 09:14:00    0.0\n",
      "2020-03-02 09:15:00    0.0\n",
      "2020-03-02 09:16:00    0.0\n",
      "                      ... \n",
      "2020-05-01 23:46:00    0.0\n",
      "2020-05-01 23:47:00    0.0\n",
      "2020-05-01 23:48:00    0.0\n",
      "2020-05-01 23:49:00    0.0\n",
      "2020-05-01 23:50:00    0.0\n",
      "Name: kpi_label, Length: 86807, dtype: float64\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# We can also iterate over all univariates & names like so:\n",
    "for name, univariate in time_series_dict.items():\n",
    "    print(f\"Univariate {name}\")\n",
    "    print(univariate)\n",
    "    print()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Time Series Indexing & Alignment\n",
    "\n",
    "An important concept of `TimeSeries` in Merlion is _alignment_. We call a time series _aligned_ if all of its univariates are sampled at the same time stamps. We illustrate examples of time series that are and aren't aligned below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Is aligned? True\n"
     ]
    }
   ],
   "source": [
    "aligned = TimeSeries({\"kpi\": kpi.copy(), \"kpi_label\": kpi_label.copy()})\n",
    "print(f\"Is aligned? {aligned.is_aligned}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Is aligned? False\n"
     ]
    }
   ],
   "source": [
    "not_aligned = TimeSeries({\"kpi\": kpi[1:],                # 2020-03-02 09:13:00 to 2020-05-01 23:50:00\n",
    "                          \"kpi_label\": kpi_label[:-1]})  # 2020-03-02 09:12:00 to 2020-05-01 23:49:00\n",
    "print(f\"Is aligned? {not_aligned.is_aligned}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your time series is aligned, you may use an integer index to obtain a tuple `(timestamp, (value_1, ..., value_k))`, or a slice index to obtain a sub-`TimeSeries`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(1583140320.0, (667.118, 0.0))"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "aligned[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "type(aligned[1:5]) = <class 'merlion.utils.time_series.TimeSeries'>\n",
      "\n",
      "aligned[1:5] = \n",
      "                          kpi  kpi_label\n",
      "2020-03-02 09:13:00   611.751        0.0\n",
      "2020-03-02 09:14:00   599.456        0.0\n",
      "2020-03-02 09:15:00   621.446        0.0\n",
      "2020-03-02 09:16:00  1418.234        0.0\n"
     ]
    }
   ],
   "source": [
    "print(f\"type(aligned[1:5]) = {type(aligned[1:5])}\\n\")\n",
    "print(f\"aligned[1:5] = \\n{aligned[1:5]}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may also iterate over an aligned time series as `for timestamp, (value_1, ..., value_k) in time_series`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(1583140320.0, (667.118, 0.0))\n",
      "(1583140380.0, (611.751, 0.0))\n",
      "(1583140440.0, (599.456, 0.0))\n",
      "(1583140500.0, (621.446, 0.0))\n",
      "(1583140560.0, (1418.234, 0.0))\n"
     ]
    }
   ],
   "source": [
    "for t, (x1, x2) in aligned[:5]:\n",
    "    print((t, (x1, x2)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that Merlion will throw an error if you try to do any of these things with a time series that isn't aligned! For example,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "RuntimeError: The univariates comprising this time series are not aligned (they have different time stamps), but alignment is required to index into the time series.\n"
     ]
    }
   ],
   "source": [
    "try:\n",
    "    not_aligned[0]\n",
    "except RuntimeError as e:\n",
    "    print(f\"{type(e).__name__}: {e}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can still get the length/shape of a misaligned time series, but Merlion will emit a warning."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/abhatnagar/Desktop/Merlion_public/merlion/utils/time_series.py:617: UserWarning: The univariates comprising this time series are not aligned (they have different time stamps). The length returned is equal to the length of the _union_ of all time stamps present in any of the univariates.\n",
      "  warnings.warn(warning)\n",
      "The univariates comprising this time series are not aligned (they have different time stamps). The length returned is equal to the length of the _union_ of all time stamps present in any of the univariates.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "86807\n"
     ]
    }
   ],
   "source": [
    "print(len(not_aligned))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "The univariates comprising this time series are not aligned (they have different time stamps). The length returned is equal to the length of the _union_ of all time stamps present in any of the univariates.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2, 86807)\n"
     ]
    }
   ],
   "source": [
    "print(not_aligned.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, you may call `time_series.align()` to automatically resample the individual univariates of a time series to make it aligned. By default, this will take the union of all the time stamps present in any of the individual univariates, but this is customizable."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Is not_aligned.align() aligned? True\n"
     ]
    }
   ],
   "source": [
    "print(f\"Is not_aligned.align() aligned? {not_aligned.align().is_aligned}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## TimeSeries: A Few Useful Features\n",
    "\n",
    "We provide much more information on the `merlion.utils.time_series.TimeSeries` class in the API docs, but we highlight two more useful features here. These work regardless of whether a time series is aligned!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may obtain the subset of a time series between times `t0` and `tf` by calling `time_series.window(t0, tf)`. `t0` and `tf` may be any reasonable format of datetime, or a Unix timestamp."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "                          kpi  kpi_label\n",
       "2020-03-05 12:00:00  1166.819        0.0\n",
       "2020-03-05 12:01:00  1345.504        0.0\n",
       "2020-03-05 12:02:00  1061.391        0.0\n",
       "2020-03-05 12:03:00  1260.874        0.0\n",
       "2020-03-05 12:04:00  1202.009        0.0\n",
       "...                       ...        ...\n",
       "2020-03-31 23:55:00  1154.397        0.0\n",
       "2020-03-31 23:56:00  1270.292        0.0\n",
       "2020-03-31 23:57:00  1160.761        0.0\n",
       "2020-03-31 23:58:00  1082.076        0.0\n",
       "2020-03-31 23:59:00  1167.297        0.0\n",
       "\n",
       "[38160 rows x 2 columns]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "aligned.window(\"2020-03-05 12:00:00\", pd.Timestamp(year=2020, month=4, day=1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "                          kpi  kpi_label\n",
       "2020-03-02 09:12:00       NaN        0.0\n",
       "2020-03-02 09:13:00   611.751        0.0\n",
       "2020-03-02 09:14:00   599.456        0.0\n",
       "2020-03-02 09:15:00   621.446        0.0\n",
       "2020-03-02 09:16:00  1418.234        0.0\n",
       "...                       ...        ...\n",
       "2020-03-03 09:07:00  1132.564        0.0\n",
       "2020-03-03 09:08:00  1087.037        0.0\n",
       "2020-03-03 09:09:00   984.432        0.0\n",
       "2020-03-03 09:10:00  1085.008        0.0\n",
       "2020-03-03 09:11:00  1020.937        0.0\n",
       "\n",
       "[1440 rows x 2 columns]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Note that the first value of the KPI (which is missing in not_aligned) is NaN\n",
    "not_aligned.window(1583140320, 1583226720)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You may also bisect a time series into a left and right portion, at any timestamp."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Left\n",
      "                          kpi  kpi_label\n",
      "2020-03-02 09:12:00   667.118        0.0\n",
      "2020-03-02 09:13:00   611.751        0.0\n",
      "2020-03-02 09:14:00   599.456        0.0\n",
      "2020-03-02 09:15:00   621.446        0.0\n",
      "2020-03-02 09:16:00  1418.234        0.0\n",
      "...                       ...        ...\n",
      "2020-04-30 23:55:00  1296.091        0.0\n",
      "2020-04-30 23:56:00  1323.743        0.0\n",
      "2020-04-30 23:57:00  1203.672        0.0\n",
      "2020-04-30 23:58:00  1278.720        0.0\n",
      "2020-04-30 23:59:00  1217.877        0.0\n",
      "\n",
      "[85376 rows x 2 columns]\n",
      "\n",
      "\n",
      "Right\n",
      "                          kpi  kpi_label\n",
      "2020-05-01 00:00:00  1381.110        0.0\n",
      "2020-05-01 00:01:00  1807.039        0.0\n",
      "2020-05-01 00:02:00  1833.385        0.0\n",
      "2020-05-01 00:03:00  1674.412        0.0\n",
      "2020-05-01 00:04:00  1683.194        0.0\n",
      "...                       ...        ...\n",
      "2020-05-01 23:46:00   874.214        0.0\n",
      "2020-05-01 23:47:00   937.929        0.0\n",
      "2020-05-01 23:48:00  1031.279        0.0\n",
      "2020-05-01 23:49:00  1099.698        0.0\n",
      "2020-05-01 23:50:00   935.405        0.0\n",
      "\n",
      "[1431 rows x 2 columns]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "left, right = aligned.bisect(\"2020-05-01\")\n",
    "print(f\"Left\\n{left}\\n\")\n",
    "print()\n",
    "print(f\"Right\\n{right}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Please refer to the API docs on `UnivariateTimeSeries` and `TimeSeries` for more information."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}