{ "cells": [ { "cell_type": "markdown", "id": "bd943f7c", "metadata": {}, "source": [ "# Multivariate Time Series Forecasting\n", "\n", "Multivariate time series forecasting works similarly to univariate time series forecasting (covered [here](0_ForecastIntro.ipynb) and [here](1_ForecastFeatures.ipynb)). The main difference is that you must specify the index of a target univariate to forecast, e.g. for a 5-variable time series you may want to forecast the value of the 3rd variable (we specify this by setting `target_seq_index = 2`, since indices are zero-based). To begin, we will load the multivariate `SeattleTrail` dataset for time series forecasting." ] }, { "cell_type": "code", "execution_count": 1, "id": "a6c1f175", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Time series is 5-dimensional\n" ] } ], "source": [ "from merlion.utils import TimeSeries\n", "from ts_datasets.forecast import SeattleTrail\n", "\n", "time_series, metadata = SeattleTrail()[0]\n", "train_data = TimeSeries.from_pd(time_series[metadata[\"trainval\"]])\n", "test_data = TimeSeries.from_pd(time_series[~metadata[\"trainval\"]])\n", "\n", "print(f\"Time series is {train_data.dim}-dimensional\")" ] }, { "cell_type": "markdown", "id": "82df16db", "metadata": {}, "source": [ "## Model Initialization and Training\n", "\n", "For the purposes of this tutorial, we will be using 3 models:\n", "\n", "1. `DefaultForecaster` (which automatically detects whether the input time series is univariate or multivariate);\n", "2. `ARIMA` (a classic univariate algorithm) trained to forecast a specific univariate; and\n", "3. A `ForecasterEnsemble` which selects the better of the two models.\n", "\n", "All models are trained with a maximum allowed forecasting horizon of 24 steps (1 day, since the time series is sampled hourly). Note that all multivariate forecasting models can be used for univariate time series, and by specifying `target_seq_index` appropriately, univariate models can be used for multivariate time series as well. 
Moreover, the API is identical in all cases." ] }, { "cell_type": "code", "execution_count": 2, "id": "46593ce3", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Inferred granularity 0 days 01:00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Training DefaultForecaster...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Inferred granularity 0 days 01:00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Training Arima...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Inferred granularity 0 days 01:00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Training ForecasterEnsemble...\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Inferred granularity 0 days 01:00:00\n", "Inferred granularity 0 days 01:00:00\n", "Inferred granularity 0 days 01:00:00\n" ] } ], "source": [ "from merlion.evaluate.forecast import ForecastMetric\n", "from merlion.models.factory import ModelFactory\n", "from merlion.models.ensemble.combine import ModelSelector\n", "\n", "# Time series is sampled hourly, so max_forecast_steps = 24 means we can predict up to 1 day in the future\n", "target_seq_index = 2\n", "max_forecast_steps = 24\n", "kwargs = dict(target_seq_index=target_seq_index, max_forecast_steps=max_forecast_steps)\n", "\n", "model1 = ModelFactory.create(\"DefaultForecaster\", **kwargs)\n", "model2 = ModelFactory.create(\"Arima\", **kwargs)\n", "\n", "# This ModelSelector combiner picks the best model based on sMAPE\n", "model3 = ModelFactory.create(\"ForecasterEnsemble\", models=[model1, model2],\n", " combiner=ModelSelector(metric=ForecastMetric.sMAPE))\n", "for model in [model1, model2, model3]:\n", " print(f\"Training {type(model).__name__}...\")\n", " train_pred, train_stderr = model.train(train_data)" ] }, { "cell_type": "markdown", "id": "a720d718", "metadata": {}, "source": [ "## Model Inference and Quantitative Evaluation\n", "Like 
univariate models, we may call `model.forecast()` to get a forecast and potentially a standard error for the model. We can use these to evaluate the model's performance. Note that the model selector successfully picks the better of the two models: the ensemble's RMSE and sMAPE match those of `DefaultForecaster`." ] }, { "cell_type": "code", "execution_count": 3, "id": "6ee7d7bd", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "DefaultForecaster\n", "RMSE: 6.6216\n", "sMAPE: 121.1709\n", "\n", "Arima\n", "RMSE: 10.2208\n", "sMAPE: 140.2772\n", "\n", "ForecasterEnsemble\n", "RMSE: 6.6216\n", "sMAPE: 121.1709\n", "\n" ] } ], "source": [ "from merlion.evaluate.forecast import ForecastMetric\n", "\n", "target_univariate = test_data.univariates[test_data.names[target_seq_index]]\n", "target = target_univariate[:max_forecast_steps].to_ts()\n", "\n", "for model in [model1, model2, model3]:\n", " forecast, stderr = model.forecast(target.time_stamps)\n", " rmse = ForecastMetric.RMSE.value(ground_truth=target, predict=forecast)\n", " smape = ForecastMetric.sMAPE.value(ground_truth=target, predict=forecast)\n", " print(f\"{type(model).__name__}\")\n", " print(f\"RMSE: {rmse:.4f}\")\n", " print(f\"sMAPE: {smape:.4f}\")\n", " print()" ] }, { "cell_type": "markdown", "id": "0df4307b", "metadata": {}, "source": [ "We can also use a `ForecastEvaluator` to evaluate a model in a manner that simulates live deployment. Here, we train an initial model on the training data, and we obtain its predictions on the test data using a sliding window of 1 day (`horizon=\"1d\"` means that we want the model to predict 1 day in the future at each time step, and `cadence=\"1d\"` means that we obtain a prediction from the model once per day). Note that we never actually re-train the model (`retrain_freq=None`)." 
] }, { "cell_type": "code", "execution_count": 4, "id": "bcac94ff", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Inferred granularity 0 days 01:00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "DefaultForecaster Sliding Window Evaluation\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "ForecastEvaluator: 100%|██████████| 31528800/31528800 [01:04<00:00, 491549.47it/s]\n", "Inferred granularity 0 days 01:00:00\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "RMSE: 12.0339\n", "sMAPE: 99.4165\n", "\n", "Arima Sliding Window Evaluation\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "ForecastEvaluator: 100%|██████████| 31528800/31528800 [02:24<00:00, 218558.84it/s]" ] }, { "name": "stdout", "output_type": "stream", "text": [ "RMSE: 13.1032\n", "sMAPE: 112.2607\n", "\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "\n" ] } ], "source": [ "from merlion.evaluate.forecast import ForecastEvaluator, ForecastEvaluatorConfig\n", "\n", "for model in [model1, model2]:\n", " print(f\"{type(model).__name__} Sliding Window Evaluation\")\n", " evaluator = ForecastEvaluator(model=model, config=ForecastEvaluatorConfig(\n", " horizon=\"1d\", cadence=\"1d\", retrain_freq=None))\n", " train_result, test_pred = evaluator.get_predict(train_vals=train_data, test_vals=test_data)\n", " rmse = evaluator.evaluate(ground_truth=test_data, predict=test_pred, metric=ForecastMetric.RMSE)\n", " smape = evaluator.evaluate(ground_truth=test_data, predict=test_pred, metric=ForecastMetric.sMAPE)\n", " print(f\"RMSE: {rmse:.4f}\")\n", " print(f\"sMAPE: {smape:.4f}\")\n", " print()" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": 
"ipython3", "version": "3.9.5" } }, "nbformat": 4, "nbformat_minor": 5 }