Multivariate Time Series Forecasting

Multivariate time series forecasting works similarly to univariate time series forecasting (covered here and here). The main difference is that you must specify the index of the target univariate to forecast: e.g., for a 5-variable time series, you may want to forecast the value of the 3rd variable, which we specify by setting target_seq_index=2 (indexing is zero-based). To begin, we load the multivariate SeattleTrail dataset for time series forecasting.

from merlion.utils import TimeSeries
from ts_datasets.forecast import SeattleTrail

time_series, metadata = SeattleTrail()[0]
train_data = TimeSeries.from_pd(time_series[metadata["trainval"]])
test_data = TimeSeries.from_pd(time_series[~metadata["trainval"]])

print(f"Time series is {train_data.dim}-dimensional")
Time series is 5-dimensional
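To make the zero-based target_seq_index convention concrete, here is a minimal pandas sketch (with hypothetical sensor_* column names standing in for the dataset's actual variables) showing how index 2 selects the 3rd variable:

```python
import pandas as pd

# Hypothetical stand-in for a 5-variable time series like SeattleTrail
df = pd.DataFrame({f"sensor_{i}": [0.0, 1.0, 2.0] for i in range(5)})

target_seq_index = 2                       # zero-based, so this is the 3rd variable
target_name = df.columns[target_seq_index]
target_column = df[target_name]            # the univariate we want to forecast

print(target_name)   # sensor_2
print(df.shape[1])   # 5
```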

Model Initialization and Training

For the purposes of this tutorial, we will be using 3 models:

  1. DefaultForecaster (which automatically detects whether the input time series is univariate or multivariate);

  2. ARIMA (a classic univariate algorithm) trained to forecast a specific univariate; and

  3. A ForecasterEnsemble which selects the better of the two models.

All models are trained with a maximum allowed forecasting horizon of 24 steps (1 day, since the series is sampled hourly). Note that all multivariate forecasting models can be used for univariate time series, and by specifying target_seq_index appropriately, univariate models can be used for multivariate time series as well. Moreover, the API is identical in all cases.

from merlion.evaluate.forecast import ForecastMetric
from merlion.models.factory import ModelFactory
from merlion.models.ensemble.combine import ModelSelector

# Time series is sampled hourly, so max_forecast_steps = 24 means we can predict up to 1 day in the future
target_seq_index = 2
max_forecast_steps = 24
kwargs = dict(target_seq_index=target_seq_index, max_forecast_steps=max_forecast_steps)

model1 = ModelFactory.create("DefaultForecaster", **kwargs)
model2 = ModelFactory.create("Arima", **kwargs)

# This ModelSelector combiner picks the best model based on sMAPE
model3 = ModelFactory.create("ForecasterEnsemble", models=[model1, model2],
                             combiner=ModelSelector(metric=ForecastMetric.sMAPE))

for model in [model1, model2, model3]:
    print(f"Training {type(model).__name__}...")
    train_pred, train_stderr = model.train(train_data)
Inferred granularity 0 days 01:00:00
Training DefaultForecaster...
Inferred granularity 0 days 01:00:00
Training Arima...
Inferred granularity 0 days 01:00:00
Training ForecasterEnsemble...
Inferred granularity 0 days 01:00:00
Inferred granularity 0 days 01:00:00
Inferred granularity 0 days 01:00:00

Model Inference and Quantitative Evaluation

As with univariate models, we may call model.forecast() to obtain a forecast and, for models that support it, a standard error. We can use these to evaluate each model's performance. Note that the model selector successfully picks the better of its two component models.

from merlion.evaluate.forecast import ForecastMetric

target_univariate = test_data.univariates[test_data.names[target_seq_index]]
target = target_univariate[:max_forecast_steps].to_ts()

for model in [model1, model2, model3]:
    forecast, stderr = model.forecast(target.time_stamps)
    rmse = ForecastMetric.RMSE.value(ground_truth=target, predict=forecast)
    smape = ForecastMetric.sMAPE.value(ground_truth=target, predict=forecast)
    print(f"RMSE:  {rmse:.4f}")
    print(f"sMAPE: {smape:.4f}")
RMSE:  6.6216
sMAPE: 121.1709

RMSE:  10.2208
sMAPE: 140.2772

RMSE:  6.6216
sMAPE: 121.1709
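For reference, both metrics are easy to compute by hand. The sketch below uses toy values (not the SeattleTrail forecasts) and one common percentage-scale sMAPE convention; it illustrates the definitions rather than Merlion's internal implementation:

```python
import numpy as np

# Toy ground truth and forecast, for illustration only
y_true = np.array([10.0, 0.0, 5.0, 8.0])
y_pred = np.array([12.0, 1.0, 4.0, 8.0])

# Root mean squared error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Symmetric MAPE on a percentage scale: 200 * mean(|e| / (|y| + |y_hat|))
smape = 200.0 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

print(f"RMSE:  {rmse:.4f}")   # 1.2247
print(f"sMAPE: {smape:.4f}")  # 60.1010
```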

We can also use a ForecastEvaluator to evaluate a model in a manner that simulates live deployment. Here, we train an initial model on the training data, and we obtain its predictions on the test data using a sliding window of 1 day (horizon="1d" means that we want the model to predict 1 day into the future at each evaluation point, and cadence="1d" means that we obtain a prediction from the model once per day). Note that we never actually re-train the model (retrain_freq=None).

from merlion.evaluate.forecast import ForecastEvaluator, ForecastEvaluatorConfig

for model in [model1, model2]:
    print(f"{type(model).__name__} Sliding Window Evaluation")
    evaluator = ForecastEvaluator(model=model, config=ForecastEvaluatorConfig(
        horizon="1d", cadence="1d", retrain_freq=None))
    train_result, test_pred = evaluator.get_predict(train_vals=train_data, test_vals=test_data)
    rmse = evaluator.evaluate(ground_truth=test_data, predict=test_pred, metric=ForecastMetric.RMSE)
    smape = evaluator.evaluate(ground_truth=test_data, predict=test_pred, metric=ForecastMetric.sMAPE)
    print(f"RMSE:  {rmse:.4f}")
    print(f"sMAPE: {smape:.4f}")
Inferred granularity 0 days 01:00:00
DefaultForecaster Sliding Window Evaluation
ForecastEvaluator: 100%|██████████| 31528800/31528800 [01:04<00:00, 491549.47it/s]
Inferred granularity 0 days 01:00:00
RMSE:  12.0339
sMAPE: 99.4165

Arima Sliding Window Evaluation
ForecastEvaluator: 100%|██████████| 31528800/31528800 [02:24<00:00, 218558.84it/s]
RMSE:  13.1032
sMAPE: 112.2607
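Conceptually, the evaluator slides over the test data, requesting one 1-day-ahead forecast per day and never updating the model. A plain-NumPy sketch of that loop, with synthetic data and a naive last-value forecaster standing in for a real Merlion model, looks like:

```python
import numpy as np

# Synthetic hourly series: 5 days of data (a stand-in for the real test split)
rng = np.random.default_rng(0)
series = rng.normal(size=5 * 24)

horizon = 24  # predict 1 day ahead at each evaluation point (horizon="1d")
cadence = 24  # request a new forecast once per day (cadence="1d")

preds, truth = [], []
for start in range(0, len(series) - horizon, cadence):
    # Naive stand-in model: repeat the last observed value. It is never
    # updated as the window slides, mirroring retrain_freq=None.
    last_seen = series[start]
    preds.append(np.full(horizon, last_seen))
    truth.append(series[start + 1 : start + 1 + horizon])

preds, truth = np.concatenate(preds), np.concatenate(truth)
rmse = np.sqrt(np.mean((truth - preds) ** 2))
print(f"Evaluated {len(preds)} forecasted points, RMSE = {rmse:.4f}")
```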