Multivariate Time Series Forecasting
Multivariate time series forecasting works similarly to univariate time series forecasting (covered here and here). The main difference is that you must specify the index of the target univariate to forecast. For example, in a 5-variable time series you may want to forecast the value of the 3rd variable; we specify this with target_seq_index=2, since indices are 0-based. To begin, we will load the multivariate SeattleTrail dataset for time series forecasting.
[1]:
from merlion.utils import TimeSeries
from ts_datasets.forecast import SeattleTrail
time_series, metadata = SeattleTrail()[0]
train_data = TimeSeries.from_pd(time_series[metadata["trainval"]])
test_data = TimeSeries.from_pd(time_series[~metadata["trainval"]])
print(f"Time series is {train_data.dim}-dimensional")
Time series is 5-dimensional
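To make the indexing concrete, here is a small sketch (with hypothetical column names, not the actual SeattleTrail variables) showing that target_seq_index is simply a 0-based column index into the multivariate series:

```python
import pandas as pd

# Hypothetical 5-variable frame; SeattleTrail's real column names differ.
df = pd.DataFrame({f"var_{i}": [float(i)] * 3 for i in range(5)})

# target_seq_index is 0-based, so index 2 picks out the 3rd variable.
target_seq_index = 2
print(df.columns[target_seq_index])  # -> var_2
```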
Model Initialization and Training
For the purposes of this tutorial, we will use 3 models:

1. DefaultForecaster (which automatically detects whether the input time series is univariate or multivariate);
2. ARIMA (a classic univariate algorithm) trained to forecast a specific univariate; and
3. a ForecasterEnsemble which selects the better of the two models.
All models are trained with a maximum allowed forecasting horizon of 24 steps (matching max_forecast_steps below). Note that all multivariate forecasting models can be used on univariate time series, and by specifying target_seq_index appropriately, univariate models can be used on multivariate time series as well. Moreover, the API is identical in all cases.
[2]:
from merlion.evaluate.forecast import ForecastMetric
from merlion.models.factory import ModelFactory
from merlion.models.ensemble.combine import ModelSelector
from merlion.transform.resample import TemporalResample
# Time series is sampled hourly, so max_forecast_steps = 24 means we can predict up to 1 day in the future
target_seq_index = 2
max_forecast_steps = 24
kwargs = dict(target_seq_index=target_seq_index, max_forecast_steps=max_forecast_steps)
model1 = ModelFactory.create("DefaultForecaster", **kwargs)
model2 = ModelFactory.create("Arima", **kwargs)
# This ModelSelector combiner picks the best model based on sMAPE
model3 = ModelFactory.create("ForecasterEnsemble", models=[model1, model2], transform=TemporalResample(),
combiner=ModelSelector(metric=ForecastMetric.sMAPE))
for model in [model1, model2, model3]:
print(f"Training {type(model).__name__}...")
train_pred, train_stderr = model.train(train_data)
Training DefaultForecaster...
Inferred granularity 0 days 01:00:00
Inferred granularity 0 days 01:00:00
Training Arima...
Inferred granularity 0 days 01:00:00
Training ForecasterEnsemble...
ForecastEvaluator: 100%|██████████| 31550400/31550400 [01:36<00:00, 328262.84it/s]
ForecastEvaluator: 100%|██████████| 31550400/31550400 [03:24<00:00, 154110.26it/s]
Model Inference and Quantitative Evaluation
Like univariate models, we may call model.forecast() to obtain a forecast and, where the model supports it, a standard error. We can use these to evaluate each model's performance. Note that the model selector successfully picks the better of the two models.
[3]:
from merlion.evaluate.forecast import ForecastMetric
for model in [model1, model2, model3]:
forecast, stderr = model.forecast(test_data.time_stamps[:max_forecast_steps])
rmse = ForecastMetric.RMSE.value(ground_truth=test_data, predict=forecast, target_seq_index=target_seq_index)
smape = ForecastMetric.sMAPE.value(ground_truth=test_data, predict=forecast, target_seq_index=target_seq_index)
print(f"{type(model).__name__}")
print(f"RMSE: {rmse:.4f}")
print(f"sMAPE: {smape:.4f}")
print()
DefaultForecaster
RMSE: 7.5235
sMAPE: 132.8147
Arima
RMSE: 10.2208
sMAPE: 140.2771
ForecasterEnsemble
RMSE: 7.5235
sMAPE: 132.8147
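For reference, a common definition of sMAPE, reported in percent (consistent with the magnitudes above), can be written in a few lines of numpy. This is a sketch of the metric itself, not Merlion's internal implementation:

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric MAPE in percent: 200 * mean(|y - yhat| / (|y| + |yhat|))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 200.0 * np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + np.abs(y_pred)))

print(round(smape([10, 20, 30], [12, 18, 33]), 2))  # -> 12.74
```

Because the denominator uses both the true and predicted values, sMAPE is bounded above by 200, which explains scores like 132.81 on a hard-to-predict series.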
We can also use a ForecastEvaluator
to evaluate a model in a manner that simulates live deployment. Here, we train an initial model on the training data, and we then obtain its predictions on the test data using a sliding window of 1 day (horizon="1d"
means that we want the model to predict 1 day into the future at each time step, and cadence="1d"
means that we obtain a prediction from the model once per day). Note that we never actually re-train the model (retrain_freq=None
).
[4]:
from merlion.evaluate.forecast import ForecastEvaluator, ForecastEvaluatorConfig
for model in [model1, model2]:
print(f"{type(model).__name__} Sliding Window Evaluation")
evaluator = ForecastEvaluator(model=model, config=ForecastEvaluatorConfig(
horizon="1d", cadence="1d", retrain_freq=None))
train_result, test_pred = evaluator.get_predict(train_vals=train_data, test_vals=test_data)
rmse = evaluator.evaluate(ground_truth=test_data, predict=test_pred, metric=ForecastMetric.RMSE)
smape = evaluator.evaluate(ground_truth=test_data, predict=test_pred, metric=ForecastMetric.sMAPE)
print(f"RMSE: {rmse:.4f}")
print(f"sMAPE: {smape:.4f}")
print()
Inferred granularity 0 days 01:00:00
DefaultForecaster Sliding Window Evaluation
ForecastEvaluator: 100%|██████████| 31528800/31528800 [02:03<00:00, 255804.57it/s]
Inferred granularity 0 days 01:00:00
RMSE: 12.0339
sMAPE: 99.4165
Arima Sliding Window Evaluation
ForecastEvaluator: 100%|██████████| 31528800/31528800 [04:19<00:00, 121688.66it/s]
RMSE: 13.1032
sMAPE: 112.2604
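To see how horizon and cadence interact, here is a plain-Python sketch (not Merlion internals) of how the evaluation windows tile the test period when both are 1 day on hourly data:

```python
# With hourly data, horizon="1d" means each forecast covers 24 steps, and
# cadence="1d" means a new forecast is requested every 24 steps, so the
# prediction windows tile the test period without overlapping.
horizon_steps, cadence_steps = 24, 24
n_test = 72  # hypothetical test-set length, in hours

windows = [(start, min(start + horizon_steps, n_test))
           for start in range(0, n_test, cadence_steps)]
print(windows)  # -> [(0, 24), (24, 48), (48, 72)]
```

If the cadence were shorter than the horizon, consecutive windows would overlap and some timestamps would be forecast more than once.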