merlion.models.forecast package

Contains all forecasting models.

For forecasting, we define an abstract base ForecasterBase class which inherits from ModelBase and supports the following interface, in addition to model.save() and ForecasterClass.load defined for ModelBase:

  1. model = ForecasterClass(config)

    • initialization with a model-specific config (which inherits from ForecasterConfig)

    • configs contain:

      • a (potentially trainable) data pre-processing transform from merlion.transform; note that model.transform is a property which refers to model.config.transform

      • model-specific hyperparameters

      • optionally, a maximum number of steps the model can forecast for

  2. model.forecast(time_stamps, time_series_prev=None)

    • returns the forecast (TimeSeries) for future values at the time stamps specified by time_stamps, as well as the standard error of that forecast (TimeSeries, may be None)

    • if time_series_prev is specified, it is used as the most recent context. Otherwise, the training data is used

  3. model.train(train_data, train_config=None)

    • trains the model on the TimeSeries train_data

    • train_config (optional): extra configuration describing how the model should be trained (e.g. learning rate for LSTM). Not used for all models. Class-level default provided for models which do use it.

    • returns the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data
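The call pattern above can be sketched with a toy model. This is an illustration of the contract only, not Merlion code: ToyConfig, ToyMeanForecaster and the Series alias are hypothetical stand-ins written in plain Python, and the "model" simply forecasts the mean of its context.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional, Tuple

# Hypothetical stand-in for merlion's TimeSeries: a list of (timestamp, value) pairs.
Series = List[Tuple[int, float]]


@dataclass
class ToyConfig:
    """Plays the role of a model-specific config derived from ForecasterConfig."""
    max_forecast_steps: Optional[int] = None


class ToyMeanForecaster:
    """Forecasts the mean of its context; only the call pattern matters here."""

    def __init__(self, config: ToyConfig):
        self.config = config
        self._train_values: List[float] = []

    def _predict(self, time_stamps, context):
        level = mean(context)
        err = stdev(context) if len(context) > 1 else 0.0
        # Return (forecast, stderr), both indexed by the requested time stamps.
        return [(t, level) for t in time_stamps], [(t, err) for t in time_stamps]

    def train(self, train_data: Series, train_config=None):
        self._train_values = [v for _, v in train_data]
        # Same format as a forecast made on the training time stamps.
        return self._predict([t for t, _ in train_data], self._train_values)

    def forecast(self, time_stamps, time_series_prev: Optional[Series] = None):
        if not self._train_values:
            raise RuntimeError("call train() before forecast()")
        limit = self.config.max_forecast_steps
        if limit is not None and len(time_stamps) > limit:
            raise ValueError("requested more steps than max_forecast_steps")
        # time_series_prev, if given, replaces the training data as context.
        if time_series_prev is None:
            context = self._train_values
        else:
            context = [v for _, v in time_series_prev]
        return self._predict(time_stamps, context)


model = ToyMeanForecaster(ToyConfig(max_forecast_steps=2))
train_pred, train_err = model.train([(0, 1.0), (1, 2.0), (2, 3.0), (3, 2.0)])
forecast, stderr = model.forecast([4, 5])
```

A real forecaster would be constructed from its own config class and would return TimeSeries objects rather than lists of pairs.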

base

Base class for forecasting models.

arima

The classic statistical forecasting model ARIMA (AutoRegressive Integrated Moving Average).

sarima

A variant of ARIMA with a user-specified Seasonality.

prophet

Wrapper around Facebook's popular Prophet model for time series forecasting.

smoother

Multi-Scale Exponential Smoother for univariate time series forecasting.

vector_ar

Vector AutoRegressive model for multivariate time series forecasting.

baggingtrees

Bagging Tree-based models for multivariate time series forecasting.

boostingtrees

Boosting Tree-based models for multivariate time series forecasting.

lstm

A forecaster based on an LSTM neural net.

Submodules

merlion.models.forecast.base module

Base class for forecasting models.

class merlion.models.forecast.base.ForecasterConfig(max_forecast_steps, target_seq_index=None, **kwargs)

Bases: Config

Config object used to define a forecaster model.

Parameters
  • max_forecast_steps (Optional[int]) – Max # of steps we would like to forecast for. Required for some models which pre-compute a forecast, like ARIMA, SARIMA, and LSTM.

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

classmethod from_dict(config_dict, return_unused_kwargs=False, **kwargs)

Constructs a Config from a Python dictionary of parameters.

Parameters
  • config_dict (Dict[str, Any]) – dict that will be used to instantiate this object.

  • return_unused_kwargs – whether to return any unused keyword args.

  • kwargs – any additional parameters to set (overriding config_dict).

Returns

Config object initialized from the dict.

class merlion.models.forecast.base.ForecasterBase(config)

Bases: ModelBase

Base class for a forecaster model.

Note

If your model depends on an evenly spaced time series, make sure to

  1. Call ForecasterBase.train_pre_process in ForecasterBase.train

  2. Call ForecasterBase.resample_time_stamps at the start of ForecasterBase.forecast to get a set of resampled time stamps, and call time_series.align(reference=time_stamps) to align the forecast with the original time stamps.
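The idea behind the two steps in this note can be sketched in plain Python: forecast on an evenly spaced grid, then map the result back onto the time stamps the caller asked for. The helpers even_grid and align_linear below are hypothetical, and linear interpolation is an assumption made for the sketch; Merlion's own align may apply a different alignment policy.

```python
from typing import List


def even_grid(start: int, stop: int, step: int) -> List[int]:
    """Evenly spaced time stamps that start at `start` and cover `stop`."""
    n = -(-(stop - start) // step)  # ceiling division
    return [start + i * step for i in range(n + 1)]


def align_linear(grid: List[int], values: List[float], targets: List[int]) -> List[float]:
    """Linearly interpolate grid values at sorted targets lying inside the grid."""
    out = []
    j = 0
    for t in targets:
        # Advance to the grid segment [grid[j], grid[j + 1]] that contains t.
        while j + 1 < len(grid) - 1 and grid[j + 1] < t:
            j += 1
        t0, t1 = grid[j], grid[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
    return out


# Requested time stamps are irregular; the model only forecasts on the grid.
requested = [60, 90, 150, 180]
grid = even_grid(60, 180, 60)
grid_forecast = [1.0, 2.0, 4.0]  # made-up forecast values on the grid
aligned = align_linear(grid, grid_forecast, requested)
```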

config_class

alias of ForecasterConfig

property max_forecast_steps

property target_seq_index: int
Return type

int

Returns

the index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

property dim

resample_time_stamps(time_stamps, time_series_prev=None)

train_pre_process(train_data, require_even_sampling, require_univariate)

Applies pre-processing steps common for training most models.

Parameters
  • train_data (TimeSeries) – the original time series of training data

  • require_even_sampling (bool) – whether the model assumes that training data is sampled at a fixed frequency

  • require_univariate (bool) – whether the model only works with univariate time series

Return type

TimeSeries

Returns

the training data, after any necessary pre-processing has been applied

abstract train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config – Additional training configs, if needed. Only required for some models.

Return type

Tuple[TimeSeries, Optional[TimeSeries]]

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

abstract forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr (bool) – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev (bool) – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Union[Tuple[TimeSeries, Optional[TimeSeries]], Tuple[TimeSeries, TimeSeries, TimeSeries]]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp
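To relate the two return formats, the sketch below derives the 25th and 75th percentiles from a forecast and its standard error. It assumes the forecast errors are normally distributed; whether a given model computes its inter-quartile range this way is an assumption, and iqr_from_stderr is a hypothetical helper rather than part of the library.

```python
from statistics import NormalDist
from typing import List, Tuple


def iqr_from_stderr(forecast: List[float], stderr: List[float]) -> Tuple[List[float], List[float]]:
    """Return (forecast_lb, forecast_ub) under a Gaussian error assumption."""
    z = NormalDist().inv_cdf(0.75)  # about 0.6745 standard deviations
    lb = [f - z * s for f, s in zip(forecast, stderr)]
    ub = [f + z * s for f, s in zip(forecast, stderr)]
    return lb, ub


forecast_lb, forecast_ub = iqr_from_stderr([10.0, 12.0], [1.0, 2.0])
```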

batch_forecast(time_stamps_list, time_series_prev_list, return_iqr=False, return_prev=False)

Returns the model’s forecast on a batch of timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps_list (List[List[int]]) – a list of lists of timestamps we wish to forecast for

  • time_series_prev_list (List[TimeSeries]) – a list of TimeSeries immediately preceding the time stamps in time_stamps_list

  • return_iqr (bool) – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev (bool) – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Tuple[Union[Tuple[List[TimeSeries], List[Optional[TimeSeries]]], Tuple[List[TimeSeries], List[TimeSeries], List[TimeSeries]]]]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp

invert_transform(forecast, time_series_prev=None)

get_figure(*, time_series=None, time_stamps=None, time_series_prev=None, plot_forecast_uncertainty=False, plot_time_series_prev=False)
Parameters
  • time_series (Optional[TimeSeries]) – the time series over whose timestamps we wish to make a forecast. Exactly one of time_series or time_stamps should be provided.

  • time_stamps (Optional[List[int]]) – a list of timestamps we wish to forecast for. Exactly one of time_series or time_stamps should be provided.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • plot_forecast_uncertainty – whether to plot uncertainty estimates (the inter-quartile range) for forecast values. Not supported for all models.

  • plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.

Return type

Figure

Returns

a Figure of the model’s forecast.

plot_forecast(*, time_series=None, time_stamps=None, time_series_prev=None, plot_forecast_uncertainty=False, plot_time_series_prev=False, figsize=(1000, 600), ax=None)

Plots the forecast for the time series in matplotlib, optionally also plotting the uncertainty of the forecast, as well as the past values (both true and predicted) of the time series.

Parameters
  • time_series (Optional[TimeSeries]) – the time series over whose timestamps we wish to make a forecast. Exactly one of time_series or time_stamps should be provided.

  • time_stamps (Optional[List[int]]) – a list of timestamps we wish to forecast for. Exactly one of time_series or time_stamps should be provided.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • plot_forecast_uncertainty – whether to plot uncertainty estimates (the inter-quartile range) for forecast values. Not supported for all models.

  • plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.

  • figsize – figure size in pixels

  • ax – matplotlib axis to add this plot to

Returns

(fig, ax): matplotlib figure & axes the figure was plotted on

plot_forecast_plotly(*, time_series=None, time_stamps=None, time_series_prev=None, plot_forecast_uncertainty=False, plot_time_series_prev=False, figsize=(1000, 600))

Plots the forecast for the time series in plotly, optionally also plotting the uncertainty of the forecast, as well as the past values (both true and predicted) of the time series.

Parameters
  • time_series (Optional[TimeSeries]) – the time series over whose timestamps we wish to make a forecast. Exactly one of time_series or time_stamps should be provided.

  • time_stamps (Optional[List[int]]) – a list of timestamps we wish to forecast for. Exactly one of time_series or time_stamps should be provided.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • plot_forecast_uncertainty – whether to plot uncertainty estimates (the inter-quartile range) for forecast values. Not supported for all models.

  • plot_time_series_prev – whether to plot time_series_prev (and the model’s fit for it). Only used if time_series_prev is given.

  • figsize – figure size in pixels

merlion.models.forecast.arima module

The classic statistical forecasting model ARIMA (AutoRegressive Integrated Moving Average).

class merlion.models.forecast.arima.ArimaConfig(max_forecast_steps=None, target_seq_index=None, order=(4, 1, 2), **kwargs)

Bases: SarimaConfig

Config object used to define a forecaster model.

Configuration class for Arima. Just a Sarima model with seasonal order (0, 0, 0, 0).

property seasonal_order: Tuple[int, int, int, int]
Return type

Tuple[int, int, int, int]

Returns

(0, 0, 0, 0) because ARIMA has no seasonal order.

class merlion.models.forecast.arima.Arima(config)

Bases: Sarima

Implementation of the classic statistical model ARIMA (AutoRegressive Integrated Moving Average) for forecasting.

config_class

alias of ArimaConfig

merlion.models.forecast.sarima module

A variant of ARIMA with a user-specified Seasonality.

class merlion.models.forecast.sarima.SarimaConfig(max_forecast_steps=None, target_seq_index=None, order=(4, 1, 2), seasonal_order=(2, 0, 1, 24), **kwargs)

Bases: ForecasterConfig

Config object used to define a forecaster model.

Configuration class for Sarima.

Parameters
  • max_forecast_steps – Number of steps we would like to forecast for.

  • target_seq_index – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • order – Order is (p, d, q) for an ARIMA(p, d, q) process. d must be an integer indicating the integration order of the process, while p and q must be integers indicating the AR and MA orders (so that all lags up to those orders are included).

  • seasonal_order – Seasonal order is (P, D, Q, S) for seasonal ARIMA process, where S is the length of the seasonality cycle (e.g. S=24 for 24 hours on hourly granularity). P, D, Q are as for ARIMA.
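The integration orders d and D can be read as differencing steps applied before the AR and MA terms are fitted: d rounds of ordinary differencing at lag 1, and D rounds of seasonal differencing at lag S. The helper below is a hypothetical plain-Python sketch of that preprocessing, not the routine the model itself calls.

```python
from typing import List


def difference(x: List[float], lag: int = 1, order: int = 1) -> List[float]:
    """Apply `order` rounds of differencing at the given lag."""
    for _ in range(order):
        x = [x[i] - x[i - lag] for i in range(lag, len(x))]
    return x


# d=1 removes a linear trend: the differenced series is constant.
trend = [2.0 * i + 1.0 for i in range(6)]
detrended = difference(trend, lag=1, order=1)

# D=1 with S=4 removes a pattern that repeats every 4 steps.
pattern = [5.0, 1.0, 3.0, 2.0]
seasonal = [pattern[i % 4] for i in range(12)]
deseasonalized = difference(seasonal, lag=4, order=1)
```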

class merlion.models.forecast.sarima.Sarima(config)

Bases: ForecasterBase, SeasonalityModel

Implementation of the classic statistical model SARIMA (Seasonal AutoRegressive Integrated Moving Average) for forecasting.

config_class

alias of SarimaConfig

property order: Tuple[int, int, int]
Return type

Tuple[int, int, int]

Returns

the order (p, d, q) of the model, where p is the AR order, d is the integration order, and q is the MA order.

property seasonal_order: Tuple[int, int, int, int]
Return type

Tuple[int, int, int, int]

Returns

the seasonal order (P, D, Q, S) for the seasonal ARIMA process, where P is the AR order, D is the integration order, Q is the MA order, and S is the length of the seasonality cycle.

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config – Additional training configs, if needed. Only required for some models.

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Union[Tuple[TimeSeries, TimeSeries], Tuple[TimeSeries, TimeSeries, TimeSeries]]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp

set_seasonality(theta, train_data)

Implement this method to do any model-specific adjustments on the seasonality that was provided by SeasonalityLayer.

Parameters
  • theta – Seasonality processed by SeasonalityLayer.

  • train_data (array) – Training data (or numpy array representing the target univariate) for any model-specific adjustments you might want to make.

merlion.models.forecast.prophet module

Wrapper around Facebook’s popular Prophet model for time series forecasting.

class merlion.models.forecast.prophet.ProphetConfig(max_forecast_steps=None, target_seq_index=None, yearly_seasonality='auto', weekly_seasonality='auto', daily_seasonality='auto', add_seasonality='auto', uncertainty_samples=100, **kwargs)

Bases: ForecasterConfig

Config object used to define a forecaster model.

Configuration class for Facebook’s Prophet model, as described in this paper.

Parameters
  • max_forecast_steps (Optional[int]) – Max # of steps we would like to forecast for.

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • yearly_seasonality (Union[bool, int]) – If bool, whether to enable yearly seasonality. By default, it is activated if there are >= 2 years of history, but deactivated otherwise. If int, this is the number of Fourier series components used to model the seasonality (default = 10).

  • weekly_seasonality (Union[bool, int]) – If bool, whether to enable weekly seasonality. By default, it is activated if there are >= 2 weeks of history, but deactivated otherwise. If int, this is the number of Fourier series components used to model the seasonality (default = 3).

  • daily_seasonality (Union[bool, int]) – If bool, whether to enable daily seasonality. By default, it is activated if there are >= 2 days of history, but deactivated otherwise. If int, this is the number of Fourier series components used to model the seasonality (default = 4).

  • add_seasonality – ‘auto’ indicates automatically adding extra seasonality by detection methods (default = None).

  • uncertainty_samples (int) – The number of posterior samples to draw in order to calibrate the anomaly scores.
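When a seasonality argument is an int, it sets how many Fourier components describe one seasonal cycle. The sketch below builds the corresponding sine and cosine features for a single time value; fourier_features is a hypothetical helper, and expressing t and period in the same unit (for example days) is an assumption of the sketch.

```python
import math
from typing import List


def fourier_features(t: float, period: float, order: int) -> List[float]:
    """Sine and cosine features for harmonics 1..order of the given period."""
    feats = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


# Weekly seasonality with 3 components gives 6 features per time stamp.
weekly = fourier_features(t=3.0, period=7.0, order=3)
```

A seasonal component is then a linear combination of these features, so a larger order lets the seasonal curve change more quickly within one cycle.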

class merlion.models.forecast.prophet.Prophet(config)

Bases: ForecasterBase

Facebook’s model for time series forecasting. See docs for ProphetConfig and the paper for more details.

config_class

alias of ProphetConfig

property yearly_seasonality
property weekly_seasonality
property daily_seasonality
property add_seasonality
property uncertainty_samples

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config – Additional training configs, if needed. Only required for some models.

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Union[Tuple[TimeSeries, TimeSeries], Tuple[TimeSeries, TimeSeries, TimeSeries]]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp

merlion.models.forecast.smoother module

Multi-Scale Exponential Smoother for univariate time series forecasting.

class merlion.models.forecast.smoother.MSESConfig(max_forecast_steps, target_seq_index=None, max_backstep=None, recency_weight=0.5, accel_weight=1.0, optimize_acc=True, eta=0.0, rho=0.0, phi=2.0, inflation=1.0, **kwargs)

Bases: ForecasterConfig

Configuration class for an MSES forecasting model.

Let w be the recency weight, B the maximum backstep, x_t the last seen data point, and l_{s,t} the series of losses for scale s. The forecast at horizon h is then

\[
\begin{aligned}
\hat{x}_{t+h} &= \sum_{b=0}^{B} p_b \cdot \left(x_{t-b} + v_{b+h,t} + a_{b+h,t}\right) \\
\text{where} \quad v_{b+h,t} &= \text{EMA}_w(\Delta_{b+h} x_t) \\
a_{b+h,t} &= \text{EMA}_w(\Delta_{b+h}^2 x_t) \\
\text{and} \quad p_b &= \sigma(z)_b \quad \text{with} \quad z_b = (b+h)^\phi \cdot \text{EMA}_w(l_{b+h,t}) \cdot \text{RWSE}_w(l_{b+h,t})
\end{aligned}
\]
Parameters
  • max_forecast_steps (int) – Max number of steps to forecast ahead.

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • max_backstep (Optional[int]) – Max backstep to use in forecasting. If we train with x(0),…,x(t), then the b-th model MSES uses will forecast x(t+h) by anchoring at x(t-b) and predicting xhat(t+h) = x(t-b) + delta_hat(b+h).

  • recency_weight (float) – The recency weight parameter to use when estimating delta_hat.

  • accel_weight (float) – The weight to scale the acceleration by when computing delta_hat. Specifically, delta_hat(b+h) = velocity(b+h) + accel_weight * acceleration(b+h).

  • optimize_acc (bool) – If True, the acceleration correction will only be used at scales ranging from 1,…,(max_backstep+max_forecast_steps)/2.

  • eta (float) – The parameter used to control the rate at which recency_weight gets tuned when online updates are made to the model and losses can be computed.

  • rho (float) – The fraction of the overall error that is attributed to velocity error; the remainder is attributed to loss error. The error at any scale is computed as rho * velocity_error + (1-rho) * loss_error.

  • phi (float) – The parameter used to exponentially inflate the magnitude of loss error at different scales. Loss error for scale s will be increased by a factor of phi ** s.

  • inflation (float) – The inflation exponent to use when computing the distribution p(b|h) over the models when forecasting at horizon h according to standard errors of the estimated velocities over the models; inflation=1 is equivalent to using the softmax function.

property max_scale
property backsteps

class merlion.models.forecast.smoother.MSESTrainConfig(incremental=True, process_losses=True, tune_recency_weights=False, init_batch_sz=2, train_cadence=None)

Bases: object

MSES training configuration.

Parameters
  • incremental (bool) – If True, train the MSES model incrementally with the initial training data at the given train_cadence. This allows MSES to return a forecast for the training data.

  • init_batch_sz (int) – The size of the initial training batch for MSES. This is necessary because MSES cannot predict the past, but needs to start with some data. This should be very small. 2 is the minimum, and is recommended because 2 will result in the most representative train forecast.

  • train_cadence (Optional[int]) – The frequency at which the training forecasts will be generated during incremental training.

  • process_losses (bool) – If True, track the losses encountered during incremental initial training.

  • tune_recency_weights (bool) – If True, tune recency weights during incremental initial training.

class merlion.models.forecast.smoother.MSES(config)

Bases: ForecasterBase

Multi-scale Exponential Smoother (MSES) is a forecasting algorithm modeled heavily after classical mechanical concepts, namely, velocity and acceleration.

Having seen data points of a time series up to time t, MSES forecasts x(t+h) by anchoring at a value b steps back from the last known value, x(t-b), and estimating the delta between x(t-b) and x(t+h). The delta over these b+h timesteps, delta(b+h), also known as the delta at scale b+h, is predicted by estimating the velocity over these timesteps as well as the change in the velocity, acceleration. Specifically,

xhat(t+h) = x(t-b) + velocity_hat(b+h) + acceleration_hat(b+h)

This estimation is done for each backstep b, from 0 (which anchors at x(t)) up to a maximum backstep configurable by the user. The algorithm then takes the separate forecasts of x(t+h), indexed by the backstep used, xhat_b(t+h), and combines them into a final forecast p(b|h) dot xhat_b, where p(b|h) is a distribution over the xhat_b’s that favors the backsteps whose recency-weighted velocity estimates have the lowest standard errors.

Let w be the recency weight, B the maximum backstep, x_t the last seen data point, and l_{s,t} the series of losses for scale s. The forecast at horizon h is then

\[
\begin{aligned}
\hat{x}_{t+h} &= \sum_{b=0}^{B} p_b \cdot \left(x_{t-b} + v_{b+h,t} + a_{b+h,t}\right) \\
\text{where} \quad v_{b+h,t} &= \text{EMA}_w(\Delta_{b+h} x_t) \\
a_{b+h,t} &= \text{EMA}_w(\Delta_{b+h}^2 x_t) \\
\text{and} \quad p_b &= \sigma(z)_b \quad \text{with} \quad z_b = (b+h)^\phi \cdot \text{EMA}_w(l_{b+h,t}) \cdot \text{RWSE}_w(l_{b+h,t})
\end{aligned}
\]
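The anchoring step can be sketched in plain Python. The sketch is deliberately reduced: it estimates only the velocity term (no acceleration correction), uses one common form of the exponential moving average, and leaves the weighting p_b aside. The helpers ema and backstep_forecasts are hypothetical and are not the library's implementation.

```python
from typing import List, Optional


def ema(values: List[float], w: float) -> float:
    """Exponential moving average; w is the weight given to the newest value."""
    est = values[0]
    for v in values[1:]:
        est = w * v + (1.0 - w) * est
    return est


def backstep_forecasts(x: List[float], horizon: int, max_backstep: int,
                       w: float = 0.5) -> List[Optional[float]]:
    """Forecast x(t+h) once per backstep b as x(t-b) + velocity_hat(b+h)."""
    out: List[Optional[float]] = []
    for b in range(max_backstep + 1):
        scale = b + horizon
        # All observed deltas over `scale` steps.
        deltas = [x[i] - x[i - scale] for i in range(scale, len(x))]
        if not deltas:
            out.append(None)  # not enough history for this scale
            continue
        anchor = x[len(x) - 1 - b]
        out.append(anchor + ema(deltas, w))
    return out


# On a straight line every backstep should agree on the next values.
line = [float(i) for i in range(10)]
per_backstep = backstep_forecasts(line, horizon=2, max_backstep=3)
```

Each entry of the result corresponds to one xhat_b(t+h); entries are None where the history is too short for the scale b+h.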
config_class

alias of MSESConfig

property rho
property backsteps
property max_horizon

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config (Optional[MSESTrainConfig]) – Additional training configs, if needed. Only required for some models.

Return type

Tuple[Optional[TimeSeries], None]

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

update(new_data, tune_recency_weights=True, train_cadence=None)

Updates the MSES model with new data that has been acquired since the model’s initial training.

Parameters
  • new_data (TimeSeries) – New data that has occurred since the last training time.

  • tune_recency_weights (bool) – If True, the model will first forecast the values at the new_data’s timestamps, calculate the associated losses, and use these losses to make updates to the recency weight.

  • train_cadence – The frequency at which the training forecasts will be generated during incremental training.

Return type

Tuple[TimeSeries, TimeSeries]

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a TimeSeries immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr (bool) – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev (bool) – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Tuple[TimeSeries, None]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp

xhat_h(horizon)

Returns the forecasts for the input horizon at every backstep.

Return type

List[Optional[float]]

marginalize_xhat_h(horizon, xhat_h)

Given a list of forecasted values produced by delta estimators at different backsteps, compute a weighted average of these values. The weights are assigned based on the standard errors of the velocities, where the b’th estimate will be given more weight if its velocity has a lower standard error relative to the other estimates.

Parameters
  • horizon (int) – the horizon at which we want to predict

  • xhat_h (List[Optional[float]]) – the forecasted values at this horizon, using each of the possible backsteps
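A weighted average of this kind can be sketched as follows. The weights here are a softmax over the negated standard errors, which is an assumption chosen to match the description (lower error, higher weight); the library's exact weighting, including any scaling by phi or inflation, is not reproduced. The function marginalize and its vel_errs argument are hypothetical.

```python
import math
from typing import List, Optional


def marginalize(xhat_h: List[Optional[float]],
                vel_errs: List[Optional[float]]) -> Optional[float]:
    """Average the per-backstep forecasts, favoring low velocity standard errors."""
    pairs = [(x, e) for x, e in zip(xhat_h, vel_errs)
             if x is not None and e is not None]
    if not pairs:
        return None
    # Softmax over negated errors, shifted by the minimum for numerical stability.
    lowest = min(e for _, e in pairs)
    weights = [math.exp(-(e - lowest)) for _, e in pairs]
    total = sum(weights)
    return sum(wt * x for wt, (x, _) in zip(weights, pairs)) / total


# Equal errors reduce to a plain mean; the None entry is skipped.
combined = marginalize([10.0, 12.0, None], [1.0, 1.0, 0.5])
```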

class merlion.models.forecast.smoother.DeltaStats(scale, recency_weight)

Bases: object

A wrapper around the statistics used to estimate deltas at a given scale.

Parameters
  • scale (int) – The scale associated with the statistics

  • recency_weight (float) – The recency weight parameter that the incremental velocity, acceleration and standard error statistics should use.

property lag
update_velocity(vels)
update_acceleration(accs)
update_loss(losses)

tune(losses, eta)

Tunes the recency weight according to recent forecast losses.

Parameters
  • losses (List[float]) – List of recent losses.

  • eta (float) – Constant by which to scale the update to the recency weight. A bigger eta means more aggressive updates to the recency_weight.

class merlion.models.forecast.smoother.DeltaEstimator(max_scale, recency_weight, accel_weight, optimize_acc, eta, phi, data=None, stats=None)

Bases: object

Class for estimating the delta for MSES.

Parameters
  • max_scale (int) – Delta Estimator can estimate delta over multiple scales, or time steps, ranging from 1,2,…,max_scale.

  • recency_weight (float) – The recency weight parameter to use when estimating delta_hat.

  • accel_weight (float) – The weight to scale the acceleration by when computing delta_hat. Specifically, delta_hat(b+h) = velocity(b+h) + accel_weight * acceleration(b+h).

  • optimize_acc (bool) – If True, the acceleration correction will only be used at scales ranging from 1,…,max_scale/2.

  • eta (float) – The parameter used to control the rate at which recency_weight gets tuned when online updates are made to the model and losses can be computed.

  • data (Optional[UnivariateTimeSeries]) – The data to initialize the delta estimator with.

  • stats (Optional[Dict[int, DeltaStats]]) – Dictionary mapping scales to DeltaStats objects to be used for delta estimation.

property acc_max_scale
property max_scale
property data
property x

train(new_data)

Updates the delta statistics: velocity, acceleration and velocity standard error at each scale using new data.

Parameters

new_data (UnivariateTimeSeries) – new datapoints in the time series.

process_losses(scale_losses, tune_recency_weights=False)

Uses recent forecast errors to improve the delta estimator. This is done by updating the recency_weight that is used by delta stats at particular scales.

Parameters

scale_losses (Dict[int, List[float]]) – A dictionary mapping a scale to a list of forecasting errors associated with that scale.

velocity(scale)
Return type

float

acceleration(scale)
Return type

float

vel_err(scale)
Return type

float

pos_err(scale)
Return type

float

neg_err(scale)
Return type

float

loss_err(scale)
Return type

float

delta_hat(scale)
Return type

float

merlion.models.forecast.vector_ar module

Vector AutoRegressive model for multivariate time series forecasting.

class merlion.models.forecast.vector_ar.VectorARConfig(max_forecast_steps, maxlags, target_seq_index=None, **kwargs)

Bases: ForecasterConfig

Config object used to define a forecaster model.

Parameters
  • max_forecast_steps (int) – Max # of steps we would like to forecast for.

  • maxlags (int) – Max # of lags for AR

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

class merlion.models.forecast.vector_ar.VectorAR(config)

Bases: ForecasterBase

Vector AutoRegressive model for multivariate time series forecasting.
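As background, a vector autoregression with p lags (maxlags in the config) predicts each new observation vector as an intercept plus a sum of coefficient matrices applied to the p most recent vectors. The sketch below rolls that recursion forward in plain Python. It is a generic illustration, not Merlion's code: the function name var_forecast and the argument layout are assumptions, and the coefficients are taken as given rather than estimated.

```python
def var_forecast(history, coefs, intercept, steps):
    """Roll a VAR(p) recursion forward for `steps` steps.

    history:   past observation vectors, oldest first (at least p of them)
    coefs:     list of p square matrices; coefs[i] multiplies x[t - 1 - i]
    intercept: constant vector added at every step
    """
    p = len(coefs)
    window = [list(x) for x in history[-p:]]
    out = []
    for _ in range(steps):
        nxt = [float(c) for c in intercept]
        for lag, matrix in enumerate(coefs):
            past = window[-1 - lag]
            for r, row in enumerate(matrix):
                nxt[r] += sum(a * v for a, v in zip(row, past))
        out.append(nxt)
        # Slide the window so the new prediction becomes the latest lag.
        window = window[1:] + [nxt]
    return out


# VAR(1) on two variables where each variable is driven by the other one.
swap = [[[0.0, 1.0], [1.0, 0.0]]]
path = var_forecast([[1.0, 2.0]], swap, [0.0, 0.0], steps=2)
```

Because every variable's next value depends on the lags of all variables, the forecast for one target (target_seq_index) can draw on the other series.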

config_class

alias of VectorARConfig

property maxlags: int
Return type

int

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config – Additional training configs, if needed. Only required for some models.

Return type

Tuple[TimeSeries, TimeSeries]

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a time series immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Union[Tuple[TimeSeries, TimeSeries], Tuple[TimeSeries, TimeSeries, TimeSeries]]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp
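To relate the two return formats, the sketch below derives 25th and 75th percentile bounds from a point forecast and its standard error. It assumes Gaussian forecast errors, which is an assumption made here for illustration and not a statement about how any Merlion model computes its bounds; the helper name iqr_from_stderr is hypothetical.

```python
from statistics import NormalDist


def iqr_from_stderr(forecast, stderr):
    """Return (lower, upper) quartile bounds assuming Gaussian errors."""
    # z-score of the 75th percentile of a standard normal, about 0.674.
    z = NormalDist().inv_cdf(0.75)
    lower = [f - z * s for f, s in zip(forecast, stderr)]
    upper = [f + z * s for f, s in zip(forecast, stderr)]
    return lower, upper


lb, ub = iqr_from_stderr([10.0, 12.0], [2.0, 0.0])
```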

set_data_already_transformed()
reset_data_already_transformed()

merlion.models.forecast.baggingtrees module

Bagging tree-based models for multivariate time series forecasting: Random Forest and Extra Trees regressors.

class merlion.models.forecast.baggingtrees.BaggingTreeForecasterConfig(max_forecast_steps, maxlags, target_seq_index=None, sampling_mode='normal', prediction_stride=1, n_estimators=100, random_state=None, max_depth=None, min_samples_split=2, **kwargs)

Bases: ForecasterConfig

Configuration class for bagging Tree-based forecaster model.

Parameters
  • max_forecast_steps (int) – Max # of steps we would like to forecast for.

  • maxlags (int) – Max # of lags for forecasting

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • sampling_mode (str) – how to process time series data for the tree model. If “normal”, then concatenate all sequences over the window. If “stats”, then give statistics measures over the window. Note: “stats” mode is statistical summary for a multivariate dataset, mainly to reduce the computation cost for high-dimensional time series. For univariate data, it is not necessary to use “stats” instead of the sequence itself as the input. Therefore, for univariate, the model will automatically adopt “normal” mode.

  • prediction_stride (int) –

    the prediction step for training and forecasting

    • If univariate: the target is a sequence of length prediction_stride, and forecasting is done autoregressively in strides of prediction_stride

    • If multivariate:

      • if = 1: forecasting is done autoregressively with a stride of 1

      • if > 1: only sequence mode is supported, and the model sets prediction_stride = max_forecast_steps

  • n_estimators (int) – number of base estimators for the tree ensemble

  • random_state – random seed for bagging

  • max_depth – max depth of base estimators

  • min_samples_split – minimum number of samples required to split an internal node of a tree
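The two sampling_mode options can be pictured as two ways of turning a window of maxlags time steps into one feature row. The sketch below is a plain-Python illustration under stated assumptions: the helper name make_features, the variable-major ordering in "normal" mode, and the particular statistics (mean, standard deviation, min, max) in "stats" mode are all chosen here for illustration and are not taken from Merlion.

```python
from statistics import mean, pstdev


def make_features(series, maxlags, sampling_mode="normal"):
    """series: list of time steps, each a list with one value per variable."""
    rows = []
    for end in range(maxlags, len(series) + 1):
        window = series[end - maxlags:end]
        columns = list(zip(*window))  # one tuple of lagged values per variable
        if sampling_mode == "normal":
            # Concatenate the raw lagged values of every variable.
            feats = [v for col in columns for v in col]
        else:
            # Summarize each variable's window with a few statistics.
            feats = []
            for col in columns:
                feats += [mean(col), pstdev(col), min(col), max(col)]
        rows.append(feats)
    return rows


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normal_rows = make_features(data, maxlags=2, sampling_mode="normal")
stats_rows = make_features(data, maxlags=2, sampling_mode="stats")
```

In "normal" mode the row length grows with maxlags times the number of variables, while in this "stats" sketch it depends only on the number of variables, which is the cost saving the parameter description refers to.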

class merlion.models.forecast.baggingtrees.BaggingTreeForecaster(config)

Bases: ForecasterBase, MultiVariateAutoRegressionMixin

Tree model for multivariate time series forecasting.

config_class

alias of BaggingTreeForecasterConfig

model = None
property maxlags: int
Return type

int

property sampling_mode: str
Return type

str

property prediction_stride: int
Return type

int

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config – Additional training configs, if needed. Only required for some models.

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (List[int]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a time series immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp
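The note that forecasts are returned in transformed space means that a forecast must be mapped back by hand when values in the original units are needed. The sketch below shows the idea with a standalone mean-variance normalization. The class MeanVarSketch is hypothetical and written only for this example; it is not the merlion.transform API.

```python
from statistics import mean, pstdev


class MeanVarSketch:
    """Hypothetical stand-in for a trainable, invertible transform."""

    def train(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # avoid dividing by zero

    def __call__(self, values):
        return [(v - self.mu) / self.sigma for v in values]

    def invert(self, values):
        return [v * self.sigma + self.mu for v in values]


transform = MeanVarSketch()
transform.train([2.0, 4.0, 6.0])
# Pretend these are forecasts produced in transformed (normalized) space.
forecast_transformed = transform([8.0, 10.0])
forecast_original = transform.invert(forecast_transformed)
```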

set_data_already_transformed()
reset_data_already_transformed()
class merlion.models.forecast.baggingtrees.RandomForestForecasterConfig(max_forecast_steps, maxlags, target_seq_index=None, sampling_mode='normal', prediction_stride=1, n_estimators=100, random_state=None, max_depth=None, min_samples_split=2, **kwargs)

Bases: BaggingTreeForecasterConfig

Configuration class for bagging Tree-based forecaster model.

Parameters
  • max_forecast_steps (int) – Max # of steps we would like to forecast for.

  • maxlags (int) – Max # of lags for forecasting

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • sampling_mode (str) – how to process time series data for the tree model. If “normal”, then concatenate all sequences over the window. If “stats”, then give statistics measures over the window. Note: “stats” mode is statistical summary for a multivariate dataset, mainly to reduce the computation cost for high-dimensional time series. For univariate data, it is not necessary to use “stats” instead of the sequence itself as the input. Therefore, for univariate, the model will automatically adopt “normal” mode.

  • prediction_stride (int) –

    the prediction step for training and forecasting

    • If univariate: the target is a sequence of length prediction_stride, and forecasting is done autoregressively in strides of prediction_stride

    • If multivariate:

      • if = 1: forecasting is done autoregressively with a stride of 1

      • if > 1: only sequence mode is supported, and the model sets prediction_stride = max_forecast_steps

  • n_estimators (int) – number of base estimators for the tree ensemble

  • random_state – random seed for bagging

  • max_depth – max depth of base estimators

  • min_samples_split – minimum number of samples required to split an internal node of a tree
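Autoregressive forecasting with a stride, as described for prediction_stride, can be sketched as a loop that predicts a block of values, appends the block to the context, and repeats. The helper names autoregress and linear_trend_block are hypothetical, and the toy predictor merely extrapolates the last difference; a trained tree ensemble would take its place.

```python
def autoregress(history, predict_block, maxlags, prediction_stride, n_steps):
    """Forecast n_steps values, prediction_stride values at a time."""
    context = list(history)
    forecast = []
    while len(forecast) < n_steps:
        block = predict_block(context[-maxlags:], prediction_stride)
        if len(block) != prediction_stride:
            raise ValueError("predictor must return prediction_stride values")
        forecast.extend(block)
        context.extend(block)  # feed predictions back in as new lags
    return forecast[:n_steps]


def linear_trend_block(lags, stride):
    """Toy predictor: continue the most recent difference."""
    step = lags[-1] - lags[-2]
    return [lags[-1] + step * (i + 1) for i in range(stride)]


result = autoregress([1.0, 2.0, 3.0], linear_trend_block,
                     maxlags=2, prediction_stride=2, n_steps=5)
```

A larger stride needs fewer model calls per forecast, at the cost of predicting several steps from the same lags.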

class merlion.models.forecast.baggingtrees.RandomForestForecaster(config)

Bases: BaggingTreeForecaster

Random Forest Regressor for time series forecasting

Random Forest is a meta estimator that fits a number of decision trees on various sub-samples of the dataset, and uses averaging to improve the predictive accuracy and control over-fitting.
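The "fit on sub-samples, then average" idea can be sketched without any library. The code below bags one-split regression stumps on a single feature. It is a teaching sketch only: a real random forest grows full trees and also subsamples features, and all function names here are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Least-squares best single-threshold split on a 1-D feature."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # only one distinct x value: predict the mean everywhere
        m = sum(ys) / len(ys)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    thr, lm, rm = stump
    return lm if x <= thr else rm


def fit_bagged(xs, ys, n_estimators=25, seed=0):
    """Fit each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


xs = [float(i) for i in range(10)]
ys = [0.0] * 5 + [10.0] * 5
ensemble = fit_bagged(xs, ys)
```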

config_class

alias of RandomForestForecasterConfig

class merlion.models.forecast.baggingtrees.ExtraTreesForecasterConfig(max_forecast_steps, maxlags, target_seq_index=None, sampling_mode='normal', prediction_stride=1, n_estimators=100, random_state=None, max_depth=None, min_samples_split=2, **kwargs)

Bases: BaggingTreeForecasterConfig

Configuration class for bagging Tree-based forecaster model.

Parameters
  • max_forecast_steps (int) – Max # of steps we would like to forecast for.

  • maxlags (int) – Max # of lags for forecasting

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • sampling_mode (str) – how to process time series data for the tree model. If “normal”, then concatenate all sequences over the window. If “stats”, then give statistics measures over the window. Note: “stats” mode is statistical summary for a multivariate dataset, mainly to reduce the computation cost for high-dimensional time series. For univariate data, it is not necessary to use “stats” instead of the sequence itself as the input. Therefore, for univariate, the model will automatically adopt “normal” mode.

  • prediction_stride (int) –

    the prediction step for training and forecasting

    • If univariate: the target is a sequence of length prediction_stride, and forecasting is done autoregressively in strides of prediction_stride

    • If multivariate:

      • if = 1: forecasting is done autoregressively with a stride of 1

      • if > 1: only sequence mode is supported, and the model sets prediction_stride = max_forecast_steps

  • n_estimators (int) – number of base estimators for the tree ensemble

  • random_state – random seed for bagging

  • max_depth – max depth of base estimators

  • min_samples_split – minimum number of samples required to split an internal node of a tree

class merlion.models.forecast.baggingtrees.ExtraTreesForecaster(config)

Bases: BaggingTreeForecaster

Extra Trees Regressor for time series forecasting

Extra Trees Regressor implements a meta estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

config_class

alias of ExtraTreesForecasterConfig

merlion.models.forecast.boostingtrees module

Boosting tree-based models for multivariate time series forecasting: LightGBM.

class merlion.models.forecast.boostingtrees.BoostingTreeForecasterConfig(max_forecast_steps, maxlags, target_seq_index=None, sampling_mode='normal', prediction_stride=1, learning_rate=0.1, n_estimators=100, random_state=None, max_depth=None, n_jobs=-1, **kwargs)

Bases: ForecasterConfig

Configuration class for boosting Tree-based forecaster model.

Parameters
  • max_forecast_steps (int) – Max # of steps we would like to forecast for.

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • maxlags (int) – Max # of lags for forecasting

  • sampling_mode (str) – how to process time series data for the tree model. If “normal”, then concatenate all sequences over the window. If “stats”, then give statistics measures over the window. Note: “stats” mode is statistical summary for a multivariate dataset, mainly to reduce the computation cost for high-dimensional time series. For univariate data, it is not necessary to use “stats” instead of the sequence itself as the input. Therefore, for univariate, the model will automatically adopt “normal” mode.

  • prediction_stride (int) –

    the prediction step for training and forecasting

    • If univariate: the target is a sequence of length prediction_stride, and forecasting is done autoregressively in strides of prediction_stride

    • If multivariate:

      • if = 1: forecasting is done autoregressively with a stride of 1

      • if > 1: only sequence mode is supported, and the model sets prediction_stride = max_forecast_steps

  • learning_rate – learning rate for boosting

  • n_estimators – number of base estimators for the tree ensemble

  • random_state – random seed for boosting

  • max_depth – max depth of base estimators

  • n_jobs – number of threads to use; -1 or 0 selects the device default, and a positive int sets the number of threads
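The roles of learning_rate and n_estimators can be illustrated with a bare-bones gradient boosting loop for squared error: each new stump is fitted to the current residuals and added with a shrinkage factor. This is a generic sketch with hypothetical function names, not LightGBM's algorithm, which uses histogram-based, leaf-wise tree growth.

```python
def fit_stump(xs, targets):
    """Least-squares best single-threshold split on a 1-D feature."""
    best = None
    for thr in sorted(set(xs))[:-1]:  # skip the max so both sides are non-empty
        left = [t for x, t in zip(xs, targets) if x <= thr]
        right = [t for x, t in zip(xs, targets) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # only one distinct x value: predict the mean everywhere
        m = sum(targets) / len(targets)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    thr, lm, rm = stump
    return lm if x <= thr else rm


def fit_boosted(xs, ys, n_estimators=100, learning_rate=0.1):
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each stump's contribution by the learning rate.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "lr": learning_rate, "stumps": stumps}


def predict_boosted(model, x):
    total = sum(predict_stump(s, x) for s in model["stumps"])
    return model["base"] + model["lr"] * total


xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
small = fit_boosted(xs, ys, n_estimators=1)
large = fit_boosted(xs, ys, n_estimators=50)
```

A smaller learning_rate makes each step more cautious, so it usually needs a larger n_estimators to reach the same training fit.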

class merlion.models.forecast.boostingtrees.BoostingTreeForecaster(config)

Bases: ForecasterBase, MultiVariateAutoRegressionMixin

Tree model for multivariate time series forecasting.

config_class

alias of BoostingTreeForecasterConfig

model = None
property maxlags: int
Return type

int

property sampling_mode: str
Return type

str

property prediction_stride: int
Return type

int

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config – Additional training configs, if needed. Only required for some models.

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (List[int]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a time series immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp

set_data_already_transformed()
reset_data_already_transformed()
class merlion.models.forecast.boostingtrees.LGBMForecasterConfig(max_forecast_steps, maxlags, target_seq_index=None, sampling_mode='normal', prediction_stride=1, learning_rate=0.1, n_estimators=100, random_state=None, max_depth=None, n_jobs=-1, **kwargs)

Bases: BoostingTreeForecasterConfig

Configuration class for boosting Tree-based forecaster model.

Parameters
  • max_forecast_steps (int) – Max # of steps we would like to forecast for.

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • maxlags (int) – Max # of lags for forecasting

  • sampling_mode (str) – how to process time series data for the tree model. If “normal”, then concatenate all sequences over the window. If “stats”, then give statistics measures over the window. Note: “stats” mode is statistical summary for a multivariate dataset, mainly to reduce the computation cost for high-dimensional time series. For univariate data, it is not necessary to use “stats” instead of the sequence itself as the input. Therefore, for univariate, the model will automatically adopt “normal” mode.

  • prediction_stride (int) –

    the prediction step for training and forecasting

    • If univariate: the target is a sequence of length prediction_stride, and forecasting is done autoregressively in strides of prediction_stride

    • If multivariate:

      • if = 1: forecasting is done autoregressively with a stride of 1

      • if > 1: only sequence mode is supported, and the model sets prediction_stride = max_forecast_steps

  • learning_rate – learning rate for boosting

  • n_estimators – number of base estimators for the tree ensemble

  • random_state – random seed for boosting

  • max_depth – max depth of base estimators

  • n_jobs – number of threads to use; -1 or 0 selects the device default, and a positive int sets the number of threads

class merlion.models.forecast.boostingtrees.LGBMForecaster(config)

Bases: BoostingTreeForecaster

Light gradient boosting (LGBM) regressor for time series forecasting

LightGBM is a lightweight, fast gradient boosting framework that uses tree-based learning algorithms. For more details, see https://lightgbm.readthedocs.io/en/latest/Features.html

config_class

alias of LGBMForecasterConfig

merlion.models.forecast.lstm module

A forecaster based on an LSTM neural net.

class merlion.models.forecast.lstm.LSTMConfig(max_forecast_steps, target_seq_index=None, nhid=1024, model_strides=(1,), **kwargs)

Bases: ForecasterConfig

Configuration class for the LSTM forecaster.

Parameters
  • max_forecast_steps (int) – Max # of steps we would like to forecast for.

  • target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to forecast.

  • nhid – hidden dimension of LSTM

  • model_strides – tuple indicating the stride(s) at which we would like to subsample the input data before giving it to the model.
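Subsampling at several strides, as model_strides describes, gives the model views of the input at different time resolutions. The sketch below shows one way to do it; the helper name subsample and the choice to align every subsample to the most recent point are assumptions made for illustration.

```python
def subsample(values, model_strides=(1,)):
    """Return one subsampled copy of `values` per stride."""
    # Assumption: walk backwards from the latest point so that every
    # subsample ends on the most recent observation.
    return {s: values[::-1][::s][::-1] for s in model_strides}


views = subsample(list(range(8)), model_strides=(1, 3))
```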

class merlion.models.forecast.lstm.LSTMTrainConfig(lr=1e-05, batch_size=128, epochs=128, seq_len=256, data_stride=1, valid_split=0.2, checkpoint_file='checkpoint.pt')

Bases: object

LSTM training configuration.
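The signature alone does not say how these fields are used. One plausible reading, sketched below, is that the training series is cut into windows of seq_len points taken every data_stride steps, and that the last valid_split fraction of windows is held out for validation. The helper names and this interpretation are assumptions, not documented Merlion behavior.

```python
def make_windows(values, seq_len, data_stride=1):
    """Cut `values` into windows of seq_len points, one every data_stride steps."""
    last_start = len(values) - seq_len
    return [values[i:i + seq_len] for i in range(0, last_start + 1, data_stride)]


def split_train_valid(windows, valid_split=0.2):
    """Hold out the final valid_split fraction of windows for validation."""
    n_valid = int(len(windows) * valid_split)
    cut = len(windows) - n_valid
    return windows[:cut], windows[cut:]


windows = make_windows(list(range(13)), seq_len=4, data_stride=1)
train_windows, valid_windows = split_train_valid(windows, valid_split=0.2)
```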

class merlion.models.forecast.lstm.LSTM(config)

Bases: ForecasterBase

LSTM forecaster. This model assumes that the input time series is sampled at equal intervals, so that sequence modeling can be used to make forecasts.
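A quick check of the equal-interval assumption can be made on the raw timestamps before training. The helper below is a hypothetical sketch and not part of Merlion; it works on any list of numeric timestamps.

```python
def is_equally_spaced(time_stamps, tol=0):
    """True if all consecutive gaps match the first gap within `tol`."""
    if len(time_stamps) < 3:
        return True
    first_gap = time_stamps[1] - time_stamps[0]
    pairs = zip(time_stamps, time_stamps[1:])
    return all(abs((b - a) - first_gap) <= tol for a, b in pairs)


regular = is_equally_spaced([0, 60, 120, 180])
irregular = is_equally_spaced([0, 60, 130])
```

If the check fails, resampling the series onto a regular grid before training is one way to satisfy the assumption.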

config_class

alias of LSTMConfig

train(train_data, train_config=None)

Trains the forecaster on the input time series.

Parameters
  • train_data (TimeSeries) – a TimeSeries of metric values to train the model.

  • train_config (Optional[LSTMTrainConfig]) – Additional training configs, if needed. Only required for some models.

Return type

Tuple[TimeSeries, None]

Returns

the model’s prediction on train_data, in the same format as if you called ForecasterBase.forecast on the time stamps of train_data

forecast(time_stamps, time_series_prev=None, return_iqr=False, return_prev=False)

Returns the model’s forecast on the timestamps given. Note that if self.transform is specified in the config, the forecast is a forecast of transformed values! It is up to you to manually invert the transform if desired.

Parameters
  • time_stamps (Union[int, List[int]]) – Either a list of timestamps we wish to forecast for, or the number of steps (int) we wish to forecast for.

  • time_series_prev (Optional[TimeSeries]) – a time series immediately preceding time_stamps. If given, we use it to initialize the time series model. Otherwise, we assume that time_stamps immediately follows the training data.

  • return_iqr – whether to return the inter-quartile range for the forecast. Note that not all models support this option.

  • return_prev – whether to return the forecast for time_series_prev (and its stderr or IQR if relevant), in addition to the forecast for time_stamps. Only used if time_series_prev is provided.

Return type

Tuple[TimeSeries, None]

Returns

(forecast, forecast_stderr) if return_iqr is false, (forecast, forecast_lb, forecast_ub) otherwise.

  • forecast: the forecast for the timestamps given

  • forecast_stderr: the standard error of each forecast value.

    May be None.

  • forecast_lb: 25th percentile of forecast values for each timestamp

  • forecast_ub: 75th percentile of forecast values for each timestamp
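Since time_stamps may be either a step count or an explicit list, it helps to see how a step count could be expanded into timestamps. The helper below is a hypothetical sketch: the name resolve_time_stamps and the arguments last_train_time and dt (the sampling interval) are assumptions, and Merlion's own handling may differ.

```python
def resolve_time_stamps(time_stamps, last_train_time, dt):
    """Turn an int step count into explicit timestamps; pass lists through."""
    if isinstance(time_stamps, int):
        # Step i (1-based) lands i sampling intervals after the last
        # timestamp seen so far.
        return [last_train_time + dt * (i + 1) for i in range(time_stamps)]
    return list(time_stamps)


from_count = resolve_time_stamps(3, last_train_time=100, dt=10)
from_list = resolve_time_stamps([500, 510], last_train_time=100, dt=10)
```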