utils
Contains various utility files & functions useful for different models.
utils.time_features | Utils for converting pandas datetime to numerical vectors
utils.rolling_window_dataset | A rolling window dataset
utils.early_stopping | Early Stopping
utils.autosarima_utils | Low-level utils for AutoML models.
utils.time_features
Utils for converting pandas datetime to numerical vectors
- class merlion.models.utils.time_features.TimeFeature
Bases: object
- class merlion.models.utils.time_features.SecondOfMinute
Bases: TimeFeature
Second of minute encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.MinuteOfHour
Bases: TimeFeature
Minute of hour encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.HourOfDay
Bases: TimeFeature
Hour of day encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.DayOfWeek
Bases: TimeFeature
Day of week encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.DayOfMonth
Bases: TimeFeature
Day of month encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.DayOfYear
Bases: TimeFeature
Day of year encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.MonthOfYear
Bases: TimeFeature
Month of year encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.WeekOfYear
Bases: TimeFeature
Week of year encoded as a value in [-0.5, 0.5].
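Each of these classes maps an integer calendar field onto [-0.5, 0.5]. A natural way to achieve this, assumed here rather than taken from the Merlion source, is min-max scaling of the field over its possible range (for example 0 to 23 for the hour), shifted down by one half so the endpoints land on -0.5 and 0.5:

```latex
f(x) = \frac{x - x_{\min}}{x_{\max} - x_{\min}} - 0.5,
\qquad \text{e.g.}\quad
\mathrm{HourOfDay}(h) = \frac{h}{23} - 0.5,\quad h \in \{0, 1, \dots, 23\}
```

Under this assumption hour 0 maps to -0.5, hour 23 maps to 0.5, and noon maps to roughly 0.02.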
- merlion.models.utils.time_features.time_features_from_frequency_str(freq_str)
- Parameters
freq_str (str) – Frequency string of the form [multiple][granularity], such as “12H”, “5min”, “1D”, etc.
- Return type
List[TimeFeature]
- Returns
A list of time features appropriate for the given frequency string.
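As a rough illustration of how a frequency string can be turned into a feature list, the sketch below parses the string with pandas' real `to_offset` and dispatches on the resulting offset type. The helper `feature_names_for_freq` and its lookup table are hypothetical; the table follows the convention used in GluonTS-style code as we understand it, and Merlion's actual selection may differ. Feature names are returned as strings to keep the example self-contained.

```python
from typing import List

from pandas.tseries import offsets
from pandas.tseries.frequencies import to_offset

# Hypothetical mapping from pandas offset type to the time features worth encoding.
# Finer granularities include every coarser feature that still varies.
_FEATURES_BY_OFFSET = [
    (offsets.Second, ["SecondOfMinute", "MinuteOfHour", "HourOfDay", "DayOfWeek", "DayOfMonth", "DayOfYear"]),
    (offsets.Minute, ["MinuteOfHour", "HourOfDay", "DayOfWeek", "DayOfMonth", "DayOfYear"]),
    (offsets.Hour, ["HourOfDay", "DayOfWeek", "DayOfMonth", "DayOfYear"]),
    (offsets.Day, ["DayOfWeek", "DayOfMonth", "DayOfYear"]),
    (offsets.BusinessDay, ["DayOfWeek", "DayOfMonth", "DayOfYear"]),
    (offsets.Week, ["DayOfMonth", "WeekOfYear"]),
    (offsets.MonthEnd, ["MonthOfYear"]),
]


def feature_names_for_freq(freq_str: str) -> List[str]:
    """Return feature names suited to a [multiple][granularity] string such as '5min'."""
    offset = to_offset(freq_str)  # e.g. "5min" -> <5 * Minutes>; the multiple is ignored
    for offset_type, names in _FEATURES_BY_OFFSET:
        if isinstance(offset, offset_type):
            return names
    raise ValueError(f"Unsupported frequency string: {freq_str!r}")
```

For example, `feature_names_for_freq("1D")` selects the three day-level features, since hour and minute are constant at daily granularity.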
- merlion.models.utils.time_features.get_time_features(dates, ts_encoding='h')
Convert pandas datetimes to numerical vectors that can be used for training.
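The sketch below shows one way such an encoding could be assembled into a feature matrix, assuming min-max scaling of each calendar field onto [-0.5, 0.5] and a fixed feature set per `ts_encoding` code. `encode_times`, `_ENCODERS` and `_FEATURES_BY_ENCODING` are hypothetical names, the weekly code "w" is left out for brevity, and Merlion's `get_time_features` may select or scale features differently.

```python
import numpy as np
import pandas as pd

# Hypothetical per-feature encoders: each maps a calendar field linearly onto [-0.5, 0.5].
_ENCODERS = {
    "second": lambda idx: idx.second / 59.0 - 0.5,
    "minute": lambda idx: idx.minute / 59.0 - 0.5,
    "hour": lambda idx: idx.hour / 23.0 - 0.5,
    "dayofweek": lambda idx: idx.dayofweek / 6.0 - 0.5,
    "dayofmonth": lambda idx: (idx.day - 1) / 30.0 - 0.5,
    "dayofyear": lambda idx: (idx.dayofyear - 1) / 365.0 - 0.5,
    "month": lambda idx: (idx.month - 1) / 11.0 - 0.5,
}

# Assumed feature sets for a subset of the ts_encoding codes (s, t, h, d, b, m).
_FEATURES_BY_ENCODING = {
    "s": ["second", "minute", "hour", "dayofweek", "dayofmonth", "dayofyear"],
    "t": ["minute", "hour", "dayofweek", "dayofmonth", "dayofyear"],
    "h": ["hour", "dayofweek", "dayofmonth", "dayofyear"],
    "d": ["dayofweek", "dayofmonth", "dayofyear"],
    "b": ["dayofweek", "dayofmonth", "dayofyear"],
    "m": ["month"],
}


def encode_times(dates, ts_encoding: str = "h") -> np.ndarray:
    """Return an array of shape (len(dates), n_features) with values in [-0.5, 0.5]."""
    if ts_encoding not in _FEATURES_BY_ENCODING:
        raise ValueError(f"Unsupported ts_encoding: {ts_encoding!r}")
    idx = pd.DatetimeIndex(dates)
    columns = [np.asarray(_ENCODERS[name](idx), dtype=float) for name in _FEATURES_BY_ENCODING[ts_encoding]]
    return np.stack(columns, axis=1)
```

With hourly encoding, a day of hourly timestamps yields a 24 x 4 matrix whose first column runs from -0.5 (midnight) to 0.5 (23:00).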
utils.rolling_window_dataset
A rolling window dataset
- class merlion.models.utils.rolling_window_dataset.RollingWindowDataset(data, target_seq_index, n_past, n_future, exog_data=None, shuffle=False, ts_index=False, batch_size=1, flatten=True, ts_encoding=None, valid_fraction=0.0, validation=False, seed=0)
Bases: object
A rolling window dataset which returns (past, future) windows for the whole time series. If ts_index=True is used, a batch size of 1 is employed, and each window returned by the dataset is (past, future), where past and future are both TimeSeries objects. If ts_index=False is used (default option, more efficient), each window returned by the dataset is (past_np, past_time, future_np, future_time):
past_np is a numpy array with shape (batch_size, n_past * dim) if flatten is True, otherwise (batch_size, n_past, dim).
past_time is a numpy array of times with shape (batch_size, n_past).
future_np is a numpy array with shape (batch_size, dim) if target_seq_index is None (autoregressive prediction), or shape (batch_size, n_future) if target_seq_index is specified.
future_time is a numpy array of times with shape (batch_size, n_future).
- Parameters
data (Union[TimeSeries, DataFrame]) – time series data in the format of a TimeSeries or a pandas DataFrame with a DatetimeIndex.
target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to use for the future labeling. If target_seq_index = None, all the sequences are used for the future labeling. In this case, we set n_future = 1 and use the time series for 1-step autoregressive prediction.
n_past (int) – number of past steps in each window.
n_future (int) – number of future steps in each window. If target_seq_index = None, we manually set n_future = 1.
exog_data (Union[TimeSeries, DataFrame, None]) – exogenous data to use as inputs for the model, but not as outputs to predict. We assume the future values of exogenous variables are known a priori at test time.
shuffle (bool) – whether the windows of the time series should be shuffled.
ts_index (bool) – if True, keep the original TimeSeries internally for all the slicing and output TimeSeries objects. By default (False), numpy arrays handle the internal data workflow and are returned as output.
batch_size (Optional[int]) – the number of windows to return in parallel. If None, return the whole dataset.
flatten (bool) – whether the output time series arrays should be flattened to 2 dimensions.
ts_encoding (Optional[str]) – whether the timestamp should be encoded to a float vector, which can be used for training deep learning based time series models. If None, the timestamp is not encoded. If not None, it represents the frequency for the time feature encoding; options: [s: secondly, t: minutely, h: hourly, d: daily, b: business days, w: weekly, m: monthly].
valid_fraction (float) – Fraction of the training data split off as a validation set. If valid_fraction = 0 or valid_fraction = 1, we iterate over the entire dataset.
validation (Optional[bool]) – Whether the data is from the validation set or not. If validation = None, we iterate over the entire dataset.
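The window shapes described above can be reproduced with plain numpy. The sketch below is a hypothetical helper, `rolling_windows`, not Merlion's implementation: it assumes windows advance one step at a time, that a (T, dim) array yields T - n_past - n_future + 1 windows, and that flattening is row-major (time-major) over each past window. It ignores timestamps, batching, shuffling and exogenous data.

```python
from typing import Optional, Tuple

import numpy as np


def rolling_windows(data: np.ndarray, n_past: int, n_future: int,
                    target_seq_index: Optional[int] = None,
                    flatten: bool = True) -> Tuple[np.ndarray, np.ndarray]:
    """Return (past, future) arrays for every window of a (T, dim) array."""
    if target_seq_index is None:
        n_future = 1  # autoregressive: predict all dimensions one step ahead
    T, dim = data.shape
    n_windows = T - n_past - n_future + 1
    if n_windows <= 0:
        raise ValueError("time series is too short for the requested window sizes")
    # past: (n_windows, n_past, dim)
    past = np.stack([data[i:i + n_past] for i in range(n_windows)])
    if target_seq_index is None:
        # future: (n_windows, dim), the single step after each past window
        future = data[n_past:n_past + n_windows]
    else:
        # future: (n_windows, n_future), only the target univariate
        future = np.stack([data[i + n_past:i + n_past + n_future, target_seq_index]
                           for i in range(n_windows)])
    if flatten:
        past = past.reshape(n_windows, n_past * dim)
    return past, future
```

For a 10 x 2 array with n_past=3 and n_future=2, this yields 6 windows with past shape (6, 6) and future shape (6, 2), matching the (batch_size, n_past * dim) and (batch_size, n_future) shapes above with the whole dataset as one batch.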
- property validation
If set False, we only provide access to the training windows; if set True, we only provide access to the validation windows. If set None, we iterate over the entire dataset.
- property seed
Random seed used to perturb the training data
- property n_windows
Total number of sliding windows
- property n_valid
Number of sliding windows in the validation set
- property n_train
Number of sliding windows in the training set
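To make the three-state validation flag concrete, the sketch below selects window indices from valid_fraction, validation, shuffle and seed. Everything here is an assumption for illustration: the helper `select_windows` is hypothetical, the validation set is taken as the last windows in time, and the seed only drives an optional shuffle. Merlion's actual split may be ordered or randomized differently.

```python
from typing import Optional

import numpy as np


def select_windows(n_windows: int, valid_fraction: float, validation: Optional[bool],
                   shuffle: bool = False, seed: int = 0) -> np.ndarray:
    """Return the indices of the windows visible for the given validation setting."""
    all_idx = np.arange(n_windows)
    if validation is None or not 0 < valid_fraction < 1:
        idx = all_idx  # iterate over the entire dataset
    else:
        n_valid = int(round(n_windows * valid_fraction))
        n_train = n_windows - n_valid
        # Assumed chronological split: the last n_valid windows form the validation set.
        idx = all_idx[n_train:] if validation else all_idx[:n_train]
    if shuffle:
        idx = np.random.default_rng(seed).permutation(idx)
    return idx
```

Under these assumptions, `select_windows(10, 0.2, True)` returns the last two window indices, while `validation=None` or a fraction of 0 exposes all ten.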
- property n_points
- collate_batch(batch)
utils.early_stopping
Early Stopping
- class merlion.models.utils.early_stopping.EarlyStopping(patience=7, delta=0)
Bases: object
Early stopping for deep model training
- Parameters
patience – Number of epochs with no improvement after which training will be stopped.
delta – Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than delta counts as no improvement.
- save_best_state_and_dict(val_loss, model)
- load_best_model(model)
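The patience/delta logic described above can be sketched in a few lines. `SimpleEarlyStopping` is a hypothetical, framework-free stand-in rather than Merlion's class: it treats a drop in validation loss of more than delta as an improvement, keeps a deep copy of the best state, and signals a stop after patience consecutive epochs without improvement.

```python
import copy
from typing import Any, Optional


class SimpleEarlyStopping:
    """Hypothetical early-stopping helper tracking the best validation loss."""

    def __init__(self, patience: int = 7, delta: float = 0.0):
        self.patience = patience
        self.delta = delta
        self.best_loss: Optional[float] = None
        self.best_state: Any = None
        self.counter = 0
        self.early_stop = False

    def step(self, val_loss: float, state: Any) -> bool:
        """Record one epoch's validation loss; return True once training should stop."""
        if self.best_loss is None or val_loss < self.best_loss - self.delta:
            # Improvement larger than delta: remember this state and reset the counter.
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(state)
            self.counter = 0
        else:
            # No sufficient improvement: count it and stop once patience is exhausted.
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        return self.early_stop
```

In a PyTorch loop one would typically pass `model.state_dict()` as `state` and later restore it with `model.load_state_dict(es.best_state)`, which appears to be the role played by save_best_state_and_dict and load_best_model above.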
utils.autosarima_utils
Low-level utils for AutoML models.
- merlion.models.utils.autosarima_utils.diff(x, lag=1, differences=1)
Return suitably lagged and iterated differences from the given 1D or 2D array x
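Lagged and iterated differencing follows the semantics of R's diff(x, lag, differences): each pass replaces x[t] with x[t + lag] - x[t], and the pass is repeated `differences` times. The sketch below is a hypothetical numpy version (`lagged_diff`), not the Merlion function itself; it differences along the first axis, so 2D inputs are treated column by column.

```python
import numpy as np


def lagged_diff(x, lag: int = 1, differences: int = 1) -> np.ndarray:
    """R-style diff: apply lag-`lag` differencing `differences` times along axis 0."""
    x = np.asarray(x, dtype=float)
    for _ in range(differences):
        if x.shape[0] <= lag:
            return x[:0]  # not enough observations left; return an empty array
        x = x[lag:] - x[:-lag]
    return x
```

For instance, differencing [1, 2, 4, 7, 11] twice with lag 1 gives a constant series [1, 1, 1], which is why repeated differencing can remove polynomial trends.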
- merlion.models.utils.autosarima_utils.detect_maxiter_sarima_model(y, d, D, m, method, information_criterion, exog=None, **kwargs)
Run a zero model with SARIMA(2, d, 2)(1, D, 1) / ARIMA(2, d, 2) to determine the optimal maxiter.
- merlion.models.utils.autosarima_utils.seas_seasonalstationaritytest(x, m)
Estimate the strength of the seasonal component. The idea is described in https://otexts.com/fpp2/seasonal-strength.html. The R implementation uses mstl instead of stl to deal with multiple seasonality.
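The linked text defines the strength of seasonality from an STL-style decomposition y_t = T_t + S_t + R_t (trend, seasonal, remainder). A strongly seasonal series has a remainder that is small relative to the seasonal-plus-remainder component, pushing the measure towards 1:

```latex
F_S = \max\left(0,\; 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(S_t + R_t)}\right)
```

The R forecast package's seasonal heuristic compares this quantity against a threshold (0.64) to decide whether seasonal differencing is needed; whether this port uses the same cut-off is not stated here.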
- merlion.models.utils.autosarima_utils.nsdiffs(x, m, max_D=1, test='seas')
Estimate the seasonal differencing order D with a statistical test.
Parameters:
x : the time series to difference
m : the number of seasonal periods
max_D : the maximum seasonal differencing order allowed
test : the type of seasonality test used to detect seasonal periodicity
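The search for D can be pictured as a loop that seasonally differences while the series still looks seasonal. The sketch below is hypothetical: `choose_seasonal_d` takes the seasonality measure as a callable, uses an assumed threshold of 0.64, and pairs it with `toy_strength`, a crude stand-in that scores how much variance the per-phase means explain. It is not an STL-based estimator and not Merlion's nsdiffs.

```python
from typing import Callable

import numpy as np


def toy_strength(x: np.ndarray, m: int) -> float:
    """Crude seasonal strength: share of variance explained by the per-phase means."""
    if x.var() == 0:
        return 0.0
    phase_means = np.array([x[i::m].mean() for i in range(m)])
    resid = x - phase_means[np.arange(len(x)) % m]
    return max(0.0, 1.0 - resid.var() / x.var())


def choose_seasonal_d(x, m: int, strength: Callable[[np.ndarray, int], float],
                      max_D: int = 1, threshold: float = 0.64) -> int:
    """Seasonally difference while the seasonal strength exceeds `threshold`, up to max_D."""
    x = np.asarray(x, dtype=float)
    D = 0
    while D < max_D and len(x) > 2 * m and strength(x, m) > threshold:
        x = x[m:] - x[:-m]  # one round of seasonal differencing at lag m
        D += 1
    return D
```

A purely periodic series such as `np.tile([0.0, 10.0, 0.0, -10.0], 6)` with m=4 gets D=1, because one seasonal difference reduces it to zeros.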
- merlion.models.utils.autosarima_utils.KPSS_stationaritytest(xx, alpha=0.05)
The KPSS test is used with the null hypothesis that x has a stationary root against a unit-root alternative. The function then returns the least number of differences required to pass the test at the level alpha.
- merlion.models.utils.autosarima_utils.ndiffs(x, alpha=0.05, max_d=2, test='kpss')
Estimate the differencing order d with a statistical test.
Parameters:
x : the time series to difference
alpha : level of the test; possible values range from 0.01 to 0.1
max_d : the maximum differencing order allowed
test : the type of stationarity test to use (default 'kpss')
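The differencing-order search pairs a stationarity test with a loop: difference once more while the test rejects stationarity, stopping at max_d. The sketch below is a hypothetical helper, `choose_d`, that takes the test as a callable so it runs without extra dependencies; with statsmodels installed, a KPSS-based callable could be written as `lambda x: kpss(x, nlags="auto")[1] < alpha` using `statsmodels.tsa.stattools.kpss`, whose second return value is the p-value.

```python
from typing import Callable

import numpy as np


def choose_d(x, rejects_stationarity: Callable[[np.ndarray], bool], max_d: int = 2) -> int:
    """Difference until the test no longer rejects stationarity, up to max_d times."""
    x = np.asarray(x, dtype=float)
    d = 0
    while d < max_d and len(x) > 2 and rejects_stationarity(x):
        x = np.diff(x)  # one round of ordinary (lag-1) differencing
        d += 1
    return d
```

Because the null hypothesis of KPSS is stationarity, a rejection (small p-value) is what triggers another round of differencing, the reverse of unit-root tests such as ADF.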