utils
Contains various utility files & functions useful for different models.
| utils.time_features | Utils for converting pandas datetime to numerical vectors |
| utils.rolling_window_dataset | A rolling window dataset |
| utils.early_stopping | Early Stopping |
| utils.autosarima_utils | Low-level utils for AutoML models. |
utils.time_features
Utils for converting pandas datetime to numerical vectors
- class merlion.models.utils.time_features.TimeFeature
- Bases: object
- class merlion.models.utils.time_features.SecondOfMinute
- Bases: TimeFeature
- Second of minute, encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.MinuteOfHour
- Bases: TimeFeature
- Minute of hour, encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.HourOfDay
- Bases: TimeFeature
- Hour of day, encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.DayOfWeek
- Bases: TimeFeature
- Day of week, encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.DayOfMonth
- Bases: TimeFeature
- Day of month, encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.DayOfYear
- Bases: TimeFeature
- Day of year, encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.MonthOfYear
- Bases: TimeFeature
- Month of year, encoded as a value in [-0.5, 0.5].
- class merlion.models.utils.time_features.WeekOfYear
- Bases: TimeFeature
- Week of year, encoded as a value in [-0.5, 0.5].
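As a rough illustration of the [-0.5, 0.5] encoding described for the classes above, the sketch below scales a calendar field by its largest value and then subtracts 0.5. The helper names (minute_of_hour, hour_of_day, day_of_week) and the divisors are assumptions made for illustration, not the library's confirmed implementation, and the sketch works on standard-library datetime objects rather than a pandas index.

```python
from datetime import datetime


def minute_of_hour(t: datetime) -> float:
    # Minutes run 0..59: divide by 59 to span [0, 1], then centre on zero.
    return t.minute / 59.0 - 0.5


def hour_of_day(t: datetime) -> float:
    # Hours run 0..23.
    return t.hour / 23.0 - 0.5


def day_of_week(t: datetime) -> float:
    # weekday() is 0 (Monday) .. 6 (Sunday).
    return t.weekday() / 6.0 - 0.5


t = datetime(2021, 1, 3, 23, 0)  # a Sunday at 23:00
features = [minute_of_hour(t), hour_of_day(t), day_of_week(t)]
print(features)
```

Each feature reaches -0.5 at the start of its cycle and 0.5 at the end, so all features share a common, zero-centred scale.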
- merlion.models.utils.time_features.time_features_from_frequency_str(freq_str)
- Parameters
- freq_str (str) – Frequency string of the form [multiple][granularity], such as “12H”, “5min”, “1D”, etc.
- Return type
- List[TimeFeature]
- Returns
- A list of time features appropriate for the given frequency string.
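To make the [multiple][granularity] format concrete, here is a minimal sketch that splits a frequency string into its two parts. The helper split_frequency_str is hypothetical and is not how the library parses the string; it only illustrates the shape of the input.

```python
import re
from typing import Tuple


def split_frequency_str(freq_str: str) -> Tuple[int, str]:
    """Split e.g. "12H" into (12, "H"); a missing multiple defaults to 1."""
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", freq_str.strip())
    if match is None:
        raise ValueError(f"Unrecognised frequency string: {freq_str!r}")
    multiple, granularity = match.groups()
    return (int(multiple) if multiple else 1, granularity)


parsed = [split_frequency_str(s) for s in ("12H", "5min", "1D")]
print(parsed)
```

The granularity part is what would determine which time features are relevant; for example, hourly data has no use for a second-of-minute feature.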
 
- merlion.models.utils.time_features.get_time_features(dates, ts_encoding='h')
- Convert pandas Datetime to numerical vectors that can be used for training 
utils.rolling_window_dataset
A rolling window dataset
- class merlion.models.utils.rolling_window_dataset.RollingWindowDataset(data, target_seq_index, n_past, n_future, exog_data=None, shuffle=False, ts_index=False, batch_size=1, flatten=True, ts_encoding=None, valid_fraction=0.0, validation=False, seed=0)
- Bases: object
- A rolling window dataset which returns (past, future) windows for the whole time series. If ts_index=True is used, a batch size of 1 is employed, and each window returned by the dataset is (past, future), where past and future are both TimeSeries objects. If ts_index=False is used (default option, more efficient), each window returned by the dataset is (past_np, past_time, future_np, future_time):
- past_np is a numpy array with shape (batch_size, n_past * dim) if flatten is True, otherwise (batch_size, n_past, dim).
- past_time is a numpy array of times with shape (batch_size, n_past).
- future_np is a numpy array with shape (batch_size, dim) if target_seq_index is None (autoregressive prediction), or shape (batch_size, n_future) if target_seq_index is specified.
- future_time is a numpy array of times with shape (batch_size, n_future).
 - Parameters
- data (Union[TimeSeries, DataFrame]) – time series data, given as a TimeSeries or a pandas DataFrame with a DatetimeIndex.
- target_seq_index (Optional[int]) – The index of the univariate (amongst all univariates in a general multivariate time series) whose value we would like to use for the future labeling. If target_seq_index = None, all the sequences are required for the future labeling. In this case, we set n_future = 1 and use the time series for 1-step autoregressive prediction.
- n_past (int) – number of past steps in each window.
- n_future (int) – number of future steps in each window. If target_seq_index = None, we manually set n_future = 1.
- exog_data (Union[TimeSeries, DataFrame, None]) – exogenous data to use as inputs for the model, but not as outputs to predict. We assume the future values of exogenous variables are known a priori at test time.
- shuffle (bool) – whether the windows of the time series should be shuffled.
- ts_index (bool) – whether to keep the original TimeSeries internally for all the slicing and to output TimeSeries objects. By default, numpy arrays handle the internal data workflow and numpy arrays are the output.
- batch_size (Optional[int]) – the number of windows to return in parallel. If None, return the whole dataset.
- flatten (bool) – whether the output time series arrays should be flattened to 2 dimensions.
- ts_encoding (Optional[str]) – whether the timestamp should be encoded as a float vector, which can be used for training deep-learning-based time series models. If None, the timestamp is not encoded. Otherwise, it specifies the frequency used for the time feature encoding; options are [s: secondly, t: minutely, h: hourly, d: daily, b: business days, w: weekly, m: monthly].
- valid_fraction (float) – Fraction of the training data to split off as a validation set. If valid_fraction = 0 or valid_fraction = 1, we iterate over the entire dataset.
- validation (Optional[bool]) – Whether the data is from the validation set or not. If validation = None, we iterate over the entire dataset.
 
 - property validation
- If set to False, we only provide access to the training windows; if set to True, we only provide access to the validation windows. If set to None, we iterate over the entire dataset.
 - property seed
- Sets the random seed used to perturb the training data.
 - property n_windows
- Total number of sliding windows.
 - property n_valid
- Number of sliding windows in the validation set.
 - property n_train
- Number of sliding windows in the training set.
 - property n_points
 - collate_batch(batch)
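The (past, future) windowing described for this class can be sketched in plain Python as follows. This is an illustrative approximation, not the library's implementation: the function rolling_windows is hypothetical, it assumes a stride of 1 with no batching, shuffling, or timestamps, and the window count len(data) - n_past - n_future + 1 is an assumption that follows from that stride.

```python
from typing import List, Sequence, Tuple

Row = Sequence[float]


def rolling_windows(
    data: Sequence[Row],
    n_past: int,
    n_future: int,
    target_seq_index: int,
    flatten: bool = True,
) -> List[Tuple[list, list]]:
    """Return (past, future) pairs for every stride-1 window over `data`."""
    n_windows = len(data) - n_past - n_future + 1
    windows = []
    for start in range(max(n_windows, 0)):
        past_rows = data[start : start + n_past]
        future_rows = data[start + n_past : start + n_past + n_future]
        if flatten:
            # Shape (n_past * dim,): concatenate the rows of the past window.
            past = [v for row in past_rows for v in row]
        else:
            # Shape (n_past, dim): keep one list per time step.
            past = [list(row) for row in past_rows]
        # Shape (n_future,): only the target variable is used as the label.
        future = [row[target_seq_index] for row in future_rows]
        windows.append((past, future))
    return windows


# Six time steps of a 2-variable series: (i, 10 * i).
data = [[float(i), float(10 * i)] for i in range(6)]
windows = rolling_windows(data, n_past=3, n_future=2, target_seq_index=1)
print(len(windows), windows[0])
```

With six points, n_past=3 and n_future=2, only two complete windows fit, which shows how quickly the usable window count shrinks as the window grows.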
 
utils.early_stopping
Early Stopping
- class merlion.models.utils.early_stopping.EarlyStopping(patience=7, delta=0)
- Bases: object
- Early stopping for deep model training.
- Parameters
- patience – Number of epochs with no improvement after which training will be stopped.
- delta – Minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than delta will count as no improvement.
 
 - save_best_state_and_dict(val_loss, model)
 - load_best_model(model)
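The roles of patience and delta can be illustrated with a small stand-alone sketch. The class SimpleEarlyStopping and its step method are hypothetical names chosen for this example; the sketch omits model checkpointing and does not claim to mirror the internals of EarlyStopping.

```python
class SimpleEarlyStopping:
    """Track validation loss and signal a stop after `patience` stale epochs."""

    def __init__(self, patience: int = 7, delta: float = 0.0):
        self.patience = patience
        self.delta = delta
        self.best_loss = float("inf")
        self.counter = 0
        self.early_stop = False

    def step(self, val_loss: float) -> bool:
        if val_loss < self.best_loss - self.delta:
            # Improvement of more than `delta`: record it and reset the counter.
            self.best_loss = val_loss
            self.counter = 0
        else:
            # No sufficient improvement this epoch.
            self.counter += 1
            if self.counter >= self.patience:
                self.early_stop = True
        return self.early_stop


stopper = SimpleEarlyStopping(patience=2, delta=0.01)
losses = [1.0, 0.8, 0.799, 0.798, 0.5]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)
```

In this run the losses 0.799 and 0.798 are lower than 0.8 but improve on it by less than delta, so they count as stale epochs and training stops before the final value of 0.5 is ever seen.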
 
utils.autosarima_utils
Low-level utils for AutoML models.
- merlion.models.utils.autosarima_utils.diff(x, lag=1, differences=1)
- Return suitably lagged and iterated differences from the given 1D or 2D array x 
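For a 1D input, lagged and iterated differencing can be sketched as below. The function lagged_diff is a hypothetical stand-in written for illustration; it does not cover the 2D case and makes no claim about how diff is implemented.

```python
from typing import List, Sequence


def lagged_diff(x: Sequence[float], lag: int = 1, differences: int = 1) -> List[float]:
    """Apply x[i] - x[i - lag] repeatedly, `differences` times."""
    out = list(x)
    for _ in range(differences):
        # Each pass shortens the series by `lag` elements.
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out


series = [1.0, 4.0, 9.0, 16.0, 25.0]
first = lagged_diff(series)
second = lagged_diff(series, differences=2)
seasonal = lagged_diff(series, lag=2)
print(first, second, seasonal)
```

Here the second difference of the squares is constant, which is the behaviour differencing relies on to remove polynomial trends.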
- merlion.models.utils.autosarima_utils.detect_maxiter_sarima_model(y, d, D, m, method, information_criterion, exog=None, **kwargs)
- Run a zero model with SARIMA(2; d; 2)(1; D; 1) / ARIMA(2; d; 2) to determine the optimal maxiter.
- merlion.models.utils.autosarima_utils.seas_seasonalstationaritytest(x, m)
- Estimate the strength of the seasonal component. The idea can be found in https://otexts.com/fpp2/seasonal-strength.html. The R implementation uses mstl instead of stl to deal with multiple seasonality.
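For reference, the seasonal strength measure described at the linked page is defined for a decomposition of the series into trend, seasonal, and remainder components. The symbols S_t (seasonal component) and R_t (remainder component) are introduced here for illustration; whether this function computes exactly this quantity is an assumption based on the link.

```latex
F_S = \max\left(0,\; 1 - \frac{\operatorname{Var}(R_t)}{\operatorname{Var}(S_t + R_t)}\right)
```

A value of F_S close to 0 indicates little or no seasonality, while a value close to 1 indicates strong seasonality.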
- merlion.models.utils.autosarima_utils.nsdiffs(x, m, max_D=1, test='seas')
- Estimate the seasonal differencing order D with a statistical test.
- Parameters
- x – the time series to difference
- m – the number of seasonal periods
- max_D – the maximum seasonal differencing order allowed
- test – the type of seasonality test to use to detect seasonal periodicity
- merlion.models.utils.autosarima_utils.KPSS_stationaritytest(xx, alpha=0.05)
- The KPSS test is used with the null hypothesis that x has a stationary root against a unit-root alternative. The test returns the least number of differences required to pass the test at the level alpha.
- merlion.models.utils.autosarima_utils.ndiffs(x, alpha=0.05, max_d=2, test='kpss')
- Estimate the differencing order d with a statistical test.
- Parameters
- x – the time series to difference
- alpha – level of the test; possible values range from 0.01 to 0.1
- max_d – the maximum differencing order allowed
- test – the type of stationarity test to use (default 'kpss')