ts_datasets: Easy Data Loading
ts_datasets implements Python classes that load numerous time series datasets
into standardized pandas.DataFrames. The sub-modules are ts_datasets.anomaly
for time series anomaly detection and ts_datasets.forecast for time series forecasting.
Install the package by running pip install -e ts_datasets/ from the root directory of Merlion.
You can then load a dataset (e.g. the “realAWSCloudwatch” split of the Numenta Anomaly Benchmark
or the “Hourly” subset of the M4 dataset) by calling
from ts_datasets.anomaly import NAB
from ts_datasets.forecast import M4
anom_dataset = NAB(subset="realAWSCloudwatch", rootdir=path_to_NAB)
forecast_dataset = M4(subset="Hourly", rootdir=path_to_M4)
If you install this package in editable mode (i.e. pass -e to pip install, as in the command above),
there is no need to specify a rootdir for any of the data loaders.
The core features of general data loaders (e.g. for forecasting) are outlined in the API doc for
ts_datasets.base.BaseDataset, and the features for time series anomaly detection data loaders
are outlined in the API doc for ts_datasets.anomaly.TSADBaseDataset.
The easiest way to load a custom dataset is to use either the ts_datasets.forecast.CustomDataset or
ts_datasets.anomaly.CustomAnomalyDataset classes. Please review the tutorial
to get started.
| ts_datasets.anomaly | Datasets for time series anomaly detection (TSAD). |
| ts_datasets.forecast | Datasets for time series forecasting. |
Subpackages
ts_datasets.base module
- class ts_datasets.base.BaseDataset
- Bases: object
- Base dataset class for storing time series as pd.DataFrames. Each dataset supports the following features:
  - __getitem__: you may call ts, metadata = dataset[i]. ts is a time-indexed pandas DataFrame, with each column representing a different variable (in the case of multivariate time series). metadata is a dict or pd.DataFrame with the same index as ts, with different keys indicating different dataset-specific metadata (train/test split, anomaly labels, etc.) for each timestamp.
  - __len__: Calling len(dataset) will return the number of time series in the dataset.
  - __iter__: You may iterate over the pandas representations of the time series in the dataset with for ts, metadata in dataset: ...
- Note: For each time series, the metadata will always have the key trainval, which is a pd.Series of bool indicating whether each timestamp of the time series should be training/validation (if True) or testing (if False).
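The interface described above can be illustrated with a small stand-in class. This is only a sketch: ToyDataset is a hypothetical class written for this example, not part of ts_datasets, and its synthetic data and 4/2 train/test split are assumptions made for illustration. It shows how __getitem__, __len__, __iter__ and the trainval metadata key fit together.

```python
import pandas as pd


class ToyDataset:
    """Hypothetical stand-in that mimics the BaseDataset interface (not part of ts_datasets)."""

    def __init__(self):
        self.time_series = []
        self.metadata = []
        for start in ("2021-01-01", "2021-02-01"):
            index = pd.date_range(start, periods=6, freq="D")
            # Two columns, i.e. a multivariate time series with variables x and y.
            ts = pd.DataFrame({"x": range(6), "y": range(6, 12)}, index=index)
            # Assumed split: the first 4 timestamps are train/validation, the last 2 are test.
            md = pd.DataFrame({"trainval": [True] * 4 + [False] * 2}, index=index)
            self.time_series.append(ts)
            self.metadata.append(md)

    def __len__(self):
        # Number of time series in the dataset.
        return len(self.time_series)

    def __getitem__(self, i):
        # Returns the (ts, metadata) pair for the i-th time series.
        return self.time_series[i], self.metadata[i]

    def __iter__(self):
        return (self[i] for i in range(len(self)))


dataset = ToyDataset()
ts, metadata = dataset[0]

# Use the boolean trainval column to separate training/validation data from test data.
train = ts[metadata["trainval"]]
test = ts[~metadata["trainval"]]

for ts_i, metadata_i in dataset:
    print(ts_i.shape, int(metadata_i["trainval"].sum()))
```

Because metadata shares its index with ts, the trainval column can be used directly as a boolean mask, which is the main convenience of this layout.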
time_series: list
- A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration, or whenever __getitem__ is invoked.
metadata: list
- A list containing the metadata for all individual time series in the dataset.
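The lazy-loading behavior mentioned for time_series can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: LazyToyDataset, its train_frac parameter, and the CSV layout (timestamps in the first column) are all hypothetical. The point is only that time_series holds filenames and that a file is read when __getitem__ is invoked.

```python
import os
import tempfile

import pandas as pd


class LazyToyDataset:
    """Hypothetical dataset whose time_series entries are CSV filenames (not part of ts_datasets)."""

    def __init__(self, filenames, train_frac=0.5):
        # Only the filenames are stored; no file is read at construction time.
        self.time_series = list(filenames)
        self.train_frac = train_frac  # assumed split fraction, for illustration only

    def __len__(self):
        return len(self.time_series)

    def __getitem__(self, i):
        # The file is read here, on demand.
        ts = pd.read_csv(self.time_series[i], index_col=0, parse_dates=True)
        n_train = int(len(ts) * self.train_frac)
        flags = [j < n_train for j in range(len(ts))]
        metadata = pd.DataFrame({"trainval": flags}, index=ts.index)
        return ts, metadata

    def __iter__(self):
        # Iteration also reads one file at a time.
        return (self[i] for i in range(len(self)))


# Write two small CSV files to a temporary directory to stand in for a large dataset.
tmpdir = tempfile.mkdtemp()
paths = []
for k in range(2):
    index = pd.date_range("2021-01-01", periods=4, freq="D")
    frame = pd.DataFrame({"value": [float(k)] * 4}, index=index)
    path = os.path.join(tmpdir, f"series_{k}.csv")
    frame.to_csv(path)
    paths.append(path)

lazy_dataset = LazyToyDataset(paths)
lazy_ts, lazy_metadata = lazy_dataset[1]
```

With this layout, memory use stays bounded by the largest single file rather than by the whole dataset, at the cost of re-reading a file each time it is accessed.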
 - describe()