ts_datasets.forecast package

Datasets for time series forecasting. These are plain time series with no labels of any kind.

ts_datasets.forecast.get_dataset(dataset_name, rootdir=None, **kwargs)

Parameters

dataset_name (str) – the name of the dataset to load, formatted as <name> or <name>_<subset>, e.g. EnergyPower or M4_Hourly
rootdir (Optional[str]) – the directory where the desired dataset is stored. Not required if the package ts_datasets is installed in editable mode, i.e. with flag -e.
kwargs – keyword arguments passed to the requested data loader.

Return type

BaseDataset

Returns

the data loader for the requested dataset (and subset)
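The dataset_name convention above (<name> or <name>_<subset>) can be illustrated with a small sketch. The helper split_dataset_name is hypothetical and written only for this example; it is not part of ts_datasets and may differ from how get_dataset actually parses its argument.

```python
from typing import Optional, Tuple


def split_dataset_name(dataset_name: str) -> Tuple[str, Optional[str]]:
    """Split "<name>" or "<name>_<subset>" into (name, subset).

    Hypothetical helper for illustration only; not part of ts_datasets.
    """
    name, sep, subset = dataset_name.partition("_")
    # str.partition returns an empty separator when "_" is absent,
    # in which case there is no subset.
    return name, (subset if sep else None)


print(split_dataset_name("EnergyPower"))  # ('EnergyPower', None)
print(split_dataset_name("M4_Hourly"))    # ('M4', 'Hourly')
```

Under this reading, "M4_Hourly" selects the M4 loader with subset "Hourly", while "EnergyPower" names a loader with no subset.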

class ts_datasets.forecast.CustomDataset(rootdir, test_frac=0.5, time_col=None, time_unit='s', data_cols=None, index_cols=None)

Bases: BaseDataset

Wrapper to load a custom dataset. Please review the tutorial to get started.

Parameters

rootdir – Filename of a single CSV, or a directory containing many CSVs. Each CSV must contain 1 or more time series.
test_frac – If we don’t find a column “trainval” in the time series, this is the fraction of each time series which we use for testing.
time_col – Name of the column used to index time. We use the first non-index, non-metadata column if none is given.
time_unit – If the time column is numerical, we assume it is a timestamp expressed in this unit.
data_cols – Names of the columns to fetch from the dataset. If None, use all non-time, non-index columns.
index_cols – If a CSV file contains multiple time series, these are the columns used to index those time series. For example, a CSV file may contain time series of sales for many (store, department) pairs. In this case, index_cols may be ["Store", "Dept"]. The values of the index columns will be added to the metadata of the data loader.
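To make the roles of index_cols and test_frac concrete, here is a minimal sketch using only the standard library. It does not call CustomDataset; the sample data, the column names Date and Weekly_Sales, and the function split_by_index are made up for illustration. It also assumes that the test portion is taken from the end of each series, and it ignores the “trainval” column case.

```python
import csv
import io
from collections import defaultdict

# Made-up sample data: two (Store, Dept) pairs with four observations each.
CSV_TEXT = """Store,Dept,Date,Weekly_Sales
1,1,2010-02-05,100.0
1,1,2010-02-12,110.0
1,1,2010-02-19,120.0
1,1,2010-02-26,130.0
1,2,2010-02-05,50.0
1,2,2010-02-12,55.0
1,2,2010-02-19,60.0
1,2,2010-02-26,65.0
"""


def split_by_index(csv_text, index_cols, time_col, test_frac=0.5):
    """Group rows by index_cols, then split each group into train/test.

    Hypothetical helper; assumes the last test_frac of each series is the test set.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[c] for c in index_cols)
        groups[key].append(row)

    result = {}
    for key, rows in groups.items():
        rows.sort(key=lambda r: r[time_col])  # ISO dates sort correctly as strings
        n_test = int(len(rows) * test_frac)
        n_train = len(rows) - n_test
        result[key] = {"train": rows[:n_train], "test": rows[n_train:]}
    return result


splits = split_by_index(CSV_TEXT, ["Store", "Dept"], "Date", test_frac=0.5)
for key, parts in splits.items():
    print(key, len(parts["train"]), len(parts["test"]))
```

Each distinct (Store, Dept) pair becomes its own time series, and the pair's values are the kind of information that would end up in that series' metadata.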

property metadata_cols

check_ts_for_metadata(ts, col)

time_series: list: A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration, or whenever __getitem__ is invoked.

metadata: list: A list containing the metadata for all individual time series in the dataset.

class ts_datasets.forecast.M4(subset='Hourly', rootdir=None)

Bases: BaseDataset

The M4 Competition data is an extended and diverse set of time series, intended for identifying the most accurate forecasting method(s) across different domains, including business, financial, and economic forecasting, and across different granularities: Yearly (23,000 sequences), Quarterly (24,000 sequences), Monthly (48,000 sequences), Weekly (359 sequences), Daily (4,227 sequences), and Hourly (414 sequences).

source: https://github.com/Mcompetitions/M4-methods/tree/master/Dataset
time series sequences: 100,000

valid_subsets = ['Yearly', 'Quarterly', 'Monthly', 'Weekly', 'Daily', 'Hourly']

url = 'https://github.com/Mcompetitions/M4-methods/raw/master/Dataset/{}.csv'
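As a quick check on the figures above, the per-subset sequence counts can be tabulated and summed. The helper check_subset is hypothetical and only mimics the kind of validation that valid_subsets suggests; it is not the library's own code.

```python
# Sequence counts per subset, as quoted in the class description.
M4_SEQUENCE_COUNTS = {
    "Yearly": 23_000,
    "Quarterly": 24_000,
    "Monthly": 48_000,
    "Weekly": 359,
    "Daily": 4_227,
    "Hourly": 414,
}
VALID_SUBSETS = list(M4_SEQUENCE_COUNTS)


def check_subset(subset: str) -> str:
    """Hypothetical validation helper; not part of ts_datasets."""
    if subset not in VALID_SUBSETS:
        raise ValueError(f"subset must be one of {VALID_SUBSETS}, got {subset!r}")
    return subset


print(check_subset("Hourly"), M4_SEQUENCE_COUNTS["Hourly"])  # Hourly 414
print(sum(M4_SEQUENCE_COUNTS.values()))  # 100000
```

The six subset counts add up to exactly the 100,000 sequences stated for the full dataset.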

time_series: list: A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration, or whenever __getitem__ is invoked.

metadata: list: A list containing the metadata for all individual time series in the dataset.

class ts_datasets.forecast.EnergyPower(rootdir=None)

Bases: BaseDataset

Wrapper to load the open source energy grid power usage dataset.

source: https://www.kaggle.com/robikscube/hourly-energy-consumption
contains one 10-variable time series

Parameters: rootdir – The root directory at which the dataset can be found.

time_series: list: A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration, or whenever __getitem__ is invoked.

metadata: list: A list containing the metadata for all individual time series in the dataset.

class ts_datasets.forecast.SeattleTrail(rootdir=None)

Bases: BaseDataset

Wrapper to load the open source Seattle Trail pedestrian/bike traffic dataset.

source: https://www.kaggle.com/city-of-seattle/seattle-burke-gilman-trail
contains one 5-variable time series

Parameters: rootdir – The root directory at which the dataset can be found.

time_series: list: A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration, or whenever __getitem__ is invoked.

metadata: list: A list containing the metadata for all individual time series in the dataset.

class ts_datasets.forecast.SolarPlant(rootdir=None, num_columns=100)

Bases: BaseDataset

Wrapper to load the open source solar plant power dataset.

source: https://www.nrel.gov/grid/solar-power-data.html
contains one 405-variable time series

Note

By default (num_columns=100), the loader includes only the first 100 (of 405) variables.

Parameters

rootdir – The root directory at which the dataset can be found.
num_columns – The number of univariate columns to return.

time_series: list: A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration, or whenever __getitem__ is invoked.

metadata: list: A list containing the metadata for all individual time series in the dataset.