ts_datasets.forecast package
Datasets for time series forecasting. These are plain time series without labels of any kind.
- ts_datasets.forecast.get_dataset(dataset_name, rootdir=None)
- Parameters
dataset_name (str) – the name of the dataset to load, formatted as <name> or <name>_<subset>, e.g. EnergyPower or M4_Hourly
rootdir (Optional[str]) – the directory where the desired dataset is stored. Not required if the package ts_datasets is installed in editable mode, i.e. with flag -e.
- Return type
- Returns
the data loader for the desired dataset (and subset)
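The naming convention for dataset_name can be illustrated with a small sketch. The helper split_dataset_name below is hypothetical and is not part of ts_datasets; it only assumes that the first underscore separates the dataset name from the optional subset, as in the M4_Hourly example above.

```python
def split_dataset_name(dataset_name):
    """Hypothetical helper (not part of ts_datasets).

    Splits "<name>" or "<name>_<subset>" at the first underscore and
    returns (name, subset), where subset is None if no subset was given.
    """
    name, sep, subset = dataset_name.partition("_")
    return name, (subset if sep else None)


with_subset = split_dataset_name("M4_Hourly")
without_subset = split_dataset_name("EnergyPower")
```

With the package installed, the documented call itself would look like get_dataset("M4_Hourly") or get_dataset("EnergyPower", rootdir=...), where the rootdir value depends on where the data is stored locally.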
- class ts_datasets.forecast.M4(subset='Hourly', rootdir=None)
Bases: BaseDataset
The M4 Competition data is an extended and diverse set of time series, assembled to identify the most accurate forecasting method(s) across different domains, including business, financial and economic forecasting, and across different granularities: Yearly (23,000 sequences), Quarterly (24,000 sequences), Monthly (48,000 sequences), Weekly (359 sequences), Daily (4,227 sequences) and Hourly (414 sequences) data.
source: https://github.com/Mcompetitions/M4-methods/tree/master/Dataset
time series sequences: 100,000
- valid_subsets = ['Yearly', 'Quarterly', 'Monthly', 'Weekly', 'Daily', 'Hourly']
- url = 'https://github.com/Mcompetitions/M4-methods/raw/master/Dataset/{}.csv'
- time_series: list
A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration or whenever __getitem__ is invoked.
- metadata: list
A list containing the metadata for all individual time series in the dataset.
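As a quick check of the figures in the M4 description, the per-subset sequence counts can be summed and compared against the stated total of 100,000. The sketch below copies valid_subsets and the counts from the text above. The check_subset function is a hypothetical illustration of validating a subset argument; whether the real loader raises an error for an unknown subset, and which error, is an assumption.

```python
# Copied from the valid_subsets attribute documented above.
VALID_SUBSETS = ["Yearly", "Quarterly", "Monthly", "Weekly", "Daily", "Hourly"]

# Sequence counts as stated in the class description.
SEQUENCE_COUNTS = {
    "Yearly": 23_000,
    "Quarterly": 24_000,
    "Monthly": 48_000,
    "Weekly": 359,
    "Daily": 4_227,
    "Hourly": 414,
}


def check_subset(subset):
    """Hypothetical validation; the real loader's behavior is an assumption."""
    if subset not in VALID_SUBSETS:
        raise ValueError(f"subset must be one of {VALID_SUBSETS}, got {subset!r}")
    return subset


total_sequences = sum(SEQUENCE_COUNTS.values())
```

The counts add up to the 100,000 sequences quoted in the description, and the subset names are matched exactly as written in valid_subsets (capitalized).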
- class ts_datasets.forecast.EnergyPower(rootdir=None)
Bases: BaseDataset
Wrapper to load the open source energy grid power usage dataset.
source: https://www.kaggle.com/robikscube/hourly-energy-consumption
contains one 10-variable time series
- Parameters
rootdir – The root directory at which the dataset can be found.
- time_series: list
A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration or whenever __getitem__ is invoked.
- metadata: list
A list containing the metadata for all individual time series in the dataset.
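The lazy-reading behavior described for time_series can be sketched with a minimal stand-in class. LazyCsvDataset below is hypothetical and is not the BaseDataset class from ts_datasets. The CSV layout with a single "value" column, the metadata contents, and the return type of __getitem__ are all assumptions made for illustration.

```python
import csv
import os
import tempfile


class LazyCsvDataset:
    """Hypothetical stand-in, not the ts_datasets BaseDataset class."""

    def __init__(self, filenames):
        # Only the paths are stored here; no file is opened yet.
        self.time_series = list(filenames)
        # One metadata entry per time series (contents are an assumption).
        self.metadata = [{"source": os.path.basename(f)} for f in self.time_series]

    def __len__(self):
        return len(self.time_series)

    def __getitem__(self, i):
        # The file is read at access time, not at construction time.
        with open(self.time_series[i], newline="") as f:
            return [float(row["value"]) for row in csv.DictReader(f)]

    def __iter__(self):
        # Iteration walks the list and reads each file on demand.
        return (self[i] for i in range(len(self)))


# Write two small sample files so the sketch runs on its own.
tmpdir = tempfile.mkdtemp()
paths = []
for name, values in [("a.csv", [1.0, 2.0]), ("b.csv", [3.0, 4.0, 5.0])]:
    path = os.path.join(tmpdir, name)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["value"])
        writer.writerows([[v] for v in values])
    paths.append(path)

dataset = LazyCsvDataset(paths)
lengths = [len(series) for series in dataset]
```

Storing filenames rather than loaded data keeps memory use low for large collections, at the cost of re-reading a file on every access.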
- class ts_datasets.forecast.SeattleTrail(rootdir=None)
Bases: BaseDataset
Wrapper to load the open source Seattle Trail pedestrian/bike traffic dataset.
source: https://www.kaggle.com/city-of-seattle/seattle-burke-gilman-trail
contains one 5-variable time series
- Parameters
rootdir – The root directory at which the dataset can be found.
- time_series: list
A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration or whenever __getitem__ is invoked.
- metadata: list
A list containing the metadata for all individual time series in the dataset.
- class ts_datasets.forecast.SolarPlant(rootdir=None, num_columns=100)
Bases: BaseDataset
Wrapper to load the open source solar plant power dataset.
contains one 405-variable time series
Note
The loader currently only includes the first 100 (of 405) variables.
- Parameters
rootdir – The root directory at which the dataset can be found.
num_columns – indicates how many univariate columns should be returned
- time_series: list
A list of all individual time series contained in the dataset. Iterating over the dataset will iterate over this list. Note that for some large datasets, time_series may be a list of filenames, which are read lazily either during iteration or whenever __getitem__ is invoked.
- metadata: list
A list containing the metadata for all individual time series in the dataset.
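The effect of the num_columns parameter of SolarPlant can be pictured as keeping only the leading columns of a wide table. The sketch below is a plain-Python illustration on synthetic data. The first_columns helper is hypothetical, and the assumption that the loader selects the first num_columns variables is based only on the note above.

```python
def first_columns(rows, num_columns=100):
    """Hypothetical helper: keep the first num_columns values of each row."""
    return [row[:num_columns] for row in rows]


# Synthetic stand-in for a 405-variable time series with 3 time steps.
rows = [[float(r * 405 + c) for c in range(405)] for r in range(3)]

default_view = first_columns(rows)
narrow_view = first_columns(rows, num_columns=2)
```

With the default of 100, each time step keeps 100 of the 405 variables, matching the note in the class description.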