A library to showcase time series analysis and forecasting
This repository showcases a sample time series analysis and forecasting project. It uses the Rossmann retail store dataset from the Kaggle competition below: https://www.kaggle.com/competitions/rossmann-store-sales/data.
On macOS: brew install pipx -> pipx ensurepath -> pipx install poetry -> poetry install
On Linux (Ubuntu 23.04 or above): sudo apt update -> sudo apt install pipx -> pipx ensurepath -> pipx install poetry -> poetry install
On Linux (Ubuntu 22.04 or below): python3 -m pip install --user pipx -> python3 -m pipx ensurepath -> pipx install poetry -> poetry install
There are three main parts of the code base:
A simple python3 run.py command triggers the entire pipeline, generating forecasts for all stores.
After the pipeline has run, a single prediction can be obtained with the command below, or results can be viewed under the models/results folder.
Configuration for run.py lives in the conf/config.yaml file; paths for data, models and reports are set here. Since there are currently 856 stores, I've also added a max store count parameter to limit runtime. A sample config file can be seen below.
In this part we load the train.csv and test.csv datasets and apply the basic transformations required by the Prophet module.
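Prophet expects a dataframe with a 'ds' date column and a 'y' target column, so the basic transformation is mostly renaming and filtering. The sketch below is only illustrative of that step, using the Kaggle column names (Date, Sales, Store, Open); the repository's actual preprocessing may differ.

```python
import pandas as pd

# Illustrative sketch only: load the Kaggle files and reshape one store's
# history into the 'ds'/'y' frame that Prophet expects.
train = pd.read_csv("data/train.csv", parse_dates=["Date"], low_memory=False)
test = pd.read_csv("data/test.csv", parse_dates=["Date"])

def to_prophet_frame(df: pd.DataFrame, store_id: int) -> pd.DataFrame:
    # Keep one store's open days and rename to Prophet's expected columns.
    store = df[(df["Store"] == store_id) & (df["Open"] == 1)]
    return (store.rename(columns={"Date": "ds", "Sales": "y"})
                 [["ds", "y"]]
                 .sort_values("ds")
                 .reset_index(drop=True))

store_1 = to_prophet_frame(train, store_id=1)
```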
The dataset consists of: Training data from January 2013 to July 2015 (1,115 stores)
Test data from August 2015 to September 2015 (856 stores)
Annual sales for the first 25 stores are shown below:
Here, on top of seasonality and trend, we want features indicating whether a store was running a promotion on a given date and whether it was affected by a school holiday on that day. For this we use a ColumnTransformer, a OneHotEncoder and a Pipeline to handle the transformations. Because the expected values are categorical and known in advance (binary), the transformation can be done once up front rather than inside the modelling pipeline, so only fit_transform is used within the pipeline.
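As a rough sketch (not the repository's exact code), the encoding step could look like the following, using the Kaggle column names Promo and SchoolHoliday for the two flags:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

train = pd.read_csv("data/train.csv", parse_dates=["Date"], low_memory=False)

# One-hot encode the two binary flags; all other columns are dropped here.
# Note: sparse_output requires scikit-learn >= 1.2 (older versions use sparse=False).
encoder = ColumnTransformer(
    transformers=[
        ("flags", OneHotEncoder(drop="if_binary", sparse_output=False),
         ["Promo", "SchoolHoliday"]),
    ],
    remainder="drop",
)

pipeline = Pipeline(steps=[("encode", encoder)])

# The categories are binary and known up front, so a single fit_transform
# is enough; no refitting inside the modelling pipeline is needed.
flag_features = pipeline.fit_transform(train)
```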
The Mass Forecaster searches for the best combination of parameters for the Facebook Prophet algorithm, and hence the optimal model for each store, by backtesting forecasts on multiple subsets of the data. This is done for every Rossmann store in the test dataset at a forecasting horizon of 42 days, which is what Rossmann needs for its business and what this competition asks for.
The parameter grid (the search space for the models) is shown below:
'changepoint_prior_scale': [0.001, 0.01, 0.1, 0.5],
'seasonality_prior_scale': [0.01, 0.1, 1.0, 10.0],
'seasonality_mode': ['additive', 'multiplicative']
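As a rough illustration of how such a per-store search could be wired up with Prophet's own backtesting helpers (cross_validation and performance_metrics), see the sketch below; the initial/period window sizes and the RMSE scoring are assumptions, and the actual Mass Forecaster may organise the loop differently.

```python
import itertools

from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

param_grid = {
    "changepoint_prior_scale": [0.001, 0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.01, 0.1, 1.0, 10.0],
    "seasonality_mode": ["additive", "multiplicative"],
}

def best_params_for_store(store_df):
    """Backtest every parameter combination on one store and keep the lowest-RMSE set.

    store_df is a Prophet-ready frame with 'ds' and 'y' columns.
    """
    scored = []
    for values in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        model = Prophet(**params).fit(store_df)
        # Backtesting with refit: Prophet re-estimates the model at each cutoff.
        # The initial/period window sizes here are illustrative assumptions.
        cv = cross_validation(model, initial="365 days", period="42 days",
                              horizon="42 days")
        rmse = performance_metrics(cv, rolling_window=1)["rmse"].iloc[0]
        scored.append((rmse, params))
    return min(scored, key=lambda item: item[0])[1]
```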
The methodology used to select the best forecast is "backtesting cross-validation with refit", and the animation below, in my opinion, does a great job of visualizing the overall process.
(Animation from: https://skforecast.org/0.11.0/user_guides/backtesting)
(Plotly animated html files are available under: models/graphs)
In terms of explainability, the forecast can be decomposed into components such as trend, weekly seasonality and yearly seasonality, which is a great benefit of the Prophet algorithm; see the figure below.
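Such a decomposition comes from Prophet's plot_components. A minimal sketch (the store selection and file paths are illustrative, not the repository's exact code):

```python
import pandas as pd
from prophet import Prophet

# Illustrative only: build a 'ds'/'y' frame for one store (Kaggle column names)
# and plot Prophet's component decomposition.
train = pd.read_csv("data/train.csv", parse_dates=["Date"], low_memory=False)
store_1 = (train[(train["Store"] == 1) & (train["Open"] == 1)]
           .rename(columns={"Date": "ds", "Sales": "y"})[["ds", "y"]])

model = Prophet().fit(store_1)
future = model.make_future_dataframe(periods=42)   # the 42-day horizon used in this project
forecast = model.predict(future)

# Separates the forecast into trend, weekly and yearly seasonality panels.
fig = model.plot_components(forecast)
fig.savefig("components.png")
```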
Cross-validation results for each store can be found under the models/results folder in the form below: