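The goal of tsrecipes is to provide time series preprocessing steps for time series classification and clustering in the tidymodels framework.
The primary steps are step_dct, which applies the discrete cosine transform, and step_dtw, which clusters time series using dynamic time warping.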
You can install tsrecipes with:
# install.packages("devtools")
devtools::install_github("tmastny/tsrecipes")
In time series classification, using the time series as a feature often results in poor accuracy because it’s the auto-correlation between entries and the overall trend that’s the most informative, rather than individual values.
The discrete cosine transform is one way to extract useful, uncorrelated features with significantly fewer dimensions than the time series.
In this example, it’s possible to classify time series of length 1751 into 4 classes with about 75% accuracy, using only 16 dimensions.
library(tidymodels)
library(tsrecipes)

# Assign a role to each column of ethanol, then apply the DCT with a tunable k
rec <- recipe(ethanol, vars = names(ethanol), roles = c("id", "outcome", "input")) %>%
  step_dct(ts, k = tune())

set.seed(2532)
tune_results <- workflow() %>%
  add_model(multinom_reg() %>% set_engine("nnet")) %>%
  add_recipe(rec) %>%
  tune_grid(
    resamples = validation_split(ethanol),
    grid = expand_grid(k = c(4, 8, 16, 32, 64))
  )

# Accuracy for each candidate value of k
tune_results %>%
  collect_metrics() %>%
  filter(.metric == "accuracy") %>%
  select(k, mean)
#> # A tibble: 5 x 2
#>       k  mean
#>   <dbl> <dbl>
#> 1     4 0.278
#> 2     8 0.389
#> 3    16 0.754
#> 4    32 0.746
#> 5    64 0.849
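To make the dimension reduction idea more concrete, here is a minimal base R sketch of a DCT-II written directly from its definition, together with its inverse. It is not the implementation used by step_dct; the helper names dct2 and idct2 and the synthetic signal are made up for illustration, and it assumes that the retained features are the first k (lowest-frequency) coefficients.

```r
# Unnormalized DCT-II (hypothetical helper, not part of tsrecipes):
# X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k), for k = 0, ..., N - 1
dct2 <- function(x) {
  N <- length(x)
  n <- seq_len(N) - 1
  vapply(n, function(k) sum(x * cos(pi / N * (n + 0.5) * k)), numeric(1))
}

# Inverse of dct2 (a DCT-III scaled by 2 / N)
idct2 <- function(X) {
  N <- length(X)
  k <- seq_len(N) - 1
  vapply(k, function(n) {
    (2 / N) * (X[1] / 2 + sum(X[-1] * cos(pi / N * (n + 0.5) * k[-1])))
  }, numeric(1))
}

# A smooth synthetic series of length 256 (illustrative data only)
time_grid <- seq(0, 1, length.out = 256)
x <- sin(2 * pi * time_grid) + 0.5 * time_grid

# Keep only the first 16 coefficients and set the rest to zero
k_keep <- 16
coefs <- dct2(x)
truncated <- coefs
truncated[-seq_len(k_keep)] <- 0

# Reconstruct from the 16 retained coefficients and measure the error
x_hat <- idct2(truncated)
rmse <- sqrt(mean((x - x_hat)^2))
rmse / sd(x)  # reconstruction error relative to the spread of the series
```

For a smooth series like this one, most of the signal is carried by the low-frequency coefficients, so the reconstruction error stays small relative to the variation in the series even though 240 of the 256 coefficients were discarded.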
tsrecipes also enables time series clustering using the dynamic time warping metric.
step_dtw is powered by the dtwclust package.
ethanol_clusters <- recipe(ethanol) %>%
  step_dtw(ts, k = 4) %>%
  prep() %>%
  bake(ethanol)
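For intuition about the metric itself, the following base R sketch computes a dynamic time warping distance with the textbook dynamic programming recurrence. It is only an illustration: dtw_distance is a hypothetical helper, it uses the absolute difference as the local cost with no window constraint, and it is not how dtwclust or step_dtw compute distances internally.

```r
# Textbook DTW distance between two numeric vectors (hypothetical helper).
# cost[i + 1, j + 1] holds the cheapest alignment of a[1:i] with b[1:j].
dtw_distance <- function(a, b) {
  n <- length(a)
  m <- length(b)
  cost <- matrix(Inf, nrow = n + 1, ncol = m + 1)
  cost[1, 1] <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      local <- abs(a[i] - b[j])
      # Extend the cheapest of the three allowed predecessor alignments
      cost[i + 1, j + 1] <- local + min(cost[i, j + 1], cost[i + 1, j], cost[i, j])
    }
  }
  cost[n + 1, m + 1]
}

# Two series with the same shape, shifted by one time step
a <- c(0, 0, 1, 2, 1, 0)
b <- c(0, 1, 2, 1, 0, 0)

sum(abs(a - b))     # pointwise distance: 4
dtw_distance(a, b)  # DTW distance: 0, because warping absorbs the shift
```

The pointwise comparison penalizes the shift, while dynamic time warping aligns the two peaks first. This is why it tends to be a more natural dissimilarity for clustering series that share a shape but are not aligned in time.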
Sayood, K. (2006). Introduction to data compression.
Primer on DCT: https://squidarth.com/rc/math/2018/06/24/fourier.html
The dtwclust and dtt R packages
This package is modeled after the textrecipes package: both packages transform a sequence of data (words vs. time series).