The goal of tsrecipes is to provide time series preprocessing steps that support time series classification and clustering in the tidymodels framework.
The primary steps are
You can install tsrecipes with:
# install.packages("devtools")
devtools::install_github("tmastny/tsrecipes")
In time series classification, using the raw time series values as features often results in poor accuracy: the most informative signal is in the auto-correlation between entries and the overall trend, not in individual values.
The discrete cosine transform is one way to extract useful, uncorrelated features with significantly fewer dimensions than the time series.
In this example, time series of length 1751 are classified into 4 classes with about 75% accuracy using only 16 dimensions.
library(tidymodels)
library(tsrecipes)

# Assign a role to each column of ethanol, then apply the DCT to the `ts` column.
# k (the number of coefficients kept) is left as a tuning parameter.
rec <- recipe(ethanol, vars = names(ethanol), roles = c("id", "outcome", "input")) %>%
  step_dct(ts, k = tune())

set.seed(2532)
# Fit a multinomial regression for each candidate k on a single validation split.
tune_results <- workflow() %>%
  add_model(multinom_reg() %>% set_engine("nnet")) %>%
  add_recipe(rec) %>%
  tune_grid(
    resamples = validation_split(ethanol),
    grid = expand_grid(k = c(4, 8, 16, 32, 64))
  )

# Accuracy for each value of k.
tune_results %>%
  collect_metrics() %>%
  filter(.metric == "accuracy") %>%
  select(k, mean)
#> # A tibble: 5 x 2
#>       k  mean
#>   <dbl> <dbl>
#> 1     4 0.278
#> 2     8 0.389
#> 3    16 0.754
#> 4    32 0.746
#> 5    64 0.849
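To make the idea behind the discrete cosine transform more concrete, here is a minimal base R sketch of how a long, smooth series can be summarized by its first few DCT coefficients. The helpers `dct2()` and `idct2()` are hypothetical names written for this illustration, and the orthonormal DCT-II scaling is an assumption; this is not the code `step_dct()` uses internally, and the synthetic series stands in for real data.

```r
# Hypothetical helper (not part of tsrecipes): orthonormal DCT-II of a numeric vector.
dct2 <- function(x) {
  n <- length(x)
  k <- 0:(n - 1)
  # basis[i, j] = cos(pi / n * (i - 1) * (j - 1 + 0.5))
  basis <- cos(pi / n * outer(k, k + 0.5))
  scale <- c(sqrt(1 / n), rep(sqrt(2 / n), n - 1))
  as.vector(scale * (basis %*% x))
}

# Hypothetical helper: inverse of dct2() (the transpose of the orthonormal basis).
idct2 <- function(coefs) {
  n <- length(coefs)
  k <- 0:(n - 1)
  basis <- cos(pi / n * outer(k, k + 0.5))
  scale <- c(sqrt(1 / n), rep(sqrt(2 / n), n - 1))
  as.vector(t(basis) %*% (scale * coefs))
}

# A synthetic, smooth series of length 200 (an assumption for illustration).
t <- seq(0, 1, length.out = 200)
x <- sin(2 * pi * t) + 0.5 * t

coefs <- dct2(x)

# Keep only the first 16 coefficients as features.
n_keep <- 16
features <- coefs[seq_len(n_keep)]

# Reconstruct the series from those 16 features by zero-padding the rest.
x_hat <- idct2(c(features, rep(0, length(x) - n_keep)))

# Relative reconstruction error: small when the series is smooth.
rel_error <- sqrt(sum((x - x_hat)^2) / sum(x^2))
rel_error
```

Because most of the energy of a smooth series is concentrated in the low-frequency coefficients, the first `n_keep` values can serve as a compact feature vector in place of the full series. How many coefficients are enough depends on the data, which is why `k` is tuned in the example above.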
Sayood, K. (2006). Introduction to data compression.
Primer on DCT: https://squidarth.com/rc/math/2018/06/24/fourier.html
dtwclust and dtt
This package is modeled after the textrecipes package: both transform sequence data (words vs. time series) into features.
The feasts R package and the Python package tsfresh contain many other time series preprocessing methods not included in this package.