The Dynamic Time Warping Problem

The goal of dynamic time warping (DTW) is to find a function that transforms, or "warps," time in order to approximately align two signals. DTW has various applications, some of which are demonstrated in the Interactive Demo below. Existing methods are often slow, with quadratic time complexity, and suffer from inaccuracies falling into two categories: susceptibility to local minima leading to poor alignments, and/or sharp irregularities in the warping function, known as "singularities." Many methods have attempted to address these issues in an ad-hoc manner, usually through hand-tuned pre- or post-processing steps. Our method is the first general solution that rapidly —in linear time— produces a smooth warping function without the need for any pre- or post-processing steps. Additionally, we offer a fast open-source C++ and Python package under a generous Apache license, comprehensive documentation, and several demos.

Read the paper
Top. Signal \(x\) will be time-warped to align to target signal \(y\). Middle. Warping function \( \phi \) drawn as lines between \(x\) and \(y\). Bottom. The time-warped signal \( (x \circ \phi) \) and target signal \(y\).
$$ \newcommand{\dt}{{\hspace{1pt} dt}} \newcommand{\Rcum}{R^\mathrm{cum}} \newcommand{\Lamcum}{\lambda ^\mathrm{cum}} \newcommand{\calRcum}{\mathcal R^\mathrm{cum}} \newcommand{\Rinst}{R^\mathrm{inst}} \newcommand{\calRinst}{\mathcal R^\mathrm{inst}} \newcommand{\Laminst}{\lambda ^\mathrm{inst}} \newcommand{\Smin}{s^\mathrm{min}} \newcommand{\Smax}{s^\mathrm{max}} $$

Interactive Demo


Choose a dataset and adjust the sliders below to interactively explore how the smoothing and cumulative warp regularizers affect the results.

Please wait while this example loads.

This hyperparameter increases the penalty on roughness. Lower values result in singularities.

This hyperparameter increases the penalty on over-fitting. Higher values result in a gentler warp.

Software


General

The documentation is written to be consistent with the paper, and will often refer to the variable and symbols used in the paper. So that the community can benefit from a common knowledge-base, software-related questions should be submitted via GitHub issues.

System Requirements

The Python package requires Python 3.6 or greater. There are pre-compiled binary distributions available for OSX, Linux, and Windows.

Installation

The Python package is distributed through PyPi and installed using the following command:

pip install gdtw

Source Code

Source Code on Github

You may download the source code by following the link above, or by cloning the repo with the following command:

git clone https://www.github.com/dderiso/gdtw

Custom Compilation

General notes on compilation are provided as comments in setup.py. Compilation requires the Numpy C++ bindings (automatically installed with Numpy) and g++ compiler with support for C++11.

git clone https://www.github.com/dderiso/gdtw
cd gdtw
python setup.py install

License

This software is provided under the Apache license.

Citing

Deriso, Dave, and Stephen Boyd (2022). A general optimization framework for dynamic time warping. Optimization and Engineering, 1-22.

@article{deriso2022general,
  title={A general optimization framework for dynamic time warping},
  author={Deriso, Dave and Boyd, Stephen},
  journal={Optimization and Engineering},
  pages={1--22},
  year={2022},
  publisher={Springer}
}

Questions

So that the community can benefit from a common knowledge-base, software-related questions should be submitted via GitHub issues. Other questions about code or documenation can be addressed to Dave Deriso (dderiso/at/alunmi-dot-stanford-dot-edu).

Quick Start


1. Generate Signals

In this example, we'll define signal \( x(t) = \sin(2 \pi * 5 t) \) to be a 5Hz sine wave and signal \( y(t) = x(q(t)) \) to be the same 5Hz sine wave but warped with a quadratic time warping function \( q(t) = t^2 \).

import numpy as np
t = np.linspace(0,1,1000)
q = lambda t_: t_**2
x = lambda t_: np.sin(2*np.pi * 5 * t_)
y = lambda t_: x(q(t_))

2. Plot Signals

The signals are plotted using the following code. If you don't have matplotlib installed, run pip install matplotlib into your shell to install the library.

import matplotlib.pyplot as plt
plt.plot(t, y(t), '-', color='C1', label='y(t)')
plt.plot(t, x(t), '-', color='C0', label='x(t)')
plt.legend()
plt.show()

3. Perform Time Warping

To warp signal \( x \) into signal \( y \), we'll use the function gdtw.warp(x,y,params) using the default parameters. We'll pass discretized signals \( x(t) \) and \( y(t) \), where \( t \) is a vector of \( N \) values \( 0 = t_1 < t_2 < \cdots < t_N = 1 \), into arguments x and y as Numpy arrays.

Recall that in our formulation, the warping function is denoted as \( \phi \), the discretized warped time is denoted as \( \tau \), the discretized warped signal is denoted as \( \hat{x} = x(\tau) \), and objective function evaluated on \( \tau \) (aka. DTW distance) is denoted as \( \hat{f}(\tau) \) (see \( \S 4 \) in the paper). Correspondingly, the function returns \( \phi \), \( x(\tau) \), and \( \hat{f}(\tau) \) as tuple consisting of a function, a Numpy array, and a Numpy double precision float.

import gdtw
phi, x_tau, f_tau, g = gdtw.warp(x(t), y(t))

4. Plot Warped Signals

The warped signals are plotted using the following code. You can plot the discretized result, \( x ( \tau ) \).

plt.plot(t,y(t),       '-', color='C1', label='y(t)')
plt.plot(t,x_tau, '--', color='C0', label='x(tau)')
plt.legend()
plt.show()

Alternatively, you can plot the composition \( x \circ \phi \).

plt.plot(t,y(t),       '-', color='C1', label='y(t)')
plt.plot(t,x(phi(t)), '--', color='C0', label='x(phi(t))')
plt.legend()
plt.show()

4. Custom Parameters

As illustrated in the interactive demonstration above, the choice of hyper-parameters greatly affects the solution. In addition to specifying hyper-parameter values, you can also specify custom loss and regularization functions. Together, you can design a custom objective function to better address more complicated problems. A table of allowable parameters is included in the API Reference.

gdtw.warp(x, y, params={
    "lambda_cum": 0.001, 
    "lambda_inst": 0.001,
    "Loss":   lambda x,y:      np.abs(x-y),
    "R_cum":  lambda phi,t:    np.abs(phi-t),
    "R_inst": lambda grad_phi: np.abs(grad_phi-1)
})

6. Summary

Congratulations! You now know how to time-warp two signals together, inspect the warping function, set hyper-parameter values, ' and specify custom loss and regularization functions. This is all you need to know for most applications.

API Reference


gdtw.warp

gdtw.warp(x,y,params)

Uses our formulation of dynamic time warping and method of iterative refinement to find a function \( \phi : t \to \tau \) that minimizes the difference between \( x(\phi(t)) \) and \( y(t) \). Recall that in our formulation, the warping function is denoted as \( \phi \), the discretized warped time is denoted as \( \tau \), the discretized warped signal is denoted as \( x(\tau) \), and the objective function evaluated on \( \tau \) (aka. DTW distance) is denoted as \( \hat{f}(\tau) \) (see \( \S 4 \) in the paper). Correspondingly, the function returns \( \phi \), \( \tau \), \( x(\tau) \), and \( \hat{f}(\tau) \) as tuple consisting of a function, two Numpy arrays, and a Numpy double precision float.

Arguments

Argument Type(s) Default Description
x np.ndarray or function Required \(x\) is a sequence, or \(x: R \to R \) is a function, that is being warped. Any specified function must have the following signature:
(t:float) -> float
Here is an example of a valid function:
lambda t: np.sin(2*np.pi * t)
y np.ndarray or function Required \(y\) is a sequence, or \(y: R \to R \) is a function, that serves as the target. Any specified function must have the following signature:
(t:float) -> float
Here is an example of a valid function:
lambda t: np.sin(2*np.pi * t)
params dictionary {} Parameters are optional. When a parameter is left unspecified, the solver uses default value listed in the Parameter Table below.

Parameter Table

Key Value Type(s) Default Value Description
"lambda_inst" np.double 0.1 \(\Laminst \ge 0 \) is a positive hyperparameter for the regularization functional \(\calRinst\). This hyperparameter increases the penalty on the instantaneous amount of warping. Higher values produce a smoother \( \phi \), and lower values result in singularities.
"lambda_cum" np.double 1.0 \(\Lamcum \ge 0\) is a positive hyperparameter for the regularization functional \(\calRcum\). This hyperparameter increases the penalty on the cumulative amount of warping. Higher values result in less over-fitting.
"Loss" string or function "L2" \( \mathcal{L} \) is the loss that ensures \( x( \phi (t)) \) is close to a target \( y(t) \). Fast C++ methods are invoked by passing a string instead of a function, built-in options include: "L1" or "L2". Any specified function must have the following signature:
(x:float, y:float) -> float
Here is an example of a valid function:
lambda x,y: np.abs(x-y)
"R_cum" string or function "L2" \( \calRcum \) is the regularization functional for the cumulative warp and reduces over-fitting. Fast C++ methods are invoked by passing a string instead of a function, built-in options include: "L1" or "L2". Any specified function must have the following signature:
(phi:float, t:float) -> float
Here is an example of a valid function:
lambda phi,t: np.abs(phi-t)
"R_inst" string or function "L2" \( \calRinst \) is the regularization functional for the instantaneous warp and improves the smoothness of the warping. Any specified function must have the following signature:
(grad_phi:float) -> float
Here is an example of a valid function:
lambda grad_phi: np.abs(grad_phi-1)
The constraint that the slope of \( \phi \) must be between \( s_\mathrm{min} \) and \( s_\mathrm{max} \) is handled in the solver outside of this function.
"N" np.double 300 \( N \) is the number of time points in the downsampled input. The computation time grown linearly as \( N \) increases.
"M" np.double 165 \( M \) is the search window size, and must satisfy \( M \le M_\mathrm{max}\). When \( M \) is not explicitly set, \( M \) is computed as \( \mathrm{min}( 0.55*N, M_\mathrm{max}) \). With default settings \( N = 300 \) and \( M_\mathrm{max} = 300 \), \( M = 165 \) The computation time grows quadratically as parameter \( M \) increases.
"M_max" np.double 300 \( M_\mathrm{max} \) is the maximum search window size.
"eta" np.double 0.15 \( \eta \) is the step size of the iterative refinement. Smaller values result in fewer iterations, but may result in lower quality results.
"s_min" np.double 1E-8 \( s_\mathrm{min} \) specifies the constraint for the minimum slope of \( \phi \). When this is non-zero, the warping function will be monotone increasing. We can allow the slope of \( \phi \) be negative by choosing \( s_\mathrm{min} < 0 \).
"s_max" np.double 1E8 \( s_\mathrm{max} \) specifies the constraint for the maximum slope of \( \phi \). Large values allow \( \phi \) to have sharp discontinuities.
"s_beta" np.double 0 \( s_\beta \) specifies a tolerance on the boundary constraints. The default is \(s_\beta = 0 \), which enforces boundary conditions \( \phi(0)=0 \) and \( \phi(1)=1 \).

Returns

phi function \( \phi [0,1]: R \to R \) is the time warping function that maps \( t \to \tau \). This function has the following signature:
(t:float) -> np.double
x_tau np.ndarray \( x( \tau ) \in R^N \) is an array of length \( N \) that contains the discretized version of the time warped signal \( x \).
f_tau np.double \( \hat{f}(\tau) \), a.k.a. the "DTW distance", is the discretized version of the objective function evaluated on \( \tau \).
g GDTW This is a reference to the GDTW solver object. It's generally ignored.

Raises

ValueError This error is raised when x or y are not of type np.ndarray or do not have the required function signature (t:float) -> float.
ValueError This error is raised when params is not of the dictionary type.

Example 1

This example is explained in detail in the quick start tutorial.

import numpy as np
t = np.linspace(0,1,1000)
q = lambda t_: t_**2
x = lambda t_: np.sin(2*np.pi * 5 * t_)
y = lambda t_: x(q(t_))

import gdtw
phi, x_tau, f_tau, g = gdtw.warp( x(t), y(t) )

import matplotlib.pyplot as plt
plt.plot(t,y(t),       '-', color='C1', label='y(t)')
plt.plot(t,x_tau, '--', color='C0', label='x(tau)')
plt.legend()
plt.show()

Example 2

This method supports custom hyper-parameter values as well as custom loss and regularizer functionals. The available options are listed in the Parameter Table above.

gdtw.warp(x, y, params={
    "lambda_cum": 0.001, 
    "lambda_inst": 0.001,
    "Loss":   lambda x,y:      np.abs(x-y),
    "R_cum":  lambda phi,t:    np.abs(phi-t),
    "R_inst": lambda grad_phi: np.abs(grad_phi-1)
})

Performance

The node costs are computed and stored in an \( M\times N \) array, and the edge costs are computed on the fly and stored in an \( M\times M\times N \) array. The computational cost of our dynamic programming procedure is order \( \mathcal{O}( N ) \) flops (not counting the evaluation of the loss and regularization terms). With current hardware, it is entirely practical for \( N=1000 \) or (much) larger. For example, \( N=1000 \) takes \( 0.83 \) ms to solve on a \( 4 \)-core Macbook and only \( 4 \) iterations are needed before the iterative refinement procedure terminates.

Runtime as a function of \( N \).