Link to original article: tecdat.cn/?p=22945
Source: Tuoduan Data Tribe official account
Dynamic time warping (DTW) is an algorithm that finds the best alignment between two sequences. It applies to sequential data such as time series, audio, and video. DTW finds the best match between two numeric sequences by locally stretching and compressing them, and it also yields a distance between the sequences.
What does DTW do?
As the name suggests, dynamic time warping aligns in time two sequences of different lengths that represent the same kind of thing. Its most common application is speech recognition: the same letter, pronounced by different people, takes a different amount of time. The recorded signals are very similar in shape but not aligned in time, so we stretch or compress one of them with a warping function to minimize the error between them.
How is DTW calculated?
The problem DTW solves is therefore to find an optimal warping path $W = w_1, w_2, \ldots, w_K$, where $w_k = (i, j)$ means that the i-th point of time series 1 is matched to the j-th point of time series 2. The sum of the distances over all matched points is the warping path distance, which measures the similarity of the two time series: the smaller the warping path distance, the more similar the series.
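Written out, the quantity being minimized can be sketched as follows. The symbols are notation assumed here for illustration: a_i and b_j are the points of the two series (of lengths m and n), d is the pointwise distance, and K is the length of the path.

```latex
\mathrm{DTW}(A, B) = \min_{W} \sum_{k=1}^{K} d\bigl(a_{i_k}, b_{j_k}\bigr),
\qquad w_k = (i_k, j_k), \quad w_1 = (1, 1), \quad w_K = (m, n),
\qquad w_{k+1} - w_k \in \{(1, 0),\ (0, 1),\ (1, 1)\}
```

The last condition says that the path only moves forward, one step at a time, in one or both series.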
To summarize, the DTW algorithm proceeds in these steps:
1. Start with two sequences to compare (if there are more than two, compare them pairwise): A = {a1, a2, a3, …, am} and B = {b1, b2, b3, …, bn}, with lengths m > n.
2. Use the Euclidean distance to compute the distance between every pair of points from the two sequences, D(ai,bj), where 1 ≤ i ≤ m and 1 ≤ j ≤ n.
Arrange these distances in an m × n table, with one row per point of A and one column per point of B.
3. Next, find the shortest path through the table above. Starting from D(a1,b1), we follow a path to D(am,bn). If the current node is D(ai,bj), the next node must be D(ai+1,bj), D(ai,bj+1), or D(ai+1,bj+1), and the path as a whole must be the shortest. The calculation uses dynamic programming: the shortest path to node (i,j) is obtained from the shortest distances to its three upper-left neighbors (i-1,j), (i-1,j-1), and (i,j-1).
4. Finally, backtrack from the final shortest distance to recover the optimal path from D(a1,b1) to D(am,bn). The sum of the distances along this path is the required DTW distance.
[Note] If there is no backtracking, and step 3 simply takes whichever of the three candidate next nodes is closest as the next node on the path, the result is a greedy algorithm. DTW instead computes the minimum cumulative distance from start to end, and then traces back from that minimum to see which nodes the optimal path passed through.
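The four steps above can be sketched in base R as follows. This is a minimal illustration, not the dtw package's implementation: the function name dtw_sketch and the example vectors are hypothetical, every step has weight 1, and the inputs are assumed to be plain numeric vectors.

```r
# Minimal DTW sketch in base R (hypothetical helper, for illustration only).
dtw_sketch <- function(a, b) {
  m <- length(a)
  n <- length(b)

  # Step 2: pairwise distances; for scalars the Euclidean distance is |a_i - b_j|
  d <- abs(outer(a, b, "-"))

  # Step 3: cumulative cost matrix filled by dynamic programming
  g <- matrix(Inf, m, n)
  g[1, 1] <- d[1, 1]
  for (i in seq_len(m)) {
    for (j in seq_len(n)) {
      if (i == 1 && j == 1) next
      best <- min(
        if (i > 1) g[i - 1, j] else Inf,
        if (j > 1) g[i, j - 1] else Inf,
        if (i > 1 && j > 1) g[i - 1, j - 1] else Inf
      )
      g[i, j] <- d[i, j] + best
    }
  }

  # Step 4: backtrack from (m, n) to (1, 1) along the cheapest predecessors
  i <- m
  j <- n
  path <- matrix(c(i, j), ncol = 2)
  while (i > 1 || j > 1) {
    cand <- rbind(c(i - 1, j - 1), c(i - 1, j), c(i, j - 1))
    cand <- cand[cand[, 1] >= 1 & cand[, 2] >= 1, , drop = FALSE]
    k <- which.min(g[cand])          # matrix indexing by (row, column) pairs
    i <- cand[k, 1]
    j <- cand[k, 2]
    path <- rbind(c(i, j), path)
  }

  list(distance = g[m, n], path = path, cost = g)
}

# Made-up example: b is a with the value 2 left out
a <- c(1, 2, 3, 4, 3)
b <- c(1, 3, 4, 3)
res <- dtw_sketch(a, b)
res$distance   # DTW distance
res$path       # matched index pairs (i, j), from (1, 1) to (5, 4)
```

For these vectors the distance is 1, because only the value 2 in a has no exact counterpart in b. Because every step here has weight 1, the result can differ from what the dtw package reports with its default step pattern.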
Implementation in R
In this article, we will learn how to find the alignment of two numeric sequences.
Create sequence data
First, we generate sequence data and visualize it in a graph.
plot(a, type = "l")
lines(b, col = "blue")
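The values of a and b are not shown in this article. A minimal sketch, assuming two made-up series of different lengths (a noisy sine wave and a shifted, noise-free curve), could look like this; the seed, lengths, and noise level are arbitrary choices.

```r
# Hypothetical example data: the article's own values are not shown.
set.seed(1)                                  # arbitrary seed, for reproducibility
ta <- seq(0, 2 * pi, length.out = 100)
tb <- seq(0, 2 * pi, length.out = 80)
a <- sin(ta) + rnorm(100, sd = 0.1)          # query: noisy sine wave, 100 points
b <- cos(tb)                                 # reference: shifted curve, 80 points

# Draw both series in one plot, a in black and b in blue
plot(a, type = "l")
lines(b, col = "blue")
```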
Compute the alignment
The dtw() function computes an optimal alignment.
library(dtw)
align <- dtw(a, b, keep = TRUE)  # keep = TRUE stores the cost matrix for the plots
The function returns an alignment object. You can use the str() function to inspect its components.
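As a rough sketch of what inspecting the result might look like, the following uses made-up input series in place of the article's a and b. It assumes the dtw package is installed.

```r
library(dtw)

# Hypothetical inputs, standing in for the article's a and b
a <- sin(seq(0, 2 * pi, length.out = 100))
b <- cos(seq(0, 2 * pi, length.out = 80))

align <- dtw(a, b, keep = TRUE)

align$distance             # cumulative distance along the optimal path
align$normalizedDistance   # the same distance, normalized for path length
head(cbind(query = align$index1, reference = align$index2))  # matched indices
str(align, max.level = 1)  # overview of all components
```

The index1 and index2 vectors have equal length and list, step by step, which point of the query is matched to which point of the reference.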
Now we can plot the alignment.
Two-way plot
Plotting the results of dynamic time warping: pointwise comparison
This displays the query and reference time series and how they are aligned, for visual inspection.
dtwPlotTwoWay(align)
Density plot
This shows the cumulative cost density with the warping path overlaid.
The plot is based on the cumulative cost matrix. It displays the optimal alignment as a “ridge” in the global cost landscape.
dtwPlotDensity(align)
Summary
In summary, DTW is a very useful method for computing the minimum distance between sequences, with applications in speech sequence matching, stock price curve matching, DNA base sequence matching, and so on. Its key feature is that it allows the time axis to stretch and compress during matching, so it can better find the best-matching sequence in a set of sequences.
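Picking the best match from a set of sequences can be sketched with the dtw package as follows. The query and the candidate series are made-up data, not taken from the article, and the candidate names are hypothetical labels.

```r
library(dtw)

# Made-up query: one period of a sine wave sampled at 50 points
query <- sin(seq(0, 2 * pi, length.out = 50))

# Made-up candidates of a different length (70 points each)
candidates <- list(
  sine = sin(seq(0, 2 * pi, length.out = 70)),  # same shape, stretched in time
  ramp = seq(-1, 1, length.out = 70),           # straight line
  flat = rep(0, 70)                             # constant series
)

# DTW distance from the query to each candidate
dists <- sapply(candidates, function(s) dtw(query, s)$distance)

dists
best <- names(which.min(dists))   # name of the closest candidate
best
```

The stretched sine wave is selected even though its length differs from the query, which is the time scaling that DTW allows.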
- Eamonn Keogh, Chotirat Ann Ratanamahatana, Exact indexing of dynamic time warping, Knowledge and Information Systems, 2005.