Pattern finder: how to find similar shapes and compute a similarity score?

Hello everyone.

I found this code on Stack Overflow. It takes a more mathematical approach to finding patterns (shapes, or more precisely, a given sequence of numbers) in a dataset such as an array.

Python

import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np



stock = 'TSLA'
start = '2020-11-01'

df = yf.download(stock , start=start)

nfx = []
nfy = []

# Collect dates and closing prices. list.append returns None,
# so its result is not assigned to anything.
for index, row in df.iterrows():
    nfx.append(index)
    nfy.append(row['Close'])
    
SampleTarget = nfy
Pattern = np.array([641.760009765625, 649.8800048828125, 604.47998046875, 627.0700073242188, 609.989990234375, 639.8300170898438, 633.25, 622.77001953125, 655.9000244140625, 695.0, 649.8599853515625, 640.3400268554688, 645.97998046875])


pat = np.array(Pattern)
data = np.array(SampleTarget)
n = len(data)
m = len(pat)
k = data.strides[0] # typically 8 for float64

# data_2d is a view of the original data: row i is data[i:i+m],
# so data_2d[i, j] == data[i + j]
data_2d = np.lib.stride_tricks.as_strided(data, shape=(n-m+1, m), strides=(k, k))

# So you can check for matches on data_2d[i, :] for all i
print(np.all(np.isclose(data_2d, pat), axis=1))

# Euclidean distance between each window and the pattern (0 means exact match)
dist = np.linalg.norm(data_2d - pat, axis=1)

# New dataset with two occurrences of the pattern: one scaled by a factor 1.1,
# one scaled 0.5 with a bit of noise added
data_mod = data*1.1
np.random.seed(1)
data_mod[16:16+m] = pat*0.5 + np.random.uniform(-0.5, 0.5, size=m)
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))

# pat_inv: pseudoinverse of pat vector
pat_inv = 1/(pat @ pat) * pat 

# cofs: best-fit scale factor for each window, shape (n-m+1,)
cofs = data_2d_mod @ pat_inv

# sum of squared residuals, shape (n-m+1,) - zero means perfect fit
ssqr = ((data_2d_mod - cofs.reshape(-1, 1) * pat)**2).sum(axis=1)

print(f'cofs:\n{np.around(cofs, 2)}')
print(f'ssqr:\n{np.around(ssqr, 1)}')

So, I understand what this is doing up to this part:

Python

# New dataset with two occurrences of the pattern: one scaled by a factor 1.1,
# one scaled 0.5 with a bit of noise added
data_mod = data*1.1
np.random.seed(1)
data_mod[16:16+m] = pat*0.5 + np.random.uniform(-0.5, 0.5, size=m)
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))

# pat_inv: pseudoinverse of pat vector
pat_inv = 1/(pat @ pat) * pat 

# cofs: best-fit scale factor for each window, shape (n-m+1,)
cofs = data_2d_mod @ pat_inv

# sum of squared residuals, shape (n-m+1,) - zero means perfect fit
ssqr = ((data_2d_mod - cofs.reshape(-1, 1) * pat)**2).sum(axis=1)

Here I only understand the idea of building a modified dataset in which the pattern appears scaled by 1.1, or scaled by 0.5 with some added noise (the values are rescaled and distorted a bit).

But what is the

data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))

part doing? In particular, I don't see how this is a good solution, because the

np.lib.stride_tricks.as_strided

line seems to change the original array (there is a warning about this in the documentation: numpy.lib.stride_tricks.as_strided, NumPy v1.21 Manual).

It would be wonderful if someone could explain in more detail what the lower part of the code does.
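
A minimal sketch of what the as_strided call appears to do, using a small toy array instead of the stock data (the names windows, windows_safe and before are made up for this illustration). As far as I can tell, the call only creates a view in which row i is data[i:i+m]; building the view does not modify the original array. The warning in the documentation concerns writing through such a view or passing wrong strides. sliding_window_view (NumPy 1.20 or later) builds the same windows and is read-only by default.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

data = np.arange(10, dtype=np.float64)   # toy data, not the stock prices
before = data.copy()                     # copy kept only to compare afterwards
m = 4                                    # window (pattern) length
n = len(data)
k = data.strides[0]                      # bytes between consecutive elements

# Row i of the view is data[i:i+m]; no values are copied.
windows = as_strided(data, shape=(n - m + 1, m), strides=(k, k))

# Same windows via the safer helper, which returns a read-only view.
windows_safe = sliding_window_view(data, m)

print(windows[:3])
print(np.shares_memory(windows, data))   # the view reuses data's buffer
print(np.array_equal(data, before))      # creating the view changed nothing
```

Both views have shape (n-m+1, m), which is why a later expression such as data_2d - pat compares every window against the pattern in one step.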

I also wanted to find the minima and maxima and plot them together with the residuals. I tried these lines:

Python

print(f'cofs:\n{np.around(cofs, 2)}')
print(f'ssqr:\n{np.around(ssqr, 1)}')

minimums = min(ssqr)
print(minimums)



y_v = ssqr[int(min(ssqr))]

print(y_v)

plt.plot(ssqr)

plt.scatter(minimums, y_v)
plt.show()
plt.plot(pat)
plt.show()

The output looks like nonsense to me. Why is that?

I thought ssqr would give a score for the pattern match at each position in the original dataset (lowest is the best fit), and that cofs would give the factor by which the pattern is scaled to produce that score.
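
To check that reading of cofs and ssqr, here is a small sketch on made-up numbers (the toy pat and data below are assumptions, not taken from the stock data). For each window w, the least-squares scale is c = (w · pat) / (pat · pat), and ssqr is what is left over after subtracting c * pat. The pattern is hidden at index 3, scaled by 2.0.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

pat = np.array([1.0, 3.0, 2.0, 5.0])
# Toy series: data[3:7] equals 2.0 * pat, everything else is arbitrary.
data = np.array([4.0, 1.0, 7.0, 2.0, 6.0, 4.0, 10.0, 3.0, 8.0])
m = len(pat)

windows = sliding_window_view(data, m)       # row i is data[i:i+m]

# Best-fit scale factor for every window: (w . pat) / (pat . pat)
cofs = windows @ pat / (pat @ pat)

# Sum of squared residuals after removing the scaled pattern from each window
ssqr = ((windows - cofs[:, None] * pat) ** 2).sum(axis=1)

print(np.around(cofs, 2))
print(np.around(ssqr, 2))
```

In this toy case the window at index 3 gets a coefficient of 2.0 and a residual of zero, so the interpretation seems right: ssqr is the score and cofs is the scale. One caveat, stated as an assumption about the stock data: when all values are positive and of similar size, many windows can get a fairly small residual, so the scores may be less distinctive than expected.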

Why do I get such strange values?

Also, the plain min() function does not seem to find the minimum of ssqr.
Additionally, I wanted to label the absolute minimum in the plot to see where the pattern matches best.

What am I doing wrong here?
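
A possible explanation, sketched with hypothetical residual values standing in for ssqr: min(ssqr) returns the smallest value, not its position, so int(min(ssqr)) turns a residual into an unrelated index, and plt.scatter(minimums, y_v) then uses that residual as the x coordinate. np.argmin returns the position instead. The helper plot_residuals is made up for this illustration and is only defined here, not called.

```python
import numpy as np

# Hypothetical residuals standing in for ssqr from the code above.
ssqr = np.array([812.4, 95.1, 0.3, 410.0, 57.9])

best_i = int(np.argmin(ssqr))   # position (x) of the best-fitting window
best_v = float(ssqr[best_i])    # residual (y) at that position

print(best_i, best_v)


def plot_residuals(ssqr, best_i):
    """Plot the residuals and mark the minimum (requires matplotlib)."""
    import matplotlib.pyplot as plt

    plt.plot(ssqr)
    plt.scatter(best_i, ssqr[best_i], color="red")
    plt.annotate(f"min at {best_i}", (best_i, ssqr[best_i]))
    plt.show()
```

Calling plot_residuals(ssqr, best_i) would mark the best match on the residual curve. The window that matches best is then data_mod[best_i:best_i+m].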

By the way, the pattern is taken from the original dataset, so it should give one perfect match.

Thanks, I appreciate it.
Benjamin

Posted 7-Oct-21 8:44am

Benjamin di Lorenzo

Updated 7-Oct-21 9:21am

Comments

Richard MacCutchan 8-Oct-21 4:20am

Why not ask the person on StackOverflow who wrote it?

Benjamin di Lorenzo 8-Oct-21 4:36am

Well, I did, but I will probably never get an answer, because the thread is old and Stack Overflow has no way to message members directly ;)

CHill60 8-Oct-21 4:56am

When you respond to a comment, use the "Reply" link so that the member you are responding to is notified. Otherwise you are relying on them coming back to this post, which is unlikely to happen.

Benjamin di Lorenzo 8-Oct-21 5:32am

Thanks. Do you mean here at CodeProject or at Stack Overflow? At SO I cannot find a "Reply" button.

CHill60 8-Oct-21 6:28am

I meant here, in this instance. But on Stack Overflow, once you have enough reputation you can comment on a post, and the author of the post will be notified. You can also use the @username convention there. See Privileges - Comment everywhere - Stack Overflow.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)