Hello People.

I found this code on Stack Overflow. It takes a more mathematical approach to finding patterns (shapes, or more precisely, a given sequence of numbers) in a dataset such as an array.


Python
import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np



stock = 'TSLA'
start = '2020-11-01'

df = yf.download(stock , start=start)

nfx = []
nfy = []

for index, row in df.iterrows():
    # list.append returns None, so append directly instead of assigning the result
    nfx.append(index)
    nfy.append(row['Close'])
    
SampleTarget = nfy
Pattern = np.array([641.760009765625, 649.8800048828125, 604.47998046875, 627.0700073242188, 609.989990234375, 639.8300170898438, 633.25, 622.77001953125, 655.9000244140625, 695.0, 649.8599853515625, 640.3400268554688, 645.97998046875])


pat = np.array(Pattern)
data = np.array(SampleTarget)
n = len(data)
m = len(pat)
k = data.strides[0] # typically 8 for float64

# data_2d is a view into the original data (no copy is made):
# row i of data_2d is data[i:i+m], one sliding window per row
data_2d = np.lib.stride_tricks.as_strided(data, shape=(n-m+1, m), strides=(k, k))

# So you can check for matches on data_2d[i, :] for all i
print(np.all(np.isclose(data_2d, pat), axis=1))

# Euclidean distance of each window to the pattern (0 means an exact match)
print(np.linalg.norm(data_2d - pat, axis=1))

# New dataset with two occurrences of the pattern: one scaled by a factor 1.1,
# one scaled 0.5 with a bit of noise added
data_mod = data*1.1
np.random.seed(1)
data_mod[16:16+m] = pat*0.5 + np.random.uniform(-0.5, 0.5, size=m)
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))

# pat_inv: pseudoinverse of pat vector
pat_inv = 1/(pat @ pat) * pat 

# cofs: fit coefficients, shape (n1,)
cofs = data_2d_mod @ pat_inv # fit coefficients, shape (n1,)

# sum of squared residuals, shape (n1,) - zero means perfect fit
ssqr = ((data_2d_mod - cofs.reshape(-1, 1) * pat)**2).sum(axis=1)

print(f'cofs:\n{np.around(cofs, 2)}')
print(f'ssqr:\n{np.around(ssqr, 1)}')


I understand what this is doing up to this part:
Python
# New dataset with two occurrences of the pattern: one scaled by a factor 1.1,
# one scaled 0.5 with a bit of noise added
data_mod = data*1.1
np.random.seed(1)
data_mod[16:16+m] = pat*0.5 + np.random.uniform(-0.5, 0.5, size=m)
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))

# pat_inv: pseudoinverse of pat vector
pat_inv = 1/(pat @ pat) * pat 

# cofs: fit coefficients, shape (n1,)
cofs = data_2d_mod @ pat_inv # fit coefficients, shape (n1,)

# sum of squared residuals, shape (n1,) - zero means perfect fit
ssqr = ((data_2d_mod - cofs.reshape(-1, 1) * pat)**2).sum(axis=1)


Here I only understand the idea of building a modified dataset in which the pattern appears scaled by 1.1, or scaled by 0.5 with some noise added (rescaling the values and distorting them a bit).

But what is the
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))
part doing? In particular, I don't see how this is a good solution, because the
np.lib.stride_tricks.as_strided
line changes the original array (there is a warning about this in the documentation: numpy.lib.stride_tricks.as_strided, NumPy v1.21 Manual).

It would be wonderful if someone could explain a bit more about how the lower part of that code works.
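For reference, here is a minimal sketch of what the strided call builds, using a small made-up array (`toy` and the window length 3 are illustrative, not from the post). `as_strided` returns a view in which row i is `toy[i:i+m]`; creating it neither copies nor changes the data. The documentation warning is about passing a wrong shape or strides (which can read outside the buffer) and about writing through the view. `sliding_window_view` (NumPy 1.20 and later) builds the same windows with bounds checking and is read-only by default.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

toy = np.arange(6, dtype=np.float64)   # illustrative data: 0.0 .. 5.0
toy_before = toy.copy()                # snapshot to show nothing is modified
m = 3                                  # illustrative window length
n = len(toy)
k = toy.strides[0]                     # bytes between neighbouring elements

# Each row starts k bytes after the previous one and steps k bytes per column,
# so row i is exactly toy[i:i+m]. No data is copied.
windows = as_strided(toy, shape=(n - m + 1, m), strides=(k, k))

# Safer equivalent: bounds-checked and read-only by default.
windows_safe = sliding_window_view(toy, m)

same_windows = np.array_equal(windows, windows_safe)
unchanged = np.array_equal(toy, toy_before)
shares_memory = np.shares_memory(windows, toy)
print(windows)
```

Because both results are views, writing into `windows` would change `toy` (and several rows at once), which is why read-only use, as in the posted code, is the safe case.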

I also wanted to find the minima and maxima and plot them together with the residuals. I tried these lines:

Python
print(f'cofs:\n{np.around(cofs, 2)}')
print(f'ssqr:\n{np.around(ssqr, 1)}')

minimums = min(ssqr)
print(minimums)



y_v = ssqr[int(min(ssqr))]

print(y_v)

plt.plot(ssqr)

plt.scatter(minimums, y_v)
plt.show()
plt.plot(pat)
plt.show()


What comes out here seems like nonsense to me. Why is that?

I thought ssqr would output a score (lowest is the best fitting) of the pattern match in the original dataset, and "cofs" would output how much the pattern is scaled to produce the scoring output.
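That reading matches the algebra. For each window w, the code picks the scale factor c that minimises the squared error between w and c times pat, which is c = (w·pat)/(pat·pat); `cofs` holds c and `ssqr` holds the squared error that remains. A small sketch on synthetic data (the sine-based pattern, the noise level and the positions 5 and 30 are assumptions made purely for illustration):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
pat = np.sin(np.linspace(0.0, 3.0, 8)) + 2.0       # illustrative pattern
data = rng.uniform(0.0, 5.0, size=50)              # illustrative background
data[5:13] = 1.1 * pat                             # exact copy, scaled by 1.1
data[30:38] = 0.5 * pat + rng.uniform(-0.01, 0.01, size=8)  # noisy, scaled 0.5

windows = sliding_window_view(data, len(pat))      # shape (n - m + 1, m)

pat_inv = pat / (pat @ pat)                        # pseudoinverse of the pattern vector
cofs = windows @ pat_inv                           # best scale factor per window
ssqr = ((windows - cofs[:, None] * pat) ** 2).sum(axis=1)  # leftover squared error

print(round(float(cofs[5]), 3), float(ssqr[5]))
print(round(float(cofs[30]), 3), float(ssqr[30]))
```

At the two planted positions `cofs` comes out close to 1.1 and 0.5 and `ssqr` is near zero. Note that `ssqr` is not normalised, so on raw prices in the hundreds even a fairly good match can leave a residual that looks large.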

Why do I get such strange values?

Also, the plain min() function does not seem to find the real minimum of ssqr.
Additionally, I wanted to label the absolute minimum in the plot to see where the pattern matches best.
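One likely cause: `min(ssqr)` does return the smallest residual, but that is a value, not a position. `ssqr[int(min(ssqr))]` then uses that value as an index, and `plt.scatter(minimums, y_v)` uses it as an x coordinate. A hedged sketch of one way to locate and label the minimum with `np.argmin`; the `ssqr` values here are made up, and `plot_best_match` is a hypothetical helper name:

```python
import numpy as np

ssqr = np.array([41.2, 7.5, 0.3, 12.9, 55.0])   # made-up residuals for illustration

best_idx = int(np.argmin(ssqr))    # position of the best-fitting window
best_val = float(ssqr[best_idx])   # residual at that position

def plot_best_match(ssqr, best_idx):
    """Plot the residuals and mark the minimum (hypothetical helper)."""
    import matplotlib.pyplot as plt
    plt.plot(ssqr, label="ssqr")
    plt.scatter(best_idx, ssqr[best_idx], color="red", zorder=3)
    plt.annotate(f"best match at {best_idx}",
                 xy=(best_idx, ssqr[best_idx]),
                 xytext=(5, 10), textcoords="offset points")
    plt.legend()
    plt.show()

print(best_idx, best_val)
```

Calling `plot_best_match(ssqr, best_idx)` draws the curve with the minimum marked; the x coordinate is the window's start index, so `data[best_idx:best_idx + m]` is the matching slice.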

What am I doing wrong here?

By the way, the pattern is from the original dataset so it should give one perfect match.

Thanks, I appreciate it.
Benjamin

Posted
Updated 7-Oct-21 9:21am
v2
Comments
Richard MacCutchan 8-Oct-21 4:20am    
Why not ask the person on StackOverflow who wrote it?
Benjamin di Lorenzo 8-Oct-21 4:36am    
Well, I did, and I will probably never get an answer, because the topic is old and there is no way to contact members by message on Stack Overflow ;)
CHill60 8-Oct-21 4:56am    
When you respond to a comment use the "Reply" link so that the member you are responding to is notified. Otherwise you are relying on them coming back to this post - unlikely to happen
Benjamin di Lorenzo 8-Oct-21 5:32am    
Thanks, do you mean here at CodeProject or at Stack Overflow? At SO I cannot find a "reply" button.
CHill60 8-Oct-21 6:28am    
I meant here in this instance. But on Stack Overflow, once you have enough rep you can comment on a post. The author of the post will be notified. You can also use the @username convention there. See Privileges - Comment everywhere - Stack Overflow

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


