Hello People.
I found this code on Stack Overflow. It takes a more mathematical approach to finding patterns (shapes, or more precisely, a given sequence of numbers) in a dataset such as an array.
import yfinance as yf
import matplotlib.pyplot as plt
import numpy as np
stock = 'TSLA'
start = '2020-11-01'
df = yf.download(stock , start=start)
nfx = []
nfy = []
for index, row in df.iterrows():
    # list.append returns None, so the result is not assigned to anything
    nfx.append(index)
    nfy.append(row['Close'])
SampleTarget = nfy
Pattern = np.array([641.760009765625, 649.8800048828125, 604.47998046875, 627.0700073242188, 609.989990234375, 639.8300170898438, 633.25, 622.77001953125, 655.9000244140625, 695.0, 649.8599853515625, 640.3400268554688, 645.97998046875])
pat = np.array(Pattern)
data = np.array(SampleTarget)
n = len(data)
m = len(pat)
k = data.strides[0]
data_2d = np.lib.stride_tricks.as_strided(data, shape=(n-m+1, m), strides=(k, k))
print(np.all(np.isclose(data_2d, pat), axis=1))
np.linalg.norm(data_2d - pat, axis=1)
data_mod = data*1.1
np.random.seed(1)
data_mod[16:16+m] = pat*0.5 + np.random.uniform(-0.5, 0.5, size=m)
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))
pat_inv = 1/(pat @ pat) * pat
cofs = data_2d_mod @ pat_inv
ssqr = ((data_2d_mod - cofs.reshape(-1, 1) * pat)**2).sum(axis=1)
print(f'cofs:\n{np.around(cofs, 2)}')
print(f'ssqr:\n{np.around(ssqr, 1)}')
I understand what the code is trying to do up to this part:
data_mod = data*1.1
np.random.seed(1)
data_mod[16:16+m] = pat*0.5 + np.random.uniform(-0.5, 0.5, size=m)
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))
pat_inv = 1/(pat @ pat) * pat
cofs = data_2d_mod @ pat_inv
ssqr = ((data_2d_mod - cofs.reshape(-1, 1) * pat)**2).sum(axis=1)
Here I only understand the idea of building a modified test set: the data is scaled by 1.1, and a copy of the pattern scaled by 0.5 with some added noise is written into it (so the values are rescaled and distorted a bit).
But what is the
data_2d_mod = np.lib.stride_tricks.as_strided(
    data_mod, shape=(n-m+1, m), strides=(k, k))
part doing? In particular, I don't see how this is a good solution, because the
np.lib.stride_tricks.as_strided
line seems to change the original array (there is a warning about this in the documentation, on the numpy.lib.stride_tricks.as_strided page of the NumPy v1.21 Manual).
It would be wonderful if someone could explain in a bit more detail what the lower part of the code does.
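To make the question more concrete, here is a minimal sketch of what I believe the as_strided call produces, using a toy array instead of the TSLA prices (the array, the window length 3, and the names windows and safe are made up for illustration). As far as I can tell, it builds a view in which row i is data[i:i+m], without copying and without modifying data.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided, sliding_window_view

data = np.arange(6, dtype=float)   # toy stand-in for the price series
m = 3                              # toy pattern length
n = len(data)
k = data.strides[0]                # bytes between neighbouring elements

# Row i of the view is data[i:i+m]; no values are copied.
windows = as_strided(data, shape=(n - m + 1, m), strides=(k, k))
print(windows)

# The view shares memory with `data`. Creating it leaves `data` untouched;
# only writing into the view would also change `data`.
shares = np.shares_memory(windows, data)
unchanged = np.array_equal(data, np.arange(6, dtype=float))

# sliding_window_view (NumPy >= 1.20) builds the same windows as a read-only view.
safe = sliding_window_view(data, m)
same = np.array_equal(windows, safe)
```

If this reading is right, the documentation warning concerns writing through the view or passing a wrong shape or strides, not the read-only use in the code above.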
I also wanted to find the minima and maxima and plot them together with the residuals. I tried these lines:
print(f'cofs:\n{np.around(cofs, 2)}')
print(f'ssqr:\n{np.around(ssqr, 1)}')
minimums = min(ssqr)
print(minimums)
y_v = ssqr[int(min(ssqr))]
print(y_v)
plt.plot(ssqr)
plt.scatter(minimums, y_v)
plt.show()
plt.plot(pat)
plt.show()
What comes out looks like nonsense to me. Why is that?
I thought ssqr would give a score for how well the pattern matches at each position in the original dataset (lowest is the best fit), and "cofs" would give the factor by which the pattern is scaled to produce that score.
Why do I get such strange values?
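To check my understanding of cofs and ssqr, here is a small sketch with made-up numbers (the toy pat and data below are not the TSLA values, and sliding_window_view stands in for the as_strided call). The data contains 0.5 * pat starting at index 2, so I would expect a coefficient of 0.5 and a residual of about zero there.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

pat = np.array([1.0, 3.0, 2.0])                        # toy pattern
data = np.array([5.0, 4.0, 0.5, 1.5, 1.0, 7.0, 2.0])   # holds 0.5 * pat at index 2

windows = sliding_window_view(data, len(pat))           # row i is data[i:i+3]

# Least-squares scale factor of each window with respect to the pattern:
# cof = (window . pat) / (pat . pat)
pat_inv = pat / (pat @ pat)
cofs = windows @ pat_inv

# Sum of squared residuals left after subtracting the scaled pattern.
ssqr = ((windows - cofs.reshape(-1, 1) * pat) ** 2).sum(axis=1)

print(np.around(cofs, 2))
print(np.around(ssqr, 2))
```

In this toy case the smallest ssqr sits at index 2 with a coefficient of 0.5. Note that ssqr is not normalized, so windows with larger values can produce larger residuals even when the shape is similar.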
Also, the simple min() function does not seem to really find the minimum of the ssqr set.
Additionally, I wanted to label the absolute minimum in the plot to see where the pattern matches best.
What am I doing wrong here?
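One thing I suspect, shown as a sketch with made-up scores (the ssqr values and the names best_value, best_index and wrong_index are only for illustration): min() returns the smallest value, not its position, so using int(min(ssqr)) as an index picks an unrelated element. np.argmin returns the position instead.

```python
import numpy as np

ssqr = np.array([31.4, 12.0, 0.3, 18.9, 25.6])   # made-up scores

best_value = ssqr.min()             # the smallest score (a y value)
best_index = int(np.argmin(ssqr))   # where it occurs (an x value)

# int(min(ssqr)) truncates the score itself and uses it as a position:
wrong_index = int(min(ssqr))        # int(0.3) == 0
print(best_index, best_value, wrong_index, ssqr[wrong_index])
```

With these values, the marker would presumably go at plt.scatter(best_index, best_value), and plt.annotate('best match', (best_index, best_value)) could label it.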
By the way, the pattern is taken from the original dataset, so it should give one perfect match.
Thanks, I appreciate it.
Benjamin