Click here to Skip to main content
15,888,521 members
Please Sign up or sign in to vote.
0.00/5 (No votes)
See more:
I am calculating lags from a data frame from which I need to carry out regressions per day.

I used the below code to get the parse the dates by combining the date and input time (timestamp) for the data and the root specification from my pc.

df_fp =pd.DataFrame(pd.read_csv(root+'/GEM10_FullPeriod_LIQ.csv',parse_dates=[['refDate','inputtime']]))
I set up the index using this below code

df_fp.set_index('refDate_inputtime', inplace=True)

The snippet of data parsed

I am supposed to create 6 lags of each variable to start the manipulations to I use the below code

df_fp_lag = df_fp.copy()
trailing_window_size = 6

for window in range(1, trailing_window_size + 1):
shifted = df_fp.shift(window)
shifted.columns = [x + "_lag" + str(window) for x in df_fp.columns]

df_fp_lag = pd.concat((df_fp_lag, shifted), axis=1)
df_fp_lag = df_fp_lag.dropna()
df_fp_lag
When I run the above code I get the below dataframe. It has 222208 rows × 14 columns

Data with the lagged variables

I could not manage to develop a for loop to extract data for each date through iterations. So I just tried to query the above for a single date and the code used is

df_fp_lagd = df_fp_lag.query("refDate_inputtime >= '2008-01-02 08:00:00' and refDate_inputtime <='2008-01-02 17:30:00'")

So from the above data, I opted to manually create new columns to calculate the difference between Real Value of a variable from its lag values. I could not figure out a for loop since I am totally new to Python.

The new formed columns from the differencing variable and its lag values up to 6 lags

Doing so I get this error message.

> <ipython-input-47-0ff0924bacc0>:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
I am able to run the OLS regression for the single day (first day) with the below code.

import statsmodels.api as sm
from statsmodels.formula.api import ols

xyxyxy = ols('c_spread ~ relative_spread_lag1+c2_spread+c3_spread+c4_spread+c5_spread', df_fp_lagd).fit()
print(xyxyxyx.summary())
I Appreciate help on how I can be able to develop a for loop to iterate over the many days and run regressions for each day and store the daily data separately so that it can be as well imported and manipulated in other software.

Thank you.

Link to the dataset used in this question

What I have tried:

df_fp_lag = df_fp.copy()
  trailing_window_size = 6

 for window in range(1, trailing_window_size + 1):
    shifted = df_fp.shift(window)
    shifted.columns = [x + "_lag" + str(window) for x in df_fp.columns]
    
    df_fp_lag = pd.concat((df_fp_lag, shifted), axis=1)
df_fp_lag = df_fp_lag.dropna()
df_fp_lag


xyxyxy = ols('c_spread ~ relative_spread_lag1+c2_spread+c3_spread+c4_spread+c5_spread', df_fp_lagd).fit()
print(xyxyxyx.summary())
Posted

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900