python - Pandas: how to compute the rolling sum of a variable over the last few days but only at a given hour?
I have a DataFrame as follows:
    import numpy as np
    import pandas as pd

    # 50,000 rows of random data at one-minute frequency
    df = pd.DataFrame({'x': np.random.randn(50000)},
                      index=pd.date_range('1/1/2000', periods=50000, freq='min'))

    df.head(10)
    Out[37]:
                                x
    2000-01-01 00:00:00 -0.699565
    2000-01-01 00:01:00 -0.646129
    2000-01-01 00:02:00  1.339314
    2000-01-01 00:03:00  0.559563
    2000-01-01 00:04:00  1.529063
    2000-01-01 00:05:00  0.131740
    2000-01-01 00:06:00  1.282263
    2000-01-01 00:07:00 -1.003991
    2000-01-01 00:08:00 -1.594918
    2000-01-01 00:09:00 -0.775230
I would like to create a variable that contains the sum of x
- over the last 5 days (not including the current observation)
- only considering observations that fall at the exact same hour as the current observation.
In other words:
- at index 2000-01-01 00:00:00, df['rolling_sum_same_hour'] contains the sum of the values of x observed at 00:00:00 during the last 5 days in the data (not including 2000-01-01, of course).
- at index 2000-01-01 00:01:00, df['rolling_sum_same_hour'] contains the sum of the values of x observed at 00:01:00 during the last 5 days, and so on.
The intuitive idea is that intraday prices have intraday seasonality, and I want to get rid of it that way.
I tried to use

    df['rolling_sum_same_hour'] = df.at_time(df.index.minute).rolling(window=5).sum()

with no success. Any ideas?
Many thanks!
Behold the power of groupby!
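    # df is defined above
    # group_keys=False keeps the original index so the result aligns with df;
    # rolling(10) sums 10 same-time observations, use rolling(5) for 5 days
    df['rolling_sum_by_time'] = df.groupby(df.index.time, group_keys=False)['x'].apply(lambda x: x.shift(1).rolling(10).sum())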
It's a big pill to swallow there, but we are grouping by time (as in Python's datetime.time), then getting the column we care about (otherwise apply would work on columns; now it works on the time groups), and then applying the function you want!
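As a cross-check on the idea above, here is a minimal sketch that uses transform instead of apply and compares one time slot against a manual at_time computation. The 8-day frame, the seed, the 5-row window, and the column name rolling_sum_same_time are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small illustrative frame: 8 days of minute data (size chosen arbitrarily).
rng = np.random.default_rng(0)
idx = pd.date_range("2000-01-01", periods=8 * 24 * 60, freq="min")
df = pd.DataFrame({"x": rng.standard_normal(len(idx))}, index=idx)

window = 5  # number of previous same-time observations to sum

# Group rows by time of day; within each group, shift by one row to
# exclude the current observation, then sum the previous `window` rows.
df["rolling_sum_same_time"] = (
    df.groupby(df.index.time)["x"]
      .transform(lambda s: s.shift(1).rolling(window).sum())
)

# at_time accepts a single time of day, so it can verify one slot by hand.
manual = df.at_time("00:01")["x"].shift(1).rolling(window).sum()
check = df.at_time("00:01")["rolling_sum_same_time"]
```

Note that the window counts rows within each time-of-day group, so it equals "the last 5 days" only if every day has an observation at that time. The first 5 days of each group are NaN because there is not enough history yet.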