python - Pandas aggregate list in resample/groupby -
i have dataframe in each instance has timestamp, id , list of numbers follows:
timestamp | id | lists ---------------------------------- 2016-01-01 00:00:00 | 1 | [2, 10] 2016-01-01 05:00:00 | 1 | [9, 10, 3, 5] 2016-01-01 10:00:00 | 1 | [1, 10, 5] 2016-01-02 01:00:00 | 1 | [2, 6, 7] 2016-01-02 04:00:00 | 1 | [2, 6] 2016-01-01 02:00:00 | 2 | [0] 2016-01-01 08:00:00 | 2 | [10, 3, 2] 2016-01-01 14:00:00 | 2 | [0, 9, 3] 2016-01-02 03:00:00 | 2 | [0, 9, 2]
for each id want resample day(and easy) , concatenate lists of instances happened in same day. resample + concat/sum not work because resample removes non-numeric columns (see here)
i want write similar this:
daily_data = data.groupby('id').resample('1d').concatenate() # .concatenate() not exist
result desired:
timestamp | id | lists ---------------------------------- 2016-01-01 | 1 | [2, 10, 9, 10, 3, 5, 1, 10, 5] 2016-01-02 | 1 | [2, 6, 7, 2, 6] 2016-01-01 | 2 | [0, 10, 3, 2] 2016-01-02 | 2 | [0, 9, 3, 0, 9, 2]
here can copy script generates input used description:
import pandas pd random import randint time = pd.to_datetime( ['2016-01-01 00:00:00', '2016-01-01 05:00:00', '2016-01-01 10:00:00', '2016-01-02 01:00:00', '2016-01-02 04:00:00', '2016-01-01 02:00:00', '2016-01-01 08:00:00', '2016-01-01 14:00:00', '2016-01-02 03:00:00' ] ) id_1 = [1] * 5 id_2 = [2] * 4 lists = [0] * 9 in range(9): l = [randint(0,10) _ in range(randint(1,5) ) ] l = list(set(l)) lists[i] = l data = {'timestamp': time, 'id': id_1 + id_2, 'lists': lists} example = pd.dataframe(data=data)
bonus points if there way optionally remove duplicates in concatenated list.
as pointed out @jezrael, works in pandas version 0.18.1+
set_index
'timestamp'
prep laterresample
groupby
'id'
column , selectlists
columns- after
resample
,sum
of lists concatenate them reset_index
columns in correct order
df.set_index('timestamp').groupby('id').lists.resample('d').sum() \ .reset_index('id').reset_index()
Comments
Post a Comment