python - Pandas aggregate list in resample/groupby -


i have dataframe in each instance has timestamp, id , list of numbers follows:

timestamp           | id | lists ---------------------------------- 2016-01-01 00:00:00 | 1  | [2, 10] 2016-01-01 05:00:00 | 1  | [9, 10, 3, 5] 2016-01-01 10:00:00 | 1  | [1, 10, 5] 2016-01-02 01:00:00 | 1  | [2, 6, 7] 2016-01-02 04:00:00 | 1  | [2, 6] 2016-01-01 02:00:00 | 2  | [0] 2016-01-01 08:00:00 | 2  | [10, 3, 2] 2016-01-01 14:00:00 | 2  | [0, 9, 3] 2016-01-02 03:00:00 | 2  | [0, 9, 2] 

for each id want resample day(and easy) , concatenate lists of instances happened in same day. resample + concat/sum not work because resample removes non-numeric columns (see here)

i want write similar this:

daily_data = data.groupby('id').resample('1d').concatenate() # .concatenate() not exist 

result desired:

timestamp  | id | lists ---------------------------------- 2016-01-01 | 1  | [2, 10, 9, 10, 3, 5, 1, 10, 5] 2016-01-02 | 1  | [2, 6, 7, 2, 6] 2016-01-01 | 2  | [0, 10, 3, 2] 2016-01-02 | 2  | [0, 9, 3, 0, 9, 2] 

here can copy script generates input used description:

import pandas pd  random import randint  time = pd.to_datetime( ['2016-01-01 00:00:00', '2016-01-01 05:00:00',                          '2016-01-01 10:00:00', '2016-01-02 01:00:00',                          '2016-01-02 04:00:00', '2016-01-01 02:00:00',                          '2016-01-01 08:00:00', '2016-01-01 14:00:00',                         '2016-01-02 03:00:00' ]                       )  id_1 = [1] * 5 id_2 = [2] * 4  lists = [0] * 9 in range(9):     l = [randint(0,10)  _ in range(randint(1,5) ) ]     l = list(set(l))     lists[i] = l  data = {'timestamp': time, 'id': id_1 + id_2, 'lists': lists}  example = pd.dataframe(data=data) 

bonus points if there way optionally remove duplicates in concatenated list.

as pointed out @jezrael, works in pandas version 0.18.1+

  • set_index 'timestamp' prep later resample
  • groupby 'id' column , select lists columns
  • after resample, sum of lists concatenate them
  • reset_index columns in correct order

df.set_index('timestamp').groupby('id').lists.resample('d').sum() \   .reset_index('id').reset_index() 

enter image description here


Comments

Popular posts from this blog

javascript - Thinglink image not visible until browser resize -

firebird - Error "invalid transaction handle (expecting explicit transaction start)" executing script from Delphi -

mongodb - How to keep track of users making Stripe Payments -