python - Pandas Apply groupby function to every column efficiently -
In pandas I can apply a groupby function to every column in a dataframe, as in:

    pt = df.groupby(['group']).sum().reset_index()

Let's say I want to apply the lambda function `lambda x: (0 < x).sum()` to count the cells with a positive value in them, and also include the count of total items in each group. Is there a more efficient way to apply it to all columns, other than repeating the code:
    import pandas as pd

    df = pd.DataFrame({'group': ['w', 'w', 'w', 'e', 'e', 'e', 'n'],
                       'a': [0, 1, 5, 0, 1, 5, 7],
                       'b': [1, 0, 5, 0, 0, 2, 0],
                       'c': [1, 1, 5, 0, 0, 5, 0],
                       'total': [2, 2, 15, 0, 1, 12, 7]})

    # check how many positive items are present in each group
    grp = df.groupby(['group'])
    pt1 = grp['a'].apply(lambda x: (0 < x).sum()).reset_index()
    pt2 = grp['b'].apply(lambda x: (0 < x).sum()).reset_index()
    pt3 = grp['c'].apply(lambda x: (0 < x).sum()).reset_index()
    pct = pd.merge(pt1, pt2, on=['group'])
    pct = pd.merge(pt3, pct, on=['group'])

    # get the total number of items and merge it with the counts
    pt = df.groupby(['group'])['total'].count().reset_index()
    pct = pd.merge(pt, pct, on=['group'])
Output:

      group  total  c  b  a
    0     e      3  1  2  1
    1     n      1  0  1  0
    2     w      3  3  2  2
What is an efficient way to write this for n columns?
The cleanest way I can think of is this:
    (df > 0).groupby(df['group']).agg({'a': 'sum', 'b': 'sum', 'c': 'sum', 'total': 'count'})

    Out:
             c  total    b    a
    group
    e      1.0      3  1.0  2.0
    n      0.0      1  0.0  1.0
    w      3.0      3  2.0  2.0
You can sort the columns and cast to int if you want:
    ((df > 0).groupby(df['group'])
     .agg({'a': 'sum', 'b': 'sum', 'c': 'sum', 'total': 'count'})
     .sort_index(axis=1).astype('int'))

    Out:
           a  b  c  total
    group
    e      2  1  1      3
    n      1  0  0      1
    w      2  2  3      3
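If the set of columns is not known in advance, the agg dict above can be built programmatically instead of being spelled out, which answers the "n columns" part of the question. A minimal sketch (data taken from the question; note that on Python 3 comparing the string column `group` with 0 raises a TypeError, so the key column is dropped before the comparison):

```python
import pandas as pd

df = pd.DataFrame({'group': ['w', 'w', 'w', 'e', 'e', 'e', 'n'],
                   'a': [0, 1, 5, 0, 1, 5, 7],
                   'b': [1, 0, 5, 0, 0, 2, 0],
                   'c': [1, 1, 5, 0, 0, 5, 0],
                   'total': [2, 2, 15, 0, 1, 12, 7]})

# Drop the string key column before comparing with 0: on Python 3,
# 'w' > 0 raises a TypeError, so (df > 0) only works on numeric columns.
num = df.drop(columns='group')

# Build the agg dict programmatically so the pattern scales to n columns:
# every value column sums its positive-cell booleans, 'total' counts rows.
agg_spec = {c: 'sum' for c in num.columns if c != 'total'}
agg_spec['total'] = 'count'

result = ((num > 0).groupby(df['group'])
          .agg(agg_spec)
          .astype('int')
          .reset_index())
print(result)  # groups come out sorted: e, n, w
```

This reproduces the counts from the question's output for any number of value columns, since the dict comprehension picks up whatever numeric columns the dataframe happens to have.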