python - Have Pandas column containing lists, how to pivot unique list elements to columns? -
i wrote web scraper pull information table of products , build dataframe. data table has description column contains comma separated string of attributes describing product. want create column in dataframe every unique attribute , populate row in column attribute's substring. example df below.
products date description product 2016-9-12 steel, red, high hardness product b 2016-9-11 blue, lightweight, steel product c 2016-9-12 red
i figure first step split description list.
in: df2 = df['description'].str.split(',') out: description ['steel', 'red', 'high hardness'] ['blue', 'lightweight', 'steel'] ['red']
my desired output looks table below. column names not particularly important.
products date steel_col red_col high hardness_col blue col lightweight_col product 2016-9-12 steel red high hardness product b 2016-9-11 steel blue lightweight product c 2016-9-12 red
i believe columns can set using pivot i'm not sure pythonic way populate columns after establishing them. appreciated.
update
thank answers. selected @maxu's response correct since seems more flexible, @pirsquared's gets similar result , may considered more pythonic approach. tested both version , both needed. thanks!
you can build sparse matrix:
in [27]: df out[27]: products date description 0 product 2016-9-12 steel, red, high hardness 1 product b 2016-9-11 blue, lightweight, steel 2 product c 2016-9-12 red in [28]: (df.set_index(['products','date']) ....: .description.str.split(',\s*', expand=true) ....: .stack() ....: .reset_index() ....: .pivot_table(index=['products','date'], columns=0, fill_value=0, aggfunc='size') ....: ) out[28]: 0 blue high hardness lightweight red steel products date product 2016-9-12 0 1 0 1 1 product b 2016-9-11 1 0 1 0 1 product c 2016-9-12 0 0 0 1 0 in [29]: (df.set_index(['products','date']) ....: .description.str.split(',\s*', expand=true) ....: .stack() ....: .reset_index() ....: .pivot_table(index=['products','date'], columns=0, fill_value='', aggfunc='size') ....: ) out[29]: 0 blue high hardness lightweight red steel products date product 2016-9-12 1 1 1 product b 2016-9-11 1 1 1 product c 2016-9-12 1
Comments
Post a Comment