python - Have Pandas column containing lists, how to pivot unique list elements to columns? -


i wrote web scraper pull information table of products , build dataframe. data table has description column contains comma separated string of attributes describing product. want create column in dataframe every unique attribute , populate row in column attribute's substring. example df below.

products     date        description product    2016-9-12   steel, red, high hardness product b    2016-9-11   blue, lightweight, steel product c    2016-9-12   red 

i figure first step split description list.

in: df2 = df['description'].str.split(',')  out: description ['steel', 'red', 'high hardness'] ['blue', 'lightweight', 'steel'] ['red'] 

my desired output looks table below. column names not particularly important.

products     date        steel_col  red_col    high hardness_col  blue col   lightweight_col product    2016-9-12   steel      red        high hardness product b    2016-9-11   steel                                    blue       lightweight product c    2016-9-12              red 

i believe columns can set using pivot i'm not sure pythonic way populate columns after establishing them. appreciated.

update

thank answers. selected @maxu's response correct since seems more flexible, @pirsquared's gets similar result , may considered more pythonic approach. tested both version , both needed. thanks!

you can build sparse matrix:

in [27]: df out[27]:     products       date                description 0  product  2016-9-12  steel, red, high hardness 1  product b  2016-9-11   blue, lightweight, steel 2  product c  2016-9-12                        red  in [28]: (df.set_index(['products','date'])    ....:    .description.str.split(',\s*', expand=true)    ....:    .stack()    ....:    .reset_index()    ....:    .pivot_table(index=['products','date'], columns=0, fill_value=0, aggfunc='size')    ....: ) out[28]: 0                    blue  high hardness  lightweight  red  steel products  date product 2016-9-12     0              1            0    1      1 product b 2016-9-11     1              0            1    0      1 product c 2016-9-12     0              0            0    1      0  in [29]: (df.set_index(['products','date'])    ....:    .description.str.split(',\s*', expand=true)    ....:    .stack()    ....:    .reset_index()    ....:    .pivot_table(index=['products','date'], columns=0, fill_value='', aggfunc='size')    ....: ) out[29]: 0                   blue high hardness lightweight red steel products  date product 2016-9-12                  1               1     1 product b 2016-9-11    1                         1         1 product c 2016-9-12                                  1 

Comments

Popular posts from this blog

javascript - Thinglink image not visible until browser resize -

firebird - Error "invalid transaction handle (expecting explicit transaction start)" executing script from Delphi -

mongodb - How to keep track of users making Stripe Payments -