python - Insert rows and add missing data


I wonder if someone could give me a few pointers on how to proceed with the following. Being a newbie to pandas, I feel that my overall knowledge and skill level are not yet sufficient to process the request I outline below.

I have a pandas DataFrame that holds a list of 2000+ part numbers. For each part there are the years of sale, the month number, the quantity sold and the sales value. In each year there are occasional missing months. In the example data shown below, for year 2007, month 11 is missing because there were no sales during that month. For 2008, months 11 & 12 are missing. I would like to insert the missing months for each year, adding a row that contains the appropriate year, month and a 0 value for qty and sales within each part_id group.

In total the data is approx. 60200 rows for approx. 2000 part IDs. I do not mind spending time on developing a solution, but a few pointers would aid my education.

    index  part_id  year  month   qty    sales
    60182    zzssl  2007      5  11.0   724.85
    60183    zzssl  2007      6   7.0   537.94
    60184    zzssl  2007      7  17.0  1165.02
    60185    zzssl  2007      8   3.0   159.56
    60186    zzssl  2007      9  67.0  4331.28
    60187    zzssl  2007     10  72.0  4582.98
    60188    zzssl  2007     12  42.0  2651.42
    60189    zzssl  2008      1  22.0  1422.32
    60190    zzssl  2008      2  16.0  1178.98
    60191    zzssl  2008      3  20.0  1276.60
    60192    zzssl  2008      4  28.0  2120.84
    60193    zzssl  2008      5   2.0    83.03
    60194    zzssl  2008      6  16.0  1250.24
    60195    zzssl  2008      9  17.0  1323.34
    60196    zzssl  2008     10   2.0   197.98
    60197    zzssl  2009      1  21.0  1719.30
    60198    zzssl  2009      2   1.0    78.15
    60199    zzssl  2009      3   3.0   281.34
    60200    zzssl  2009      4  25.0  2214.25
    60201    zzssl  2009      5  10.0   833.60
    60202    zzssl  2009      6   1.0    83.36
    60203    zzssl  2009      7   1.0    83.36

I think you need to first set_index, then unstack and reindex the columns by a MultiIndex created with from_product, and finally stack:

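    import numpy as np
    import pandas as pd

    # Target columns: every (value column, month 1..12) combination
    mux = pd.MultiIndex.from_product([['qty', 'sales'], np.arange(1, 13)])

    print (df.set_index(['part_id', 'year', 'month'])
             .unstack(fill_value=0)               # months become columns
             .reindex(columns=mux, fill_value=0)  # add months missing everywhere
             .stack()                             # months back to rows
             .rename_axis(['part_id', 'year', 'month'])
             .reset_index())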
       part_id  year  month   qty    sales
    0    zzssl  2007      1   0.0     0.00
    1    zzssl  2007      2   0.0     0.00
    2    zzssl  2007      3   0.0     0.00
    3    zzssl  2007      4   0.0     0.00
    4    zzssl  2007      5  11.0   724.85
    5    zzssl  2007      6   7.0   537.94
    6    zzssl  2007      7  17.0  1165.02
    7    zzssl  2007      8   3.0   159.56
    8    zzssl  2007      9  67.0  4331.28
    9    zzssl  2007     10  72.0  4582.98
    10   zzssl  2007     11   0.0     0.00
    11   zzssl  2007     12  42.0  2651.42
    12   zzssl  2008      1  22.0  1422.32
    13   zzssl  2008      2  16.0  1178.98
    14   zzssl  2008      3  20.0  1276.60
    15   zzssl  2008      4  28.0  2120.84
    16   zzssl  2008      5   2.0    83.03
    17   zzssl  2008      6  16.0  1250.24
    18   zzssl  2008      7   0.0     0.00
    19   zzssl  2008      8   0.0     0.00
    20   zzssl  2008      9  17.0  1323.34
    21   zzssl  2008     10   2.0   197.98
    22   zzssl  2008     11   0.0     0.00
    23   zzssl  2008     12   0.0     0.00
    24   zzssl  2009      1  21.0  1719.30
    25   zzssl  2009      2   1.0    78.15
    26   zzssl  2009      3   3.0   281.34
    27   zzssl  2009      4  25.0  2214.25
    28   zzssl  2009      5  10.0   833.60
    29   zzssl  2009      6   1.0    83.36
    30   zzssl  2009      7   1.0    83.36
    31   zzssl  2009      8   0.0     0.00
    32   zzssl  2009      9   0.0     0.00
    33   zzssl  2009     10   0.0     0.00
    34   zzssl  2009     11   0.0     0.00
    35   zzssl  2009     12   0.0     0.00
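
For anyone who finds the unstack/stack round trip hard to follow, the same "all 12 months per part and year" result can probably also be reached with two merges. This is only a sketch of an alternative, not the method used above: the helper name fill_missing_months and the tiny sample frame (including the part 'aaa') are made up for illustration, and how='cross' assumes pandas 1.2 or newer.

```python
import pandas as pd

def fill_missing_months(df):
    """Return one row per (part_id, year, month 1..12); gaps get qty/sales 0."""
    # Every (part_id, year) pair that actually occurs in the data.
    keys = df[['part_id', 'year']].drop_duplicates()
    months = pd.DataFrame({'month': range(1, 13)})
    # Cross join builds the complete grid of expected rows.
    grid = keys.merge(months, how='cross')
    # Left merge keeps every grid row; months without sales come back as NaN.
    out = grid.merge(df, on=['part_id', 'year', 'month'], how='left')
    out[['qty', 'sales']] = out[['qty', 'sales']].fillna(0)
    return out.sort_values(['part_id', 'year', 'month']).reset_index(drop=True)

# Hypothetical miniature of the question's data: two parts, a few months each.
sample = pd.DataFrame({
    'part_id': ['aaa', 'aaa', 'zzssl', 'zzssl', 'zzssl'],
    'year':    [2007, 2007, 2007, 2007, 2008],
    'month':   [1, 3, 10, 12, 2],
    'qty':     [5.0, 2.0, 72.0, 42.0, 16.0],
    'sales':   [50.0, 20.0, 4582.98, 2651.42, 1178.98],
})
filled = fill_missing_months(sample)
print(filled)
```

Only (part_id, year) pairs that already exist in the data are expanded, so a part with no sales at all in some year gets no rows for that year.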

If you only need the missing months between the first and last month present in each year:

    # Build a real date (first day of the month) so the rows can be resampled
    df['month'] = pd.to_datetime(df.month.astype(str) + '-01-'
                                 + df.year.astype(str))
    df = (df.set_index('month')
            .groupby(['part_id', 'year'])
            .resample('MS')    # 'MS' = month start frequency
            .asfreq()
            .fillna(0)
            # grouping columns are duplicated here in some pandas versions
            .drop(['part_id', 'year'], axis=1, errors='ignore')
            .reset_index())
    df['month'] = df['month'].dt.month

    print (df)

       part_id  year  month   qty    sales
    0    zzssl  2007      5  11.0   724.85
    1    zzssl  2007      6   7.0   537.94
    2    zzssl  2007      7  17.0  1165.02
    3    zzssl  2007      8   3.0   159.56
    4    zzssl  2007      9  67.0  4331.28
    5    zzssl  2007     10  72.0  4582.98
    6    zzssl  2007     11   0.0     0.00
    7    zzssl  2007     12  42.0  2651.42
    8    zzssl  2008      1  22.0  1422.32
    9    zzssl  2008      2  16.0  1178.98
    10   zzssl  2008      3  20.0  1276.60
    11   zzssl  2008      4  28.0  2120.84
    12   zzssl  2008      5   2.0    83.03
    13   zzssl  2008      6  16.0  1250.24
    14   zzssl  2008      7   0.0     0.00
    15   zzssl  2008      8   0.0     0.00
    16   zzssl  2008      9  17.0  1323.34
    17   zzssl  2008     10   2.0   197.98
    18   zzssl  2009      1  21.0  1719.30
    19   zzssl  2009      2   1.0    78.15
    20   zzssl  2009      3   3.0   281.34
    21   zzssl  2009      4  25.0  2214.25
    22   zzssl  2009      5  10.0   833.60
    23   zzssl  2009      6   1.0    83.36
    24   zzssl  2009      7   1.0    83.36
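
If the datetime conversion feels like a detour, the "fill only between the first and last month of each year" variant can also be sketched with a plain groupby loop and reindex on the integer month. This is an assumed alternative, not the resample approach above; the helper name fill_between and the small sample frame are hypothetical.

```python
import pandas as pd

def fill_between(df):
    """Fill gaps between the first and last month of each (part_id, year)."""
    pieces = []
    for (part, year), g in df.groupby(['part_id', 'year']):
        # Months from the first to the last one present in this group.
        months = range(int(g['month'].min()), int(g['month'].max()) + 1)
        filled = (g.set_index('month')[['qty', 'sales']]
                    .reindex(months, fill_value=0)  # new months get 0
                    .rename_axis('month')
                    .reset_index())
        # Put the group keys back as the leading columns.
        filled.insert(0, 'year', year)
        filled.insert(0, 'part_id', part)
        pieces.append(filled)
    return pd.concat(pieces, ignore_index=True)

# Hypothetical miniature of the question's data.
sample = pd.DataFrame({
    'part_id': ['zzssl'] * 5,
    'year':    [2007, 2007, 2007, 2008, 2008],
    'month':   [5, 10, 12, 1, 3],
    'qty':     [11.0, 72.0, 42.0, 22.0, 20.0],
    'sales':   [724.85, 4582.98, 2651.42, 1422.32, 1276.60],
})
between = fill_between(sample)
print(between)
```

A Python loop over roughly 2000 parts times a handful of years should still be quick at this data size, but the resample or merge approaches are likely to scale better on much larger frames.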
