python - Insert rows and add missing data
I wonder if someone could give me a few pointers on how to proceed with the following. Being a newbie to pandas, I feel that at the moment my overall knowledge and skill level are not sufficient to process the request I outline below.
I have a pandas DataFrame that holds a list of 2000+ part numbers. For each part there are the years in which the part was sold, the month number, the quantity sold and the sales value. For each year, there are occasional missing months. In the example data shown below, for year 2007, month 11 is missing as there were no sales during that month. For 2008, months 11 & 12 are missing. I would like to insert the missing months for each year, i.e. insert a row containing the appropriate year, month and a 0 value for qty and sales within each part_id group.
In total the data is approx. 60,200 rows covering approx. 2,000 part IDs. I do not mind spending time on developing a solution, but a few pointers would aid my education.
    index  part_id  year  month   qty    sales
    60182    zzssl  2007      5  11.0   724.85
    60183    zzssl  2007      6   7.0   537.94
    60184    zzssl  2007      7  17.0  1165.02
    60185    zzssl  2007      8   3.0   159.56
    60186    zzssl  2007      9  67.0  4331.28
    60187    zzssl  2007     10  72.0  4582.98
    60188    zzssl  2007     12  42.0  2651.42
    60189    zzssl  2008      1  22.0  1422.32
    60190    zzssl  2008      2  16.0  1178.98
    60191    zzssl  2008      3  20.0  1276.60
    60192    zzssl  2008      4  28.0  2120.84
    60193    zzssl  2008      5   2.0    83.03
    60194    zzssl  2008      6  16.0  1250.24
    60195    zzssl  2008      9  17.0  1323.34
    60196    zzssl  2008     10   2.0   197.98
    60197    zzssl  2009      1  21.0  1719.30
    60198    zzssl  2009      2   1.0    78.15
    60199    zzssl  2009      3   3.0   281.34
    60200    zzssl  2009      4  25.0  2214.25
    60201    zzssl  2009      5  10.0   833.60
    60202    zzssl  2009      6   1.0    83.36
    60203    zzssl  2009      7   1.0    83.36
I think you first need set_index, then unstack, then reindex the columns by a MultiIndex created with from_product, and finally stack:
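    import numpy as np
    import pandas as pd

    # Target columns: every (value column, month 1..12) combination
    mux = pd.MultiIndex.from_product([['qty', 'sales'], np.arange(1, 13)])
    print(df.set_index(['part_id', 'year', 'month'])
            .unstack(fill_value=0)               # months become columns
            .reindex(columns=mux, fill_value=0)  # add the months that never occur
            .stack()                             # months back to rows
            .rename_axis(['part_id', 'year', 'month'])
            .reset_index())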
        part_id  year  month   qty    sales
     0    zzssl  2007      1   0.0     0.00
     1    zzssl  2007      2   0.0     0.00
     2    zzssl  2007      3   0.0     0.00
     3    zzssl  2007      4   0.0     0.00
     4    zzssl  2007      5  11.0   724.85
     5    zzssl  2007      6   7.0   537.94
     6    zzssl  2007      7  17.0  1165.02
     7    zzssl  2007      8   3.0   159.56
     8    zzssl  2007      9  67.0  4331.28
     9    zzssl  2007     10  72.0  4582.98
    10    zzssl  2007     11   0.0     0.00
    11    zzssl  2007     12  42.0  2651.42
    12    zzssl  2008      1  22.0  1422.32
    13    zzssl  2008      2  16.0  1178.98
    14    zzssl  2008      3  20.0  1276.60
    15    zzssl  2008      4  28.0  2120.84
    16    zzssl  2008      5   2.0    83.03
    17    zzssl  2008      6  16.0  1250.24
    18    zzssl  2008      7   0.0     0.00
    19    zzssl  2008      8   0.0     0.00
    20    zzssl  2008      9  17.0  1323.34
    21    zzssl  2008     10   2.0   197.98
    22    zzssl  2008     11   0.0     0.00
    23    zzssl  2008     12   0.0     0.00
    24    zzssl  2009      1  21.0  1719.30
    25    zzssl  2009      2   1.0    78.15
    26    zzssl  2009      3   3.0   281.34
    27    zzssl  2009      4  25.0  2214.25
    28    zzssl  2009      5  10.0   833.60
    29    zzssl  2009      6   1.0    83.36
    30    zzssl  2009      7   1.0    83.36
    31    zzssl  2009      8   0.0     0.00
    32    zzssl  2009      9   0.0     0.00
    33    zzssl  2009     10   0.0     0.00
    34    zzssl  2009     11   0.0     0.00
    35    zzssl  2009     12   0.0     0.00
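For comparison, the same "all 12 months per year" result can probably also be reached with two merges instead of unstack/stack. This is only a sketch under some assumptions: the function name fill_all_months is made up for illustration, the frame has exactly the columns part_id, year, month, qty and sales, there is at most one row per (part_id, year, month), and pandas is at least 1.2 (for how='cross').

```python
import pandas as pd


def fill_all_months(df):
    """Return df with one row per (part_id, year, month 1..12); gaps get 0."""
    # Only (part_id, year) pairs that actually occur in the data are expanded.
    pairs = df[['part_id', 'year']].drop_duplicates()
    months = pd.DataFrame({'month': range(1, 13)})
    # Cross join: every observed pair combined with every month (pandas >= 1.2).
    full = pairs.merge(months, how='cross')
    # Left join brings in the existing values; months without sales become NaN.
    out = full.merge(df, on=['part_id', 'year', 'month'], how='left')
    out[['qty', 'sales']] = out[['qty', 'sales']].fillna(0)
    return out.sort_values(['part_id', 'year', 'month']).reset_index(drop=True)


# Tiny sample taken from the question's data
sample = pd.DataFrame({
    'part_id': ['zzssl', 'zzssl', 'zzssl'],
    'year': [2007, 2007, 2008],
    'month': [10, 12, 1],
    'qty': [72.0, 42.0, 22.0],
    'sales': [4582.98, 2651.42, 1422.32],
})
filled = fill_all_months(sample)
print(filled)
```

Like the unstack version, this only creates rows for years in which a part has at least one sale; a part with no sales at all in a given year gets no rows for that year.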
If you need the missing values only between the start and end month of each year:
    # Build a real datetime (first day of each month) so the rows can be resampled
    df['month'] = pd.to_datetime(df.month.astype(str) + '-01-' + df.year.astype(str))
    df = (df.set_index('month')
            .groupby(['part_id', 'year'])
            .resample('MS')      # month-start frequency
            .asfreq()
            .fillna(0)
            .drop(['part_id', 'year'], axis=1)
            .reset_index())
    # Convert the datetime back to a plain month number
    df['month'] = df['month'].dt.month
    print(df)

        part_id  year  month   qty    sales
     0    zzssl  2007      5  11.0   724.85
     1    zzssl  2007      6   7.0   537.94
     2    zzssl  2007      7  17.0  1165.02
     3    zzssl  2007      8   3.0   159.56
     4    zzssl  2007      9  67.0  4331.28
     5    zzssl  2007     10  72.0  4582.98
     6    zzssl  2007     11   0.0     0.00
     7    zzssl  2007     12  42.0  2651.42
     8    zzssl  2008      1  22.0  1422.32
     9    zzssl  2008      2  16.0  1178.98
    10    zzssl  2008      3  20.0  1276.60
    11    zzssl  2008      4  28.0  2120.84
    12    zzssl  2008      5   2.0    83.03
    13    zzssl  2008      6  16.0  1250.24
    14    zzssl  2008      7   0.0     0.00
    15    zzssl  2008      8   0.0     0.00
    16    zzssl  2008      9  17.0  1323.34
    17    zzssl  2008     10   2.0   197.98
    18    zzssl  2009      1  21.0  1719.30
    19    zzssl  2009      2   1.0    78.15
    20    zzssl  2009      3   3.0   281.34
    21    zzssl  2009      4  25.0  2214.25
    22    zzssl  2009      5  10.0   833.60
    23    zzssl  2009      6   1.0    83.36
    24    zzssl  2009      7   1.0    83.36
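If the groupby/resample chain gives trouble (how grouping columns are carried through it may differ between pandas versions), the "only between first and last month" variant can also be sketched with a plain per-group reindex. The function name fill_between_observed_months is hypothetical, and the sketch assumes integer month numbers and at most one row per (part_id, year, month).

```python
import pandas as pd


def fill_between_observed_months(df):
    """Fill gaps with 0 between the first and last observed month of each year."""
    pieces = []
    for (part_id, year), grp in df.groupby(['part_id', 'year'], sort=True):
        first, last = int(grp['month'].min()), int(grp['month'].max())
        # Every month number from the first to the last one seen in this year
        months = pd.RangeIndex(first, last + 1, name='month')
        filled = (grp.set_index('month')[['qty', 'sales']]
                     .reindex(months, fill_value=0)
                     .rename_axis('month')
                     .reset_index())
        # Put the group keys back as the leading columns
        filled.insert(0, 'year', year)
        filled.insert(0, 'part_id', part_id)
        pieces.append(filled)
    return pd.concat(pieces, ignore_index=True)


# Tiny sample taken from the question's data
sample_gaps = pd.DataFrame({
    'part_id': ['zzssl'] * 5,
    'year': [2008, 2008, 2008, 2009, 2009],
    'month': [6, 9, 10, 1, 2],
    'qty': [16.0, 17.0, 2.0, 21.0, 1.0],
    'sales': [1250.24, 1323.34, 197.98, 1719.30, 78.15],
})
between = fill_between_observed_months(sample_gaps)
print(between)
```

An explicit Python loop over roughly 2,000 parts times a few years is slower than a fully vectorised chain, but for about 60,000 rows it should still be acceptable, and each step is easy to inspect while learning.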