python 3.x - When a Pandas DataFrame is subset, does it need to be reindexed?
I am making a multi-plot using a pandas DataFrame and matplotlib. However, when I alter the DataFrame and remove one of the items, I get an error:
ValueError: cannot reindex from a duplicate axis
My initial code is the following, and it plots great, but I have a group (plot) that I don't need:
branchgroups = alldata['branchgroupings'].unique()
fig2 = plt.figure(figsize = (15,15))
for i,branchgroups in enumerate(branchgroups):
    ax = plt.subplot(3,3,i+1)
    idx = alldata['branchgroupings'] == branchgroups
    kmf.fit(t[idx], c[idx], label=branchgroups)
    kmf.plot(ax=ax, legend=False)
    plt.title(branchgroups)
    plt.xlabel('timeline in months')
    plt.xlim(0,150)
fig2.tight_layout()
fig2.suptitle('cumulative hazard function of employee groups', size = 16)
fig2.subplots_adjust(top=0.88, hspace = .4)
plt.show()
In branchgroups there are 7 items when I print them out:
['branchmgr', 'banker', 'service', 'mdoandrsm', 'sbrmandbbrm','fc', 'de']
The code above makes 7 plots nicely (one plot for each of the groups), but I don't need the 'de' grouping.
So, I dropped 'de' by performing the following:
#remove de data set
node = alldata[alldata.branchgroupings != 'de']
This drops the 'de' categories and reduces the number of rows. I checked head(), and it looks great; it is a new data frame.
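A minimal sketch of what this kind of filter does to the row labels, using a small made-up frame (only the column name branchgroupings comes from the post; the values and the duration column are assumptions for illustration). The filtered frame has fewer rows, but each surviving row keeps the index label it had in the original, which head() does not make obvious:

```python
import pandas as pd

# Hypothetical stand-in for alldata; the values are invented for illustration.
alldata = pd.DataFrame(
    {
        "branchgroupings": ["banker", "de", "fc", "de", "service"],
        "duration": [12, 30, 7, 45, 20],
    }
)

# Same style of filter as in the post: keep every row that is not 'de'.
node = alldata[alldata["branchgroupings"] != "de"]

# The surviving rows keep their original labels; nothing is renumbered.
remaining_labels = list(node.index)
n_rows = len(node)
print(remaining_labels, n_rows)
```

So the subset is a new DataFrame object, but its index is still a piece of the original index.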
Then, to modify the plot to give 6 groups and plot from the reduced data frame node, I used the same code with name changes: fig3 rather than fig2, and idx changed to idxx to prevent overwriting. Otherwise it's the same, except for the reference to the new data frame node:
groups = node['branchgroupings'].unique() #new data frame node
fig3 = plt.figure(figsize = (15,15))
for i,groups in enumerate(groups):
    ax = plt.subplot(3,2,i+1)
    idxx = node['branchgroupings'] == groups #new idxx rather than idx
    kmf.fit(t[idxx], c[idxx], label=groups)
    kmf.plot(ax=ax, legend=False)
    plt.title(groups)
    plt.xlabel('timeline in months')
    plt.xlim(0,150)
    if i == 0:
        plt.ylabel('frac employed after $n$ months')
    if i == 3:
        plt.ylabel('frac employed after $n$ months')
fig3.tight_layout()
fig3.suptitle('survivability of branch employees', size = 16)
fig3.subplots_adjust(top=0.88, hspace = .4)
plt.show()
Except, I get the error mentioned above:
cannot reindex from a duplicate axis
and the traceback shows it is associated with the line below:
kmf.fit(t[idxx], c[idxx], label=groups)
most likely due to the re-assignment in the line above it:
idxx = node['branchgroupings'] == groups
Do I need to do a reset/drop on the new data frame node to reset this?
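For context, here is a hedged sketch of the pandas behaviour that produces this family of error, on a made-up Series rather than the data from the post. Reindexing an object whose index contains repeated labels raises a ValueError, because pandas cannot tell which of the repeated rows a label refers to. Whether this is exactly the path taken inside t[idxx] is an assumption, and the wording of the message differs between pandas versions:

```python
import pandas as pd

# Hypothetical Series whose index repeats the label 0.
durations = pd.Series([12, 30, 7], index=[0, 0, 1])

try:
    # Asking pandas to align this Series to a different set of labels
    # fails, since label 0 is ambiguous.
    durations.reindex([0, 1, 2])
    raised = False
    message = ""
except ValueError as err:
    raised = True
    message = str(err)

print(raised, message)
```

If t and c carry the row labels of alldata while idxx carries those of node, pandas would have to align the two before applying the mask, which is where repeated labels could become a problem.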
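Update - this has been solved; I'm not sure how 'pythonic' it is, but it works:
Okay, after some more research on this, it seems that when slicing a dataframe there is an inheritance issue (the slice appears to inherit the index of the original). I found this out from a post here.
Initially, performing the following: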
node.index.is_unique
returns False
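As a side note, a small sketch of how the repeated labels could be inspected, again on an invented frame (the index values here are assumptions, not the real data). Index.duplicated(keep=False) flags every row whose label occurs more than once:

```python
import pandas as pd

# Hypothetical frame with a repeated index label (0 appears twice).
node = pd.DataFrame(
    {"branchgroupings": ["banker", "fc", "service"], "duration": [12, 7, 20]},
    index=[0, 0, 2],
)

is_unique = node.index.is_unique

# keep=False marks all occurrences of a repeated label, not just the later ones.
repeated_rows = node[node.index.duplicated(keep=False)]
print(is_unique)
print(repeated_rows)
```

Looking at the flagged rows can help tell whether the duplicates came from how the original frame was built (for example, from concatenating several files) rather than from the slice itself.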
To make a clean slice, the following steps are needed:
#create slice using .copy
node = alldata[['prodcat', 'duration', 'observed', 'branchgroupings']].copy()
#remove de data set
node = node.loc[node['branchgroupings'] != 'de'] #use .loc for a cleaner slice
#reset index to be unique
node['index'] = np.arange(len(node))
node = node.set_index('index')
Now performing node.index.is_unique
returns True
, and the error is gone.
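A possibly more compact variant of the same idea, sketched on made-up data: reset_index(drop=True) replaces the existing labels with a fresh 0..n-1 range without adding a helper column first. The frame below, including its repeated label, is an assumption for illustration; only the column name branchgroupings comes from the post.

```python
import pandas as pd

# Hypothetical stand-in for alldata with a repeated index label (0 appears twice).
alldata = pd.DataFrame(
    {
        "branchgroupings": ["banker", "fc", "de", "service"],
        "duration": [12, 7, 30, 20],
    },
    index=[0, 0, 1, 2],
)

# Filter out 'de' and take an explicit copy, as in the update above.
node = alldata.loc[alldata["branchgroupings"] != "de"].copy()
unique_before = node.index.is_unique

# Discard the old labels and number the rows 0..n-1.
node = node.reset_index(drop=True)
unique_after = node.index.is_unique
new_labels = list(node.index)
print(unique_before, unique_after, new_labels)
```

Any Series used together with the mask (t and c in the plotting code) would presumably need to be taken from the same reindexed frame, so that their labels line up with idxx.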