Calculating the Gini coefficient in Python/numpy
I'm calculating the Gini coefficient (similar to: Python - Gini coefficient calculation using Numpy) but I get an odd result. For a uniform distribution sampled with np.random.rand(), the Gini coefficient is 0.3, where I would have expected it to be close to 0 (perfect equality). What is going wrong here?
    import numpy as np
    import matplotlib.pyplot as plt

    def g(v):
        # Percentile edges: 0, 10, ..., 100
        bins = np.linspace(0., 100., 11)
        total = float(np.sum(v))
        yvals = []
        for b in bins:
            # Share of the total held by values at or below this percentile
            bin_vals = v[v <= np.percentile(v, b)]
            bin_fraction = (np.sum(bin_vals) / total) * 100.0
            yvals.append(bin_fraction)
        # perfect equality area
        pe_area = np.trapz(bins, x=bins)
        # lorenz area
        lorenz_area = np.trapz(yvals, x=bins)
        gini_val = (pe_area - lorenz_area) / float(pe_area)
        return bins, yvals, gini_val

    v = np.random.rand(500)
    bins, result, gini_val = g(v)

    plt.figure()
    plt.subplot(2, 1, 1)
    plt.plot(bins, result, label="observed")
    plt.plot(bins, bins, '--', label="perfect eq.")
    plt.xlabel("fraction of population")
    plt.ylabel("fraction of wealth")
    plt.title("gini: %.4f" % (gini_val))
    plt.legend()
    plt.subplot(2, 1, 2)
    plt.hist(v, bins=20)
    plt.show()

For a given set of numbers, the above code calculates the fraction of the total distribution's values in each percentile bin.
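The result:

[Figure: Lorenz curve of the sample against the perfect-equality line (top) and a histogram of v (bottom).]

Uniform distributions should be near "perfect equality", so the bending of the Lorenz curve seems off.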
This is to be expected. A random sample from a uniform distribution does not result in uniform values (i.e. values that are all relatively close to each other). With a little calculus, it can be shown that the expected value (in the statistical sense) of the Gini coefficient of a sample from the uniform distribution on [0, 1] is 1/3, so getting values around 1/3 for a given sample is reasonable.
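The "little calculus" can be sketched as follows. This is the population-level calculation for two independent draws X and Y from the uniform distribution on [0, 1], using the identity that the Gini coefficient is half the relative mean absolute difference; a finite sample will fluctuate around this value rather than hit it exactly.

```latex
\begin{aligned}
\mathbb{E}\,|X - Y|
  &= \int_0^1 \int_0^1 |x - y| \, dy \, dx
   = 2 \int_0^1 \int_0^x (x - y) \, dy \, dx
   = 2 \int_0^1 \frac{x^2}{2} \, dx
   = \frac{1}{3}, \\[4pt]
G &= \frac{\mathbb{E}\,|X - Y|}{2\,\mathbb{E}[X]}
   = \frac{1/3}{2 \cdot \tfrac{1}{2}}
   = \frac{1}{3}.
\end{aligned}
```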
You'll get a lower Gini coefficient with a sample such as v = 10 + np.random.rand(500). Those values are all close to 10.5; the relative variation is lower than in the sample v = np.random.rand(500). In fact, the expected value of the Gini coefficient for the sample base + np.random.rand(n) is 1/(6*base + 3).
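The formula 1/(6*base + 3) can be checked numerically. The sketch below is only an illustration: the helper name mean_sample_gini, the sample size, the number of trials, and the seed are all arbitrary choices made here, and the averages should land near the predicted values rather than match them exactly.

```python
import numpy as np

def gini(x):
    """Gini coefficient as half the relative mean absolute difference."""
    x = np.asarray(x, dtype=float)
    # Mean of |x_i - x_j| over all n*n ordered pairs (O(n**2) memory)
    mad = np.abs(np.subtract.outer(x, x)).mean()
    return 0.5 * mad / x.mean()

def mean_sample_gini(base, n=200, trials=200, seed=0):
    """Average Gini coefficient over many samples base + U(0, 1).

    Hypothetical helper for this check; n, trials and seed are arbitrary.
    """
    rng = np.random.default_rng(seed)
    return float(np.mean([gini(base + rng.random(n)) for _ in range(trials)]))

for base in (0, 1, 10, 100):
    predicted = 1.0 / (6 * base + 3)
    print(base, mean_sample_gini(base), predicted)
```

The simulated averages come out slightly below the prediction, because averaging over all n*n pairs includes the n zero differences of each value with itself.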
Here's a simple implementation of the Gini coefficient. It uses the fact that the Gini coefficient is half the relative mean absolute difference.
    import numpy as np

    def gini(x):
        # (Warning: This is a concise implementation, but it is O(n**2)
        # in time and memory, where n = len(x).  *Don't* pass in huge
        # samples!)

        # Mean absolute difference
        mad = np.abs(np.subtract.outer(x, x)).mean()
        # Relative mean absolute difference
        rmad = mad/np.mean(x)
        # Gini coefficient
        g = 0.5 * rmad
        return g

Here's the Gini coefficient for several samples of the form v = base + np.random.rand(500):
    In [80]: v = np.random.rand(500)

    In [81]: gini(v)
    Out[81]: 0.32760618249832563

    In [82]: v = 1 + np.random.rand(500)

    In [83]: gini(v)
    Out[83]: 0.11121487509454202

    In [84]: v = 10 + np.random.rand(500)

    In [85]: gini(v)
    Out[85]: 0.01567937753659053

    In [86]: v = 100 + np.random.rand(500)

    In [87]: gini(v)
    Out[87]: 0.0016594595244509495
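For samples too large for the O(n**2) pairwise version, one possible alternative is to sort the values first. This is a sketch of a different technique from the one used above, not part of the original answer: it relies on the standard identity that the sum of all pairwise absolute differences equals 2 * sum((2*i - n - 1) * x_i) over the sorted values with ranks i = 1..n. The names gini_sorted and gini_pairwise are chosen here for illustration.

```python
import numpy as np

def gini_sorted(x):
    """Gini coefficient in O(n log n) time and O(n) memory via sorting."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    ranks = np.arange(1, n + 1)
    # Equivalent to 0.5 * mean(|x_i - x_j|) / mean(x) over all n*n pairs
    return float(np.sum((2 * ranks - n - 1) * x) / (n * np.sum(x)))

def gini_pairwise(x):
    """Reference O(n**2) version: half the relative mean absolute difference."""
    x = np.asarray(x, dtype=float)
    mad = np.abs(np.subtract.outer(x, x)).mean()
    return 0.5 * mad / x.mean()

rng = np.random.default_rng(0)
v = rng.random(500)
# The two versions should agree up to floating-point rounding
print(np.isclose(gini_sorted(v), gini_pairwise(v)))
```

Both functions assume non-negative values with a positive sum, as is usual for a Gini coefficient.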