Calculating the Gini coefficient in Python/numpy


I'm calculating the Gini coefficient (similar to: Python - Gini coefficient calculation using numpy) but I get an odd result. For a uniform distribution sampled with np.random.rand(), the Gini coefficient is 0.3, whereas I would have expected it to be close to 0 (perfect equality). What is going wrong here?

    import numpy as np
    import matplotlib.pyplot as plt

    def g(v):
        # Percentile grid: 0%, 10%, ..., 100% of the population
        bins = np.linspace(0., 100., 11)
        total = float(np.sum(v))
        yvals = []
        for b in bins:
            # Share of the total held by values at or below this percentile
            bin_vals = v[v <= np.percentile(v, b)]
            bin_fraction = (np.sum(bin_vals) / total) * 100.0
            yvals.append(bin_fraction)
        # perfect equality area
        pe_area = np.trapz(bins, x=bins)
        # Lorenz area
        lorenz_area = np.trapz(yvals, x=bins)
        gini_val = (pe_area - lorenz_area) / float(pe_area)
        return bins, yvals, gini_val

    v = np.random.rand(500)
    bins, result, gini_val = g(v)
    plt.figure()
    plt.subplot(2, 1, 1)
    plt.plot(bins, result, label="observed")
    plt.plot(bins, bins, '--', label="perfect eq.")
    plt.xlabel("fraction of population")
    plt.ylabel("fraction of wealth")
    plt.title("gini: %.4f" % (gini_val))
    plt.legend()
    plt.subplot(2, 1, 2)
    plt.hist(v, bins=20)

For a given set of numbers, the above code calculates the fraction of the total distribution's values in each percentile bin.

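The result:

[Figure: top panel, observed Lorenz curve against the perfect-equality line (fraction of population vs. fraction of wealth), titled with the Gini value; bottom panel, histogram of v]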

A uniform distribution should be near "perfect equality", so the way the Lorenz curve bends away from that line looks off.

This is to be expected. A random sample from a uniform distribution does not result in uniform values (i.e. values that are all relatively close to each other). With a little calculus, it can be shown that the expected value (in the statistical sense) of the Gini coefficient of a sample from the uniform distribution on [0, 1] is 1/3, so getting values around 1/3 for a given sample is reasonable.

You'll get a lower Gini coefficient with a sample such as v = 10 + np.random.rand(500). Those values are all close to 10.5; the relative variation is lower than in the sample v = np.random.rand(500). In fact, the expected value of the Gini coefficient for the sample base + np.random.rand(n) is 1/(6*base + 3).
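The "little calculus" can be sketched as follows. This is a derivation of the population value, assuming the standard identity that the Gini coefficient equals the mean absolute difference divided by twice the mean; the symbols b (for base), U and U' are introduced here for illustration only. A finite sample will scatter around this value rather than hit it exactly.

```latex
% X = b + U and Y = b + U', with U, U' independent and uniform on [0, 1].
\begin{align*}
\mathbb{E}\,|X - Y|
  &= \mathbb{E}\,|U - U'|
   = \int_0^1 \!\! \int_0^1 |u - u'| \, du' \, du
   = 2 \int_0^1 \!\! \int_0^u (u - u') \, du' \, du
   = 2 \int_0^1 \frac{u^2}{2} \, du
   = \frac{1}{3}, \\
\mathbb{E}[X] &= b + \frac{1}{2}, \\
G &= \frac{\mathbb{E}\,|X - Y|}{2\,\mathbb{E}[X]}
   = \frac{1/3}{2b + 1}
   = \frac{1}{6b + 3}.
\end{align*}
```

Setting b = 0 gives 1/3, and b = 10 gives 1/63, roughly 0.0159.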

Here's a simple implementation of the Gini coefficient. It uses the fact that the Gini coefficient is half the relative mean absolute difference.

    def gini(x):
        # (Warning: This is a concise implementation, but it is O(n**2)
        # in time and memory, where n = len(x).  *Don't* pass in huge
        # samples!)

        # Mean absolute difference
        mad = np.abs(np.subtract.outer(x, x)).mean()
        # Relative mean absolute difference
        rmad = mad/np.mean(x)
        # Gini coefficient
        g = 0.5 * rmad
        return g
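For larger samples, the quadratic cost can be avoided by sorting first. The sketch below is one possible alternative, not part of the original answer: it relies on the identity that the sum of |x_i - x_j| over all pairs equals 2 * sum((2*i - n - 1) * x_sorted[i]) for ranks i = 1..n, and the name gini_sorted is made up for illustration.

```python
import numpy as np

def gini_sorted(x):
    """Gini coefficient via sorting: O(n log n) time, O(n) memory.

    Hypothetical helper; it should match the pairwise definition
    (mean over all n**2 pairs, divided by twice the mean) up to rounding.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    # Rank weights 2*i - n - 1 for i = 1..n
    weights = 2.0 * np.arange(1, n + 1) - n - 1
    return np.dot(weights, x) / (n * x.sum())

# Compare against the pairwise definition on a small sample
rng = np.random.default_rng(0)  # seed chosen arbitrarily
sample = rng.random(500)
pairwise = 0.5 * np.abs(np.subtract.outer(sample, sample)).mean() / sample.mean()
print(gini_sorted(sample), pairwise)
```

The two printed values should agree to floating-point precision, while the sorted version never builds the n-by-n difference matrix.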

Here's the Gini coefficient for several samples of the form v = base + np.random.rand(500):

    In [80]: v = np.random.rand(500)

    In [81]: gini(v)
    Out[81]: 0.32760618249832563

    In [82]: v = 1 + np.random.rand(500)

    In [83]: gini(v)
    Out[83]: 0.11121487509454202

    In [84]: v = 10 + np.random.rand(500)

    In [85]: gini(v)
    Out[85]: 0.01567937753659053

    In [86]: v = 100 + np.random.rand(500)

    In [87]: gini(v)
    Out[87]: 0.0016594595244509495
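These values line up with 1/(6*base + 3), which is about 0.333, 0.111, 0.0159 and 0.00166 for base = 0, 1, 10 and 100. A small script along the following lines can print the sampled value next to the formula. It is only a sketch: the gini function restates the pairwise implementation above, and the seed and the sample size n = 2000 are arbitrary choices.

```python
import numpy as np

def gini(x):
    # Pairwise implementation: O(n**2) in time and memory
    x = np.asarray(x, dtype=float)
    mad = np.abs(np.subtract.outer(x, x)).mean()
    return 0.5 * mad / x.mean()

rng = np.random.default_rng(12345)  # arbitrary seed, for repeatability
n = 2000                            # arbitrary sample size

results = {}
for base in (0, 1, 10, 100):
    v = base + rng.random(n)
    observed = gini(v)
    expected = 1.0 / (6 * base + 3)
    results[base] = (observed, expected)
    print(f"base={base:>3}  sample gini={observed:.5f}  1/(6*base+3)={expected:.5f}")
```

Each sampled value should land within a few percent of the formula, with the gap shrinking as n grows.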
