python - Compare 2 Pandas dataframes, row by row, cell by cell


I have two dataframes, df1 and df2, and I want to do the following, storing the results in df3:

    for each row in df1:
        for each row in df2:
            create a new row in df3 (called "df1-1, df2-1" or whatever) to store the results
            for each cell (column) in df1:
                for the cell in df2 whose column name is the same as the cell in df1:
                    compare the cells (using a comparison function func(a, b)) and,
                    depending on the result of the comparison, write the result into
                    the appropriate column of the "df1-1, df2-1" row of df3

For example, something like:

    df1
      a     b       c   d
    foo   bar  foobar   7
    gee  whiz    herp  10

    df2
      a    b       c   d
    zoo  car  foobar   8

    df3
    df1-df2              a               b                    c           d
    foo-zoo  func(foo,zoo)   func(bar,car)  func(foobar,foobar)   func(7,8)
    gee-zoo  func(gee,zoo)  func(whiz,car)    func(herp,foobar)  func(10,8)
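The desired df3 pairs every row of df1 with every row of df2, which is a cross join. Below is a minimal sketch of that pairing, assuming pandas 1.2 or later (for merge(how="cross")) and string column names. The func here is a hypothetical equality check standing in for your comparison function, and compare_all_pairs is an illustrative helper name, not a pandas API.

```python
import pandas as pd

# Sample data reconstructed from the example above
df1 = pd.DataFrame({"a": ["foo", "gee"], "b": ["bar", "whiz"],
                    "c": ["foobar", "herp"], "d": [7, 10]})
df2 = pd.DataFrame({"a": ["zoo"], "b": ["car"], "c": ["foobar"], "d": [8]})


def func(x, y):
    # Hypothetical placeholder comparison: True when the two cells are equal
    return x == y


def compare_all_pairs(left, right, compare):
    # Pair every row of `left` with every row of `right` (cross join);
    # shared column names get the suffixes _1 and _2
    pairs = left.merge(right, how="cross", suffixes=("_1", "_2"))
    out = pd.DataFrame(index=pairs.index)
    for col in left.columns.intersection(right.columns):
        # Apply the scalar comparison to each (left cell, right cell) pair
        out[col] = [compare(x, y) for x, y in zip(pairs[col + "_1"], pairs[col + "_2"])]
    # Label each result row like "foo-zoo", using the first column as the key
    # (labels can repeat if that column has duplicate values)
    key = left.columns[0]
    labels = pairs[key + "_1"].astype(str) + "-" + pairs[key + "_2"].astype(str)
    out.index = pd.Index(labels, name="df1-df2")
    return out


df3 = compare_all_pairs(df1, df2, func)
print(df3)
```

The list comprehension keeps func a plain two-argument scalar function, as in the question; if your comparison can operate on whole columns, you could pass pairs[col + "_1"] and pairs[col + "_2"] to it directly instead.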

I've started with this:

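    # iterrows() yields (index, row) tuples, so unpack both parts
    for i1, r1 in df1.iterrows():
        for i2, r2 in df2.iterrows():
            for c1 in r1:
                for c2 in r2:
                    pass  # not sure how to continue from here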

but I'm not sure how to continue from here, and I'd appreciate any help.
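For reference, one way to finish the loop-based attempt is sketched below. It is slow on large frames, but it mirrors the pseudocode literally. The func is a hypothetical equality check, and the "df1-0, df2-0" row labels are one assumed naming scheme.

```python
import pandas as pd

df1 = pd.DataFrame({"a": ["foo", "gee"], "b": ["bar", "whiz"],
                    "c": ["foobar", "herp"], "d": [7, 10]})
df2 = pd.DataFrame({"a": ["zoo"], "b": ["car"], "c": ["foobar"], "d": [8]})


def func(x, y):
    # Hypothetical placeholder comparison
    return x == y


rows = {}
# iterrows() yields (index, row) tuples, so unpack both parts
for i1, r1 in df1.iterrows():
    for i2, r2 in df2.iterrows():
        # one df3 row per (df1 row, df2 row) pair
        label = f"df1-{i1}, df2-{i2}"
        # compare only the cells whose column names appear in both frames
        rows[label] = {col: func(r1[col], r2[col])
                       for col in df1.columns if col in df2.columns}

# each inner dict becomes one row of df3
df3 = pd.DataFrame.from_dict(rows, orient="index")
print(df3)
```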

So, to continue the discussion from the comments: you can use vectorization, which is one of the selling points of a library like pandas or NumPy. Ideally, you shouldn't ever be calling iterrows(). Here is a slightly more explicit suggestion:

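    import pandas as pd

    # with df1 and df2 as provided above, for example:
    df3 = df1['a'] * 3 + df2['a']
    # recall that df2 has only 1 row, so pandas aligns on the index
    # and fills in NaN for the row df2 is missing:
    # 0    foofoofoozoo
    # 1             NaN
    # Name: a, dtype: object

    # more generally, since you know df1 and df2 share column names,
    # you can initialize df3 with those names
    df3 = pd.DataFrame(columns=df1.columns)
    for colname in df1:
        # func must accept two Series and work on them element-wise
        df3[colname] = func(df1[colname], df2[colname])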
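A runnable version of the per-column approach on the sample data, assuming a hypothetical func that tests element-wise equality. It uses Series.eq rather than ==, because == raises a ValueError when the two Series have different indexes (df1 has two rows, df2 only one), whereas eq aligns them first. Note that this pairs rows by index label rather than pairing every df1 row with every df2 row.

```python
import pandas as pd

df1 = pd.DataFrame({"a": ["foo", "gee"], "b": ["bar", "whiz"],
                    "c": ["foobar", "herp"], "d": [7, 10]})
df2 = pd.DataFrame({"a": ["zoo"], "b": ["car"], "c": ["foobar"], "d": [8]})


def func(x, y):
    # Hypothetical vectorized comparison: element-wise equality.
    # Series.eq aligns on the index first; labels missing from one side
    # compare as False.
    return x.eq(y)


df3 = pd.DataFrame(columns=df1.columns)
for colname in df1:
    # one call per column; each call processes the whole column at once
    df3[colname] = func(df1[colname], df2[colname])

print(df3)
```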

Now, you can have different functions applied to different columns by, say, creating lambda functions and zipping them with the column names:

    # example functions
    colafunc = lambda x, y: x + y
    colbfunc = lambda x, y: x - y
    # ... one function per column, in the same order as df1.columns
    columnfunctions = [colafunc, colbfunc]

    # initialize df3 as above
    df3 = pd.DataFrame(columns=df1.columns)
    for func, colname in zip(columnfunctions, df1.columns):
        df3[colname] = func(df1[colname], df2[colname])
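Zipping relies on the function list being in exactly the same order as df1.columns. A minimal alternative sketch keys each function by column name in a dict instead; the functions themselves are hypothetical examples picked to suit the sample data's types.

```python
import pandas as pd

df1 = pd.DataFrame({"a": ["foo", "gee"], "b": ["bar", "whiz"],
                    "c": ["foobar", "herp"], "d": [7, 10]})
df2 = pd.DataFrame({"a": ["zoo"], "b": ["car"], "c": ["foobar"], "d": [8]})

# Hypothetical per-column functions, keyed by column name so the pairing
# does not depend on list order
column_functions = {
    "a": lambda x, y: x + y,    # string concatenation
    "b": lambda x, y: x.eq(y),  # element-wise equality
    "c": lambda x, y: x.eq(y),
    "d": lambda x, y: x - y,    # numeric difference
}

df3 = pd.DataFrame(columns=df1.columns)
for colname, func in column_functions.items():
    # rows are aligned by index; df2's missing row 1 yields NaN or False
    df3[colname] = func(df1[colname], df2[colname])

print(df3)
```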

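The "gotcha" that comes to mind is that you need to be sure your function is applicable to the data in your columns. For instance, if you did df1['a'] - df2['a'] (with the df1 and df2 you have provided), it would raise a TypeError, because subtraction of two strings is undefined. Just something to be aware of.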
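A sketch of the gotcha and one way to guard against it. The safe_diff helper is hypothetical: it subtracts only when both columns have a numeric dtype (checked with pandas.api.types.is_numeric_dtype) and otherwise falls back to an equality check.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df1 = pd.DataFrame({"a": ["foo", "gee"], "b": ["bar", "whiz"],
                    "c": ["foobar", "herp"], "d": [7, 10]})
df2 = pd.DataFrame({"a": ["zoo"], "b": ["car"], "c": ["foobar"], "d": [8]})

raised = False
try:
    df1["a"] - df2["a"]
except TypeError as exc:
    # subtracting one string from another is undefined
    raised = True
    print("TypeError:", exc)


def safe_diff(x, y):
    # Hypothetical guard: subtract only numeric columns,
    # compare everything else for equality
    if is_numeric_dtype(x) and is_numeric_dtype(y):
        return x - y
    return x.eq(y)


df3 = pd.DataFrame(columns=df1.columns)
for colname in df1:
    df3[colname] = safe_diff(df1[colname], df2[colname])
print(df3)
```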


Edit, re: your comment: that's doable as well. Iterate over whichever dfX.columns is larger, so you don't run into a KeyError, and throw an if statement in there:

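    import numpy as np

    # ... all the other jazz (df3 initialized as above) ...
    # let's say df1 has columns ['a', 'b', 'c'] and df2 has ['a', 'b', 'c', 'd']
    # iterate over df2's columns, since df2 is the larger one
    for colname in df2:
        if colname not in df1:
            df3[colname] = np.nan  # no matching column in df1
        else:
            df3[colname] = func(df1[colname], df2[colname])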
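A self-contained sketch of the edit above, using hypothetical numeric data in which df2 has an extra column d, and a placeholder func that checks element-wise equality.

```python
import numpy as np
import pandas as pd

# Hypothetical data: df1 has columns a, b, c; df2 has a, b, c, d
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
df2 = pd.DataFrame({"a": [1, 0], "b": [3, 3], "c": [0, 6], "d": [9, 9]})


def func(x, y):
    # Hypothetical placeholder comparison: element-wise equality
    return x.eq(y)


# iterate over the larger frame's columns so every column gets a result
df3 = pd.DataFrame(columns=df2.columns)
for colname in df2:
    if colname not in df1:
        df3[colname] = np.nan  # no counterpart in df1
    else:
        df3[colname] = func(df1[colname], df2[colname])

print(df3)
```

If each frame could have columns the other lacks, iterating over df1.columns.union(df2.columns) and testing membership in both frames would cover both directions.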
