algorithm - Calculating cannibalization/attribution in R -


i have metric distributed among 4 categories a, b, c, d.

over period of time track movement in metric each category. sum of these movements represents quantity has either left or entered system elsewhere ('external').

# setup -------------------------------------------------------------------  categories <- letters[1:4] set.seed(1) movements <- lapply(categories, function(...) {round(runif(10, -10,10))*10}) names(movements) <- categories movements[['external']] <- reduce(`+`, movements)*-1 problem <- as.data.frame(movements) problem         b    c   d external 1  -50 -60   90   0       20 2  -30 -60  -60  20      130 3   10  40   30   0      -80 4   80 -20  -70 -60       70 5  -60  50  -50  70      -10 6   80   0  -20  30      -90 7   90  40 -100  60      -90 8   30 100  -20 -80      -30 9   30 -20   70  40     -120 10 -90  60  -30 -20       80 

where categories have undergone positive movements , others have undergone negative movements, can infer transfer within system.

# add transfer columns , initialise 0 --------------------------------  transfer_matrix <- combn(c(categories, 'external'), 2) transfer_list <- combn(c(categories, 'external'), 2, simplify=f) problem[,sapply(transfer_list, paste, collapse='.')] <- 0 paste(names(problem), collapse=', ')  [1] "a, b, c, d, external, a.b, a.c, a.d, a.external, b.c, b.d, b.external, c.d, c.external, d.external" 

for example a has decreased 50 , c has increased 90, can infer there transfer a c stored in variable a.c.

the rule calculating transfers proportional. when 'a' has decrease 50 , b has decreased 60, 50/(50+60) of increase in c should attributed 'a', , 60/(50+60) of increase in c should attributed b. , transfers in , out of system.

below shows full manual calculation of variables need, first row:

# manual calculation ------------------------------------------------------  row_limit <- 1  # change e.g. 1:10 problem[row_limit, 'a.b'] <- 0 problem[row_limit, 'a.c'] <- 90*(-50/(-50+-60)) problem[row_limit, 'a.d'] <- 0 problem[row_limit, 'a.external'] <- 20 * -50/(-50+-60) problem[row_limit, 'b.c'] <- 90*(-60/(-50+-60)) problem[row_limit, 'b.d'] <- 0  problem[row_limit, 'b.external'] <- 20 * -60/(-50+-60) problem[row_limit, 'c.d'] <- 0 problem[row_limit, 'c.external'] <- 0 problem[row_limit, 'd.external'] <- 0 

note since a.c = -c.a subset of possible transfers need calculated.

my question is, how can code above calculations programmatically, in concise , efficient way handle 10-20 categories , large number of rows?

i typically use data.table open suggestions of packages use.

below code checking output:

# checking ----------------------------------------------------------------  check <- function(problem, category, categories, transfer_list, transfer_matrix) {   out_columns <- sapply(transfer_list[transfer_matrix[1,] == category], paste, collapse='.')   in_columns <- sapply(transfer_list[transfer_matrix[2,] == category], paste, collapse='.')   stopifnot(length(c(out_columns, in_columns)) == length(categories)-1)    out_sum <- 0   if(length(out_columns) == 1) {     out_sum <- problem[,out_columns]   } else if(length(out_columns) > 1) {     out_sum <- reduce(`+`, problem[,out_columns])   }    in_sum <- 0        if(length(in_columns) == 1) {     in_sum <- problem[,in_columns]   }   else if(length(in_columns) > 1) {     in_sum <- reduce(`+`, problem[,in_columns])   }    lhs <- out_sum - in_sum   rhs <- -problem[, category]   sprintf('%s vs %s',lhs, rhs) }  # each category, actual vs expected sapply(c(categories,'external'), check, problem=problem, categories=c(categories,'external'), transfer_list=transfer_list,transfer_matrix=transfer_matrix)                 b           c            d           [1,] "50 vs 50" "60 vs 60"  "-90 vs -90" "0 vs 0"    [2,] "0 vs 30"  "0 vs 60"   "0 vs 60"    "0 vs -20"  [3,] "0 vs -10" "0 vs -40"  "0 vs -30"   "0 vs 0"    [4,] "0 vs -80" "0 vs 20"   "0 vs 70"    "0 vs 60"   [5,] "0 vs 60"  "0 vs -50"  "0 vs 50"    "0 vs -70"  [6,] "0 vs -80" "0 vs 0"    "0 vs 20"    "0 vs -30"  [7,] "0 vs -90" "0 vs -40"  "0 vs 100"   "0 vs -60"  [8,] "0 vs -30" "0 vs -100" "0 vs 20"    "0 vs 80"   [9,] "0 vs -30" "0 vs 20"   "0 vs -70"   "0 vs -40" [10,] "0 vs 90"  "0 vs -60"  "0 vs 30"    "0 vs 20"  

here 1 idea. believe output matches want.

#x row problem df #y column transfer_matrix check_pairs <- function(x,y){     #split y columns being compared . e.g. if col 1 'd' vs 'external', ...     <- y[1]  #would 'd'     b <- y[2]  #would 'external'     #if both pos, both neg, or 1 val 0, return 0     if( sign(x[a]) == sign(x[b]) | sign(x[[a]]) == 0){         return(0)     }else{ #else return formula manual calculation         return( x[[b]] * x[[a]] / sum( x[sign(x)==sign(x[[a]]) ] ) )     } }  #for each row of problem matrix, compare each column of transfer_matrix check_matrix_cols <- function(x){      return( apply(transfer_matrix, 2, function(y) check_pairs(x,y)) ) }  problem[,-seq(length(c(categories, 'external')))] <- t( apply(problem, 1, check_matrix_cols) )  sapply(c(categories,'external'), check, problem=problem, categories=c(categories,'external'), transfer_list=transfer_list,transfer_matrix=transfer_matrix)                       b              c            d            external        [1,] "50 vs 50"   "60 vs 60"     "-90 vs -90" "0 vs 0"     "-20 vs -20"    [2,] "30 vs 30"   "60 vs 60"     "60 vs 60"   "-20 vs -20" "-130 vs -130"  [3,] "-10 vs -10" "-40 vs -40"   "-30 vs -30" "0 vs 0"     "80 vs 80"      [4,] "-80 vs -80" "20 vs 20"     "70 vs 70"   "60 vs 60"   "-70 vs -70"    [5,] "60 vs 60"   "-50 vs -50"   "50 vs 50"   "-70 vs -70" "10 vs 10"      [6,] "-80 vs -80" "0 vs 0"       "20 vs 20"   "-30 vs -30" "90 vs 90"      [7,] "-90 vs -90" "-40 vs -40"   "100 vs 100" "-60 vs -60" "90 vs 90"      [8,] "-30 vs -30" "-100 vs -100" "20 vs 20"   "80 vs 80"   "30 vs 30"      [9,] "-30 vs -30" "20 vs 20"     "-70 vs -70" "-40 vs -40" "120 vs 120"   [10,] "90 vs 90"   "-60 vs -60"   "30 vs 30"   "20 vs 20"   "-80 vs -80"   

Comments

Popular posts from this blog

javascript - Thinglink image not visible until browser resize -

firebird - Error "invalid transaction handle (expecting explicit transaction start)" executing script from Delphi -

mongodb - How to keep track of users making Stripe Payments -