Spark Scala: filter tuples in a list


I have the RDD below:

val m = sc.parallelize(Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)), ("b",("y",2))))

I transformed the above RDD using groupByKey, as below:

val b = m.groupByKey.mapValues(_.toList)

Result:

(a,List((x,1), (y,2), (z,2)))
(b,List((x,1), (y,2)))

Now I want to keep only the tuples with the maximum value in each list. The expected result is:

(a,List((y,2), (z,2)))
(b,List((y,2)))

Considering the sequence given is a plain Scala Seq rather than an RDD:

val m = Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)), ("b",("y",2)))

val r1 = m
  .groupBy(_._1)
  .map { case (k, v) => k -> v.map(_._2) }
  .map { case (k, v) =>
    k -> {
      // sort descending by the Int component, then keep the leading run equal to the max
      val sorted = v.sortWith { case (x, y) => x._2 > y._2 }
      val max = sorted.head._2
      sorted.takeWhile(_._2 == max)
    }
  }
  .toList

This gives the result:

r1: List[(String, Seq[(String, Int)])] = List((b,List((y,2))), (a,List((y,2), (z,2))))
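The same filtering can also be done without sorting, by finding the maximum first and then keeping the pairs that match it. The following is a minimal sketch on plain Scala collections; `keepMax`, `grouped`, and `filtered` are hypothetical names introduced here for illustration, and `grouped` simply hard-codes the grouped data shown in the question.

```scala
// Hypothetical helper (not from the original post): keep every pair whose
// Int component equals the largest Int component in the list.
def keepMax(pairs: List[(String, Int)]): List[(String, Int)] =
  if (pairs.isEmpty) Nil // .max would throw on an empty list
  else {
    val max = pairs.map(_._2).max
    pairs.filter(_._2 == max)
  }

// The grouped data from the question, written out as a plain Map.
val grouped: Map[String, List[(String, Int)]] = Map(
  "a" -> List(("x", 1), ("y", 2), ("z", 2)),
  "b" -> List(("x", 1), ("y", 2))
)

// Apply the helper to each key's list.
val filtered: Map[String, List[(String, Int)]] =
  grouped.map { case (k, v) => k -> keepMax(v) }
```

On the RDD from the question, the same helper could presumably be applied as m.groupByKey.mapValues(vs => keepMax(vs.toList)); that line is an untested assumption here, since it needs a running SparkContext.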

