Spark Scala: filter tuples in a list

I have the RDD below:

val m = sc.parallelize(Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)), ("b",("y",2))))

I transformed the above RDD using groupByKey, as below:

val b = m.groupByKey.mapValues(_.toList)

Result:

(a,List((x,1), (y,2), (z,2))) (b,List((x,1), (y,2)))

Now I want to keep only the tuples with the maximum value in each list. The expected result would be:

(a,List((y,2), (z,2))) (b,List((y,2)))

Considering the given sequence is a plain Scala collection rather than an RDD: val m = Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)), ("b",("y",2)))

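val r1 = m.groupBy(_._1)
  // drop the key from each grouped element, keeping only the (name, value) pairs
  .map { case (k, v) => k -> v.map(_._2) }
  .map { case (k, v) =>
    k -> {
      // sort by value in descending order, then keep the leading run of maxima
      val sorted = v.sortWith { case (x, y) => x._2 > y._2 }
      val max = sorted.head._2
      sorted.takeWhile(_._2 == max)
    }
  }
  .toList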

which gives the result: r1: List[(String, Seq[(String, Int)])] = List((b,List((y,2))), (a,List((y,2), (z,2))))
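Sorting is not strictly needed to find the maxima. As a minimal sketch of an alternative, the version below finds the largest value in each list and filters on it. The names keepMax, KeepMaxExample and r2 are hypothetical and do not come from the original post, and the empty-list check is an added assumption, since grouped lists here always have at least one element.

```scala
object KeepMaxExample {
  // Hypothetical helper: keeps every pair whose Int component equals
  // the largest Int in the list. Returns the input unchanged if it is empty.
  def keepMax[A](xs: Seq[(A, Int)]): Seq[(A, Int)] =
    if (xs.isEmpty) xs
    else {
      val maxValue = xs.map(_._2).max
      xs.filter(_._2 == maxValue)
    }

  val m = Seq(("a", ("x", 1)), ("a", ("y", 2)), ("a", ("z", 2)), ("b", ("x", 1)), ("b", ("y", 2)))

  // Group by key, drop the key from the grouped values, keep only the maxima.
  val r2: Map[String, Seq[(String, Int)]] =
    m.groupBy(_._1).map { case (k, v) => k -> keepMax(v.map(_._2)) }

  def main(args: Array[String]): Unit = {
    println(r2("a")) // List((y,2), (z,2))
    println(r2("b")) // List((y,2))
  }
}
```

If m is the RDD from the question instead of a local Seq, the same helper should fit into the existing pipeline as m.groupByKey.mapValues(vs => KeepMaxExample.keepMax(vs.toList)), which keeps the work on the executors instead of collecting the data first.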
