Spark Scala: filter tuples in a list
I have the RDD below:
val m = sc.parallelize(Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)), ("b",("y",2))))
I transformed this RDD using groupByKey:
val b = m.groupByKey.mapValues(_.toList)
Result:
(a,List((x,1), (y,2), (z,2)))
(b,List((x,1), (y,2)))
Now I want to keep only the tuples with the maximum value in each list. The expected result is:
(a,List((y,2), (z,2)))
(b,List((y,2)))
Assuming the data is given as a local sequence:
val m = Seq(("a",("x",1)), ("a",("y",2)), ("a",("z",2)), ("b",("x",1)), ("b",("y",2)))
you can group by key, sort each group in descending order of value, and take the leading elements that share the maximum:
val r1 = m.groupBy(_._1)
  .map { case (k, v) => k -> v.map(_._2) }
  .map { case (k, v) =>
    k -> {
      // Sort descending by the Int value, then keep every pair tied for the maximum
      val sorted = v.sortWith { case (x, y) => x._2 > y._2 }
      val max = sorted.head._2
      sorted.takeWhile(_._2 == max)
    }
  }
  .toList
which gives the result:
r1: List[(String, Seq[(String, Int)])] = List((b,List((y,2))), (a,List((y,2), (z,2))))
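The answer above sorts each group and runs on a local Seq rather than on the RDD from the question. As a rough alternative sketch, the sort can be avoided by computing the maximum once and filtering on it. The helper name keepMax and the value r2 are hypothetical, introduced here only for illustration, and the example again uses plain Scala collections so it runs without a Spark context.

```scala
// Hypothetical helper (not part of the original answer): keep only the
// pairs whose Int value equals the maximum value in the list.
def keepMax(pairs: Seq[(String, Int)]): Seq[(String, Int)] =
  if (pairs.isEmpty) pairs // guard: max would throw on an empty collection
  else {
    val max = pairs.map(_._2).max
    pairs.filter(_._2 == max) // filter preserves the original order
  }

val m = Seq(("a", ("x", 1)), ("a", ("y", 2)), ("a", ("z", 2)), ("b", ("x", 1)), ("b", ("y", 2)))

// Local stand-in for groupByKey followed by mapValues(keepMax)
val r2 = m.groupBy(_._1).map { case (k, v) => k -> keepMax(v.map(_._2)) }
```

On the RDD from the question, the same helper should be usable as m.groupByKey.mapValues(vs => keepMax(vs.toSeq)), since groupByKey yields an Iterable per key; that line is untested here and assumes the RDD m defined with sc.parallelize.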