rdd.zipWithIndex().filter(_._2 == 9).map(_._1).first()
zipWithIndex transforms the RDD into an RDD of pairs (value, idx), with idx counting up from 0. filter keeps only the pair with idx == 9, that is, the 10th element. map drops the index and keeps the original value. Finally, first() returns that value to the driver.
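To see what each step produces, the same chain can be run on an ordinary Scala collection, which offers methods with the same names. This is only a sketch: the letters sequence is made up for illustration, head stands in for first(), and the collections version of zipWithIndex yields Int indices where Spark's yields Long.

```scala
object NthElementLocal {
  // Hypothetical sample data standing in for the RDD: "a" through "l".
  val letters: Seq[String] = ('a' to 'l').map(_.toString)

  // Step 1: pair every value with its zero-based position.
  val indexed: Seq[(String, Int)] = letters.zipWithIndex
  // Step 2: keep only the pair whose index is 9 (the 10th element).
  val kept: Seq[(String, Int)] = indexed.filter(_._2 == 9)
  // Step 3: drop the index and keep the original value.
  val values: Seq[String] = kept.map(_._1)
  // Step 4: take the single remaining element (head plays the role of first()).
  val tenth: String = values.head

  def main(args: Array[String]): Unit =
    println(tenth) // prints "j", the 10th letter
}
```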
Because the chain ends in first(), the execution engine may be able to take that into account and change how the whole pipeline runs, for instance by not scanning more partitions than it needs. Give it a try.
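In any case, even if n is very large, this method is efficient in that it does not need to collect an array of the first n elements on the driver node. Keep in mind that zipWithIndex itself triggers a Spark job when the RDD has more than one partition, because it has to count the elements of each partition before it can assign indices.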
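The one-liner can be wrapped in a small helper that works for any index. The following is a minimal sketch, not a definitive implementation: the object name NthElement, the helper nth, the local[*] master and the sample data are all assumptions made for illustration. It uses take(1) rather than first() so that an out-of-range index gives None instead of failing on an empty result.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.reflect.ClassTag

object NthElement {
  // Hypothetical helper: the element at zero-based position n, if there is one.
  def nth[T: ClassTag](rdd: RDD[T], n: Long): Option[T] =
    rdd.zipWithIndex()                        // RDD[(T, Long)]
      .filter { case (_, idx) => idx == n }   // keep only the requested index
      .map { case (value, _) => value }       // drop the index again
      .take(1)                                // at most one element reaches the driver
      .headOption                             // None when n is out of range

  def main(args: Array[String]): Unit = {
    // Assumed local session, for trying the helper without a cluster.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nth-element")
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 1000, numSlices = 4)
    println(nth(rdd, 9))    // Some(10): index 9 is the 10th element
    println(nth(rdd, 5000)) // None: the index is past the end

    spark.stop()
  }
}
```

Only the matching element is sent to the driver, whatever the value of n.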