Thursday, 25 April 2019

coalesce() vs repartition()


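Both coalesce() and repartition() change the number of partitions of an RDD. repartition() always performs a full shuffle; to avoid a full shuffle when reducing the number of partitions, we can use coalesce().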

// create an RDD with 15 partitions
val rdd1 = sc.parallelize(1 to 1000, 15)
rdd1.partitions.length

// reduce to 5 partitions without a shuffle
val rdd2 = rdd1.coalesce(5, shuffle = false)
rdd2.partitions.length

output
=====
Int = 15
Int = 5
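
To compare the two calls side by side, here is a minimal sketch. It assumes a local SparkSession created only for illustration (in spark-shell the provided sc can be used instead); the object name CoalesceVsRepartition and the helper partitionCounts are made up for this example. It also shows that coalesce() without a shuffle cannot increase the partition count, while repartition() can.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object CoalesceVsRepartition {
  // Hypothetical helper: partition counts after each operation on a 15-partition RDD.
  def partitionCounts(sc: SparkContext): (Int, Int, Int, Int) = {
    val rdd1 = sc.parallelize(1 to 1000, 15)
    val merged     = rdd1.coalesce(5, shuffle = false)  // merges partitions, no shuffle
    val reshuffled = rdd1.repartition(5)                // same as coalesce(5, shuffle = true)
    val notGrown   = rdd1.coalesce(30, shuffle = false) // cannot grow without a shuffle
    val grown      = rdd1.repartition(30)               // shuffle allows growing
    (merged.partitions.length, reshuffled.partitions.length,
      notGrown.partitions.length, grown.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (an assumption, not part of the original post).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("coalesce-vs-repartition")
      .getOrCreate()
    val sc = spark.sparkContext

    println(partitionCounts(sc)) // (5,5,15,30)

    // The lineage shows where a shuffle is introduced.
    val rdd1 = sc.parallelize(1 to 1000, 15)
    println(rdd1.coalesce(5, shuffle = false).toDebugString)
    println(rdd1.repartition(5).toDebugString)

    spark.stop()
  }
}
```

Both calls end up with 5 partitions; the difference is in the lineage, where only the repartition() version should show a shuffle step.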


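For example, suppose we want to go from 100 partitions down to 10. repartition(10) shuffles all the data across the cluster to build 10 new partitions. coalesce(10) reaches the same partition count without a full shuffle: each of the 10 new partitions claims a group of the existing partitions and merges them.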
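
The 100 to 10 case can be tried out with a small sketch like the one below. It assumes a local SparkSession and a 1000-element RDD chosen only for illustration; the object name CoalesceHundredToTen and the helper partitionSizes are hypothetical. It prints how many elements land in each of the 10 partitions rather than assuming a particular distribution.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object CoalesceHundredToTen {
  // Hypothetical helper: number of elements in each partition after coalesce.
  def partitionSizes(sc: SparkContext, from: Int, to: Int): Array[Int] =
    sc.parallelize(1 to 1000, from)
      .coalesce(to, shuffle = false)          // merge existing partitions, no full shuffle
      .mapPartitions(it => Iterator(it.size)) // one count per resulting partition
      .collect()

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (an assumption, not part of the original post).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("coalesce-100-to-10")
      .getOrCreate()

    val sizes = partitionSizes(spark.sparkContext, 100, 10)
    println(sizes.length)         // 10 partitions remain
    println(sizes.sum)            // 1000, no elements are lost
    println(sizes.mkString(", ")) // elements per partition

    spark.stop()
  }
}
```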
