Both repartition() and coalesce() change the number of partitions of an RDD. To reduce the partition count without a full shuffle, use coalesce().
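// Create an RDD with 15 partitions (run in spark-shell, where sc is predefined)
val rdd1 = sc.parallelize(1 to 1000, 15)
rdd1.partitions.length
// Reduce to 5 partitions; shuffle = false merges partitions without a shuffle
val rdd2 = rdd1.coalesce(5, false)
rdd2.partitions.length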
Output
======
Int = 15
Int = 5
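For example, when going from 100 partitions down to 10 with coalesce(), there is no shuffle: each of the 10 new partitions claims 10 of the existing partitions. repartition(10) produces the same partition count, but it performs a full shuffle to get there.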
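To see the difference for yourself, here is a minimal sketch that compares the two calls on the same RDD. It assumes a spark-shell session where sc is already defined; the names base, merged and reshuffled are only illustrative. The lineage printed by toDebugString should show a ShuffledRDD for repartition() and none for coalesce().

```scala
// Assumes spark-shell, where `sc` (the SparkContext) is predefined.
// The RDD names below are illustrative only.
val base = sc.parallelize(1 to 1000, 100)

// coalesce with the default shuffle = false merges existing partitions.
val merged = base.coalesce(10)
println(merged.partitions.length)      // 10
println(merged.toDebugString)          // lineage has a CoalescedRDD, no ShuffledRDD

// repartition(n) is coalesce(n, shuffle = true), so it always shuffles.
val reshuffled = base.repartition(10)
println(reshuffled.partitions.length)  // 10
println(reshuffled.toDebugString)      // lineage includes a ShuffledRDD
```

Both calls end with 10 partitions and the same data; only the way the data gets there differs. Because coalesce() without a shuffle only merges partitions, it cannot increase the partition count. Use repartition() for that.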