Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.5.online-jd2.4.5.16-202012212053
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.
union
def union(other: RDD[T]): RDD[T]
This simple function merges two RDDs without deduplicating the result.
scala> var rdd1 = sc.makeRDD(1 to 2, 1)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[45] at makeRDD at <console>:21

scala> rdd1.collect
res42: Array[Int] = Array(1, 2)

scala> var rdd2 = sc.makeRDD(2 to 3, 1)
rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[46] at makeRDD at <console>:21

scala> rdd2.collect
res43: Array[Int] = Array(2, 3)

scala> rdd1.union(rdd2).collect
res44: Array[Int] = Array(1, 2, 2, 3)
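Because union keeps duplicates, chaining distinct afterwards gives set-union semantics when they are needed. A minimal sketch as a standalone program, assuming Spark is on the classpath and run in local mode (the object and app names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UnionDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("union-demo").setMaster("local[1]"))

    val rdd1 = sc.makeRDD(1 to 2, 1)
    val rdd2 = sc.makeRDD(2 to 3, 1)

    // union keeps duplicates: Array(1, 2, 2, 3)
    println(rdd1.union(rdd2).collect.sorted.mkString(","))

    // distinct removes them: Array(1, 2, 3)
    println(rdd1.union(rdd2).distinct.collect.sorted.mkString(","))

    sc.stop()
  }
}
```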
intersection
def intersection(other: RDD[T]): RDD[T]
def intersection(other: RDD[T], numPartitions: Int): RDD[T]
def intersection(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
This function returns the intersection of the two RDDs, with duplicates removed.
The numPartitions parameter specifies the number of partitions of the returned RDD.
The partitioner parameter specifies the partitioner to use for the returned RDD.
scala> var rdd1 = sc.makeRDD(Seq(1, 2, 2, 3))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[2] at makeRDD at <console>:24

scala> var rdd2 = sc.makeRDD(Seq(2, 3, 3, 4, 5))
rdd2: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[3] at makeRDD at <console>:24

scala> var rdd3 = rdd1.intersection(rdd2).collect
rdd3: Array[Int] = Array(2, 3)
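The transcript above only exercises the one-argument overload. A minimal sketch of the numPartitions and partitioner overloads, assuming Spark is on the classpath and run in local mode (the object and app names are illustrative):

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

object IntersectionDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("intersection-demo").setMaster("local[2]"))

    val rdd1 = sc.makeRDD(Seq(1, 2, 2, 3))
    val rdd2 = sc.makeRDD(Seq(2, 3, 3, 4, 5))

    // numPartitions overload: the result is hash-partitioned into 4 partitions
    val byCount = rdd1.intersection(rdd2, 4)
    println(byCount.getNumPartitions)                    // 4
    println(byCount.collect.sorted.mkString(","))        // 2,3

    // partitioner overload: an explicit HashPartitioner controls the shuffle
    val byPartitioner = rdd1.intersection(rdd2, new HashPartitioner(3))
    println(byPartitioner.collect.sorted.mkString(","))  // 2,3

    sc.stop()
  }
}
```

Either way the result is the same deduplicated intersection; the overloads only control how it is partitioned.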
subtract
def subtract(other: RDD[T]): RDD[T]
def subtract(other: RDD[T], numPartitions: Int): RDD[T]
def subtract(other: RDD[T], partitioner: Partitioner)(implicit ord: Ordering[T] = null): RDD[T]
This function is similar to intersection, but returns the elements that appear in this RDD and not in the other RDD, without deduplication.
The parameters have the same meaning as in intersection.
scala> var rdd1 = sc.makeRDD(Seq(1, 1, 1, 2, 2, 3, 6, 6, 7))
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[16] at makeRDD at <console>:24

scala> var rdd3 = rdd1.subtract(rdd2).collect
rdd3: Array[Int] = Array(1, 1, 1, 6, 6, 7)
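Note that, unlike intersection, subtract keeps the duplicates of the surviving elements: every 1, 6, and 7 from rdd1 remains in the result. A minimal standalone sketch, assuming Spark is on the classpath and run in local mode (the object and app names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SubtractDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("subtract-demo").setMaster("local[1]"))

    val rdd1 = sc.makeRDD(Seq(1, 1, 1, 2, 2, 3, 6, 6, 7))
    val rdd2 = sc.makeRDD(Seq(2, 3, 3, 4, 5))

    // Elements of rdd1 whose value appears anywhere in rdd2 are dropped;
    // the rest keep their multiplicity: Array(1, 1, 1, 6, 6, 7)
    println(rdd1.subtract(rdd2).collect.sorted.mkString(","))

    sc.stop()
  }
}
```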
end