Transformations and actions of the RDD
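The two-RDD examples (intersection(), subtract(), cartesian()) use one RDD containing {1, 2, 3} and another containing {3, 4, 5}. The remaining examples assume a single RDD containing {1, 2, 3, 3}, as their results show.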
Function | Purpose | Example | Result |
---|---|---|---|
map() | Apply the function to each element | rdd.map(x=>x+1) | {2,3,4,4} |
intersection() | Return only the elements that appear in both RDDs | rdd.intersection(other) | {3} |
subtract() | Keep the elements that exist in the first RDD but not in the second (a typical use: removing training data in machine learning) | rdd.subtract(other) | {1, 2} |
cartesian() | Cartesian product | rdd.cartesian(other) | {(1, 3), (1, 4),… (3, 5)} |
collect() | Returns all elements of the RDD | rdd.collect() | {1, 2, 3, 3} |
count() | Returns the number of elements in the RDD | rdd.count() | 4 |
countByValue() | Returns a map of the number of occurrences of unique elements | rdd.countByValue() | {(1,1),(2,1), (3,2)} |
take(num) | Returns num elements | rdd.take(2) | {1, 2} |
top(num) | Returns the top (largest) num elements | rdd.top(2) | {3, 3} |
takeOrdered(num)(ordering) | Returns num elements based on the provided ordering | rdd.takeOrdered(2)(MyOrdering) | {3, 3} |
takeSample(withReplacement,num,[seed]) | Returns num elements chosen at random | rdd.takeSample(false, 1) | Nondeterministic |
reduce(func) | Merge the elements in the RDD | rdd.reduce((x, y ) => x+y ) | 9 |
fold(zero)(func) | Same as reduce(), but takes a zero (initial) value | rdd.fold(0)((x, y) => x + y) | 9 |
aggregate(zeroValue)(seqOp,combOp) | Similar to fold(), but the result may have a different type from the elements | rdd.aggregate((0, 0))((x, y) => (x._1 + y, x._2 + 1), (x, y) => (x._1 + y._1, x._2 + y._2)) | (9, 4) |
foreach(func) | Apply the function to each element of the RDD; nothing is returned | rdd.foreach(func) | Nothing |
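
The aggregate() row is the least obvious one in the table, so here is a minimal sketch of how its (9, 4) result (sum, count) comes about. It imitates the operation with plain Scala collections and needs no Spark. The object name `AggregateSketch` and the split into two partitions are assumptions made for illustration; in Spark the partitioning is decided by the cluster, and the same two functions would be passed to `rdd.aggregate`.

```scala
// Plain-Scala model of rdd.aggregate((0, 0))(seqOp, combOp); no Spark needed.
object AggregateSketch {
  // Assumed partitioning of the RDD {1, 2, 3, 3}; Spark chooses the real split.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 3))

  // Zero value: (running sum, running count).
  val zero: (Int, Int) = (0, 0)

  // seqOp: fold one element into a partition's accumulator.
  def seqOp(acc: (Int, Int), x: Int): (Int, Int) = (acc._1 + x, acc._2 + 1)

  // combOp: merge the accumulators of two partitions.
  def combOp(a: (Int, Int), b: (Int, Int)): (Int, Int) = (a._1 + b._1, a._2 + b._2)

  // Each partition starts from the zero value and applies seqOp to its elements.
  val perPartition: Seq[(Int, Int)] = partitions.map(_.foldLeft(zero)(seqOp))

  // The partial results are then merged with combOp, again starting from zero.
  val result: (Int, Int) = perPartition.foldLeft(zero)(combOp)

  def main(args: Array[String]): Unit = {
    println(perPartition)                   // List((3,2), (6,2))
    println(result)                         // (9,4)
    println(result._1.toDouble / result._2) // 2.25, the mean of the elements
  }
}
```

Because the accumulator is a pair while the elements are plain integers, this is a case that fold() cannot express directly: fold() requires the result to have the same type as the elements.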