To sort on multiple dimensions, we need to consider multiple conditions, which requires customizing the sort key. Sample input (two integer columns per line):

1 23
3 22
3 31
1 12
2 11
4 45
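Before looking at the Spark code, it helps to see how one input line maps to its two key fields. This is a hypothetical standalone sketch (the class and method names are not from the original article), showing the same split-and-parse step the implementations below use:

```java
// Hypothetical helper: parse one "firstKey secondKey" line into its two integer fields.
public class ParseLineDemo {
    static int[] parseLine(String line) {
        String[] parts = line.split(" ");
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    public static void main(String[] args) {
        int[] key = parseLine("3 22");
        System.out.println(key[0] + "," + key[1]); // prints 3,22
    }
}
```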

2. Java implementation

2.1. Customize keys

The custom key implements the scala.math.Ordered interface and the Serializable interface.

    package com.chb.sparkDemo.secondarySort;

    import java.io.Serializable;

    import scala.math.Ordered;

    /**
     * Custom key: sorts by firstKey first, then by secondKey.
     */
    public class MyKey implements Ordered<MyKey>, Serializable {
        private int firstKey;
        private int secondKey;

        public MyKey(int firstKey, int secondKey) {
            super();
            this.firstKey = firstKey;
            this.secondKey = secondKey;
        }

        public int getFirstKey() {
            return firstKey;
        }

        public int getSecondKey() {
            return secondKey;
        }

        public void setFirstKey(int firstKey) {
            this.firstKey = firstKey;
        }

        public void setSecondKey(int secondKey) {
            this.secondKey = secondKey;
        }

        public boolean $greater(MyKey other) {
            if (this.getFirstKey() > other.getFirstKey()) {
                return true;
            } else if (this.getFirstKey() == other.getFirstKey()
                    && this.getSecondKey() > other.getSecondKey()) {
                return true;
            } else {
                return false;
            }
        }

        public boolean $greater$eq(MyKey other) {
            if ($greater(other) || (this.getFirstKey() == other.getFirstKey()
                    && this.getSecondKey() == other.getSecondKey())) {
                return true;
            }
            return false;
        }

        public boolean $less(MyKey other) {
            if (this.getFirstKey() < other.getFirstKey()) {
                return true;
            } else if (this.getFirstKey() == other.getFirstKey()
                    && this.getSecondKey() < other.getSecondKey()) {
                return true;
            } else {
                return false;
            }
        }

        public boolean $less$eq(MyKey other) {
            if ($less(other) || (this.getFirstKey() == other.getFirstKey()
                    && this.getSecondKey() == other.getSecondKey())) {
                return true;
            }
            return false;
        }

        public int compare(MyKey other) {
            if (this.getFirstKey() != other.getFirstKey()) {
                return this.getFirstKey() - other.getFirstKey();
            } else {
                return this.getSecondKey() - other.getSecondKey();
            }
        }

        public int compareTo(MyKey other) {
            if (this.getFirstKey() != other.getFirstKey()) {
                return this.getFirstKey() - other.getFirstKey();
            } else {
                return this.getSecondKey() - other.getSecondKey();
            }
        }

        @Override
        public int hashCode() {
            final int prime = 31;
            int result = 1;
            result = prime * result + firstKey;
            result = prime * result + secondKey;
            return result;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null) return false;
            if (getClass() != obj.getClass()) return false;
            MyKey other = (MyKey) obj;
            if (firstKey != other.firstKey) return false;
            if (secondKey != other.secondKey) return false;
            return true;
        }
    }
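The ordering rule in MyKey can be checked on its own. The sketch below is a simplified stand-in (a hypothetical `PairKey` class using plain `java.lang.Comparable`, so it compiles without the Scala library on the classpath) that mirrors the same compareTo logic: compare the first field, and fall back to the second field on ties:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for MyKey: same ordering rule, but with java.lang.Comparable
// instead of scala.math.Ordered so it runs without Spark or the Scala library.
class PairKey implements Comparable<PairKey> {
    final int firstKey;
    final int secondKey;

    PairKey(int firstKey, int secondKey) {
        this.firstKey = firstKey;
        this.secondKey = secondKey;
    }

    @Override
    public int compareTo(PairKey other) {
        if (this.firstKey != other.firstKey) {
            return this.firstKey - other.firstKey;   // primary: first column
        }
        return this.secondKey - other.secondKey;     // tie-break: second column
    }

    @Override
    public String toString() {
        return firstKey + " " + secondKey;
    }
}

public class PairKeyDemo {
    public static void main(String[] args) {
        List<PairKey> keys = new ArrayList<>();
        keys.add(new PairKey(1, 23));
        keys.add(new PairKey(3, 22));
        keys.add(new PairKey(3, 31));
        keys.add(new PairKey(1, 12));
        Collections.sort(keys);
        System.out.println(keys); // prints [1 12, 1 23, 3 22, 3 31]
    }
}
```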

2.2 Concrete implementation steps

Step 1: Customize the key, implementing the scala.math.Ordered and Serializable interfaces.

Step 2: Load the data to be sorted and map it into an RDD in <key, value> format.

Step 3: Use sortByKey to perform the secondary sort based on the custom key.

Step 4: Remove the sort key and retain only the sorted result.
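The four steps above can be sketched on a local collection, without a Spark cluster. This is a hedged illustration (the class and method names are hypothetical, not from the original article): pair each line with its two-field key, sort, then drop the key:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Local-collection sketch of the four steps: load lines, key each line by its
// (first, second) integer pair, sort by that compound key, keep only the lines.
public class SecondarySortSketch {
    static List<String> secondarySort(List<String> lines) {
        List<String> result = new ArrayList<>(lines);
        result.sort(Comparator
                .comparingInt((String l) -> Integer.parseInt(l.split(" ")[0]))
                .thenComparingInt(l -> Integer.parseInt(l.split(" ")[1])));
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1 23", "3 22", "3 31", "1 12", "2 11", "4 45");
        for (String line : secondarySort(lines)) {
            System.out.println(line);
        }
    }
}
```

In Spark the sort is distributed across partitions, but the ordering contract is the same one the custom key's compareTo defines.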

2.2.1 Steps 1 and 2: Customize the key (the MyKey class from Section 2.1, implementing the scala.math.Ordered and Serializable interfaces), then map each input line into a <MyKey, line> pair

    JavaPairRDD<MyKey, String> mykeyPairs = lines.mapToPair(
            new PairFunction<String, MyKey, String>() {
        private static final long serialVersionUID = 1L;

        public Tuple2<MyKey, String> call(String line) throws Exception {
            int firstKey = Integer.valueOf(line.split(" ")[0]);
            int secondKey = Integer.valueOf(line.split(" ")[1]);
            MyKey mykey = new MyKey(firstKey, secondKey);
            return new Tuple2<MyKey, String>(mykey, line);
        }
    });

2.2.2 Step 3: Use sortByKey to perform secondary sorting based on user-defined keys

    JavaPairRDD<MyKey, String> sortPairs = mykeyPairs.sortByKey();

2.2.3 Step 4: Remove the sorted key and only retain the sorted result

    JavaRDD<String> result = sortPairs.map(
            new Function<Tuple2<MyKey, String>, String>() {
        private static final long serialVersionUID = 1L;

        public String call(Tuple2<MyKey, String> tuple) throws Exception {
            return tuple._2; // the original line
        }
    });

    result.foreach(new VoidFunction<String>() {
        private static final long serialVersionUID = 1L;

        public void call(String line) throws Exception {
            System.out.println(line);
        }
    });

Complete code

    package com.chb.sparkDemo.secondarySort;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.Function;
    import org.apache.spark.api.java.function.PairFunction;
    import org.apache.spark.api.java.function.VoidFunction;

    import scala.Tuple2;

    /**
     * Step 1: customize the key (MyKey, implementing Ordered and Serializable).
     * Step 2: load the data to be sorted into an RDD of <MyKey, line> pairs.
     * Step 3: use sortByKey to double sort based on the custom key.
     * Step 4: remove the key and keep only the sorted lines.
     *
     * @author 12285
     */
    public class SecordSortTest {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setMaster("local").setAppName("WordCount");
            JavaSparkContext jsc = new JavaSparkContext(conf);
            JavaRDD<String> lines = jsc.textFile("C:\\Users\\12285\\Desktop\\test");

            // Step 2: map each line to a <MyKey, line> pair
            JavaPairRDD<MyKey, String> mykeyPairs = lines.mapToPair(
                    new PairFunction<String, MyKey, String>() {
                private static final long serialVersionUID = 1L;

                public Tuple2<MyKey, String> call(String line) throws Exception {
                    int firstKey = Integer.valueOf(line.split(" ")[0]);
                    int secondKey = Integer.valueOf(line.split(" ")[1]);
                    MyKey mykey = new MyKey(firstKey, secondKey);
                    return new Tuple2<MyKey, String>(mykey, line);
                }
            });

            // Step 3: secondary sort by the custom key
            JavaPairRDD<MyKey, String> sortPairs = mykeyPairs.sortByKey();

            // Step 4: drop the key, keep only the original lines
            JavaRDD<String> result = sortPairs.map(
                    new Function<Tuple2<MyKey, String>, String>() {
                private static final long serialVersionUID = 1L;

                public String call(Tuple2<MyKey, String> tuple) throws Exception {
                    return tuple._2; // line
                }
            });

            result.foreach(new VoidFunction<String>() {
                private static final long serialVersionUID = 1L;

                public void call(String line) throws Exception {
                    System.out.println(line);
                }
            });
        }
    }

Results:

    1 12
    1 23
    2 11
    3 22
    3 31
    4 45

4. Scala implementation

4.1. Customize keys

    class SecordSortKey(val firstKey: Int, val secondKey: Int)
        extends Ordered[SecordSortKey] with Serializable {

      override def compare(that: SecordSortKey): Int = {
        if (this.firstKey != that.firstKey) {
          this.firstKey - that.firstKey
        } else {
          this.secondKey - that.secondKey
        }
      }
    }

4.2 Concrete implementation

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext

    object SecordSortTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setMaster("local[2]").setAppName("SecordSort")
        val sc = new SparkContext(conf)
        val lines = sc.textFile("C:\\Users\\12285\\Desktop\\test")

        // Step 2: map each line into a <SecordSortKey, line> pair
        val pairSortKey = lines.map { line =>
          (new SecordSortKey(line.split(" ")(0).toInt, line.split(" ")(1).toInt), line)
        }

        // Step 3: sort by the custom key (false = descending order)
        val sortPair = pairSortKey.sortByKey(false)

        // Step 4: keep only the original lines
        val sortResult = sortPair.map(line => line._2)

        sortResult.collect().foreach { x => print(x) }
      }
    }