1 Filter outliers in advance

2 Increase the parallelism of shuffle

3 Two-stage aggregation (local aggregation + global aggregation) [This only applies to aggregation-type shuffle operations, so its scope is relatively narrow.]

4 Join with abnormal keys…… use a random value, rand() * 100000, in the ON condition

5 Broadcast join

6 Set spark.sql.adaptive.skewedJoin.enabled = true. If the number of records for a key in the upstream stage's output exceeds the threshold, several additional reduce tasks are launched to process that key's records.
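
The two-stage aggregation in item 3 can be sketched in plain Python, without any Spark API. This is only an illustration under stated assumptions: the function name two_stage_count, the num_salts parameter, and the sample keys are hypothetical and do not come from these notes. Each key is first tagged with a random salt so that a hot key is split across several groups (local aggregation); the salt is then dropped and the partial results are merged (global aggregation).

```python
import random
from collections import Counter


def two_stage_count(keys, num_salts=4, seed=0):
    """Count records per key with a salted local stage and a global merge stage.

    Hypothetical helper for illustration; not a Spark API.
    """
    rng = random.Random(seed)

    # Stage 1 (local aggregation): attach a random salt to every key, so the
    # records of one hot key are spread over at most num_salts groups.
    local = Counter()
    for key in keys:
        salt = rng.randrange(num_salts)
        local[(salt, key)] += 1

    # Stage 2 (global aggregation): drop the salt and merge the partial counts.
    total = Counter()
    for (_salt, key), partial in local.items():
        total[key] += partial

    return local, total


# Made-up sample data: one heavily skewed key plus a few normal keys.
keys = ["hot"] * 1000 + ["a", "b", "b"]
local, total = two_stage_count(keys)

# Partial groups that the hot key was split into during stage 1.
hot_partials = [count for (_salt, key), count in local.items() if key == "hot"]
```

In a real job the salted key would be the shuffle key of the first aggregation, so no single reduce task receives every record of the hot key. The final counts are the same as those of a single-stage aggregation.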
