1 Filter outliers in advance
2 Increase the parallelism of shuffle
3 Two-stage aggregation (local aggregation + global aggregation). This applies only to shuffle operations that perform aggregation, so its scope is relatively narrow.
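4 Abnormal-value join: join …… on rand() * 100000, i.e. replace the abnormal join key with a random value so that those rows are spread across reducers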
5 Broadcast join
6 Set spark.sql.adaptive.skewedJoin.enabled = true. If the number of records for a key in the upstream stage's output exceeds the threshold, additional reduce tasks are launched to process that key's records.
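
The two-stage aggregation in item 3 can be sketched in plain Python rather than Spark, to show why salting spreads a hot key. This is only an illustration under stated assumptions: the aggregation is a simple sum, and the names two_stage_sum and num_salts are hypothetical, not part of any Spark API.

```python
import random
from collections import defaultdict


def two_stage_sum(records, num_salts=4, seed=0):
    """Sum values per key in two stages: salted local, then global."""
    rng = random.Random(seed)

    # Stage 1 (local): attach a random salt to each key so that a hot key
    # is split across up to num_salts groups, then sum per salted key.
    partial = defaultdict(int)
    for key, value in records:
        salt = rng.randrange(num_salts)
        partial[(salt, key)] += value

    # Stage 2 (global): drop the salt and sum the partial results per key.
    total = defaultdict(int)
    for (_salt, key), value in partial.items():
        total[key] += value
    return dict(total), dict(partial)


# A skewed input: one hot key dominates the data.
records = [("hot", 1)] * 1000 + [("cold", 1)] * 10
totals, partials = two_stage_sum(records)
print(totals)
print({k: v for k, v in partials.items() if k[1] == "hot"})
```

In the first stage no single group has to hold all records of the hot key. The second stage only sees at most num_salts partial sums per key, so the global step stays small. The same idea does not help a join directly, which is why the list treats joins separately.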