This article enumerates several common Spark performance problems, provides an optimization solution for each, and recommends two sets of performance-testing and tuning tools.

Problems:

  1. File initialization: resource analysis of a Spark job showed the first stage taking up to 14 seconds, with noticeable CPU and network-communication cost, which was inconsistent with the application's code logic.
  2. Task scheduling: resource analysis found that only one CPU on the stage-2 server was in use while the CPUs of the other servers sat idle.
  3. Task assignment: log analysis found that there were always one or two non-local tasks picked up by executors at the end of a stage. For example, given tasks A[2,3,1] and B[1,3,4] and executors [1] and [2]: if Executor[1] takes task A, then Executor[2] can only take task B as a non-local task. Solution: sort the tasks in partial order before assigning them [SPARK-2193].
  4. SYS-mode CPU consumption: a large share of server CPU time was consumed in SYS (kernel) mode. The cause is that some Linux versions enable Transparent Huge Pages by default. Disable THP with:

     echo never > /sys/kernel/mm/transparent_hugepage/enabled
     echo never > /sys/kernel/mm/transparent_hugepage/defrag
  5. Network adapter tuning: resource analysis found that a large share of work time was consumed by network transmission.
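For problem 2, one common remedy, assuming the imbalance comes from too few partitions relative to the available cores, is to raise the job's parallelism so tasks spread across all executors instead of piling onto one CPU. The flags and values below are illustrative placeholders, not settings from the original analysis:

```shell
# Hypothetical spark-submit invocation; executor counts, core counts,
# and partition numbers must be sized to your own cluster.
spark-submit \
  --master yarn \
  --num-executors 8 \
  --executor-cores 4 \
  --conf spark.default.parallelism=64 \
  --conf spark.sql.shuffle.partitions=64 \
  your-app.jar
```

A rule of thumb is 2–3 partitions per available core, so that every CPU has work throughout the stage.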
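The partial-order idea behind SPARK-2193 in problem 3 can be illustrated with a toy script (this is not Spark scheduler code; the task and executor lists are taken from the example above): schedule the most constrained task first, i.e. the one with the fewest local options among the free executors.

```shell
#!/bin/sh
# Toy illustration of constrained-first assignment.
# Free executors: 1 and 2. Task A prefers executors 2,3,1; task B prefers 1,3,4.
# Count, for each task, how many of its preferred executors are free,
# then pick the task with the fewest local matches to assign first.
first=$(printf 'A 2,3,1\nB 1,3,4\n' | awk '{
  n = split($2, p, ","); m = 0
  for (i = 1; i <= n; i++) if (p[i] == "1" || p[i] == "2") m++
  print m, $1
}' | sort -n | head -n 1 | awk '{print $2}')
echo "schedule task $first first"
```

Here B has only one local option (executor 1), so it is assigned first; A can then run locally on executor 2, and neither task ends up non-local.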

www.slidestalk.com/s/Spark3674…