This article walks through several common Spark performance problems and their optimization solutions, and recommends two sets of performance testing and tuning tools.
Problems:
- Spark task file initialization: resource analysis showed that the first stage took up to 14 seconds, with noticeable CPU and network-communication overhead, which was inconsistent with the application's code logic.
- Spark task scheduling: resource analysis found that during stage 2 only one CPU on one server was in use while the CPUs of the other servers sat idle (see the repartitioning sketch after this list).
- Task assignment algorithm tuning: during log analysis there were always one or two non-local tasks that executors picked up at the end of a stage. For example, with the last two tasks A (preferred locations [2,3,1]) and B (preferred locations [1,3,4]) and idle executors [1] and [2]: if Executor[1] takes task A, then Executor[2] has to run task B as a non-local task. Solution: sort the remaining tasks in partial order by locality preference and reassign them [SPARK-2193] (see the locality-wait sketch after this list).
- A large number of server CPU cycles were consumed as SYS (kernel) time. The cause is that Transparent Huge Pages (THP) are enabled by default in some Linux versions. Disable THP with `echo never > /sys/kernel/mm/transparent_hugepage/enabled` and `echo never > /sys/kernel/mm/transparent_hugepage/defrag`; after these commands THP is turned off.
- Network adapter tuning: resource analysis found that a large share of the job's wall-clock time was spent in network transmission (see the traffic-reduction sketch after this list).
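For the stage-2 "one busy CPU" symptom, a common cause is that the stage simply has too few partitions (for example after an aggressive `coalesce` or a skewed key), so only one task exists. Below is a minimal sketch of the usual remedy, repartitioning to a multiple of the cluster's core count; the input path and the `*3` factor are illustrative assumptions, not values from the article:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("repartition-sketch")
    val sc = new SparkContext(conf)

    // Hypothetical input; the real job's data source may differ.
    val rawLogs = sc.textFile("hdfs:///logs/input")

    // Aim for a few tasks per available core so every server gets work
    // in the wide (stage 2) step instead of a single busy CPU.
    val targetPartitions = sc.defaultParallelism * 3
    val balanced = rawLogs.repartition(targetPartitions)

    // The wide stage now runs targetPartitions tasks instead of one.
    val counts = balanced
      .map(line => (line.split(" ")(0), 1L))
      .reduceByKey(_ + _)

    println(s"distinct keys: ${counts.count()}")
    sc.stop()
  }
}
```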
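The actual fix for the locality problem is the scheduler-side change tracked in SPARK-2193. On the application side, a commonly used mitigation (not that patch itself) is to let the scheduler wait a little longer for a local slot before downgrading a task to a non-local one, via the standard `spark.locality.wait*` properties; the concrete values below are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object LocalityWaitSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("locality-wait-sketch")
      // How long the scheduler waits for a data-local slot before it
      // falls back to the next locality level (default is 3s).
      .set("spark.locality.wait", "6s")
      // Optional per-level overrides.
      .set("spark.locality.wait.process", "6s")
      .set("spark.locality.wait.node", "6s")
      .set("spark.locality.wait.rack", "3s")

    val sc = new SparkContext(conf)
    // ... job code unchanged; the waits only affect task placement.
    sc.stop()
  }
}
```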
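For the network-bound case, two standard ways to reduce traffic are compressing shuffle/RDD data and broadcasting small lookup tables instead of shuffling the large dataset to join against them. A sketch under the assumption that the traffic is shuffle-dominated; the paths and the dimension-table join are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object NetworkTrafficSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("network-traffic-sketch")
      // Compress shuffle outputs and serialized RDD blocks before they
      // cross the network (standard Spark properties).
      .set("spark.shuffle.compress", "true")
      .set("spark.rdd.compress", "true")
      .set("spark.io.compression.codec", "snappy")

    val sc = new SparkContext(conf)

    // Hypothetical small lookup table: broadcast it once per executor
    // instead of shuffling the big RDD to join against it.
    val smallTable: Map[String, String] =
      sc.textFile("hdfs:///dim/table").map { line =>
        val Array(k, v) = line.split(",", 2)
        (k, v)
      }.collectAsMap().toMap
    val dim = sc.broadcast(smallTable)

    val events = sc.textFile("hdfs:///logs/events")
    val enriched = events.map { line =>
      val key = line.split(",")(0)
      (line, dim.value.getOrElse(key, "unknown"))
    }

    println(s"enriched records: ${enriched.count()}")
    sc.stop()
  }
}
```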
www.slidestalk.com/s/Spark3674…