Background of the pitfall

The job works as follows:
1. Obtain a DataFrame using Spark SQL.
2. Map over the DataFrame and call the GET interface to obtain IDs, producing a new DataFrame.
3. Map over that DataFrame and call the POST interface inside the map to submit the final result to the interface.
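The three steps above can be sketched in PySpark roughly as follows. This is a minimal sketch and not the original job: the table name `names`, the column names, and the `get_id` and `post` callables are hypothetical stand-ins for the real GET and POST interfaces, which the post does not show.

```python
def attach_id(name, get_id):
    """Step 2: call the GET interface for one name and pair the name with its ID."""
    return name, get_id(name)


def submit(pair, post):
    """Step 3: call the POST interface for one (name, id) pair and return its status."""
    name, id_ = pair
    return name, post(name, id_)


def run_job(spark, get_id, post):
    """Wire the three steps together on an existing SparkSession.

    The table and column names are assumptions made for illustration.
    """
    # Step 1: obtain a DataFrame with Spark SQL.
    df = spark.sql("SELECT name FROM names")
    # Step 2: map each row to (name, id) via the GET interface.
    with_ids = df.rdd.map(lambda r: attach_id(r["name"], get_id)).toDF(["name", "id"])
    # Step 3: map each row to (name, status) via the POST interface.
    # The POST is a side effect inside a task, so it runs once per task attempt.
    return with_ids.rdd.map(
        lambda r: submit((r["name"], r["id"]), post)
    ).toDF(["name", "status"])
```

Keeping the per-row logic in plain functions makes it possible to count calls with a fake `post` before the job ever runs on a cluster.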

How I hit the pitfall

Out of laziness, I copied the spark-submit script directly from another Spark job. I never expected that script to contain the setting --conf "spark.speculation=true", and I submitted it without noticing. Later it was reported that the final POST interface was being called repeatedly: for some names the POST interface was called twice, while for others it was called only once.

Solution

In the end I asked a senior colleague and learned that when a computation must be executed strictly once, Spark speculative execution must be turned off. That is, do not set spark.speculation=true in code or scripts; Spark sets it to false by default.
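One way to catch a copied script that re-enables the setting is to check it when the job starts. A minimal sketch, assuming the configuration is read through a generic lookup function; `require_no_speculation` is a hypothetical helper, not part of Spark. In PySpark, `spark.sparkContext.getConf().get` could be passed as the lookup.

```python
def require_no_speculation(get_conf):
    """Raise if spark.speculation is enabled; otherwise return the setting's value.

    get_conf: callable (key, default) -> value, for example SparkConf.get.
    """
    # Spark's default for this setting is false, so use that when it is unset.
    value = get_conf("spark.speculation", "false")
    if str(value).strip().lower() == "true":
        raise RuntimeError(
            "spark.speculation=true: tasks with side effects may be executed twice"
        )
    return value
```

On the spark-submit side, the equivalent is to drop the copied --conf line or to pass spark.speculation=false explicitly.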

Why

When speculative execution is enabled, Spark launches a second attempt of a task based on how long the task for a data partition has been running. If the task processing Partition1 has run for longer than a certain time and is still not finished, Spark starts another attempt on a second executor (executor2) to process the same Partition1 data. Whichever attempt finishes first supplies the final result, and the other, unfinished attempt is killed. In my code the POST call can take a long time and exceed the speculation threshold, so for some names a second attempt was launched on executor2. Although the final returned status contains only one result per name, the POST interface had actually been called twice, and the returned status shows no sign of it.
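The effect can be illustrated with a toy model in plain Python, with no Spark involved. The fixed threshold is a simplification and an assumption of this sketch: Spark's real decision depends on settings such as `spark.speculation.multiplier` and `spark.speculation.quantile`, not on a fixed number of seconds. The names and durations are made up.

```python
def simulate(post_seconds, threshold, speculation):
    """Toy model of one task per name; returns (results, post_calls).

    post_seconds: dict name -> how long the POST takes (hypothetical values)
    threshold: running time after which a second attempt is launched
    speculation: whether speculative attempts are allowed
    """
    results = {}
    post_calls = {}
    for name, seconds in post_seconds.items():
        # The first attempt always issues the POST.
        attempts = 1
        # A slow task gets a second attempt, which issues the POST again.
        if speculation and seconds > threshold:
            attempts = 2
        post_calls[name] = attempts
        # Only the first attempt to finish contributes a result,
        # so the output never reveals the duplicate call.
        results[name] = "ok"
    return results, post_calls


results, calls = simulate({"alice": 1.0, "bob": 9.0}, threshold=5.0, speculation=True)
print(calls)    # {'alice': 1, 'bob': 2}
print(results)  # {'alice': 'ok', 'bob': 'ok'}
```

With `speculation=False` the same input gives one POST per name, which matches the behavior after the setting was removed.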

Lesson

Do not copy and paste blindly. Understand everything you submit, and think it through carefully first.

Spark pitfall log: speculative execution (spark.speculation)
