Should I always use a parallel stream when possible? | Java Debug Notes

This article is part of the Java Theme Month "Java Debug Notes" event; see the event link for details.

Should I always use a parallel stream when possible?

With Java 8 and lambdas, it is easy to iterate over collections as streams, and just as easy to use a parallel stream. Here are two examples from the docs, the second one using parallelStream:

myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // <-- This one uses parallel
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

Is it always beneficial to use the parallel version, as long as I don't care about the order? One would think that dividing the work across more cores is faster.

Are there other considerations? When should a parallel stream be used, and when should a non-parallel one be used?

(I asked this question to start a discussion about how and when to use parallel streams, not because I think using them all the time is a good idea.)

Answer:


A parallel stream has much higher overhead than a sequential one, because coordinating the threads takes a significant amount of time. I would use sequential streams by default and consider parallel ones only in the following cases:

I have a massive number of items to process (or the processing of each item takes time and can be parallelized).

I have a performance problem in the first place.

I am not already running the process in a multithreaded environment (for example, in a web container, if I already have many requests being processed in parallel, adding an extra layer of parallelism inside each request could do more harm than good).

In your example, performance will be dominated by the synchronized access to System.out.println() anyway, so making this pipeline parallel will have no effect, or even a negative one.

Also, keep in mind that parallel streams don't magically solve all synchronization problems. If the predicates and functions used in the pipeline touch a shared resource, you have to make sure that everything is thread-safe. Side effects in particular are something you really have to worry about when you go parallel.
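To make the side-effect warning concrete, here is a minimal sketch. The class and method names (`SideEffectSketch`, `unsafeCollect`, `safeCollect`) are hypothetical and not from the original answer. It contrasts a parallel `forEach` that mutates a shared `ArrayList` with a pipeline that lets a collector assemble the result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class SideEffectSketch {

    // Unsafe: ArrayList is not thread-safe, so concurrent add() calls from a
    // parallel forEach may lose elements or throw an exception.
    static List<Integer> unsafeCollect(int n) {
        List<Integer> out = new ArrayList<>();
        IntStream.range(0, n).parallel().forEach(out::add);
        return out;
    }

    // Safe: no shared mutable state; the collector merges partial results.
    static List<Integer> safeCollect(int n) {
        return IntStream.range(0, n)
                .parallel()
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("safe size:   " + safeCollect(n).size());
        try {
            // The size printed here may be smaller than n, or the call may fail.
            System.out.println("unsafe size: " + unsafeCollect(n).size());
        } catch (RuntimeException e) {
            System.out.println("unsafe version failed: " + e);
        }
    }
}
```

The unsafe version may happen to work on a given run, which is exactly what makes this kind of bug hard to spot.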

In any case, measure, don't guess! Only a measurement will tell you whether the parallelism is worth it.
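A rough way to measure is sketched below. Everything in it is an assumption made for illustration: the workload (counting primes with a deliberately slow test), the names `MeasureSketch`, `countPrimes` and `bestMillis`, and the limit of 2,000,000. Wall-clock timing like this is crude; a proper benchmark harness such as JMH gives more trustworthy numbers.

```java
import java.util.function.LongSupplier;
import java.util.stream.LongStream;

class MeasureSketch {

    // Deliberately slow primality test so each element has a noticeable cost.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Same pipeline in both modes; only the execution mode differs.
    static long countPrimes(long limit, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, limit);
        if (parallel) s = s.parallel();
        return s.filter(MeasureSketch::isPrime).count();
    }

    // Crude wall-clock timing: best of several runs, in milliseconds.
    static long bestMillis(LongSupplier task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.getAsLong();
            best = Math.min(best, (System.nanoTime() - start) / 1_000_000);
        }
        return best;
    }

    public static void main(String[] args) {
        long limit = 2_000_000; // arbitrary size chosen for this sketch
        System.out.println("sequential ms: " + bestMillis(() -> countPrimes(limit, false), 5));
        System.out.println("parallel ms:   " + bestMillis(() -> countPrimes(limit, true), 5));
    }
}
```

The numbers depend heavily on the machine and on what else is running, so they only mean something when taken on the target environment.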

Answer:

The Stream API was designed to make it easy to write computations in a way that is abstracted from how they are executed, which makes switching between sequential and parallel easy.

But just because it's easy doesn't mean it's always a good idea. In fact, it is a bad idea to drop .parallel() all over the place simply because you can.
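As a small illustration of how little code the switch takes, the sketch below runs the same pipeline in either mode. The class `SwitchSketch` and the method `totalLength` are made-up names for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class SwitchSketch {

    // The pipeline is identical in both modes; only the stream source call differs.
    static int totalLength(List<String> words, boolean parallel) {
        Stream<String> s = parallel ? words.parallelStream() : words.stream();
        return s.mapToInt(String::length).sum();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("red", "green", "blue");
        System.out.println(totalLength(words, false));
        System.out.println(totalLength(words, true));
        // isParallel() reports the mode a stream is currently in.
        System.out.println(words.stream().parallel().isParallel());
    }
}
```

The ease of the switch says nothing about whether the parallel version is faster; that still has to be measured.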

First, note that parallelism offers no benefit other than the possibility of faster execution when more cores are available. A parallel execution always involves more work than a sequential one, because in addition to solving the problem, it also has to dispatch and coordinate the subtasks. The hope is that you will get to the answer faster by spreading the work across multiple processors. Whether this actually happens depends on a number of factors: the size of the data set, the amount of computation performed on each element, the nature of the computation (specifically, does the processing of one element interact with the processing of others?), the number of processors available, and the number of other tasks competing for those processors.
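Two of those factors can be inspected directly. The sketch below assumes the usual behavior that parallel streams run on the common fork/join pool unless arranged otherwise; the class name `ParallelismInfoSketch` and the helper `availableCores` are made up for this example.

```java
import java.util.concurrent.ForkJoinPool;

class ParallelismInfoSketch {

    // Number of processors the JVM reports as available to it.
    static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("available processors:    " + availableCores());
        // Target parallelism of the shared pool that other tasks in the same
        // JVM may also be competing for.
        System.out.println("common pool parallelism: " + ForkJoinPool.getCommonPoolParallelism());
    }
}
```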

Also, note that parallelism often exposes nondeterminism in a computation that a sequential implementation tends to hide. Sometimes this doesn't matter, or it can be mitigated by constraining the operations involved (for example, reduction operators must be stateless and associative).
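The associativity constraint can be seen in a small sketch; the names `ReductionSketch`, `sum` and `badDifference` are hypothetical. Addition is associative, so both modes agree. Subtraction is not, so the parallel result depends on how the work happens to be split.

```java
import java.util.stream.IntStream;

class ReductionSketch {

    // Addition is associative: the grouping of partial results does not matter.
    static int sum(int n, boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, n);
        if (parallel) s = s.parallel();
        return s.reduce(0, Integer::sum);
    }

    // Subtraction is NOT associative, so this violates the contract of reduce().
    // Sequentially it happens to give a stable answer; in parallel it may not.
    static int badDifference(int n, boolean parallel) {
        IntStream s = IntStream.rangeClosed(1, n);
        if (parallel) s = s.parallel();
        return s.reduce(0, (a, b) -> a - b);
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("sum sequential:  " + sum(n, false));
        System.out.println("sum parallel:    " + sum(n, true));
        System.out.println("diff sequential: " + badDifference(n, false));
        // This value may differ from the sequential one and between runs.
        System.out.println("diff parallel:   " + badDifference(n, true));
    }
}
```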

In reality, parallelism sometimes speeds up your computation, sometimes does not, and sometimes even slows it down. It is best to develop with sequential execution first, and then apply parallelism where

(A) you know that increased performance is actually beneficial, and

(B) it will actually deliver increased performance.

(A) is a business problem, not a technical one. If you are a performance expert, you can usually look at the code and determine (B), but the smart path is to measure. (And don't even bother until you are convinced of (A); if the code is fast enough, it is better to apply your brain cycles elsewhere.)

The simplest performance model for parallelism is the "NQ" model, where N is the number of elements and Q is the amount of computation per element. In general, the product NQ needs to exceed some threshold before you start to see a performance benefit. For a low-Q problem such as "add up the numbers from 1 to N", you will generally see a break-even between N=1000 and N=10000. For higher-Q problems, the break-even comes at lower thresholds.

But the reality is quite complicated. So until you reach expert level, first identify where sequential processing is actually costing you something, and then measure whether parallelism helps.
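The low-Q case can be explored with a sketch like the one below, which sums 1 to N for a few values of N in both modes. The names `NqSketch`, `sumTo` and `bestNanos` and the chosen sizes are assumptions made for illustration. The timing is crude, and the break-even point it suggests will vary by machine.

```java
import java.util.stream.LongStream;

class NqSketch {

    // Low-Q workload: one addition per element.
    static long sumTo(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        if (parallel) s = s.parallel();
        return s.sum();
    }

    // Best-of-runs wall-clock time in nanoseconds; the result is checked so
    // the work cannot be skipped.
    static long bestNanos(long n, boolean parallel, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            long result = sumTo(n, parallel);
            long elapsed = System.nanoTime() - start;
            if (result != n * (n + 1) / 2) throw new IllegalStateException("wrong sum");
            best = Math.min(best, elapsed);
        }
        return best;
    }

    public static void main(String[] args) {
        long[] sizes = {100, 1_000, 10_000, 100_000, 1_000_000};
        for (long n : sizes) {
            System.out.printf("N=%-9d sequential=%d ns  parallel=%d ns%n",
                    n, bestNanos(n, false, 20), bestNanos(n, true, 20));
        }
    }
}
```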

Translated from Stack Overflow: stackoverflow.com/questions/2…

Author's recommendation: don't overuse parallel streams.

