Sunday, 14 April 2019

Java 8 Streams: Are parallel streams always effective?

Consider the following four points before reaching for parallel streams:
  • Splitting / decomposition costs :: Sometimes splitting the source is more expensive than just doing the work sequentially.
  • Task dispatch / management costs :: A thread can do a lot of work in the time it takes to hand a task to another thread.
  • Result combination costs :: Sometimes combining partial results involves copying lots of data. For example, adding numbers is cheap, whereas merging sets is expensive.
  • Locality :: Consider cache misses. If a CPU stalls waiting for data because of cache misses, you gain little or nothing from parallelization. That is why array-based sources parallelize best: the elements next to the current index are likely to be in cache already, so the CPU is less likely to experience a cache miss.
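
To make the combination-cost and locality points concrete, here is a minimal sketch. The class and method names are made up for illustration. Both methods read from an int[] (an array-based source that splits cheaply), but the first combines partial results with a single addition, while the second has to merge per-thread sets.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical example class, not taken from any library.
class CombineCostSketch {

    // Cheap combination: each chunk produces a partial sum,
    // and two partial sums are merged with one addition.
    static long parallelSum(int[] data) {
        return Arrays.stream(data).parallel().asLongStream().sum();
    }

    // Costlier combination: each chunk accumulates into its own set,
    // and the partial sets are then merged element by element.
    static Set<Integer> parallelDistinct(int[] data) {
        return Arrays.stream(data).parallel().boxed().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        int[] data = IntStream.rangeClosed(1, 1_000).toArray();
        System.out.println(parallelSum(data));             // prints 500500
        System.out.println(parallelDistinct(data).size()); // prints 1000
    }
}
```

Both calls return the same results as their sequential counterparts. Whether either one is actually faster in parallel depends on the data size and the machine, so measure before deciding.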

In simpler terms, go for parallelStream() only if all of the following hold:

  • I have a massive number of items to process (or the processing of each item takes time and is parallelizable)
  • I have a performance problem in the first place
  • I am not already running the process in a multi-threaded environment (for example, in a web container that already handles many requests in parallel, adding another layer of parallelism inside each request could do more harm than good)
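
The multi-threading point is easier to see once you recall that, by default, parallel streams run their tasks on the JVM-wide common ForkJoinPool, so every request that uses a parallel stream competes for the same few worker threads. The sketch below (the class name is hypothetical) simply prints how much parallelism is available on the current machine.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical example class, for illustration only.
class CommonPoolSketch {

    // Number of processors the JVM reports as available.
    static int cores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target parallelism of the shared common pool used by parallel streams by default.
    static int commonPoolParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + cores());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```

If the container is already keeping all cores busy with request threads, the extra tasks submitted by a parallel stream mostly add scheduling overhead.
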

Also note a relatively simple formula for estimating the chance of a parallel speedup.

NQ Model:
N x Q > 10,000
where N = number of data items and Q = amount of work per item
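
As a rough sketch, the NQ rule of thumb can be written as a tiny helper. The class and method names are hypothetical, and the sketch assumes Q is expressed as an approximate count of operations per item, which the formula above does not pin down.

```java
// Hypothetical helper, for illustration only.
class NqModelSketch {

    // Threshold from the NQ model above.
    static final long THRESHOLD = 10_000;

    // n = number of data items, q = estimated work per item
    // (assumed here to be a rough operation count).
    // Returns true when N x Q exceeds the threshold.
    static boolean worthTryingParallel(long n, long q) {
        return n * q > THRESHOLD;
    }

    public static void main(String[] args) {
        // Summing 100 numbers: tiny N, trivial Q.
        System.out.println(worthTryingParallel(100, 1));     // prints false
        // Summing 100,000 numbers: Q is still trivial, but N is large.
        System.out.println(worthTryingParallel(100_000, 1)); // prints true
        // 10 items with heavy per-item work.
        System.out.println(worthTryingParallel(10, 5_000));  // prints true
    }
}
```

A true result only means a speedup is plausible. It is a prompt to benchmark, not a guarantee.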
