Introduction
I'm currently developing a program that uses java.util.Collection.parallelStream(), and I'm wondering whether it's possible to make it more multi-threaded.
Several small map() calls
I was wondering whether using several map() calls might allow java.util.Collection.parallelStream() to distribute the tasks better:
List<InsertOneModel<Document>> bulkWrites = puzzles.parallelStream()
.map(gson::toJson)
.map(Document::parse)
.map(InsertOneModel::new)
.toList();
Single big map() call
For example, would that distribute the work better than:
List<InsertOneModel<Document>> bulkWrites = puzzles.parallelStream()
.map(puzzle -> new InsertOneModel<>(Document.parse(gson.toJson(puzzle))))
.toList();
Question
Is one of the two solutions more suitable for java.util.Collection.parallelStream(), or is there no significant difference between them?
Answer
I looked into the Stream source code. The result of each map() operation is passed directly to the next operation in the chain, so there is almost no difference between one big map() call and several small map() calls.
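To illustrate, a chain of small map() calls computes the same thing as one map() over the composed function. The sketch below assumes hypothetical stand-ins toJson, parse and wrap in place of gson::toJson, Document::parse and InsertOneModel::new, so it runs without Gson or the MongoDB driver:

```java
import java.util.List;
import java.util.function.Function;

// Sketch only: toJson, parse and wrap are hypothetical stand-ins,
// not the Gson or MongoDB APIs used in the question.
class MapComposition {
    // Hypothetical stand-in for gson::toJson
    static String toJson(int id) {
        return "{\"id\":" + id + "}";
    }

    // Hypothetical stand-in for Document::parse: pulls the (non-negative) number back out
    static int parse(String json) {
        return Integer.parseInt(json.replaceAll("[^0-9]", ""));
    }

    // Hypothetical stand-in for InsertOneModel::new
    static String wrap(int value) {
        return "insert(" + value + ")";
    }

    // Several small map() calls, each feeding the next
    static List<String> smallMaps(List<Integer> ids) {
        return ids.parallelStream()
                .map(MapComposition::toJson)
                .map(MapComposition::parse)
                .map(MapComposition::wrap)
                .toList();
    }

    // One big map() call with the three steps composed by hand
    static List<String> bigMap(List<Integer> ids) {
        Function<Integer, String> composed = id -> wrap(parse(toJson(id)));
        return ids.parallelStream().map(composed).toList();
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(1, 2, 3, 42);
        System.out.println(smallMaps(ids)); // [insert(1), insert(2), insert(3), insert(42)]
        System.out.println(smallMaps(ids).equals(bigMap(ids))); // true
    }
}
```

Both pipelines keep encounter order because toList() on an ordered source preserves it, even in parallel.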
For map() in particular, a parallel Stream does not spread the individual stages across threads: each input element is processed through the entire chain by the same thread, whether you use one map() call or several.
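One way to observe this is to record the current thread's name in each of two map() stages and compare them per element. This is a sketch of an experiment against OpenJDK's implementation (where stateless stages are fused into one pass per element), not a guarantee from the Stream specification; the Trace record is a hypothetical helper:

```java
import java.util.List;
import java.util.stream.IntStream;

// Sketch: records which thread runs each of two map() stages for every element.
class SameThreadDemo {
    // Hypothetical helper: element value plus the threads that ran stage 1 and stage 2
    record Trace(int value, String stage1Thread, String stage2Thread) {}

    static List<Trace> trace(int n) {
        return IntStream.range(0, n).boxed()
                .toList()
                .parallelStream()
                // stage 1: remember which thread is running
                .map(v -> new Trace(v, Thread.currentThread().getName(), null))
                // stage 2: record the thread that runs the next stage
                .map(t -> new Trace(t.value(), t.stage1Thread(), Thread.currentThread().getName()))
                .toList();
    }

    // True if every element went through both stages on one thread
    static boolean sameThreadPerElement(List<Trace> traces) {
        return traces.stream().allMatch(t -> t.stage1Thread().equals(t.stage2Thread()));
    }

    // How many distinct threads took part in the work overall
    static long distinctThreads(List<Trace> traces) {
        return traces.stream().map(Trace::stage1Thread).distinct().count();
    }

    public static void main(String[] args) {
        List<Trace> traces = trace(100_000);
        System.out.println("same thread per element: " + sameThreadPerElement(traces));
        System.out.println("threads used: " + distinctThreads(traces));
    }
}
```

The number of threads used varies by machine and run; the parallelism comes from splitting the source, not from the number of map() stages.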
Also note: a parallel Stream can only split up the work effectively if its source splits well and there is enough data to process. So for a small Collection, or a Collection without efficient random access, a parallel Stream may perform little better than a sequential Stream.
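How well a source splits can be checked directly with Spliterator.trySplit(), which is what a parallel stream calls to divide work. The sketch below compares an ArrayList with a LinkedList holding the same ten elements; the exact split sizes are an OpenJDK implementation detail, not a specification guarantee, and splitOnce is a hypothetical helper:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;

// Sketch: how far a single trySplit() call divides two common list types.
class SplitDemo {
    // Hypothetical helper: returns {elements in the split-off prefix, elements left behind}
    static long[] splitOnce(List<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] {prefixSize, rest.estimateSize()};
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        long[] array = splitOnce(new ArrayList<>(data));
        long[] linked = splitOnce(new LinkedList<>(data));
        System.out.println("ArrayList:  prefix=" + array[0] + ", rest=" + array[1]);
        System.out.println("LinkedList: prefix=" + linked[0] + ", rest=" + linked[1]);
    }
}
```

On OpenJDK the ArrayList splits evenly into 5 and 5, while the LinkedList copies all 10 elements into a single batch and keeps none, so a small linked source offers little to parallelize.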
Answer
I doubt that there is much difference in performance, but even if you proved that the single big map() was faster, I would still prefer to see and use the first style in code I had to maintain.
The first, multi-map style is easier for others to understand, easier to maintain and easier to debug; for example, you can add a peek() stage at any point in the processing chain.
List<InsertOneModel<Document>> bulkWrites = puzzles.parallelStream()
.map(gson::toJson)
// easy to make changes for debug, moving peek up/down
// .peek(System.out::println)
.map(Document::parse)
// easy to filter:
// .filter(this::somecondition)
.map(InsertOneModel::new)
.toList();
If your requirements change, such as needing to filter the output or to capture the intermediate data by splitting it into two collections, the first approach beats the second every time.
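As a sketch of such a change, the multi-map chain below gains a filter() stage and then splits the intermediate values into two collections with Collectors.partitioningBy, finishing the last step with Collectors.mapping. The stand-ins toJson, parse and wrap, the limit parameter and the even/odd rule are all hypothetical, chosen only to make the example runnable:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: extending the multi-map chain with filter() and a two-way split.
// toJson, parse and wrap are hypothetical stand-ins for gson::toJson,
// Document::parse and InsertOneModel::new.
class ExtendedPipeline {
    static String toJson(int id) {
        return "{\"id\":" + id + "}";
    }

    static int parse(String json) {
        return Integer.parseInt(json.replaceAll("[^0-9]", ""));
    }

    static String wrap(Integer value) {
        return "insert(" + value + ")";
    }

    // Drops values above limit, then splits the rest into even (true) and odd (false) batches
    static Map<Boolean, List<String>> evenOddBatches(List<Integer> ids, int limit) {
        return ids.parallelStream()
                .map(ExtendedPipeline::toJson)
                .map(ExtendedPipeline::parse)
                // new stage slotted in between two existing ones
                .filter(value -> value <= limit)
                // split on the intermediate value, then apply the last step per partition
                .collect(Collectors.partitioningBy(
                        value -> value % 2 == 0,
                        Collectors.mapping(ExtendedPipeline::wrap, Collectors.toList())));
    }

    public static void main(String[] args) {
        Map<Boolean, List<String>> batches = evenOddBatches(List.of(1, 2, 3, 4, 50), 10);
        System.out.println("even: " + batches.get(true));
        System.out.println("odd:  " + batches.get(false));
    }
}
```

Each new requirement becomes one more stage or a different terminal collector, which is much harder to slot into a single map() that nests all three calls.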
