Stream API Java 8 Parallel Processing
I have two sets:
phraseSet contains "eiffel tower", "tokyo tower"
wordSet contains words like "eiffel", "tower"
How do I use a Java 8 parallel stream to process logic like this:
1. For each item in phraseSet, tokenize it and check whether all tokens exist in wordSet; if so, add that item to a new set called resultSet.
In this example, resultSet would contain "eiffel tower".
It's easy to do with a traditional for loop, but I am confused when attempting it with a parallel stream, which I hope would also be faster since the work is done in parallel.
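For reference, the traditional loop mentioned above might look roughly like the sketch below. The class name LoopBaseline and the method name matchingPhrases are hypothetical, and splitting on runs of whitespace is an assumption about what "tokenize" means here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical baseline: the plain for-loop version of the logic described above.
final class LoopBaseline {

    // Returns every phrase whose whitespace-separated tokens are all in wordSet.
    static Set<String> matchingPhrases(Set<String> phraseSet, Set<String> wordSet) {
        Set<String> resultSet = new HashSet<>();
        for (String phrase : phraseSet) {
            boolean allFound = true;
            for (String token : phrase.split("\\s+")) {
                if (!wordSet.contains(token)) {
                    allFound = false;
                    break; // no need to look at the remaining tokens
                }
            }
            if (allFound) {
                resultSet.add(phrase);
            }
        }
        return resultSet;
    }
}
```

Any stream-based version should return the same set as this loop for the same input.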
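allMatch would be sufficient:

    import java.util.*;
    import java.util.regex.Pattern;
    import java.util.stream.Collectors;

    Set<String> phrases = new HashSet<>(Arrays.asList("eiffel tower", "tokyo tower"));
    Set<String> words = new HashSet<>(Arrays.asList("eiffel", "tower"));
    // Compile the delimiter once and reuse it for every phrase.
    Pattern delimiter = Pattern.compile("\\s+");
    Set<String> resultSet = phrases.parallelStream()
        // Keep a phrase only if every one of its tokens is a known word.
        .filter(phrase -> delimiter.splitAsStream(phrase).allMatch(words::contains))
        .collect(Collectors.toSet());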
You could use:

    Set<String> resultSet = phraseSet.stream()
        .filter(s -> wordSet.equals(Stream.of(s.split("\\s")) // or wordSet.containsAll(...)
            .collect(Collectors.toSet())))
        .collect(Collectors.toSet());
The simplest solution would be:

    Set<String> resultSet = phraseSet.stream()
        .filter(s -> wordSet.containsAll(Arrays.asList(s.split("\\s+"))))
        .collect(Collectors.toSet());
You may turn this into parallel processing by replacing stream() with parallelStream(), but you would need a rather large input set to get a benefit from parallel processing.
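As a minimal sketch of that replacement, the simple solution with parallelStream() could be wrapped as follows. The class and method names are hypothetical, and the sketch assumes wordSet is not modified while the stream runs, since a plain HashSet is only safe for concurrent reads.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical wrapper around the simple solution, switched to a parallel stream.
final class ParallelPhraseFilter {

    static Set<String> matchingPhrases(Set<String> phraseSet, Set<String> wordSet) {
        // Assumption: wordSet is only read, never modified, during this call.
        return phraseSet.parallelStream()
                .filter(s -> wordSet.containsAll(Arrays.asList(s.split("\\s+"))))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> phraseSet = new HashSet<>(Arrays.asList("eiffel tower", "tokyo tower"));
        Set<String> wordSet = new HashSet<>(Arrays.asList("eiffel", "tower"));
        System.out.println(matchingPhrases(phraseSet, wordSet)); // prints [eiffel tower]
    }
}
```

The filter is stateless and the collector handles merging, so the result should be the same set as the sequential version; only the scheduling differs.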
Note that this simple solution may do unnecessary work if you have a lot of non-matching phrases, as it will create all substrings before checking whether they are contained in wordSet. A solution like Flown’s will defer the creation of the substrings, so it can be skipped when encountering a word not contained in wordSet (also known as short-circuiting). Another performance improvement would be moving the creation of the Pattern out of the stream processing and re-using it (a Pattern is also created behind the scenes when using a method like String.split, as in the above solution).

    Pattern whiteSpace = Pattern.compile("\\s+");
    Predicate<String> inWordSet = wordSet::contains;
    Set<String> resultSet = phraseSet.stream()
        .filter(phrase -> whiteSpace.splitAsStream(phrase).allMatch(inWordSet))
        .collect(Collectors.toSet());
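To make the short-circuiting visible, the sketch below counts how many tokens are actually looked up for a single phrase. ShortCircuitDemo and lookupsFor are hypothetical names introduced only for this illustration, and the counter exists purely for demonstration.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Pattern;

// Hypothetical demo: counts the wordSet lookups performed by allMatch for one phrase.
final class ShortCircuitDemo {

    // Compiled once and reused, as suggested above.
    private static final Pattern WHITE_SPACE = Pattern.compile("\\s+");

    static int lookupsFor(String phrase, Set<String> wordSet) {
        AtomicInteger lookups = new AtomicInteger();
        // The boolean result is ignored here; only the number of lookups matters.
        WHITE_SPACE.splitAsStream(phrase).allMatch(token -> {
            lookups.incrementAndGet();
            return wordSet.contains(token);
        });
        return lookups.get();
    }
}
```

With a wordSet of "eiffel" and "tower", lookupsFor("tokyo tower", wordSet) returns 1 because allMatch stops at the first miss, while lookupsFor("eiffel tower", wordSet) returns 2.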