Regex - match if at least 2 words out of N words, in any order
I'm trying to write a regex that matches if a string contains at least 2 different words out of N. For example, take the words ('one', 'two', 'three', 'four'). This regex should match all of these cases:
one two three four
twothreeone
two plus two is four
It should not return a match for:
one
three plus three is three
I have tried something like this:
'/^(?=.*one)(?=.*two)(?=.*three)(?=.*four).+/', but it only matches if all the words ('one', 'two', 'three', 'four') appear in the string.
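When testing candidate patterns it can help to have a plain, non-regex statement of the requirement. The sketch below uses Python purely for illustration, and `WORDS`, `has_two_distinct` and `ALL_FOUR` are hypothetical names. It counts how many of the target words occur as substrings (so `twothreeone` counts, as the examples require) and contrasts that with the all-lookahead attempt above (with the PHP `/.../` delimiters stripped), which rejects strings containing only some of the words.

```python
import re

# Hypothetical word list, taken from the example in the question.
WORDS = ("one", "two", "three", "four")

def has_two_distinct(s: str) -> bool:
    """True if at least two *different* target words occur in s as substrings."""
    return sum(1 for w in WORDS if w in s) >= 2

# The attempt from the question: one lookahead per word, so all four are required.
ALL_FOUR = re.compile(r"^(?=.*one)(?=.*two)(?=.*three)(?=.*four).+")

should_match = ["one two three four", "twothreeone", "two plus two is four"]
should_not_match = ["one", "three plus three is three"]

for s in should_match + should_not_match:
    print(f"{s!r:30} expected={has_two_distinct(s)!s:5} all-lookahead={bool(ALL_FOUR.search(s))}")
```

Only "one two three four" passes the all-lookahead pattern, while the predicate accepts all three strings that should match.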
Apologies for stealing someone's comment, but it does appear to work!
In Perl/PCRE you can use a reference to a subpattern in a capture group with (?n) where n is the number of the capture group. So: (one|two|three|four).*(?!\1)(?1). In the worst case, you don't have to type everything twice when you know the shortcuts ctrl+c and ctrl+v – Casimir et Hippolyte
% pcretest
PCRE version 8.35 2014-04-04
  re> #(one|two|three|four).*(?!\1)(?1)#
data> one one one
No match
data> one two one
 0: one two
 1: one
data> one four
 0: one four
 1: one
data> four four
No match
data> ^D
%
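Python's standard `re` module has no PCRE-style subroutine call such as `(?1)`, so this sketch takes the "worst case" route from the comment and writes the alternation out a second time; `(?!\1)` behaves as it does in PCRE. It feeds in the same inputs as the pcretest session above (without pcretest's `#` delimiters) and should report the same results. `PATTERN` is just an illustrative name.

```python
import re

# No (?1) in Python's re, so the alternation is repeated; (?!\1) works as in PCRE.
PATTERN = re.compile(r"(one|two|three|four).*(?!\1)(?:one|two|three|four)")

# Same inputs as the pcretest session.
for data in ["one one one", "one two one", "one four", "four four"]:
    m = PATTERN.search(data)
    if m is None:
        print(f"{data!r}: No match")
    else:
        print(f"{data!r}: 0: {m.group(0)!r}  1: {m.group(1)!r}")
```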
Indeed, PCRE, a popular library used by nginx (the only dependency of the whole nginx port in OpenBSD ports!) and lots of other software, lets you write something like (?-1) to call the preceding capture group as a subpattern, so you don't have to copy-paste the alternation several times. The negative lookahead is just standard fare.
Here are the docs for the features involved; see the pcresyntax manual page, in the following sections:
(?!...) negative look ahead
(?n) call subpattern by absolute number
Search for two occurrences of a target word: capture the first, then put a negative lookahead containing a backreference to that capture just before the second word. This asserts that the second word differs from the first, making (at least) 2 distinct words in total.
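To see why the lookahead matters, this Python sketch (the names `ALT`, `naive` and `distinct` are illustrative) compares the pattern with and without `(?!\1)` on examples from the question. Because Python's `re` lacks `(?1)`, the alternation is interpolated twice into the pattern string.

```python
import re

ALT = "one|two|three|four"

# Without the lookahead, the second word may simply repeat the first one.
naive = re.compile(rf"({ALT}).*(?:{ALT})")
# With (?!\1), the second word must differ from the captured first word.
distinct = re.compile(rf"({ALT}).*(?!\1)(?:{ALT})")

for s in ["twothreeone", "two plus two is four", "three plus three is three", "one"]:
    print(f"{s!r:28} naive={bool(naive.search(s))!s:5} distinct={bool(distinct.search(s))}")
```

The naive version wrongly accepts "three plus three is three"; the lookahead version rejects it.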
- Capture one of the words in group 1
- Match any number of characters
- If what follows is the word captured in group 1, don't match here
- Otherwise, match another of the words by calling group 1's pattern again (a recursive subpattern call)
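The steps in the list above can be laid out one per line with Python's `re.VERBOSE` flag, which ignores whitespace and allows `#` comments inside the pattern. This is a sketch under one stated assumption: since Python's `re` has no recursive subpattern call, the last step repeats the alternation instead of using `(?1)`.

```python
import re

# One pattern line per step of the list; the last step repeats the
# alternation because Python's re has no (?1) subpattern call.
PATTERN = re.compile(
    r"""
    (one|two|three|four)       # capture one of the words in group 1
    .*                         # any number of characters
    (?!\1)                     # fail here if the group-1 word comes next
    (?:one|two|three|four)     # otherwise match another of the words
    """,
    re.VERBOSE,
)

print(bool(PATTERN.search("two plus two is four")))
print(bool(PATTERN.search("three plus three is three")))
```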
This means that when you edit the pattern, you only have to edit one capture group, assuming you're using PCRE (with, say, PHP).
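In engines without subpattern calls, a similar "edit in one place" effect can be had by generating the pattern from a single word list. In the sketch below, `at_least_two_of` is a hypothetical helper; `re.escape` guards against words containing regex metacharacters, and `re.DOTALL` is my own assumption so that `.*` can also cross line breaks.

```python
import re

def at_least_two_of(words):
    """Hypothetical helper: compile a pattern matching two different words from `words`.

    The list is written once; the alternation is generated twice because
    Python's re has no (?1) subroutine call.
    """
    alt = "|".join(re.escape(w) for w in words)
    return re.compile(rf"({alt}).*(?!\1)(?:{alt})", re.DOTALL)

pattern = at_least_two_of(["one", "two", "three", "four"])
print([bool(pattern.search(s)) for s in ["one two three four", "twothreeone", "one", "three plus three is three"]])
```

One caveat applies to the PCRE version too: if one word is a prefix of another (say `on` and `one`), `(?!\1)` also rejects the longer word after the shorter one, so such lists need extra care.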