Regex - match if at least 2 words out of N words, in any order

I'm trying to write a regex that matches if a string contains at least 2 different words out of a list of N. For example, take the words ('one', 'two', 'three', 'four'). The regex should return a match for all of these cases:

one two three four
twothreeone
two plus two is four

It should not return a match for:

one
three plus three is three

I have tried something like '/^(?=.*one)(?=.*two)(?=.*three)(?=.*four).+/', but this only matches if all of the words ('one', 'two', 'three', 'four') are contained in the string.

3 answers

  • answered 2018-02-13 07:03 cnst

    Apologies for stealing someone's comment, but it does appear to work!

    In Perl/PCRE you can use a reference to a subpattern in a capture group with (?n), where n is the number of the capture group. So: (one|two|three|four).*(?!\1)(?1). In the worst case, you don't have to type everything twice when you know the shortcuts ctrl+c and ctrl+v. – Casimir et Hippolyte

    % pcretest 
    PCRE version 8.35 2014-04-04
    
      re> #(one|two|three|four).*(?!\1)(?1)#
    data> one one one
    No match
    data> one two one
     0: one two
     1: one
    data> one four
     0: one four
     1: one
    data> four four
    No match
    data> ^D
    %
    

    Indeed, PCRE is a popular library used by nginx (it is the only dependency of the whole nginx port in OpenBSD ports!) and lots of other software. In PCRE you can use something like (?1) (or (?-1)) to re-use the pattern of an earlier group, so you don't have to copy-paste the alternation several times. The negative lookahead is just standard fare.

    For documentation on the features at stake, you may want to look into the pcrepattern and pcresyntax manual pages.

    In general, the http://www.pcre.org/original/pcre.txt and http://www.pcre.org/pcre2.txt pages include the complete documentation, and are helpful for looking up syntax you've seen somewhere.

  • answered 2018-02-13 07:34 Bohemian

    Search for two occurrences of a target word. Capture the first one, then put a negative lookahead with a backreference to that first group in front of the second alternation. The lookahead asserts that the second word is different from the first, so the string contains (at least) 2 distinct words in total.

    (one|two|three|four).*(?!\1)(one|two|three|four)
    

    See live demo.
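
    As a quick check outside PCRE, here is a minimal Python sketch that runs this pattern against the examples from the question. The helper name has_two_distinct_words is made up for illustration. Python's built-in re module supports the backreference and negative lookahead used here.

```python
import re

# Pattern from the answer above: the second alternation is guarded by a
# negative lookahead so it cannot repeat the word captured in group 1.
PATTERN = re.compile(r"(one|two|three|four).*(?!\1)(one|two|three|four)")


def has_two_distinct_words(text: str) -> bool:
    """Return True if text contains at least two different target words."""
    return PATTERN.search(text) is not None


if __name__ == "__main__":
    samples = [
        "one two three four",
        "twothreeone",
        "two plus two is four",
        "one",
        "three plus three is three",
    ]
    for sample in samples:
        print(f"{sample!r}: {has_two_distinct_words(sample)}")
```

    The first three samples are the ones the question expects to match, and the last two are the ones it expects to be rejected.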

  • answered 2018-02-13 10:42 Kyle Fairns

    (one|two|three|four).*(?!\1)(?-1)
    

    Explanation:

    • Capture one of the words in group 1
    • Match any number of characters
    • Negative lookahead: fail at this position if the text here is the same word that group 1 captured
    • Otherwise, match one of the words again by re-running the pattern of the preceding group with (?-1) (a subpattern call, not a backreference)

    This means the word list lives in a single capture group, so you only have to edit it in one place, assuming you're using PCRE (with, say, PHP).

    Check out the demo
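
    If your engine lacks subpattern calls (Python's built-in re module, for example, does not support (?1) or (?-1)), a similar edit-in-one-place effect can be had by building the pattern from a word list. This is a swapped-in technique rather than what the answer above does, and build_two_of_n_pattern is a hypothetical helper name. The sketch assumes no word is a prefix of another, which holds for the question's list.

```python
import re
from typing import Iterable


def build_two_of_n_pattern(words: Iterable[str]) -> re.Pattern:
    """Build a regex that matches when at least two different words occur.

    Assumption: no word is a prefix of another word in the list; otherwise
    the negative lookahead can wrongly reject the longer word.
    """
    # Escape each word so regex metacharacters are treated literally.
    alternation = "|".join(re.escape(word) for word in words)
    # Same shape as the PCRE pattern, with the alternation written out twice
    # because Python's re module has no subpattern call syntax.
    return re.compile(rf"({alternation}).*(?!\1)(?:{alternation})")


if __name__ == "__main__":
    pattern = build_two_of_n_pattern(["one", "two", "three", "four"])
    for sample in ["one one one", "one two one", "one four", "four four"]:
        print(f"{sample!r}: {pattern.search(sample) is not None}")
```

    The samples in the usage part are the same inputs as in the pcretest session from the first answer, so the results can be compared directly.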