Replacing values that fail a match test using Python's regex library

I have a very large string consisting of a series of numbers separated by one or more spaces. Some of the numbers are equal to -123, and the rest can be any random number.

example_string = "102.3  42.89  98  812.7  374  5  -123  8  -123  13  -123  21..."

I would like to replace the values that are not equal to -123 with 456 in the most efficient way possible.

updated_example_string = "456  456  456  456  456  456  -123  456  -123  456  -123  456..."

I know that Python's regular expression library has a sub method that replaces matching values quite efficiently. Is there a way to replace values that DO NOT match? As I mentioned, this is a rather large string, coming from a source file of around 100MB. Assuming there's a way to use re.sub to accomplish this task, is that even the correct/most efficient way of handling such a problem?
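One alternative that avoids regex matching of the numbers themselves is to split the string on runs of whitespace using a capturing group, which keeps the separators in the result, and then swap every token other than -123. This is a sketch that is not taken from either answer below and has not been benchmarked against them; it also builds a list of every token, so memory use grows with the number of tokens. The function name `replace_non_sentinel` and its `keep`/`new` parameters are hypothetical.

```python
import re


def replace_non_sentinel(s: str, keep: str = "-123", new: str = "456") -> str:
    """Replace every whitespace-delimited token except `keep` with `new`.

    Hypothetical helper: the original spacing is preserved exactly.
    """
    # With a capturing group, re.split keeps the separators, so tokens sit at
    # even indices and the whitespace runs sit at odd indices.
    parts = re.split(r"(\s+)", s)
    for i in range(0, len(parts), 2):
        # Empty strings appear at the ends when s starts or ends with whitespace.
        if parts[i] and parts[i] != keep:
            parts[i] = new
    return "".join(parts)


example_string = "102.3  42.89  98  812.7  374  5  -123  8  -123  13  -123  21"
print(replace_non_sentinel(example_string))
```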

2 answers

  • answered 2019-01-11 06:00 Nick

    You can use this regex:

    (^|\s)(?!-123(\s|$))-?[0-9.]+(?=\s|$)
    

    It looks for the start of the string or a whitespace character that is not followed by -123 and then whitespace or the end of the string (using a negative lookahead), then an optional minus sign and one or more digits or periods, followed by either whitespace or the end of the string.

    Then you can replace with \g<1>456 to turn all those numbers into 456. The \g<1> in the replacement preserves any space captured by the first group.
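The same pattern can be written with re.VERBOSE so that each piece carries its own comment. This is only a re-layout of the regex above (whitespace and # comments are ignored in verbose mode); the name `PATTERN` is illustrative.

```python
import re

# Nick's pattern, laid out with re.VERBOSE; behaviour is unchanged.
PATTERN = re.compile(r"""
    (^|\s)            # group 1: start of string or one whitespace character
    (?!-123(\s|$))    # not a -123 that ends at whitespace or end of string
    -?[0-9.]+         # optional minus sign, then digits and/or periods
    (?=\s|$)          # must be followed by whitespace or end of string
""", re.VERBOSE)

s = "102.3  42.89 -1234 98  -812.7  374  5  -123  8  -123  13  -123  21 -123"
# \g<1> writes back the whitespace captured by group 1 before each 456.
print(PATTERN.sub(r"\g<1>456", s))
```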


    In Python:

    import re
    string = "102.3  42.89 -1234 98  -812.7  374  5  -123  8  -123  13  -123  21 -123"
    # Python 3: print is a function, and the raw string keeps \g<1> from being read as an escape
    print(re.sub(r'(^|\s)(?!-123(\s|$))-?[0-9.]+(?=\s|$)', r'\g<1>456', string))
    

    Output

    456  456 456 456  456  456  456  -123  456  -123  456  -123  456 -123
    


  • answered 2019-01-11 13:34 The fourth bird

    You could match only the numbers between whitespace boundaries and then use re.sub with a callback function that checks whether the match is -123. If it is not, replace it with 456:

    (?<!\S)-?\d+(?:\.\d+)?(?!\S)
    

    Explanation

    • (?<!\S) Negative lookbehind to assert what is on the left is not a non-whitespace character
    • -? Optional -
    • \d+(?:\.\d+)? Match 1+ digits with an optional part that matches a . and 1+ digits
    • (?!\S) Negative lookahead to assert what is on the right is not a non-whitespace character
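A variant that is not part of this answer: the -123 check can be moved into the pattern itself with a nested negative lookahead, so re.sub takes a plain replacement string instead of calling a Python function for every match, which may help on very large inputs (not measured here). Because it matches any whitespace-delimited token with \S+, it assumes, as the question states, that every token is a number; the names `pattern` and `result` are illustrative.

```python
import re

# (?<!\S)          start of a whitespace-delimited token
# (?!-123(?!\S))   ...unless the token is exactly -123
# \S+              consume the whole token
pattern = re.compile(r"(?<!\S)(?!-123(?!\S))\S+")

s = "102.3  42.89  98  812.7  374  5  -123  8  -123  13  -123  21 -1234"
# Every token except -123 becomes 456; -1234 is replaced because it is not exactly -123.
result = pattern.sub("456", s)
print(result)
```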

    Example

    import re
    pattern = r"(?<!\S)-?\d+(?:\.\d+)?(?!\S)"
    s = "102.3  42.89  98  812.7  374  5  -123  8  -123  13  -123  21"
    
    print(re.sub(pattern, lambda m: "456" if m.group() != "-123" else m.group(), s))
    

    Result

    456  456  456  456  456  456  -123  456  -123  456  -123  456
    
