Extract a matching substring in a Python string

I'm trying to extract a substring from a large string that matches my pattern.

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'

pattern = 'dumbweb.com'

Here I'm trying to find the string that matches the pattern:

import re
theLink = re.findall(pattern, text)
print(theLink)  # output: ['dumbweb.com']

But I only get back the exact text I searched for. I want the full whitespace-delimited word that contains it.

Desired output:

theLink  # www.dumbweb.com/Dumbo

I tried searching for similar questions but couldn't phrase it right. I even read the Python regex documentation and still couldn't achieve what I'm looking for.

5 answers

  • answered 2021-06-23 07:06 anubhava

    You may consider this approach:

    import re
    text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
    pattern = 'dumbweb.com'
    
    rex = re.compile(r'\b' + r'\S*' + re.escape(pattern) + r'\S*')
    print(rex.findall(text))
    

    Output:

    ['www.dumbweb.com/Dumbo']
    

    Explanation:

    • re.compile(...): compiles the regex string into a reusable pattern object
    • r'\b': Word boundary
    • r'\S*': Match 0 or more non-whitespace characters
    • re.escape(pattern): escapes regex metacharacters in the search string, so its . matches only a literal period
    • r'\S*': Match 0 or more non-whitespace characters
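
    A minimal sketch packaging this approach as a function; the name words_containing is hypothetical, not part of the answer. It drops the leading \b: \S* already stops at whitespace, and \b would cut off a token that begins with a non-word character such as /.

```python
import re

def words_containing(text, needle):
    """Return every whitespace-delimited run in text that contains needle literally."""
    # re.escape makes the '.' in 'dumbweb.com' match only a literal period.
    # \S* on both sides extends the match to the surrounding non-whitespace.
    rex = re.compile(r'\S*' + re.escape(needle) + r'\S*')
    return rex.findall(text)

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
print(words_containing(text, 'dumbweb.com'))           # ['www.dumbweb.com/Dumbo']
print(words_containing('dumbwebXcom', 'dumbweb.com'))  # [] because '.' is literal
print(words_containing('see /dumbweb.com/x', 'dumbweb.com'))  # ['/dumbweb.com/x']
```
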

  • answered 2021-06-23 07:06 mousetail

    You could try this:

    [^ ]*dumbweb\.com[^ ]*
    

    Note that in a regex, . matches any character (except a newline by default). Use \. to match only a literal period.
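
    A small sketch of why the escape matters, using this answer's pattern from Python; the sample string containing dumbwebXcom is made up for the demonstration. Also note that [^ ] excludes only the space character, whereas \S would also stop at tabs and newlines.

```python
import re

text = 'www.dumbweb.com/Dumbo and www.dumbwebXcom/Other'

# Unescaped '.' matches any character, so the second token also matches.
loose = re.findall(r'[^ ]*dumbweb.com[^ ]*', text)
# Escaped '\.' matches only a literal period.
strict = re.findall(r'[^ ]*dumbweb\.com[^ ]*', text)

print(loose)   # ['www.dumbweb.com/Dumbo', 'www.dumbwebXcom/Other']
print(strict)  # ['www.dumbweb.com/Dumbo']
```
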

  • answered 2021-06-23 07:07 Saravanan

    Your pattern should be

    pattern = r"www\.dumbweb\.com\S*"
    

    This matches from www.dumbweb.com up to (but not including) the next whitespace character. The raw string keeps the backslashes intact for the regex engine.
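
    A quick check of this pattern; the second sample string is hypothetical. It shows that the match stops at whitespace, and that the hard-coded www. prefix means a link written without it is not found.

```python
import re

# Raw string: the backslashes reach the regex engine unchanged.
pattern = r"www\.dumbweb\.com\S*"

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
print(re.findall(pattern, text))  # ['www.dumbweb.com/Dumbo']

# The literal 'www.' prefix is required, so this link is skipped.
print(re.findall(pattern, 'see dumbweb.com/Dumbo'))  # []
```
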

  • answered 2021-06-23 07:11 Jacek Błocki

    Try this:

    re.search(r'dumbweb\.com\S*', text).group()
    # matches the pattern followed by any run of non-whitespace characters
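
    One caveat with this one-liner: re.search returns None when there is no match, so calling .group() directly raises AttributeError. A minimal sketch that checks first; the helper name first_match is hypothetical. Note that the result starts at the search text, so the www. prefix is not included.

```python
import re

def first_match(text, needle):
    """Return needle plus any trailing non-whitespace characters, or None."""
    m = re.search(re.escape(needle) + r'\S*', text)
    # re.search returns None when nothing matches; guard before .group().
    return m.group() if m else None

text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
print(first_match(text, 'dumbweb.com'))  # dumbweb.com/Dumbo
print(first_match(text, 'example.org'))  # None
```
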
    

  • answered 2021-06-23 07:12 kelyen

    Probably not the cleanest solution:

    text = 'This is a large subsring. bla bla bla AND www.dumbweb.com/Dumbo and www.otherLinks.com...'
    
    pattern = 'dumbweb.com'
    
    for word in text.split():
        if pattern in word:  # word.find(pattern) > 0 would miss words that start with the pattern
            print(word)
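
    The > 0 comparison is easy to get wrong: str.find returns the index of the first occurrence, or -1 if absent, so a word that begins with the pattern gives 0 and is skipped. A small sketch with a made-up sample string showing the difference between that check and a plain membership test:

```python
text = 'dumbweb.com/Dumbo is at the start, then www.dumbweb.com/Dumbo'
pattern = 'dumbweb.com'

# find() returns 0 for the first word, so "> 0" silently drops it.
with_find = [w for w in text.split() if w.find(pattern) > 0]
# The 'in' operator checks for the substring anywhere in the word.
with_in = [w for w in text.split() if pattern in w]

print(with_find)  # ['www.dumbweb.com/Dumbo']
print(with_in)    # ['dumbweb.com/Dumbo', 'www.dumbweb.com/Dumbo']
```
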