I want to replace letters/words, but I am facing challenges in one aspect of my code

I will be using lloll as an example word.

Here's my code:

mapping = {'ll':'o','o':'ll'}
string = 'lloll'
out = ' '.join(mapping.get(s,s) for s in string.split())
print(out)

The output should be ollo, but I get lloll. When I write ll o ll, it works, but I don't want spaces in between the ll and the o, and I don't want to do something like mapping = {'lloll':'ollo'}.

2 answers

  • answered 2022-05-06 17:18 timgeb

    Not sure if this accounts for all edge cases (what about overlapping matches?), but my current idea is to re.split the string by the mapping's keys and then apply the mapping.

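    import re

    mapping = {'ll': 'o', 'o': 'll'}
    string = 'lloll'
    # One alternation of the (escaped) keys; the capture group makes re.split keep the matches.
    choices = f'({"|".join(map(re.escape, mapping))})'
    result = ''.join(mapping.get(s, s) for s in re.split(choices, string))
    print(result)  # ollo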
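
    On the overlapping-matches question, here is a minimal sketch, assuming that the longer key should win when two keys could match at the same position. It swaps re.split for re.sub with a replacement function. The name replace_all and the longest-first rule are assumptions for illustration, not something the question specifies.

```python
import re


def replace_all(string, mapping):
    # Hypothetical helper: replace every key of mapping in a single pass.
    if not mapping:
        return string  # an empty alternation would match everywhere
    # Longest keys first, so 'll' is tried before 'l' (assumed tie-break rule).
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # re.sub scans left to right and never re-reads text it has already replaced.
    return pattern.sub(lambda m: mapping[m.group(0)], string)


print(replace_all("lloll", {"ll": "o", "o": "ll"}))  # ollo
```

    Because the replacement text is never scanned again, swapping ll and o in one call does not undo itself.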
    

  • answered 2022-05-06 19:41 MusicalNinja

    Using Template.substitute

    I would find each occurrence of the substrings to be replaced and prefix each one with a $. Then use substitute(mapping) from string.Template to effectively do a global find and replace on them all.

    Most importantly in this approach, you can control the way that potentially overlapping mappings are handled by using sorted() or reversed() with a key to sort the order in which they are applied.

    findall then split_string

    This way you also get a couple of reusable generator functions: findall, which finds every occurrence of a substring, and split_string, which splits a string into segments at given indices. They may help with whatever larger task you are performing. If they aren't valuable, there is a shorter version at the bottom.

    Because it uses generators throughout, it should be reasonably fast and memory efficient, although ''.join still builds each intermediate string in full.

    from itertools import chain, repeat
    from string import Template
    
    
    def CharReplace(string: str, map):
        if any("$" in key for key in map) or ("$" in string):  # look inside every key, not just for a key equal to "$"
            raise ValueError("Cannot handle anything with a $ sign")
        
        for old in map:  # "old" is the key in the mapping
            locations = findall(string, old)  # index of each occurrence of "old"
            bits = split_string(string, locations)  # the string split into segments; every segment after the first starts with "old"
            template = zip(bits, repeat("$"))  # tuples: (segment of the string, "$")
            # chain(*template) flattens the tuples into one string with a "$" in front of
            # each occurrence of "old"; [:-1] strips the extra "$" from the end
            string = ''.join(chain(*template))[:-1]
    
        template = Template(string)
        string = template.substitute(map) #substitute replaces substrings which follow "$" based on a mapping
    
        return string
    
    
    def findall(string: str, sub: str):
        i = string.find(sub)
        while i != -1:
            yield i
            i = string.find(sub, i + len(sub))
    
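    def split_string(string: str, indices):
        start = 0
        for i in indices:  # indices are absolute positions in the original string
            yield string[start:i]
            start = i  # slice from the previous index instead of truncating the string
        yield string[start:]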
    

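    This approach will not handle any strings with "$" in them without some extra code to escape them. Template also reads the longest run of letters, digits and underscores after each $ as the name, so a key that is directly followed by such a character (ll in llx, for example) is looked up as llx and raises KeyError. For the same reason, every key must itself be a valid Template identifier.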

    It will run through the string from front to back, one key at a time, in whatever order the dict iterates them. You could wrap the keys in sorted() in the line for old in map to handle them in a specific order (e.g. longest first, or alphabetically).
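
    The ordering idea can be sketched as follows, assuming longest-first is the order you want. ordered_keys is a hypothetical helper, not part of the code above; you would loop over its result instead of over map directly.

```python
def ordered_keys(mapping):
    # Hypothetical helper: longest keys first, ties broken alphabetically.
    return sorted(mapping, key=lambda k: (-len(k), k))


print(ordered_keys({"o": "ll", "ll": "o", "l": "x"}))  # ['ll', 'l', 'o']
```

    This only fixes the order in which keys are visited. Whether that is enough for keys that contain one another is not something this sketch checks.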

    It will handle repeated occurrences of a key without overlap, so llll will be recognised as ll ll and lll as ll l.
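
    A quick way to check the non-overlapping behaviour is to run findall on its own. The generator below is copied from the code above so that the snippet is self-contained.

```python
def findall(string, sub):
    # Resume the search after the end of each match, so matches never overlap.
    i = string.find(sub)
    while i != -1:
        yield i
        i = string.find(sub, i + len(sub))


print(list(findall("llll", "ll")))  # [0, 2]
print(list(findall("lll", "ll")))   # [0]
```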

    In your original case this turns the original string first into $llo$ll, then into $ll$o$ll, before substitute produces ollo.
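
    The final substitution step can be tried in isolation. This is a small sketch using only string.Template; the braced ${ll} form is an alternative placeholder syntax that Template supports, not what the functions above generate.

```python
from string import Template

mapping = {"ll": "o", "o": "ll"}

# The fully marked string from the walk-through.
print(Template("$ll$o$ll").substitute(mapping))  # ollo

# Braces delimit the name explicitly, so a key may be followed by other letters.
print(Template("${ll}x").substitute(mapping))  # ox

try:
    Template("$llx").substitute(mapping)
except KeyError as missing:
    # Without braces, Template looked up 'llx' rather than 'll'.
    print("no such key:", missing)
```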

    Single Generator to add delimiters

    If you prefer your code to be shorter:

    def CharReplace2(string: str, map):
        if any("$" in key for key in map) or ("$" in string):  # look inside every key, not just for a key equal to "$"
            raise ValueError("Cannot handle anything with a $ sign")
        
        for old in map: #"old" is the key in the mapping
            string = ''.join(add_delimiters(string, old)) #add a $ before each occurrence of old
        template = Template(string)
        string = template.substitute(map) #substitute replaces substrings which follow "$" based on a mapping
    
        return string
    
    def add_delimiters(string: str, sub: str):
        i = string.find(sub)
        while i != -1:
            yield string[:i] #string up to next occurrence of sub
            string = '$' + string[i:]  # add a dollar to the start of the rest of the string (starts with sub)
            i = string.find(sub, 1 + len(sub))  # skip the "$" and the match just marked, then find the next occurrence
        yield string  # don't forget to yield the last bit of the string
    
