Parse a string for key, value pairs with a known key delimiter

How can I convert a string to a dict, if key strings are known substrings with definite delimiters? Example:

s = 'k1:text k2: more text k3:andk4: more yet'
key_list = ['k1','k2','k3']
(missing code)
# s_dict = {'k1':'text', 'k2':'more text', 'k3':'andk4: more yet'}  

In this case, a key must be preceded by a space or newline, or be at the very start of the string, and must be followed immediately by a colon; otherwise it is not parsed as a key. Thus in the example, k1, k2, and k3 are read as keys, while k4 is part of k3's value. I've also stripped the whitespace around the values, but consider this optional.

1 answer

  • answered 2018-02-21 05:45 coldspeed

    You can use re.findall to do this:

    >>> import re
    >>> dict(re.findall(r'(?:(?<=\s)|(?<=^))(\S+?):(.*?)(?=\s[^\s:]+:|$)', s))
    {'k1': 'text', 'k2': ' more text', 'k3': 'andk4: more yet'}
    

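    The regular expression took a little trial and error. Note that it treats any run of non-whitespace characters followed by a colon as a key rather than consulting key_list, and that the value for k2 keeps its leading space; call str.strip() on the values if you want that removed.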

    Details

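    (?:          # non-capturing group: where a key may start
       (?<=\s)   # lookbehind: preceded by a whitespace character
       |         # regex OR
       (?<=^)    # lookbehind: at the start of the string (re.MULTILINE is not set)
    )     
    (\S+?)       # key: non-greedy run of non-whitespace characters
    :            # literal colon
    (.*?)        # value: non-greedy match of any characters except newline
    (?=          # lookahead: stop before the next key (this handles the third key's case)
       \s        # whitespace character
       [^\s:]+   # one or more characters that are neither whitespace nor a colon
       :         # colon
       |
       $         # end of the string
    )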
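
The pattern above accepts any non-whitespace token followed by a colon as a key. If only the keys in key_list should count, one possible variation is to build the alternation from that list. This is a minimal sketch, not part of the original answer: the helper name parse_known_keys is hypothetical, it assumes key_list is non-empty, and it uses re.DOTALL so that values may span newlines.

```python
import re

def parse_known_keys(s, key_list):
    """Hypothetical helper: split s into a dict using only the keys in key_list."""
    # Escape each key so regex metacharacters in a key are matched literally.
    keys = '|'.join(re.escape(k) for k in key_list)
    pattern = re.compile(
        # A key counts only at the start of the string or after whitespace,
        # and only when a colon follows it immediately.
        rf'(?:^|(?<=\s))({keys}):'
        # The value runs up to the next whitespace-preceded known key,
        # or to the end of the string.
        rf'(.*?)(?=\s(?:{keys}):|\Z)',
        re.DOTALL,  # let values contain newlines
    )
    # Strip the whitespace around each value, as in the question's expected output.
    return {key: value.strip() for key, value in pattern.findall(s)}

s = 'k1:text k2: more text k3:andk4: more yet'
key_list = ['k1', 'k2', 'k3']
print(parse_known_keys(s, key_list))
# {'k1': 'text', 'k2': 'more text', 'k3': 'andk4: more yet'}
```

With this variation, a token such as k4: stays inside the preceding value even when it follows a space, because k4 is not in key_list.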