Pythonic way to get the indices of items where two lists intersect

Say I have two lists: one is a string, 'example', and the other is the alphabet. I'd like a more Pythonic way to find, for each letter of 'example', its position in the alphabet list, and to collect those indices in a list, i.e.:

  • e : 4
  • x : 23
  • a : 0
  • m : 12

etc...

So far I have:

import string
alphabet = list(string.ascii_lowercase)
key = list('example')

def convert(string, alphabet):
    table_l = []
    for char in string:
        for letter in alphabet:
            if letter == char:
                table_l.append(alphabet.index(letter))
    return table_l

convert(key, alphabet)

I've tried using set intersection, but the string 'key' can contain more than 1 of each letter, and I'm looking for indices, not which letters match.

So far, the best I've tried is:

listed = []
for x in key:
    listed.append(set(alphabet).intersection(x))

I have no clue how to append the indices of alphabet at which its values match each letter of key.

Thanks

5 answers

  • answered 2018-02-13 00:44 John H

    Use sets.

    overlapKeys = set(alphabet) & set(key)
    listOfIndices = [alphabet.index(letter) for letter in overlapKeys]
    

    Also,

    key = list('example')
    

    is unnecessary. Strings are sequences of characters and can be iterated over directly. Use

    key = 'example'
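
One caveat with the set-based approach, given the question's requirement to handle repeated letters: a set discards duplicates and ordering, so it yields one index per distinct letter rather than one per character of the key. A minimal sketch of the difference, assuming the `alphabet` and `key` values from the question (the names `set_indices` and `ordered_indices` are only illustrative):

```python
import string

alphabet = list(string.ascii_lowercase)
key = 'example'

# Set intersection keeps each shared letter once, in arbitrary order,
# so the indices are sorted here to get a stable result.
overlap = set(alphabet) & set(key)
set_indices = sorted(alphabet.index(letter) for letter in overlap)

# A per-character lookup keeps the key's order and its repeated letters.
ordered_indices = [alphabet.index(letter) for letter in key]

print(set_indices)      # [0, 4, 11, 12, 15, 23]
print(ordered_indices)  # [4, 23, 0, 12, 15, 11, 4]
```

The second 'e' in 'example' only appears in the per-character result.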
    

  • answered 2018-02-13 00:45 juanpa.arrivillaga

    You want a mapping from letters to numbers, so use a mapping data-structure, e.g. a dict:

    >>> alphamap = dict(zip(alphabet, range(len(alphabet))))
    >>> alphamap
    {'h': 7, 'e': 4, 'g': 6, 'n': 13, 'm': 12, 's': 18, 'x': 23, 'r': 17, 'o': 14, 'f': 5, 'a': 0, 'v': 21, 't': 19, 'd': 3, 'j': 9, 'l': 11, 'b': 1, 'u': 20, 'y': 24, 'q': 16, 'k': 10, 'c': 2, 'w': 22, 'p': 15, 'i': 8, 'z': 25}
    >>> def convert(string, map_):
    ...     return  [map_[c] for c in string]
    ...
    >>> convert('example', alphamap)
    [4, 23, 0, 12, 15, 11, 4]
    

    Note, your original approach could be simplified to:

    >>> list(map(alphabet.index, 'example'))
    [4, 23, 0, 12, 15, 11, 4]
    

    However, using alphabet.index is less efficient than using a mapping (since it has to do a linear search each time rather than a constant-time hash).

    Also, note that I've iterated over the strings directly. There is no need to put them into a list: strings are sequences just like list objects, so they can be iterated over, sliced, etc. However, they are immutable.

    Finally, the above approach will fail if there isn't a corresponding value, i.e. a special, non-alphabetic character.

    >>> convert("example!", alphamap)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in convert
      File "<stdin>", line 2, in <listcomp>
    KeyError: '!'
    

    This may or may not be desirable. Alternatively, you can use .get with a default value, e.g.:

    >>> def convert(string, map_, default=-1):
    ...     return  [map_.get(c, default) for c in string]
    ...
    >>> convert("example!", alphamap)
    [4, 23, 0, 12, 15, 11, 4, -1]
    

  • answered 2018-02-13 00:48 Guy

    If it’s all ASCII, something like the following should work: convert each letter to its numeric code point, then subtract 97, which is the code point of 'a' in ASCII.

    a = ord('a')
    [ord(c)-a for c in 'example'.lower()]
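
The `ord` arithmetic assumes every character is an ASCII letter; anything else silently produces a number outside 0-25 (for instance, `ord('!') - ord('a')` is -64). A hedged sketch of a guarded variant follows; the function name `ord_indices` and the `default` parameter are hypothetical and not part of the answer above:

```python
def ord_indices(text, default=-1):
    """Map each ASCII letter to 0-25 and anything else to `default`."""
    base = ord('a')
    result = []
    for c in text.lower():
        offset = ord(c) - base
        # Only offsets 0..25 correspond to the letters 'a'..'z'.
        result.append(offset if 0 <= offset < 26 else default)
    return result

print(ord_indices('example'))   # [4, 23, 0, 12, 15, 11, 4]
print(ord_indices('Example!'))  # [4, 23, 0, 12, 15, 11, 4, -1]
```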
    

  • answered 2018-02-13 00:53 Kanak

    Somewhat in the same spirit as Guy's answer, what about counting in base 36 (following DyZ's and mhawke's advice):

    >>> a = int('a', 36)
    >>> [int(c, 36) - a for c in 'example']
    [4, 23, 0, 12, 15, 11, 4]
    


    Note that this method is case-insensitive, and works if the input is all ASCII letters (which appears to be the case since you are working with string.ascii_lowercase).
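
To illustrate the base-36 behaviour on mixed input: uppercase letters map to the same indices as lowercase ones, digits are valid base-36 characters and therefore come out negative, and punctuation raises `ValueError`. A small sketch (the helper name `base36_indices` is hypothetical):

```python
base = int('a', 36)  # 'a' is the digit 10 in base 36

def base36_indices(text):
    """Return int(c, 36) - 10 for each character of text."""
    return [int(c, 36) - base for c in text]

print(base36_indices('Example'))  # [4, 23, 0, 12, 15, 11, 4]
print(base36_indices('a1'))       # [0, -9]

# Characters that are not base-36 digits are rejected by int().
try:
    base36_indices('!')
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```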

  • answered 2018-02-13 00:57 mhawke

    Your example seems a little off... wouldn't x be 23, m 12, etc?

    >>> s = 'example'
    >>> [(c, string.ascii_lowercase.index(c)) for c in s]    # as a list of tuples
    [('e', 4), ('x', 23), ('a', 0), ('m', 12), ('p', 15), ('l', 11), ('e', 4)]
    

    This would be a little inefficient for longer strings because each index() call is a linear scan, which effectively makes this an O(n*m) solution for a string of length n and an alphabet of length m.

    A better way is to use a lookup dictionary to convert from a character to its index. Because a dict lookup is O(1) the resulting solution will be O(n), which is much better.

    >>> # create a dict that maps characters to indices
    >>> indices = {c: index for index, c in enumerate(string.ascii_lowercase)}
    >>> # perform the conversion
    >>> s = 'example'
    >>> [(c, indices.get(c, -1)) for c in s]
    [('e', 4), ('x', 23), ('a', 0), ('m', 12), ('p', 15), ('l', 11), ('e', 4)]
    

    If you wanted just the indices:

    >>> [indices.get(c, -1) for c in s]
    [4, 23, 0, 12, 15, 11, 4]
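
The efficiency difference between `index()` and a dict lookup can be checked with `timeit`. This is only a rough sketch: the input `text` and the function names `with_index` and `with_dict` are made up for illustration, and the timings will vary by machine, so no expected numbers are shown.

```python
import string
import timeit

alphabet = string.ascii_lowercase
indices = {c: index for index, c in enumerate(alphabet)}
text = 'example' * 1000  # hypothetical longer input

def with_index(s):
    # Linear scan of the alphabet for every character.
    return [alphabet.index(c) for c in s]

def with_dict(s):
    # One hash lookup per character.
    return [indices[c] for c in s]

# Both approaches must agree before their timings are compared.
assert with_index(text) == with_dict(text)

print('index():', timeit.timeit(lambda: with_index(text), number=100))
print('dict:   ', timeit.timeit(lambda: with_dict(text), number=100))
```

With a 26-letter alphabet the gap is bounded, since each scan examines at most 26 items; it grows with the size of the lookup list.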