Finding Palindromes in a large text

So we got a text with about 600000 characters, that is without "space" and fullstops. I've removed those from the text. Now i gotta find all palindromes of length greater than 7 in that text, and I need a little help of how to get that done.

I have already tried one thing, but that was way too slow.

from string import ascii_letters, digits

s = open("pg1150.txt").read()
s = s.lower()
s = "".join([c for c in s if c in ascii_letters or c in digits])
for i in range(len(s)):
    for j in range(i + 6, len(s) + 1):
        t = s[i:j]
        if t == t[::-1]: 
            print(t)

The input text is: http://www.gutenberg.org/cache/epub/1150/pg1150.txt

1 answer

  • answered 2019-02-10 13:03 Ria

    Note the if a string s0...sn is a palindrome, then s1...sn-1 is also a palindrome

    In short, iterate through your file searching every valid palindrome of length 7 and length 8 (thanks to @tobias_k that noted that otherwise you'll only get the odd palindromes), but instead of printing it, save its index to a separate list.

    for i in range(len(s) - 8):
        t1 = s[i:i+7]
        t2 = s[i:i+8]
    
        if t1 == t1[::-1]: 
            index_table.append(i)
        if t2 == t2[::-1]: 
            index_table.append(i)
    
    #You have to ensure that the only substring of length 7 that goes unchecked isn't a palindrome
    if s[-7:] == s[-7:][::-1]:
        index_table.append(len(s) - 7)
    
    

    Now that you have your "base" for all future palindromes, it's easy to use the recursive relationship mentioned earlier to construct all other palindromes.

    for i in index_table:
        n = 0
        while (s[i-n] == s[i+6+n]):
            print(s[i-n:i+6+n])