Finding Palindromes in a large text
So we got a text with about 600000 characters, that is without "space" and fullstops. I've removed those from the text. Now i gotta find all palindromes of length greater than 7 in that text, and I need a little help of how to get that done.
I have already tried one thing, but that was way too slow.
from string import ascii_letters, digits s = open("pg1150.txt").read() s = s.lower() s = "".join([c for c in s if c in ascii_letters or c in digits]) for i in range(len(s)): for j in range(i + 6, len(s) + 1): t = s[i:j] if t == t[::-1]: print(t)
The input text is: http://www.gutenberg.org/cache/epub/1150/pg1150.txt
Note the if a string s0...sn is a palindrome, then s1...sn-1 is also a palindrome
In short, iterate through your file searching every valid palindrome of length 7 and length 8 (thanks to @tobias_k that noted that otherwise you'll only get the odd palindromes), but instead of printing it, save its index to a separate list.
for i in range(len(s) - 8): t1 = s[i:i+7] t2 = s[i:i+8] if t1 == t1[::-1]: index_table.append(i) if t2 == t2[::-1]: index_table.append(i) #You have to ensure that the only substring of length 7 that goes unchecked isn't a palindrome if s[-7:] == s[-7:][::-1]: index_table.append(len(s) - 7)
Now that you have your "base" for all future palindromes, it's easy to use the recursive relationship mentioned earlier to construct all other palindromes.
for i in index_table: n = 0 while (s[i-n] == s[i+6+n]): print(s[i-n:i+6+n])