Problem writing German text on an image using Python and PIL
I want to read German text from a text file and draw it on a PNG image using PIL and Python 2.7. When I draw it with .text(), I get garbage characters wherever Ü or another non-ASCII character appears. I am using arialunicodems.ttf as the font.
At first, I extracted text from an image using Microsoft Azure Cognitive Vision, called .encode('utf-8') on each word, and joined the words into an English sentence, which I then translated to German with the mtranslate library. I loaded arialunicodems.ttf as the font and used the .text() function of PIL's ImageDraw to draw the text on the PNG. This drew German, Chinese, Hindi, etc. correctly.

Then I wanted to let the user correct the translated text when the translation was wrong. For that, I save the original text and the translated text in a .txt file and show its content to the user, who changes it if needed; the changed text is saved back to the .txt file. A second Python program then adds the text to the image.

This time the text comes out as gibberish: wherever there is a Ü, it draws Ã☐ on the image. For Hindi, everything is gibberish. What could be the problem?
Working code: the part where I concatenate the words into a sentence (stored in the variable text).
    for word in word_infos:
        bbox = [int(num) for num in word["boundingBox"].split(",")]
        if bbox[0] >= x and bbox[1] >= y and bbox[0] + bbox[2] <= x + w and bbox[1] + bbox[3] <= y + h:
            text = text + word["text"].encode('utf-8') + " "
The part where I write the text to the image:
    im = Image.open("check.png")
    d = ImageDraw.Draw(im)
    helvetica = ImageFont.truetype("arialunicodems.ttf", 10)
    d.text((x, y), mtranslate.translate(text, sys.argv, sys.argv), font=helvetica, fill=(0, 0, 0))
Not working code: the part where I save the extracted text to the .txt file.
    for word in word_infos:
        bbox = [int(num) for num in word["boundingBox"].split(",")]
        if bbox[0] >= x and bbox[1] >= y and bbox[0] + bbox[2] <= x + w and bbox[1] + bbox[3] <= y + h:
            text = text + word["text"].encode('utf-8') + " "
    file.write("orignaltext:" + text + "\n")
The part where I read the text back from the .txt file and write it on the image:
    im = Image.open("check.png")
    d = ImageDraw.Draw(im)
    file2 = open("1.txt", "r")
    printframe = file2.readlines()
    # j and traceorig are defined elsewhere to extract the text in a loop
    orig = printframe[j*6+3][traceorig:len(printframe[j*6+3])-1].encode('utf-8')
    # xstr, ystr, r, g, b are extracted from the image
    d.text((int(xstr), int(ystr)), mtranslate.translate(orig, "de", "en").encode('utf-8'), font=helvetica, fill=(int(r), int(g), int(b)))
For "Overview" in English, I want:
In German: Überblick
In Hindi: अवलोकन
With the updated code, printing to the terminal shows the text correctly, but on the image it draws:
In German: Ã☐berblick
In Hindi: I cannot type the characters it draws; please see the linked image, "Hindi translated image".
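The German symptom can be reproduced without PIL at all. The snippet below is only a minimal sketch of one possible explanation, namely that the UTF-8 bytes of the text end up being interpreted one byte per character. The variable names are made up for illustration and are not taken from my programs.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

word = u"Überblick"

# Ü is a single character, but it takes two bytes in UTF-8: 0xC3 0x9C
utf8_bytes = word.encode("utf-8")

# Interpreting every byte as its own character (Latin-1 style) turns the
# two bytes into "Ã" (U+00C3) plus an unprintable control character (U+009C)
misread = utf8_bytes.decode("latin-1")
print(repr(misread))

# Reversing the misreading gives the original text back, so nothing is
# lost in the file itself; only the interpretation of the bytes differs
recovered = misread.encode("latin-1").decode("utf-8")
print(recovered == word)
```

The misread string is one character longer than the original, which matches the "Ã" followed by a box that shows up on the image.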
Sample code that reproduces the problem:
    #!/usr/bin/python
    # -*- coding: utf-8 -*-
    from PIL import Image, ImageDraw, ImageFont, ImageFilter
    import cv2
    import numpy as np
    import sys
    import os

    reload(sys)
    sys.setdefaultencoding('utf8')

    # the file has only one line, with the text "Überblick"
    file1 = open("write.txt", "w+")
    file1.write("Überblick")
    file1.close()

    file2 = open("write.txt", "r")
    content = file2.readlines()
    file2.close()

    img = np.zeros((300, 300, 1), np.uint8)
    cv2.imwrite("stack.png", img)

    im = Image.open("stack.png")
    d = ImageDraw.Draw(im)
    helvetica = ImageFont.truetype("arialunicodems.ttf", 50)
    d.text((0, 100), content[0].encode('utf-8'), font=helvetica, fill="white")
    im.save("processed.png")
    os.remove("stack.png")
See processed.png for the output. The font used is the arialunicodems.ttf file.
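For comparison, here is a sketch of a variant of the sample that keeps the text as a unicode string from the file all the way to .text(). It assumes the cause is the byte string being handed to PIL, which I have not confirmed. The helper names read_lines_as_unicode and draw_unicode_text are hypothetical, and the drawing helper is only defined here, not called.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_lines_as_unicode(path):
    """Return the lines of a UTF-8 text file as unicode strings."""
    # io.open decodes while reading, on Python 2.7 as well as Python 3
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip(u"\n") for line in handle]


def draw_unicode_text(image_path, out_path, text, font_path, size=50):
    """Draw `text` (a unicode string) on the image and save the result."""
    # Imported here so the file helper above also works without Pillow
    from PIL import Image, ImageDraw, ImageFont

    im = Image.open(image_path)
    draw = ImageDraw.Draw(im)
    font = ImageFont.truetype(font_path, size)
    # The unicode string is passed as-is, with no .encode('utf-8') here
    draw.text((0, 100), text, font=font, fill="white")
    im.save(out_path)


# Round trip through a one-line file, like write.txt in the sample
tmp_dir = tempfile.mkdtemp()
txt_path = os.path.join(tmp_dir, "write.txt")
with io.open(txt_path, "w", encoding="utf-8") as handle:
    handle.write(u"Überblick\n")

content = read_lines_as_unicode(txt_path)
print(repr(content[0]))
```

With the files from the sample in place, the call would be draw_unicode_text("stack.png", "processed.png", content[0], "arialunicodems.ttf").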