Python and encoding handling (Strings)


I have reached my frustration threshold. I am writing a small Python app that extracts hundreds of strings from a model. I have no problem retrieving the data; my problem starts when I try to create and write CSVs with it. The data seems to contain lots of special characters in a mix of encodings (Unicode, ASCII, ISO-8859-1, etc.), so when I try to use "write", I get errors complaining about an unknown character such as u'\xe7', for instance.

text = "FAÇADE"
with open(filepath, 'w') as f:
    f.write(text)

This results in an error saying that particular character, "Ç", is unknown. I managed to find this character and handle it with text.decode('iso-8859-1'), but that only fixes one word among thousands. Now I have another character that, as far as I can tell from my research on the internet, is a null; it shows up as the unknown character u'\x00. My datasets are not static: they are updated weekly, and new projects bring new files that are also updated weekly, so we are talking about thousands of strings. Tracking down errors one by one does not seem like a sustainable solution.
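To illustrate what I mean, here is roughly what the per-character workaround looks like (the byte values are the ones from my error messages; as I understand it, iso-8859-1 maps all 256 byte values, so decoding with it never raises):

```python
# -*- coding: utf-8 -*-
# Sketch of the one-off workaround described above.
raw = b'FA\xc7ADE'              # bytes as they come out of the model
text = raw.decode('iso-8859-1')  # -> u'FA\xc7ADE', i.e. u'FAÇADE'

# The "null" character shows up as byte 0x00; decoding keeps it,
# so it would have to be stripped explicitly:
dirty = b'FA\xc7ADE\x00'
clean = dirty.decode('iso-8859-1').replace(u'\x00', u'')
```

This works for the one word I found, but I would need it to apply to everything.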

I am wondering if there is a way to handle all of these special characters at once. I have failed to find a solution online; in most cases, the person already has the character decoded and just wants to be able to write it. In my case, all of the characters arrive encoded, and I have no idea how to identify and handle them. Any ideas??
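To make the question concrete, this is the kind of blanket handling I am imagining — a hypothetical sketch, not working code I have (the helper name, the try-UTF-8-then-fall-back-to-iso-8859-1 order, and the null stripping are all my guesses about what a fix might look like):

```python
def to_text(raw):
    # Hypothetical helper: turn whatever the model returns into clean text.
    # Try UTF-8 first; if that fails, fall back to iso-8859-1, which
    # accepts any byte. Then strip NUL characters. These choices are
    # assumptions, not a known-correct recipe.
    if isinstance(raw, bytes):
        try:
            text = raw.decode('utf-8')
        except UnicodeDecodeError:
            text = raw.decode('iso-8859-1')
    else:
        text = raw
    return text.replace(u'\x00', u'')
```

Is something along these lines reasonable, or is there a standard way to do it?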

Thank you in advance.


  • It is Python 2.