Boisclair: Python webpage source read with special characters

Sunday, 18 August 2013

Python webpage source read with special characters

I am reading the source of a web page and then parsing a value out of it,
but I am running into a problem with special characters.
My Python controller file declares # -*- coding: utf-8 -*-, while the page
I am reading declares charset=iso-8859-1.
When I read the page content without specifying an encoding, it raises
UnicodeDecodeError: 'utf8' codec can't decode byte 0xfc
in position 133: invalid start byte
When I use string.decode("iso-8859-1").encode("utf-8"), the data is parsed
without any error, but the value is displayed as 'F\u00fcnke'
instead of 'Fünke'.
Please let me know how I can solve this issue. I would greatly appreciate
any suggestions.
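
For reference, here is a minimal sketch that reproduces the situation. The
byte string raw is a hypothetical stand-in for the page source, not the real
page. The use of json.dumps is only an assumption about where the \u00fc
escape might come from, since how the value ends up being displayed is not
shown above.

```python
# -*- coding: utf-8 -*-
import json

# Hypothetical stand-in for bytes read from an ISO-8859-1 page:
# 0xfc is the single-byte ISO-8859-1 encoding of "ü".
raw = b"F\xfcnke"

# Decoding with the wrong codec reproduces the reported error,
# because 0xfc is never a valid start byte in UTF-8.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the charset the page declares gives the correct text.
text = raw.decode("iso-8859-1")

# Assumption: the value is serialized as JSON before display.
# json.dumps escapes non-ASCII characters by default, which produces
# the \u00fc form; ensure_ascii=False keeps the character itself.
escaped = json.dumps(text)
readable = json.dumps(text, ensure_ascii=False)

# Both forms decode back to the same string, so no data is lost.
round_trip = json.loads(escaped)
```

If that assumption holds, the escaped form is only a different spelling of
the same character, and the data itself is already correct after decoding
with iso-8859-1.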
