Web Design and Development Forums Archive


Index - Python programming forum

troubles with unicode



Hi guys. I'm having some problems parsing an XML file when running a script on both Linux and Windows. Should I save the file as UTF-8 on both platforms?
Jackil, what is the encoding of the XML file? What is giving you the option of saving it in another encoding? And yes, UTF-8 should be OK as long as you read it back in as UTF-8 and the XML parser accepts that.
BigDaddy, hmm.. the encoding is specified as utf-8, but the file is fetched from a webserver though.
How are you fetching it?
I use urllib.urlopen
The result is a string, right?
Well, a file handle, which I have then written a small XML parser for.
When reading it in, you might want to get it into unicode with stringjustread.decode('utf-8')
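A minimal sketch of that decode step, written for Python 3 (an assumption; the thread itself uses Python 2, where `urllib.urlopen(...).read()` returns an 8-bit string). The sample bytes and the helper name `to_text` are made up for illustration.

```python
def to_text(raw, encoding='utf-8'):
    """Turn the bytes that were read from the server into a unicode string."""
    return raw.decode(encoding)

# Hypothetical sample: the UTF-8 bytes for a filename with Norwegian letters.
raw = b'bl\xc3\xa5b\xc3\xa6r.txt'
text = to_text(raw)

print(len(raw))   # number of bytes
print(len(text))  # number of characters
```

In Python 3 the equivalent fetch would be `urllib.request.urlopen(url).read()`, which also returns bytes, so the same `.decode('utf-8')` applies.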
Hmm.. interesting.
To answer the original question, whatever worked on windows should work on linux, except that the system default encoding for 8-bit strings might be "windows-1252" and not "ascii"
The system default encoding on Windows here was ascii. I actually got some problems when parsing the XML file, because it also contains a string to save another file.
Huh? What do you mean by "a string to save another file"? Embedded Python code? An XML reference to another file?
Sorry, one of the fields contains a filename, which might contain characters not found in ASCII.
How do you want to handle those? Will your small homebuilt XML parser accept unicode strings?
I have no idea, I'm using xml.sax.saxutils to create an XML handler. But I'm really confused about whether I should enforce UTF-8 to avoid problems when both storing and reading the file.
The transition to unicode is a pain in the ass. Luckily I'm an American and can just ignore it.
BigDaddy, I'm really just looking for a way to store filenames with local characters on both Linux and Windows, and then be able to get the names back in a readable form. But this whole character thing really confuses me...
Jackil, if the XML file is coming back as UTF-8, you need to be aware of that and read it in as UTF-8 (by reading the string and then getting a new unicode string from that with .decode('utf-8')).
Jackil, or verify that the XML parser knows UTF-8, or teach it (I doubt it does).
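One way to check whether the parser copes with UTF-8 is to feed it a tiny document with a non-ASCII filename and see what comes back. The sketch below assumes Python 3 and the standard library's `xml.sax`; the `<files>`/`<file>` layout, the handler class, and `extract_names` are hypothetical, since the thread never shows the real file format.

```python
import xml.sax

class FileNameHandler(xml.sax.ContentHandler):
    """Collects the text content of every <file> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_file = False
        self._buf = []

    def startElement(self, name, attrs):
        if name == 'file':
            self._in_file = True
            self._buf = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so buffer them.
        if self._in_file:
            self._buf.append(content)

    def endElement(self, name):
        if name == 'file':
            self.names.append(''.join(self._buf))
            self._in_file = False

def extract_names(xml_bytes):
    """Parse raw XML bytes and return the filenames as unicode strings."""
    handler = FileNameHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names

# Made-up sample document, encoded as UTF-8 bytes like a server response.
SAMPLE = ('<?xml version="1.0" encoding="utf-8"?>'
          '<files><file>bl\u00e5b\u00e6r.txt</file></files>').encode('utf-8')

names = extract_names(SAMPLE)
```

Here the raw bytes go straight to the parser, which reads the encoding from the XML declaration and hands unicode strings to the handler.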
BigDaddy, oki, the filesystem on linux is using utf-8. But would it be easier to just let the web-server enforce iso?
The other issue is: is the other end really putting UTF-8 into the XML file, or is it just the local filename in whatever encoding the filesystem (or whatever generated the XML) is using? Because if the server is SAYING it's UTF-8 but it's actually Latin-1 in the fields, you are in trouble ;)
yeah, I got a lot of errors with filenames and some Unicode coerce stuff
1. Make sure the XML file is really UTF-8 by trying to decode it into unicode with .decode('utf-8'). If this works, there are at least no UTF-8 errors.
2. Either convert the unicode to ASCII with .encode('ascii', 'xmlcharrefreplace') to replace funny chars with numeric character references of the form &#NNNN;
3. Or run the unicode through the XML parser and hope it works
4. Or convert it to nationalized 8-bit text with .encode('latin-1') or somesuch, run it through, then convert back to UTF-8 later (maybe with .decode('latin-1').encode('utf-8'))
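The options above can be sketched roughly as follows, assuming Python 3 and a made-up filename. The helper `is_valid_utf8` is hypothetical; the codec names and the 'xmlcharrefreplace' error handler are standard.

```python
def is_valid_utf8(raw):
    """Option 1: the bytes are acceptable UTF-8 if decoding does not fail."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# Hypothetical filename as it might arrive from the server, as UTF-8 bytes.
raw = 'bl\u00e5b\u00e6r.txt'.encode('utf-8')
text = raw.decode('utf-8')

# Option 2: pure ASCII, with numeric character references for the rest.
ascii_safe = text.encode('ascii', 'xmlcharrefreplace')
print(ascii_safe)  # b'bl&#229;b&#230;r.txt'

# Option 4: a national 8-bit encoding now, back to UTF-8 later.
latin = text.encode('latin-1')
utf8_again = latin.decode('latin-1').encode('utf-8')

# Latin-1 bytes with non-ASCII letters usually fail the UTF-8 check.
print(is_valid_utf8(raw), is_valid_utf8(latin))  # True False
```

Option 4 only works while every character exists in the chosen 8-bit encoding; `.encode('latin-1')` raises UnicodeEncodeError otherwise.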
thank you :) I will surely investigate this.
http://www.reportlab.com/i18n/python_unicode_tutorial.html
http://www.jorendorff.com/articles/unicode/python.html
yeah, been looking at those urls lately. Just one more question, do you know of a certain way to identify what encoding a file has?
That is a hard problem... If it has 8-bit data in it and the UTF-8 decoder works, it's probably UTF-8. If there is no 8-bit data, it is most likely ASCII.
Oki, I see
If it has a Unicode byte order mark (the bytes FF FE or FE FF) at the beginning, it is probably raw UTF-16. If you are in Norway and it's not UTF-8 and not ASCII, it might be ISO-8859-15. Or who knows, it might be Chinese in some two-byte encoding.
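Those rules of thumb can be strung together into a rough guesser. This is only a heuristic sketch in Python 3; the function name `guess_encoding` and the ISO-8859-15 fallback are assumptions taken from the Norway remark, and nothing here can tell one 8-bit national encoding from another.

```python
import codecs

def guess_encoding(data, fallback='iso-8859-15'):
    """Guess the encoding of a byte string using the heuristics above."""
    # A byte order mark at the start is the strongest hint.
    if data.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    # No 8-bit data at all: most likely plain ASCII.
    if all(byte < 0x80 for byte in data):
        return 'ascii'
    # 8-bit data that decodes cleanly is probably UTF-8.
    try:
        data.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        # Anything else is a guess based on where the file came from.
        return fallback

print(guess_encoding(b'plain.txt'))                    # ascii
print(guess_encoding('bl\u00e5'.encode('utf-8')))      # utf-8
print(guess_encoding('bl\u00e5'.encode('latin-1')))    # iso-8859-15
```

The fallback is the weak point: it returns whatever the caller believes is locally common, which is exactly the "who knows" case from the thread.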
Right... I just opened a can of worms ;)



