Web Design Forums


troubles with unicode


Web Design Forums -> Python programming forum
Jackil



Joined: 24 May 2006
Posts: 97
Yeah, I got a lot of errors with filenames and some Unicode coercion stuff.


BigDaddy



Joined: 26 May 2006
Posts: 147
1. Make sure the XML file is really UTF-8 by trying to decode it into Unicode with .decode('utf-8'). If this works, there are at least no UTF-8 errors.
2. Then either convert the Unicode to ASCII with .encode('ascii', 'xmlcharrefreplace'), which replaces funny chars with numeric character references like &#NNNN;
3. Or run the Unicode through the XML parser and hope it works.
4. Or convert it to a national 8-bit encoding with .encode('latin-1') or somesuch, run it through, then convert back to UTF-8 later (maybe with .decode('latin-1').encode('utf-8')). That only works if every character in the file exists in the 8-bit encoding you pick.
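Here is a rough sketch of steps 1, 2 and 4. It assumes Python 3, where .decode() lives on bytes and .encode() on str, and the `raw` value is made-up sample data standing in for whatever you read from the XML file.

```python
# Made-up sample: stands in for the bytes read from the XML file.
raw = "<name>Bjørn Æsop</name>".encode("utf-8")

# Step 1: a strict decode raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = raw.decode("utf-8")

# Step 2: ASCII-only output; each non-ASCII char becomes a decimal
# character reference such as &#248;
ascii_xml = text.encode("ascii", "xmlcharrefreplace")
print(ascii_xml)

# Step 4: round trip through Latin-1. text.encode("latin-1") would raise
# UnicodeEncodeError for any character that Latin-1 cannot represent.
latin1_bytes = text.encode("latin-1")
back_to_utf8 = latin1_bytes.decode("latin-1").encode("utf-8")

# A lone Latin-1 byte is not valid UTF-8, so the step 1 check catches it.
try:
    b"\xe6".decode("utf-8")
    looks_like_utf8 = True
except UnicodeDecodeError:
    looks_like_utf8 = False
```

The character references from step 2 are plain ASCII, so they should survive most tools untouched, and an XML parser turns them back into the original characters.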
Jackil



Joined: 24 May 2006
Posts: 97
thank you :) I will surely investigate this.
BigDaddy



Joined: 26 May 2006
Posts: 147
http://www.reportlab.com/i18n/python_unicode_tutorial.html
http://www.jorendorff.com/articles/unicode/python.html
Jackil



Joined: 24 May 2006
Posts: 97
Yeah, been looking at those URLs lately. Just one more question: do you know of a sure way to identify what encoding a file has?
BigDaddy



Joined: 26 May 2006
Posts: 147
That is a hard problem... If it has 8-bit data in it (bytes of 0x80 and up) and the UTF-8 decoder works, it's probably UTF-8.
If there is no 8-bit data, it is most likely ASCII.
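That rule of thumb could look something like this in Python 3. The function name `guess_ascii_or_utf8` is just made up for illustration, and the result is only a guess: some byte strings in other encodings happen to be valid UTF-8 too.

```python
def guess_ascii_or_utf8(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'unknown' for a byte string (a heuristic)."""
    # No byte has the high bit set: plain 7-bit ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"
    # There is 8-bit data, so see whether the strict UTF-8 decoder accepts it.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "utf-8"


print(guess_ascii_or_utf8(b"hello"))
print(guess_ascii_or_utf8("blåbær".encode("utf-8")))
print(guess_ascii_or_utf8("blåbær".encode("latin-1")))
```

Checking for ASCII first matters because every ASCII file is also valid UTF-8, so the decoder alone cannot tell the two apart.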
Jackil



Joined: 24 May 2006
Posts: 97
Oki, I see
BigDaddy



Joined: 26 May 2006
Posts: 147
If it has a Unicode byte order mark (the bytes FF FE or FE FF) at the beginning, it is probably UTF-16 (or the older UCS-2).
If you are in Norway and it's not UTF-8 and not ASCII, it might be ISO-8859-15.
Or who knows, it might be Chinese in some two-byte encoding.
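Checking for a byte order mark is easy to sketch with the BOM constants in the standard codecs module (Python 3 assumed). The helper name `sniff_bom` and the labels it returns are made up for illustration. A file without a BOM simply gives None, and then you are back to guessing.

```python
import codecs

# Longer marks first: the UTF-32 LE mark starts with the same two bytes
# (FF FE) as the UTF-16 LE mark.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return a label for the BOM at the start of data, or None if there is none."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    return None


print(sniff_bom(b"\xff\xfeh\x00i\x00"))
print(sniff_bom(b"\xfe\xff\x00h\x00i"))
print(sniff_bom(b"plain ascii"))
```

One caveat: UTF-16 LE text whose first character is U+0000 would be taken for UTF-32 LE here, but that should be rare in real files.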
Jackil



Joined: 24 May 2006
Posts: 97
Right... I just opened a can of worms ;)