Showing posts with label Internationalization. Show all posts

Friday, October 14, 2011

Notes of Internationalization (I18N) in Python

How to run xgettext, msgfmt and msguniq on Windows
  • The GNU tools xgettext, msgfmt and msguniq are usually used together: xgettext extracts translatable messages from source code, msguniq merges duplicate entries, and msgfmt compiles the resulting .po file into a .mo file. Python and Django projects use them too, and they ship with most popular Linux distributions, such as Ubuntu.
  • On Windows, you can install GnuWin32 or MinGW and add its 'xxx\bin' directory to the system PATH. Note that if you only add it to your user PATH, Django will fail to call the tools.
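A quick way to confirm that the tools are reachable is to ask Python where it finds them. The following is a minimal sketch for Python 3; the helper name find_gettext_tools is made up for this example, and it only checks the PATH of the process that runs it, which may differ from the PATH that Django sees under a web server.

```python
import shutil

# The three GNU gettext tools discussed above.
TOOLS = ("xgettext", "msgfmt", "msguniq")

def find_gettext_tools(tools=TOOLS):
    """Map each tool name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}

if __name__ == "__main__":
    for name, path in find_gettext_tools().items():
        print(name, "->", path or "NOT FOUND on PATH")
```

If a tool is reported as not found, re-check the PATH entry and open a new console so that the change is picked up.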
Add system language setting
  • On Windows, add a system-wide environment variable LANG=en_US if it is not there yet.
  • On Linux, the environment variable LANG is usually already set to something like en_US or en_US.utf8. You can check it with the shell command "echo $LANG". However, Apache on Linux may override this setting with a different value. If so, edit /etc/apache2/envvars (if Apache is installed there) and change "export LANG=xxx" to "export LANG=en_US", where xxx is probably C.
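Because the web server may run with a different environment than your shell, it can help to check LANG from inside the Python process itself. This is a small sketch for Python 3; describe_lang is a hypothetical helper name, not part of Python or Django.

```python
import os

def describe_lang(environ=os.environ):
    """Return a short message about the LANG variable in the given environment."""
    lang = environ.get("LANG")
    if not lang:
        return "LANG is not set"
    return "LANG=" + lang

if __name__ == "__main__":
    print(describe_lang())
```

Calling the same function from a view or a startup script shows the value that the server process actually sees.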
Change Python's default encoding
  • The default encoding of Python 2 is ASCII, which needs to be changed to UTF-8. We can change this setting in sitecustomize.py, which is imported automatically every time Python starts.
  • Make sure that there is a sitecustomize.py in your PYTHONPATH; otherwise, create one. Add the following code to it:
import sys
sys.setdefaultencoding('utf-8')

Wednesday, October 5, 2011

msgfmt fatal error: UTF-8 with BOM

When we use msgfmt to compile a translation file, i.e., a *.po file, it is easy to run into an error like this:
msgfmt: found 1 fatal error *.po :1:2: syntax error
We usually encode our translation files in UTF-8. However, if you save a text file as UTF-8 on Windows, e.g., in Notepad, the file is prepended with a BOM (Byte Order Mark), which is the byte sequence "EF BB BF" in hex. msgfmt only accepts UTF-8 without a BOM, so the BOM causes a fatal error. This error is often frustrating because the BOM is invisible in a normal text editor. Therefore, if you are editing a translation file, i.e., a *.po file, try Notepad++. After editing, go to "Encoding" and choose "Convert to UTF-8 without BOM". Your translation file is then good to go.
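If you prefer to check for the BOM from a script instead of an editor, the following is a minimal sketch for Python 3. The function names strip_utf8_bom and remove_bom_from_file are made up for this example; the file is rewritten in place, so keep a backup if in doubt.

```python
import codecs

def strip_utf8_bom(data):
    """Return data (bytes) without a leading UTF-8 BOM (EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def remove_bom_from_file(path):
    """Rewrite the file at path without a BOM; return True if one was removed."""
    with open(path, "rb") as f:
        data = f.read()
    cleaned = strip_utf8_bom(data)
    if cleaned != data:
        with open(path, "wb") as f:
            f.write(cleaned)
        return True
    return False
```

For example, remove_bom_from_file('django.po') returns True if a BOM was found and removed; the file name here is only a placeholder.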

Wednesday, September 14, 2011

Solve the UnicodeError: ASCII Decoding/Encoding Error in Python

Sometimes we get a UnicodeError in Python 2, which means we are trying to encode or decode a string that contains non-ASCII characters while Python is using its default encoding, ASCII. What we need to do is override that default with UTF-8.
Put the following code in sitecustomize.py and put this file somewhere in your PYTHONPATH, so that it can be imported. The file may already exist somewhere in your PYTHONPATH; if so, just append the following code to it. sitecustomize.py is a special file that is imported automatically every time Python starts.
# sitecustomize.py
# this file can be anywhere in your Python path
import sys
sys.setdefaultencoding('utf-8')
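To see where the error comes from, the following sketch (written for Python 3, where byte strings and text are separate types) decodes the same UTF-8 bytes once as ASCII and once as UTF-8. The helper name decode_as_ascii is made up for this illustration.

```python
text = u"你好"
utf8_bytes = text.encode("utf-8")  # six bytes, all outside the ASCII range

def decode_as_ascii(data):
    """Try to decode bytes as ASCII and report a failure instead of raising."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return "failed: " + exc.reason

print(decode_as_ascii(utf8_bytes))          # the ASCII codec rejects these bytes
print(utf8_bytes.decode("utf-8") == text)   # naming the codec explicitly works
```

Passing the codec explicitly to encode() and decode(), as in the last line, is another way to avoid depending on the default encoding.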

Thursday, July 21, 2011

Internationalization example with python gettext module

There are plenty of documents on the web about i18n with the Python gettext module, but here is a minimal example showing how to translate a hard-coded string from English to Chinese. It may help Python developers get a quick feel for how the gettext module is used.

1. First, we need to tell Python which strings should be translated by wrapping every translatable string in the source code in a call to '_()', for example:

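# example.py (Python 2)
import gettext
# Domain 'cn', catalogs under C:\locale; the raw string keeps the backslash literal.
t = gettext.translation('cn', r'C:\locale', fallback=True)
# ugettext returns the translation as a unicode string.
_ = t.ugettext

print _('Hello!')
Note: 'C:\locale' tells the Python interpreter where to search for the cn.mo file, which contains the translated strings in a form that Python can read. gettext looks for it at C:\locale\<language>\LC_MESSAGES\cn.mo, where <language> is taken from the LANGUAGE, LC_ALL, LC_MESSAGES or LANG environment variable. With fallback=True, the original strings are returned unchanged if no cn.mo file is found.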

2. Use pygettext.py to extract all marked strings from the .py file into a .pot file. pygettext.py can be found in \Tools\i18n within the Python installation directory. For example:
python pygettext.py -d cn -o cn.pot example.py 
Note: cn is the domain name. In this example it is named after the target language, although a domain more commonly names the application.

3. Generate a .po file from the .pot file by filling in the header fields and adding the translated strings to the empty msgstr fields. The charset you enter must match the encoding the .po file is actually saved in. Save a copy of cn.pot as cn.po and make the following changes:
"CHARSET" -> "gb2312"
"ENCODING" -> "utf8"
msgstr "" -> msgstr "你好!"

4. Compile the .po file into a .mo file that gettext can read, using msgfmt.py (also in \Tools\i18n), for example:
python msgfmt.py -o cn.mo cn.po

5. Run python example.py. It will give you the following output:
你好!
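The lookup path and the effect of fallback=True can be checked without a real catalog. The following is a sketch for Python 3 (where the method is gettext rather than ugettext). The language code zh_CN, the temporary directory and the helper expected_mo_path are assumptions made for this illustration, and the empty cn.mo file is only a placeholder used to show where gettext looks, not a valid catalog.

```python
import gettext
import os
import tempfile

def expected_mo_path(localedir, language, domain):
    """Build the path where gettext looks for a compiled catalog."""
    return os.path.join(localedir, language, "LC_MESSAGES", domain + ".mo")

with tempfile.TemporaryDirectory() as localedir:
    # No catalog exists yet, so fallback=True gives back the original string.
    t = gettext.translation("cn", localedir, languages=["zh_CN"], fallback=True)
    untranslated = t.gettext("Hello!")

    # Create an empty placeholder file at the expected location.
    mo_path = expected_mo_path(localedir, "zh_CN", "cn")
    os.makedirs(os.path.dirname(mo_path))
    open(mo_path, "wb").close()

    # gettext.find only reports where a catalog file exists; it does not parse it.
    found = gettext.find("cn", localedir, languages=["zh_CN"])

print(untranslated)
print(found == mo_path)
```

If example.py prints the English string instead of the translation, checking the location of cn.mo against this layout is a reasonable first step.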


Some useful notes on internationalization in Python are here.