[Techtalk] Internationalization issues

Telsa Gwynne hobbit at aloss.ukuu.org.uk
Mon Jan 19 16:52:01 EST 2004


On Mon, Jan 19, 2004 at 03:06:02PM +0200 or thereabouts, Eeva Järvinen wrote:
> On Mon, Jan 19, 2004 at 12:37:42PM +0100, Dan Richter wrote:
> > An O'Reilly interview[1] with the developers of Plone includes this quote:
> > 
> > > We've been very focused on internationalization from the outset -
> > > it's not something you can easily retrofit into your application
> > > afterward, as most US companies painfully discover when they try
> > > to conquer the EU markets.
> > 
> > Does anyone have experience with internationalization? Besides the 
> > obvious fact that you can't hard-code any text, what kinds of issues arise?
> 
> The whole locale-specific bunch...  

:) 

> Hyphenation works sometimes very differently in different languages;
> some characters might split into two, and if joined, merge back into
> one.

Some languages have things like "ch" as a separate letter, so
to separate the c and the h is Just Wrong. Alphabetical sorting
becomes great fun then! In my Welsh dictionary, "addas" comes
after "adnabod", because "dd" is a letter in between "d" and "e".
Worse, "ng" can be a single letter, but it can also be two
separate letters. I would not like to write the hyphenation
rules for that. 

That is more localisation than internationalisation, I suppose:
internationalisation is writing something so that it is not
specific to a particular locale. Localisation is taking that
and making it specific to a locale. (Hence "i18n" and "l10n", after
the number of letters omitted between the first and the last, btw:
easier to write!)

> Case conversion had better be done with libraries.

I have been told that there is at least one language/alphabet
where you can't safely do upper to lower and back, because two
letters in one case become the same letter in the other. I think
it's Turkish.
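
A small Python illustration of the round-trip problem. It relies on
Python's built-in, locale-independent case rules, which do not know
that Turkish keeps dotted and dotless i apart in both cases. The
round_trip function is just a name made up for this sketch:

```python
def round_trip(text):
    """Upper-case, then lower-case, using Python's default case rules."""
    return text.upper().lower()


dotless_i = "\u0131"  # Turkish lower-case dotless i

# Plain "i" survives the round trip.
print(round_trip("i") == "i")

# Dotless i upper-cases to "I", which lower-cases to dotted "i",
# so the original letter is lost.
print(round_trip(dotless_i) == dotless_i)

# German sharp s has the same problem: it upper-cases to "SS".
print(round_trip("\u00df"))
```

This is a reason to let a locale-aware library do the conversion
rather than to avoid case conversion altogether.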

> Character sets: UTF-8, ISO-8859-x et cetera.  Your program might
> encounter differently encoded text; it's generally not safe to assume
> everything is ISO-8859-something, neither is it safe to assume UTF-8.
> Emacs is generally quite proficient at this guesswork.

Joel on Software did a piece about character encodings recently
which is rather good. I wrote a little bit on the newchix list
about it too. I like the "do everything in Unicode" approach
because I am not a programmer. So it is not my problem to 
implement it, and I really like the results when it works! 

A little bit more about Unicode is at
http://wiki.freedesktop.org/Software/utf-8
And a _lot_ more about Unicode is at
http://www.cl.cam.ac.uk/~mgk25/unicode.html
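
To make the encoding guesswork concrete, here is a minimal Python
sketch. The decode_guess helper is hypothetical, and its fallback to
ISO-8859-1 is only an assumption about what the data might be: that
decoder accepts any byte sequence, so it can never report a wrong
guess.

```python
name = "J\u00e4rvinen"  # "Järvinen"

# The same text is stored as different bytes in different encodings.
utf8_bytes = name.encode("utf-8")         # b'J\xc3\xa4rvinen'
latin1_bytes = name.encode("iso-8859-1")  # b'J\xe4rvinen'
print(utf8_bytes, latin1_bytes)

# Reading UTF-8 bytes as ISO-8859-1 silently gives two wrong
# characters where the a-umlaut should be.
mojibake = utf8_bytes.decode("iso-8859-1")
print(len(name), len(mojibake))

# Reading ISO-8859-1 bytes as UTF-8 fails outright here.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as error:
    print("not valid UTF-8:", error.reason)


def decode_guess(data):
    """Hypothetical helper: try UTF-8 first, then assume ISO-8859-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("iso-8859-1")


print(decode_guess(utf8_bytes) == decode_guess(latin1_bytes))
```

Trying UTF-8 first works reasonably well because text in other
encodings is rarely valid UTF-8 by accident, but it is still a guess.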


> GTK2 input methods, and they are cool.  You can switch your keyboard
> to produce unicode classical greek by just right-clicking+choosing
> from menu in the editor window.  Back to Latin, right-click and choose
> from menu.

If you get _really_ ambitious, there is a Unicode character map
which has all kinds of symbols in it. I have also been browsing
the XKB lists of how to create interesting characters in XFree86.
Fascinating stuff.
 
> Sign language.  Best conveyed as videoclips.  Many deaf do not read or
> write anything else fluently.

Oh, gosh. This is new to me. 

Some more: when you send something to the printer, does it
come out formatted for A4 paper or letter? Is "9,999" nine
point nine nine nine or nine thousand, nine hundred and
ninety-nine? What about 10.000? Ten thousand, or ten point
nothing? Is 02/03/04 the second of March 2004 or the third
of February 2004? When a web page for an internet commerce
site asks for a "zip code", can I put my UK post code in,
despite the formats being different? (AA0 0AA where A is 
a letter and 0 a digit.) 
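
The number and date questions can be shown in a few lines of Python.
This is only a sketch: parse_number is a made-up helper that takes
the separators explicitly, where a real program would get them from
the user's locale.

```python
from datetime import datetime


def parse_number(text, decimal_sep, group_sep):
    """Hypothetical helper: parse a number with explicit separators."""
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


# Comma groups thousands, point marks decimals (UK and US habit).
print(parse_number("9,999", decimal_sep=".", group_sep=","))   # 9999.0
print(parse_number("10.000", decimal_sep=".", group_sep=","))  # 10.0

# Point groups thousands, comma marks decimals (common elsewhere).
print(parse_number("9,999", decimal_sep=",", group_sep="."))   # 9.999
print(parse_number("10.000", decimal_sep=",", group_sep="."))  # 10000.0

# The same date string under day-first and month-first conventions.
day_first = datetime.strptime("02/03/04", "%d/%m/%y").date()
month_first = datetime.strptime("02/03/04", "%m/%d/%y").date()
print(day_first)    # 2004-03-02
print(month_first)  # 2004-02-03
```

Nothing in the strings themselves says which reading is right, which
is why the convention has to come from outside the data.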

Finally, here are some GNOME resources on internationalisation 
and localisation. KDE uses much the same tools. 

How to internationalise your app using gettext:
http://www.gnome.org/~malcolm/i18n/

GNOME Documentation Project Style Guide pages about writing for
translation: 
http://developer.gnome.org/documents/style-guide/c1795.html
http://developer.gnome.org/documents/style-guide/x1829.html
http://developer.gnome.org/documents/style-guide/x2517.html

(These cover writing both the strings in the program and the
documentation about it.)

Telsa


