[Techtalk] umlauts

Maria Blackmore mariab at cats.meow.at
Wed Sep 3 17:04:51 EST 2003


On Wed, 3 Sep 2003, Eeva Järvinen wrote:

> Mail systems seem to strip away characters they don't understand...

Hello,

This may appear to be true, but it isn't strictly accurate ... they fail
to understand it in a very specific way.

> Here's a list of UTF-8 stuff: (snipped due to non-unicode terminal
> atm).  I wonder if any of those get past the mail servers (my local
> sendmail seems to pass them to local accounts)?

They certainly made it here intact, but this cannot be relied upon.

> If they do non-destructive autoconversion,

Mail servers should not actually change anything in the contents/body of a
message.  At all.  Ever.

> this mail should be encodable in at least UTF-8, UTF-16, JIS2022-7 and
> iso-10646, but NOT in any of the iso-8859-x encodings.

If you want to pass around any Unicode or JIS-encoded text then you need
to put it inside a MIME attachment as text/plain.  The point is that email
is not guaranteed to be 8-bit clean.  If an email server isn't 8-bit
clean, it will simply "forget" the 8th bit as it passes the email
through, and you'll end up with an email full of crud if you're not
careful.  Some servers are 8-bit clean, but that cannot be relied upon to
any great degree unless you run all the servers or have a big stick and
physical access to the people who do :)  Since physical violence and
other "persuasive" techniques are frequently frowned on, you're usually
much better off putting the text inside MIME (with a transfer encoding
such as Base64) and avoiding all the hassle in the first place :)
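To make the "forgets the 8th bit" failure concrete, here is a minimal
Python sketch.  The strip_high_bit() helper is hypothetical: it is a toy
model of a relay that is not 8-bit clean (it clears bit 7 of every byte),
not the code of any real mail server, and real broken relays may mangle
things differently.  It shows raw UTF-8 turning into crud, then the same
text wrapped in a MIME text/plain part, which Python's standard email
package Base64-encodes for the utf-8 charset, surviving the same hop.

```python
import email
from email.mime.text import MIMEText


def strip_high_bit(data: bytes) -> bytes:
    """Hypothetical model of a hop that is not 8-bit clean:
    bit 7 of every byte is cleared as the message passes through."""
    return bytes(b & 0x7F for b in data)


text = "Järvinen"  # 'ä' is two bytes in UTF-8: 0xC3 0xA4

# 1. Raw 8-bit UTF-8 through the 7-bit hop: the umlaut turns into crud.
mangled = strip_high_bit(text.encode("utf-8"))
print(mangled.decode("ascii"))  # JC$rvinen

# 2. The same text as a MIME text/plain part.  For the utf-8 charset,
#    Python's email package Base64-encodes the body, so every byte
#    of the serialized message is 7-bit ASCII.
part = MIMEText(text, "plain", "utf-8")
print(part["Content-Transfer-Encoding"])  # base64
wire = part.as_bytes()

# 3. Pass the encoded message through the same 7-bit hop and decode it.
received = email.message_from_bytes(strip_high_bit(wire))
recovered = received.get_payload(decode=True).decode("utf-8")
print(recovered)  # Järvinen
```

Because Base64 only ever emits ASCII letters, digits, "+", "/" and "=",
clearing the top bit is a no-op on the encoded message, which is why the
MIME route sidesteps the problem instead of hoping every relay is clean.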

Maria


