[Techtalk] Web page downloads with null characters
katie at katie-and-rob.org
Tue Oct 21 16:11:09 EST 2003
On Tue, Oct 21, 2003 at 03:57:22PM -0400, Brenda Bell wrote:
> In the case of IE, my assumption is that they're making an educated guess
> and rendering the file correctly.
Bingo! According to Joel on Software:
"What do web browsers do if they don't find any Content-Type, either
in the http headers or the meta tag? Internet Explorer actually does
something quite interesting: it tries to guess, based on the
frequency in which various bytes appear in typical text in typical
encodings of various languages, what language and encoding was used.
Because the various old 8 bit code pages tended to put their
national letters in different ranges between 128 and 255, and
because every human language has a different characteristic
histogram of letter usage, this actually has a chance of working.
It's truly weird, but it does seem to work often enough that naïve
web-page writers who never knew they needed a Content-Type header
look at their page in a web browser and it looks ok, until one day,
they write something that doesn't exactly conform to the
letter-frequency-distribution of their native language, and Internet
Explorer decides it's Korean and displays it thusly, proving, I
think, the point that Postel's Law about being "conservative in what
[you] emit and liberal in what [you] accept" is quite frankly not a
good engineering principle. Anyway, what does the poor reader of
this website, which was written in Bulgarian but appears to be
Korean (and not even cohesive Korean), do? He uses the View |
Encoding menu and tries a bunch of different encodings (there are at
least a dozen for Eastern European languages) until the picture
comes in clearer. If he knew to do that, which most people don't."
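
To make the histogram idea concrete, here is a rough Python sketch of
that kind of guessing. It is not IE's actual algorithm, which I don't
have any details on; the function names, the sample sentence, and the
choice of three Cyrillic code pages are all my own assumptions for
illustration. It counts only bytes in the 128-255 range and picks the
candidate code page whose reference histogram looks most similar.

```python
import math
from collections import Counter

def high_byte_histogram(data: bytes) -> Counter:
    """Count only bytes >= 128, where code pages keep national letters."""
    return Counter(b for b in data if b >= 128)

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two histograms (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_profiles(sample_text: str, encodings: list[str]) -> dict[str, Counter]:
    """Encode one sample of the language in each candidate code page."""
    return {enc: high_byte_histogram(sample_text.encode(enc)) for enc in encodings}

def guess_encoding(data: bytes, profiles: dict[str, Counter]) -> str:
    """Return the candidate whose profile best matches the unknown bytes."""
    hist = high_byte_histogram(data)
    return max(profiles, key=lambda enc: cosine_similarity(hist, profiles[enc]))

# Hypothetical training sample: one Bulgarian sentence. A real detector
# would use far more text, and more languages, than this.
SAMPLE = "Това е примерен текст на български език, който служи за обучение."
CANDIDATES = ["cp1251", "koi8_r", "iso8859_5"]
PROFILES = build_profiles(SAMPLE, CANDIDATES)

# An "unlabelled" page: Bulgarian text whose encoding we pretend not to know.
mystery = "Добър ден, как сте днес?".encode("koi8_r")
print(guess_encoding(mystery, PROFILES))
```

With such a tiny sample the guess is fragile, which is rather the point
of the quote: feed it text whose letter frequencies don't match the
profile and it can confidently pick the wrong code page.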
Katie Bechtold http://katie-and-rob.org/
[Wisdom] is a tree of life to those laying
hold of her, making happy each one holding her fast.
-- Proverbs 3:18, NSV