[Techtalk] Web page downloads with null characters
Dominik Schramm
dominik.schramm at gmxpro.net
Wed Oct 22 00:59:07 EST 2003
Brenda Bell wrote:
>Quoting Hamster <hamster at hamsternet.org>:
>
>> This is my guess. The page I see is unicode - two bytes for each char,
>> hence the square. The XML page is declared as iso-8859-1, but I'm
>> wondering if something screwy is happening in the conversion from
>> xml -> html
>>
>> I'm just guessing though... if someone else comes up with a different
>> theory, believe them :-)
>
>Not only is it unicode -- it's unicode big endian (the bytes are flipped).
> Furthermore, it's missing the FFFE (or FEFF) that should be at the
>beginning of the file to identify it as such.
>
>In the case of IE, my assumption is that they're making an educated guess
>and rendering the file correctly.
>
>
Hmm, strange.
I was curious and tried it out. For me, it's plain latin-1 in every
client I tried: lynx -source, netcat, Mozilla 1.4, Opera 7.0 and
Konqueror (KDE 2.2).
This is what netcat shows:
$ echo -n -e 'HEAD /QA/Tips/iso-date HTTP/1.1\r\nhost:www.w3.org\r\naccept-language:en,de\r\n\r\n' | nc www.w3.org 80
HTTP/1.1 200 OK
Date: Tue, 21 Oct 2003 22:47:19 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: iso-date.html
[...blablabla...]
Content-Type: text/html; charset=iso-8859-1
Requesting iso-date.html yields the same result.
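
The headers only say what the server claims, so it may also be worth
looking at the raw bytes. Below is a rough sketch, not a definitive
test: sniff_bom and count_nuls are hypothetical helper names made up
for this example. The first reports which byte-order mark (if any)
starts the input, the second counts null bytes, which would be
plentiful in UTF-16 text and absent from ordinary latin-1 text.

```shell
#!/bin/sh
# sniff_bom (hypothetical helper): read stdin and print which
# byte-order mark starts it: utf-16-be, utf-16-le, utf-8-bom, or none.
sniff_bom() {
    # Dump the first three bytes as hex, then strip spaces and newlines.
    first=$(od -An -t x1 -N 3 | tr -d ' \n')
    case "$first" in
        feff*)   echo utf-16-be ;;
        fffe*)   echo utf-16-le ;;
        efbbbf*) echo utf-8-bom ;;
        *)       echo none ;;
    esac
}

# count_nuls (hypothetical helper): read stdin and print the number
# of null bytes it contains.
count_nuls() {
    # Delete every byte except NUL, then count what is left.
    tr -cd '\000' | wc -c | tr -d ' '
}
```

For example, piping the output of lynx -source
http://www.w3.org/QA/Tips/iso-date into count_nuls should print 0 if
the body really is latin-1 text. A large count together with "none"
from sniff_bom would instead match the BOM-less UTF-16 that Brenda
describes.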
dominik