[Techtalk] Web page downloads with null characters

Wed Oct 22 00:59:07 EST 2003

Brenda Bell wrote:

>Quoting Hamster <hamster at hamsternet.org>:
>
>> This is my guess. The page I see is unicode - two bytes for each char,
>> hence the square. The XML page is declared as iso-8859-1, but I'm
>> wondering if something screwy is happening in the conversion from
>> xml -> html
>> 
>> I'm just guessing though... if someone else comes up with a different
>> theory, believe them :-)
>
>Not only is it unicode -- it's unicode big endian (the bytes are flipped).
> Furthermore, it's missing the FFFE (or FEFF) that should be at the
>beginning of the file to identify it as such.
>
>In the case of IE, my assumption is that they're making an educated guess
>and rendering the file correctly.
>  
>
Hmm, strange.
I was curious and tried it out:
for me, it's plain latin-1 with all of lynx -source, netcat, Mozilla 1.4,
Opera 7.0 and Konqueror (KDE 2.2).
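
The two descriptions in this thread (two bytes per character with no
byte-order mark, versus plain latin-1) can be told apart by looking at the
raw bytes of the saved page. The following is only a rough sketch of such a
check, not what any browser actually does: the function name sniff_utf16 and
the 90% threshold are made up for illustration, and the NUL-counting
heuristic only works for text that is mostly ASCII.

```python
import codecs


def sniff_utf16(data: bytes) -> str:
    """Guess whether a byte string looks like UTF-16 (hypothetical helper)."""
    # An explicit byte-order mark settles the question.
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be (BOM)"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le (BOM)"

    # No BOM: mostly-ASCII text in UTF-16 has a NUL in every other byte.
    even_nuls = data[0::2].count(0)  # NULs at offsets 0, 2, 4, ...
    odd_nuls = data[1::2].count(0)   # NULs at offsets 1, 3, 5, ...
    half = len(data) // 2

    # The 0.9 threshold is an arbitrary assumption, not a standard value.
    if half and even_nuls > 0.9 * half and odd_nuls == 0:
        return "utf-16-be (guessed)"
    if half and odd_nuls > 0.9 * half and even_nuls == 0:
        return "utf-16-le (guessed)"
    return "no sign of utf-16"
```

Feeding it the bytes of the saved page, e.g.
sniff_utf16(open("iso-date.html", "rb").read()) with whatever filename the
download was saved under, shows whether the NUL characters from the subject
line sit in the even or the odd byte positions.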

This is what netcat shows:
$ echo -n -e 'HEAD /QA/Tips/iso-date HTTP/1.1\r\nhost:www.w3.org\r\naccept-language:en,de\r\n\r\n' | nc www.w3.org 80
HTTP/1.1 200 OK
Date: Tue, 21 Oct 2003 22:47:19 GMT
Server: Apache/1.3.28 (Unix) PHP/4.2.3
Content-Location: iso-date.html
[...blablabla...]
Content-Type: text/html; charset=iso-8859-1

Requesting iso-date.html yields the same result.
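
For anyone who wants to repeat the header check without netcat, roughly the
same HEAD request can be sent from Python's standard library. This is a
sketch under assumptions: the helper names charset_of and head_charset are
invented here, and the server may well answer differently today (for
example with a redirect) than it did in the transcript above.

```python
import http.client
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the value and returns None if absent.
    return msg.get_content_charset()


def head_charset(host: str, path: str) -> Optional[str]:
    """Send a HEAD request over plain HTTP and report the declared charset."""
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    try:
        # Same Accept-Language header as in the netcat request.
        conn.request("HEAD", path, headers={"Accept-Language": "en,de"})
        response = conn.getresponse()
        value = response.getheader("Content-Type")
        return charset_of(value) if value else None
    finally:
        conn.close()
```

Calling head_charset("www.w3.org", "/QA/Tips/iso-date") corresponds to the
request shown above. Note that this only reports what the server declares;
it says nothing about what the bytes of the body actually look like.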

dominik