[prog] Reading data file

Daniel. cristofd at hevanet.com
Sun Jan 25 03:03:12 EST 2004


>  > Great. The two lists match exactly.
>
>Oh, I didn't understand that. Thanks for pointing that out.

"A << B", in C and C++ and the like, means "shift the bits of A to
the left a distance of B bits, filling in the resulting gaps on the
right with zeroes". As long as no set bits get pushed off the left
end, this is equivalent to multiplying A by 2 to the Bth power. In
C++, the << operator has also been hijacked ("overloaded") into
performing some I/O tasks...
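To make the shift concrete, here is a small C sketch. The function
names are made up for illustration, and the bit numbering in
flag_is_set (bit 0 is the lowest bit) is my assumption; the real flag
values have to come from the file format's documentation.

```c
#include <stdint.h>

/* Shift a left by b bits; zeroes fill in on the right.
   Assumes b is less than 32: shifting a 32-bit value by 32 or more
   is undefined in C. */
uint32_t shift_left(uint32_t a, unsigned b)
{
    return a << b;
}

/* Multiply a by 2 to the bth power the slow way, for comparison. */
uint32_t times_two_to_the(uint32_t a, unsigned b)
{
    while (b-- > 0)
        a *= 2;
    return a;
}

/* Test whether one flag bit is set; bit 0 is the lowest bit
   (an assumption made for this sketch). */
int flag_is_set(uint32_t flags, unsigned bit)
{
    return (flags & ((uint32_t)1 << bit)) != 0;
}
```

So shift_left(3, 4) and times_two_to_the(3, 4) both give 48, and
1 << N is the usual way to build the mask for flag number N.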

>  > If the problem is something else, dunno.
>
>The other problem right now is that I don't understand how to get the
>32-bit messageflags from the file? I know how to read info line by
>line from textfiles, but I don't understand how to read the file and
>get the 32-bits from it?

Thinking about it, that is probably not a job for bash itself but for
some program running under bash's control. I don't know what the
standard UNIX program (as opposed to a system call) to do "get the Xth
32-bit int from file Y" would be--anyone?--but it would be trivial to
write one in C anyway. Calling an external program for every value
would introduce a lot of overhead, and someone interested in making a
really good-quality mail-exporting program to be used by many people
would probably abandon bash right now, but it should get the job done.
If nobody suggests a pre-existing program, I'll write one.
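For a rough idea of what the core of such a program might look like,
here is a minimal sketch in C. The name read_u32_at and its parameters
are invented for this example, and the caller has to say which byte
order the file uses, since that depends on the file format.

```c
#include <stdint.h>
#include <stdio.h>

/* Read the index-th 32-bit int (counting from 0) from the file at path.
   big_endian is nonzero if the file stores the high byte first.
   Returns 0 and stores the value in *out on success, -1 on failure. */
int read_u32_at(const char *path, long index, int big_endian, uint32_t *out)
{
    unsigned char b[4];
    FILE *f;

    if (index < 0)
        return -1;

    f = fopen(path, "rb");   /* "rb": binary mode, no newline translation */
    if (f == NULL)
        return -1;

    /* Each int occupies four bytes, so the one we want starts at index * 4. */
    if (fseek(f, index * 4L, SEEK_SET) != 0 || fread(b, 1, 4, f) != 4) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Run the four bytes together in the requested order. */
    if (big_endian)
        *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
             | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    else
        *out = ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16)
             | ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
    return 0;
}
```

A small main() around this could take the file name and index from
the command line and print the value as a decimal number, so that a
bash script can capture it with $(...).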

>I don't understand the difference between
>textfiles and binary files...

A file in general is a sequence of bytes which hold numbers from 0 to 
255; the numbers only have meaning insofar as they are interpreted by 
a program or a person. In a text file, the bytes hold the ASCII codes
for successive characters; "binary" in this context just means
any file other than a text file. If a file is to be seen as a
succession of 32-bit ints, as the kind you're dealing with is, four 
bytes are run together to make each int; so the first four bytes form 
the version number, the next four form a message number, and so on.

An important question here is what order the bytes are to be taken 
in. For instance, is a thousand (binary representation 
00000000000000000000001111101000) stored as four bytes in the order
00000000 00000000 00000011 11101000
or in the order
11101000 00000011 00000000 00000000?
The first is called "big-endian" and the second "little-endian",
and it's important to know which way the file format you're dealing 
with stores them. The documentation where you got the flag values 
should say.
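The two byte orders can be spelled out in C like this. It is only an
illustrative sketch, and the function names are made up for the
example.

```c
#include <stdint.h>

/* Split v into four bytes, most significant byte first (big-endian). */
void store_big_endian(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)((v >> 24) & 0xFF);
    out[1] = (unsigned char)((v >> 16) & 0xFF);
    out[2] = (unsigned char)((v >> 8) & 0xFF);
    out[3] = (unsigned char)(v & 0xFF);
}

/* Split v into four bytes, least significant byte first (little-endian). */
void store_little_endian(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}
```

For a thousand (hex 000003E8), store_big_endian gives the bytes
00 00 03 E8 and store_little_endian gives E8 03 00 00, matching the
two layouts above.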

-Daniel.
-- 
        ()  ASCII ribbon campaign      ()    Hopeless ribbon campaign
        /\    against HTML mail        /\  against gratuitous bloodshed

