[prog] C++ object creation
Meredydd
meredydd at everybuddy.com
Wed May 28 11:02:09 EST 2003
On Wednesday 28 May 2003 04:05, Sue Stones wrote:
> Could some one please explain what the heap is, and also compare it to the
> stack.
OK, I'll give this a shot. I'm not a heavyweight assembler programmer, so I
won't answer in a too hardware-orientated sense. That said, Diggy appears to
do an excellent job of explaining how it actually *works*, although I'm still
not sure how it squares with virtual memory... Oh yes, and please correct me
if I'm wrong here. I don't think I am, but for heaven's sake don't let me get
away with it!
Anyway, from a user's point of view:
The stack is where you keep local variables. As you go into a function, each
of that function's local variables is "pushed" onto the stack. That is to
say, space is allocated on the stack for it. So, if I enter a function:
void myfunc(void)
{
int a, b;
char c;
...
As the flow of control goes into this function, I would have nine bytes pushed
onto the stack - four to hold each integer, and one for the char (these
numbers vary with the platform and with any alignment padding the compiler
adds - under DOS, that would be 5 bytes, and on a Game Boy it would
be three). Anyway, the real trick to the stack is that it behaves just like a
stack of blocks. Each time you need new space, you push on extra bytes. So,
if I were to call another function from within myfunc(), it would push its
variables onto the stack on top of the 9 bytes I've already got on there for
myfunc. It would also push on the address to which execution should return
when that function is finished. In fact, this is how function calls work at a
basic level. Computers can only understand jump instructions - GOTOs,
effectively. However, the C compiler converts something like this:
fputc('c', stdout);
...to:
- Push the value 'c' onto the stack (fputc() takes it as an int, so that's four
bytes rather than one)
- Push four bytes onto the stack, put the value of the pointer "stdout" there
- Push the address of the next instruction to execute once that function is
done onto the stack
- Jump to the first instruction in the function "fputc"
<-- Execution continues here when the function fputc returns (executing a JUMP
to the pointer you just pushed onto the stack)
This is in fact quite an achievement - it transforms a paradigm which only has
a concept of the GOTO into one in which you can create complicated, nested
subroutines. Nesting is in fact achieved by the stack's structure - each new
function's variables are just pushed in on top of the previous function's.
When a function returns, it just "pops" its variables off the stack (although
it will sometimes leave one on there to hold its return value).
Ever started up a program in GDB, interrupted it somehow, and typed "bt full"?
That's called a stack trace for a reason - it just prints out the contents of
the stack, interpreting the raw data and figuring out the values of the
variables from it.
OK, so that's the stack. The heap is only a slightly different beast. It's
called that because it's best visualised as an unstructured heap of memory in
the middle of the system, from which all programs grab bits as and when they
need it. You make a call to malloc() (or new, which in C++ usually gets its
memory the same way and then runs the object's constructor), and the memory
management system marks a small stretch of that
heap as yours. You then muck around with it, doing whatever you like there,
confident that no other program will use that piece of memory. When you're
done with it, you call free() (or delete, the counterpart to new, which runs
the destructor first). This signals that
you're done with that memory, and that other programs are now free to grab it
(with malloc()) and use it for themselves. Of course, it is possible -
indeed, very easy - to accidentally read from or write to a section of memory
you haven't been allocated. Under some operating systems (Windows 9x, Mac OS
9 and below), this just clobbers other applications' data. The OS may or may
not detect what you've just done, but whatever happens, you've still made
that change (hence the infamous "this may have destabilised your computer,
please reboot" message on Mac OS - once you've overwritten bits of memory,
it's just a matter of time before the program whose data you've clobbered
tries to read from that bit of memory, and starts acting strangely). On other
operating systems (Windows NT/2000/XP, UNIX, just about anything modern), the
allocation procedure is actually a little more complicated than just
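remembering not to give that chunk of the heap to anyone else. A processor
feature (virtual memory, enforced by the memory management unit) is used,
which prevents other processes from even *seeing* that memory. A process
trying to read or write an address that isn't mapped into its own address
space is terminated with the infamous SIGSEGV - Segmentation Fault. (Small
overruns that stay inside your own process's memory usually go undetected,
which is what makes them so nasty.)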
An interesting artefact of the way that the heap works is that you can only
ever get at heap memory through pointers. Any local variable declared in the
normal way is put on the stack. (Global and static variables are the
exception: they live in their own fixed region of memory, set up when the
program is loaded, and are neither on the stack nor on the heap.) Another
somewhat confusing thing is that the actual *pointer* is almost always on the
stack. So:
int a;
creates a new integer stored on the stack.
int * a=(int *)malloc(sizeof(int));
OR
int * a=new int;
allocates an integer-sized block of memory on the heap, and stores its address
in a pointer variable. The pointer, like any other local variable, lives on
the stack.
So - back to ed's original question: Why do you choose one over the other? The
answer is that a variable on the stack is popped off when the function in
which it is defined returns. If you want the data to stick around past the
end of the function which defined it, you want heap memory. Another thing is
that heap memory is allocated dynamically. If you put something on the stack,
say, a char array, it's there, its size is set, right from compile-time. If
you allocate the memory to store your chars on the fly, then you can allocate
as much as you need and no more, which rapidly becomes useful in a whole load
of applications.
One last observation - not all languages are as flexible as C. Java, for
example, stores every *object* on the heap, and relies on its garbage
collector to clean things up. Only primitive values (int, char and so on) and
the references to objects live on the stack as local variables. This is, I am
given to understand, one of the reasons often given for its slowness.
Gah. The above looks something of a slovenly mess. Ability to program does not
a technical writer make! Hope you can wade through it and get something
useful out of it anyway...
Meredydd