[prog] C++ object creation

Wed May 28 11:02:09 EST 2003

On Wednesday 28 May 2003 04:05, Sue Stones wrote:
> Could some one please explain what the heap is, and also compare it to the
> stack.

OK, I'll give this a shot. I'm not a heavyweight assembler programmer, so I 
won't answer in too hardware-oriented a sense. That said, Diggy appears to 
do an excellent job of explaining how it actually *works*, although I'm still 
not sure how it squares with virtual memory... Oh yes, and please correct me 
if I'm wrong here. I don't think I am, but for heaven's sake don't let me get 
away with it!

Anyway, from a user's point of view:

The stack is where you keep local variables. As you go into a function, each 
of that function's local variables is "pushed" onto the stack. That is to 
say, space is allocated on the stack for it. So, if I enter a function:

void myfunc(void)
{
  int a, b;
  char c;
  ...

As the flow of control goes into this function, I would have nine bytes pushed 
onto the stack - four to hold each integer, and one for the char. (These 
numbers vary - under DOS, that would be 5 bytes, and on a Game Boy it would 
be three - and a compiler will often add a few bytes of padding to keep 
things aligned.) Anyway, the real trick to the stack is that it behaves just 
like a 
stack of blocks. Each time you need new space, you push on extra bytes. So, 
if I were to call another function from within myfunc(), it would push its 
variables onto the stack on top of the 9 bytes I've already got on there for 
myfunc. It would also push on the address to which execution should return 
when that function is finished. In fact, this is how function calls work at a 
basic level. Computers can only understand jump instructions - GOTOs, 
effectively. However, the C compiler converts something like this:

fputc('c', stdout);

...to something like this (the exact order and sizes depend on the compiler's 
calling convention):

- Push the value 'c' onto the stack (fputc() takes it as an int, so that's 
four bytes here)
- Push four bytes onto the stack, put the value of the pointer "stdout" there
- Push the address of the next instruction to execute once that function is 
done onto the stack
- Jump to the first instruction in the function "fputc"
<-- Execution continues here when the function fputc returns (executing a JUMP 
to the address you just pushed onto the stack)

This is in fact quite an achievement - it transforms a paradigm which only has 
a concept of the GOTO into one in which you can create complicated, nested 
subroutines. Nesting is in fact achieved by the stack's structure - each new 
function's variables are just pushed in on top of the previous function's. 
When a function returns, it just "pops" its variables off the stack (its 
return value is usually handed back in a register, or in space the caller set 
aside for it).

Ever started up a program in GDB, interrupted it somehow, and typed "bt full"? 
That's called a stack trace for a reason - it just prints out the contents of 
the stack, interpreting the raw data and figuring out the values of the 
variables from it.

OK, so that's the stack. The heap is only a slightly different beast. It's 
called that because it's best visualised as an unstructured heap of memory in 
the middle of the system, from which all programs grab bits as and when they 
need it. You make a call to malloc() (or use new, which in C++ typically gets 
its memory from malloc() and then runs the object's constructor), and the 
memory management system marks a small stretch of that heap as yours. You 
then muck around with it, doing whatever you like there, confident that 
nobody else will be handed that piece of memory. When you're done with it, 
you call free() (or delete, the counterpart of new, which runs the destructor 
first). This signals that you're done with that memory, and that it is free 
to be handed out again by a later malloc(). Of course, it is possible - 
indeed, very easy - to accidentally read from or write to a section of memory 
you haven't been allocated. Under some operating systems (Windows 9x, Mac OS 
9 and below), this just clobbers other applications' data. The OS may or may 
not detect what you've just done, but whatever happens, you've still made 
that change (hence the infamous "this may have destabilised your computer, 
please reboot" message on Mac OS - once you've overwritten bits of memory, 
it's just a matter of time before the program whose data you've clobbered 
tries to read from that bit of memory, and starts acting strangely). On other 
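operating systems (Windows NT/2000/XP, UNIX, just about anything modern), the 
allocation procedure is actually a little more complicated than just 
remembering not to give that chunk of the heap to anyone else. A processor 
feature (virtual memory, enforced by the memory management unit) gives each 
process its own address space, which prevents other processes from even 
*seeing* that memory. A process trying to read or write an address that isn't 
mapped into its own space is stopped by the OS: on UNIX with the infamous 
SIGSEGV - Segmentation Fault - and on Windows with an access violation. A 
stray write that lands elsewhere in your own heap goes undetected, though, 
and quietly corrupts your own data.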

An interesting artefact of the way that the heap works is that you can only 
ever reach heap memory through pointers. Any local variable declared in the 
normal way is put on the stack (global and static ones are the exception: 
they get a fixed block of memory of their own, set aside when the program is 
loaded). Another somewhat confusing thing is that the actual *pointer* is 
almost always on the stack. So:

int a;

creates a new integer stored on the stack.

int * a=(int *)malloc(sizeof(int));
OR
int * a=new int;

allocates an integer-sized block of memory on the heap, and stores its address 
in a pointer variable. The pointer, like any other local variable, lives on 
the stack.

So - back to ed's original question: Why do you choose one over the other? The 
answer is that a variable on the stack is popped off when the function in 
which it is defined returns. If you want the data to stick around past the 
end of the function which defined it, you want heap memory. Another thing is 
that heap memory is allocated dynamically. If you put something on the stack, 
say, a char array, it's there, its size is set, right from compile-time. If 
you allocate the memory to store your chars on the fly, then you can allocate 
as much as you need and no more, which rapidly becomes useful in a whole load 
of applications.

One last observation - not all languages are as flexible as C. Java, for 
example, stores every *object* on the heap, and just relies on its garbage 
collector to clean things up - only primitives such as int, and the 
references to objects, live on the stack as local variables. This is, I am 
given to understand, one of the reasons often given for its slowness.

Gah. The above looks something of a slovenly mess. Ability to program does not 
a technical writer make! Hope you can wade through it and get something 
useful out of it anyway...

Meredydd