Skip to content

Heaps and Memory Errors

The Heap

The heap is an area of memory that the programmer has to manually manage, at least in lower level languages like C. We have to manually allocate memory to the program, using a call such as malloc, then free the memory once we no longer need it.

When allocating memory within C, we specify the size of the memory that we need. If there is an error allocating memory, we return a null pointer (pointing to address 0), which allows us to check, as the programmer, whether there is enough memory for us.

void *malloc(size_t size);

int* x = malloc(sizeof(int));     // Allocate memory to hold an integer
int* arr = malloc(N*sizeof(int)); // Allocate memory to hold an array size N of integers

We then free the memory from use once we no longer need it, using void *free(void *ptr);, and if we forget to do this, things can go a bit wrong.

Toy Program Showing Different Memory Addresses

Writing a simple toy program, and making use of the reference and dereference operators (& and *), we can look at the memory addresses of things we allocate.

#include <stdio.h>
#include <stdlib.h>

void main() {
        int x = 3;
        int y = 255;

        int* u = malloc(sizeof(int));
        int* v = malloc(2*sizeof(int));

        *u = 16;
        v[0] = 24;
        *(v+1) = 25;

        printf("Var\tAddress\t\tValue\n");
        printf("x:\t0x%08x\t%d\n", &x, x);
        printf("y:\t0x%08x\t%d\n", &y, y);
        printf("u:\t0x%08x\t%d\n", u, *u);
        printf("v[0]:\t0x%08x\t%d\n", &v[0], v[0]);
        printf("v[1]:\t0x%08x\t%d\n", &v[1], v[1]);

}
gives
gcc ./toy_example.c -o toy_example && ./toy_example
Var     Address         Value
x:      0x023ed870      3
y:      0x023ed874      255
u:      0x608582a0      16
v[0]:   0x608582c0      24
v[1]:   0x608582c4      25

Toy Program Showing Error Handling and Segmentation Faults

Even running this program, my editor warns me that I'm not freeing u or v, as this is a really common thing to forget. Proper convention would be to first check that we were actually allocated some memory, instead of the null pointer, then at the end free both the addresses allocated. The following program segfaults because we try to write to address zero, but besides this, we correctly free and error handle:

#include <stdio.h>
#include <stdlib.h>

void main() {

    // ...

        int* u = malloc(sizeof(int));
        if(!u) {
                // Error Handling
        }

        // Simulate error allocating memory
        int* v = NULL;

    // ...

        free(u);
        free(v);

}

Variables that are stored in the heap can be accessed globally and will continue to live until they are explicitly freed. Stack variables are only local and die when they are taken out of scope.

Heap variables are not limited in size, the only thing is if there is not enough memory to allocate, then we'll probably be given a null pointer. The heap variables can be resized after allocation, which is handy if the array or whatever is dynamically sized.

Allocating Memory for Dynamically Sized Variables

We have already seen in the labs that allocating a fixed buffer for variables is not best practice, as these can be overflowed, or don't store the full string or whatever datatype it is that we need. Take another sample program:

#include <stdio.h>
#include <string.h>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        printf("Syntax: %s <string>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    char *p;
    sprintf(p, "Input: %s", argv[1]);
    printf("%s\n", p);
    return 0;
}
In the above example, we only declared p, not allocate anything to it. We could instead have declared a fixed buffer, p[50], but then without a length check this would be vulnerable to a buffer overflow.

We therefore need to allocate some memory to store the string:

char *p = malloc(strlen(argv[1])+1);
// Error handling
We need to allocate an additional byte to terminate the string, and of course error handle it! We now, however, have a memory leak. This is because we don't free the memory after using it. Not a big deal in this instance as the program terminates right after we print it, but say this was in a recursively called function that was being called many thousands of times. Then, we'd be allocating some buffer to store this string for the entire length of the program, and the heap would eventually run out of space to store this buffer.

If the memory leak gets big enough, then the OS has to start rapidly paging (if there is a swap partition or file), which is orders of magnitude slower than other memory access. This leads to a process called thrashing, and the computer grinds to a halt. All we have to do to prevent this is install enough RAM in the system and ensure that the memory we are allocating is freed once no longer needed.

Debugging

It can be quite hard to establish the state of some code when not being run. Debuggers, such as GDB, Valgrind, and Dr. Memory allow us to inspect program state whilst running, and they are specialised to different tasks. Unlike static analysis, debuggers are an online process and they instrument binaries with symbols to aid in analysis. GDB is useful for inspection of the code and to find things such as segementation faults.

Valgrind focuses more on memory leak errors and threading bugs, and Dr. Memory is like Valgrind but also runs on Windows.

Optimization

Compilers are wonderful pieces of technology and do a lot of the heavy lifting to make our code run faster. We can control the level of optimization of the code by setting the -O flag to a value 1-4 which optimizes for speed or size of the code. Different optimization levels enable different optimizations.

Stack Overflow

This is where the call stack pointer exceeds the stack bound. Stack frames have a limited amount of space, which is often determined at the start of the program. The size of stack is dependant on the programming language, architecture, multithreading, and total available memory.

Toy Example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void foo(int index) {
    if(index % 10000 == 0) {
        printf("Iteration %d\n", index);
    }
    foo(index+1);
}

int main() {
    foo(0);
    return 0;
}

Without optimizations, this program segfaults on Linux eventually, and Windows just doesn't execute the code at all. However, if we compile it with -O2 (and thus optimizing tail calls), we transform the recursion problem into an iteration problem.