Heaps and Memory Errors
The Heap
The heap is an area of memory that the programmer has to manually manage, at least in lower level languages like C. We have to manually allocate memory to the program, using a call such as malloc
, then free
the memory once we no longer need it.
When allocating memory within C, we specify the size of the memory that we need. If there is an error allocating memory, we return a null pointer (pointing to address 0), which allows us to check, as the programmer, whether there is enough memory for us.
void *malloc(size_t size);
int* x = malloc(sizeof(int)); // Allocate memory to hold an integer
int* arr = malloc(N*sizeof(int)); // Allocate memory to hold an array size N of integers
We then free
the memory from use once we no longer need it, using void *free(void *ptr);
, and if we forget to do this, things can go a bit wrong.
Toy Program Showing Different Memory Addresses
Writing a simple toy program, and making use of the reference and dereference operators (&
and *
), we can look at the memory addresses of things we allocate.
#include <stdio.h>
#include <stdlib.h>
void main() {
int x = 3;
int y = 255;
int* u = malloc(sizeof(int));
int* v = malloc(2*sizeof(int));
*u = 16;
v[0] = 24;
*(v+1) = 25;
printf("Var\tAddress\t\tValue\n");
printf("x:\t0x%08x\t%d\n", &x, x);
printf("y:\t0x%08x\t%d\n", &y, y);
printf("u:\t0x%08x\t%d\n", u, *u);
printf("v[0]:\t0x%08x\t%d\n", &v[0], v[0]);
printf("v[1]:\t0x%08x\t%d\n", &v[1], v[1]);
}
gcc ./toy_example.c -o toy_example && ./toy_example
Var Address Value
x: 0x023ed870 3
y: 0x023ed874 255
u: 0x608582a0 16
v[0]: 0x608582c0 24
v[1]: 0x608582c4 25
Toy Program Showing Error Handling and Segmentation Faults
Even running this program, my editor warns me that I'm not freeing u
or v
, as this is a really common thing to forget. Proper convention would be to first check that we were actually allocated some memory, instead of the null pointer, then at the end free both the addresses allocated. The following program segfaults because we try to write to address zero, but besides this, we correctly free and error handle:
#include <stdio.h>
#include <stdlib.h>
void main() {
// ...
int* u = malloc(sizeof(int));
if(!u) {
// Error Handling
}
// Simulate error allocating memory
int* v = NULL;
// ...
free(u);
free(v);
}
Variables that are stored in the heap can be accessed globally and will continue to live until they are explicitly freed. Stack variables are only local and die when they are taken out of scope.
Heap variables are not limited in size, the only thing is if there is not enough memory to allocate, then we'll probably be given a null pointer. The heap variables can be resized after allocation, which is handy if the array or whatever is dynamically sized.
Allocating Memory for Dynamically Sized Variables
We have already seen in the labs that allocating a fixed buffer for variables is not best practice, as these can be overflowed, or don't store the full string or whatever datatype it is that we need. Take another sample program:
#include <stdio.h>
#include <string.h>
int main(int argc, char* argv[]) {
if (argc != 2) {
printf("Syntax: %s <string>\n", argv[0]);
exit(EXIT_FAILURE);
}
char *p;
sprintf(p, "Input: %s", argv[1]);
printf("%s\n", p);
return 0;
}
p
, not allocate anything to it. We could instead have declared a fixed buffer, p[50]
, but then without a length check this would be vulnerable to a buffer overflow.
We therefore need to allocate some memory to store the string:
We need to allocate an additional byte to terminate the string, and of course error handle it! We now, however, have a memory leak. This is because we don'tfree
the memory after using it. Not a big deal in this instance as the program terminates right after we print it, but say this was in a recursively called function that was being called many thousands of times. Then, we'd be allocating some buffer to store this string for the entire length of the program, and the heap would eventually run out of space to store this buffer.
If the memory leak gets big enough, then the OS has to start rapidly paging (if there is a swap partition or file), which is orders of magnitude slower than other memory access. This leads to a process called thrashing, and the computer grinds to a halt. All we have to do to prevent this is install enough RAM in the system and ensure that the memory we are allocating is freed once no longer needed.
Debugging
It can be quite hard to establish the state of some code when not being run. Debuggers, such as GDB, Valgrind, and Dr. Memory allow us to inspect program state whilst running, and they are specialised to different tasks. Unlike static analysis, debuggers are an online process and they instrument binaries with symbols to aid in analysis. GDB is useful for inspection of the code and to find things such as segementation faults.
Valgrind focuses more on memory leak errors and threading bugs, and Dr. Memory is like Valgrind but also runs on Windows.
Optimization
Compilers are wonderful pieces of technology and do a lot of the heavy lifting to make our code run faster. We can control the level of optimization of the code by setting the -O
flag to a value 1-4
which optimizes for speed or size of the code. Different optimization levels enable different optimizations.
Stack Overflow
This is where the call stack pointer exceeds the stack bound. Stack frames have a limited amount of space, which is often determined at the start of the program. The size of stack is dependant on the programming language, architecture, multithreading, and total available memory.
Toy Example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void foo(int index) {
if(index % 10000 == 0) {
printf("Iteration %d\n", index);
}
foo(index+1);
}
int main() {
foo(0);
return 0;
}
Without optimizations, this program segfaults on Linux eventually, and Windows just doesn't execute the code at all. However, if we compile it with -O2
(and thus optimizing tail calls), we transform the recursion problem into an iteration problem.