Functions and Stacks in C

The general form of a function in C is:

return_type function_name(type parameter, ...) {
  // function body
}

The body contains statements that define what the function does. Within a function, we can have comments, variable declarations, variable assignments, control flow loops, etc.

For a statement that has conditional logic, if the following line only has one statement, you can omit the curly braces {}:

if (num1 > num2) {
  result = num1;
} else {
  result = num2;
}

is equivalent to:

if (num1 > num2)
  result = num1;
else
  result = num2;

This only works for conditionals with one line of following logic.

Declarations

Whilst there is no requirement to declare a function before defining it, it can sometimes be useful. We declare the function similar to defining, but omit the body and possibly the parameter names, e.g., int max(int num1, int num2); can also be written as int max(int, int);, as there is no actual reference to the variables that is needed in a declaration, it just makes it easier to read.

Local Variables

When calling a function, we might assume that the variables we call it with are the ones used in the function, e.g., void add(int a) { a += 1; }; we would expect would increment a, then we could use this function to manipulate variables...

Unfortunately, (or fortunately?) this is not how it works. When we call a function with some local variables, a copy of the variable is created on the stack frame for that function call. When we call add(), it takes a copy of a then works on the local copy. Any changes within that function are then destroyed.

Recursion and the Stack

Recall that a simple iterative definition of the factorial function can be created:

\[ \text{factorial}(n) = \begin{cases} 1 &n \le 1\\ \text{factorial}(n-1) & \text{otherwise} \end{cases} \]

Implemented in C, this would look as such:

int factorial(unsigned int n) {
  if(n<=1)
    return 1;
  return i * factorial(n-1);
}

This definition allows us to call the function, create a stack frame for it, then recursively call factorial(). This all goes well until a point, when the stack grows too far down, and begins to overflow. We can fix this using tail recursion, and indeed the compiler would handle this for us:

// Main function
int factorial(unsigned int n) {
  return factorial_aux(n, 1);
}

// Aux recursive function
int factorial_aux(unsigned int n, unsigned int acc) {
  if(n<=1)
    return acc;
  return factorial_aux(n-1, acc*n);
}

// Transformed into this by compiler
int factorial_aux(unsigned int n, unsigned int acc) {
loop:
  if(n<=1)
    return acc;
  acc *= n--;
  goto loop;
}

Call-by-Value vs Call-by-Reference

Call-by-Value

Call-by-value is the way that all the function calls listed thus far have operated. This is where passing the arguments to the function copies the value of the argument into the function. Changes inside the function to these variables affect the variable that has been copied onto the stack, but not the variable that was passed in by the caller function.

The only way for a function to affect the caller in this way would be through a return value. Take, for example, our simple add() example. The only way for us to update the a outside the function would be:

#include <stdio.h>

// Original implementation, doesn't work
void add(int a) { a += 1; };

// New implementation, returning modified a
int add(int a) { return a += 1; };

// Simple driver program
void main() {
  int a = 1;
  printf("%d", a);
  a = add(a);
  printf("%d", a);
}

This has an issue in that we can only modify one value, and then can't return anything else. Now comes good old call-by-reference!

Call-by-Reference

Instead of passing the value into the function, we pass the address the variable is stored at, known as the pointer, to then access the actual variable outside the function call. Therefore, any changes inside the function will affect the variables outside the function.

To help with this, C introduced the dereferencing operator * and the referencing operator &. We have sort of already touched on these in main(int argc, char** argv). The * operator allows us to get the value of the pointer at that address, and the & operator allows us to pass the address of the value into a function.

We can therefore redefine the simple add function, such that it takes a pointer to the variable and manipulates that pointer:

#include <stdio.h>

// New function using pointers
void add(int* a) {
  *a += 1;
}

// Updated driver program
void main() {
  int a = 1;
  printf("%d", a);
  add(&a); // Pass in the address to a
  printf("%d", a);
}

The lecture slides have an alternative with swap(int* a, int* b), which takes the value of a and swaps it with the value of b. Without pointers, this would be impossible.

Pointers

What we have just introduced in the call-by-reference section are what are known as pointers. At the low level, we have a pointer type, which simply stores the memory address we are referencing. Pointers are a very powerful part of the C language, and are one of the key features that allow it to be so useful for low level operations.

Arithmetic

Just like other types, pointers can be manipulated. They are quite clever in this way, as they allow us to increment and decrement in terms of the size in bytes of the address that they reference.

The address at (pa+i), where pa is a pointer to a, given a has type T* would simply be (pa) + [i* sizeof(T)]. From this, we have *(pa+i) == (*pa)[i]. Remember that when we are working in C, we use the logical memory layout, unlike the underlying memory on the computer which often employs ASLR (for security), paging, and wildly different physical addresses.

Null Pointers

Another important concept to remember when writing C is to assign pointers to NULL initially, such that the pointer points to memory address 0 of the program. We do this as part of error handling. For example, if malloc fails, it will point to NULL, and as good programmers, we should check to see if the pointer is null.

Also, in C, recall that if() evaluates to true for a nonzero value, so if the pointer references 0, then we fail the conditional.