
Solving Linear Systems of Equations

Very Quick Recap

Linear equations are of the form \(A \vec{x} = \vec{b}\), or expressed in their constituent components:

\[ \begin{align} a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1\\ a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2\\ &\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m\\ \end{align} \]
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn}\\ \end{pmatrix}, \vec{b} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} \]

An \(m \times n\) matrix has \(m\) rows and \(n\) columns; a column vector is the special case with a single column. For matrix multiplication, we go row-wise on the first matrix and column-wise on the second matrix.

Matrix-vector multiplication is just a specific case of this: each entry of the result is the dot product of one row of the matrix with the vector (multiply corresponding components, then sum).
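
The row-by-row view of matrix-vector multiplication can be sketched quickly (using NumPy, which is not part of the notes; the matrix and vector are made-up values for illustration):

```python
import numpy as np

# Hypothetical 2x3 matrix and length-3 vector.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
x = np.array([1.0, 0.0, -1.0])

# Each entry of A @ x is the dot product of one row of A with x.
by_rows = np.array([A[i] @ x for i in range(A.shape[0])])

print(A @ x)      # built-in matrix-vector product
print(by_rows)    # same result, computed row by row
```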

In two dimensions, each equation represents a line; in three dimensions, a plane; in higher dimensions, a hyperplane. For 2D systems:

  • Exactly one crossing point gives a unique solution (non-singular)
  • Lines are parallel and don't cross gives no solution (singular)
  • Lines overlap completely gives infinitely many solutions (also singular)

We tend to focus on square matrices. An \(n \times n\) matrix is non-singular if (equivalently):

  • \(A^{-1}\) exists
  • \(\det (A) \ne 0\)
  • \(\text{rank} (A) = n\)
  • \(A\vec{z} \ne \vec{0}\) for all \(\vec{z} \ne \vec{0}\)
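
The equivalent non-singularity checks above can all be tried numerically (a sketch with NumPy on a made-up \(2\times2\) matrix):

```python
import numpy as np

# A hypothetical non-singular 2x2 matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# det(A) != 0
print(np.linalg.det(A))

# rank(A) == n
print(np.linalg.matrix_rank(A))

# A^{-1} exists, and A @ A^{-1} recovers the identity.
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))
```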

The determinant of a matrix is easy to calculate for \(2\times2\) and \(3\times3\):

\[ \det A = \begin{vmatrix} a & b \\ c & d \\ \end{vmatrix} = ad - bc \]

and for \(3\times3\):

\[ \det A = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \\ \end{vmatrix} = aei + bfg + cdh - ceg - bdi - afh \]

This can be generalised to larger square matrices (e.g., by cofactor expansion), but this doesn't seem to be covered in particular by the module. The rank of a matrix is the dimension of the vector space generated by its columns — that is, the number of linearly independent columns of the matrix.

If two of the rows or columns of a \(3\times3\) matrix are linearly dependent, then the rank of the matrix is less than the number of rows/columns (column rank and row rank are always equal).


PageRank

This was the model that Google used to use to determine the order of its search results. It is used to determine who is most popular. We ask each of Jane, Charlie and Fred to list their friends (in PageRank, this would be through something like backlinks).

In our example, J lists C; C lists J and F; F lists only J. If we write this into a matrix, with a 1 in row \(i\), column \(j\) whenever the person in column \(j\) lists the person in row \(i\) as their friend, we get the following:

\[ \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ \end{pmatrix} \]

where rows and columns are ordered J, then C, then F. We can then compute the total over each column of the matrix, to get the total number of people that someone lists.
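
The friend matrix and its column sums can be reproduced directly (a NumPy sketch of the J/C/F example above):

```python
import numpy as np

# Rows/columns ordered J, C, F; entry (i, j) = 1 if person j lists person i.
M = np.array([[0, 1, 1],
              [1, 0, 0],
              [0, 1, 0]])

# Column sums: how many people each person lists (J lists 1, C lists 2, F lists 1).
print(M.sum(axis=0))

# Normalising each column by its sum gives the link matrix used below.
L = M / M.sum(axis=0)
print(L)
```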

As we know, a node or person is more popular the more people list them as a friend, and the fewer other people those listers name as friends. Having one popular friend can make you more popular than having 10 unpopular friends. (This is depressing.)

If people have more friends on their friends list, we can compensate for this by normalising each column by its sum, such that each newly formed column also sums to one.

We then want to find a vector \(\vec{p} = (j, c, f)\), which assigns a popularity to each person. Each popularity is the weighted sum of the popularities of the people who mention them.

This gives the fixed-point equation \(p = Lp\), where \(L\) is the normalised matrix: people's popularities are determined by the mentions they receive. Written out:

\[ \begin{align} j &= 0.5c + f\\ c &= j \\ f &= 0.5c \\ \end{align} \]

The system is underdetermined (since \(c=j\) makes one equation redundant), so we substitute \(c\) with \(j\), set \(j=1\) as a scaling factor and then solve the system:

\[ \begin{align} j &= 0.5j + f\\ f &= 0.5j\\ \text{set }j&=1\\ (j,c,f) &= (1,1,0.5)\\ \end{align} \]
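We can check numerically that \((1, 1, 0.5)\) really is a fixed point of the normalised link matrix (a NumPy sketch, not part of the notes):

```python
import numpy as np

# Normalised link matrix from the J/C/F example (columns sum to one).
L = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])

p = np.array([1.0, 1.0, 0.5])   # (j, c, f) from the hand calculation

# p is a fixed point of L: applying L leaves the popularities unchanged.
print(np.allclose(L @ p, p))
```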

From this, we deduce that \(j\) and \(c\) are popular, and \(f\) is less popular.

The same idea can be applied to PageRank itself. If a page backlinks to a lot of other sites, then each of the sites it links to gets boosted, but by less the more outgoing links there are.

We can further add contextual information \(c\) from search queries, giving \(p = Lp + c\), where \(L\) is again the linking matrix.

The system will always have positive solutions \(p\), and this is thanks to the Perron-Frobenius theorem.

Setting Prices in an Economy

We can deduce the incomes of each person in a closed economy, provided that there is no outside supply or demand and everything produced is consumed. If we know the consumption of the different people — the farmer, tailor, carpenter, coal miner, and slacker (denoted \(F,T,CA,CO,S\) respectively) — then we can deduce a system of linear equations. The following table gives the fraction of each good consumed by each person:

             Food   Clothes  Housing  Energy  High Quality 100 Proof Moonshine
Farmer       0.25   0.15     0.25     0.18    0.20
Tailor       0.15   0.28     0.18     0.17    0.05
Carpenter    0.22   0.19     0.22     0.22    0.10
Coal Miner   0.20   0.15     0.20     0.28    0.15
Slacker Bob  0.18   0.23     0.15     0.15    0.50

We can then create a system of linear equations:

\[ \begin{align*} p_F &= 0.25 p_F + 0.15 p_T + 0.25 p_{CA} + 0.18 p_{CO} + 0.20 p_S \\ p_T &= 0.15 p_F + 0.28 p_T + 0.18 p_{CA} + 0.17 p_{CO} + 0.05 p_S \\ p_{CA} &= 0.22 p_F + 0.19 p_T + 0.22 p_{CA} + 0.22 p_{CO} + 0.10 p_S \\ p_{CO} &= 0.20 p_F + 0.15 p_T + 0.20 p_{CA} + 0.28 p_{CO} + 0.15 p_S \\ p_S &= 0.18 p_F + 0.23 p_T + 0.15 p_{CA} + 0.15 p_{CO} + 0.50 p_S \end{align*} \]
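One way to solve this fixed-point system numerically (a NumPy sketch, taking a different route than elimination): \(p = Cp\) means \(p\) is an eigenvector of the consumption matrix with eigenvalue 1, which exists because each column of the table sums to one.

```python
import numpy as np

# Consumption fractions from the table, rows/columns ordered F, T, CA, CO, S.
C = np.array([
    [0.25, 0.15, 0.25, 0.18, 0.20],
    [0.15, 0.28, 0.18, 0.17, 0.05],
    [0.22, 0.19, 0.22, 0.22, 0.10],
    [0.20, 0.15, 0.20, 0.28, 0.15],
    [0.18, 0.23, 0.15, 0.15, 0.50],
])

# p = Cp: take the eigenvector whose eigenvalue is (numerically) 1.
vals, vecs = np.linalg.eig(C)
k = np.argmin(np.abs(vals - 1.0))
p = np.real(vecs[:, k])
p = p / p[-1]   # the system is underdetermined: fix Slacker Bob's price to 1

print(np.allclose(C @ p, p))   # the computed prices satisfy the system
print(p)
```

Since all entries of \(C\) are positive, the Perron-Frobenius theorem mentioned earlier guarantees the resulting price vector is positive.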

This system is underdetermined, so we can set one price somewhat arbitrarily. It is much more annoying to solve by hand, so we use a computer to do it for us. One method we can use is called Gaussian elimination.

Gaussian Elimination

This is the process of methodically solving the equations by multiplying, adding and subtracting different equations, which allows us to eliminate variables.

The most convenient form we can use for Gaussian elimination is triangular, as we work down to an individual variable at the bottom, then can substitute it into the next one up, etc.

Take a system of equations:

\[ \begin{align*} 2x + y - z &= 8\\ -3x -y + 2z &= -11 \\ -2x + y + 2z &= -3 \\ \end{align*} \]

We can replace the 2nd row by multiplying it by \(2/3\) and adding it to the first, which yields \(y/3 + z/3 = 2/3\). Adding the third row to the first gives \(2y + z = 5\). Multiplying our new second equation by \(-6\) and adding it to the newly formed third gives \(-z = 1\), so \(z = -1\); back-substituting then gives \(y = 3\) and finally \(x = 2\).


The solutions given on the slides don't seem to work for the system that was given. Following my own manual analysis — \((2') = (2) + \tfrac{3}{2}(1)\), \((3') = (3) + (1)\), then \((2'') = 2 \times (2')\), \((3'') = (3') - 4(2')\) — we get the solution \((x,y,z) = (2,3,-1)\).

This can be verified by substitution into the initial system of equations to ensure the results are the same.
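
The elimination just described can be sketched as a short routine (a naive, no-pivoting sketch in NumPy — not the slides' implementation):

```python
import numpy as np

def gauss_solve(A, b):
    """Naive Gaussian elimination with back substitution (no pivoting) —
    a sketch for well-behaved systems, not production code."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):                  # eliminate entries below pivot k
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):          # back substitution, bottom-up
        x[i] = (b[i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
    return x

A = np.array([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]])
b = np.array([8, -11, -3])
print(gauss_solve(A, b))   # [ 2.  3. -1.]
```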

Augmented with Identity Matrix

We can invert a matrix by appending the identity matrix to our coefficient matrix, then applying all row operations to that too. If we then have the two matrices next to each other — the coefficients on the left and the identity on the right — and work to make the left block the identity matrix instead, the right block becomes the inverse of the initial matrix, which can be useful to find \(x\).

It is useful to have \(A^{-1}\), as we may have many \(b\) for \(Ax=b\), and solving for \(x\) from scratch each time is complicated, as in the above way. Instead, we simply rearrange to \(x = A^{-1}b\). To verify that we have computed \(A^{-1}\) correctly, we can use the property \(AA^{-1} = I\).
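
The augmented-matrix inversion can be sketched as follows (a no-pivoting NumPy sketch of the Gauss-Jordan idea, on a made-up matrix):

```python
import numpy as np

def invert_via_augmentation(A):
    """Reduce [A | I] until the left block is I; the right block
    is then A^{-1}. Sketch without pivoting (assumes non-zero pivots)."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for k in range(n):
        M[k] /= M[k, k]                 # scale the pivot row to get a leading 1
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]  # clear the rest of column k
    return M[:, n:]

A = np.array([[2.0, 1.0], [5.0, 7.0]])
A_inv = invert_via_augmentation(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # verifies A A^{-1} = I
```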


If the "pivot", i.e., the entry we are using to eliminate a column, is 0 or mathematically inconvenient, we can always reorder the rows. The computational complexity of Gaussian elimination is \(O(n^3)\), which causes issues when the matrix starts to get big.

If we have something of the form \(Ax=b\), and have to solve with the same \(A\) and different \(b\), then we could invert \(A\). Therefore, we'd calculate \(x=A^{-1}b\).

Furthermore, \(A\) is often a "sparse" matrix, meaning that it contains few non-zero elements. The inverses of sparse matrices are often dense. Sparse matrices can be stored more efficiently on computers than their dense counterparts.

Gaussian elimination is also very sensitive to small differences in \(b\): if the real right-hand side is \(b\pm \delta b\), the computed solution can change substantially for ill-conditioned matrices.

LU Decomposition

This is another way to solve the system of linear equations. We can write \(A\) as a product of two triangular matrices \(L\) for lower, and, you guessed it, \(U\) for upper:

\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \\ \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \\ \end{bmatrix} \]

We can then rewrite \(Ax = b\) as \(LUx = b\). Setting \(y = Ux\), we first solve \(Ly = b\) for \(y\), then \(Ux = y\) for \(x\). Solving \(Ly=b\) is easy by forward substitution, because \(L\) only has entries on and below the diagonal:

\[ \begin{bmatrix} l_{11} & 0 & 0 \\ l_{21} & l_{22} & 0 \\ l_{31} & l_{32} & l_{33} \\ \end{bmatrix} \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \end{bmatrix} = \begin{bmatrix} l_{11} y_{1} \\ l_{21} y_{1} + l_{22} y_{2} \\ l_{31} y_{1} + l_{32} y_{2} + l_{33} y_{3} \\ \end{bmatrix} = \begin{bmatrix} b_{1} \\ b_{2} \\ b_{3} \\ \end{bmatrix} \]

and then solving \(Ux = y\) by back substitution:

\[ \begin{bmatrix} u_{11} & u_{12} & u_{13} \\ 0 & u_{22} & u_{23} \\ 0 & 0 & u_{33} \\ \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \end{bmatrix} = \begin{bmatrix} u_{11} x_{1} + u_{12} x_{2} + u_{13} x_{3} \\ u_{22} x_{2} + u_{23} x_{3} \\ u_{33} x_{3} \\ \end{bmatrix} = \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \end{bmatrix} \]

LU decomposition is essentially just a variant of Gaussian elimination. Sometimes we might have to exchange rows, which we can do with a permutation matrix \(P\), or both columns and rows (\(P\) for rows and \(Q\) for columns). This gives \(PA=LU\) for partial pivoting, and \(PAQ=LU\) for full pivoting.


I don't really understand this bit!

Any square matrix allows us to perform an LUP factorization. Further to this, a symmetric (or, in the complex case, Hermitian) positive-definite matrix has a similar decomposition, the Cholesky decomposition \(A = LL^*\).

Doolittle Algorithm

This is a way to generate the LU decomposition of a matrix. It is similar to Gaussian elimination in that we remove entries of \(A\) under the diagonal. This is done by multiplying \(A\) with a matrix \(L'\), from the left:

\[ \begin{bmatrix} 1 & 0 & 0\\ \color{red}{-a_{21}/a_{11}} & 1 & 0 \\ \color{red}{-a_{31}/a_{11}} & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} - (a_{21}/a_{11})a_{12} & a_{23} - (a_{21}/a_{11})a_{13} \\ 0 & a_{32} - (a_{31}/a_{11})a_{12} & a_{33} - (a_{31}/a_{11})a_{13} \\ \end{bmatrix} \]

For the algorithm, we introduce some notation: \(A^{(0)}\) denotes the original matrix, and \(A^{(1)}, \dots, A^{(N-1)}\) the subsequent stages. At stage \(n\) we left-multiply \(A^{(n-1)}\) by a matrix \(L_n\), which has ones on the diagonal and non-zero entries only in the \(n\)th column under the diagonal (as shown in red in the above matrix).

The entries in matrix \(L_n\) are given as:

\[ l^{(n)}_{i,n} = \frac{-a^{(n-1)}_{i,n}}{a^{(n-1)}_{nn}} \]

This means that each \(L_n\) is built from the entries of \(A^{(n-1)}\), the matrix from the previous stage. At the end of this process, after \(N-1\) steps, where \(N\) is the number of rows of the matrix, we obtain the upper triangular matrix \(U = A^{(N-1)}\).

We also know from the way we process through \(L_n\) that the original \(A\) matrix is simply the inverse application of \(L_n\) in the following way:

\[ A = L_1^{-1} L_2^{-1}\dots L_{N-1}^{-1} A^{(N-1)} \]

Products and inverses of lower triangular matrices are also lower triangular, so we can effectively extract \(L= L_1^{-1} L_2^{-1}\dots L_{N-1}^{-1}\) from the above formula. It is then easy to see that \(A=LU\) still.
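
The Doolittle process above can be sketched directly (a NumPy sketch with no pivoting, assuming non-zero pivots; the multiplier stored in \(L\) is the negation of the corresponding \(L_n\) entry, as described below):

```python
import numpy as np

def doolittle(A):
    """Doolittle LU factorisation: unit diagonal on L, no pivoting."""
    n = A.shape[0]
    U = A.astype(float).copy()   # becomes A^{(0)}, A^{(1)}, ..., A^{(N-1)}
    L = np.eye(n)
    for k in range(n - 1):
        for i in range(k + 1, n):
            # Multiplier a_{ik}/a_{kk}; the entry of L_k is its negation,
            # so the inverse product L = L_1^{-1}...L_{N-1}^{-1} holds +m.
            L[i, k] = U[i, k] / U[k, k]
            U[i, k:] -= L[i, k] * U[k, k:]
    return L, U

A = np.array([[2.0, 1.0, -1.0],
              [-3.0, -1.0, 2.0],
              [-2.0, 1.0, 2.0]])
L, U = doolittle(A)
print(np.allclose(L @ U, A))   # True: A = LU
```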

When building \(L\) from the inverses, note that \(L_n\) is the identity matrix except for the below-diagonal entries of the one column we are transforming.

Taking the inverse of such a matrix is simple: just negate those below-diagonal entries.

The full \(L = L_1^{-1} L_2^{-1}\dots L_{N-1}^{-1}\) is then just the identity matrix with its lower half filled in by combining the columns of the different \(L_n^{-1}\). The lecture slides make more sense here.

Other Algorithms

In addition to Gaussian elimination and the Doolittle algorithm for LU decomposition, there are many other methods that can be used to factor these systems of linear equations and solve them.

Important Notes

  • The order of multiplication of matrices is important. Matrix multiplication is non-commutative.
  • \(P\) and \(Q\) are both types of permutation matrices, which can be used to rearrange the order of rows and columns

Iterative Approaches

The above algorithms show a way to obtain exact analytical solutions to these systems of equations, but they can be very expensive for models with potentially millions of parameters. Gaussian elimination, in its most naive form, has \(O(n^3)\) time complexity.

There are faster methods which can obtain approximate solutions, much like the numerical methods we will see in the solving systems of non-linear equations part of the course.

These methods do not achieve exact solutions, and the tradeoff is the possibility that they might not converge. For diagonally dominant matrices, we have the Jacobi method.

Jacobi Method

We want to solve \(Ax=b\). We write \(A = D + R\), where \(D\) contains the diagonal elements of \(A\), and \(R\) the rest of the elements. 'Diagonally dominant' captures the idea that \(D\) is large in comparison to \(R\): for every row \(i\),

\[ | a_{ii} | > \sum_{j \ne i} | a_{ij} | \]

We can then rearrange \(Ax=b\) as \((D+R)x = b\), then get \(x\) alone with \(x = D^{-1} (b - Rx)\). We can then iterate this to get \(x_{n+1} = D^{-1}(b-Rx_n)\).

Considering the system:

\[ \begin{align} 2x+y &= 11 \\ 5x+7y &= 13 \\ \end{align} \Rightarrow A = \begin{bmatrix}2 & 1 \\ 5 & 7 \end{bmatrix}, b = \begin{bmatrix}11\\13\end{bmatrix}, x_0 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]

We can then begin to iterate on this, using the iterative formula given above, and rearranging a bit to make it easier to understand what is going on by defining some auxiliary quantities \(c\) and \(T\):

\[ \begin{align} c &= D^{-1}b\\ T &= D^{-1}R\\ \therefore x_{n+1} &= c - Tx_n\\ \text{and for }A,b&\text{ we have:}\\ D &= \begin{bmatrix}2 & 0 \\ 0 & 7\end{bmatrix}\\ R &= \begin{bmatrix} 0 & 1\\ 5 & 0 \end{bmatrix}\\ D^{-1} &= \begin{bmatrix}1/2 & 0 \\ 0 & 1/7 \end{bmatrix}\\ c &= D^{-1}b = \begin{bmatrix}11/2 \\ 13/7 \end{bmatrix}\\ T &= D^{-1}R = \begin{bmatrix}0 & 1/2 \\ 5/7 & 0 \end{bmatrix}\\ x_{1} &= \begin{bmatrix}11/2 \\ 13/7 \end{bmatrix} - \begin{bmatrix}0 & 1/2 \\ 5/7 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix}\\ &= \begin{bmatrix}5 \\ 8/7 \end{bmatrix} \end{align} \]

This iterative process can be continued as many times as we like, until we are happy with the numerical approximation. In this instance, this was a toy example, and the equations can be solved analytically using a method like Gaussian elimination.
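
The whole iteration can be sketched in a few lines (a NumPy sketch of the \(x_{n+1} = c - Tx_n\) formula above, run on the same toy system):

```python
import numpy as np

def jacobi(A, b, x0, steps=50):
    """Jacobi iteration x_{n+1} = c - T x_n with c = D^{-1} b, T = D^{-1} R."""
    D_inv = np.diag(1.0 / np.diag(A))   # inverting a diagonal matrix is trivial
    R = A - np.diag(np.diag(A))
    c = D_inv @ b
    T = D_inv @ R
    x = x0.astype(float)
    for _ in range(steps):
        x = c - T @ x
    return x

A = np.array([[2.0, 1.0], [5.0, 7.0]])
b = np.array([11.0, 13.0])
x = jacobi(A, b, np.array([1.0, 1.0]))
print(np.allclose(x, np.linalg.solve(A, b)))   # converged to the true solution
```

One step from \(x_0 = (1,1)\) reproduces the hand-computed \(x_1 = (5, 8/7)\); after enough steps the iterates settle on the exact answer.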

Gauss-Seidel Method

The Jacobi method is only guaranteed to work on matrices which are diagonally dominant, but Gauss-Seidel can be used on more general matrices too. Even so, convergence is not guaranteed unless the matrix is diagonally dominant, or symmetric and positive definite.

The main idea behind this method is the iteration \(x_{n+1} = L_*^{-1}(b-Ux_n)\), where \(A = L_* + U\) with \(L_*\) lower triangular (including the diagonal) and \(U\) strictly upper triangular. This is not covered in any further detail in the lectures.

Software Packages

There are two main pieces of software which can be used to solve systems of linear equations automatically. The first is LINPACK, which can solve a wide variety of linear systems and special systems; it is also the standard benchmark for comparing the performance of computers. There is also LAPACK, a modern replacement for LINPACK. It works much better on modern computers (such as those with SIMD/SSE extensions), and can also scale horizontally as well as vertically.