2014/15
Q1
Part A
Discuss the motivation for feature extraction in image processing and give examples of features that could be useful in an image classification. (8 marks)
Feature extraction is the process of taking an image and using automated techniques to extract various features (e.g., width, height, area, perimeter, moments, edges, etc.) from images, then use these features as inputs to a classification algorithm. Without feature extraction on images, there would be a huge feature space for each image (e.g., \(128 \times 128 \times 128 = 2,000,000\) features for a 128x128 image grayscale).
Feature extraction is combined with preprocessing techniques and is then typically combined with some dimensionality reduction technique such as PCA to allow the algorithm to extract the most useful features from an image.
Part B
Describe three different metrics that could be used to measure the distance of image examples in feature space. (9 marks)
Euclidean distance
The Euclidean distance is the shortest straight line distance from one feature to another in N-dimensional space. For a 2D feature space, this would be calculated as the length of the hypotenuse of a right-angled triangle, with each of the adjacent and opposite sides corresponding to some scalar value of a basis vector. In higher dimensioned spaces, the same thing applies.
Hamming distance
The Hamming distance is the number of bits in a stream of bits that differ between inputs, or could be generalised to be characters or specific information. Where two strings \(X,Y\) are both length \(N\), and \(X,Y \in \{0,1\}\), then the hamming distance +=1 wherever \(X_i \ne Y_i\).
Manhattan distance
The Manhattan distance is the shortest distance between two points in some vector space as a sum of scalar values of the basis vectors. For the 2D grid, the basis vectors are \((1,0)^T\) and \((0,1)^T\), so a point at \((1,1)\) would have Manhattan distance of 2.
Part C
Describe the k-nearest neighbour rule by giving a pseudo code implementation. (9 marks)
Before giving an implementation, want to recall what the algorithm does. First it plots all the features in the feature space, with assigned labels to each of the features in that space. This is the training set. From here, we want to classify a new point. We plot this point in the feature space, as with the other points, but it is currently not assigned a label. Previous data is assigned a label as this is a supervised learning algorithm.
With this point plotted, we can use a metric as described in (b) to compute the distances between this point and all neighbours (naïvely). We can then sort this list to get the nearest neighbours. The parameter \(k\) then gives the number of items for us to consider in a majority voting idea. For \(k=5\), we choose the 5 nearest items then assign a class based on the majority of those items. For even values of \(k\), or where there are two different classes with the same number of votes, we can use a tiebreaker strategy.
As pseudocode for 2D points, it would look like:
type point (int x, int y)
type graph (point,class)[]
def knnClassify(graph nodes, point p, int k):
distances = []
for q,c in nodes:
distances += (distance(p, q), c)
distances = sort(distances)[:k]
numEach = countOfEachClass(distances)
numEach = sort(numEach).reverse()
if numEach[0] == numEach[1]: # more than 1 class with top score
return distances[0].class
return numEach[0].class
Part D
Describe how you could build a face recognition system using ideas in (a), (b) and (c). (7 marks)
A face recognition system would start with background removal from an image. Following this, some kernels could be used to pull out edges in an image, e.g., Sobel. From here, we then use wavelets at varying angles which will output different numbers for different faces. We can combine these with geometric properties (e.g., the perimeter, area, moments, width, height) of the face (ideally using those invariant to rotation, scale, translation).
From that, we can use the extracted features and use a principle component analysis algorithm to extract those features with the highest variance. Dependant on the system we are using, we either need a labelled or unlabelled classification algorithm, but as we typically use face recognition to verify or match a person to a known model.
In the instance that we have many faces which have been manually labelled either through an enrolment stage, or a labelled dataset, we can then perform recognition with KNN. We alter the parameters \(k\) and the number of components in the PCA to tune our cumulative match curve. If we were to
TBC