
The objective of this Master's thesis has been to represent the dominant content of a collection of about 100 images using a carefully selected group of only five. To achieve this, SIFT descriptors and colour hues found in a large training set were clustered with k-means into 1000 and 200 classes, respectively. In each analysed image, descriptors were extracted together with the colours of the pixels at which they were found and the colours of a set of pixels sampled from a Gaussian distribution; these descriptors and colours were then classified into the trained classes. The 1000-bin descriptor histogram and the two 200-bin colour histograms (one from descriptor pixels, one from Gaussian-sampled pixels) were normalised, weighted with experimentally determined factors of $\sqrt{0.725}$ for the descriptors and $\sqrt{0.1375}$ for each colour histogram, and concatenated. Because each normalised histogram has unit length and $0.725 + 0.1375 + 0.1375 = 1$, the resulting feature vector has a Euclidean norm of 1.
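The feature-vector construction can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis code. The names `nearest_centroid`, `l2_histogram` and `feature_vector` are hypothetical. "Classified" is assumed to mean assignment to the nearest k-means centre. "Normalised" is taken to mean scaling each histogram to unit Euclidean norm, which is what makes the final norm equal to $\sqrt{0.725 + 0.1375 + 0.1375} = 1$.

```python
import math
from collections import Counter

def nearest_centroid(x, centroids):
    """Assumed classification rule: index of the nearest k-means centre (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))

def l2_histogram(labels, n_bins):
    """Count class labels into n_bins and scale the histogram to unit Euclidean norm."""
    counts = Counter(labels)
    h = [float(counts[b]) for b in range(n_bins)]
    norm = math.sqrt(sum(v * v for v in h))
    return [v / norm for v in h] if norm > 0 else h

def feature_vector(desc_labels, desc_colour_labels, gauss_colour_labels,
                   n_words=1000, n_hues=200, weights=(0.725, 0.1375, 0.1375)):
    """Weight each unit-norm histogram by sqrt(weight) and concatenate them."""
    parts = (l2_histogram(desc_labels, n_words),
             l2_histogram(desc_colour_labels, n_hues),
             l2_histogram(gauss_colour_labels, n_hues))
    vec = []
    for w, h in zip(weights, parts):
        vec.extend(math.sqrt(w) * v for v in h)
    return vec

# Toy usage with hypothetical 1-D hue centres and tiny vocabularies.
hue_centres = [[0.0], [0.5], [1.0]]
hue_labels = [nearest_centroid([h], hue_centres) for h in (0.1, 0.45, 0.9, 0.95)]
v = feature_vector([3, 3, 7], hue_labels, [0, 2], n_words=10, n_hues=3)
print(hue_labels)                                    # [0, 1, 2, 2]
print(round(math.sqrt(sum(x * x for x in v)), 12))   # 1.0
```

Weighting by the square root of each share means that the squared norm splits as 72.5 % descriptors and 13.75 % per colour histogram. This split carries over directly into the dot products used for similarity.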
To compare image feature vectors, a weighted cosine similarity was used. Images with unremarkable, average content tended to appear exaggeratedly similar to each other. To compensate, 0.8 times a feature vector equal to the normalised mean of the training data was subtracted from every image's feature vector before the angles were calculated; the factor 0.8 was determined experimentally. At the same time, the data was projected onto its principal components and reduced to 30 dimensions. Representative structures were found with a two-step method. First, a subgroup large enough for its content to be considered representative was identified. Second, the five most similar images in that subgroup were picked. Experiments and judgement indicated that the first step was best handled by finding the 30 most similar images in the collection.
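A sketch of the similarity measure and the two-step selection follows, under several assumptions:

- The "weighted" part of the cosine similarity is taken to be the $\sqrt{w}$ weighting already built into the feature vectors.
- The principal-component projection to 30 dimensions is omitted.
- The exact rules for forming the subgroup of 30 and choosing the final five are not given above. The rules used here are hypothetical: seed the subgroup at the image with the highest total similarity to its 29 nearest neighbours, then keep the five members with the highest mean similarity to the rest of the group.

All function names are illustrative.

```python
import math

def centred(v, mean_vec, alpha=0.8):
    """Subtract alpha times the unit-normalised training mean from feature vector v."""
    norm = math.sqrt(sum(m * m for m in mean_vec))
    return [x - alpha * m / norm for x, m in zip(v, mean_vec)]

def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def select_representatives(features, mean_vec, group_size=30, n_final=5, alpha=0.8):
    """Two-step selection: a representative subgroup, then its n_final most similar members."""
    vecs = [centred(f, mean_vec, alpha) for f in features]
    n = len(vecs)
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]

    def neighbourhood(i):
        # Image i plus its group_size - 1 most similar images.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        return [i] + others[:group_size - 1]

    # Step 1 (assumed rule): seed the subgroup at the image whose neighbourhood is tightest.
    seed = max(range(n), key=lambda i: sum(sim[i][j] for j in neighbourhood(i)[1:]))
    group = neighbourhood(seed)

    # Step 2 (assumed rule): keep the members most similar, on average, to the rest of the group.
    def mean_sim(i):
        return sum(sim[i][j] for j in group if j != i) / max(len(group) - 1, 1)

    final = sorted(group, key=mean_sim, reverse=True)[:n_final]
    return group, final
```

Subtracting a scaled mean shifts the origin towards the "typical" image. Vectors that lie close to the mean therefore no longer point in nearly the same direction, which is the effect the offset is meant to achieve. In use, `mean_vec` would be the mean feature vector of the training set, and the defaults `group_size=30` and `n_final=5` mirror the numbers above.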
The final algorithm was tested on images returned by ten different search queries. The precision of both the initial selection of 30 images and the final selection of 5 was measured. Results from combining descriptors and colour information were compared with those from using each on its own, and the combined approach was the most successful. The method worked very well on collections showing one specific object. It also improved precision on collections of different objects belonging to the same category. With the combined descriptor and colour histograms, the average precision of the five-image selections was 0.94.
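The precision figures can be read with the following small sketch. It assumes, as a hypothesis, that precision is the fraction of selected images judged relevant and that the average is a plain mean over the queries; the function names and data are illustrative. Under that reading, ten queries of five images each give a mean of (number of relevant picks)/50. A value of 0.94 would then correspond to 47 of the 50 selected images being relevant.

```python
def precision(selected, relevant):
    """Fraction of the selected images that are judged relevant."""
    return sum(1 for img in selected if img in relevant) / len(selected)

def mean_precision(runs):
    """Plain mean of per-query precisions; runs is a list of (selected, relevant) pairs."""
    return sum(precision(sel, rel) for sel, rel in runs) / len(runs)

# Two hypothetical queries: all five picks relevant, then four of five.
runs = [(["a1", "a2", "a3", "a4", "a5"], {"a1", "a2", "a3", "a4", "a5"}),
        (["b1", "b2", "b3", "b4", "x"], {"b1", "b2", "b3", "b4", "b5"})]
print(mean_precision(runs))  # 0.9
```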
As a complementary task, k-means clustering was applied to a few image collections. This showed that several different structures within a collection could be captured in separate clusters.
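k-means appears twice in the work described: it builds the 1000-word and 200-hue vocabularies, and it clusters image collections. A minimal sketch of Lloyd's algorithm is given below. The random initialisation from a seeded generator is an assumption, since the initialisation used in the thesis is not stated here, and the function names are hypothetical. For the complementary task, the points would be the images' feature vectors.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centroids) for k clusters."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct points as initial centres
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments unchanged since the last update
        labels = new_labels
        # Update step: move each centroid to the mean of its members (left in place if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Lloyd's algorithm only finds a local optimum, so in practice several seeds are often tried and the run with the lowest within-cluster squared distance is kept.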
Last updated: 2009-06-29