Lost Visions: A Computational Overview

29/7/2014

The "Lost Visions" project is creating an internet-based system that enables users to interact with digitized content in a variety of ways. A key objective is to make digitized content searchable using crowdsourced, bibliographic and content-based features. These features are captured as keywords from three sources: text provided by a user (as part of a crowdsourcing activity), key positions in the bibliographic data (e.g. name of illustrator, engraver, photographer, book title), and the output of an image-processing algorithm. A website has been implemented for the project which links into a high-performance computing environment at Cardiff University to support image storage and processing.

A key novelty of this project is the combined use of these features to facilitate search.

The system enables users:

(i) to view and interact with digitized images;

(ii) to tag images using keywords or to make use of a pre-defined taxonomy;

(iii) to add images of interest to a "personal" archive. This archive does not allow images to be downloaded; instead, it records them in a "personal space" on the website.

It also enables comparison between images to be carried out using image-processing algorithms. These algorithms originate from work in content-based image retrieval, enabling full-image features to be recorded and compared across images. A "geo" element has also been included in the project, which enables search to be carried out based on a geographic bounding box, or text search for place-names via a gazetteer.
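As a rough illustration of the bounding-box search mentioned above, the following is a minimal sketch in plain Python. The record layout, the `BoundingBox` class, the `images_in_box` helper and the coordinates are all hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in degrees (hypothetical helper, not project code)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon: float, lat: float) -> bool:
        # Simple inclusive test; does not handle boxes crossing the antimeridian.
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)


def images_in_box(images, box):
    """Return the ids of images whose (lon, lat) falls inside the box."""
    return [image_id for image_id, lon, lat in images if box.contains(lon, lat)]


# Illustrative records: (image id, longitude, latitude). Values are made up.
images = [
    ("img-cardiff", -3.18, 51.48),
    ("img-london", -0.13, 51.51),
    ("img-paris", 2.35, 48.86),
]
wales_box = BoundingBox(min_lon=-5.5, min_lat=51.3, max_lon=-2.6, max_lat=53.5)

print(images_in_box(images, wales_box))  # ['img-cardiff']
```

In a PostGIS-backed system this filtering would more plausibly be pushed into the database (PostGIS offers, for example, `ST_MakeEnvelope` and the `&&` bounding-box overlap operator), but the post does not show the query that is actually used.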

The current prototype has been implemented using PostgreSQL (with the PostGIS extension). The user interface is implemented in Python, developed using the PyCharm IDE.

Various image-processing libraries have been investigated to discover what data can be retrieved from images in order to compare and sort them automatically. So far, a "Bag of Words" method of processing the SIFT descriptors of images has been tried. This uses code found at:
https://github.com/shackenberg/Minimal-Bag-of-Visual-Words-Image-Classifier
which has been modified to draw a random sample from a set of training images and to produce a confusion matrix showing how well the classifier performs. The SIFT descriptors only need to be calculated once, and can be stored in a separate file per image. Subsequently, k-means histograms are calculated, and these are also saved to disk for reuse.
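The histogram step can be sketched as follows. This is a simplified illustration, not the code from the repository above: it assumes a codebook of cluster centres has already been produced by k-means, uses toy 2-D vectors in place of real 128-dimensional SIFT descriptors, and the function names are invented for the example.

```python
import math


def nearest_word(descriptor, codebook):
    """Index of the codebook centre closest to the descriptor (Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: math.dist(descriptor, codebook[i]))


def bow_histogram(descriptors, codebook):
    """Normalised count of how many descriptors fall on each visual word."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    if total == 0:
        # An image with no descriptors gets an all-zero histogram.
        return [0.0] * len(codebook)
    return [count / total for count in counts]


# Toy data: three "visual words" and four descriptors from one image.
codebook = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
descriptors = [[0.5, 0.2], [9.0, 11.0], [9.5, 9.5], [1.0, 9.0]]

hist = bow_histogram(descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

Each image is reduced to one fixed-length vector regardless of how many descriptors it has, which is what allows the histograms to be saved to disk and passed to a classifier later.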

A Support Vector Machine (SVM) is then trained and is also saved to a file. However, this training needs to be repeated every time the training set changes, and it is by far the most processor-intensive (and so time-consuming) step. Additionally, as the SVM categorizes images according to their position relative to a decision boundary in a multi-dimensional feature space, a threshold will be placed on the distance from that boundary. This distance can be treated as a "confidence" value, and will indicate which images should be analysed further, either via crowdsourcing or with different descriptor processors.
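One way to read the distance-based "confidence" described above is sketched below. It assumes a linear decision function, for which the signed distance from the separating hyperplane is (w·x + b) / ||w||; the project's SVM may use a different kernel, and the weights, threshold and function names here are purely illustrative.

```python
import math


def signed_distance(features, weights, bias):
    """Signed distance of a feature vector from the hyperplane w.x + b = 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return score / norm


def triage(samples, weights, bias, threshold):
    """Split samples into confident labels and ids needing further analysis."""
    confident, uncertain = [], []
    for image_id, features in samples:
        distance = signed_distance(features, weights, bias)
        if abs(distance) >= threshold:
            # Far enough from the boundary: accept the predicted class.
            confident.append((image_id, 1 if distance > 0 else -1))
        else:
            # Too close to the boundary: defer, e.g. to crowdsourcing.
            uncertain.append(image_id)
    return confident, uncertain


# Made-up model parameters and histogram-like feature vectors.
weights, bias = [3.0, 4.0], -5.0
samples = [
    ("img-a", [5.0, 5.0]),
    ("img-b", [1.0, 0.6]),
    ("img-c", [-3.0, -1.0]),
]

confident, uncertain = triage(samples, weights, bias, threshold=1.0)
print(confident)  # [('img-a', 1), ('img-c', -1)]
print(uncertain)  # ['img-b']
```

The threshold value is a tuning choice: raising it sends more images for further analysis, while lowering it accepts more automatic labels.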

Current work on the image analysis has not yet made full use of our high-end computing capability: this is expected to be the next step in our implementation.
