D. Koubaroulis (1), J. Matas (1,2) and J. Kittler (1)
We have proposed a new representation of object colour structure based on the concept of the Multimodal Neighbourhood Signature (MNS). Local object colour structure is represented by illumination-invariant features computed from image neighbourhoods with a multimodal colour density function. For details, see our ECCV'00 paper or other related publications.
Motivation & Objective
Many practical applications require searching an image collection for entries depicting an object of interest, annotating a video sequence based on the objects that appear in the images, or recognising a previously seen object in images acquired by a (moving) sensor (e.g. a robot camera). For instance, in a retrieval (searching) application, users could be interested in finding all frames in a video sequence showing a specific event or actor. In other situations, e.g. archiving large video databases, it is useful to annotate the video content with a set of automatically computed concise descriptions (meta-data) which allow for fast access, ordering and retrieval of the stored information.
The objective is to establish the presence of the object of interest in the imaged scene using information from image examples. Sometimes, localising the object in the image is also desirable. The object of interest may occupy only a fraction of the frame in the sought images, and its appearance may be very different from the one previously seen. A model of object appearance that provides a stable object description under various appearance transformations is sought. A set of test images (the database) is assumed to be given. A set of examples of the object of interest is also assumed to be available. From the examples, a model of object appearance is instantiated in the form of an image description which we call a signature.
You can look at a number of retrieval results using MNS.
As an example, consider the athlete in the images below. The objective of a hypothetical application could be to find all frames in a video sequence showing the athlete. The only information available to the system is a single image region. The appearance of the skater in the test images may differ dramatically from the query prototype.
2. Object Recognition
For each of the test objects on the left, find the corresponding (model) object on the right based solely on colour appearance (M. Swain's database).
3. Object Localisation
Given an example of object appearance (see retrieval above), localise the object in (a sequence of) images.
4. Video Annotation
Given a set of examples from each sport discipline and a dictionary of keywords, assign keywords to unknown video frames. The keywords can then be used for classification, archiving and content-based retrieval of video data.
Tae Kwon Do | Track & Field
The MNS approach: Details
Multimodal (as opposed to unimodal) neighbourhoods are more informative in terms of colour content. Furthermore, computing a wide range of invariant features from modes of the density function of such neighbourhoods is straightforward within the proposed framework. Local processing provides graceful degradation of performance with respect to occlusion and background clutter without a need for image segmentation or edge detection. The mode values used for the computation of the invariants are robustly filtered, stable values, efficiently located in the RGB colour space with the mean shift algorithm [Fukunaga-IT75]. The representation of object appearance is flexible: it exploits whatever constraints the application environment places on the illumination change model. Instead of the widely used histogram matching, an efficient signature comparison strategy is proposed. Signature matching is posed as an assignment problem for feature pairs. A modified version of the stable marriage matching algorithm [Gusfield89] is used to identify and approximately localise the query object in the database images.
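The mode-finding step described above can be illustrated with a minimal sketch. It is not the MNS implementation: the flat kernel, the `bandwidth` and `merge_dist` parameters, the function names and the synthetic two-colour neighbourhood are all assumptions made for illustration. The per-channel ratio of two mode colours is shown as one possible invariant, assuming a diagonal illumination model (each RGB channel scaled independently); the invariants actually used by MNS may differ.

```python
import numpy as np

def mean_shift_mode(points, start, bandwidth=0.1, tol=1e-5, max_iter=100):
    """Climb from `start` to a local density mode using a flat kernel."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Average all points within `bandwidth` of the current estimate.
        dist = np.linalg.norm(points - x, axis=1)
        window = points[dist <= bandwidth]
        if len(window) == 0:
            break
        new_x = window.mean(axis=0)
        shift = np.linalg.norm(new_x - x)
        x = new_x
        if shift < tol:
            break
    return x

def find_modes(points, bandwidth=0.1, merge_dist=None):
    """Run mean shift from every point and merge nearby converged locations."""
    if merge_dist is None:
        merge_dist = bandwidth / 2  # hypothetical merging threshold
    modes = []
    for p in points:
        m = mean_shift_mode(points, p, bandwidth)
        if all(np.linalg.norm(m - q) > merge_dist for q in modes):
            modes.append(m)
    return np.array(modes)

def channel_ratio(mode_a, mode_b, eps=1e-6):
    """Per-channel ratio of two mode colours (assumed diagonal illumination model)."""
    return mode_a / np.maximum(mode_b, eps)

# Synthetic neighbourhood: pixels drawn around two distinct RGB colours.
rng = np.random.default_rng(0)
reddish = rng.normal([0.8, 0.2, 0.2], 0.01, size=(50, 3))
bluish = rng.normal([0.2, 0.2, 0.8], 0.01, size=(50, 3))
pixels = np.vstack([reddish, bluish])

modes = find_modes(pixels, bandwidth=0.1)
is_multimodal = len(modes) >= 2

# Same neighbourhood under a different (per-channel scaled) illuminant.
illum = np.array([0.5, 0.8, 1.2])
modes_lit = find_modes(pixels * illum, bandwidth=0.1)

feature = channel_ratio(modes[0], modes[1])
feature_lit = channel_ratio(modes_lit[0], modes_lit[1])
print(is_multimodal, np.allclose(feature, feature_lit))
```

Running mean shift from every pixel is quadratic in the neighbourhood size, which is tolerable for small neighbourhoods; a practical system would likely seed from fewer starting points.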
This work is funded as part of a research project titled ASSAVID (automatic annotation of sport video sequences).