Automatic Selection of Optimal Views in Multi-View Object Recognition
Multi-view representation techniques for 3-D free-form objects have not yet successfully dealt with the following fundamental issues:
- What is the optimal number of views for each object?
- How to select the optimal views for each object?
We propose a method for automatic selection of optimal views of a free-form 3-D object. In order to represent an object efficiently, we eliminate similar views and select a relatively small number of views using an optimisation algorithm. This number varies from 5 to 25 depending on the complexity of the object and the measure of expected accuracy.
Initially, a video camera is used to obtain a large number of views of each object from all possible viewpoints. When an object is pictured from a large number of viewpoints, some of the resulting images are likely to be similar and to convey no additional information. An algorithm is therefore required to identify the optimal number of images needed to represent an object. Using an arbitrary shape descriptor and its associated matching method, one can measure the similarity between two different images of a single object. The algorithm which selects the most suitable views according to an arbitrary representation is as follows.
- Obtain many views of the object from different viewpoints. The jth object will have n(j) views:
V1(Oj), V2(Oj), ... , Vn(j)(Oj)
- Segment the images obtained in step 1 to recover the boundary contours. For each boundary, compute the descriptors to obtain:
DES(V1(Oj)), DES(V2(Oj)), ... , DES(Vn(j)(Oj))
- Select a threshold value t which will be used to define which views are similar. If the matching cost between two views does not exceed t, they are marked as similar.
(Mcost(DES(Vj),DES(Vk)) <= t)   ==>   Sim(Vj,Vk) = TRUE   else   Sim(Vj,Vk) = FALSE
- Calculate the matching cost between each representation, obtained from a contour in step 2, and all other representations, obtained from other contours. If the matching cost does not exceed t, declare the two views similar. Assign to each view a rank r, defined as the number of views that are similar to it.
r(Vj) = sizeof {Vk   |   Sim(Vj,Vk) = TRUE }
- Create a list L of all views, sorted by decreasing rank. Each view will have a pointer to the other views similar to it.
L = {Vi,Vj, ... ,Vk   |   r(Vi) >= r(Vj) >= ... >= r(Vk)}
- Start from the top of L and place the first view in the set C of characteristic views. Remove all views similar to the first view of L to obtain a reduced list.
- Move down the reduced list L and repeat the procedure in the previous step until the end of L is reached.
At the end of this process, the set C contains the automatically determined characteristic views of the input object. This algorithm was tested on a set of 15 free-form 3-D objects. One view of each object of our database is shown in the figure below.
![]()
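The selection steps listed above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments: the function name `select_characteristic_views`, the use of a precomputed matching-cost matrix, the exclusion of a view from its own rank, and the tie-breaking rule (equal ranks keep their original order) are all assumptions made here.

```python
from typing import List, Sequence


def select_characteristic_views(cost: Sequence[Sequence[float]], t: float) -> List[int]:
    """Greedy, rank-ordered selection of characteristic views.

    cost[j][k] stands in for Mcost(DES(Vj), DES(Vk)) (assumed precomputed);
    t is the similarity threshold. Returns the indices of the views in C.
    """
    n = len(cost)
    # Sim(Vj, Vk) is TRUE when the matching cost does not exceed t.
    # A view is not counted as similar to itself (assumption).
    similar = [
        {k for k in range(n) if k != j and cost[j][k] <= t}
        for j in range(n)
    ]
    # List L: views sorted by decreasing rank r(Vj) = number of similar views.
    # sorted() is stable, so equal ranks keep their original order (assumption).
    ordered = sorted(range(n), key=lambda j: -len(similar[j]))

    removed = set()  # views already selected or eliminated from L
    chosen = []      # the set C of characteristic views
    for j in ordered:
        if j in removed:
            continue
        chosen.append(j)
        removed.add(j)
        removed |= similar[j]  # drop every view similar to the chosen one
    return chosen


# Hypothetical 4-view example: views 1 and 2 are both similar to view 0.
example_cost = [
    [0.0, 0.2, 0.3, 0.9],
    [0.2, 0.0, 0.8, 0.9],
    [0.3, 0.8, 0.0, 0.9],
    [0.9, 0.9, 0.9, 0.0],
]
print(select_characteristic_views(example_cost, t=0.5))  # -> [0, 3]
```

In this toy example view 0 has the highest rank, so it is selected first and eliminates views 1 and 2; view 3 is similar to nothing and is kept as a second characteristic view.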
Initially, about 50 views of each object were grabbed. That initial set was then substantially reduced using the optimal view selection algorithm described above. The following figure shows a number of query results using arbitrary views of various objects in our database as input queries. The shape descriptor used in this experiment was the Curvature Scale Space representation, which has been selected for MPEG-7 standardization.
![]()
The value of t affects the number of views selected for each object. Generally speaking, a larger value of t results in a smaller number of views, because more pairs of views are marked as similar and each selected view eliminates more of the remaining ones. This relationship is shown in the following figure for the objects in our database:
![]()
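The effect of the threshold can also be illustrated on synthetic data. The sketch below is an assumption-laden toy, not a reproduction of the experiment: it places 50 hypothetical views evenly around a circle, uses the angular separation in degrees as a stand-in for the matching cost, and applies the same greedy rank-ordered selection. The function name `count_selected_views` is made up for this illustration.

```python
def count_selected_views(n_views: int, t: float) -> int:
    """Count the views kept by greedy rank-ordered selection on toy data.

    Views are assumed to be evenly spaced around a circle, and the angular
    separation in degrees is used as a synthetic matching cost.
    """
    step = 360.0 / n_views

    def cost(j: int, k: int) -> float:
        d = abs(j - k) * step
        return min(d, 360.0 - d)  # shortest way around the circle

    # Views whose synthetic cost does not exceed t are treated as similar.
    similar = [
        {k for k in range(n_views) if k != j and cost(j, k) <= t}
        for j in range(n_views)
    ]
    # Visit views in order of decreasing rank (number of similar views).
    ordered = sorted(range(n_views), key=lambda j: -len(similar[j]))

    removed = set()
    kept = 0
    for j in ordered:
        if j not in removed:
            kept += 1
            removed.add(j)
            removed |= similar[j]
    return kept


# Sweep a few hypothetical thresholds (in degrees) for 50 synthetic views.
for t in (0.0, 10.0, 30.0, 90.0, 180.0):
    print(f"t = {t:5.1f}  ->  {count_selected_views(50, t)} views selected")
```

With t = 0 no two distinct views are similar, so all 50 are kept; once t reaches 180 degrees every view is similar to every other and a single view remains. The real relationship depends on the shape descriptor and on the objects themselves.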
Furthermore, a better success rate is generally observed when a larger number of views is selected for each object. This relationship is shown in the following figure for the objects in our database. Note that N indicates the number of observed outputs.
![]()
This work was supported by EPSRC Research Grant # GR/L76754/01. For further details about the optimal view selection algorithm, see the following publications:
- BMVC-2000, volume 1, pages 272-281.
- ICPR-2000, volume 1, pages 13-16.
F.Mokhtarian@ee.surrey.ac.uk June 2001