Although extremely reliable methods of biometric personal identification exist, e.g. fingerprint analysis and retinal or iris scans, most of these methods are considered unacceptable by users in all but high-security scenarios. Personal identification systems based on the analysis of speech, or of frontal or profile face images, are non-intrusive and therefore user-friendly. Moreover, personal identity can often be ascertained without the client's assistance. However, speech- and image-based systems are less robust to impostor attack, especially if the impostor possesses information about a client, e.g. a photograph or a recording of the client's speech. Multi-modal personal verification is one of the most promising approaches to user-friendly (and hence acceptable) yet highly secure personal verification systems.

Recognition and verification systems need training; the larger the training set, the better the performance achieved. The volume of data required to train a multi-modal system based on the analysis of video and audio signals is of the order of TBytes (1000 GBytes); technology allowing such amounts of data to be manipulated and used effectively has only recently become available in the form of digital video.
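To put that figure in perspective, a back-of-envelope estimate in Python is sketched below. The frame size, frame rate and total recording time (uncompressed PAL-resolution video at 720x576 pixels, 25 fps, roughly 20 hours of material) are assumptions for illustration, not figures taken from the database documentation; under them the raw video alone reaches the terabyte range, while the same material in DV-compressed form occupies a few hundred gigabytes.

    # Rough storage estimate; all parameters below are illustrative assumptions.
    BYTES_PER_FRAME = 720 * 576 * 2    # PAL frame, YUV 4:2:2, 2 bytes per pixel
    FRAMES_PER_SECOND = 25             # PAL frame rate
    TOTAL_HOURS = 20                   # assumed total recorded material

    raw_bytes = BYTES_PER_FRAME * FRAMES_PER_SECOND * 3600 * TOTAL_HOURS
    print(f"Uncompressed video: ~{raw_bytes / 1e12:.1f} TB")   # ~1.5 TB

    # The same material as a DV video bitstream (roughly 25 Mbit/s) is far smaller:
    dv_bytes = 25e6 / 8 * 3600 * TOTAL_HOURS
    print(f"DV-compressed video: ~{dv_bytes / 1e9:.0f} GB")    # ~225 GB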

At the Centre for Vision, Speech and Signal Processing we have captured a large multi-modal database which will enable the research community to test their multi-modal face verification algorithms on a high-quality, large dataset. In acquiring the XM2FDB database, 295 volunteers from the University of Surrey visited our recording studio four times at approximately one-month intervals. On each visit (session) two recordings (shots) were made: the first shot consisted of speech, while the second consisted of rotating head movements. Digital video equipment was used to capture the entire database. At the third session a high-precision 3D model of each subject's head was built using an active stereo system provided by the Turing Institute.
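For orientation, the recording structure described above (295 subjects, four sessions, two shots per session) can be enumerated directly. The short sketch below only counts the recordings implied by that protocol; the numeric subject/session/shot indexing is hypothetical and does not reflect the actual file layout of the distribution.

    from itertools import product

    # Shot labels as described above; indexing scheme is hypothetical.
    SHOT_TYPES = {1: "speech", 2: "head_rotation"}

    recordings = list(product(range(1, 296), range(1, 5), (1, 2)))

    print(len(recordings))                       # 2360 recordings in total
    subject, session, shot = recordings[0]
    print(subject, session, SHOT_TYPES[shot])    # 1 1 speech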

At present, only selected image subsets grabbed from the database, together with all of the audio files, are available on CD-ROM at production cost. However, the entire database will shortly be made available as a set of 20 one-hour DV mini cassettes, as will the 3D datasets.


For any comments or general enquiries, please contact: Dr Chi Ho Chan (Chiho.Chan@surrey.ac.uk)