Datasets Produced by LILiR

LILiR main dataset - under construction. This is a large task, and the dataset will be made available at a future date.

Image capture rig

One of the main tasks of LILiR is to provide a large English audio-visual dataset. This dataset is currently in preparation. It will consist of 20 speakers, each reading 200 sentences from the Resource Management domain of discourse. The dataset will be captured from multiple viewing angles in both HD and SD, and will also include some natural conversational speech.

LILiR TwoTalk corpus

Example frames from corpus

The LILiR TwoTalk corpus is a collection of four conversations, each involving two people engaged in casual conversation for 12 minutes. From these, 527 short clips were extracted and annotated by multiple observers to investigate non-verbal communication in conversations.

The data set and annotations are available for download.

Multilingual Audio-visual dataset - recording is in progress at the University of East Anglia, and the dataset will be available when completed.

The Multilingual audio-visual dataset is designed to support language identification. It currently consists of 19 bilingual or trilingual speakers reading the UN Declaration of Human Rights. Each subject reads in both their native language and English. Video is recorded at 480x640 resolution and 25 fps, with audio at 48 kHz.
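For anyone pairing the audio with the video, the stated rates imply a fixed number of audio samples per video frame. The sketch below works this out in Python. It assumes the audio and video streams start at the same instant and stay synchronised, which is not stated above, and the function name audio_span_for_frame is purely illustrative.

```python
from fractions import Fraction

# Rates quoted for the Multilingual audio-visual dataset.
VIDEO_FPS = 25
AUDIO_HZ = 48_000

# Exact number of audio samples that span one video frame.
samples_per_frame = Fraction(AUDIO_HZ, VIDEO_FPS)


def audio_span_for_frame(frame_index):
    """Return the [start, end) audio sample range covering one video frame.

    Assumes both streams begin at time zero (an assumption, not a
    documented property of the recordings).
    """
    start = int(frame_index * samples_per_frame)
    end = int((frame_index + 1) * samples_per_frame)
    return start, end


print(samples_per_frame)         # 1920
print(audio_span_for_frame(10))  # (19200, 21120)
```

At these rates each frame corresponds to exactly 1920 samples, so no rounding drift accumulates across a recording.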

Other Datasets of Interest

AVLetters1 - This dataset was generated in 1998 at the University of East Anglia and is available here.

AVLetters1 contains the letters 'a' to 'z' spoken by 10 speakers, with 3 repetitions of each letter, totalling 780 utterances. It comprises 18562 frames of the mouth region at 60x80 pixels in greyscale.
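As a quick check of the AVLetters1 figures, the utterance total follows directly from the letter, speaker, and repetition counts. The short Python sketch below reproduces it and also derives the mean utterance length in frames. The mean is a derived figure, not one quoted by the dataset's authors.

```python
# Counts quoted for AVLetters1.
LETTERS = 26        # 'a' to 'z'
SPEAKERS = 10
REPETITIONS = 3
TOTAL_FRAMES = 18562

# One utterance per (letter, speaker, repetition) combination.
utterances = LETTERS * SPEAKERS * REPETITIONS

# Derived value: average number of frames per utterance.
mean_frames = TOTAL_FRAMES / utterances

print(utterances)             # 780
print(round(mean_frames, 1))  # 23.8
```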

AVLetters2 - This dataset was recorded at the University of Surrey in the summer of 2007. It will be made available shortly; check back soon.

AVLetters2 contains the letters 'a' to 'z' spoken by 6 individuals, with 7 repetitions of each letter. This dataset was recorded in colour (RGB) using high-definition cameras at 1920x1080.

The GRID audio-visual sentence corpus - kindly provided by Jon Barker of the University of Sheffield, this corpus was used in some of our tests. For more details see here.

The GRID Corpus consists of 34 speakers, each reading 1000 sentences drawn from a short, fixed artificial grammar with a small vocabulary.

  • XM2VTS - Audio-visual biometric database from the University of Surrey.
  • BANCA - Audio-visual biometric database from the University of Surrey.