Related Datasets
As part of the LILiR project, a number of datasets have been used or generated:
AVLetters1 - This dataset was generated in 1998 at the University of East Anglia and is available here.
AVLetters1 contains the letters 'a' to 'z' spoken by 10 speakers, with 3 repetitions of each letter, totalling 780 utterances and 18562 frames. The mouth region is 60x80 pixels in greyscale.
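The utterance total quoted above follows directly from the recording design (letters x speakers x repetitions). The short Python sketch below reproduces that arithmetic and derives two rough figures that are not stated in the description: the mean number of frames per utterance, and the uncompressed size of the mouth-region data. The size estimate assumes 8-bit greyscale (one byte per pixel), which is an assumption rather than a documented property of the dataset, and the function and variable names are purely illustrative.

```python
import string

# Figures quoted in the AVLetters1 description.
LETTERS = string.ascii_lowercase   # 'a' to 'z'
SPEAKERS = 10
REPETITIONS = 3
TOTAL_FRAMES = 18562
MOUTH_REGION = (60, 80)            # mouth region size in pixels, as quoted


def utterance_count(n_letters, n_speakers, n_repetitions):
    """Every speaker says every letter n_repetitions times."""
    return n_letters * n_speakers * n_repetitions


utterances = utterance_count(len(LETTERS), SPEAKERS, REPETITIONS)

# Derived figure: average utterance length in frames.
mean_frames = TOTAL_FRAMES / utterances

# Assumption: 8-bit greyscale, i.e. one byte per pixel.
raw_bytes = TOTAL_FRAMES * MOUTH_REGION[0] * MOUTH_REGION[1]

print(f"utterances: {utterances}")
print(f"mean frames per utterance: {mean_frames:.1f}")
print(f"raw mouth-region data: {raw_bytes / 1e6:.1f} MB (assuming 1 byte/pixel)")
```

The computed utterance count agrees with the 780 quoted in the description; the other two values are estimates only and depend on the stated assumption.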
AVLetters2 - This dataset was recorded at the University of Surrey in the summer of 2007. It will be made available shortly; check back soon.
AVLetters2 contains the letters 'a' to 'z' spoken by 6 individuals, with 7 repetitions of each letter. This dataset was recorded in colour using high-definition cameras at 1920x1080 RGB.
The GRID audio-visual sentence corpus - kindly provided by Jon Barker of the University of Sheffield, this corpus was used in some of our tests. For more details see here.
The GRID Corpus consists of 34 speakers and 1000 sentences with a fixed and short artificial grammar and small vocabulary.
Multilingual audio-visual dataset - recording is in progress at the University of East Anglia, and the dataset will be available when completed.
The multilingual audio-visual dataset is designed to allow language identification. It currently consists of 19 bi/tri-lingual speakers reading the UN Declaration of Human Rights. Each subject reads in both their native language and English. Video is recorded at 480x640 and 25 fps, with audio at 48 kHz.
LILiR main dataset - under construction; this is a big task, and the dataset will be available at some point in the future.
One of the main tasks of LILiR is to provide a large English audio-visual dataset. This dataset is currently under preparation. It will consist of 20 speakers reading 200 sentences per speaker within the Resource Management domain of discourse. The dataset will be captured at multiple viewing angles in both HD and SD and will also provide some natural conversational speech.
Other datasets of interest: