Visual Active Memory application: understanding tennis

Introduction

This work began as part of the IST-2001-34401 VAMPIRE project (Visual Active Memory Processes and Interactive REtrieval) and continues under the IST MUSCLE Network of Excellence. The project sets out to explore how an active memory system, operating at several different semantic levels, might lead to a degree of machine understanding when applied to specific scenarios.

We can demonstrate the Active Memory concept by applying it to the analysis and browsing of a tennis video. Annotation can be provided at all levels, from shot detection to a complete breakdown of the scoring during the match. At present the system automatically analyses a tennis video to the extent that it can identify the outcome of each serve and each point played, with reasonable accuracy (84% as of 22 July 2005).

The system is in two parts: a short-term and a long-term memory. The short-term memory analyses the video in a bottom-up fashion, making available relevant results to the long-term memory, and "forgetting" the rest. The long-term memory stores only the information that might be useful in the long term, and provides a graphical interface for browsing this information in a top-down fashion.

Future work: We are in the process of adding an audio module to detect audio cues that are characteristic of tennis.

The short-term memory

The analysis of the game is performed by a sequence of memory modules that plug into the memory system, each acting at a different semantic and contextual level. The contents of each module are created by a processing element that digests information from other modules. The modules all have the same programming interface, and are run concurrently. We call this part of the memory the "short-term" memory: once information created in this memory is finished with, it can be "forgotten". This happens once (a) the other modules have finished with it, and (b) any information deemed to be of sufficient interest for subsequent browsing has been passed to a "long-term" memory.
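The forgetting policy can be illustrated with a small sketch. It is not the project's actual interface: the class ShortTermMemory, its method names and the keep flag are all hypothetical, chosen only to show conditions (a) and (b) in code.

```python
class ShortTermMemory:
    """Toy store that forgets an item once every consumer has released it.

    Hypothetical illustration only; not the interface used by the real system.
    """

    def __init__(self):
        self._items = {}     # key -> stored value
        self._pending = {}   # key -> names of consumers that still need it
        self.long_term = {}  # stand-in for the long-term memory

    def put(self, key, value, consumers, keep=False):
        # Condition (b): pass anything worth browsing later to the
        # long-term memory before the item can be forgotten.
        if keep:
            self.long_term[key] = value
        self._items[key] = value
        self._pending[key] = set(consumers)
        self._forget_if_done(key)

    def get(self, key):
        return self._items[key]

    def release(self, key, consumer):
        # Condition (a): a consumer module signals it has finished with the item.
        if key in self._pending:
            self._pending[key].discard(consumer)
            self._forget_if_done(key)

    def _forget_if_done(self, key):
        if not self._pending[key]:
            del self._items[key]
            del self._pending[key]

    def __contains__(self, key):
        return key in self._items


stm = ShortTermMemory()
stm.put("shot-1/mosaic", "pixels", consumers=["court", "tracker"])
stm.put("shot-1/outcome", "ace", consumers=["rules"], keep=True)
stm.release("shot-1/mosaic", "court")    # one consumer still pending
stm.release("shot-1/mosaic", "tracker")  # now forgotten
stm.release("shot-1/outcome", "rules")   # forgotten, but kept long-term
```

In this sketch an item with no remaining consumers is dropped immediately; a real system running modules concurrently would also need locking, which is omitted here.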

The system is assembled in a top-down fashion by launching a target module (or modules); this module then launches those modules needed to generate its input data, in a recursive fashion, until all of the modules needed for the application have been assembled. The complete set of modules that make up the tennis annotation system, with corresponding data flow, is shown here. The analysis of the video then proceeds in a bottom-up manner: the raw video is progressively digested by memory modules of increasing semantic content and broadening contextual scope. The modules are summarised next.
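The recursive assembly can be sketched as a depth-first walk over a dependency table. The table below is hypothetical: its module names loosely follow the stages described on this page, not the actual module set, and the sketch assumes the dependencies contain no cycles.

```python
# Hypothetical dependency table: module name -> modules whose output it consumes.
INPUTS = {
    "point_scoring": ["event_rules"],
    "event_rules": ["ball_tracking", "player_tracking", "camera_location"],
    "ball_tracking": ["mosaic"],
    "player_tracking": ["mosaic"],
    "camera_location": ["mosaic"],
    "mosaic": ["shot_detection"],
    "shot_detection": [],
}


def assemble(target, inputs=INPUTS, launched=None):
    """Launch `target` and, recursively, every module it needs for its input.

    Returns the module names in an order where each module appears after
    the modules that feed it, i.e. the bottom-up order of the data flow.
    Assumes the dependency table is acyclic.
    """
    if launched is None:
        launched = []
    if target in launched:
        return launched          # already assembled via another path
    for dependency in inputs[target]:
        assemble(dependency, inputs, launched)
    launched.append(target)
    return launched


order = assemble("point_scoring")
```

Requesting only the target module is enough: the walk pulls in everything else, and shared inputs such as the mosaic are launched once.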

Low-level processing and mosaicing

The individual camera shots are detected, and classified as "play", "other court" or "close-up". One intermediate result for each "play" shot is an image mosaic, in which the frames of the shot are aligned (to compensate for camera motion) and merged into a single image; the merging removes any moving foreground objects:

[mosaic]
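This page does not say how the aligned frames are merged. One common choice, assumed here purely for illustration, is a per-pixel temporal median: a moving player or ball covers any given pixel in only a minority of frames, so the median recovers the background. The sketch takes frames that have already been aligned, and the function name background_mosaic is hypothetical.

```python
from statistics import median


def background_mosaic(aligned_frames):
    """Per-pixel temporal median over frames in a common coordinate frame.

    `aligned_frames` is a list of equally sized 2-D lists of grey levels;
    None marks pixels that a frame does not cover after alignment.
    Assumed merging rule for illustration, not necessarily the system's.
    """
    height = len(aligned_frames[0])
    width = len(aligned_frames[0][0])
    mosaic = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = [f[y][x] for f in aligned_frames if f[y][x] is not None]
            row.append(median(samples) if samples else None)
        mosaic.append(row)
    return mosaic


# A bright "player" (200) moves across a uniform background (10).
frames = [[[10, 10, 200]], [[10, 200, 10]], [[200, 10, 10]]]
clean = background_mosaic(frames)
```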

Camera location

The tennis court markings can be found from this image; they are used to determine the camera location, focal length and orientation. We are then able to identify the location of events in the real world, and construct new views:

[projection]
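Locating events on the court surface can be illustrated with a plane-to-plane homography fitted to four court-corner correspondences. This is only a sketch of the planar part of the problem; recovering the full camera location, focal length and orientation needs more than this, and the method used by the system is not described here. The pixel coordinates are invented; the court dimensions are the standard doubles court (10.97 m by 23.77 m).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_homography(image_pts, court_pts):
    """Fit the 3x3 homography (flattened, last entry fixed to 1) from 4 point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(image_pts, court_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def image_to_court(h, x, y):
    """Map an image point to court-plane coordinates in metres."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Doubles-court corners in metres: x across the court, y along it.
COURT_CORNERS = [(0.0, 0.0), (10.97, 0.0), (10.97, 23.77), (0.0, 23.77)]
# Hypothetical pixel positions of the same corners in a mosaic.
image_corners = [(100, 400), (540, 400), (440, 100), (200, 100)]
H = fit_homography(image_corners, COURT_CORNERS)
```

With H in hand, image_to_court(H, x, y) converts any image point lying on the court surface, such as a ball bounce or a player's feet, into metres on the court.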

Tracking

The players and the ball are also tracked, in order to identify significant ball events:

[tracking result]

(or play tracking movie)

High-level reasoning

Once the camera location is known, the relationship between the image and the court surface is determined. We can use this to relate the ball and player tracks to real-world events. High-level events are detected by applying the rules of tennis to this sequence of events. This can be done on a shot-by-shot basis:

Applying tennis rules [applying tennis rules]
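As an example of a rule applied to a single event, the sketch below decides whether a serve bounce is good. It is not the system's rule engine. It assumes court coordinates in metres with x across the doubles width and y along the length, and a server at the y = 0 end; the function name and this coordinate convention are hypothetical. The dimensions are the standard ones: the service line lies 6.40 m from the net, and each singles sideline is 1.37 m inside the doubles sideline.

```python
NET_Y = 11.885         # half of the 23.77 m court length
SERVICE_DEPTH = 6.40   # distance from the net to the service line
CENTRE_X = 5.485       # half of the 10.97 m doubles width
SINGLES_MARGIN = 1.37  # gap between doubles and singles sidelines


def serve_is_in(server_x, bounce_x, bounce_y):
    """True if a serve struck from the y = 0 end lands in the correct box.

    The correct box is the one diagonally opposite the server, beyond the
    net. A ball touching a line counts as in, hence the inclusive tests.
    """
    if not (NET_Y < bounce_y <= NET_Y + SERVICE_DEPTH):
        return False  # into the net side or beyond the service line
    if server_x > CENTRE_X:
        return SINGLES_MARGIN <= bounce_x <= CENTRE_X
    return CENTRE_X <= bounce_x <= 2 * CENTRE_X - SINGLES_MARGIN


good = serve_is_in(server_x=7.0, bounce_x=3.0, bounce_y=15.0)
long_serve = serve_is_in(server_x=7.0, bounce_x=3.0, bounce_y=19.0)
```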

However, most points are played over a sequence of shots, so the system can be improved by structuring the rules, grouping shots into points, games etc.:

Structured tennis rules [structured tennis rules]
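The grouping step can be illustrated by turning a sequence of point winners into game winners. This is a simplified sketch, not the structured rule set used by the system: it covers only standard games (first to at least four points, two clear) and ignores tie-breaks and sets. The function name is hypothetical.

```python
def games_from_points(point_winners):
    """Group a sequence of point winners (0 or 1) into game winners.

    A standard game is won by the first player to reach at least four
    points with a lead of two. Tie-breaks and sets are not modelled.
    """
    games = []
    points = [0, 0]
    for winner in point_winners:
        points[winner] += 1
        if points[winner] >= 4 and points[winner] - points[1 - winner] >= 2:
            games.append(winner)
            points = [0, 0]  # next game starts from love-all
    return games


# One game to love for player 1, then a game player 0 wins after deuce.
result = games_from_points([1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0])
```

Structuring the rules this way lets an error in one shot be checked against its context: a point outcome that would produce an impossible game score can be flagged.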

The long-term memory

The long-term memory retains useful information generated by the short-term memory, and provides the means to browse it. Unlike the short-term memory, its design is (currently) specific to the application; here it takes the form of a tennis browser. The annotation system can be run directly from the browser; alternatively, the browser can display precomputed results.

The main panel provides a top-down summary of the game so far, and can display the corresponding video clips, optionally with the ball and player tracks.

[Main tennis browser]

A separate panel can be used to display lower-level information (mosaic, shot boundary / classification etc.) about the progress of individual shots.

[shot browser]

Bill Christmas
Last modified: Thu Dec 15 09:56:09 GMT 2005