Content-Based Motion Compensation and its application to Video Coding

PhD research undertaken by
Marc Servais
at the
Centre for Vision, Speech and Signal Processing
University of Surrey





Content-based approaches to motion compensation can adapt to the spatial and temporal characteristics of a scene. Three such motion compensation techniques were researched in detail, and one of them was integrated into a video codec.

The first approach operates by performing spatio-temporal segmentation of a frame. A split-and-merge approach is then used to ensure that motion characteristics are relatively homogeneous within each region. Region shape information is coded (by approximating the boundaries with polygons) and a triangular mesh is generated within each region. Translational and affine motion estimation are then performed on each triangle within the mesh. This approach offers an improvement in quality when compared to a regular mesh of the same size. However, it is difficult to control the number of triangles, since this depends on the segmentation and polygon approximation stages. As a result, this approach is difficult to integrate into a rate-distortion framework.
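
To make the affine model for a single triangle concrete: three non-collinear vertex correspondences determine the six affine parameters exactly. The sketch below assumes that the displaced vertex positions are already known (how they are estimated is not described here), and the function names are hypothetical, chosen only for illustration.

```python
def affine_from_triangle(src, dst):
    """Solve x' = a*x + b*y + c and y' = d*x + e*y + f from three vertex pairs.

    src, dst: three (x, y) tuples each. Uses Cramer's rule on the 3x3 system.
    """
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("triangle vertices are collinear")

    def solve(v1, v2, v3):
        # Coefficients (p, q, r) such that p*xi + q*yi + r = vi for i = 1..3.
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(dst[0][0], dst[1][0], dst[2][0])
    d, e, f = solve(dst[0][1], dst[1][1], dst[2][1])
    return a, b, c, d, e, f


def warp_point(params, x, y):
    """Map a point inside the triangle to its motion-compensated position."""
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f


# A pure translation by (1, 2) is the special case a = e = 1, b = d = 0.
params = affine_from_triangle([(0, 0), (4, 0), (0, 4)],
                              [(1, 2), (5, 2), (1, 6)])
```

A translational model is the special case in which all three vertices share the same displacement, which is why it needs only two parameters per triangle instead of six.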

The second method involves the use of variable-size blocks, rather than a triangular mesh. Once again, a frame is first segmented into regions of homogeneous motion, which are then approximated with polygons. A grid of blocks is created in each region, with the block size inversely proportional to the motion compensation error for that region. This ensures that regions with complex motion are populated by smaller blocks. Following this, bi-directional translational and affine motion parameters are estimated for each block. In contrast to the mesh-based approach, this method allows the number of blocks to be easily controlled. Nevertheless, the number and shape of regions remain very sensitive to the segmentation parameters used.
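
The inverse relationship between block size and region error can be illustrated with a few lines of code. This is only a sketch: the proportionality constant and the size limits are hypothetical values, not parameters taken from the research.

```python
def block_size_for_region(mean_error, scale=64.0, min_size=4, max_size=32):
    """Pick a block size inversely proportional to a region's mean error.

    scale, min_size and max_size are illustrative assumptions.
    """
    if mean_error <= 0:
        # Perfectly compensated region: use the largest block allowed.
        return max_size
    size = scale / mean_error
    # Clamp to the permitted range and round down to whole pixels.
    return int(max(min_size, min(max_size, size)))
```

Raising or lowering the (assumed) scale constant changes the block size in every region at once, which is one way the total number of blocks could be controlled.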

The third technique also uses variable-size blocks, but the spatio-temporal segmentation stage is replaced with a simpler and more robust binary block partitioning process. If a particular block does not allow for accurate motion compensation, then it is split into two using the horizontal or vertical line that achieves the maximum reduction in motion compensation error. Starting with the entire frame as one block, the splitting process is repeated until a large enough binary tree of blocks is obtained. This method causes partitioning to occur along motion boundaries, thus substantially reducing blocking artifacts compared to regular block matching. In addition, small blocks are placed in regions of complex motion, while large blocks cover areas of uniform motion. The proposed technique provides significant gains in picture quality when compared to fixed-size block matching at the same total rate.
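
The splitting process can be sketched as follows. This is a simplified illustration rather than the implementation used in the research: it assumes integer-pixel translational motion only, exhaustive search, the sum of absolute differences (SAD) as the error measure, and a greedy rule that always splits the leaf offering the largest error reduction. All function names are hypothetical.

```python
def block_error(cur, ref, x0, y0, x1, y1, search=2):
    """Smallest SAD for block [x0, x1) x [y0, y1) over integer translations."""
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements that would read outside the reference frame.
            if x0 + dx < 0 or y0 + dy < 0 or x1 + dx > w or y1 + dy > h:
                continue
            sad = sum(abs(cur[y][x] - ref[y + dy][x + dx])
                      for y in range(y0, y1) for x in range(x0, x1))
            if best is None or sad < best:
                best = sad
    return best


def best_split(cur, ref, block, search=2):
    """Return (reduction, child_a, child_b) for the best split line, or None."""
    x0, y0, x1, y1 = block
    parent = block_error(cur, ref, x0, y0, x1, y1, search)
    # Every vertical line, then every horizontal line, through the block.
    candidates = [((x0, y0, x, y1), (x, y0, x1, y1)) for x in range(x0 + 1, x1)]
    candidates += [((x0, y0, x1, y), (x0, y, x1, y1)) for y in range(y0 + 1, y1)]
    best = None
    for a, b in candidates:
        err = block_error(cur, ref, *a, search) + block_error(cur, ref, *b, search)
        if best is None or parent - err > best[0]:
            best = (parent - err, a, b)
    return best


def partition(cur, ref, max_leaves, search=2):
    """Grow a binary partition, starting with the whole frame as one block."""
    leaves = [(0, 0, len(cur[0]), len(cur))]
    while len(leaves) < max_leaves:
        options = [(best_split(cur, ref, b, search), b) for b in leaves]
        options = [(s, b) for s, b in options if s is not None and s[0] > 0]
        if not options:
            break  # no remaining split reduces the error
        split, block = max(options, key=lambda o: o[0][0])
        leaves.remove(block)
        leaves += [split[1], split[2]]
    return leaves


# Toy 8x8 frames: the left half is static, the right half has moved by one pixel.
ref = [[(x * x + 3 * y) % 11 for x in range(8)] for y in range(8)]
cur = [[ref[y][x] if x < 4 else ref[y][x - 1] for x in range(8)] for y in range(8)]
leaves = partition(cur, ref, max_leaves=4)
```

In this toy example the first split falls on the vertical line between the static and moving halves, and no further splits are made because both resulting blocks are already compensated without error. This mirrors the tendency of the partition to follow motion boundaries.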

The binary partition tree method has been integrated into a hybrid video codec. (The codec also has the option of using fixed-size blocks or H.264/AVC variable-size blocks.) Results indicate that the binary partition tree method of motion compensation leads to improved rate-distortion performance over the state-of-the-art H.264/AVC variable-size block matching. This advantage is most evident at low bit-rates, and also in the case of bi-directionally predicted frames.

Marc Servais