SurfCap: Surface Motion Capture

J. Starck and A. Hilton

These web pages give an overview of the SurfCap project at the Centre for Vision, Speech and Signal Processing, University of Surrey, UK. Surface motion capture, or SurfCap, captures a shape and appearance model from a performer in a multiple-camera studio to create 3D digital models that are indistinguishable from reality.  

Digital content production traditionally requires highly skilled artists and animators to first manually craft shape and appearance models and then instill the models with a believable performance. Motion capture (MoCap) technology is increasingly used to record the articulated motion of a real human performance to increase the visual realism of animation. Motion capture is, however, limited to recording only the skeletal motion of the human body using a specialist suit, and it lacks the secondary surface dynamics, for example in cloth and hair, that provide the visual realism of a live performance. Conventional digital models often fall into the Uncanny Valley, never quite seeming real. SurfCap instead captures the complete performance of an actor in full wardrobe to create highly realistic character models that reproduce the appearance of a real person recorded in multiple-view video footage.  

The multiple-view video and scene reconstruction data are available as a resource to the computer vision and computer graphics research community. 

[Results] [Data] [Publications]


Latest results rendered using the Axiom game engine:

[popping cycle (avi, 2.3MB)]
[locking cycle (avi, 2.3MB)]
[kick freeze frame (avi, 1.1MB)]
[character control (avi, 1.8MB)]

Playing the movies requires the Xvid codec.

SurfCap Data

Surface capture is recorded in a dedicated blue-screen studio. In our studio, eight HD cameras are equally spaced around a circle of 8m diameter at a height of 2m above the studio floor. This gives a performance volume of 4m × 4m × 2m with a wide-baseline 45° angle between adjacent camera views. Performances are captured using Thomson Viper cameras in HD-SDI 10-bit 4:2:2 format with 1920×1080 resolution at 25Hz progressive scan. Synchronized video from all eight cameras is recorded uncompressed direct to disk on eight dedicated PC capture boxes using DVS HD capture cards.  
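The rig geometry and storage requirements above can be sketched in a few lines. The following is a minimal illustration, not part of the project's software: it places the eight camera centres on the stated 8m-diameter circle at 2m height, and estimates the raw per-camera data rate assuming standard 10-bit 4:2:2 sampling (10 bits of luma plus 10 bits of shared chroma, i.e. 20 bits per pixel on average).

```python
import math

NUM_CAMERAS = 8
DIAMETER_M = 8.0
HEIGHT_M = 2.0

def camera_positions(n=NUM_CAMERAS, diameter=DIAMETER_M, height=HEIGHT_M):
    """Approximate camera centres: n points equally spaced on a circle."""
    r = diameter / 2.0
    return [
        (r * math.cos(2.0 * math.pi * i / n),
         r * math.sin(2.0 * math.pi * i / n),
         height)
        for i in range(n)
    ]

# Angular baseline between adjacent views: 360 / 8 = 45 degrees
baseline_deg = 360.0 / NUM_CAMERAS

# Raw data rate per camera: 1920x1080 at 25 Hz progressive,
# 10-bit 4:2:2 sampling -> 20 bits per pixel on average
bits_per_frame = 1920 * 1080 * 20
bytes_per_second = bits_per_frame * 25 / 8
print(f"baseline: {baseline_deg:.0f} degrees")
print(f"raw rate: {bytes_per_second / 1e6:.1f} MB/s per camera")
```

Roughly 130 MB/s per camera explains why each camera needs its own dedicated capture box writing uncompressed video direct to disk.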

The multiple-view video, camera calibration, foreground mattes and surface reconstructions are available to the research community; please contact J. Starck for more information. This data was captured with thanks to JP Omari at streetfunk. The geometry for the sequences can be seen here: locking, popping, flashkick, freestyle, headstand, kickup, and pop2lock plus lock2pop transitions.

Selected Publications


Surface Capture for Performance-Based Animation
J. Starck and A. Hilton.
IEEE Computer Graphics and Applications (CG&A), 2007.
[pdf] [BibTeX]


Volumetric Stereo with Silhouette and Feature Constraints
J. Starck, G. Miller and A. Hilton.
British Machine Vision Conference (BMVC), 2006.
[pdf] [BibTeX]


Spherical Matching for Temporal Correspondence of Non-Rigid Surfaces
J. Starck and A. Hilton.
IEEE International Conference on Computer Vision (ICCV), 2005.
[pdf] [BibTeX]

Video-Based Character Animation
J. Starck, G. Miller and A. Hilton.
ACM SIGGRAPH Symposium on Computer Animation (SCA), 2005.
[pdf] [BibTeX]

[Home] [Results] [Data] [Publications]