Computer vision researchers are developing new approaches to object recognition and detection that are based almost directly on images and avoid the use of intermediate three-dimensional models. Many of these techniques depend on a representation of images that induces a linear vector space structure and in principle requires dense feature correspondence. This image representation allows the use of learning techniques for the analysis of images (for computer vision) as well as for the synthesis of images (for computer graphics).
The synthesis problem, the classical problem of computer graphics, can be formulated as the problem of generating novel images corresponding to an appropriate set of parameters that describe the camera viewpoint and aspects of the scene. The inverse analysis problem, that of estimating object labels as well as scene parameters from images, is the classical problem of computer vision. Since the 1980s, researchers in both fields have used intermediate, physically based models to approach their respective problems of synthesis and analysis. In computer graphics, sophisticated three-dimensional (3D) modeling and rendering techniques have been developed that effectively simulate the physics of rigid and nonrigid solid objects and the physics of imaging (1). Some research in computer vision has followed a parallel path; most object recognition algorithms use 3D object models and exploit the properties of geometrical and physical optics to match images to the database of models (2). More recently, researchers in the two key areas of object recognition and object detection (3) have used a rather different approach that has been called image-based or view-based (4-8). In this approach, the image to be analyzed is compared directly, possibly after a simple filtering stage, with a set of example images.
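As a rough illustration of what comparing an image directly with a set of example images can look like, the following minimal sketch labels a query image with the label of its closest stored example. The sum-of-squared-differences distance, the nearest-example rule, and the function names are assumptions made for illustration; they are not taken from the systems cited above.

```python
def squared_distance(image_a, image_b):
    """Sum of squared pixel differences between two equally sized images.

    Images are plain lists of rows of gray levels (an assumed toy format).
    """
    return sum(
        (pixel_a - pixel_b) ** 2
        for row_a, row_b in zip(image_a, image_b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def classify_by_nearest_example(image, examples):
    """Return the label of the stored example image closest to `image`.

    `examples` is a list of (label, example_image) pairs.
    """
    best_label, _ = min(
        ((label, squared_distance(image, example)) for label, example in examples),
        key=lambda pair: pair[1],
    )
    return best_label


# Two hypothetical 2x2 example views and a noisy query image.
examples = [
    ("vertical bar", [[1.0, 0.0], [1.0, 0.0]]),
    ("horizontal bar", [[1.0, 1.0], [0.0, 0.0]]),
]
query = [[0.9, 0.1], [0.8, 0.0]]
print(classify_by_nearest_example(query, examples))
```

No 3D model of the object enters the computation; the only stored knowledge is the set of example views.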
The Role of Correspondence
The key underlying mathematical assumption of the image-based approach is that the images form a linear vector space (9). However, images are just arrays of numbers or pixels, not vectors. A set of raw images, say of similar objects, does not have the structure of a vector space, because operations like addition do not have a well-defined meaning for raw images (10). In pattern recognition, a standard technique for associating a vector to an image is to derive the vector components from an ordered set of measurements on the image (11)....
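A toy one-dimensional example may make the point about addition concrete. In the sketch below, each "image" contains a single bright feature. Averaging the raw pixel arrays yields a double exposure with two faint features, whereas averaging a measurement taken on the feature itself (here, its position) yields an image with one feature at the intermediate location. The bump images, the choice of position as the measurement, and all function names are illustrative assumptions, not the authors' representation.

```python
def bump_image(position, width=9):
    """A 1D toy image: one bright pixel at `position`, zeros elsewhere."""
    return [1.0 if i == position else 0.0 for i in range(width)]


def pixelwise_average(image_a, image_b):
    """Average two raw images pixel by pixel, ignoring correspondence."""
    return [(a + b) / 2 for a, b in zip(image_a, image_b)]


def feature_position(image):
    """An ordered measurement on the image: the index of its brightest pixel."""
    return image.index(max(image))


def average_via_correspondence(image_a, image_b):
    """Average the measured feature positions, then redraw the feature."""
    mean_position = (feature_position(image_a) + feature_position(image_b)) // 2
    return bump_image(mean_position, len(image_a))


left = bump_image(2)
right = bump_image(6)

# Raw addition superimposes the two images: two half-bright features.
print(pixelwise_average(left, right))
# Averaging corresponding features gives a single feature in between.
print(average_via_correspondence(left, right))
```

Only the second result resembles a plausible new member of the same class of images, which is why the vector components must refer to corresponding features rather than to raw pixel locations.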