A system and method for real-time multi-view (i.e., not only frontal-view) face detection. The system and method use a sequence of detectors of increasing complexity and face/non-face discriminating thresholds to discard non-faces at the earliest possible stage, saving considerable computation compared with prior-art systems. The detector-pyramid architecture for multi-view face detection follows a coarse-to-fine, simple-to-complex scheme. This architecture addresses the lengthy processing that otherwise precludes real-time face detection by discarding most non-face sub-windows with the simplest possible features at the earliest possible stage. The result is the first real-time multi-view face detection system, with accuracy almost as good as that of the state-of-the-art system while running 270 times faster.
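The early-rejection idea described above can be illustrated with a minimal sketch. It is not the patented detector pyramid: the stage functions (`mean_intensity`, `contrast`, `centre_minus_edges`), their thresholds, and the four-value "window" are hypothetical stand-ins chosen only to show how cheap stages run first and how a sub-window is dropped as soon as one stage scores it below its threshold.

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a scoring function with a face/non-face threshold.
# All stage functions and thresholds below are illustrative assumptions.
Stage = Tuple[Callable[[Sequence[float]], float], float]


def mean_intensity(window: Sequence[float]) -> float:
    """Cheapest feature: average value of the window."""
    return sum(window) / len(window)


def contrast(window: Sequence[float]) -> float:
    """Slightly costlier feature: range of values in the window."""
    return max(window) - min(window)


def centre_minus_edges(window: Sequence[float]) -> float:
    """Most complex toy feature: centre brightness minus edge brightness."""
    centre = (window[1] + window[2]) / 2
    edges = (window[0] + window[3]) / 2
    return centre - edges


# Stages are ordered from simplest to most complex.
STAGES: List[Stage] = [
    (mean_intensity, 0.3),
    (contrast, 0.4),
    (centre_minus_edges, 0.2),
]


def cascade_classify(window: Sequence[float], stages: List[Stage]) -> Tuple[bool, int]:
    """Return (accepted, number_of_stages_evaluated).

    A window is rejected at the first stage whose score falls below
    that stage's threshold, so later (costlier) stages are never run.
    """
    for index, (score, threshold) in enumerate(stages, start=1):
        if score(window) < threshold:
            return False, index
    return True, len(stages)


if __name__ == "__main__":
    print(cascade_classify([0.1, 0.1, 0.1, 0.1], STAGES))  # rejected by the first stage
    print(cascade_classify([0.5, 0.5, 0.5, 0.5], STAGES))  # rejected by the second stage
    print(cascade_classify([0.2, 0.8, 0.8, 0.2], STAGES))  # passes every stage
```

Because most sub-windows in an image are non-faces, the average cost per window in such a scheme is dominated by the first, cheapest stage.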
CROSS-REFERENCE TO RELATED APPLICATIONS
This is a continuation of a prior application entitled "A SYSTEM AND METHOD FOR MULTI-VIEW FACE DETECTION", which was assigned Ser. No. 10/091,100 and was filed Mar. 4, 2002, now U.S. Pat. No. 7,050,607, which claimed the priority of provisional application No. 60/339,545 filed on Dec. 8, 2001.
Methods for image processing for detecting and recognizing an image object include detecting the image object using pose-specific object detectors and fusing the outputs of the pose-specific object detectors. The image object is recognized using pose-specific object recognizers that take as input the outputs of the pose-specific object detectors and the fused output of those detectors, and by fusing the outputs of the pose-specific object recognizers.
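The abstract above does not state which fusion rule is used. As a minimal sketch under that caveat, the following assumes each pose-specific detector returns a confidence score and fuses them by taking the maximum; the function name `fuse_pose_scores`, the pose labels, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Tuple


def fuse_pose_scores(scores: Dict[str, float], threshold: float = 0.5) -> Tuple[bool, str, float]:
    """Fuse per-pose detector scores with a max rule (an assumed rule).

    Returns (detected, best_pose, best_score), where `detected` is True
    when the best pose-specific score reaches the threshold.
    """
    if not scores:
        raise ValueError("at least one pose-specific score is required")
    best_pose = max(scores, key=scores.get)
    best_score = scores[best_pose]
    return best_score >= threshold, best_pose, best_score


if __name__ == "__main__":
    example = {"frontal": 0.2, "left_profile": 0.9, "right_profile": 0.1}
    print(fuse_pose_scores(example))
```

A recognizer stage could consume both the per-pose scores and the fused tuple, which mirrors the text's statement that recognizers use the individual and the fused detector outputs.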
An image processing apparatus for tracking faces in an image stream iteratively receives a new acquired image from the image stream, the image potentially including one or more face regions. The acquired image is sub-sampled (112) at a specified resolution to provide a sub-sampled image. An integral image is then calculated for at least a portion of the sub-sampled image. Fixed size face detection (20) is applied to at least a portion of the integral image to provide a set of candidate face regions. Responsive to the set of candidate face regions produced and any previously detected candidate face regions, the resolution at which a next acquired image is sub-sampled is adjusted.
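The sub-sampling and integral-image steps can be sketched as follows. This is an illustrative pure-Python version, not the apparatus's implementation: the helper names (`subsample`, `integral_image`, `box_sum`) are assumptions, and sub-sampling is done by simple decimation. The integral image stores, at each position, the sum of all pixels above and to the left, so the sum over any rectangle takes four lookups regardless of its size, which is what makes fixed-size detection over many windows cheap.

```python
from typing import List

Image = List[List[int]]


def subsample(img: Image, step: int) -> Image:
    """Keep every `step`-th pixel in both directions (simple decimation)."""
    return [row[::step] for row in img[::step]]


def integral_image(img: Image) -> Image:
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of img[:y][:x]."""
    height = len(img)
    width = len(img[0]) if height else 0
    ii = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: Image, top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels in a rectangle, using four integral-image lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    table = integral_image(image)
    print(box_sum(table, 1, 1, 2, 2))  # sum of the lower-right 2x2 block
    print(subsample(image, 2))
```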
An image processing apparatus for tracking faces in an image stream iteratively receives an acquired image from the image stream potentially including one or more face regions. The acquired image is sub-sampled at a specified resolution to provide a sub-sampled image. An integral image is then calculated for at least a portion of the sub-sampled image. Fixed size face detection is applied to at least a portion of the integral image to provide a set of candidate face regions. Responsive to the set of candidate face regions produced and any previously detected candidate face regions, the resolution is adjusted for sub-sampling a subsequent acquired image.
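The text does not say how the sub-sampling resolution is adjusted. One possible policy, offered purely as an assumption, is to choose the decimation step so that the smallest candidate face maps to roughly the fixed detector size, and to move toward finer resolution when no candidates were found. The function name `next_subsample_step`, the 24-pixel detector size, and the step limits are all hypothetical.

```python
from typing import Sequence


def next_subsample_step(
    current_step: int,
    candidate_sizes: Sequence[int],
    detector_size: int = 24,
    min_step: int = 1,
    max_step: int = 8,
) -> int:
    """Pick the decimation step for the next frame (hypothetical policy).

    `candidate_sizes` holds face sizes, in full-resolution pixels, from the
    current and previously detected candidate regions.
    """
    if not candidate_sizes:
        # Nothing found: try a finer resolution on the next frame.
        return max(min_step, current_step - 1)
    # Scale so the smallest known face is about `detector_size` pixels.
    step = min(candidate_sizes) // detector_size
    return max(min_step, min(max_step, step))


if __name__ == "__main__":
    print(next_subsample_step(4, [96, 200]))
    print(next_subsample_step(4, []))
```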
A database includes an identifier and associated parameters for each of a number of faces to be recognized. A newly acquired image, potentially including one or more face regions, is received from an image stream. Face detection is applied to at least a portion of the acquired image to provide a set of candidate face regions, each having a given size and a respective location. Using the database, face recognition is selectively applied to at least one of the candidate face regions to provide an identifier for a face recognized in that region. A portion of the image including the recognized face is stored in association with at least one image of the image stream.
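The database lookup can be illustrated with a minimal sketch. The abstract does not specify what the stored parameters are or how they are matched, so this example assumes each face is represented by a short numeric vector and uses nearest-neighbour matching with a distance cut-off; `recognize`, the sample identifiers, and `max_distance` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def recognize(
    face_params: Sequence[float],
    database: Dict[str, Sequence[float]],
    max_distance: float = 0.5,
) -> Optional[str]:
    """Return the identifier of the closest stored face, or None.

    A match is reported only when the Euclidean distance to the closest
    database entry does not exceed `max_distance` (an assumed criterion).
    """
    best_id: Optional[str] = None
    best_distance = math.inf
    for identifier, params in database.items():
        distance = math.dist(face_params, params)
        if distance < best_distance:
            best_id, best_distance = identifier, distance
    return best_id if best_distance <= max_distance else None


if __name__ == "__main__":
    faces = {"alice": (0.0, 0.0), "bob": (1.0, 1.0)}
    print(recognize((0.1, 0.0), faces))
    print(recognize((0.5, 0.5), faces))
```

Returning `None` for a candidate region that matches no stored face leaves the caller free to skip storing that region.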