Regionlets for Generic Object Detection

Generic object detection must cope with differing degrees of variation across distinct object classes while keeping computation tractable. This demands object representations that are descriptive and flexible, yet efficient to evaluate at many locations. We propose to model an object class by a cascaded boosting classifier that integrates various types of features from competing local regions, called regionlets. A regionlet is a base feature extraction region defined proportionally to a detection window at an arbitrary resolution (i.e., size and aspect ratio).
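
As an illustration of what "defined proportionally to a detection window" can mean in practice, the following is a minimal sketch, not the authors' implementation. It assumes a regionlet is stored as a box in normalized coordinates (fractions of the window's width and height); the names `Box` and `regionlet_to_pixels` are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float


def regionlet_to_pixels(rel: Box, window: Box) -> Box:
    """Map a regionlet given in window-relative coordinates to pixels.

    `rel` holds fractions of the window size (assumed to lie in [0, 1]),
    so the same regionlet definition applies to a detection window of
    any size and aspect ratio.
    """
    return Box(
        x=window.x + rel.x * window.w,
        y=window.y + rel.y * window.h,
        w=rel.w * window.w,
        h=rel.h * window.h,
    )


# Hypothetical example: one relative regionlet placed in two windows
# with different sizes and aspect ratios.
rel = Box(0.25, 0.5, 0.5, 0.25)
tall_window = Box(10, 20, 100, 200)
wide_window = Box(0, 0, 400, 100)
print(regionlet_to_pixels(rel, tall_window))
print(regionlet_to_pixels(rel, wide_window))
```

Because the regionlet is expressed relative to the window, no resizing of the candidate window to a fixed resolution is needed in this sketch before locating the feature extraction region.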

These regionlets are organized in small groups with stable relative positions to delineate fine-grained spatial layouts inside objects. The features within a group are aggregated into a one-dimensional feature so as to tolerate deformations. We then evaluate object bounding box proposals generated from segmentation cues, which limits the number of evaluation locations to thousands. Our approach achieves very competitive performance on popular multi-class detection benchmark datasets with a single method and without any context. It achieves a detection mean average precision of 41.7% on the PASCAL VOC 2007 dataset and 39.7% on VOC 2010, each for 20 object categories. We further develop support pixel integral images to efficiently augment regionlet features with the responses learned by deep convolutional neural networks. Our regionlet-based method achieved 20.9% mean average precision on 200 object categories in the ImageNet Large Scale Visual Object Recognition Challenge (ILSVRC 2013).
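
The aggregation of regionlet features within a group can be sketched as follows. The abstract does not name the aggregation operator, so this sketch assumes max pooling over one scalar response per regionlet; the function name `aggregate_group` and the sample values are hypothetical.

```python
from typing import Sequence


def aggregate_group(responses: Sequence[float]) -> float:
    """Collapse one scalar response per regionlet into a single value.

    Max pooling is an assumption made for this sketch. It keeps the
    strongest response in the group, so the result is unchanged when
    the responding part shifts from one regionlet to a neighbouring one.
    """
    if not responses:
        raise ValueError("a group must contain at least one regionlet")
    return max(responses)


# Hypothetical responses of three regionlets in one group, for two
# windows in which the same object part appears at shifted positions.
part_in_first = [0.9, 0.2, 0.1]
part_in_third = [0.1, 0.2, 0.9]
print(aggregate_group(part_in_first) == aggregate_group(part_in_third))
```

Under this assumption, the group yields the same one-dimensional feature whichever regionlet the part falls into, which is one simple way to obtain the tolerance to deformation described above.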