
Learning Semantic Object Parts for Object Categorization

B. Leibe, A. Ettlin, B. Schiele
Image and Vision Computing
Vol. 26, No. 1, pp. 15-26, January 2008


Appearance-based approaches to object recognition mostly rely on measuring the visual similarity of objects based on global or local descriptors. They have shown great success in object identification but often do not generalize to the more challenging case of object categorization, where category membership is often decided not only on the level of appearance, but also on a semantic level. It has been argued that model-based approaches are better suited to this problem, since they allow high-level knowledge to be injected, for example about the constituent object parts and their possible configurations. Postulating a set of object parts is problematic, though, since it is not guaranteed that those parts can be reliably extracted from real-world images. There is a need for a middle layer that forms an interface between the visual information readily available from the image and the higher-level semantic information that can be used by reasoning processes. In this work, we investigate how such an interface can be learned. As the appearance of object parts may vary considerably, this cannot be achieved by relying on visual similarity alone. Rather, this paper proposes to also use co-location and co-activation, together with weak top-down constraints such as alignment, as guiding principles for learning the appearance of local object parts. The learned structures generalize beyond the appearance of single objects and often correspond to semantically plausible object parts, such as wheels, trunks, or windshields of cars. In a later stage, a Bayesian network of those extracted structures is used to successfully verify object hypotheses in difficult scenes.

  author = {B. Leibe and A. Ettlin and B. Schiele},
  title = {Learning Semantic Object Parts for Object Categorization},
  journal = {Image and Vision Computing},
  year = {2008},
  month = {January},
  pages = {15-26},
  volume = {26},
  number = {1},
  keywords = {}