This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Search for Publication

Year(s) from:  to 
Keywords (separated by spaces):

Weakly supervised learning of interactions between humans and objects

A. Prest, C. Schmid, and V. Ferrari
IEEE Transactions on Pattern Analysis and Machine Intelligence
Vol. 34, No. 3, pp. 601-614, March 2012


We introduce a weakly supervised approach for learning human actions modeled as interactions between humans and objects. Our approach is human-centric: we first localize a human in the image and then determine the object relevant for the action and its spatial relation with the human. The model is learned automatically from a set of still images annotated only with the action label. Our approach relies on a human detector to initialize the model learning. For robustness to various degrees of visibility, we build a detector that learns to combine a set of existing part detectors. Starting from humans detected in a set of images depicting the action, our approach determines the action object and its spatial relation to the human. Its final output is a probabilistic model of the human-object interaction, i.e. the spatial relation between the human and the object. We present an extensive experimental evaluation on the sports action dataset from Gupta et al., the PASCAL Action 2010 dataset, and a new human-object interaction dataset.

Download in pdf format
  author = {A. Prest and C. Schmid and and V. Ferrari},
  title = {Weakly supervised learning of interactions between humans and objects},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year = {2012},
  month = {March},
  pages = {601-614},
  volume = {34},
  number = {3},
  keywords = {Action Recognition, Weakly Supervised Learning, Object Detection}