This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Hierarchical Attention and Context Modeling for Group Activity Recognition

L. Kong, J. Qin, D. Huang, Y. Wang and L. Van Gool
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
April 2018, in press


Group activity recognition in videos is a challenging task with two major issues: attending to the persons and body parts that contribute significantly to the activity, and modeling the contextual structure among persons in the group. Most previous approaches, however, do not provide a practical way to address both issues jointly. In this paper, we propose to deal with both simultaneously via a hierarchical attention and context modeling framework based on Long Short-Term Memory (LSTM) networks. For the former, we propose 'Hierarchical Attention Networks' applied at the part and person levels, capable of attending distinctively to different persons and their body parts. For the latter, we build 'Hierarchical Context Networks' that take the attentively pooled person-level features as input and recurrently model intra-group and inter-group contextual structures. The attentive and contextual representations are concatenated and fed into another LSTM to generate high-level discriminative temporal representations for group activity recognition. Extensive experiments on two widely used group activity datasets demonstrate the effectiveness and superiority of the proposed framework.
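
To illustrate what "attentively pooled" features might look like, the following is a minimal sketch of two-level attention pooling (body parts into a person feature, then persons into a group feature). It is not the authors' implementation: the abstract does not specify the scoring function, so the dot-product scoring, the query vectors `part_query` and `person_query`, and all function names here are assumptions chosen for illustration, and the LSTM components are omitted entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_pool(features, query):
    """Weighted sum of feature vectors.

    Assumption: attention scores are dot products with a hypothetical
    `query` vector; the paper may use a different scoring function.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(dim)]
    return pooled, weights


def hierarchical_pool(group, part_query, person_query):
    """Pool part features per person, then pool the person features.

    `group` is a list of persons; each person is a list of part vectors.
    """
    person_feats = [attentive_pool(parts, part_query)[0] for parts in group]
    return attentive_pool(person_feats, person_query)


# Two persons: the first has two body-part vectors, the second has one.
group = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0]]]
pooled, weights = hierarchical_pool(group, [0.0, 0.0], [0.0, 0.0])
```

With all-zero queries every score is equal, so the pooling reduces to a plain average at both levels; a non-zero query shifts weight toward the parts or persons whose features align with it.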

  author = {L. Kong and J. Qin and D. Huang and Y. Wang and L. Van Gool},
  title = {Hierarchical Attention and Context Modeling for Group Activity Recognition},
  booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2018},
  month = {April},
  keywords = {},
  note = {in press}