Supervisors: Dr. Zhiwu Huang and Dr. Danda Pani Paudel
Categorizing facial expressions into seven basic categories requires capturing subtle variations in local facial features. We believe that second-order statistics such as covariance are better able to capture these variations. In this work, we first explore the benefits of using covariance pooling for extracting image features. By employing covariance pooling after several convolutional layers, we achieve state-of-the-art accuracy of 79% on the RAF-DB validation set and 56.18% on the SFEW 2.0 validation set. Covariance pooling can also be used to capture the temporal evolution of per-frame features. We therefore next explore the use of covariance pooling on per-frame features generated by convolutional neural networks, as well as an end-to-end learning framework for videos based on covariance pooling. We present experimental results on the AFEW dataset and discuss possible improvements to our approach.
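To make the idea of covariance pooling concrete, the following is a minimal NumPy sketch rather than the implementation used in this work. It assumes the pooled descriptor is the plain sample covariance of a set of feature vectors; the function name `covariance_pool`, the array shapes, and the random inputs are all hypothetical and chosen only for illustration. The same routine is applied once to the spatial locations of a convolutional feature map and once to a sequence of per-frame features.

```python
import numpy as np


def covariance_pool(features):
    """Sample covariance of n feature vectors of dimension d.

    features: array of shape (n, d), with n >= 2.
    Returns a symmetric (d, d) matrix of second-order statistics.
    """
    features = np.asarray(features, dtype=np.float64)
    n = features.shape[0]
    # Subtract the mean feature vector so the result is a covariance,
    # not a raw second-moment matrix.
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / (n - 1)


rng = np.random.default_rng(0)

# Spatial case (assumed shapes): an H x W x C feature map is treated as
# H*W local feature vectors, each of dimension C.
feature_map = rng.standard_normal((6, 6, 8))
spatial_cov = covariance_pool(feature_map.reshape(-1, 8))

# Temporal case (assumed shapes): T per-frame feature vectors of dimension D.
frame_features = rng.standard_normal((16, 8))
temporal_cov = covariance_pool(frame_features)
```

In both cases the output size depends only on the feature dimension, not on the number of spatial locations or frames, which is what allows a variable-length video to be summarized by a fixed-size matrix. Any further processing of the covariance matrix before classification is outside the scope of this sketch.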