Deep learning

OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context

Temporal action localization (TAL) is an important task extensively explored and improved for third-person videos in recent years. …

Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem

End-to-End Active Speaker Detection

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process – feature extraction and …

Juan Leon Alcazar, Moritz Cordes, Chen Zhao, Bernard Ghanem

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

The key challenge in neural architecture search (NAS) is designing how to explore wisely in the huge search space. We propose a new NAS …

Guocheng Qian, Xuanyang Zhang, Guohao Li, Chen Zhao, Yukang Chen, Xiangyu Zhang, Bernard Ghanem, Jian Sun

SegTAD: Precise Temporal Action Detection via Semantic Segmentation

Temporal action detection (TAD) is an important yet challenging task in video analysis. Most existing works draw inspiration from image …

Chen Zhao, Merey Ramazanova, Mengmeng Xu, Bernard Ghanem

ThumbNet: One Thumbnail Image Contains All You Need for Recognition

Tackles network compression and acceleration from a novel perspective: enabling inference on thumbnail images without compromising accuracy. Proposes supervised image downscaling, distillation-boosted supervision, and feature-mapping regularization.

Chen Zhao, Bernard Ghanem
