Just a Glimpse: Rethinking Temporal Information for Video Continual Learning

Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles …

Lama Alssum, Juan León Alcázar, Merey Ramazanova, Chen Zhao, Bernard Ghanem

Owl (observe, watch, listen): Localizing actions in egocentric video via audiovisual temporal context

Temporal action localization (TAL) is an important task extensively explored and improved for third-person videos in recent years. …

Merey Ramazanova, Victor Escorcia, Fabian Caba Heilbron, Chen Zhao, Bernard Ghanem

R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning

Class-Incremental Learning (CIL) struggles with catastrophic forgetting when learning new knowledge, and Data-Free CIL (DFCIL) is even …

Qiankun Gao, Chen Zhao, Bernard Ghanem, Jian Zhang

End-to-End Active Speaker Detection

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process – feature extraction and …

Juan Leon Alcazar, Moritz Cordes, Chen Zhao, Bernard Ghanem

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

The key challenge in neural architecture search (NAS) is designing how to explore wisely in the huge search space. We propose a new NAS …

Guocheng Qian, Xuanyang Zhang, Guohao Li, Chen Zhao, Yukang Chen, Xiangyu Zhang, Bernard Ghanem, Jian Sun

MAD: A scalable dataset for language grounding in videos from movie audio descriptions

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable …

Mattia Soldan, Alejandro Pardo, Juan Leon Alcazar, Fabian Caba Heilbron, Chen Zhao, Silvio Giancola, Bernard Ghanem

Ego4D: Around the World in 3,000 Hours of Egocentric Video

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video …

Chen Zhao, with 84 other authors

SegTAD: Precise Temporal Action Detection via Semantic Segmentation

Temporal action detection (TAD) is an important yet challenging task in video analysis. Most existing works draw inspiration from image …

Chen Zhao, Merey Ramazanova, Mengmeng Xu, Bernard Ghanem

Video Self-Stitching Graph Network for Temporal Action Localization

Short actions are critical and challenging in the task of action localization. We target this problem and propose a video self-stitching graph network (VSGN), which enhances short actions through video self-stitching (VSS) and a cross-scale graph pyramid network (xGPN).

Chen Zhao, Ali Thabet, Bernard Ghanem

ThumbNet: One Thumbnail Image Contains All You Need for Recognition

We tackle the problem of network compression and acceleration from a novel perspective: enabling inference on thumbnail images without compromising accuracy. We propose supervised image downscaling, distillation-boosted supervision, and feature-mapping regularization.

Chen Zhao, Bernard Ghanem
