End-to-End Active Speaker Detection
Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we …
The key challenge in neural architecture search (NAS) is designing how to explore the huge search space wisely. We propose a new NAS method called TNAS (NAS with trees), which …
The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In …
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household, …
Temporal action detection (TAD) is an important yet challenging task in video analysis. Most existing works draw inspiration from image object detection and tend to reformulate it …
Short actions are critical and challenging in the task of action localization. We target this problem and propose a video self-stitching graph network (VSGN), which enhances …
We tackle the problem of network compression and acceleration from a novel perspective: enabling inference on thumbnail images without compromising accuracy. We propose supervised image …
Temporal action detection is a fundamental yet challenging task in video understanding. Video context is a critical cue to effectively detect actions, but current works mainly …