Deep Learning

ETAD: Training Action Detection End to End on a Laptop

Untrimmed video understanding tasks such as temporal action detection (TAD) often demand huge computing resources. Because of long video durations and …

Shuming Liu

OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context

Temporal action localization (TAL) is an important task extensively explored and improved for third-person videos in recent years. Recent efforts have been made to perform …

Merey Ramazanova

Just a Glimpse: Rethinking Temporal Information for Video Continual Learning

Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained …

Lama Alssum

End-to-End Active Speaker Detection

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process -- feature extraction and spatio-temporal context aggregation. In this paper, we …

Juan Leon Alcazar

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

The key challenge in neural architecture search (NAS) is designing how to explore wisely in the huge search space. We propose a new NAS method called TNAS (NAS with trees), which …

Guocheng Qian

SegTAD: Precise Temporal Action Detection via Semantic Segmentation

Temporal action detection (TAD) is an important yet challenging task in video analysis. Most existing works draw inspiration from image object detection and tend to reformulate it …

Chen Zhao

ThumbNet: One Thumbnail Image Contains All You Need for Recognition

Tackle the problem of network compression and acceleration from a novel perspective: enabling inference on thumbnail images without compromising accuracy. Propose supervised image …

Chen Zhao