Bernard Ghanem

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory …

Jinjie Mai

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

The 'pre-training → downstream adaptation' paradigm presents both new opportunities and challenges for Continual Learning (CL). Although the recent state-of-the-art in CL is achieved …

Qiankun Gao

Large-capacity and Flexible Video Steganography via Invertible Neural Network

Video steganography is the art of unobtrusively concealing secret data in a cover video and then recovering the secret data through a decoding protocol at the receiver end. …

Chong Mou

ETAD: Training Action Detection End to End on a Laptop

Untrimmed video understanding, such as temporal action detection (TAD), often demands huge computing resources. Because of long video durations and …

Shuming Liu

Owl (observe, watch, listen): Localizing actions in egocentric video via audiovisual temporal context

Temporal action localization (TAL) is an important task extensively explored and improved for third-person videos in recent years. Recent efforts have been made to perform …

Merey Ramazanova

Just a Glimpse: Rethinking Temporal Information for Video Continual Learning

Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained …

Lama Alssum

R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning

Class-Incremental Learning (CIL) struggles with catastrophic forgetting when learning new knowledge, and Data-Free CIL (DFCIL) is even more challenging without access to the …

Qiankun Gao

End-to-End Active Speaker Detection

Recent advances in the Active Speaker Detection (ASD) problem build upon a two-stage process: feature extraction and spatio-temporal context aggregation. In this paper, we …

Juan Leon Alcazar

When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search

The key challenge in neural architecture search (NAS) is deciding how to explore the huge search space wisely. We propose a new NAS method called TNAS (NAS with trees), which …

Guocheng Qian

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In …

Mattia Soldan