
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric …

Chen Zhao

• Dec 2, 2023 • 1 min read

Deep Learning

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the memory bottleneck, only models with limited …

Shuming Liu

• Nov 29, 2023 • 1 min read

Deep Learning

Re²TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Temporal action localization (TAL) requires long-form reasoning to predict actions of various durations and complex content. Given limited GPU memory, training TAL end to end …

Chen Zhao

• Jul 25, 2023 • 1 min read

Deep Learning

FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model

Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods are …

Jiwen Yu

• Jul 15, 2023 • 1 min read

Deep Learning

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory …

Jinjie Mai

• Jul 15, 2023 • 1 min read

Deep Learning

A Unified Continual Learning Framework with General Parameter-Efficient Tuning

The 'pre-training → downstream adaptation' presents both new opportunities and challenges for Continual Learning (CL). Although the recent state-of-the-art in CL is achieved …

Qiankun Gao

• Jul 14, 2023 • 1 min read

Deep Learning

Large-capacity and Flexible Video Steganography via Invertible Neural Network

Video steganography is the art of unobtrusively concealing secret data in a cover video and then recovering the secret data through a decoding protocol at the receiver end. …

Chong Mou

• Jun 10, 2023 • 1 min read

Deep Learning

ETAD: Training Action Detection End to End on a Laptop

Untrimmed video understanding such as temporal action detection (TAD) often suffers from the pain of huge demand for computing resources. Because of long video durations and …

Shuming Liu

• Jun 2, 2023 • 1 min read

Deep Learning

OWL (Observe, Watch, Listen): Localizing Actions in Egocentric Video via Audiovisual Temporal Context

Temporal action localization (TAL) is an important task extensively explored and improved for third-person videos in recent years. Recent efforts have been made to perform …

Merey Ramazanova

• Jun 1, 2023 • 1 min read

Deep Learning

Just a Glimpse: Rethinking Temporal Information for Video Continual Learning

Class-incremental learning is one of the most important settings for the study of Continual Learning, as it closely resembles real-world application scenarios. With constrained …

Lama Alssum

• Jun 1, 2023 • 1 min read

