Silvio Giancola

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos. Although this field …

Shuming Liu

• Mar 2, 2025 • 1 min read

Deep Learning

EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries

With the recent advances in video and 3D understanding, novel 4D spatio-temporal methods fusing both concepts have emerged. Towards this direction, the Ego4D Episodic Memory …

Jinjie Mai

• Jul 15, 2023 • 1 min read

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In …

Mattia Soldan

• Mar 2, 2022 • 1 min read
