Juan leon alcazar

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection featured image

OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection

Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos. Although this field …

Shuming liu
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions featured image

MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions

The recent and increasing interest in video-language research has driven the development of large-scale datasets that enable data-intensive machine learning techniques. In …

Mattia soldan