Invertible Diffusion Models for Compressed Sensing
While deep neural networks (NNs) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch …

While deep neural networks (NNs) significantly advance image compressed sensing (CS) by improving reconstruction quality, the necessity of training current CS NNs from scratch …
Large video-language models (VLMs) have demonstrated promising progress in various video understanding tasks. However, their effectiveness in long-form video analysis is …
Exposure correction is a fundamental problem in computer vision and image processing. Recently, frequency domainbased methods have achieved impressive improvement, yet they still …
Masked video modeling, such as VideoMAE, is an effective paradigm for video self-supervised learning (SSL). However, they are primarily based on reconstructing pixellevel details …
Continued advances in self-supervised learning have led to significant progress in video representation learning, offering a scalable alternative to supervised approaches by …
Temporal action detection (TAD) is a fundamental video understanding task that aims to identify human actions and localize their temporal boundaries in videos. Although this field …
CLIP is a powerful spatial feature extractor trained on a large dataset of image-text pairs. It exhibits strong generalization when extended to other domains and modalities. …
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (house-hold, …
Publish your data science and research directly from Jupyter Notebooks. No screenshots required.
Movie trailers are an essential tool for promoting films and attracting audiences. However the process of creating trailers can be time-consuming and expensive. To streamline this …