Publications

Towards Automated Movie Trailer Generation
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Re2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model
International Conference on Computer Vision (ICCV), 2023.
EgoLoc: Revisiting 3D Object Localization from Egocentric Videos with Visual Queries
International Conference on Computer Vision (ICCV), 2023. [First place in the Ego4D VQ3D Challenge 2023, Oral].
A Unified Continual Learning Framework with General Parameter-Efficient Tuning
International Conference on Computer Vision (ICCV), 2023.
Large-capacity and Flexible Video Steganography via Invertible Neural Network
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
ETAD: Training Action Detection End to End on a Laptop
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2023. [Oral].
Owl (observe, watch, listen): Localizing actions in egocentric video via audiovisual temporal context
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2023.
Just a Glimpse: Rethinking Temporal Information for Video Continual Learning
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2023. [Best Paper Award, Oral].
R-DFCIL: Relation-Guided Representation Learning for Data-Free Class Incremental Learning
European Conference on Computer Vision (ECCV), 2022.
End-to-End Active Speaker Detection
European Conference on Computer Vision (ECCV), 2022.
Evaluation of Diverse Convolutional Neural Networks and Training Strategies for Wheat Leaf Disease Identification with Field-Acquired Photographs
Remote Sensing, 2022.
When NAS Meets Trees: An Efficient Algorithm for Neural Architecture Search
IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), 2022.
MAD: A scalable dataset for language grounding in videos from movie audio descriptions
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
Ego4D: Around the World in 3,000 Hours of Egocentric Video
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022. [Best Paper Nominee, Oral].
SegTAD: Precise Temporal Action Detection via Semantic Segmentation
European Conference on Computer Vision Workshop (ECCVW), 2022.
Video Self‑Stitching Graph Network for Temporal Action Localization
IEEE International Conference on Computer Vision (ICCV), 2021.
ThumbNet: One Thumbnail Image Contains All You Need for Recognition
ACM International Conference on Multimedia (ACM MM), 2020.
Improve Baseline for Temporal Action Detection: HACS Challenge 2020 Solution of IVUL‑KAUST team
CVPR Workshop of HACS Temporal Action Localization Challenge, 2020.
G‑TAD: Sub‑Graph Localization for Temporal Action Detection
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
Optimization‑Inspired Compact Deep Compressive Sensing
IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2020.
Logistic Regression is Still Alive and Effective: The 3rd YouTube 8M Challenge Solution of the IVUL‑KAUST team
ICCV Workshop of the 3rd YouTube-8M Large-Scale Video Understanding, 2019.
CREAM: CNN-REgularized ADMM framework for compressive-sensed image reconstruction
IEEE Access, 2018.
BoostNet: A Structured Deep Recursive Network to Boost Image Deblocking
IEEE International Conference on Visual Communications and Image Processing (VCIP), 2018. [Oral].
Better and Faster, when ADMM Meets CNN: Compressive-sensed Image Reconstruction
The Pacific-Rim Conference on Multimedia (PCM), 2017. [Oral].
Reducing Image Compression Artifacts by Structural Sparse Representation and Quantization Constraint Prior
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2017.
Video Compressive Sensing Reconstruction via Reweighted Residual Sparsity
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2017.
CONCOLOR: COnstrained Non-Convex Low-Rank Model for Image Deblocking
IEEE Transactions on Image Processing (TIP), 2016.
Nonconvex Lp Nuclear Norm based ADMM Framework for Compressive Sensing
Data Compression Conference (DCC), 2016. [Oral (acceptance rate < 10%)].
Compressive-Sensed Image Coding via Stripe-based DPCM
Data Compression Conference (DCC), 2016. [Oral (acceptance rate < 10%)].
A Dual Structured-Sparsity Model for Compressive-Sensed Video Reconstruction
IEEE International Conference on Visual Communications and Image Processing (VCIP), 2015. [Oral].
An Efficient Image Coding Method Based on Cloud Data (基于云数据的高效图像编码方法)
Chinese Journal of Computers (计算机学报), 2016. [Recommended by NCMT, Best Paper Award at NCMT].
Thousand to one: An image compression system via cloud search
IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), 2015.
Adaptive intra-refresh for low-delay error-resilient video coding
Journal of Visual Communication and Image Representation (JVCIR), 2015.
Video Compressive Sensing via Structured Laplacian Modelling
IEEE International Conference on Visual Communications and Image Processing (VCIP), 2014. [Oral].
Adaptive intra-refresh for low-delay error-resilient video coding
Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014.
Image Compressive-Sensing Recovery Using Structured Laplacian Sparsity in DCT Domain and Multi-Hypothesis Prediction
IEEE International Conference on Multimedia & Expo (ICME), 2014.
Image Compressive Sensing Recovery Using Adaptively Learned Sparsifying Basis via L0 Minimization
Signal Processing (SP), 2014.
Weakly Supervised Photo Cropping
IEEE Transactions on Multimedia (TMM), 2014.
Wavelet Inpainting Driven Image Compression via Collaborative Sparsity at Low Bit Rates
IEEE International Conference on Image Processing (ICIP), 2013. [Oral].
A Highly Effective Error Concealment Method for Whole Frame Loss
IEEE International Symposium on Circuits and Systems (ISCAS), 2013. [Oral].
Image Super-Resolution via Dual-Dictionary Learning and Sparse Representation
IEEE International Symposium on Circuits and Systems (ISCAS), 2012.
Exploiting Image Local and Nonlocal Consistency for Mixed Gaussian-Impulse Noise Removal
IEEE International Conference on Multimedia & Expo (ICME), 2012.
Compressed Sensing Recovery via Collaborative Sparsity
IEEE Data Compression Conference (DCC), 2012. [Oral (acceptance rate < 10%)].
Image Compressive Sensing Recovery via Collaborative Sparsity
IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS), 2012.