Letian Jiang

SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning

Masked video modeling, as in VideoMAE, is an effective paradigm for video self-supervised learning (SSL). However, such methods are primarily based on reconstructing pixel-level details …

Fida Mohammad Thoker

SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning

Continued advances in self-supervised learning have led to significant progress in video representation learning, offering a scalable alternative to supervised approaches by …

Fida Mohammad Thoker