Max Pooling


Effectiveness of Max-Pooling for Fine-Tuning CLIP on Videos

CLIP is a powerful spatial feature extractor trained on a large dataset of image-text pairs. It exhibits strong generalization when extended to other domains and modalities. …

Fatimah zohra