Making the Most of What You Have: Adapting Pre-trained Visual Language Models in the Low-data Regime

Zhang, Chuhan; Miech, Antoine; Shen, Jiajun; Alayrac, Jean-Baptiste; Luc, Pauline

Computer Science > Computer Vision and Pattern Recognition

arXiv:2305.02297 (cs)

[Submitted on 3 May 2023]

Abstract: Large-scale visual language models are widely used as pre-trained models that are then adapted to various downstream tasks. While humans are known to learn new tasks efficiently from a few examples, deep learning models struggle to adapt from few examples. In this work, we look into task adaptation in the low-data regime and provide a thorough study of existing adaptation methods for generative Visual Language Models. We also show important benefits of self-labelling, i.e. using the model's own predictions to self-improve when a larger number of unlabelled images from the same distribution is available. Our study demonstrates significant gains using our proposed task adaptation pipeline across a wide range of visual language tasks such as visual classification (ImageNet), visual captioning (COCO), detailed visual captioning (Localised Narratives) and visual question answering (VQAv2).
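The self-labelling idea in the abstract uses the model's own predictions on unlabelled images from the same distribution as extra training targets. It can be illustrated with a generic pseudo-labelling loop. The sketch below is a minimal toy, not the authors' pipeline, which adapts generative visual language models. It assumes a 1-D nearest-centroid classifier as a stand-in model and a margin-based confidence filter. The names `fit_centroids`, `predict`, `self_label` and `min_margin` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in "model": a nearest-centroid classifier on 1-D features.
def fit_centroids(data: Sequence[Tuple[float, str]]) -> Dict[str, float]:
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for x, y in data:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids: Dict[str, float], x: float) -> Tuple[str, float]:
    # Assumed confidence proxy: gap between the closest and second-closest centroid.
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    best_dist, best_label = dists[0]
    margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
    return best_label, margin

def self_label(
    labelled: Sequence[Tuple[float, str]],
    unlabelled: Sequence[float],
    min_margin: float,
    rounds: int = 1,
) -> Tuple[Dict[str, float], List[Tuple[float, str]]]:
    # 1) Fit on the few labelled examples.
    model = fit_centroids(labelled)
    pseudo: List[Tuple[float, str]] = []
    for _ in range(rounds):
        # 2) Label the unlabelled pool with the current model, keeping only confident predictions.
        pseudo = []
        for x in unlabelled:
            label, margin = predict(model, x)
            if margin >= min_margin:
                pseudo.append((x, label))
        # 3) Refit on the real labels plus the model's own (pseudo) labels.
        model = fit_centroids(list(labelled) + pseudo)
    return model, pseudo

model, pseudo = self_label(
    labelled=[(0.0, "cat"), (10.0, "dog")],
    unlabelled=[1.0, 2.0, 4.9, 8.0, 9.5],
    min_margin=2.0,
)
# 4.9 is ambiguous (margin of about 0.2), so it is left out of the pseudo-labelled set.
print(model, pseudo)
```

The confidence filter is the main design choice here: admitting every prediction would let the model's own mistakes feed back into training, so only examples whose prediction clears `min_margin` are added.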

Comments:	Tech Report
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2305.02297 [cs.CV]
	(or arXiv:2305.02297v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.02297

Submission history

From: Chuhan Zhang
[v1] Wed, 3 May 2023 17:42:54 UTC (4,660 KB)