ABSTRACT
Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks
theoretical understanding. An emerging solution with recent success in vision and
NLP involves finetuning a foundation model on a selection of relevant tasks, before
its adaptation to a target task with limited labeled samples. In this paper, we study
the theoretical justification of this multitask finetuning approach. Our theoretical
analysis reveals that with a diverse set of related tasks, this multitask finetuning
leads to reduced error in the target task, in comparison to directly adapting the same
pretrained model. We quantify the relationship between finetuning tasks and target
tasks by diversity and consistency metrics, and further propose a practical task
selection algorithm. We substantiate our theoretical claims with extensive empirical
evidence. Further, we present results affirming that our task selection algorithm adeptly chooses related finetuning tasks, improving model performance on target tasks. We believe our study sheds new light on the effective adaptation of
foundation models to new tasks that lack abundant labels. Our code is available at
https://github.com/OliverXUZY/Foudation-Model_Multitask.
1 INTRODUCTION
The advent of large-scale deep models trained on massive amounts of data has ushered in a new
era of foundation models (Bommasani et al., 2021). These models, exemplified by large language
models (e.g., BERT (Devlin et al., 2019) and GPT-3 (Brown et al., 2020)) and vision models (e.g.,
CLIP (Radford et al., 2021) and DINOv2 (Oquab et al., 2023)), offer the promise of adapting to a
wide range of downstream tasks, and have led to some of the most exciting developments in AI to
date, including the latest conversational AI systems, ChatGPT (OpenAI, 2022) and GPT-4 (OpenAI, 2023).
Despite encouraging empirical results (Zhang et al., 2020; Brown et al., 2020; Gao et al., 2021a),
the effective adaptation of foundation models, especially to new tasks with limited labels, remains a
practical challenge and lacks theoretical understanding.
In this paper, we focus on the problem of adapting a pretrained foundation model to a new task with a
few labeled samples, where the target task can differ significantly from pretraining and the limited
labeled data are insufficient for finetuning. This few-shot learning problem has been a long-standing
challenge in machine learning (Wang et al., 2020). Prior approaches include learning from examples
in the context prompt (in-context learning) (Brown et al., 2020), constructing simple classifiers
based on the pretrained representation (Zhang et al., 2020), or finetuning the model using text
prompts converted from labeled data (Gao et al., 2021a). An emerging solution involves finetuning a
pretrained model on multiple auxiliary tasks pertaining to the target task. This multitask finetuning
approach, related to meta learning (Hospedales et al., 2021), has been recently explored in NLP and
vision (Murty et al., 2021; Vu et al., 2021; Zhong et al., 2021; Hu et al., 2022b; Chen et al., 2022;
Min et al., 2022a). For example, recent studies (Sanh et al., 2022; Muennighoff et al., 2023) show that
finetuning language models on a large set of tasks enables strong zero-shot generalization on unseen
tasks. Nonetheless, the lack of sound theoretical explanations behind these approaches raises doubts about their ability to generalize to real-world tasks (Perez et al., 2021).
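To make the setup concrete, the following is a minimal sketch (not the exact protocol studied in this paper) of the two-stage recipe described above: multitask finetuning of a shared encoder on sampled auxiliary tasks, followed by few-shot adaptation of the frozen representation to a target task via a linear classifier. The Encoder, make_task, and all hyperparameters are illustrative placeholders on synthetic data.

```python
# Minimal sketch of multitask finetuning followed by few-shot adaptation.
# All names and data here are illustrative placeholders, not the authors' method.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class Encoder(nn.Module):
    """Stand-in for a pretrained foundation model's representation."""
    def __init__(self, in_dim=32, rep_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim))

    def forward(self, x):
        return self.net(x)

def make_task(n_way=5, n_support=5, n_query=20, in_dim=32):
    """Synthesize a toy n-way classification task with support/query splits."""
    centers = torch.randn(n_way, in_dim)

    def sample(n_per_class):
        x = centers.repeat_interleave(n_per_class, dim=0)
        x = x + 0.5 * torch.randn_like(x)
        y = torch.arange(n_way).repeat_interleave(n_per_class)
        return x, y

    return sample(n_support), sample(n_query)

encoder = Encoder()

# Stage 1: multitask finetuning on auxiliary tasks. Each sampled task gets a
# fresh linear head; the shared encoder and the head are updated jointly.
for _ in range(100):
    (x_s, y_s), _ = make_task()
    head = nn.Linear(16, 5)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(5):
        loss = F.cross_entropy(head(encoder(x_s)), y_s)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stage 2: few-shot adaptation. Freeze the finetuned encoder and fit a linear
# classifier on a small labeled support set from the target task.
(x_s, y_s), (x_q, y_q) = make_task(n_support=5)
with torch.no_grad():
    z_s, z_q = encoder(x_s), encoder(x_q)
clf = nn.Linear(16, 5)
clf_opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
for _ in range(200):
    loss = F.cross_entropy(clf(z_s), y_s)
    clf_opt.zero_grad()
    loss.backward()
    clf_opt.step()

acc = (clf(z_q).argmax(dim=1) == y_q).float().mean().item()
print(f"few-shot target accuracy: {acc:.2f}")
```

In this sketch, stage 1 plays the role of finetuning on auxiliary tasks related to the target, and stage 2 mirrors the limited-label adaptation setting; the paper's analysis concerns when and why stage 1 reduces the target error of stage 2.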