ABSTRACT
Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks
theoretical understanding. An emerging solution with recent success in vision and
NLP involves finetuning a foundation model on a selection of relevant tasks, before
its adaptation to a target task with limited labeled samples. In this paper, we study
the theoretical justification of this multitask finetuning approach. Our theoretical
analysis reveals that with a diverse set of related tasks, this multitask finetuning
leads to reduced error in the target task, in comparison to directly adapting the same
pretrained model. We quantify the relationship between finetuning tasks and target
tasks by diversity and consistency metrics, and further propose a practical task
selection algorithm. We substantiate our theoretical claims with extensive empirical
evidence. Further, we present results affirming that our task selection algorithm adeptly chooses related finetuning tasks, improving model performance on target tasks. We believe our study sheds new light on the effective adaptation of
foundation models to new tasks that lack abundant labels. Our code is available at
https://github.com/OliverXUZY/Foudation-Model_Multitask.
1 INTRODUCTION
The advent of large-scale deep models trained on massive amounts of data has ushered in a new
era of foundation models (Bommasani et al., 2021). These models, exemplified by large language
models (e.g., BERT (Devlin et al., 2019) and GPT-3 (Brown et al., 2020)) and vision models (e.g.,
CLIP (Radford et al., 2021) and DINOv2 (Oquab et al., 2023)), offer the promise of adapting to a
wide range of downstream tasks, and have led to some of the most exciting developments in AI to
date, including the latest conversational AI systems, ChatGPT (OpenAI, 2022) and GPT-4 (OpenAI, 2023).
Despite encouraging empirical results (Zhang et al., 2020; Brown et al., 2020; Gao et al., 2021a),
the effective adaptation of foundation models, especially to new tasks with limited labels, remains a
practical challenge and lacks theoretical understanding.
In this paper, we focus on the problem of adapting a pretrained foundation model to a new task with a
few labeled samples, where the target task can differ significantly from pretraining and the limited
labeled data are insufficient for finetuning. This few-shot learning problem has been a long-standing
challenge in machine learning (Wang et al., 2020). Prior approaches include learning from examples
in the context prompt (in-context learning) (Brown et al., 2020), constructing simple classifiers
based on the pretrained representation (Zhang et al., 2020), or finetuning the model using text
prompts converted from labeled data (Gao et al., 2021a). An emerging solution involves finetuning a
pretrained model on multiple auxiliary tasks pertaining to the target task. This multitask finetuning
approach, related to meta learning (Hospedales et al., 2021), has been recently explored in NLP and
vision (Murty et al., 2021; Vu et al., 2021; Zhong et al., 2021; Hu et al., 2022b; Chen et al., 2022;
Min et al., 2022a). For example, recent studies (Sanh et al., 2022; Muennighoff et al., 2023) show that
finetuning language models on a large set of tasks enables strong zero-shot generalization on unseen
tasks. Nonetheless, the lack of sound theoretical explanations behind these approaches raises doubts about their ability to generalize to real-world tasks (Perez et al., 2021).
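To make the setup concrete, the following is a minimal sketch (not the exact protocol studied in this paper) of the two-stage recipe described above: multitask finetuning of a shared encoder on sampled auxiliary tasks, followed by few-shot adaptation of the frozen representation to a target task via a linear classifier. The Encoder, make_task, and all hyperparameters are illustrative placeholders on synthetic data.

```python
# Minimal sketch of multitask finetuning followed by few-shot adaptation.
# All names and data here are illustrative placeholders, not the authors' method.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class Encoder(nn.Module):
    """Stand-in for a pretrained foundation model's representation."""
    def __init__(self, in_dim=32, rep_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim))

    def forward(self, x):
        return self.net(x)

def make_task(n_way=5, n_support=5, n_query=20, in_dim=32):
    """Synthesize a toy n-way classification task with support/query splits."""
    centers = torch.randn(n_way, in_dim)

    def sample(n_per_class):
        x = centers.repeat_interleave(n_per_class, dim=0)
        x = x + 0.5 * torch.randn_like(x)
        y = torch.arange(n_way).repeat_interleave(n_per_class)
        return x, y

    return sample(n_support), sample(n_query)

encoder = Encoder()

# Stage 1: multitask finetuning on auxiliary tasks. Each sampled task gets a
# fresh linear head; the shared encoder and the head are updated jointly.
for _ in range(100):
    (x_s, y_s), _ = make_task()
    head = nn.Linear(16, 5)
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(head.parameters()), lr=1e-3)
    for _ in range(5):
        loss = F.cross_entropy(head(encoder(x_s)), y_s)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Stage 2: few-shot adaptation. Freeze the finetuned encoder and fit a linear
# classifier on a small labeled support set from the target task.
(x_s, y_s), (x_q, y_q) = make_task(n_support=5)
with torch.no_grad():
    z_s, z_q = encoder(x_s), encoder(x_q)
clf = nn.Linear(16, 5)
clf_opt = torch.optim.Adam(clf.parameters(), lr=1e-2)
for _ in range(200):
    loss = F.cross_entropy(clf(z_s), y_s)
    clf_opt.zero_grad()
    loss.backward()
    clf_opt.step()

acc = (clf(z_q).argmax(dim=1) == y_q).float().mean().item()
print(f"few-shot target accuracy: {acc:.2f}")
```

In this sketch, stage 1 plays the role of finetuning on auxiliary tasks related to the target, and stage 2 mirrors the limited-label adaptation setting; the paper's analysis concerns when and why stage 1 reduces the target error of stage 2.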