MLSys Class LLM Introduction


Introduction to

Language Models
Eve Fleisig & Kayo Yin
CS 294-162
August 28, 2023
Language Modeling

Image credit: jalammar.github.io/illustrated-word2vec/
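
A language model assigns probabilities to text by predicting each token from the tokens that precede it. A minimal sketch of the idea with a count-based bigram model, assuming a toy whitespace-tokenized corpus (the corpus and smoothing constant are illustrative, not from the slides):

from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams and adjacent word pairs.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = set(corpus)

def p_next(word, prev, alpha=1.0):
    # P(word | prev) with add-alpha smoothing so unseen pairs get nonzero mass.
    return (bigrams[(prev, word)] + alpha) / (unigrams[prev] + alpha * len(vocab))

# Probability of a sentence = product of the conditional probabilities.
sentence = "the cat sat".split()
prob = 1.0
for prev, word in zip(sentence, sentence[1:]):
    prob *= p_next(word, prev)
print(prob)

Neural language models replace the count table with a learned network, but the chain-rule factorization is the same.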


Masked Language Modeling
BERT

Image credit: jalammar.github.io/illustrated-bert/
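
BERT's pretraining objective masks a fraction of input tokens (15% in the original paper) and predicts each masked token from context on both sides. A minimal PyTorch sketch of the masking and loss, assuming a model that maps token ids to per-position vocabulary logits (the model itself and the mask-token id are placeholders):

import torch
import torch.nn.functional as F

MASK_ID = 103      # [MASK] id in the standard BERT vocabulary
MASK_PROB = 0.15

def masked_lm_loss(model, input_ids):
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape) < MASK_PROB   # choose ~15% of positions
    labels[~mask] = -100               # unmasked positions are ignored by the loss
    corrupted = input_ids.clone()
    corrupted[mask] = MASK_ID          # (BERT also sometimes keeps or randomizes tokens)
    logits = model(corrupted)          # (batch, seq, vocab)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)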


Causal Language Modeling
GPT

Image credit: jalammar.github.io/illustrated-gpt2/
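
GPT's objective is next-token prediction: each position may attend only to tokens on its left, and the target sequence is the input shifted by one. A sketch of the loss, assuming a model that applies the causal mask internally:

import torch.nn.functional as F

def causal_lm_loss(model, input_ids):
    # Predict token t+1 from tokens <= t.
    logits = model(input_ids[:, :-1])    # (batch, seq-1, vocab)
    targets = input_ids[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))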


BERT vs. GPT

● Bidirectional encoder models (BERT) do better than generative models at non-generation tasks, for comparable training data and model complexity.

● Generative models (GPT) have training-efficiency and scalability advantages that may ultimately make them more accurate. They can also solve downstream tasks in a zero-shot setting.
Transformer

Image credit: jalammar.github.io/illustrated-transformer/




Attention
Self-Attention

Image credit: jalammar.github.io/illustrated-gpt2/
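
In self-attention, every token projects a query, key, and value vector; attention weights are softmax(QKᵀ/√d), and each output is the attention-weighted sum of the values. A single-head sketch in PyTorch, with an optional causal mask for GPT-style decoding (projection matrices are passed in for brevity):

import math
import torch

def self_attention(x, w_q, w_k, w_v, causal=False):
    # x: (seq, d_model); w_*: (d_model, d_head) projection matrices
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / math.sqrt(q.size(-1))          # (seq, seq)
    if causal:
        # Block attention to future positions.
        future = torch.triu(torch.ones(scores.shape, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))
    weights = scores.softmax(dim=-1)                  # each row sums to 1
    return weights @ v                                # (seq, d_head)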




Multi-headed Attention
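
Multi-headed attention runs several attention operations in parallel over lower-dimensional projections and concatenates the results, letting different heads track different relationships. A sketch, assuming d_model divides evenly by n_heads:

import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model, n_heads):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)    # fused Q, K, V projections
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                             # x: (batch, seq, d_model)
        b, s, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Split into heads: (batch, n_heads, seq, d_head).
        q, k, v = (t.reshape(b, s, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        out = scores.softmax(dim=-1) @ v              # (batch, n_heads, seq, d_head)
        return self.out(out.transpose(1, 2).reshape(b, s, d))
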
Transformer

Image credit: jalammar.github.io/illustrated-transformer/


Transformer Input
Transformer Encoder

Image credit: jalammar.github.io/illustrated-transformer/
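
The encoder input is token embeddings plus positional information; each encoder layer then applies self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection and layer normalization. A post-norm sketch (as in the original Transformer), reusing the MultiHeadAttention sketch above:

import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.attn = MultiHeadAttention(d_model, n_heads)   # from the sketch above
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x))    # residual + layer norm around attention
        return self.norm2(x + self.ff(x))   # residual + layer norm around feed-forward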


Adding the Decoder

Image credit: jalammar.github.io/illustrated-transformer/
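
The decoder adds two pieces per layer: masked (causal) self-attention over the output generated so far, and cross-attention whose queries come from the decoder while keys and values come from the encoder output. A sketch using PyTorch's built-in nn.MultiheadAttention for brevity:

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                                nn.Linear(d_ff, d_model))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, x, enc_out):
        s = x.size(1)
        causal = torch.triu(torch.ones(s, s, dtype=torch.bool, device=x.device), 1)
        a, _ = self.self_attn(x, x, x, attn_mask=causal)   # see only earlier outputs
        x = self.norms[0](x + a)
        a, _ = self.cross_attn(x, enc_out, enc_out)        # attend to encoder states
        x = self.norms[1](x + a)
        return self.norms[2](x + self.ff(x))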


BERT

Image credit: jalammar.github.io/illustrated-bert/


GPT
T5: Text-to-Text Transfer Transformer
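
T5 casts every task as text in, text out, with a short prefix telling the model which task to perform. Example input → target pairs, paraphrased from the T5 paper:

"translate English to German: That is good."  →  "Das ist gut."
"cola sentence: The course is jumping well."  →  "not acceptable"
"summarize: state authorities dispatched emergency crews tuesday to survey the damage after …"  →  "six people hospitalized after a storm in attala county"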


Pretraining & Fine-tuning

● Pretraining: unsupervised objective on raw text
● Fine-tuning: supervised objective on labeled task data
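
In the pretrain/fine-tune recipe, the same weights are first trained with the unsupervised objective on raw text, then continue training on a small labeled dataset with a new task head. A minimal sketch of the fine-tuning stage, assuming a pretrained encoder whose first output position serves as the sentence representation (all names are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self, encoder, d_model, n_classes):
        super().__init__()
        self.encoder = encoder                     # pretrained weights, not frozen
        self.head = nn.Linear(d_model, n_classes)  # new, randomly initialized

    def forward(self, input_ids):
        h = self.encoder(input_ids)    # (batch, seq, d_model)
        return self.head(h[:, 0])      # classify from the first ([CLS]) position

# Fine-tuning loop: small learning rate, supervised cross-entropy loss.
# optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
# loss = F.cross_entropy(model(batch_ids), batch_labels)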
Prefixes & Prompting
Few- & Zero-Shot Learning
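
In-context learning puts a task description, and optionally a few demonstrations, directly in the prompt; no gradients are updated. Illustrative prompts in the style of the GPT-3 paper:

Zero-shot:
    Translate English to French:
    cheese =>

Few-shot:
    Translate English to French:
    sea otter => loutre de mer
    plush giraffe => girafe peluche
    cheese =>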

Generalization to new tasks without fine-tuning is enabled by scaling data and compute.

Scaling Data
C4 dataset (Colossal Clean Crawled Corpus, filtered from Common Crawl): introduced with T5; still in use
GPT-3 Training Data (Brown et al., 2020):

Dataset                  Tokens   Weight in training mix
Common Crawl (filtered)  410B     60%
WebText2                 19B      22%
Books1                   12B      8%
Books2                   55B      8%
Wikipedia                3B       3%
Scaling Data & Compute

Kaplan et al., 2020; Hoffmann et al., 2022
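
Kaplan et al. fit power laws relating loss to parameters, data, and compute; Hoffmann et al. (Chinchilla) re-fit them and found that a compute-optimal model should be trained on roughly 20 tokens per parameter, with parameters and data scaled together. A back-of-the-envelope sketch using the common C ≈ 6·N·D approximation for training FLOPs (the 20:1 ratio is the rounded Chinchilla rule of thumb):

def chinchilla_optimal(flops, tokens_per_param=20):
    # C ~ 6 * N * D with D ~ 20 * N at the optimum gives C ~ 120 * N^2,
    # so N = sqrt(C / 120) and D = 20 * N.
    n_params = (flops / (6 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

# e.g. a 1e23-FLOP budget suggests ~29B parameters and ~580B training tokens.
print(chinchilla_optimal(1e23))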
Reinforcement Learning from Human Feedback
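
RLHF proceeds in stages: collect human preference comparisons between model outputs, train a reward model on them, then optimize the language model against that reward (e.g., with PPO) while penalizing drift from the original model. A sketch of the reward-model step's pairwise loss, in the Bradley-Terry form used by InstructGPT (the reward model itself is a placeholder):

import torch.nn.functional as F

def reward_model_loss(reward_model, chosen_ids, rejected_ids):
    r_chosen = reward_model(chosen_ids)      # scalar reward per response, (batch,)
    r_rejected = reward_model(rejected_ids)  # the human-preferred one should score higher
    # -log sigmoid(r_chosen - r_rejected): pairwise preference loss.
    return -F.logsigmoid(r_chosen - r_rejected).mean()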
Discussion
● What are the advantages and disadvantages of different training or tuning methods
that have been tried (task-specific training, pretrain/fine-tune, prompting, RLHF)?
● What is the role of systems research in scaling up LLMs? How could advances in
systems research change scaling “laws”?
● What security issues do we need to consider when deploying LLMs in the real world?
● How can we improve the energy efficiency and carbon footprint of LLMs?
