Objective-Driven AI
Towards AI systems that can learn,
remember, reason, plan,
have common sense,
yet are steerable and safe
Yann LeCun
New York University
Meta – Fundamental AI Research
University of Washington
Lytle Lecture
2024-01-24
Smart glasses
Communicate through voice, vision, display, and electromyogram (EMG) interfaces
Intelligent assistant
Can answer all of our questions
Helps us in our daily lives
Knows our preferences and interests
“Her” (2013)
Objective-Driven AI Architecture
Self-Supervised Learning has taken over the world
[Diagram: the input is corrupted by masking; the model is trained to produce a learned representation that recovers the missing content]
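As a concrete illustration (not code from the talk), here is a minimal PyTorch sketch of self-supervised learning by masking: the input is corrupted by zeroing random components, and the model is trained to reconstruct the masked parts. The shapes, modules, and mask ratio are all assumptions.

```python
import torch
import torch.nn as nn

class MaskedPredictor(nn.Module):
    def __init__(self, dim=64, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x_corrupted):
        return self.decoder(self.encoder(x_corrupted))

def mask_input(x, mask_ratio=0.5):
    # Corrupt the input by zeroing a random subset of components.
    mask = torch.rand_like(x) < mask_ratio
    return x.masked_fill(mask, 0.0), mask

model = MaskedPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, 64)                 # stand-in for real data
x_corrupted, mask = mask_input(x)
loss = ((model(x_corrupted) - x)[mask] ** 2).mean()  # reconstruct masked parts only
loss.backward()
opt.step()
```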
SeamlessM4T
Speech or text input: 100 languages
Text output: 100 languages
Speech output: 35 languages
Seamless Expressive: real-time, preserves voice & expression
https://ai.meta.com/blog/seamless-m4t/
Of the violating content we actioned for hate speech, how much did we find and action before people reported it?
https://transparency.fb.com/reports/community-standards-enforcement/hate-speech/facebook/
[Chart: proactive detection rate for hate speech on Facebook, rising from 23.6% to 95.6%]
Protein Generation [Lin et al. 2021]
Protein design: from 3D structure to sequences of amino acids
For drug design
[Lin et al., bioRxiv:2022.07.20.500902]
[Diagram: generative architecture; a stochastic encoder feeds a predictor that predicts the target from the context]
Llama-2: https://ai.meta.com/llama/
Open source code / free & open models / can be used commercially
Available on Azure, AWS, Hugging Face, ...
arXiv:2301.06627, arXiv:2206.10498
Human child
16,000 wake hours in the first 4 years (the equivalent of 30 minutes of YouTube uploads)
2 million optic-nerve fibers, each carrying about 10 bytes/sec
Data volume: 1.1E15 bytes
A four-year-old child has seen 50 times more data than an LLM!
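A quick check of the slide's arithmetic, using only the constants stated above:

```python
wake_hours = 16_000                            # first 4 years
seconds = wake_hours * 3600                    # 5.76e7 seconds
bytes_per_sec = 2_000_000 * 10                 # 2M optic-nerve fibers x 10 bytes/s
print(f"{seconds * bytes_per_sec:.2e} bytes")  # 1.15e+15, i.e. ~1.1E15 as stated
```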
Actor: finds optimal action sequences (outputs actions)
Short-Term Memory: stores state-cost episodes (receives percepts)
Objective-Driven AI
Perception: computes an abstract representation of the state of the world, possibly combined with previously-acquired information in memory
World Model: predicts the state resulting from an imagined action sequence
Task Objective: measures divergence to the goal
Guardrail Objective: immutable objective terms that ensure safety
Operation: finds an action sequence that minimizes the objectives
[Diagram: perception encodes the initial world state into a representation, optionally informed by memory; the world model maps that representation and an action sequence to a predicted state; the task objective and guardrail objective score the predicted state]
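To make "inference == optimization" concrete, here is a minimal PyTorch sketch of planning as minimization of task and guardrail objectives through a world model. The dynamics, costs, and dimensions are toy assumptions, not the learned modules of the actual architecture.

```python
import torch

def world_model(state, action):
    # Assumed toy dynamics: next state = state + action.
    return state + action

def task_cost(state, goal):
    return ((state - goal) ** 2).sum()           # divergence to the goal

def guardrail_cost(state):
    return torch.relu(state.abs() - 3.0).sum()   # penalize unsafe states

s0 = torch.zeros(2)                              # world-state representation from perception
goal = torch.tensor([2.0, -1.0])
actions = torch.zeros(5, 2, requires_grad=True)  # action sequence to optimize
opt = torch.optim.SGD([actions], lr=0.1)

for _ in range(100):                             # planning = minimizing the objectives
    opt.zero_grad()
    s, cost = s0, torch.tensor(0.0)
    for a in actions:                            # roll the world model forward
        s = world_model(s, a)
        cost = cost + guardrail_cost(s)
    cost = cost + task_cost(s, goal)
    cost.backward()
    opt.step()
```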
[Diagram of multi-step planning: perception produces the world-state representation; the world model is applied recurrently (action0, then action1) to produce the predicted and final state representations; guardrail costs apply at each step and the task cost at the final state]
[Diagram: the same multi-step planner with latent variables fed to each world-model step to represent uncertainty about the prediction]
[Diagram of two-level hierarchical planning: Enc1(x) encodes the initial level-1 state; Pred1 rolls s1 forward under latents z1, with Guardrail2 costs and the task objective at the top level; the lower level rolls s0 forward from its initial state under actions a0, a1]
[Diagram of a hierarchical planning example: at the top level, from an initial state "At NYU", Enc1(x) and Pred1 predict level-1 states s1 scored by the distance to Paris; at the lower level, states s0 evolve from initial to final under actions a0 ("hail or call?" a taxi), scored by the distance to the airport, with Guardrail1 costs and latents z0 accounting for obstacles and traffic]
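A toy sketch of the two-level idea under the same assumptions as the sketch above: the high level plans abstract waypoints toward the final goal (Paris), and the low level plans concrete actions toward the first waypoint (the airport). Everything here is illustrative, not the talk's learned models.

```python
import torch

def dynamics(s, a):                 # shared toy world model: s' = s + a
    return s + a

def plan(s0, cost_fn, steps=4, iters=200, lr=0.1):
    # Optimize an action sequence by gradient descent through the world model.
    actions = torch.zeros(steps, s0.numel(), requires_grad=True)
    opt = torch.optim.SGD([actions], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        s = s0
        for a in actions:
            s = dynamics(s, a)
        cost_fn(s).backward()
        opt.step()
    return actions.detach()

start, paris = torch.zeros(2), torch.tensor([10.0, 10.0])
# High level: plan abstract waypoints toward the final goal ("to Paris").
waypoints = plan(start, lambda s: ((s - paris) ** 2).sum())
# Low level: plan concrete actions to reach the first waypoint ("to the airport").
subgoal = start + waypoints[0]
low_actions = plan(start, lambda s: ((s - subgoal) ** 2).sum())
```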
How Could Machines Learn World Models from Sensory Input?
With Self-Supervised Learning
How could machines learn like animals and humans?
[Chart: ages (0 to 14 months) at which infants acquire concepts; Social/Communication: emotional contagion, pointing, helping vs hindering, false perceptual beliefs; Physics: biological motion, gravity and inertia, stability and support, conservation of momentum; Actions: proto-imitation, crawling, walking]
My solution: Joint-Embedding Predictive Architecture
[Diagram: the two inputs are related by masking or by a transformation/action]
a) Joint Embedding Architecture (JEA). Examples: Siamese Net, PIRL, MoCo, SimCLR, Barlow Twins, VICReg, ...
b) Deterministic Joint Embedding Predictive Architecture (DJEPA). Examples: BYOL, VICRegL, I-JEPA, ...
c) Joint Embedding Predictive Architecture (JEPA). Examples: Equivariant VICReg, ...
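A minimal PyTorch sketch of the JEPA idea: encode x and y separately, predict y's representation from x's representation plus a latent z, and measure the error in representation space. The module shapes, the latent, and the stop-gradient on the target branch are assumptions for illustration; in practice a collapse-preventing regularizer (see VICReg below) is needed.

```python
import torch
import torch.nn as nn

class JEPA(nn.Module):
    def __init__(self, dim=64, rep=32, z_dim=8):
        super().__init__()
        self.enc_x = nn.Linear(dim, rep)
        self.enc_y = nn.Linear(dim, rep)   # target encoder (often EMA weights)
        self.predictor = nn.Linear(rep + z_dim, rep)

    def forward(self, x, y, z):
        sx, sy = self.enc_x(x), self.enc_y(y)
        sy_hat = self.predictor(torch.cat([sx, z], dim=-1))
        # Prediction error measured in representation space, not pixel space.
        return ((sy_hat - sy.detach()) ** 2).mean()

model = JEPA()
x, y = torch.randn(16, 64), torch.randn(16, 64)
z = torch.randn(16, 8)        # latent capturing what x does not determine about y
loss = model(x, y, z)
```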
[Plot: training samples x, y along a time-or-space axis; the energy function should be low on observed pairs]
Contrastive methods: push down on the energy of training samples, pull up on the energy of contrastive samples
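For concreteness, a minimal sketch of a contrastive objective in the InfoNCE style used by methods such as SimCLR and MoCo; the temperature and shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(hx, hy, temperature=0.1):
    # Matching pairs (hx[i], hy[i]) are positives; all other in-batch
    # pairings serve as contrastive (negative) samples.
    hx, hy = F.normalize(hx, dim=-1), F.normalize(hy, dim=-1)
    logits = hx @ hy.t() / temperature     # similarity of every pair
    targets = torch.arange(hx.size(0))     # the diagonal holds the positives
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(32, 128), torch.randn(32, 128))
```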
VICReg (Variance-Invariance-Covariance Regularization):
Variance: maintains the variance of the components of the representations
Covariance: decorrelates the components of the covariance matrix of the representations
Invariance: minimizes the prediction error
Barlow Twins [Zbontar et al., arXiv:2103.03230], VICReg [Bardes, Ponce, LeCun, arXiv:2105.04906, ICLR 2022], VICRegL [Bardes et al., NeurIPS 2022], MCR2 [Yu et al., NeurIPS 2020], [Ma, Tsao, Shum, 2022]
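A compact sketch of the three VICReg terms listed above. The loss coefficients follow the paper's defaults but should be treated as assumptions here.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(hx, hy, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = hx.shape
    invariance = F.mse_loss(hx, hy)          # minimize prediction error

    def variance(h):                         # keep each dimension's std >= 1
        std = torch.sqrt(h.var(dim=0) + eps)
        return torch.relu(1.0 - std).mean()

    def covariance(h):                       # decorrelate dimensions
        h = h - h.mean(dim=0)
        cov = (h.t() @ h) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / d

    return (sim_w * invariance
            + var_w * (variance(hx) + variance(hy))
            + cov_w * (covariance(hx) + covariance(hy)))

loss = vicreg_loss(torch.randn(64, 128), torch.randn(64, 128))
```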
[Diagram: joint-embedding branches encode x and y into representations hx and hy (d=2048), versus a supervised branch mapping x to a label such as “polar bear”]
SSL by distillation
[Diagram: a student branch produces hx; a teacher branch, whose weights w are an EMA of the student's, produces hy; the teacher output is quantized and the student is trained to classify it with a cross-entropy loss]
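A minimal sketch of the distillation setup described above: the teacher is an exponential moving average (EMA) of the student, the teacher's output is quantized into pseudo-classes, and the student is trained with cross-entropy to match it. The momentum value and network shapes are assumptions.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Linear(64, 10)
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad = False            # the teacher receives no gradients

def ema_update(student, teacher, momentum=0.996):
    # Teacher weights slowly track the student's.
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(momentum).add_(ps, alpha=1 - momentum)

x1, x2 = torch.randn(32, 64), torch.randn(32, 64)   # two views of the data
pseudo_labels = teacher(x2).argmax(dim=-1)          # "quantize" the teacher output
loss = F.cross_entropy(student(x1), pseudo_labels)  # classify: match the teacher
loss.backward()
ema_update(student, teacher)
```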
DINOv2
I-JEPA Results
Training is fast
The non-generative method beats reconstruction-based generative methods such as Masked Auto-Encoder (with a frozen trunk).
Problems to Solve
Computing power:
AR-LLMs use a fixed amount of computation per token
Objective-Driven AI is Turing complete (inference == optimization)
We are still missing essential concepts to reach human-level AI:
Scaling up auto-regressive LLMs will not take us there
We need machines to learn how the world works
Learning world models with self-supervised learning and JEPA:
Non-generative architecture, predicts in representation space
Objective-Driven AI architectures:
Can plan their answers
Must satisfy objectives: are steerable & controllable
Guardrail objectives can make them safe by construction.
Questions
How long is it going to take to reach human-level AI?
Years to decades. Many problems to solve on the way.
Before we get to HLAI, we will get to cat-level AI, dog-level AI, ...
What is AGI?
There is no such thing. Intelligence is highly multidimensional.
Intelligence is a collection of skills + the ability to learn new skills quickly.
Even humans can only accomplish a tiny subset of all tasks.
Will machines surpass human intelligence?
Yes, they already do in some narrow domains.
There is no question that machines will eventually surpass human intelligence in all domains where humans are intelligent (and more).
Questions
Are there short-term risks associated with powerful AI?
Yes, as with every technology.
Disinformation, propaganda, hate, spam, ...: AI is the solution!
Concentration of information sources
All those risks can be mitigated.
Are there long-term risks with (super-)human-level AI?
Robots will not take over the world: that is a mistaken projection of human nature onto machines.
Intelligence is not correlated with a desire to dominate, even in humans.
Objective-Driven AI systems will be made subservient to humans.
AI will not be a “species” competing with us.
We will design its goals and guardrails.
Questions
How do we solve the alignment problem?
Through trial and error and testing in sandboxed systems.
We are very familiar with designing objectives for human and superhuman entities. It's called lawmaking.
What if bad people get their hands on powerful AI?
Their evil AI will be taken down by the good guys' AI police.
What are the benefits of human-level AI?
AI will amplify human intelligence; progress will accelerate.
As if everyone had a super-smart staff working for them.
The effect on society may be as profound as that of the printing press.
By amplifying human intelligence, AI will bring a new era of enlightenment, a new renaissance for humanity.
Thank you!