Lec00 Intro For Web Highlighted


CS5670: Intro to Computer Vision

(Cornell Tech)
Depth from a single image
Visualizing scenes from tourist photos
Reconstructing dynamic 3D scenes

DynIBaR: Neural Dynamic Image-Based Rendering [https://dynibar.github.io/]
Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely
CVPR 2023
Today
1. What is computer vision?

2. Why study computer vision?

3. Course overview

4. Images & image filtering [time permitting]


Today
• Readings
– Szeliski, Chapter 1 (Introduction)
Every image tells a story
• Goal of computer vision: perceive the “story” behind the picture
• Compute properties of the world
– 3D shape
– Names of people or objects
– What happened?
The goal of computer vision
Can computers match human perception?
• Yes and no (mainly no)
– computers can be better at “easy” things
– humans are better at “hard” things

• But huge progress
– Accelerating in the last five years due to deep learning
– What is considered “hard” keeps changing
Human perception has its shortcomings

https://twitter.com/pickover/status/1460275132958662657/
But humans can tell a lot about a scene from a little information…

Source: “80 million tiny images” by Torralba, et al.


The goal of computer vision
• Compute the 3D shape of the world

ZED 2i Camera
The goal of computer vision
• Recognize objects and people

Terminator 2, 1991
slide credit: Fei-Fei, Fergus & Torralba
sky
building

flag

face
banner
wall
street lamp
bus bus

cars
slide credit: Fei-Fei, Fergus & Torralba


The goal of computer vision
• “Enhance” images
The goal of computer vision
• Forensics

Source: Nayar and Nishino, “Eyes for Relighting”


The goal of computer vision
• Improve photos (“Computational Photography”)

Super-resolution (source: 2d3)

Depth of field on cell phone camera (source: Google Research Blog)

Removing objects (Google Magic Eraser)

Low-light photography (credit: Hasinoff et al., SIGGRAPH ASIA 2016)
April 10, 2019
Why study computer vision?
• Billions of images/videos captured per day

• Huge number of potential applications


• The next slides show the current state of the art
Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software

Digit recognition, AT&T labs (1990s)
http://yann.lecun.com/exdb/lenet/

License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

Sudoku grabber
http://sudokugrab.blogspot.com/

Automatic check processing


Face detection

• Nearly all cameras detect faces in real time
– (Why?)
Face analysis and recognition
Vision-based biometrics

Who is she?

Source: S. Seitz


Vision-based biometrics

“How the Afghan Girl was Identified by Her Iris Patterns” (Read the story)

Source: S. Seitz
Login without a password

Fingerprint scanners on many new smartphones and other devices

Face unlock on Apple iPhone X
See also http://www.sensiblevision.com/
New York Times, Jan. 18, 2020
by Kashmir Hill
Bird identification

Merlin Bird ID (based on Cornell Tech technology!)


Special effects: shape capture

The Matrix movies, ESC Entertainment, XYZRGB, NRC


Source: S. Seitz
Special effects: motion capture

Pirates of the Caribbean, Industrial Light and Magic

Source: S. Seitz


3D face tracking w/ consumer cameras

Snapchat Lenses

Face2Face system (Thies et


Image synthesis

Karras, et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR
Which face is real?

https://www.whichfaceisreal.com/
Image synthesis

“An astronaut riding a horse in a photorealistic style” – DALL-E 2

“A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat” – Imagen
Sports

Sportvision first down line


Explanation on www.howstuffworks.com

Source: S. Seitz
Smart cars

• Mobileye
• Tesla Autopilot
• Safety features in many cars
Self-driving cars

Waymo
Robotics

NASA’s Mars Curiosity Rover
https://en.wikipedia.org/wiki/Curiosity_(rover)

Amazon Picking Challenge
http://www.robocup2016.org/en/events/amazon-picking-challenge/

Amazon Prime Air
Amazon Scout


Medical imaging

3D imaging (MRI, CT)

Skin cancer classification with deep learning
https://cs.stanford.edu/people/esteva/nature/
Virtual & Augmented Reality

6DoF head tracking
Hand & body tracking
3D scene understanding
3D-360 video capture


Current state of the art
• You just saw many examples of current systems.
– Many of these are less than 5 years old

• Computer vision is an active research area, and rapidly changing
– Many new apps in the next 5 years
– Deep learning and generative methods powering many modern applications

• Many startups across a dizzying array of areas
– Generative AI, robotics, autonomous vehicles, medical imaging, construction, inspection, VR/AR, …
Why is computer vision difficult?

Viewpoint variation

Credit: Flickr user michaelpaul

Scale
Illumination
Why is computer vision difficult?

Motion (Source: S. Lazebnik)


Intra-class variation

Background clutter
Occlusion


Challenges: local ambiguity

slide credit: Fei-Fei, Fergus & Torralba


But there are lots of visual cues we can use…

Source: S. Lazebnik
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a given 2D image
– We often must use prior knowledge about the world’s structure

Artist Julian Beever with his anamorphic Coke bottle
Image source: F. Durand
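
The ambiguity above can be illustrated with a small sketch. It is not from the course materials. It assumes an idealized pinhole camera at the origin looking down the +Z axis, with a hypothetical focal length `f` given in pixels. Two different 3D points on the same ray through the camera center land on the same image location, so the image alone cannot tell them apart.

```python
# Minimal pinhole-projection sketch (illustrative only).
# Assumptions: camera at the origin, looking down +Z, focal length f in pixels.

def project(point, f=500.0):
    """Project a 3D point (X, Y, Z) with Z > 0 to image coordinates (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Perspective division: image position depends only on the ratios X/Z and Y/Z.
    return (f * X / Z, f * Y / Z)

# A point close to the camera, and a point ten times farther away
# whose X and Y offsets are also ten times larger.
near_point = (0.2, 0.1, 1.0)
far_point = (2.0, 1.0, 10.0)

near_pixel = project(near_point)
far_pixel = project(far_point)

print("near point projects to", near_pixel)
print("far point projects to ", far_pixel)
```

Both points project to (approximately) the same pixel, which is why extra information such as a second view or prior knowledge about the scene is needed to recover depth.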
CS5670: Introduction to Computer Vision

• Project-based course whose goal is to teach you the basics of computer vision – image processing, geometry, recognition – in a hands-on way
Course requirements
• Prerequisites
– Data structures
– Good working knowledge of Python programming
– Linear algebra
– Vector calculus

• Course does not assume prior imaging experience
– computer vision, image processing, graphics, etc.
Course overview
(tentative)
1. Low-level vision
– image processing, edge detection, feature detection, cameras, image formation

2. Geometry & appearance
– projective geometry, stereo, structure from motion, optimization, lighting & materials

3. Recognition & generative models
– object classification, deep learning,
1. Low-level vision
• Basic image processing and image formation

[Figure: image * filter = output]

Filtering, edge detection
Feature extraction
Image formation


Project: Hybrid images
Project: Feature detection and matching
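
As a rough illustration of the filtering and edge detection topics on this slide, here is a minimal sketch in plain Python. It is not the course's project code. The function name `filter2d` and the toy image are assumptions made for this example. The function computes cross-correlation over the "valid" region only; for a symmetric kernel such as the box filter this equals convolution, while for the Sobel kernel convolution would flip the sign of the response.

```python
# Minimal 2D linear filtering sketch (illustrative only, pure Python).

def filter2d(image, kernel):
    """Cross-correlate a 2D list `image` with a 2D list `kernel`.

    Only positions where the kernel fits entirely inside the image are
    computed, so the output is smaller than the input.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            # Weighted sum of the image patch under the kernel.
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u][v] * image[i + u][j + v]
            row.append(acc)
        out.append(row)
    return out

# Toy 5x6 image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]

# 3x3 box filter: averages each neighborhood (blurring).
box = [[1 / 9] * 3 for _ in range(3)]

# 3x3 Sobel kernel: responds to horizontal intensity changes (vertical edges).
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

blurred = filter2d(image, box)
edges = filter2d(image, sobel_x)

print("blurred row:", [round(v, 2) for v in blurred[0]])
print("edge row:   ", edges[0])
```

The blurred row ramps smoothly from dark to bright, while the edge response is nonzero only where the window straddles the intensity step.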
2. Geometry & appearance

Image credit: IDS Imaging

Projective geometry
Stereo vision
Multi-view stereo
Structure from motion


Project: Creating panoramas
Project: 3D reconstruction
3. Recognition, Deep Learning &
Generative Models

“dog”

Image classification
Convolutional Neural Networks

“a class watching a computer vision lecture at Cornell Tech”

Image generation
Project: Neural Radiance Fields (NeRFs)
Questions?
