Lec00 Intro For Web Highlighted

CS5670: Intro to Computer Vision
(Cornell Tech)
Depth from a single image
Visualizing scenes from tourist photos
Reconstructing dynamic 3D scenes
DynIBaR: Neural Dynamic Image-Based Rendering [https://dynibar.github.io/]
Zhengqi Li, Qianqian Wang, Forrester Cole, Richard Tucker, Noah Snavely
CVPR 2023
Today
1. What is computer vision?
2. Why study computer vision?
3. Course overview
4. Images & image filtering [time permitting]

Today
• Readings
– Szeliski, Chapter 1 (Introduction)
Every image tells a story
• Goal of computer vision: perceive the “story” behind the picture
• Compute properties of the world
– 3D shape
– Names of people or objects
– What happened?
The goal of computer vision
Can computers match human perception?
• Yes and no (mainly no)
– computers can be better at “easy” things
– humans are better at “hard” things
• But huge progress
– Accelerating in the last five years due to deep learning
– What is considered “hard” keeps changing
Human perception has its shortcomings
https://twitter.com/pickover/status/1460275132958662657/
But humans can tell a lot about a scene from a little information…
Source: “80 million tiny images” by Torralba, et al.

• Compute the 3D shape of the world
ZED 2i Camera
• Recognize objects and people
Terminator 2, 1991
slide credit: Fei-Fei, Fergus & Torralba
sky
building
flag
face
banner
wall
street lamp
bus
bus
cars
slide credit: Fei-Fei, Fergus & Torralba

• “Enhance” images
• Forensics
Source: Nayar and Nishino, “Eyes for Relighting”

• Improve photos (“Computational Photography”)
Super-resolution (source: 2d3)
Depth of field on cell phone camera (source: Google Research Blog)
Removing objects (Google Magic Eraser)
Low-light photography (credit: Hasinoff et al., SIGGRAPH ASIA 2016)
April 10, 2019
Why study computer vision?
• Billions of images/videos captured per day
• Huge number of potential applications

• The next slides show the current state of the art
Optical character recognition (OCR)
• If you have a scanner, it probably came with OCR software
Digit recognition, AT&T labs (1990’s)
http://yann.lecun.com/exdb/lenet/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
Sudoku grabber
http://sudokugrab.blogspot.com/
Automatic check processing

Face detection
• Nearly all cameras detect faces in real time
– (Why?)
Face analysis and recognition
Vision-based biometrics
Who is she? Source: S. Seitz

Vision-based biometrics
“How the Afghan Girl was Identified by Her Iris Patterns” Read the story
Source: S. Seitz
Login without a password
Fingerprint scanners on many new smartphones and other devices
Face unlock on Apple iPhone X
See also http://www.sensiblevision.com/
New York Times, Jan. 18, 2020
by Kashmir Hill
Bird identification
Merlin Bird ID (based on Cornell Tech technology!)

Special effects: shape capture
The Matrix movies, ESC Entertainment, XYZRGB, NRC

Source: S. Seitz
Special effects: motion capture
Pirates of the Caribbean, Industrial Light and Magic
Source: S. Seitz

3D face tracking w/ consumer cameras
Snapchat Lenses
Face2Face system (Thies et al.)

Image synthesis
Karras, et al., Progressive Growing of GANs for Improved Quality, Stability, and Variation, ICLR
Which face is real?
https://www.whichfaceisreal.com/
Image synthesis
“An astronaut riding a horse in a photorealistic style” – DALL-E 2
“A photo of a Corgi dog riding a bike in Times Square. It is wearing sunglasses and a beach hat” – Imagen
Sports
Sportvision first down line

Explanation on www.howstuffworks.com
Source: S. Seitz
Smart cars
• Mobileye
• Tesla Autopilot
• Safety features in many cars
Self-driving cars
Waymo
Robotics
NASA’s Mars Curiosity Rover
https://en.wikipedia.org/wiki/Curiosity_(rover)
Amazon Picking Challenge
http://www.robocup2016.org/en/events/amazon-picking-challenge/
Amazon Prime Air
Amazon Scout

Medical imaging
3D imaging (MRI, CT)
Skin cancer classification with deep learning
https://cs.stanford.edu/people/esteva/nature/
Virtual & Augmented Reality
6DoF head tracking
Hand & body tracking
3D scene understanding
3D-360 video capture

Current state of the art
• You just saw many examples of current systems.
– Many of these are less than 5 years old
• Computer vision is an active research area, and rapidly changing
– Many new apps in the next 5 years
– Deep learning and generative methods powering many modern applications
• Many startups across a dizzying array of areas
– Generative AI, robotics, autonomous vehicles, medical imaging, construction, inspection, VR/AR, …
Why is computer vision difficult?
Viewpoint variation
Credit: Flickr user michaelpaul
Scale
Illumination
Why is computer vision difficult?
Motion (Source: S. Lazebnik)

Intra-class variation
Background clutter
Occlusion

Challenges: local ambiguity
slide credit: Fei-Fei, Fergus & Torralba

But there are lots of visual cues we can use…
Source: S. Lazebnik
Bottom line
• Perception is an inherently ambiguous problem
– Many different 3D scenes could have given rise to a given 2D image
– We often must use prior knowledge about the world’s structure
Artist Julian Beever with his anamorphic Coke bottle
Image source: F. Durand
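The ambiguity can be illustrated with a minimal sketch, assuming an ideal pinhole camera at the origin looking down the Z axis. The function name `project` and the `focal_length` parameter are illustrative choices, not part of the course material. Every 3D point along the same ray through the camera center lands on the same image location, so the image alone cannot tell them apart.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (X, Y, Z) in camera coordinates onto the image
    plane of an ideal pinhole camera (assumed model, no lens distortion)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Three different 3D points that lie on one ray through the camera center.
near = (1.0, 2.0, 4.0)
far = tuple(2.0 * c for c in near)      # twice as far away
farther = tuple(8.0 * c for c in near)  # eight times as far away

# All three project to the same 2D location: depth is lost in projection.
images = [project(p) for p in (near, far, farther)]
print(images)
```

A small object close to the camera and a large one far away can therefore produce the same picture, which is why prior knowledge about the world is needed to choose among the candidate scenes.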
CS5670: Introduction to Computer Vision
• Project-based course whose goal is to teach you the basics of computer vision – image processing, geometry, recognition – in a hands-on way
Course requirements
• Prerequisites
– Data structures
– Good working knowledge of Python programming
– Linear algebra
– Vector calculus
• Course does not assume prior imaging experience
– computer vision, image processing, graphics, etc.
Course overview (tentative)
1. Low-level vision
– image processing, edge detection, feature detection, cameras, image formation
2. Geometry & appearance
– projective geometry, stereo, structure from motion, optimization, lighting & materials
3. Recognition & generative models
– object classification, deep learning,
1. Low-level vision
• Basic image processing and image formation
Filtering, edge detection
Feature extraction
Image formation

Project: Hybrid images
Project: Feature detection and matching
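As a rough preview of filtering and edge detection, here is a minimal pure-Python sketch. It assumes a grayscale image stored as a list of rows; the helper name `filter2d` and the toy image are made up for illustration. It computes cross-correlation over the "valid" region only; convolution would flip the kernel first, which makes no difference for the symmetric box filter used here.

```python
def filter2d(image, kernel):
    """Cross-correlate a 2D image (list of rows) with a small kernel,
    keeping only positions where the kernel fits entirely inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x6 image: dark left half (0), bright right half (9).
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

box = [[1 / 9] * 3 for _ in range(3)]  # 3x3 mean filter: blurs the step
edge = [[-1, 0, 1]]                    # horizontal difference: fires at the step

blurred = filter2d(image, box)   # the sharp 0-to-9 step becomes a gradual ramp
edges = filter2d(image, edge)    # nonzero only around the vertical edge
print(blurred[0])
print(edges[0])
```

The same loop with a different kernel gives either smoothing or edge response, which is the sense in which filtering underlies both.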
2. Geometry & appearance
Image credit: IDS Imaging
Projective geometry
Stereo vision
Multi-view stereo
Structure from motion

Project: Creating panoramas
Project: 3D reconstruction
3. Recognition, Deep Learning & Generative Models
“dog”
Image classification
Convolutional Neural Networks
“a class watching a computer vision lecture at Cornell Tech”
Image generation
Project: Neural Radiance Fields (NeRFs)
Questions?
