Virtual Environments: A Survey of The Technology
TR93-033
Richard Holloway
Anselmo Lastra
email: [email protected]
[email protected]
September 1993†
† These notes were actually prepared in April and May of 1993 for distribution in September at Eurographics.
1. Introduction
2. How VE Works: Fooling the Senses
   2.1 The Display/Detection Model
   2.2 Perceptual Issues in Matching Displays to Senses
3. Displays
   3.1 General
   3.2 Visual Displays
      3.2.1 Overview
      3.2.2 How Immersive Displays Work
      3.2.3 Immersive Display Characteristics
      3.2.4 Immersive Display Issues and Problems
      3.2.5 Desired Display Specifications
      3.2.6 Commercial Systems
      3.2.7 Research Systems
   3.3 Auditory Displays
   3.4 Haptic Displays
4. Image Generation
   4.1 Introduction
   4.2 Graphics Performance
      4.2.1 The cost of graphics performance
      4.2.2 Antialiasing
   4.3 Illumination Models
   4.4 Shading
   4.5 Graphics Libraries
      4.5.1 Retained mode
      4.5.2 Immediate mode
   4.6 Image Generation Systems
      4.6.1 Performance Specifications
      4.6.2 Commercial VE Systems
      4.6.3 Commercial Image Generators
      4.6.4 Experimental
   4.7 Special Requirements for VE
      4.7.1 Input/Output
      4.7.2 Shared Virtual Worlds
      4.7.3 Potential z-buffering problems
      4.7.4 Low latency
      4.7.5 Correction for optical distortion
      4.7.6 Video
      4.7.7 Combining camera video and synthetic imagery
   4.8 Further Reading
5. Detectors
   5.1 General
   5.2 Trackers
      5.2.1 Principles of Tracking
      5.2.2 Commercial and Research Systems
         5.2.2.1 Magnetic Trackers
         5.2.2.2 Optical Trackers
         5.2.2.3 Mechanical Trackers
         5.2.2.4 Acoustic Trackers
   5.3 Other Detectors
6. Acknowledgments
7. References
1. Introduction
• Flight simulators: GE (in late 50s!), Singer-Link, Evans & Sutherland, McDonnell
Douglas
“The ultimate display would, of course, be a room within which the computer can
control the existence of matter. A chair displayed in such a room would be good
enough to sit in. Handcuffs displayed in such a room would be confining, and a
bullet displayed in such a room would be fatal.” [Sutherland 65].
• Tom Furness and researchers at Wright-Patterson Air Force Base develop VCASS
Developments in mid-1980s:
Even after all of this progress, the subset of the "ultimate" capabilities that Sutherland
implemented in 1968—stereo images, head and hand tracking—remains prevalent even
today. Typical modern systems are essentially just improved versions of Sutherland’s
system, and are still plagued by the same problems:
3. The position and orientation of the head and hand have to be tracked in real time and with considerable accuracy.
This tutorial will begin with an overall model for virtual environments and will therefore
touch on all of the associated technologies, but will then focus on the technology associated
with the three problems listed above.
2. How VE Works: Fooling the Senses
2.1 The Display/Detection Model
• The real environment is perceived by our senses; our actions change our viewpoint in
the environment and cause changes in it.
There are therefore two essential tasks: Display (presenting the alternative stimuli) and
detection (monitoring user actions). Another way of looking at these two processes is as a
user-centered input/output model: Displays provide sensory input to the user, and the
detectors sense any user actions, which can be thought of as outputs.
1 Mort Heilig’s Sensorama is the only system of which we are aware that has olfactory display.
2 Haptics- “Pertaining to sensations such as touch, temperature, pressure, etc. mediated by skin, muscle,
tendon, or joint.” -Webster’s New International Dictionary
3 Touch- “The special sense by which pressure or traction exerted on the skin or mucous membrane is
perceived.” - Webster’s 7th Dictionary
In addition to the actions listed in Table 2.2, one can imagine a host of other attributes or
actions that future systems may monitor: facial expression, heart rate, blood pressure, etc.
The key idea is that there are two types of tasks in a VE system: display and detection, and
a VE system must have components for each.
2.2 Perceptual Issues in Matching Displays to Senses
Visual
• Visual sensory organs are localized at the eyes, so displays can be localized.
• Field of view: Human eye has approximately a 208˚ instantaneous horizontal field of
view [Levine 85]. (19” CRT viewed at 18” subtends about 56˚)
• Illumination range: Visual system can operate over a range from roughly 1 to 10^10, or about 10 orders of magnitude [Jain 89].
• Acuity: Eye can resolve a separation of about 0.5 minutes of arc under optimal
lighting conditions [Bruce & Green 90].
• Color system of eye uses three primaries ⇒ Displays can use three primaries
Auditory
• Limited frequency response means auditory display doesn’t have to reproduce all
frequencies
• Human can process many sounds in parallel ⇒ must be able to display multiple
sounds simultaneously
• Ideal display should account for individual differences: Pinnae, head, etc.
Haptic
• Sensory organs not localized; no way to place one or two displays to fool haptic
sense
• Hard-surface forces and textures require high frequency response and non-linear
damping [Deyo 89]
Vestibular
• Fooling this sense implies real body motion (although fooling other senses can give
illusion of self-motion)
• Motion sickness may result if vestibular input does not match input from other
senses; interactions between other senses are critical
Smell
Taste
3. Displays
3.1 General
General characteristics:
• Display is usually located close to sensory organ (e.g., displays in front of eyes,
earphones next to ears, etc.).
• Display’s output should ideally match stimulus range of organ (e.g., visual displays
need only display visible light, not infrared, ultra-violet, etc.).
• There is often the option of replacing or augmenting the real environment (e.g.,
visual displays may be see-through or opaque; earphones may block outside noises
or let them through).
3.2 Visual Displays
3.2.1 Overview
Function: Provide images of virtual objects to user’s eyes that either replace or augment
the real environment.
Types:
• monitor-based (non-immersive)
• head-mounted (immersive)
• arm-mounted displays (semi-immersive).
Monitor-based displays are typically fixed within the environment and give a “window”
onto a virtual world, rather than immersing the user in it. A conventional CRT is the
simplest example of this type of display. Other systems use some stereo mechanism to
deliver different images to the user’s two eyes in order to display objects in depth. Realism
can further be enhanced by monitoring head movement so that the “through the window”
image can be updated according to head movements.
Head-mounted displays (HMDs) are typically headsets incorporating two small LCD or
CRT screens with optics to bring their images to the user’s eyes. Head motion is usually
monitored with a tracker (discussed in section 5) so that the image can be updated to reflect
the current head position and orientation, which gives an illusion of presence in the virtual
environment.
3.2.2 How Immersive Displays Work
[Figure: Basic HMD optics. The lens(es) form a virtual image of the screen; eye relief is the distance from the eye to the optics.]
[Figure: Stereo viewing. Each eye receives its own image of a point (e.g., the right-eye image of the point), and the perceived point lies where the two lines of sight intersect. The two eyes' view directions diverge by an angle 2δ.]
[Figure: Monocular horizontal FOV versus total horizontal FOV.]
FOV formulas: If we define FOVm as the monocular horizontal field of view and 2δ as the angle between the two eyes' display axes, then
FOVT = FOVm + 2δ
and the binocular overlap region subtends FOVm - 2δ.
[Figure: In a see-through HMD, a half-silvered mirror superimposes a screen point on a real object.]
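To make the FOV formula concrete, here is a small worked example in Python; the monocular FOV of 80˚ and the divergence angle δ of 15˚ are made-up numbers, and the overlap expression follows from the same geometry as the formula above.

    # Worked example of the FOV formula above. The numbers are
    # hypothetical; FOV_m is one eye's horizontal FOV and delta is the
    # outward rotation of each display axis from straight ahead.
    FOV_m = 80.0            # degrees (assumed)
    delta = 15.0            # degrees (assumed)

    FOV_T = FOV_m + 2 * delta        # total horizontal FOV: 110 degrees
    overlap = FOV_m - 2 * delta      # binocular overlap: 50 degrees
    print(FOV_T, overlap, 100.0 * overlap / FOV_T)  # 110.0 50.0 ~45.5%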
3.2.3 Immersive Display Characteristics
General
• Stereo/Mono- Can the device display stereo images, mono images, or either?
• See-through/Opaque: Does the display obscure the real environment or superimpose
the virtual objects into it?
• Price
Image source
• Resolution
• Type: CRT, LCD, ELD, etc.
• Luminance: overall brightness; critical for see-through HMDs (may be mediated by
optics; e.g., pancake window)
• Contrast
• Color/Monochrome
• Refresh rate and lag
• Screen or image-source size
Optics
• Field of view (horizontal and vertical)
• Binocular overlap
• Image distance
• Eye relief
• Exit pupil
• Magnification: Screen image size
• Distortion and other aberrations
• Overall size/weight of optics
3.2.4 Immersive Display Issues and Problems
• Resolution: Most popular units have resolutions of less than 200x200, which leaves the user legally blind in the virtual world [Piantanida et al. 93].
• Color: Small, color CRTs are just becoming available, but only at high cost.
• Field of view: Current systems are not even close to matching human visual
system’s FOV. Tends to diminish the sense of presence in VE.
• Range: Current HMD systems require the user to be tethered by cables of varying
lengths, which limits range of use and encumbers the user.
• Color displays typically have less resolution than monochrome, since color pixels
may take up three times as much area as monochrome. Field-sequential color
systems don’t have this problem, but are harder to update quickly and harder to make
bright.
• CRTs have better images but are less safe than LCDs due to high voltages and strong
magnetic fields.
• Larger fields of view mean more work for the image-generation system.
3.2.5 Desired Display Specifications
Ideally, a visual display should match or exceed the limits of the human visual system, so that the virtual image would be indistinguishable from a real one. Practically
speaking, this will not happen any time soon. The following is our wish list for a visual
display that we might see within the next 10-15 years.
• Stereo
• 180˚x100˚ FOV
• 4,000 x 4,000 resolution, with optics that concentrate resolution in center of FOV
• Can electronically blend between completely opaque and completely see-through on a
per-pixel basis
• < $10,000
• Full color
• High contrast and variable brightness
• 100 Hz update rate
• Negligible optical distortion (or easily correctable in software)
• Lightweight, low moments of inertia: Like a pair of glasses
• No encumbering cables
3.2.6 Commercial Systems
Commercial HMDs can be divided into two major categories: LCD-based, general-purpose
opaque HMDs and CRT-based, military, see-through HMDs.
This section gives a listing of many of the current vendors of immersive displays for
comparison and contrast. This listing is intended to be representative of the current state of
the art, but is not necessarily a complete listing of all vendors. A good listing of vendor
contact information is Appendix B of [Pimental & Teixeira 92].
General notes:
• Stereo/Mono: Many stereo HMDs cannot be used for mono images; i.e., feeding the
same video signal to both eyes will not work because the screen images are not
centered in front of the eyes; generation of two different viewpoints or simple lateral
shifting is required.
• The range category is intended to give an idea of whether the display itself (not the
tracker) limits the range of movement of the user. Systems whose inputs are NTSC
are listed as unlimited, since these cables can be quite long and the NTSC signal can
be broadcast as RF without cables. Other systems limited by RGB cable or fiber
optic cable length are so indicated.
• Some vendors also list “field of regard” or “total field of view”, which is essentially a
tracker specification.
• CRTs typically have much higher contrast, brightness, and resolution than LCDs.
model | see-through/opaque | stereo/mono | color | image source | resolution | FOV | overlap | range | price ($US) | notes
Polhemus Labs | either | both | color | see notes | ~500x500 TV line prs | 50x40˚ | 0-100% | 8’ | 35-50,000 | e
Private Eye | opaque | mono* | mono red | LED | 720x280 (one eye) | 22x14˚ | N/A | unltd | 795 | f
Virtual Reality, Inc HMD 121 | see-thru | both | gray-scale (green) | CRT | 1280x1024 | 50-77˚ diag | 35-100% | unltd | 60,000 |
n/Vision High-Resolution HMD* | see-thru | both | color* | CRT | 1280x1024 | 50-83˚ | 35-100% | 10’ | 75,000 | g
CAE FOHMD | see-thru | both | color | light-valve* projector | 1024x1024* | 127x66˚* | 46% | 6’ | ~1 million | h
Kaiser SIM EYE | see-thru | both | color* | CRT | 640x480 to 1280x1024 | 60x40˚ | 50-100% | 6’ | 200,000 | i
Honeywell IHADSS | see-thru | mono | gray-scale | CRT | 1280x1024 (one eye) | 40x30˚ | N/A | 6’ | see notes | j
Honeywell WFOV | see-thru | both | gray-scale | CRT | 1280x1024 | 80-110˚x60˚ | 17-67% | 6’ | ~200,000 | k
Honeywell MONARC | see-thru | both | gray-scale | CRT | 1280x1024 | 35-52˚x35˚ | 51-100% | 6’ | ~150,000 |
Fake Space BOOM2 | opaque | both | gray-scale | CRT | ~480x480 | 140x90˚* | 43% | ltd* | 35,000 | l
Fake Space BOOM2C | opaque | both | pseudo-color | CRT | 1280x1024 | 140x90˚* | 43% | ltd* | 74,000 | l
LEEP Cyberface 3 | opaque | mono | color | LCD† | 720x240 | 80˚ diag | N/A | ltd* | 9,740 |
† LCD display resolutions are typically quoted in “primary colored pixels”, meaning that each red, green, and blue pixel is counted individually, which inflates the quoted resolution by a factor of three relative to full-color pixels. The real resolution depends on the layout used for the color triads in the display. Dividing the horizontal resolution by 3, or both resolutions by √3, should give some idea of the real resolution. Keep this in mind when comparing CRT-based HMDs with LCD-based HMDs: CRTs are currently MUCH sharper.
3.2.7 Research Systems
1. Image source: Improving the resolution, color, size, etc. of the displays in the
HMD
2. Optics: Creating wide-field-of-view optical systems with large exit pupils, adequate
eye relief, minimal distortion and other aberrations, while minimizing size and
weight.
3. Mounting: Developing the part of the HMD worn on the head to make it
lightweight, comfortable, and easily adjustable.
A few of the many projects going on worldwide are listed below; this is by no means a
complete listing.
NASA-Ames
Stephen Ellis and Urs Bucher have developed an “electronic haploscope” for perceptual experiments aimed at better understanding depth perception in VEs. They use an HDTV monitor configured for NTSC to achieve a horizontal resolution of better than 4.5
arcmin/pixel with a monocular FOV of about 25˚. The CRT images are relayed to the user
by a partially silvered mirror. The haploscope has adjustments for overlap, vergence and
accommodation, and is mounted on a table similar to an optical bench. The display is used
in experiments for making depth comparisons between real and virtual objects.
HITLab (University of Washington)
The HITLab VRD (Virtual Retinal Display) uses modulated light from a red laser to scan out a computer-generated
image directly onto the retina of the user. An optical bench prototype has been
demonstrated with a 500x1000 monochrome image subtending a 45˚ monocular field of
view (approximate).
AFIT
The group at AFIT (Wright-Patterson AFB) has done research on HMDs since at least
1988, when Captain Robert Rebo built an LCD-based bicycle-helmet-mounted display,
described in [Rebo 88]. The third-generation HMD was designed by Jim Mills and Phil
Amburn and addresses the current lack of color CRTs for HMDs. Their approach is
similar to that used for projection TVs: Separate red, green and blue CRTs are optically
combined to achieve a high-resolution, high-contrast color image. A block diagram of the
optics is given in Figure 3.5.
[Figure 3.5: Block diagram of the optics: the separate red, green and blue CRT images are combined by mirrors (M1, M3, M6) and beam splitters (BS2).]
This system uses 1” CRTs, beam splitters, dichroic filters and mirrors to form a 640x480
image through the LEEP optics.
UCF
The HMD being developed at UCF will use holographic optical elements (HOEs) to
provide a lightweight, wide-field-of-view, see-through design. The system will warp the
image prior to display to maximize the FOV and concentrate detail near the center. The
optics then perform the inverse warping to create a correct, wide-FOV image. The HOEs
are based on duPont’s new holographic recording film materials. The warping is done
either in hardware or software by the image generator. The HOE is both lightweight and
inexpensive to fabricate, and could lead toward the goal of HMDs as comfortable as
eyeglasses. [Clarke 93]
• 30-degree FOV
• Resolution: 360x240 primary-colored pixels
• 100% Overlap
• Design goals: Rigid frame, adjustments for IPD, focus, fore-aft and up-down for
images, calibrated settings
A 60˚ FOV model using custom optics and color CRTs has been designed and should be
operational by late Fall ‘93.
FDE Associates: Working on a high-resolution, small color CRT. One of the problems
in building a tiny color CRT system is that a traditional three-gun system is too large to fit
into the neck of the small tubes used for such CRTs. Their idea is to use a single gun and a
moving shadow mask in order to selectively excite phosphor stripes of each color. FDE’s
goal is a 1000x1000, 1” color display.
Tektronix, Inc: Another approach to the color-CRT problem. The EX100HD uses a pair
of one-inch monochrome CRTs and color shutters to create frame-sequential color images.
The NuCOLOR shutter is an electrically switchable color filter made of fast optical LCD
switches and color filters that allow its color to change rapidly between red, green and blue.
A full color image is created by drawing the R, G and B fields of an image sequentially in
sync with the filters. The field rate for each individual color is 180 Hz, and the overall
frame rate is 60 Hz.
Final Note: There are probably a dozen research efforts underway trying to produce a
high-resolution, small, color display for use in VE. At the time of preparation of these
notes, some of the first such products are entering the marketplace; more will doubtless
make their debut by September ‘93.
3.3 Auditory Displays
Function: Provide auditory feedback to user that either replaces or augments auditory
input from real environment. System should ideally be able to present any acoustic
waveform to either ear in real time.
Types: Speakers mounted on head (earphones) or mounted in environment. The latter has
limited imaging capabilities.
Issues:
• User-specific sound modulation/head-related transfer functions (HRTFs)
• Echoic vs. anechoic sound
• Isotropic vs. non-isotropic sources
• Multiple vs. single sources: Processing power
• Sampled vs. computer-generated sounds
• Synchronization vs. playback speed
• Modeling reflective acoustic surfaces, Doppler shift and other physical characteristics
• Scheduling: Queued or interruptable
Commercial Systems
3D Sound:
• Crystal River Engineering Convolvotron- Stored spatial filters are convolved with
sound sources to produce up to four localized, isotropic sounds at a time. Simple
reflective environment can be modeled with additional hardware. $15,000.
• Visual Synthesis Audio Image Sound Cube- C-language library for MIDI-based
sounds and 3D positioning. $8,000.
Text-to-Speech Systems:
• AICOM Accent SA- PC-based text-to-speech synthesis board; $500 and up.
3.4 Haptic Displays
Types:
• Force-feedback joysticks
• Force-feedback arms
• Force-feedback exoskeletal devices for hand, arm, other
• Tactile displays
• shape-changing devices:
• shape memory actuators
• pneumatic actuators
• micro-mechanical actuators
• vibrotactile
• electrotactile
Issues:
• Sensors scattered throughout human body- no localized display possible as with
vision, etc.
• Physical characteristics of objects must be modeled: Hardness, texture, temperature,
weight, etc.
• Collision detection must be done in real time
• Hard surfaces require high frequency response, non-linear damping
• User input must be measured and reflected in environment
• Grounded vs. ungrounded displays
• Degrees of freedom (DOFs) of device vs. human
• Safety
Commercial Systems
Tactile
• Xtensory Tactools XTT1- Small transducers create tiny vibrations or impulses; basic
system has one tactor and costs $1,500; system can support up to 10.
• Telesensory Systems Opticon- A vibrotactile display for blind reading; 20x5 pins on
1.5”x.75” area; $3495.
• TiNi Corp Tactors- Shape-memory alloy tactile stimulators, points and arrays.
Monitor and 3x3 tactile display cost $7000.
• Digital Image Design Cricket- See entry under 3D mice in Section 5.3.
Kinesthetic/Proprioceptive
Research Systems
• UNC Argonne Remote Manipulator- This force-feedback arm can output three forces
and three torques at the handgrip where the user holds it and has a working volume of
about one cubic meter. It has been used in a number of different studies; most
recently for finding a minimum energy docking configuration for drugs in the active
site of a protein molecule [Ouh-Young, Beard and Brooks 90] and for feeling the
atomic-scale peaks and valleys on a surface imaged by a scanning-tunneling
microscope [Taylor et al. 93].
• Rutgers Portable Dextrous Master with Force Feedback- This system uses micropneumatic actuators placed in the palm of a VPL DataGlove to create a light, simple, and relatively inexpensive manual force-feedback device. The air-piston actuators are grounded at the palm and give feedback to the thumb, index, and middle fingers [Burdea et al. 92].
Prediction: Haptic displays will remain more application-specific than other VE displays
for the foreseeable future.
4.1 Introduction
Most commercial 3D graphics systems are sold as desktop or deskside workstations. Unfortunately, the demands that VE and immersive displays place on image generators are quite different from those of typical desktop graphics uses on CRTs.
• VE demands high frame rates.
• Due to the high frame rates, graphical models tend to be small. There is no need
to support arbitrarily large datasets.
• Low-resolution video, rather than the standard 1280 x 1024, is the norm for driving HMDs. Most require NTSC or PAL.
• Usually two double-buffered channels are necessary for stereo HMDs.
• Windowing systems are not necessary.
These differences make it difficult to use many of the excellent graphics accelerators on the
market. Many of the low and mid-level desktop workstations do not provide NTSC, either
standard or as an option. Most cannot produce two channels. Thus they would require two
graphics heads and two scan converters to be useful. This often pushes the price to the
levels of the high-end image generators.
This section first reviews graphics performance, illumination, and shading. We then survey
systems specifically for VE, and the subset of commercial image generators that can
provide good service for HMDs, followed by a section on experimental graphics systems.
We conclude with the special demands of immersive VE on image generators.
4.2 Graphics Performance
Polygon performance and price are probably the first considerations when choosing an image generator. Not only is VE work very demanding of performance, but the image generator is likely to be the most expensive part of a system for VE. Other factors besides raw
polygon performance must then be considered, such as capabilities for texturing. As we
shall see, for some applications, texturing can make for a much richer visual environment.
The improvement in realism afforded by texturing is often greater than that provided by
more complex geometry. Most flight simulators render few polygons, but can create very
rich scenery with textures.
We have found that it takes at least 15 frames per second to provide user comfort in a head-
mounted display. A higher frame rate, say 20 to 25 fps, is noticeably better. When using
stereo, the work of rendering is essentially doubled. This means that we'd like to render at
least 30 frames each second and would prefer 50. We would also like to maintain low
latency, but do not have much control over this factor in most image generators. Typically a
higher frame rate results in lower latency.
Using our two frame rates, we can get an idea of the rendering performance that we'd like
to achieve.
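Combining these target frame rates with a per-frame scene budget gives the required triangle throughput. Below is a minimal Python sketch; the 5,000-triangle scene is the middle example used in the cost analysis that follows, taken here as an assumed budget.

    # Required triangle throughput for comfortable HMD use. The 5,000-
    # triangle scene is the middle example from the cost analysis below.
    triangles_per_frame = 5000

    for fps in (15, 25):                 # minimum and preferred frame rates
        mono = triangles_per_frame * fps
        stereo = 2 * mono                # stereo roughly doubles the work
        print(f"{fps} fps: {mono:,} tris/s mono, {stereo:,} tris/s stereo")
    # 15 fps: 75,000 mono / 150,000 stereo
    # 25 fps: 125,000 mono / 250,000 stereo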
Since published benchmarks, with one notable exception, do not take into account the
complex nature of models to be rendered, be sure to carefully take your particular needs
into account when comparing image generators.
4.2.1 The cost of graphics performance
It is naive to think that one can render a rich scene at 30 frames per second on a workstation
without graphics acceleration hardware. To illustrate this fact, let us examine the number of
operations that it takes to render a triangle (this analysis is adapted for our needs from
[Molnar & Fuchs 90]).
Figure 4.1 illustrates the classical graphics pipeline (for more information, refer to the
references listed below). This particular case is for Gouraud shading and z-buffering, the
standard for current graphics hardware.
[Figure 4.1: The classical graphics pipeline: trivial reject/backface culling, lighting, viewing transformation, clipping, and rasterization.]
Total Cost
Let us assume that we are rendering, in stereo, the middle example shown above, the 5000
triangle dataset. For simplicity, let us assume that these are rendered as individual triangles,
though the cost would be reduced by using triangle strips. Assume that half of the triangles
will be trivially rejected. Summing the per-stage floating point costs gives a total of approximately 42 megaflops for a relatively simple dataset. While
some of the newer microprocessors promise this level of performance, you'd be hard
pressed to achieve it in reality. Furthermore, note that we have assigned no cost to database
traversal, or to the application program. As a practical matter, it takes multiple processors to
achieve this level of front-end performance.
The back-end rasterization costs are also very high. To clear the frame buffers, each z (one
32 bit word) and color value (perhaps a 32 bit word) must be written, for a total of 2 × 640
× 480 × 2 = 1,228,800 32-bit writes to the frame buffer. Let us assume that the average
size of our triangles is 100 pixels, and that one half of them is visible. Since some pixels
are initially visible, and are then obscured by another pixel, let us assume that 3/4 of the
pixels must perform the complete z-buffer access, while the remaining 1/4 will just read the
z value.
Using forward differencing, calculation of a z value takes one addition. Therefore, our
invisible pixels cost 2 × 625 × 100 = 125,000 integer additions and the same number of
frame buffer cycles. The remaining initially visible pixels cost four additions per pixel and
three frame buffer cycles. The total costs for each stereo frame are 1,625,000 additions and
2,845,800 frame buffer cycles. When multiplied by 15 frames per second and two
channels, the total costs are 48 million additions, and 84 million word-wide frame buffer
cycles per second.
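To make the arithmetic above easy to check, here is a sketch of the per-frame tally in Python. The cycles-per-pixel constants are one plausible accounting (read z for a rejected pixel; read z, write z, write color for a visible one); the cycle total quoted above implies a slightly different accounting, so treat these constants as assumptions.

    # Back-end cost sketch for one 640x480 stereo frame, following the
    # assumptions above: 100-pixel triangles, half of the 5,000
    # trivially rejected, 3/4 of pixels surviving the z test.
    eyes = 2
    clear_writes = eyes * 640 * 480 * 2   # one z word + one color word
    pixels = eyes * 2500 * 100            # 500,000 pixel events
    visible = 3 * pixels // 4             # read z, write z, write color
    hidden = pixels // 4                  # fail the z test: read z only

    adds = hidden * 1 + visible * 4       # z by forward differencing, etc.
    cycles = clear_writes + hidden * 1 + visible * 3
    print(f"{adds:,} additions, {cycles:,} frame-buffer cycles per frame")
    # -> 1,625,000 additions (matching the text); the cycle total
    #    depends on the per-pixel accounting chosen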
All high performance graphics accelerators use parallelism in one form or another. The first
stage of the pipeline that needs parallelism is rasterization and frame buffer access. These
functions are usually performed with specialized hardware. At some point the
transformation, lighting, and clipping become the bottleneck and must also be parallelized.
4.2.2 Antialiasing
Even with the poor quality of current head-mounted displays, aliasing is clearly visible.
With better displays in the future, antialiasing will be necessary. We see the trend toward
antialiasing in high-end graphics systems of today such as the Reality Engine from Silicon
Graphics. Given the normal progression of features we expect to see lower cost systems
with strong antialiasing capabilities soon. Note that antialiasing is very expensive
computationally.
4.3 Illumination Models
The illumination models supported by most graphics hardware and software consist of the following components:
• Ambient - models a source of light that is non-directional and arrives from all directions. The ambient light term is a simple approximation to the light reflected by objects in the environment.
• Diffuse - models the dull reflections of light from matte surfaces due to point or directional light sources. The brightness of a surface depends on the angle between the surface normal and the light, and is independent of viewing angle.
• Specular - models the highlights seen on shiny surfaces. Unlike the diffuse term, its brightness depends on the viewing angle as well as the light direction.
The illumination models may be calculated for each vertex, or perhaps only once per
polygon, depending on the shading model to be executed during rasterization. Note that
these models are fairly simple, and do not include the effects of reflections of lights within
an environment. Global illumination lighting models take into account the contribution of
light from other objects in the environment.
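As an illustration, here is a minimal Python evaluation of the ambient-plus-diffuse model at a single vertex. The names and constants are ours for illustration, not any particular library's API.

    # Minimal evaluation of the ambient + diffuse model at one vertex.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        m = dot(v, v) ** 0.5
        return tuple(x / m for x in v)

    def illuminate(normal, to_light, k_ambient=0.2, k_diffuse=0.8):
        # Diffuse brightness depends only on the angle between the
        # surface normal and the light direction; the viewing angle
        # plays no part in these two terms.
        n, l = normalize(normal), normalize(to_light)
        return k_ambient + k_diffuse * max(0.0, dot(n, l))

    print(illuminate((0, 0, 1), (0, 1, 1)))  # ~0.766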
Radiosity Illumination
Radiosity methods calculate global illumination based on physical models of reflection and
can provide very realistic images. They are too computationally expensive to be calculated
in real-time on most graphics systems, as is necessary for HMDs. However, the
illumination is view independent, so it may be pre-computed off-line if the graphical database is static.
4.4 Shading
By shading, we refer to algorithms used to calculate color for individual pixels. The color
is normally based on illumination models that determine the effects of light on geometry.
• Flat shading - This is the simplest shading model. It was used in early systems,
and is still used in low-end systems because the computational requirements are
much lower. Only one normal vector must be computed per polygon to calculate
the lighting, and no color interpolation is performed. Unfortunately, this causes
very objectionable effects because the color of polygons changes at polygon
edges.
• Gouraud shading - This is the common shading model provided by graphics
systems today. Normal vectors at each vertex are used to compute lighting, but
only the color is interpolated during rasterization. The interpolation avoids the
color discontinuities at polygon edges.
• Phong shading - A lighting model is calculated at each pixel using a normal
interpolated across the polygon from normals at the vertices. This shading
method produces much more accurate specular highlights than those possible
with Gouraud shading. Commercial systems typically do not perform Phong
shading at the frame rates required for VE.
Texture Mapping
Texture mapping refers to the mapping of an image to geometry. Usually the image is
stored in dedicated memory on the graphics accelerator. Extra processing must be done to
properly map the images onto the polygons. Hardware texture mapping is a feature found
mostly on high-end graphics accelerators. In many simulations, the sense of realism
achieved by texture mapping can replace many polygons.
The most realistic textures are obtained from hardware capable of tri-linearly interpolating
pre-filtered images. The usual technique used is known as Mip mapping and is due to
Williams [Williams 83]. Some hardware point-samples the image-based texture, with a
resulting loss in image quality. Note that for highest quality, textures should also be
corrected for perspective distortion.
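The idea behind Mip-map level selection can be sketched in a few lines: pick the two pre-filtered levels that bracket the pixel's texel footprint and blend between them (bilinear filtering within each level plus this blend gives the tri-linear interpolation mentioned above). The footprint value below is assumed to be precomputed, and exact hardware rules vary.

    import math

    # Sketch of Mip-map level selection and blending [Williams 83].
    # 'footprint' is the assumed linear size, in texels, of a screen
    # pixel's projection onto the texture.
    def mip_levels(footprint, max_level):
        d = max(0.0, min(math.log2(max(footprint, 1.0)), max_level))
        lo = int(d)                      # finer of the two maps...
        hi = min(lo + 1, max_level)      # ...and the next coarser one
        return lo, hi, d - lo            # plus the blend fraction

    # A pixel covering ~6 texels falls between levels 2 and 3:
    print(mip_levels(6.0, max_level=8))  # (2, 3, ~0.585)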
Shadows
Shadows can provide visual cues that enhance the effects of an image. Unfortunately,
computing shadows can be quite expensive. Using pre-computed radiosity illumination,
one can obtain shadows for static datasets. To provide shadows for dynamic datasets, it is
possible to use a two-pass z-buffering algorithm [Williams 78].
4.5 Graphics Libraries
4.5.1 Retained mode
This type of API is well suited for applications that do not change the graphics database very much. A disadvantage of retained mode that is often cited, the fact that image complexity is limited by memory size, is not a problem for work in VE, since the size of the dataset is limited by frame rate. Retained-mode systems can load the display list onto the graphics accelerator, thus largely divorcing rendering from the main CPU. The best known retained-mode API is PHIGS+, which supports a hierarchical display list.
The main disadvantage of retained mode is that, for some applications, you must replicate the database. For example, the 3DM application [Butterworth et al. 1992] at UNC (a 3D MacDraw-like tool for use under the HMD) maintains an application-specific database and issues PHIGS editing commands. This results in two independent copies of the dataset.
4.5.2 Immediate mode
Immediate mode is well suited for data that change a great deal every frame, such as a mesh of triangles describing an airplane wing during a simulation. It certainly involves the host CPU, so that processor must be fast enough both to run the application and to generate the polygons to be rendered. The best known immediate-mode API is IRIS GL from Silicon Graphics. An evolution of the GL into an open standard called OpenGL is underway. The standard is now under the control of a committee with a broad base of industry representation. Many manufacturers have agreed to support OpenGL.
4.6 Image Generation Systems
4.6.1 Performance Specifications
Of course, the best way to compare computers in general is to benchmark them on the problems that one is interested in solving. Since this is rarely possible, the next best approach is to compare the results of standard benchmarks executing code similar to that necessary to solve the target problems. This has been done for many years with CPUs.
Results for the standard Picture Level Benchmark (PLB) suite are reported in units called GPCmarks, and can be literal (instructions followed exactly), or optimized (some optimizations were performed to increase performance). GPCmarks are calculated as
GPCmarks = normalization factor / elapsed time (in seconds)
The normalization factor is meant to capture the difficulty of the benchmark. Higher GPCmarks indicate lower elapsed time and higher performance. Of the standard benchmark suite, three are applicable to VE. They are:
head - a scanned human head of 60,000 triangles in triangle strips lit by four directional
light sources. The head rotates around the vertical axis for 240 frames. The difficulty
factor is 4800.
shuttle - A space shuttle rendezvous with a satellite. The models consist of a mix of
triangle strips, quad meshes, and polygons. There are also 2283 markers used to
represent stars. The difficulty factor is 4000.
studio - An architectural walkthrough of a radiosity-lit artist’s studio modeled with
7518 quads. A total of 300 frames are rendered during the walkthrough. The difficulty
factor is 2500.
Note that the standard benchmarks are computed using a 900 by 720 window, somewhat
larger than the normal NTSC resolution. Unfortunately, not all vendors supply PLB
specifications.
4.6.2 Commercial VE Systems
The systems in this section are targeted specifically for work in VE with HMDs. Unlike systems for the desktop graphics market, these feature, as standard, the NTSC video necessary to drive the current HMDs. These firms intend to sell complete systems, including the image generator, HMD, tracker, software, and sometimes a sound generator.
Division
Division provides hardware and software for VE. They sell complete standard
configurations, as well as custom systems. The Division software is also available for use
on SGI image generators.
Test conditions:
• Triangles are 100 pixel, 24-bit color.
• Textured triangles are 100 pixel, 24-bit color, point sampled.
Notes:
• Performance is per eye.
• Host processor is an Intel 80486 running UNIX System V.
• Fill rate of 100VRT is 8M pixels per eye.
• Supervision rendering system consists of a communications ring (for each eye),
supporting renderers based on i860s. A frame grabber for generating composite
images of live and synthetic video is an option.
• The rendering board set (3 PC cards) used in the 100VRX is sold separately as
the dView for approximately $13,000. This set generates two channels of video.
Sense8
The main focus of Sense8 is their software for VE, the WorldToolKit. However, they resell the SPEA Fire, a PC-peripheral graphics board that uses an i860 for rendering. The WorldToolKit also runs on SGI hardware.
Test conditions:
• Polygons are 100 pixel, 24-bit color.
• Textured polygons are 100 pixel, not perspective corrected, 24-bit color.
Notes:
• 8 Mb SPEA Fire memory upgrade available for $800.
VPL
VPL is developing the Microcosm system hosted by a Macintosh Quadra 950 with graphics
accelerator boards (manufactured by Division, but sold exclusively by VPL). The graphics
boards are essentially the dView boards (see above) with a Macintosh rather than a PC
interface. The current pre-production system includes an Eyephone LX HMD (see section
3.2.6), and a Polhemus Fastrak (see section 5.2.2.1).
4.6.3 Commercial Image Generators
By far the largest market segment in 3D graphics is for desktop workstations. The typical uses of these workstations are in fields such as computer-aided design, where the performance demands are much different from those in VE. Since these workstations are sold in moderately high volume and there is much competition for the market, price/performance tends to be good.
There are some problems with using desktop graphics systems for VE, however. One is
that the standard video provided on workstations is not NTSC (which is used for most
HMDs). To obtain NTSC, optional video cards or expensive scan converters must be
purchased. Since a good scan converter costs upwards of $20,000US, it is not an attractive
solution. Furthermore, generating two channels of video to provide stereo for the two
displays in a typical HMD is another problem. The low-end workstation systems are not
equipped with multiple channels of output video. The combination leaves many good
systems, especially in the middle of the performance range, unusable.
There have been custom solutions to the problem of generating two channels of video for
HMDs: For example, Folsom Research, a manufacturer of high quality scan converters
(see section 4.7.6) has built custom scan-converter / video-splitter products for customers.
Unfortunately, these are not cheap.
Silicon Graphics
Silicon Graphics (SGI) is the best known vendor in the graphics workstation arena. They
have a range of processor products from the Indigo with a MIPS R3000 CPU, to the Onyx
with a maximum of 24 MIPS R4400 CPUs. Matching graphics processors range from the
XS24 to the RealityEngine2.
The high-end systems are all very good for work with HMDs.
Test conditions:
• Triangle Mesh: 50 pixel, unlighted, flat-shaded, z-buffered
• Textured Triangles: 50 pixel triangles in a mesh, antialiased, Mip-mapped, tri-
linearly interpolated.
Notes:
• All provide 24 bits for color. Extra frame buffer memory for advanced shading.
• 32 bits for z, except for the VGXT which provides 24.
• At least four channels of NTSC are provided via a “video splitter” option at a cost of $19,000US. The price for a splitter is included in the table.
• The Reality Engine is primarily an image generator for generating textured map
polygons.
• SGI software is specialized for real-time operation.
• Crimson CPU is a MIPS R4000 benchmarking at 60.5 SPECfp92, and 58.3
SPECint92.
The new Indigo2 Extreme is a very interesting desktop graphics system with performance
of 450,000 triangles per second (meshed as above). It is priced at $35,000. An option
called Galileo and priced at $6,500 is available that can provide NTSC for the Extreme.
However, unlike the splitter, it provides only one channel. Two Extremes cannot be placed in one Indigo2. It is possible to use two networked Indigo2 Extremes, one for each eye.
However, this may not be an attractive option because of the tight synchronization that
must be performed to switch both frame buffers simultaneously.
At the time this report was prepared, the PLB results for the model 3200 were available.
They are:
           Literal    Optimized
head       239        -
shuttle    78.2       102.5
studio     198.3      -
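Assuming the GPCmark formula given in section 4.6.1, these scores can be converted back into elapsed times and frame rates; the frame counts come from the benchmark descriptions above (the shuttle frame count is not given there).

    # Converting the model 3200's PLB results back to frame rates,
    # assuming GPCmarks = normalization / elapsed_time.
    benchmarks = {           # name: (difficulty factor, frames, GPCmark)
        "head":    (4800, 240, 239.0),
        "shuttle": (4000, None, 78.2),   # frame count not given above
        "studio":  (2500, 300, 198.3),
    }
    for name, (norm, frames, marks) in benchmarks.items():
        seconds = norm / marks
        line = f"{name}: {seconds:.1f} s"
        if frames:
            line += f" -> {frames / seconds:.1f} frames/s"
        print(line)
    # head: ~20.1 s (~12 fps); studio: ~12.6 s (~24 fps)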
Manufacturer’s specifications:
Test conditions:
• Triangle strips contain 25 pixel triangles, flat shaded.
• Polygons of size 100 pixel, Gouraud shaded.
• Textured polygons are in a quad mesh, sized 50 pixels Mip-mapped, tri-linearly
interpolated.
Kubota
Recently Kubota announced the Denali line of graphics accelerators, designed to interface
to Digital Equipment Corporation workstations based on the Alpha microprocessor
architecture. Combined workstation/graphics systems are named Kenai.
Test conditions:
• Triangle strips contain 50-pixel Gouraud-shaded triangles.
• Textured triangles, of size 50 pixels, are perspective corrected, and point
sampled.
Notes:
• NTSC and PAL video is provided via an optional card at $3,000 (one channel).
• Two accelerators must be used to generate stereo for HMDs. Provision for this is
made in the hardware and software.
• Prices are for one Kenai workstation, two Denali accelerators, and two NTSC
video options.
• Processor for the 3400 is a 133 MHz Alpha benchmarking at 75 SPECint92, and
112 SPECfp92.
• Processor for the 3500 is a 150 MHz Alpha benchmarking at 84 SPECint92, and
128 SPECfp92.
• The specifications were generated with only one graphics accelerator.
• Dual 24-bit frame buffers.
• 24 bits for z.
• The graphics accelerator model number designates the series, E, P, and V,
followed by the number of transformation and frame buffer modules: For
example, P510 is a P series accelerator with 5 transformation modules (TEM)
and 10 frame buffer modules (FBM). Transformation and frame buffer modules
may be added for higher performance, up to a limit of 5 FBMs and 3 TEMs for
the E series, 10 FBMs and 5 TEMs for the P series, 20 FBMs and 6 TEMs for
the V series.
• Both immediate (GL) and retained (PHIGS) mode APIs are available. The GL is
provided by a third party vendor, Nth Graphics. Support for OpenGL is
planned.
4.6.4 Experimental
Pixel-Planes 5
The first full-size prototype system built by the graphics group of the University of North
Carolina at Chapel Hill was Pixel-Planes 4 [Poulton et al. 1987]. That system had a frame
buffer with a 1-bit processor for every pixel on the display. It served as the main graphics
engine at UNC for many years, and was used to drive several HMDs.
However, it became apparent that a processor for every pixel was not a good way to utilize
computing power when primitives typically covered only a small portion of the screen.
This led to the design and construction of Pixel-Planes 5 [Fuchs et al. 89], which is a much
more modular machine. A block diagram is shown in Figure 4.2.
[Figure 4.2: Block diagram of Pixel-Planes 5. Each Renderer is a 128x128-processor SIMD "computing surface" based on custom, logic-enhanced memory chips; each Renderer also contains two other full-custom chips, a controller for the SIMD array and a data "corner turner". Each Renderer can rasterize over 120K Phong-shaded triangles per second, and over 100K spheres per second. A system may contain from one to 25 or more Renderers. The system can accept a variety of Frame Buffers and other external interfaces; currently these include HiRes (1280x1024, 74 Hz, 24-bit) and NTSC (640x512, 30 Hz, 24-bit). Applications are supported by a Host Interface to a Sun4 and a HiPPI interface. A video frame grabber is under construction.]
PixelFlow
To provide an architecture that is scalable with respect to the number of primitives, one can
provide multiple communications paths for primitives. This is possible, but since primitives
can generate pixels on any part of a display, there must still be a way to allow a primitive to
influence any pixel. This is done by sorting pixels instead of primitives via a composition
network.
[Figure 4.3: Four rendering pipelines, each with geometry (G) and rasterization (R) stages, feeding a tree of compositors (C) that combine the partial images into the final image.]
The system shown in Figure 4.3 consists of four separate graphics accelerators with
geometry (G) and rasterization pipelines (R), each rendering a complete z-buffered image,
but only of a subset of the primitives. Each of these pipelines looks very much like an older
graphics system without much parallelism. The images produced by the individual
pipelines are composited by a combining network to generate the final image. The data
flowing down the tree contains depth, as well as color. The composition network performs
z-buffering to generate the final image. To add more performance, one need only add more
graphics processing nodes. The bandwidth of the composition network remains fixed and
is determined by the product of the frame buffer size, number of subsamples for
antialiasing, and the required frame rate. For ease of physical implementation, PixelFlow
will have a linear composition network instead of a tree.
Another bottleneck encountered when trying to increase the rendering rate is the host and the host-to-image-generator link. We expect to drive PixelFlow with a parallel host and parallel data
streams to the rendering modules. The software model will generate primitives in parallel.
We call this distributed immediate mode and have prototyped it on Pixel-Planes 5 by using
some GPs as host processor nodes.
4.7 Special Requirements for VE
4.7.1 Input/Output
This is not much of a problem. Most commercial trackers interface via RS232, and one or
more RS232 interfaces are standard on most computers. Trackers that require a higher data
rate use more complex I/O interfaces, such as IEEE488. Usually these interfaces must be
accommodated by adding interface cards.
Output to non-graphical devices, such as those to generate audio, also tends to be at low
data rates. RS232 interfaces accommodate most of these needs.
4.7.2 Shared Virtual Worlds
Stereo is often used with head-mounted displays. That alone requires two channels of
video. We may also want to support two or more users wearing HMDs. The performance
demands on one image generator can grow drastically, though multiple channels are often
used in flight simulators. We support two persons wearing HMDs on Pixel-Planes 5, but
would find it difficult to sustain the pixel traffic necessary to add more HMDs on one
machine.
Another way to support shared virtual worlds is to use separate image generators attached
by a network. Changes in the dataset are transmitted over the network. This approach splits
the intensive rendering and pixel traffic, but limits the dynamics of the dataset to what can
be exchanged between machines. It seems better to use a tightly coupled parallel machine
with multiple image generation pipelines (a SGI Onyx, for example).
4.7.3 Potential z-buffering problems
Since the hither plane for work using HMDs is often very close to the user and objects can be far away, a deep z-buffer may be necessary. Keep this in mind when choosing an image generator. Twenty-four bits should be adequate for most work, as the sketch below suggests.
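Here is a sketch of the underlying arithmetic, using the standard perspective z-buffer mapping; the near/far distances are assumed values for illustration.

    # Why HMD work with a close hither plane wants a deep z-buffer.
    # For a standard perspective z-buffer, the smallest resolvable
    # depth step at distance z is approximately
    #     z^2 * (far - near) / (near * far * 2^bits)
    def z_step(z, near, far, bits):
        return z * z * (far - near) / (near * far * 2 ** bits)

    near, far = 0.1, 100.0          # meters; assumed for illustration
    for bits in (16, 24):
        step_mm = z_step(10.0, near, far, bits) * 1000
        print(f"{bits}-bit z: step at 10 m is {step_mm:.2f} mm")
    # 16-bit: ~15.24 mm at 10 m; 24-bit: ~0.06 mm, hence adequate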
4.7.4 Low latency
Latency is a very big problem for HMD systems. We can divide latency into that caused by
the tracking device, and that caused by the image generator. Unfortunately, latency and
throughput in an image generator are often at odds. One can use pipelining of successive
frames to increase throughput. We do that on the standard PHIGS system on Pixel-Planes
5. This, of course, increases latency. Reducing the latency by eliminating pipelining also
reduces the utilization of resources, and consequently the throughput.
Note that the minimum latency is normally determined by the scanout of the display.
Usually latency is measured back from the last scan line of the display. This fixes the
minimum latency of an NTSC display at one field time, 1/60 of a second.
4.7.5 Correction for optical distortion
The wide-field-of-view optics used in HMDs tend to distort the images by a large amount [Robinett & Rolland 92]. There are several ways to correct this. One is optical, but this solution can increase the weight of a HMD to an unacceptable amount. Another way is to correct by using analog circuitry in the scanning drivers for the display (if it is a CRT). This may be difficult, especially if the distortion is a complex function of screen position. A third option is to prewarp the image in software before it is displayed; a sketch follows.
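A minimal sketch of such a software prewarp, assuming a purely radial distortion modeled with a cubic term (the form used in [Robinett & Rolland 92]); the coefficient here is invented, and a real implementation would fit it to the actual optics and invert the fitted model more carefully.

    # Sketch of software predistortion for HMD optics. Radial
    # distortion is modeled as r_displayed = r + k * r**3, with r
    # measured from the optical axis; prewarping with an opposite-
    # signed cubic approximately cancels it. k is an assumed value.
    def predistort(x, y, k=-0.18):
        r2 = x * x + y * y        # normalized coords, axis at (0, 0)
        s = 1.0 + k * r2          # scales r by (1 + k r^2) = r + k r^3
        return x * s, y * s

    print(predistort(0.5, 0.5))   # point pulled toward the axis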
4.7.6 Video
Scan converter
A scan converter acts as a format converter for video. It accepts as input a video stream in
one format, and produces as output a video stream in a second format. For our purposes,
the input is HiRes video, such as used in graphics workstations, and the output is NTSC.
Scan converters are expensive, so are not a very practical way to generate video for HMDs.
Composite video
Some NTSC frame buffers generate video with three channels of color, one for each of
red, green, and blue. To use this video for typical HMDs, it must be converted to
composite video. Devices, called encoders, are available to perform this function. Some are
relatively inexpensive with prices in the low hundreds of US dollars. Others, of high
quality, can cost as much as $8000. We have found encoders priced under $1000 to be
adequate for the current HMDs.
4.7.7 Combining camera video and synthetic imagery
Chroma keyer
A chroma keyer combines two video streams in a controlled fashion. One video stream, called the "key", is gated to the output by default. However, the key stream is constantly examined. If a specific color, called the "chroma key color", is detected, the second video stream is inserted into the output, replacing the first.
Thus to selectively mix video, one can use the image generator's output as the key and
make the default background color the chroma key color. Camera video can be used as the
second video input to the keyer. The result is that any objects generated synthetically appear
superimposed on the image of the "real world."
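In per-pixel terms the keyer's rule is simple, as the sketch below shows. Real keyers match a tolerance band around the key color rather than one exact value, and the blue key color here is an assumption.

    # Per-pixel sketch of the chroma keyer described above: the image
    # generator's output is the "key" stream; wherever it shows the
    # chroma key color, the camera pixel is passed through instead.
    CHROMA_KEY = (0, 0, 255)      # assumed background color (pure blue)

    def keyed_pixel(synthetic, camera, key=CHROMA_KEY):
        return camera if synthetic == key else synthetic

    # A synthetic object pixel survives; background reveals the camera:
    print(keyed_pixel((200, 30, 40), (90, 90, 90)))   # (200, 30, 40)
    print(keyed_pixel((0, 0, 255), (90, 90, 90)))     # (90, 90, 90)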
This, of course, does not result in a correct "see-through" HMD, but it approaches one. As with any see-through HMD, tracking and latency problems will be especially noticeable [Bajura et al. 92].
z-buffering
If one could calculate the distance of objects imaged by the video camera, one could
combine the z information with that generated for the synthetic objects to compute visibility
for the complete scene, real and virtual. Unfortunately, there are difficult research problems
to be solved to compute the distance of objects, and the computational demands to compute
the visibility are high.
4.8 Further Reading
For information on the general subject of graphics, the following text is the standard reference:
Foley, J. D., A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice, Addison-Wesley, 1990.
5.1 General
5.2 Trackers
5.2.1 Principles of Tracking
Trackers are fundamentally sensors whose task is to detect the position and/or orientation of an
object and make the information available to the rest of the VE system. The most common
function is to report the position and orientation of the user’s head. Hand tracking is also
common.
For head or hand tracking, there are six types of motion that may be tracked:
• Translation in x, y, z
• Rotation about the x/y/z axes: Roll, pitch and yaw.
Because these motions are mutually orthogonal, there are six independent variables or
degrees of freedom (DOFs) associated with any asymmetrical 3D object. These six
numbers are the minimum required to completely specify the position and orientation of a
rigid object4. A particular tracker may monitor all six or some subset, depending on the
implementation. In addition, some trackers monitor only over a limited range of a
particular variable. For example, a tracker might detect roll only in the range ±90˚, or x, y,
and z only within a sphere of one meter in radius.
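These six numbers fully determine a rigid transform. As a sketch, here is how a tracker report (x, y, z, roll, pitch, yaw) maps a point fixed on the tracked object into world coordinates; rotation order and sign conventions vary from tracker to tracker, so the ordering below is an assumption.

    import math

    # Map a point fixed on the tracked object into world space from
    # the six DOFs described above. Conventions are assumed.
    def rigid_transform(p, x, y, z, roll, pitch, yaw):
        def rot(a, b, degrees):       # rotate the (a, b) pair in its plane
            t = math.radians(degrees)
            return (a * math.cos(t) - b * math.sin(t),
                    a * math.sin(t) + b * math.cos(t))
        px, py, pz = p
        py, pz = rot(py, pz, roll)    # roll:  rotation about x
        pz, px = rot(pz, px, pitch)   # pitch: rotation about y
        px, py = rot(px, py, yaw)     # yaw:   rotation about z
        return px + x, py + y, pz + z

    print(rigid_transform((1, 0, 0), 0, 0, 0, 0, 0, 90))  # ~(0, 1, 0)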
Types:
• Optical- light from a fixed source or from an object is imaged by a camera device;
some number of sources and detectors is used to calculate the transformation to the
object.
4 Since the hand is not a rigid object, some systems add additional degrees of freedom to account for the
joint angles for each finger, as discussed later.
• Inertial- accelerometers and gyroscopes are used to detect changes in linear and
angular velocity, respectively. These devices are useful for predictive tracking when
coupled with other technologies, but are not currently used for full, 6-DOF tracking
systems.
Issues:
• Update rate: How many measurements are made each second. This can limit how
often we can update our display of the VE. Low update rates lead to jerky,
unconvincing virtual worlds.
• Delay/Lag/Latency: How much time elapses from the moment a user moves until
the tracker data reflecting that movement is received by the host?
• Accuracy: This is the amount of error in the measurement. Usually given as a bound
on the magnitude of the error or as an average error amount. A tracker that has an
accuracy of 0.1” will report positions that are (in theory) ±0.1” from the actual
position.
• Resolution: This is the smallest amount of the quantity being measured that the
instrument will detect. A movement smaller than the tracker’s resolution will not be
reflected in its output.
• Interference/Distortion: All trackers except for inertial systems are subject to either
interference (such as blocking of the line of sight) or distortions (such as field
distortions in magnetic trackers) which can reduce the accuracy or produce gross
errors.
• Range: Working volume and angular coverage. Absolute trackers all have limits on
working volume; many systems have limited angular range as well.
• Robustness: Is the tracker subject to gross errors when its operating environment is
degraded?
• DOFs Measured: Some trackers measure only a subset of the six DOFs cited above.
• Safety: Does use of the tracker pose a long-term (or short-term) health risk?
5.2.2 Commercial and Research Systems
5.2.2.1 Magnetic Trackers
Magnetic trackers typically consist of a control unit, some number of transmitters (or sources) and some number of receivers (or sensors). The transmitter radiates a magnetic field which is sensed by the receiver, whose measurements are used by the control unit to derive the six DOFs for that receiver.
Advantages:
• No line-of-sight constraints (particularly well-suited for hand tracking)
• Impervious to acoustic interference
• Receivers are generally small and unobtrusive
Disadvantages:
• Distortion/Interference due to metallic objects
• Current systems are very accurate only in small volume
• Cable connection required
Polhemus
Ascension
Characteristics in common:
• Range given is the maximum transmitter-receiver distance; i.e., how far you can get
from the transmitter. The working volume diameter is therefore roughly twice this
distance.
The table below is a condensation of the complete specs for each product and is thus only a
rough guide to each product’s characteristics. Consult the manufacturer’s information for
full specifications.
manufacturer | name | max # sensors | range (feet) | max update rate | static accuracy (xlate, rot) | resolution (xlate, rot) | price ($US) | price w/ two sensors | notes
Polhemus | Fastrak | 4 | 10’ | 120 Hz† | 0.03” rms, 0.15˚ rms | 0.0002” per inch*, 0.025˚ | $5,750 | $6,250 | a
Polhemus | Tracker (discontinued) | 4 | 5’ | 60 Hz† | 0.1” rms, 0.5˚ rms | 0.023” avg, 0.1˚ | N/A | N/A | b
Polhemus | Isotrak (discontinued) | 1 | 3’ | 58 Hz | 0.25” rms, 0.85˚ rms | 0.006”/inch, 0.35˚ | N/A | N/A | c
Polhemus | Isotrak II | 2 | 5’ | 60 Hz† | 0.1” rms, 0.75˚ rms | 0.0015 per inch*, 0.01˚ | $2,875 | $3,375 |
Delay Measurements
[Figure: Measured tracker delay in milliseconds (0-70 ms scale) for the Fastrak (1 unit), Flock of Birds (1 unit), 3Space (1 unit), Bird (1 unit), Fastrak (2 units), Flock of Birds (2 units), and the optical ceiling tracker.]
Notes:
1) Shaded area represents communication delays.
2) Flock of Birds not in optimum communication configuration- the IEEE-485
interface should reduce the communication delay depicted above
significantly.
3) Fastrak timing with position and orientation filter on.
Although more optical trackers have been developed than any other technology type, they
are not in common use for general-purpose VE systems. Most of the commercial systems
are employed for military applications.
Types:
• Outside-In Trackers- Sensors are fixed in the environment, where they image some number of beacons mounted on the object to be tracked; the beacon measurements are used to derive the object’s position and orientation (see the triangulation sketch following this list).
• Inside-Out Trackers- Sensors are mounted on the object to be tracked, where they image some number of fixed beacons in the environment. The position and orientation are derived from a similar set of measurements as in the outside-in system. See Figure 5.3.
• Natural environment trackers- No beacons are used. Sensors are mounted on the
object to be tracked and the system derives the 6D information by imaging the
environment and looking at image shifts from one frame to the next. Multiple
sensors can be used to cover the six degrees of freedom. No complete
implementation of this concept has yet been done to our knowledge. Partial
implementations are described in [Bishop 84] and [Tanner 84].
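The following is a minimal sketch of the geometric core of beacon-based tracking in the outside-in configuration: each camera sighting of a beacon defines a ray, and the beacon is estimated as the point nearest both rays. The camera positions and the beacon location are invented for illustration; real systems add camera calibration, lens models, and several beacons per rigid body to recover orientation as well.

    import numpy as np

    def triangulate(p1, d1, p2, d2):
        """Estimate a beacon's position from two camera rays p_i + t_i * d_i."""
        d1 = d1 / np.linalg.norm(d1)
        d2 = d2 / np.linalg.norm(d2)
        # Least-squares t = (t1, t2) minimizing |(p1 + t1*d1) - (p2 + t2*d2)|.
        A = np.stack([d1, -d2], axis=1)           # A @ t = p2 - p1
        t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)
        q1 = p1 + t[0] * d1                       # closest point on ray 1
        q2 = p2 + t[1] * d2                       # closest point on ray 2
        return 0.5 * (q1 + q2)                    # midpoint is the estimate

    # Example: two fixed cameras sight an LED beacon at (0.5, 0.5, 2.0).
    p1 = np.array([0.0, 0.0, 0.0])
    p2 = np.array([1.0, 0.0, 0.0])
    beacon = np.array([0.5, 0.5, 2.0])
    print(triangulate(p1, beacon - p1, p2, beacon - p2))   # ~ [0.5 0.5 2.0]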
Advantages:
• Can be very fast
• Accurate for small volume
• Immune to magnetic, acoustic interference/distortion
Disadvantages:
• Line-of-sight restriction
• Angular range is often restricted
• Interference due to other light sources
• Current inside-out implementations are cumbersome
• Difficult to track multiple objects in same volume
Note: Some of the trackers listed can be used in larger working volumes with a
proportionate decrease in accuracy.
The UNC optical ceiling tracker [Wang 90] has a linear resolution of 2 mm and an angular resolution of 0.2˚. The update rate varies from 20 to 100 Hz depending on the number of LEDs visible. The delay also varies, but is typically around two frames, or 30 msec.
Mechanical trackers are in fairly wide use in many different application areas, from stereotactic surgery to atomic research. The basic idea for a 6 DOF mechanical tracker is that there must be at least six orthogonal joints, one for each degree of freedom, and the joint angles must be measured accurately. These measurements, coupled with knowledge of the linkage geometry, allow very accurate tracking, as the sketch below illustrates.
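A minimal sketch of that computation, assuming a hypothetical six-joint arm whose rotation axes and link offsets we invent for illustration: each joint-angle reading contributes one rotation, composed with the fixed link offsets as 4x4 homogeneous transforms.

    import numpy as np

    def rot(axis, theta):
        """Homogeneous rotation by theta (radians) about a local x, y, or z axis."""
        c, s = np.cos(theta), np.sin(theta)
        i, j = {'x': (1, 2), 'y': (2, 0), 'z': (0, 1)}[axis]
        R = np.eye(4)
        R[i, i] = R[j, j] = c
        R[i, j], R[j, i] = -s, s
        return R

    def trans(x, y, z):
        """Homogeneous translation along a rigid link."""
        T = np.eye(4)
        T[:3, 3] = [x, y, z]
        return T

    def forward_kinematics(readings, linkage):
        """Pose of the tracked end from six joint-angle readings plus the
        known linkage geometry (per-joint axis and following link offset)."""
        T = np.eye(4)
        for theta, (axis, offset) in zip(readings, linkage):
            T = T @ rot(axis, theta) @ trans(*offset)
        return T                                  # 4x4: orientation + position

    # Hypothetical boom geometry: axes chosen so all six DOFs are covered.
    linkage = [('z', (0, 0, 0.3)), ('y', (0.4, 0, 0)), ('y', (0.4, 0, 0)),
               ('z', (0, 0, 0.1)), ('y', (0, 0, 0.1)), ('x', (0.1, 0, 0))]
    pose = forward_kinematics(np.radians([30, -45, 60, 10, -20, 5]), linkage)
    print(pose[:3, 3])                            # tracked position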
Advantages:
• Accurate
• Can serve as counter-balance to hold display
• Can be used for force feedback
• No magnetic, line-of-sight, acoustic interference constraints
Disadvantages:
• Linkages can be intrusive, heavy, and have high inertia
• Difficult to track multiple objects in same volume
• Difficult to implement for large volume
• Central part of working volume is inaccessible for some trackers
• Hard to keep DOFs orthogonal- gimbal-lock problem
The ADL-1 is a low-cost, low-latency mechanical tracker. Its maker claims an update rate of 300 Hz, an accuracy of 0.2 inches in position, and a resolution of 0.025”. The working range is a cylinder 1.5 feet high with a radius of 1.5 feet.
Fake Space
Fake Space’s tracker is typically sold with its display, as described in Section 3.2.6. Its
range is a cylinder of radius 2.5 feet and height of 2.5 feet, with a 1-foot excluded inner
core. Update rate is 60 Hz; latency is determined by RS-232 communication time.
Translational and rotational accuracy figures are not given; they quote the shaft encoders for each joint at 4000 counts per 360 degrees, i.e., an angular resolution of 0.09˚ per joint.
Acoustic trackers derive the pose of an object from ultrasonic signals passed between a set of transmitters and receivers.
Types:
• Time of flight (TOF): Send sound from a set of transmitters to a set of receivers and measure the elapsed times. These times, plus the known geometry of the transmitters and receivers, are used to derive the position and orientation (see the trilateration sketch following this list).
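A minimal sketch of the TOF position computation, assuming four transmitters at invented positions and a single receiver (orientation would come from several receivers fixed on the tracked body): each time of flight gives a range sphere, and subtracting one sphere equation from the others linearizes the intersection.

    import numpy as np

    SPEED_OF_SOUND = 343.0                        # m/s; varies with air density

    def position_from_tof(transmitters, tofs):
        """Locate a receiver from times of flight to four or more fixed,
        non-coplanar transmitters."""
        P = np.asarray(transmitters, dtype=float)
        r = SPEED_OF_SOUND * np.asarray(tofs)     # ranges from elapsed times
        # Subtract the first sphere equation |x - P_i|^2 = r_i^2 from the rest:
        A = 2.0 * (P[1:] - P[0])
        b = np.sum(P[1:]**2, axis=1) - np.sum(P[0]**2) + r[0]**2 - r[1:]**2
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return x

    # Example: four ceiling transmitters; receiver actually at (0.2, 0.3, 1.0).
    tx = [(0, 0, 2.5), (1, 0, 2.5), (0, 1, 2.5), (1, 1, 2.4)]
    true = np.array([0.2, 0.3, 1.0])
    tofs = [np.linalg.norm(true - np.array(p)) / SPEED_OF_SOUND for p in tx]
    print(position_from_tof(tx, tofs))            # ~ [0.2 0.3 1.0]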
Advantages:
• No electro-magnetic fields
• Can be implemented fairly cheaply
Disadvantages:
• Limited range
• Subject to acoustic interference, changes in air density
• Line-of-sight restriction
3D Mice
• Polhemus 3Ball- Polhemus tracker sensor (6 DOF) mounted in a billiard ball
equipped with a single push-button for initiating actions in VE. This is a commercial
version of the UNC 3D mouse cited in [Brooks 86].
• Ascension Bird 3-Button Mouse- Ascension Bird sensor (6 DOF) mounted in a 3-
button mouse housing.
• Logitech 3-D Mouse- Ultrasonically tracked mouse with 5 push buttons; operates in either 2-DOF or 6-DOF mode
• SimGraphics Flying Mouse- Magnetically tracked mouse with 3 push buttons; operates in either 2-DOF or 6-DOF mode
• Gyration GyroPoint- 3 DOF mouse with 5 buttons. Uses gyroscopes to detect
changes in orientation.
• Digital Image Design Cricket- Prototype hand-grip device that can have a tracker sensor mounted inside; has a tactile display (vibration), pressure feedback at the trigger and grip, and a directional thumb button that registers pressure and direction.
Speech Recognition
Issues:
• Speaker-dependent/independent
• Vocabulary size
• Continuous speech vs. discrete words
• Grammar (restrictions on inputs to system)
• Effects of background noise
• Speaker enunciation
• Speech-to-text ↔ Text-to-speech
Other Detectors
• ARRC/Airmuscle Datacq II- Input device to the ARRC/Airmuscle Teletact II.
Forces generated by gripping an object are measured and recorded with this device
for later display with the Teletact II glove.
• CM Research DTSS X/10- See entry in Section 3.4 for temperature sensing.
6. Acknowledgments
Information on commercial systems came from the vendors themselves as well as from summaries in [BBN 92] and [Pimental & Teixeira 92]. We thank Andy Hamilton from Division, John Toomey from Silicon Graphics, Rob Coleman from Evans and Sutherland, and Jeff Unthank from Kubota for answering many questions.
Information on trackers came from vendors, cited references, and surveys by [Meyer et al.
92], [Bhatnagar 93], [Wang 90], and [Ferrin 91].
Thanks to Fred Brooks for his comments, Linda Houseman for her help in gathering data
for these notes, and Sherry Palmer for editorial assistance.
The authors, of course, remain responsible for any errors in these notes.
7. References
Bajura, Michael, Henry Fuchs, and Ryutarou Ohbuchi. 1992. Merging Virtual Objects
with the Real World. Computer Graphics: Proceedings of SIGGRAPH 92, 203-210.
BBN. 1992. Bolt, Beranek and Newman Report No. 7661: Virtual Environment
Technology for Training. Prepared by The Virtual Environment and Teleoperator
Research Consortium affiliated with MIT. March 1992.
Bhatnagar, Devesh. 1993. Position trackers for head mounted display systems: A
survey. UNC Technical Report TR93-010.
Bishop, Gary, and Henry Fuchs. 1984. The Self-Tracker: A smart optical sensor on
silicon. 1984 conference on advanced research in VLSI. MIT. 1/23/84.
Brooks, Frederick P. Jr. 1986. Walkthrough: A dynamic graphics system for simulating virtual buildings. Proc. 1986 Workshop on Interactive 3D Graphics. Chapel Hill.
Bruce, Vicki, and P. Green. 1990. Visual perception: Physiology, psychology and
ecology. Lawrence Erlbaum. E. Sussex, U.K.
Butterworth, Jeff, Andrew Davidson, Stephen Hench, and Marc T. Olano. 1992. 3DM: A Three Dimensional Modeler Using a Head-Mounted Display. Proc. 1992 Workshop on Interactive 3D Graphics, 135-138.
Deyo, Roderic, and D. Ingebretsen. 1989. Notes on real-time vehicle simulation. ACM SIGGRAPH '89 Course Notes: Implementing and interacting with real-time microworlds.
Fuchs, Henry, John Poulton, John Eyles, Trey Greer, Jack Goldfeather, David Ellsworth,
Steve Molnar, Greg Turk, Brice Tebbs, and Laura Israel. 1989. Pixel-Planes 5: A
Heterogeneous Multiprocessor Graphics System Using Processor-Enhanced
Memories. Computer Graphics: Proceedings of SIGGRAPH 89, Vol. 23, No. 3, 79-
88.
Holloway, Richard, Henry Fuchs, and Warren Robinett. 1992. Virtual-World Research
at the University of North Carolina at Chapel Hill as of February 1992. Proceedings of
Computer Graphics International. Japan. June.
Hornbeck, Larry. 1989. Deformable-mirror spatial light modulators. Proc. SPIE Volume
1150: Spatial Light Modulators and Applications III. August, San Diego.
Iwata, Hiroo. 1990. Artificial reality with force-feedback: Development of desktop virtual
space with compact master manipulator. Proc. ACM SIGGRAPH 1990.
Levine, Martin D. 1985. Vision in man and machine. McGraw-Hill. New York.
Molnar, S., and H. Fuchs. 1990. Advanced Raster Graphics Architecture. In Foley, J. D., A. van Dam, S. K. Feiner, and J. F. Hughes, Computer Graphics: Principles and Practice. Addison-Wesley.
Molnar, S., J. Eyles, and J. Poulton. 1992. PixelFlow: High-Speed Rendering Using Image Composition. Computer Graphics: Proceedings of SIGGRAPH 92, 231-240.
Ouh-Young, Ming, D. V. Beard and F. P. Brooks, Jr. 1989. Force display performs
better than visual display in a simple 6-D docking task. Proceedings of IEEE 1989
Robotics and Automation Conference. Scottsdale, Ariz.
Piantanida, Tom, D. Boman, and J. Gille. 1993. Human perceptual issues and virtual
reality. Virtual Reality Systems. 1:1.
Pimental, Ken, and K. Teixeira. 1992. Virtual Reality: Through the new looking glass.
Intel/Windcrest/McGraw-Hill. New York.
Robinett, W., and J. P. Rolland. 1992. A Computational Model for the Stereoscopic Optics of a Head-Mounted Display. Presence, 1:1.
Tanner, J.E., and C. Mead. 1984. A correlating optical motion detector. In 1984
conference on VLSI.
Wang, Jih-Fang. 1990. A real-time optical 6D tracker for head-mounted display systems.
PhD dissertation. University of North Carolina, Chapel Hill.