0% found this document useful (0 votes)
119 views9 pages

Laszip: Lossless Compression of Lidar Data: Martin Isenburg

Download as pdf or txt
Download as pdf or txt
Download as pdf or txt
You are on page 1/ 9

LASzip: lossless compression of LiDAR data

Martin Isenburg
LAStools
http://laszip.org

Abstract—Airborne laser scanning technology (LiDAR) makes not possible to seek within the file. Addressing these concerns
it easy to collect large amounts of point data that sample the the ASPRS created a simple binary exchange format - the LAS
elevation of the terrain beneath. The LAS format has become the format [1]. It is now the de facto industry standard for storage
de facto standard for storing and distributing the acquired points.
As the sampling density of LiDAR increases so does the size of and distribution of airborne and mobile LiDAR data.
the resulting files. Typical LAS files contain tens to hundreds of Up to LAS 1.3, each point record has a core 20 bytes of
millions points today, but soon billions will be commonplace. which 12 bytes store the x, y, and z coordinate as signed
We describe a completely lossless compression scheme for integers. The header of a LAS file contains scaling factors
LiDAR in binary LAS format versions 1.0 to 1.3. Our encoding for those integers that specify the precision (e.g. such as
and decoding speeds are around one to three millions points per
second and our compressed files are only 7 to 25 percent of the 0.01 for cm and 0.001 for mm). The other 8 bytes store
original file size. Compression and decompression happen on-the- intensity, scan angle, return count, classifications, etc. This
fly in a streaming manner and random-access is supported with a completes the basic point type 0 (for details see Table I). The
default granularity of 50,000 points. A reference implementation point types 1 and 3 add an 8 byte GPS time and the point
unencumbered by patents or intellectual property concerns is types 2 and 3 provide 6 bytes to store an RGB color. The
freely available with an LGPL-license, making the proposed
compression scheme suitable to become part of the LAS standard. LAS 1.3 specification introduced the point types 4 and 5 that
allow attaching full waveform information to each return (with
controversial design choices) but they are not used much.
I. I NTRODUCTION One of the great features of the LAS format is that it stores
Low flying aircrafts equipped with modern laser-range scan- the coordinates as scaled and offset integers—thereby requir-
ning technology (LiDAR) collect precise elevation information ing the producers to think about the actual precision in their
for entire cities, counties, or even states. Shooting 100,000 or scanned data samples and to choose appropriate increments
more laser pulses per second onto the earth’s surface they such as 0.01 meters (or feet) for storing the coordinates. This
take measurements at resolutions exceeding one point per eliminates the unnecessary if not disastrous bloat of double-
square meter. Derivatives of this data such as digital elevation precision floats or 20 digit ASCII representations where 15
maps are subsequently used in numerous applications: to of the 20 digits are really just scanner noise. The absence of
assess flood hazards, plan solar and wind installations, carry incompressible noise makes it possible to efficiently compress
out forest inventories, aid in power grid maintenance, etc. the LiDAR points in a completely lossless manner.
However, the sheer amount of LiDAR data collected poses Generic compression schemes are not well suited to com-
a significant challenge as not millions but billions of elevation press LiDAR because they do not have the insights into the
samples need to be stored, processed, and distributed. structure of the data to properly model the probabilities of
The scanner records the waveform of the returning reflection certain patterns to occur. The WinZIP compressor does not
for each laser pulse that it sends out. The intensity peaks of compress well while the WinRAR compressor is extremely
this waveform correspond to points that were hit by the laser slow. Neither scheme is suited for streaming or for random-
and that reflected significant portions back to the sensors on access decompression, which means the entire file needs to be
the plane. There can be multiple peaks because the laser may decompressed before its contents can be accessed.
hit several surfaces such as wires or antennas, branches, leaves, In this paper we introduce LASzip, a lossless compressor for
or even birds in flight before reaching the ground. Each peak LiDAR stored in the LAS format. It delivers high compression
above a certain threshold is called a return. The coordinates rates at unmatched speeds and supports streaming, random-
of these returns together with intensity, scan angle, GPS time, access decompression. The source code is available with
return number, flight line ID, etc. are the data of interest. LGPL-license and was integrated into the open source libraries
Fresh off the scanner, the LiDAR data is typically stored LASlib of LAStools [2] and libLAS [3]. There is native
in a binary, vendor-specific format. But to exchange the data support for reading and writing LAZ in FME 2012, TopoDOT,
between users and across different software packages it was VoyagerGIS, and LAStools and others are following. LAZ is
traditionally converted into a simple ASCII representation used internally at USACE, Certainty3D, Watershed Sciences,
where each line was listing the attributes of a single return. Riegl, and others. Data providers such as Open Topography
While flexible and easy to understand, storing millions (or now provide LAZ as an compressed download option [4] and the
billions) of LiDAR returns in a textual format is cumbersome: DNR of Minnesota hosts LiDAR for 40 counties exclusively
the file size grows large, parsing the data is inefficient, and it is in LAZ format with plans to complete the entire state [5].
name of atomic item size point type and size
II. BACKGROUND 0 1 2 3 4 5
Before describing the LASzip compressor some preliminar- point attributes format size 20 28 26 34 57 63
ies about coordinate precision, the LAS format, related work POINT10 20 bytes x x x x x x
in point compression, and entropy and difference coding. X int 4 bytes x x x x x x
Y int 4 bytes x x x x x x
A. Floating-Point Precision vs. Integer Precision Z int 4 bytes x x x x x x
Intensity u short 2 bytes x x x x x x
There is a common miss-conception that floating-point rep- Return Number 3 bits 3 bits x x x x x x
resentations provide more precision than integer presentations Number of Returns of Pulse 3 bits 3 bits x x x x x x
for storing the x, y, z coordinates of a point. They do not. Scan Direction Flag 1 bit 1 bit x x x x x x
Edge of Flight Line 1 bit 1 bit x x x x x x
The coordinates of LiDAR points from an airborne or a Classification u char 1 byte x x x x x x
mobile survey are spread out in the x-y plane with uniform Scan Angle Rank u char 1 byte x x x x x x
distribution, in the sense that there are roughly the same User Data u char 1 byte x x x x x x
Point Source ID u short 2 bytes x x x x x x
number of points per square meter everywhere and that the
points are acquired with roughly the same precision. There GPSTIME10 8 bytes x x x x
will not be one particularly dense area where points need to GPS Time double 8 bytes x x x x
be stored with higher precision. The floating-point format is
RGB12 6 bytes x x x
not designed for storing uniform distributions of numbers. Red u short 2 bytes x x x
Storing a number in floating-point representation means that Green u short 2 bytes x x x
the precision of the number will vary depending on the value of Blue u short 2 bytes x x x
that number. The closer it gets to zero the more precision it will WAVEPACKET13 29 bytes x x
have. This makes it a good format, for example, for numerical Wave Packet Descriptor Index u char 1 byte x x
computations where more precision is needed closer to zero. Bytes Offset to Waveform Data u int64 8 bytes x x
But using the floating-point format to store point coordinates Waveform Packet Size in Bytes u int 4 bytes x x
Return Point Waveform Location float 4 bytes x x
means that there is an increasingly precise spacing of data X(t) float 4 bytes x x
samples around one point—the origin—at the expense of an Y(t) float 4 bytes x x
increasingly imprecise spacing farther away. Z(t) float 4 bytes x x
An example: in single-precision floating-point there are 223 TABLE I
LAS ZIP GROUPS THE ATTRIBUTES OF THE POINT TYPES 0 TO 5 OF THE
different numbers to represent a coordinate between 2 and LAS 1.3 FORMAT INTO FOUR ATOMIC ITEMS : POINT10, GPS10, RGB12,
4 meters with a spacing of 2/223 = 0.00000023841 meter, AND WAVEPACKET13 THAT ARE THEN COMPRESSED SEPARATELY.
there are 223 different numbers to represent a coordinate
between 128 and 256 meters with a spacing of 128/223 =
B. The LAS format
0.00001525 meter, there are 223 different numbers to represent
a coordinate between 524, 288 and 1, 048, 576 meters with To facilitate the exchange of LiDAR data between data
a spacing of 524, 288/223 = 0.0625 meter, and of course vendors, users, and different software packages, the ASPRS
there are also 223 different numbers to represent a coordinate created LAS as a simple binary exchange format [1]. A LAS
between 2, 097, 152 and 4, 194, 304 meters with a spacing of file of the 1.0 - 1.3 family consists of a header that can be
4, 194, 304/223 = 0.25 meter. If you notice the pattern you followed by any number of variable length records before the
already know that there will also be 223 different numbers actual point data begins. The first 227 header bytes define the
to represent a coordinate between 4, 194, 304 and 8, 388, 608 content of a LAS file: the number of variable length records,
meters with a spacing of 4, 194, 304/223 = 0.5 meter. the offset to the start of the points, the type and size of each
In summary, if you store the easting and northing of your point, the number of points, the offsets and scale factors for the
coordinates directly in floating-point they may retain just 0.5 integer point coordinates, and a bounding box that describes
meters of precision. If you subtract a constant offset from your the extends in x, y, and z of all points in the file.
coordinates so the origin falls into the middle of the bounding In LAS 1.3, where each point can have an attached wave-
box, then the samples near the origin are stored with incredible form, there are 235 header bytes. The extra 8 bytes describe the
precision ... much much more than LiDAR has. start of the waveform data. If this field is zero the waveforms
The appropriate format for storing the coordinates of LiDAR are stored in an external WDP file. If this field is non-zero
points are properly scaled and offset integers. They offer the waveforms are stored inside the LAS file after the point
much more uniform precision than a corresponding floating- block and the field contains the offset to the start of the
point value for the same number of bits: a 32 bit integer, waveform data. LASzip does not (yet) support including the
for example, offers 7 bits more uniform precision than a 32 waveform data inside the LAZ file but always writes it to
bit floating-point number [6] and similarly a 64 bit integer an external WDP file instead—at the moment uncompressed.
offers 10 bits more than a 64 bit float. In order to increase the There is, however, an (undocumented) option in place that will
coordinate range for a large-scale LiDAR collect, the correct compress the waveforms to a more compact WDZ file.
thing to do is to move from 32 bit to 64 bit integers. The LAS Full waveform LiDAR data in LAS 1.3 format is currently
format [1] uses scaled and offset 32 bit integers. only produced by one vendor. Apparently, the mechanism for
waveform storage was quickly added to the LAS standard grow with the number of points. Usually, progressive
to meet the needs of one hardware vendor without seeking streaming also falls into this category as decompressing
mutual consensus among all scanner manufacturers first. There the points to full precision requires keeping all previously
is almost no publicly available waveform data stored in the decompressed coarser points in memory.
LAS 1.3 format and there are only a few software products that • point-permuting versus order-preserving
can make use of the waveform data in LAS 1.3 files. Therefore Point-permuting schemes do not preserve the original
we postpone the details for full waveform compression for point ordering in the file. Their compression gains come
LAS 1.3 until it becomes more relevant. in large parts from imposing a clever canonical ordering
The point types 0 to 5 available in LAS 1.3 and the onto the points that results in small residuals. Order-
attributes they are composed of are detailed in Table I. The preserving schemes do not re-order the points. They
LASzip compressor views point types as compositions of compress more information about each point as they also
four different atomic items: POINT10, GPSTIME10, RGB12, need to specify one of n! possible point permutations.
and WAVEPACKET13 that are compressed separately. For • sequential versus random-access
example, a point of type 3 is composed of POINT10 followed Sequential schemes decompress the points in the order
by GPSTIME10 and RGB12. Additionally each point may they are encoded into the compressed file. Random-
have n so called “extra bytes” at the end, each of which is access schemes can seek in the compressed file and
currently considered as a BYTE item. These “extra bytes” only decompress a particular part. The granularity of the
occur when the LAS header specifies a point size larger than random access is typically limited to blocks of points.
required by the respective point type. For example if the point The LASzip compressor is lossless, non-progressive,
type is 1 and the point size is 32 then there are 4 “extra bytes”. streaming, order-preserving, and provides random-access.
C. Point Compression D. Related Work
The compression of points has been extensively studied in The seminal geometry compression paper by Deering [7]
the context of computer graphics where a point set typically is sparked the development of a number of compression schemes
a dense sampling of a three-dimensional object. We distinguish for meshes [8], [9], [10], [11], [12], [13] that can also be
the following qualities in a compression scheme: thought of as point compression schemes that encode addi-
• lossy versus lossless tional information (i.e. the mesh connectivity). Nearly all point
Lossy schemes compress the shape the points represent compression schemes assume that the original order of the
rather than the exact point coordinates by allowing their points is meaningless and permute them as they see fit during
positions to change slightly as long as they remain encoding to maximize the achieved compression. Because
faithful to the underlying surface. They are mainly used LASzip aims at compressing LAS files exactly—without any
in visualization-only applications. Lossless schemes com- modification—reordering the points is not an option.
press point coordinates represented with uniform preci- The kd-tree approach of Devillers and Gandoin [10], [14]
sion as scaled integers (in literature often referred to as recursively bisects a quantized bounding box along all three
“after bounding box quantization” or as “after quantizing axis always encoding the number of points in one half. The
to a certain number of bits per coordinate”) exactly. oct-tree approach of Botsch et al. [15] recursively entropy
• progressive versus non-progressive (or single-rate) codes for all eight child nodes whether they contain points or
Progressive schemes compress the data such that the de- not with an 8-bit symbol. Peng and Kuo [12] and Schnabel and
coder can immediately display a lower resolution version Klein [16] use prediction schemes to further improve the bit-
of the points while detail is added as the decompression rates of the oct-tree approach. Spatial subdivision approaches
progresses. They are mainly used for instant feedback have the drawback that they do not generalize to include
in an interactive visualization setting. Non-progressive attribute data such as a GPS time or an RGB color.
schemes have only one rate of resolution and decompress The method of Waschbüsch et al. [17] generates a binary
the points at full precision. They are mainly used as tree over the points by pairing close-by points that are replaced
an I/O friendly, alternate format of the point data for by their centroid to form the next coarser level. The method of
transmission or storage, or to take load of a file server. Gumhold et al. [18] incrementally constructs a prediction tree
• streaming versus non-streaming by greedily attaching the next point to the tree such that it has
Streaming schemes start compressing the points and out- the smallest possible residual and compress the tree topology
putting the compressed file after reading only a fraction and the residuals in a streaming fashion. Merry et al. [19]
of them and vice-versa start decompressing the points present a more elaborate prediction tree variation that uses a
and outputting the decompressed file after reading only globally minimal spanning tree and a set of predictors.
a fraction of the compressed data. The memory footprint Quite similar to LASzip are the commercially available
remains tiny in comparison to the data they process. Non- LizardTech R
LiDAR compressorTM [20] and the LASCom-
streaming schemes read all points into memory either pression software [21] that implements the method of Mongus
during compression, during decompression, or both. They and Zalik [22]. Both schemes specifically target LiDAR points
usually need to construct temporary data structures that stored in the LAS format and compress them losslessly.
By default the LizardTech’s LiDAR compressor [20] en- item has its own compressor with its own version number
codes points in blocks of 4,096, performing a simplified making the compressor modular and easy to extend to future
Haar wavelet transform on each array of point attributes point types. This document describes LASzip 2.0 that uses
individually. Pairs of attribute values are recursively replaced version 2 compression for all items except WAVEPACKET13.
by an average coefficient, which is simply the left value, and The earlier LASzip 1.0 uses version 1 compression exclusively
its corresponding detail coefficient, which is the right minus and does not allow random access decompression. While still
the left value. Because high-order bits of detail coefficients supported in software for backward compatibility, we do not
tend to be zero they can compressed efficiently bit-plane by describe the LASzip 1.0 compressor in this document.
bit-plane using arithmetic coding [23]. The 8 byte floating- The LASzip compressor always encodes the points in
point GPS time that is part of some point types is compressed chunks of points to allow seeking in the compressed file. The
using standard DEFLATE. Besides the compressed contents default chunk size is currently set to 50,000 points. Chunking
of the LAS file, the resulting MrSID file also stores spatial makes it possible to, for example, augment the produced LAZ
indexing information to support area-of-interest queries. files with spatial indexing LAX files [2] and support area-
The LASCompression software [21] operates very similar of-interest queries that decompress only the relevant parts of
to LASzip in the sense that it predicts the attributes of a a compressed LAZ file. Because each compressed chunk is
point from previous points with a set of prediction rules and different in size, the compressor stores a chunk table at the
compresses the corrective deltas with arithmetic coding. In end of the file that specifies the starting byte of each chunk.
particular, the authors use a clever scheme for predicting the When starting a new chunk, the LASzip compressor stores
linear dependencies between successive points that correspond the first point as raw bytes and then initializes the entropy
to returns from the same pulse by using the already encoded coder. This point is then used as the initial value for subsequent
deltas for x to improve the predictions of y and z. prediction schemes. All following points are compressed item
by item with the compression schemes detailed below.
E. Entropy and Difference Coding
A. Compressing POINT10 (version 2)
An entropy coder turns a sequence of symbols into a com-
pact stream of bits while using knowledge about the (uneven) First, the compressor encodes a bit-mask of 6 bits that
distribution of symbols to store them more compactly—up to specifies whether any of the following attributes have changed
the theoretical optimum. As the symbol distribution is often in comparison to the previously processed point:
not known in advance, an adaptive entropy coder initially • intensity: the 2 bytes that specify the unsigned 16 bit
assumes it to be uniform and learns the actual distribution intensity value i of the point stored in bytes 12 and 13.
along the way. When a symbol stream is expected to have • return bits: the 3 + 3 + 1 + 1 bits that specify the return
different distributions given “context” information available number r, the number of returns of given pulse n, and
to the compressor, it is beneficial to switch between different scan direction and edge of flight line stored in byte 14.
contexts while encoding the symbols. The entropy coder used • classification bits: the 5 + 1 + 1 + 1 bits that specify
by LASzip is based on a fast implementation of adaptive, the 5-bit ASPRS classification of the point c and the
context-based arithmetic coding by Amir Said [24]. synthetic, keypoint, and withheld flags stored in byte 15.
A difference coder compresses the current value as the • scan angle rank: the 1 byte that specifies the signed 8-bit
difference to a previous value. This is most effective when the scan angle a of the point stored in byte 16.
distribution of differences has a much tighter spread and there- • user data: the 1 byte that specifies the unsigned 8-bit
fore a much lower entropy than the distribution of values. The user data u of the point stored in byte 17.
difference coder used by LASzip entropy codes the number k • point source ID: the 2 bytes that specify the unsigned
that describe the tightest interval [−(2k − 1), +(2k )] that the 16-bit flight line p of the point stored in bytes 18 and 19.
difference falls into, entropy codes up to 8 of its highest bits Next, the compressor encodes all attributes that have
as one symbol while switching contexts for different k <= 8, changed. It encodes the return bits while switching between
and stores any remaining lower bits raw. 256 entropy contexts in dependence on the previous return
bits byte. The return number r and the number of returns of
III. T HE LAS ZIP COMPRESSOR given pulse n are used to derive two numbers that are used for
LASzip does not compress the LAS header or any of the switching contexts later: a return map m and a return level l.
variable length records. It simply copies them unmodified from The return map m simply serializes the valid combinations
the LAS to the LAZ file. It however adds 128 to the value of of r and n: for r = 1 and n = 1 it is 0, for r = 1 and n = 2
the current point type to prevent standard LAS readers from it is 1, for r = 2 and n = 2 it is 2, for r = 1 and n = 3 it
attempting to read a compressed LAZ file. It also adds one is 3, for r = 2 and n = 3 it is 4, for r = 3 and n = 3 it is
variable length record that specifies the composition of the 5, for r = 1 and n = 4 it is 6, etc. Unfortunately, some LAS
compressed points and various compression options used. files start numbering r and n at 0, only have return numbers
The LASzip compressor views the different point types of r, or only have number of return of given pulse counts n. We
the LAS 1.3 specification as compositions of items: POINT10, therefore complete the table to also map invalid combinations
GPSTIME10, RGB12, WAVEPACKET13, and BYTE. Each of r and n to different contexts as shown below.
file size compression enc. time dec. time file size compression enc. time dec. time
file name [MB] ratio [sec] [sec] file name [MB] ratio [sec] [sec]
LAS LAZ SID LAZ SID LAZ SID LAZ SID LAS LAZ LCMP LAZ LCMP LAZ LCMP LAZ LCMP
5126-05-57.las 287 21 68 13.4 4.2 4.1 85 20 62 5126-05-57.las 287 21 26 13.4 11.1 21 119 20 125
5126-05-58.las 312 29 80 10.6 3.9 4.7 97 22 72 5126-05-58.las 312 29 38 10.6 8.2 21 161 22 149
5126-05-59.las 363 45 104 8.1 3.5 5.9 120 25 100 5126-05-59.las 363 45 59 8.1 6.1 24 240 25 120
5126-05-60.las 287 21 68 13.5 4.2 4.1 87 20 58 5126-05-60.las 287 21 26 13.5 11.0 19 120 20 65
5126-05-61.las 286 21 68 13.4 4.2 4.1 87 20 58 5126-05-61.las 286 21 26 13.4 11.1 17 122 20 65
total 1,534 138 388 11.1 4.0 23 476 106 350 total 1,534 138 175 11.1 8.8 102 762 106 524
1942-29-59.las 486 83 144 5.9 3.4 18 242 35 156 1942-29-59.las 486 83 94 5.9 5.2 42 212 35 276
1942-29-60.las 485 81 142 6.0 3.4 15 246 33 154 1942-29-60.las 485 81 91 6.0 5.3 40 281 33 363
1942-29-61.las 480 80 140 6.0 3.4 14 234 33 153 1942-29-61.las 480 80 90 6.0 5.3 37 348 33 355
1942-29-62.las 464 77 135 6.0 3.4 13 224 31 143 1942-29-62.las 464 77 87 6.0 5.3 37 343 31 345
1958-23-23.las 539 86 156 6.3 3.4 14 268 37 173 1958-23-23.las 539 86 97 6.3 5.5 41 380 37 379
total 2,454 407 716 6.0 3.4 74 1,214 169 779 total 2,454 407 460 6.0 5.3 198 1,564 169 1,718
TABLE II TABLE III
P ERFORMANCE COMPARISON BETWEEN LAS ZIP (LAZ) AND THE P ERFORMANCE COMPARISON BETWEEN LAS ZIP (LAZ) AND THE
L IZARD T ECH L I DAR COMPRESSOR (SID) IN COMPRESSION RATIO AND LASC OMPRESSION CODER (LCMP) [21] IN COMPRESSION RATIO AND
ENCODING / DECODING TIMES FOR L I DAR OF THE M INNESOTA DNR [5]. ENCODING / DECODING TIMES FOR L I DAR OF THE M INNESOTA DNR [5].

The compressor then encodes the classification bits while


const U8 return_map_m[8][8] = switching between 256 entropy contexts depending on the
{ previous return classification byte. There is a potential to
{ 15, 14, 13, 12, 11, 10, 9, 8 }, improve compression further by switching contexts based on
{ 14, 0, 1, 3, 6, 10, 10, 9 }, the return map m as, for example, a single return is more
{ 13, 1, 2, 4, 7, 11, 11, 10 }, likely to be classified as “building” or “ground” whereas the
{ 12, 3, 4, 5, 8, 12, 12, 11 }, first return of many is more likely to be “vegetation” or “wire”.
{ 11, 6, 7, 8, 9, 13, 13, 12 }, We can expect a modest compression gain from this and plan
{ 10, 10, 11, 12, 13, 14, 14, 13 }, to implement this for compressing the new point types of the
{ 9, 10, 11, 12, 13, 14, 15, 14 }, recently released LAS 1.4 specification [1].
{ 8, 9, 10, 11, 12, 13, 14, 15 }
The LASzip compressor then encodes the scan angle rank
};
as the difference to the previous scan angle rank. It switches
The return level l specifies how many returns there have between two entropy contexts based on the scan direction flag.
already been for a given pulse prior to this return. Given only Next, LASzip encodes the user data while switching between
valid combinations for the return number r and the number of 256 entropy contexts in dependence on the previous user data
returns of given pulse n we could compute it as l = n − r. byte, before it encodes the point source ID as the difference
But we again use a completed look-up table as shown below to the previous point source ID. Remember that each of these
to map invalid combinations for r and l to different contexts. six attributes is only encoded if its value has changed.
Finally the compressor encodes the x, y, and z coordinates.
const U8 return_level_l[8][8] = Rather than compressing coordinates directly, LASzip predicts
{ them from previous points and entropy codes the difference.
{ 0, 1, 2, 3, 4, 5, 6, 7 }, For the x and y coordinate it uses a second order predictor:
{ 1, 0, 1, 2, 3, 4, 5, 6 }, it predicts the coordinate differences dx and dy between the
{ 2, 1, 0, 1, 2, 3, 4, 5 }, previous and the current point as the median of the five
{ 3, 2, 1, 0, 1, 2, 3, 4 }, immediately preceding differences of points with the same
{ 4, 3, 2, 1, 0, 1, 2, 3 }, return map m. The intuition behind this is, for example, that
{ 5, 4, 3, 2, 1, 0, 1, 2 }, single returns are always from a different laser pulse than the
{ 6, 5, 4, 3, 2, 1, 0, 1 }, previous point and therefore have a wider spacing in x and/or
{ 7, 6, 5, 4, 3, 2, 1, 0 } y than the middle of three returns.
}; For the z coordinate (the elevation) LASzip uses a first order
predictor: it predicts z as the elevation of the immediately
The LASzip compressor then encodes the intensity as a preceding point of the same return level l. The intuition is, for
difference to the most recent intensity with the same return example, that in a forested area a higher return level l signals
map m. The intuition behind this is that, on average, a single a deeper penetration into the forest canopy and therefore a
return (where r = 1, n = 1, m = 0) tends to have a different lower elevation. However, the first of a double return hitting a
intensity than the first return of a double return (where r = 1, power-line has the same return level as a single return hitting
n = 2, m = 1) or the last return of a triple return (where r = the ground. We can get a small compression gain from using
3, n = 3, m = 5). The compressor also switches between 4 the return map m instead of the return map l. We plan to
entropy contexts m = 0, m = 1, m = 2, and m > 3 to further implement this for compressing the new point types of the
correlate the expected differences in intensity distributions. recently released LAS 1.4 specification [1].
Fig. 1. The data-sets used in Tables II, III, and IV are provided by the DNR Minnesota [5]. Shown are various derivatives such as false-color elevation,
standard deviation of elevation, highest intensity, hill-shaded elevation, and point densities generated with lasgrid and blast2dem from LAStools [2].

B. Compressing GPSTIME10 (version 2) coded when the GPS times are identical. LASzip starts a
new context when the delta overflows a 32 bit integer. For
The GPS times of a single flight path are a monotonically
that it difference codes the 32 higher bits of the current GPS
increasing sequence of double-precision floating-point num-
time and the current context and stores the lower 32 bits raw.
bers where returns of the same pulse have the same GPS time
Otherwise it switches to the specified context (where the delta
and where subsequent pulses have a more or less constant
will not overflows a 32 bit integer) and recurses. The current
spacing in time. While the LASzip compressor is optimized
deltas stored with each context are updated to the actual delta
for compressing single flight paths it will handle any GPS time
when they were outside the predictable range more than 3
sequence. The compression ratio depends on how far the input
consecutive times (i.e. bigger than 500 times the current delta,
is from the expectations. For randomly permuted points it will
smaller than -10 times the current delta, or zero).
be terrible. For multiple flight paths that have been sorted into
Currently the LASzip compressor does not make use of
tiles one after another it will be excellent.
known data from the already compressed item POINT10
For compression purposes LASzip treats double-precision
for compressing GPSTIME10. However, if return counts and
floating-point GPS times as if they were signed 64 bit integers
point source IDs are populated correctly there is significant
and predicts the deltas between them. As prediction contexts,
correlation that can be exploited. For example, subsequent
it stores up to four previously compressed GPS times with
returns of the same pulse are likely to have the same exact
corresponding deltas. Keeping multiple prediction contexts
GPS time, and subsequent returns with different point source
can account for repeated jumps in GPS time that arise when
IDs are likely to require a context switch. We plan to exploit
multiple flight paths are merged with fine spatial granularity.
this when compressing the new point types of the recently
LASzip distinguishes several cases that are entropy coded
released LAS 1.4 specification which include the GPS time as
with 516 symbols depending on if the current GPS time is
an integral part of the point [1].
0 predicted with a delta of zero.
1–500 predicted using the current delta times 1 to 500. C. Compressing RGB12 (version 2)
501–510 predicted using the current delta times -1 to -10. LAS uses unsigned 16 bit integers for the R, G, and B
511 identical to the last. channel. Some files—incorrectly—populate only the lower 8
512 starting a new context. bits so that the upper 8 bits are zero. Other files—correctly—
513–515 predicted with one of the other three contexts. multiply 8-bit colors with 256 so that the lower 8 bits are zero.
For the first three cases LASzip subsequently difference codes The LASzip compressor therefore compresses the upper and
the delta prediction and the actual delta. Nothing further is lower byte of each channel separately. First it entropy codes 6
100.0
bits that specify which bytes have changed as one symbol. For
all bytes that have changed it then entropy codes the difference
to the respective previous byte modulo 256.
The channels are encoded in the order R, G, and B. 80.0

Differences encoded in earlier channels are used to predict


differences in later channels as there tends to be a correlation
in the intensity across channels—especially for gray colors.
60.0
For example, if there was a byte difference in the low byte
of the R channel that difference is added to low byte of the
G channel which—clamped to a 0 to 255 range—becomes
the value to which the difference of the current low byte is 40.0

computed.

D. Compressing WAVEPACKET13 (version 1) 20.0

The LASzip compressor supports compression of point 5K 10K 20K 50K 75K 100K
types 4 and 5 that contain wave packet information. However,
the current scheme is still in version 1 as it has yet to be uncompressed compressed size [MB]
file name size 5K 10 K 20 K 50 K 75 K 100 K
optimized. So far there has been very little real-world demand 5126-05-57.las 287 24.3 22.9 22.0 21.3 21.2 21.2
for compressing LAS files containing waveform data simply 5126-05-58.las 312 33.4 31.5 30.2 29.4 29.1 29.1
due to a lack of data stored in this format. 5126-05-59.las 363 51.0 48.2 46.2 44.8 44.5 44.1
5126-05-60.las 287 24.4 22.9 22.0 21.3 21.2 21.1
LASzip simply entropy codes the wave packet descriptor 5126-05-61.las 286 24.4 23.0 22.1 21.4 21.2 21.1
index, an unsigned byte that is zero if a point has no waveform total 1,534 157 149 143 138 137 136
and indexes the variable length record describing the format 1942-29-59.las 486 93.7 88.9 85.5 82.7 82.0 81.6
1942-29-60.las 485 91.6 87.0 83.9 81.2 80.5 80.1
of the waveform otherwise. To compress the bytes offset to 1942-29-61.las 480 90.2 85.7 82.6 80.0 79.3 78.9
waveform data it entropy encodes one of 4 possible cases: 1942-29-62.las 464 87.0 82.6 79.5 77.1 76.2 75.9
1958-23-23.las 539 96.8 92.0 88.6 85.9 85.1 84.6
1) same as last offset total 2,454 459 436 420 407 403 401
2) use last offset plus last packet size TABLE IV
3) difference to last offset is less than 32 bits T HE EFFECT OF DIFFERENT CHUNK SIZES 5,000, 10,000, 20,000, 50,000,
4) difference to last offset is more than 32 bits 75,000, AND 100,000 POINTS ON THE COMPRESSION RATES OF LAS ZIP.

In the first two cases no other information is needed. For


the other two cases LASzip difference codes the 32 or the IV. R ESULTS
64 bit numbers. The LASzip compressor difference coded All timings were taken on an old (2005) Dell Inspiron
all remaining fields. Only waveform packet size in bytes D6000 laptop with a 2.13 Ghz Intel processor and an even
is an integer number. The return point waveform location, older (2003) external LaCie 120 GB fire-wire drive. Encode
x(t), y(t), and z(t) are single-precision floating-point numbers timings include reading the uncompressed LAS file from
whose 32 bits are treated as if they were a 32-bit integer. the local disk (the disk cache was flushed) and writing the
compressed LAZ file to the external fire-wire disk. Decode
E. Compressing BYTE (version 2) timings include reading the compressed LAZ file from the
A LAS point may have “extra bytes” because the LAS
header specifies a point size larger than required by the re- LiDAR of Minnesota DNR by county size [GB] compression
spective point type. Each “extra byte” is entropy encoded with name # of files # of points LAS LAZ ratio
cottonwood 216 2,491,327,766 65.0 5.2 12.6
its own context as the difference to the previous “extra byte” douglas 252 2,092,702,039 54.6 5.4 10.2
modulo 256. Treating them as individual bytes is currently freeborn 260 1,713,544,294 44.7 3.7 12.1
the best that the LASzip compressor can do as there is no houston 197 1,450,109,156 37.8 3.9 9.8
jackson 240 2,724,531,642 71.0 5.6 12.6
description in the LAS 1.3 specification what these “extra lincoln 195 2,200,533,847 57.4 4.5 12.8
bytes” may mean. Six “extra bytes”, for example, could be martin 252 2,853,353,232 74.4 5.8 12.9
a single-precision float storing the echo width followed by murray 252 2,872,608,269 74.9 5.8 12.9
pope 247 2,404,624,049 62.7 5.3 11.9
an unsigned short storing the normalized reflectivity. Or it redwood 302 3,505,060,711 91.4 7.3 12.4
could be an unsigned short storing a tile index followed by an sibley 219 2,501,934,963 65.2 5.8 11.2
unsigned integer storing the original index of the point. The swift 282 2,931,687,204 76.4 5.8 13.2
total 2,914 29,742,017,172 776 64 12.1
recently released LAS 1.4 specification now officially has an
“Extra Bytes” variable length record to describe the structure TABLE V
LAS ZIP COMPRESSES 30 BILLION POINTS ( OR 12 COUNTIES WORTH OF
and the individual data types of “extra bytes” [1], which will L I DAR) FROM 776 GB OF LAS DOWN TO 64 GB OF LAZ FOR L I DAR
allow compressing them more appropriately in the future. HOSTED BY THE DNR OF M INNESOTA AT FTP :// LIDAR . DNR . STATE . MN . US .
external fire-wire disk and writing the uncompressed LAS file in Table VI with the number corresponding to the smallest file
back to the local disk. Reading and writing from different size (or the highest compression ratio) being in bold. We also
disks makes the process less I/O bound. Nevertheless, the CPU report the point type and loosely categorize the point order as
usage for LASzip averages only 50% to 70% on this laptop. both have an effect on the compression rates. Point order f
When decompressing LAZ files into memory (not measured means in flight-line order, point order x means sorted along
here) LASzip is entirely CPU-bound running at 99%. some axis, and point order t means some form of tiling.
The most common question about LASzip is how it com- The standard WinZIP algorithm is by far the worst per-
pares to the LizardTech R
LiDAR compressorTM that gen- former. This is noteworthy because WinZIP is still used by
erates the well-known MrSID format [20]. The results in several agencies to provide compressed LAS downloads. In
Table II show a comparison of the two compressors in terms contrast, the generic WinRAR algorithm does surprisingly
of compression ratio, encoding speed, and decoding speed well. While it takes a long time to compress, in terms of
on typical LiDAR tiles that are publicly provided by the compression rate it gives the dedicated LizardTech LiDAR
Minnesota Department of Natural Resources [5]. LASzip is compressor a run for its money. The LT compressor struggles
a clear winner over MrSID in all three performance measures. with point types 1 and 3 that contain an 8 byte floating-point
The compressed LAZ files are 2 to 3 times smaller than the GPS time, which it compresses with the inefficient DEFLATE.
compressed SID files, compression was 16 to 20 times faster, Again, LASzip gives the overall best compression ratio.
and decompression was 3 to 4 times faster. The encode (!) The data sets on which it is outperformed are usually those
times were taken at the Minnesota DNR on a Dell Precision sorted along an axis (i.e. point order x) or those with little
T3400 workstation with a 3.14 Ghz Intel processor. The de- oddities such as “Grass Lake Small”, for example, which has
code times were taken on the Dell laptop using the command- random values in its return number field and its classification
line lidardecode.exe version 1.1.0.2802 [20]. field or “line 27007 dd”, which has a strange z coordinate
We need to be a bit careful in comparing LASzip and scaling. Although not reported here, LASzip is across the
the LizardTech LiDAR compressor because they are aimed board the—by far—fastest algorithm for both compression and
at different work-flows: LASzip is designed to turn large LAS decompression.
files into more compact LAZ files for easier management,
faster transmission, and lower file system I/O. The LizardTech V. D ISCUSSION
product adds extra value by seamlessly integrating into the Compressing LAS with WinZIP, WinRAR, gzip, bzip, or
established raster work-flow of the MrSID file format and any other generic compressor not only means settling for larger
includes multi-resolution support for fast access to the point files and slower encoding and decoding speeds, but also means
data at global scale with an option for lossy compression. that no seeking is possible in the compressed file and that
In Table III we compare the performance of LASzip with the accessing any part of the LiDAR requires to completely de-
LASCompression software [21] that implements the algorithm compress the file first. LASzip allows you to treat compressed
described by Mongus and Zalik [22] on the exact same data- LAZ files just like standard LAS files. You can load them
sets. LASzip consistently achieves between 15 to 25 percent directly from compressed form into your application without
higher compression, encodes 7 to 8 times faster, and decodes needing to decompress them onto disk first. The availability of
5 to 10 times faster. These are end-to-end wall-clock timings two APIs, LASlib [2] and libLAS [3], with LASzip capability
taken under the exact same conditions for I/O performance of makes it easy to add native LAZ read/write support to your
the laptop / fire-wire disk configuration. While compression own software package. The LASzip source code is available
rates are comparable, LASzip clearly excels in speed. on the website indicated above.
The impact of different chunk sizes on the achieved com- The LASzip compressor is optimized for the case where
pression is illustrated in Table IV. Smaller chunk sizes mean points are stored more or less in scanner acquisition order in
less compression as the adaptive entropy coder resets at the the LAS file and the compression rates degrades the farther
start of each chunk and needs to relearn all symbol distribu- the file is from that assumption. If a LAS file is a tile that is
tions, which negatively affects compression. There is little to part of a larger tiling, the best compression rates are achieved
be gained from chunk sizes larger than 50,000 and there is no when the flight-lines that pass through the tile are kept in
reason for choosing chunk sizes smaller than 5,000 as LiDAR acquisition order (e.g. like lastile from LAStools does it).
is usually processed in increments of millions of points. Some LiDAR processing software disturbs the original point
A large-scale user of LASzip is the Department of Natural order and produces seemingly meaningless point permutations.
Resources of Minnesota that—at the time of writing—hosts When compressing large LiDAR collects to be offered via a
40 counties of publicly accessible LiDAR in LAZ format [5]. web server to a large audience it may make sense to first re-
An overview of the savings in data storage, transmission band- order the points of each tile into acquisition order (e.g. with
width, and download time for 12 of those counties is detailed lassort from LAStools).
in Table V. The 30 billion points that would take up 776 GB The compressed LAZ files can - just like the original LAS
if stored as LAS files compress down to 64 GB as LAZ. files - be used in conjunction with the small spatial indexing
Compression performance across a large smorgasbord of LAX files (that can be produced with lasindex). This
typical, experimental, as well as unusual LAS files is reported supports efficient area-of-interest queries when reading the
original LAS file point compression ratio total file size in MB
name size in bytes type order ZIP RAR LCMP SID LAZ LAS ZIP RAR LCMP SID LAZ
Grass Lake Small 123,876,781 0 x 2.6 6.1 7.0 6.9 6.2 118 46 19 17 17 19
LASFile 1 48,097,847 0 f 1.9 3.8 4.8 4.3 4.8 46 24 12 10 11 10
LASFile 2 44,168,907 0 f 1.9 3.8 4.9 4.5 4.9 42 22 11 9 9 9
LASFile 3 16,782,887 0 f 1.8 3.6 4.6 4.2 4.7 16 9 4 3 4 3
LASFile 4 48,471,887 0 f 1.9 3.9 4.9 4.5 5.0 46 24 12 9 10 9
LDR030828 212242 0 59,672,207 0 f 1.9 3.8 4.8 4.4 4.9 57 30 15 12 13 12
LDR030828 213023 0 58,414,787 0 f 1.9 3.8 4.8 4.5 5.0 56 29 15 12 12 11
LDR030828 213450 0 53,215,067 0 f 1.9 3.8 4.8 4.3 4.9 51 27 13 11 12 10
Lincoln 185,565,975 0 x 1.7 6.1 6.1 6.1 6.5 177 106 29 29 29 27
line 27007 dd 107,603,879 0 x 2.4 3.5 3.6 4.0 3.8 103 42 30 28 26 27
MARS Sample Filtered LiDAR 163,225,753 0 x f t 2.8 6.5 7.1 6.3 8.3 156 56 24 22 25 19
Mount St Helens Nov20 2004 115,737,877 0 x 2.2 12.4 13.4 12.9 12.6 110 51 9 8 9 9
Mount St Helens Oct4 2004 134,868,035 0 x 1.7 6.1 6.7 6.4 6.6 129 78 21 19 20 19
ncwc000008 63,161,789 0 ft 1.8 3.1 3.5 3.1 3.4 60 34 19 17 19 18
Palm Beach Pre Hurricane 51,612,715 0 x 1.6 6.6 7.0 6.5 6.8 49 30 7 7 8 7
Dallas 104,639,368 1 ft 1.8 5.2 6.2 2.8 7.5 100 55 19 16 36 13
IowaDNR-CloudPeakSoft-1.0-UTM15N 163,727,279 1 ft 2.5 8.8 8.3 3.3 11.2 156 62 18 19 47 14
LDR091111 181233 1 54,609,113 1 f 1.8 4.4 5.1 2.8 5.6 52 29 12 10 19 9
LDR091111 182803 1 54,255,417 1 f 1.8 4.5 4.9 2.8 5.7 52 28 12 11 19 9
merrick vertical 1.2 54,609,113 1 f 1.8 4.4 5.1 2.8 5.6 52 29 12 10 19 9
S1C1 strip021 78,220,943 1 f 2.0 6.2 7.8 3.4 8.4 75 37 12 10 22 9
Serpent Mound Model LAS Data 91,423,839 1 f 2.2 5.6 6.9 3.3 8.8 87 40 15 13 27 10
Tetons 104,800,536 1 ft 2.0 4.8 5.0 2.8 6.0 100 50 21 20 35 17
USACE Merrick lots of VLRs 101,081,369 1 ft 1.8 4.6 5.2 2.9 5.5 96 54 21 19 33 18
LAS12 Sample withRGB Quick Terrain Modeler 99,156,855 2 x 2.7 6.5 error 6.5 7.4 95 35 14 error 15 13
xyzrgb manuscript 56,046,269 2 t 2.9 8.5 10.2 9.7 10.9 53 18 6 5 5 5
autzen-colorized-1.2-3 362,213,959 3 ft 2.1 2.3 5.2 3.1 6.6 345 164 148 66 119 52
total 2,599,260,453 2.0 4.5 5.9 4.1 6.4 2,479 1,210 551 425 610 388
TABLE VI
C OMPRESSION PERFORMANCE BETWEEN W IN ZIP (ZIP), W IN RAR (RAR), LASC OMPRESSION (LCMP), THE L IZARD T ECH L I DAR COMPRESSOR
(SID), AND LAS ZIP (LAZ) FOR TYPICAL , EXPERIMENTAL , AS WELL AS UNUSUAL LAS FILES THAT ARE AVAILABLE AT HTTP :// LIBLAS . ORG / SAMPLES .

LAS/LAZ files with any LAStools tool or any application that [9] G. Taubin and J. Rossignac, “Geometric compression through topologi-
uses the LASlib API [2] to read or write LAS or LAZ files. cal surgery,” ACM Transactions on Graphics, 17 (2), pp. 84–115, 1998.
[10] O. Devillers and P.-M. Gandoin, “Geometric compression for interactive
ACKNOWLEDGMENT transmission,” in Proc. of IEEE Visualization 2000, 2000, pp. 319–326.
[11] M. Isenburg and S. Gumhold, “Out-of-core compression for gigantic
The author would like to thank those who have down- polygon meshes,” in SIGGRAPH 2003 Proceedings, 2003, pp. 935–942.
loaded “LAStools” and sent useful feature requests or bug [12] J. Peng and C. C. Kuo, “Geometry-guided progressive lossless 3d
reports, Howard Butler for suggesting “chunking”, and Tim mesh coding with octree (OT) decomposition,” in SIGGRAPH ’05
Proceedings, 2005, pp. 609–616.
Loesch from the DNR Minnesota for the first large-scale [13] M. Isenburg, P. Lindstrom, and J. Snoeyink, “Streaming compression
LAZ campaign. Financial support for version 2.0 of LASzip of triangle meshes,” in Proceedings of the 3rd Symposium on Geometry
was provided by Dave Finnegan from USACE Cold Regions Processing, 2005, pp. 111–118.
[14] O. Devillers and P.-M. Gandoin, “Progressive and lossless compression
Research and Engineering Laboratory and by Hobu Inc. in of arbitrary simplicial complexes,” in SIGGRAPH’2002, pp. 372–379.
conjunction with other open source efforts it operates including [15] M. Botsch, A. Wiratanaya, and L. Kobbelt, “Efficient high quality
libLAS and PDAL. Michael P. Gerlek from Flaxen Geo and rendering of point sampled geometry,” in Eurographics Rendering
Workshop, 2002, pp. 53–64.
Howard Butler from Hobu Inc. assisted with the integration
[16] R. Schnabel and R. Klein, “Octree-based point-cloud compression,” in
of LASzip versions 1.0 and 2.0 into libLAS. Eurographics Symposium on Point-Based Graphics, 2006, pp. 111–120.
[17] M. Waschbüsch, M. Gross, F. Eberhard, E. Lamboray, and S. Würmlin,
R EFERENCES “Progressive compression of point-sampled models,” in Eurographics
[1] ASPRS LAS format, “Specifications for the LASer exchange file Symposium on Point-Based Graphics, 2004, pp. 95–102.
format,” accessed on November 21th 2011. [Online]. Available: [18] S. Gumhold, Z. Karni, M. Isenburg, and H. P. Seidel, “Predictive point-
http://www.asprs.org/LAS Specification cloud compression,” in SIGGRAPH ’05 Sketches, 2005, p. 137.
[2] LAStools, “Efficient tools for LiDAR processing.” [Online]. Available: [19] B. Merry, P. Marais, and J. Gain, “Compression of dense and regular
http://www.lastools.org/ point clouds,” Computer Graphics Forum, 25 (4), pp. 709–716, 2006.
[3] libLAS, “A LAS 1.0/1.1/1.2 ASPRS LiDAR data translation toolset.” [20] LizardTech R , “LiDAR compressorTM ,” accessed on October 17th 2011.
[Online]. Available: http://www.liblas.org/ [Online]. Available: http://www.lizardtech.com/
[4] OpenTopography, “A portal to high-resolution topography data and [21] LASCompression, “A lossless compression algorithm for LiDAR
tools.” [Online]. Available: http://www.opentopography.org/ datasets,” accessed on November 19th 2011. [Online]. Available:
[5] Minnesota Department of Natural Resources, “Minnesota state LiDAR http://gemma.uni-mb.si/lascompression/
collect.” [Online]. Available: ftp://lidar.dnr.state.mn.us/ [22] D. Mongus and B. Zalik, “Efficient method for lossless LiDAR data
[6] M. Isenburg, P. Lindstrom, and J. Snoeyink, “Lossless compression compression,” International Journal of Remote Sensing, vol. 32, pp.
of predicted floating-point geometry,” Computer-Aided Design, vol. 37, 2507–2518, May 2011.
no. 8, pp. 869–877, 2005. [23] A. Moffat, R. M. Neal, and I. H. Witten, “Arithmetic coding revisited,”
[7] M. Deering, “Geometry compression,” in SIGGRAPH 95 Conference ACM Transactions on Information Systems, 16 (3), pp. 256–294, 1998.
Proceesings, 1995, pp. 13–20. [24] A. Said, “Fastac: Arithmetic coding implementation,” ac-
[8] C. Touma and C. Gotsman, “Triangle mesh compression,” in Graphics cessed on October 30th 2009. [Online]. Available:
Interface’98 Proceedings, 1998, pp. 26–34. http://www.cipr.rpi.edu/ said/FastAC.html

You might also like