Laszip: Lossless Compression of Lidar Data: Martin Isenburg
Laszip: Lossless Compression of Lidar Data: Martin Isenburg
Laszip: Lossless Compression of Lidar Data: Martin Isenburg
Martin Isenburg
LAStools
http://laszip.org
Abstract—Airborne laser scanning technology (LiDAR) makes not possible to seek within the file. Addressing these concerns
it easy to collect large amounts of point data that sample the the ASPRS created a simple binary exchange format - the LAS
elevation of the terrain beneath. The LAS format has become the format [1]. It is now the de facto industry standard for storage
de facto standard for storing and distributing the acquired points.
As the sampling density of LiDAR increases so does the size of and distribution of airborne and mobile LiDAR data.
the resulting files. Typical LAS files contain tens to hundreds of Up to LAS 1.3, each point record has a core 20 bytes of
millions points today, but soon billions will be commonplace. which 12 bytes store the x, y, and z coordinate as signed
We describe a completely lossless compression scheme for integers. The header of a LAS file contains scaling factors
LiDAR in binary LAS format versions 1.0 to 1.3. Our encoding for those integers that specify the precision (e.g. such as
and decoding speeds are around one to three millions points per
second and our compressed files are only 7 to 25 percent of the 0.01 for cm and 0.001 for mm). The other 8 bytes store
original file size. Compression and decompression happen on-the- intensity, scan angle, return count, classifications, etc. This
fly in a streaming manner and random-access is supported with a completes the basic point type 0 (for details see Table I). The
default granularity of 50,000 points. A reference implementation point types 1 and 3 add an 8 byte GPS time and the point
unencumbered by patents or intellectual property concerns is types 2 and 3 provide 6 bytes to store an RGB color. The
freely available with an LGPL-license, making the proposed
compression scheme suitable to become part of the LAS standard. LAS 1.3 specification introduced the point types 4 and 5 that
allow attaching full waveform information to each return (with
controversial design choices) but they are not used much.
I. I NTRODUCTION One of the great features of the LAS format is that it stores
Low flying aircrafts equipped with modern laser-range scan- the coordinates as scaled and offset integers—thereby requir-
ning technology (LiDAR) collect precise elevation information ing the producers to think about the actual precision in their
for entire cities, counties, or even states. Shooting 100,000 or scanned data samples and to choose appropriate increments
more laser pulses per second onto the earth’s surface they such as 0.01 meters (or feet) for storing the coordinates. This
take measurements at resolutions exceeding one point per eliminates the unnecessary if not disastrous bloat of double-
square meter. Derivatives of this data such as digital elevation precision floats or 20 digit ASCII representations where 15
maps are subsequently used in numerous applications: to of the 20 digits are really just scanner noise. The absence of
assess flood hazards, plan solar and wind installations, carry incompressible noise makes it possible to efficiently compress
out forest inventories, aid in power grid maintenance, etc. the LiDAR points in a completely lossless manner.
However, the sheer amount of LiDAR data collected poses Generic compression schemes are not well suited to com-
a significant challenge as not millions but billions of elevation press LiDAR because they do not have the insights into the
samples need to be stored, processed, and distributed. structure of the data to properly model the probabilities of
The scanner records the waveform of the returning reflection certain patterns to occur. The WinZIP compressor does not
for each laser pulse that it sends out. The intensity peaks of compress well while the WinRAR compressor is extremely
this waveform correspond to points that were hit by the laser slow. Neither scheme is suited for streaming or for random-
and that reflected significant portions back to the sensors on access decompression, which means the entire file needs to be
the plane. There can be multiple peaks because the laser may decompressed before its contents can be accessed.
hit several surfaces such as wires or antennas, branches, leaves, In this paper we introduce LASzip, a lossless compressor for
or even birds in flight before reaching the ground. Each peak LiDAR stored in the LAS format. It delivers high compression
above a certain threshold is called a return. The coordinates rates at unmatched speeds and supports streaming, random-
of these returns together with intensity, scan angle, GPS time, access decompression. The source code is available with
return number, flight line ID, etc. are the data of interest. LGPL-license and was integrated into the open source libraries
Fresh off the scanner, the LiDAR data is typically stored LASlib of LAStools [2] and libLAS [3]. There is native
in a binary, vendor-specific format. But to exchange the data support for reading and writing LAZ in FME 2012, TopoDOT,
between users and across different software packages it was VoyagerGIS, and LAStools and others are following. LAZ is
traditionally converted into a simple ASCII representation used internally at USACE, Certainty3D, Watershed Sciences,
where each line was listing the attributes of a single return. Riegl, and others. Data providers such as Open Topography
While flexible and easy to understand, storing millions (or now provide LAZ as an compressed download option [4] and the
billions) of LiDAR returns in a textual format is cumbersome: DNR of Minnesota hosts LiDAR for 40 counties exclusively
the file size grows large, parsing the data is inefficient, and it is in LAZ format with plans to complete the entire state [5].
name of atomic item size point type and size
II. BACKGROUND 0 1 2 3 4 5
Before describing the LASzip compressor some preliminar- point attributes format size 20 28 26 34 57 63
ies about coordinate precision, the LAS format, related work POINT10 20 bytes x x x x x x
in point compression, and entropy and difference coding. X int 4 bytes x x x x x x
Y int 4 bytes x x x x x x
A. Floating-Point Precision vs. Integer Precision Z int 4 bytes x x x x x x
Intensity u short 2 bytes x x x x x x
There is a common miss-conception that floating-point rep- Return Number 3 bits 3 bits x x x x x x
resentations provide more precision than integer presentations Number of Returns of Pulse 3 bits 3 bits x x x x x x
for storing the x, y, z coordinates of a point. They do not. Scan Direction Flag 1 bit 1 bit x x x x x x
Edge of Flight Line 1 bit 1 bit x x x x x x
The coordinates of LiDAR points from an airborne or a Classification u char 1 byte x x x x x x
mobile survey are spread out in the x-y plane with uniform Scan Angle Rank u char 1 byte x x x x x x
distribution, in the sense that there are roughly the same User Data u char 1 byte x x x x x x
Point Source ID u short 2 bytes x x x x x x
number of points per square meter everywhere and that the
points are acquired with roughly the same precision. There GPSTIME10 8 bytes x x x x
will not be one particularly dense area where points need to GPS Time double 8 bytes x x x x
be stored with higher precision. The floating-point format is
RGB12 6 bytes x x x
not designed for storing uniform distributions of numbers. Red u short 2 bytes x x x
Storing a number in floating-point representation means that Green u short 2 bytes x x x
the precision of the number will vary depending on the value of Blue u short 2 bytes x x x
that number. The closer it gets to zero the more precision it will WAVEPACKET13 29 bytes x x
have. This makes it a good format, for example, for numerical Wave Packet Descriptor Index u char 1 byte x x
computations where more precision is needed closer to zero. Bytes Offset to Waveform Data u int64 8 bytes x x
But using the floating-point format to store point coordinates Waveform Packet Size in Bytes u int 4 bytes x x
Return Point Waveform Location float 4 bytes x x
means that there is an increasingly precise spacing of data X(t) float 4 bytes x x
samples around one point—the origin—at the expense of an Y(t) float 4 bytes x x
increasingly imprecise spacing farther away. Z(t) float 4 bytes x x
An example: in single-precision floating-point there are 223 TABLE I
LAS ZIP GROUPS THE ATTRIBUTES OF THE POINT TYPES 0 TO 5 OF THE
different numbers to represent a coordinate between 2 and LAS 1.3 FORMAT INTO FOUR ATOMIC ITEMS : POINT10, GPS10, RGB12,
4 meters with a spacing of 2/223 = 0.00000023841 meter, AND WAVEPACKET13 THAT ARE THEN COMPRESSED SEPARATELY.
there are 223 different numbers to represent a coordinate
between 128 and 256 meters with a spacing of 128/223 =
B. The LAS format
0.00001525 meter, there are 223 different numbers to represent
a coordinate between 524, 288 and 1, 048, 576 meters with To facilitate the exchange of LiDAR data between data
a spacing of 524, 288/223 = 0.0625 meter, and of course vendors, users, and different software packages, the ASPRS
there are also 223 different numbers to represent a coordinate created LAS as a simple binary exchange format [1]. A LAS
between 2, 097, 152 and 4, 194, 304 meters with a spacing of file of the 1.0 - 1.3 family consists of a header that can be
4, 194, 304/223 = 0.25 meter. If you notice the pattern you followed by any number of variable length records before the
already know that there will also be 223 different numbers actual point data begins. The first 227 header bytes define the
to represent a coordinate between 4, 194, 304 and 8, 388, 608 content of a LAS file: the number of variable length records,
meters with a spacing of 4, 194, 304/223 = 0.5 meter. the offset to the start of the points, the type and size of each
In summary, if you store the easting and northing of your point, the number of points, the offsets and scale factors for the
coordinates directly in floating-point they may retain just 0.5 integer point coordinates, and a bounding box that describes
meters of precision. If you subtract a constant offset from your the extends in x, y, and z of all points in the file.
coordinates so the origin falls into the middle of the bounding In LAS 1.3, where each point can have an attached wave-
box, then the samples near the origin are stored with incredible form, there are 235 header bytes. The extra 8 bytes describe the
precision ... much much more than LiDAR has. start of the waveform data. If this field is zero the waveforms
The appropriate format for storing the coordinates of LiDAR are stored in an external WDP file. If this field is non-zero
points are properly scaled and offset integers. They offer the waveforms are stored inside the LAS file after the point
much more uniform precision than a corresponding floating- block and the field contains the offset to the start of the
point value for the same number of bits: a 32 bit integer, waveform data. LASzip does not (yet) support including the
for example, offers 7 bits more uniform precision than a 32 waveform data inside the LAZ file but always writes it to
bit floating-point number [6] and similarly a 64 bit integer an external WDP file instead—at the moment uncompressed.
offers 10 bits more than a 64 bit float. In order to increase the There is, however, an (undocumented) option in place that will
coordinate range for a large-scale LiDAR collect, the correct compress the waveforms to a more compact WDZ file.
thing to do is to move from 32 bit to 64 bit integers. The LAS Full waveform LiDAR data in LAS 1.3 format is currently
format [1] uses scaled and offset 32 bit integers. only produced by one vendor. Apparently, the mechanism for
waveform storage was quickly added to the LAS standard grow with the number of points. Usually, progressive
to meet the needs of one hardware vendor without seeking streaming also falls into this category as decompressing
mutual consensus among all scanner manufacturers first. There the points to full precision requires keeping all previously
is almost no publicly available waveform data stored in the decompressed coarser points in memory.
LAS 1.3 format and there are only a few software products that • point-permuting versus order-preserving
can make use of the waveform data in LAS 1.3 files. Therefore Point-permuting schemes do not preserve the original
we postpone the details for full waveform compression for point ordering in the file. Their compression gains come
LAS 1.3 until it becomes more relevant. in large parts from imposing a clever canonical ordering
The point types 0 to 5 available in LAS 1.3 and the onto the points that results in small residuals. Order-
attributes they are composed of are detailed in Table I. The preserving schemes do not re-order the points. They
LASzip compressor views point types as compositions of compress more information about each point as they also
four different atomic items: POINT10, GPSTIME10, RGB12, need to specify one of n! possible point permutations.
and WAVEPACKET13 that are compressed separately. For • sequential versus random-access
example, a point of type 3 is composed of POINT10 followed Sequential schemes decompress the points in the order
by GPSTIME10 and RGB12. Additionally each point may they are encoded into the compressed file. Random-
have n so called “extra bytes” at the end, each of which is access schemes can seek in the compressed file and
currently considered as a BYTE item. These “extra bytes” only decompress a particular part. The granularity of the
occur when the LAS header specifies a point size larger than random access is typically limited to blocks of points.
required by the respective point type. For example if the point The LASzip compressor is lossless, non-progressive,
type is 1 and the point size is 32 then there are 4 “extra bytes”. streaming, order-preserving, and provides random-access.
C. Point Compression D. Related Work
The compression of points has been extensively studied in The seminal geometry compression paper by Deering [7]
the context of computer graphics where a point set typically is sparked the development of a number of compression schemes
a dense sampling of a three-dimensional object. We distinguish for meshes [8], [9], [10], [11], [12], [13] that can also be
the following qualities in a compression scheme: thought of as point compression schemes that encode addi-
• lossy versus lossless tional information (i.e. the mesh connectivity). Nearly all point
Lossy schemes compress the shape the points represent compression schemes assume that the original order of the
rather than the exact point coordinates by allowing their points is meaningless and permute them as they see fit during
positions to change slightly as long as they remain encoding to maximize the achieved compression. Because
faithful to the underlying surface. They are mainly used LASzip aims at compressing LAS files exactly—without any
in visualization-only applications. Lossless schemes com- modification—reordering the points is not an option.
press point coordinates represented with uniform preci- The kd-tree approach of Devillers and Gandoin [10], [14]
sion as scaled integers (in literature often referred to as recursively bisects a quantized bounding box along all three
“after bounding box quantization” or as “after quantizing axis always encoding the number of points in one half. The
to a certain number of bits per coordinate”) exactly. oct-tree approach of Botsch et al. [15] recursively entropy
• progressive versus non-progressive (or single-rate) codes for all eight child nodes whether they contain points or
Progressive schemes compress the data such that the de- not with an 8-bit symbol. Peng and Kuo [12] and Schnabel and
coder can immediately display a lower resolution version Klein [16] use prediction schemes to further improve the bit-
of the points while detail is added as the decompression rates of the oct-tree approach. Spatial subdivision approaches
progresses. They are mainly used for instant feedback have the drawback that they do not generalize to include
in an interactive visualization setting. Non-progressive attribute data such as a GPS time or an RGB color.
schemes have only one rate of resolution and decompress The method of Waschbüsch et al. [17] generates a binary
the points at full precision. They are mainly used as tree over the points by pairing close-by points that are replaced
an I/O friendly, alternate format of the point data for by their centroid to form the next coarser level. The method of
transmission or storage, or to take load of a file server. Gumhold et al. [18] incrementally constructs a prediction tree
• streaming versus non-streaming by greedily attaching the next point to the tree such that it has
Streaming schemes start compressing the points and out- the smallest possible residual and compress the tree topology
putting the compressed file after reading only a fraction and the residuals in a streaming fashion. Merry et al. [19]
of them and vice-versa start decompressing the points present a more elaborate prediction tree variation that uses a
and outputting the decompressed file after reading only globally minimal spanning tree and a set of predictors.
a fraction of the compressed data. The memory footprint Quite similar to LASzip are the commercially available
remains tiny in comparison to the data they process. Non- LizardTech R
LiDAR compressorTM [20] and the LASCom-
streaming schemes read all points into memory either pression software [21] that implements the method of Mongus
during compression, during decompression, or both. They and Zalik [22]. Both schemes specifically target LiDAR points
usually need to construct temporary data structures that stored in the LAS format and compress them losslessly.
By default the LizardTech’s LiDAR compressor [20] en- item has its own compressor with its own version number
codes points in blocks of 4,096, performing a simplified making the compressor modular and easy to extend to future
Haar wavelet transform on each array of point attributes point types. This document describes LASzip 2.0 that uses
individually. Pairs of attribute values are recursively replaced version 2 compression for all items except WAVEPACKET13.
by an average coefficient, which is simply the left value, and The earlier LASzip 1.0 uses version 1 compression exclusively
its corresponding detail coefficient, which is the right minus and does not allow random access decompression. While still
the left value. Because high-order bits of detail coefficients supported in software for backward compatibility, we do not
tend to be zero they can compressed efficiently bit-plane by describe the LASzip 1.0 compressor in this document.
bit-plane using arithmetic coding [23]. The 8 byte floating- The LASzip compressor always encodes the points in
point GPS time that is part of some point types is compressed chunks of points to allow seeking in the compressed file. The
using standard DEFLATE. Besides the compressed contents default chunk size is currently set to 50,000 points. Chunking
of the LAS file, the resulting MrSID file also stores spatial makes it possible to, for example, augment the produced LAZ
indexing information to support area-of-interest queries. files with spatial indexing LAX files [2] and support area-
The LASCompression software [21] operates very similar of-interest queries that decompress only the relevant parts of
to LASzip in the sense that it predicts the attributes of a a compressed LAZ file. Because each compressed chunk is
point from previous points with a set of prediction rules and different in size, the compressor stores a chunk table at the
compresses the corrective deltas with arithmetic coding. In end of the file that specifies the starting byte of each chunk.
particular, the authors use a clever scheme for predicting the When starting a new chunk, the LASzip compressor stores
linear dependencies between successive points that correspond the first point as raw bytes and then initializes the entropy
to returns from the same pulse by using the already encoded coder. This point is then used as the initial value for subsequent
deltas for x to improve the predictions of y and z. prediction schemes. All following points are compressed item
by item with the compression schemes detailed below.
E. Entropy and Difference Coding
A. Compressing POINT10 (version 2)
An entropy coder turns a sequence of symbols into a com-
pact stream of bits while using knowledge about the (uneven) First, the compressor encodes a bit-mask of 6 bits that
distribution of symbols to store them more compactly—up to specifies whether any of the following attributes have changed
the theoretical optimum. As the symbol distribution is often in comparison to the previously processed point:
not known in advance, an adaptive entropy coder initially • intensity: the 2 bytes that specify the unsigned 16 bit
assumes it to be uniform and learns the actual distribution intensity value i of the point stored in bytes 12 and 13.
along the way. When a symbol stream is expected to have • return bits: the 3 + 3 + 1 + 1 bits that specify the return
different distributions given “context” information available number r, the number of returns of given pulse n, and
to the compressor, it is beneficial to switch between different scan direction and edge of flight line stored in byte 14.
contexts while encoding the symbols. The entropy coder used • classification bits: the 5 + 1 + 1 + 1 bits that specify
by LASzip is based on a fast implementation of adaptive, the 5-bit ASPRS classification of the point c and the
context-based arithmetic coding by Amir Said [24]. synthetic, keypoint, and withheld flags stored in byte 15.
A difference coder compresses the current value as the • scan angle rank: the 1 byte that specifies the signed 8-bit
difference to a previous value. This is most effective when the scan angle a of the point stored in byte 16.
distribution of differences has a much tighter spread and there- • user data: the 1 byte that specifies the unsigned 8-bit
fore a much lower entropy than the distribution of values. The user data u of the point stored in byte 17.
difference coder used by LASzip entropy codes the number k • point source ID: the 2 bytes that specify the unsigned
that describe the tightest interval [−(2k − 1), +(2k )] that the 16-bit flight line p of the point stored in bytes 18 and 19.
difference falls into, entropy codes up to 8 of its highest bits Next, the compressor encodes all attributes that have
as one symbol while switching contexts for different k <= 8, changed. It encodes the return bits while switching between
and stores any remaining lower bits raw. 256 entropy contexts in dependence on the previous return
bits byte. The return number r and the number of returns of
III. T HE LAS ZIP COMPRESSOR given pulse n are used to derive two numbers that are used for
LASzip does not compress the LAS header or any of the switching contexts later: a return map m and a return level l.
variable length records. It simply copies them unmodified from The return map m simply serializes the valid combinations
the LAS to the LAZ file. It however adds 128 to the value of of r and n: for r = 1 and n = 1 it is 0, for r = 1 and n = 2
the current point type to prevent standard LAS readers from it is 1, for r = 2 and n = 2 it is 2, for r = 1 and n = 3 it
attempting to read a compressed LAZ file. It also adds one is 3, for r = 2 and n = 3 it is 4, for r = 3 and n = 3 it is
variable length record that specifies the composition of the 5, for r = 1 and n = 4 it is 6, etc. Unfortunately, some LAS
compressed points and various compression options used. files start numbering r and n at 0, only have return numbers
The LASzip compressor views the different point types of r, or only have number of return of given pulse counts n. We
the LAS 1.3 specification as compositions of items: POINT10, therefore complete the table to also map invalid combinations
GPSTIME10, RGB12, WAVEPACKET13, and BYTE. Each of r and n to different contexts as shown below.
file size compression enc. time dec. time file size compression enc. time dec. time
file name [MB] ratio [sec] [sec] file name [MB] ratio [sec] [sec]
LAS LAZ SID LAZ SID LAZ SID LAZ SID LAS LAZ LCMP LAZ LCMP LAZ LCMP LAZ LCMP
5126-05-57.las 287 21 68 13.4 4.2 4.1 85 20 62 5126-05-57.las 287 21 26 13.4 11.1 21 119 20 125
5126-05-58.las 312 29 80 10.6 3.9 4.7 97 22 72 5126-05-58.las 312 29 38 10.6 8.2 21 161 22 149
5126-05-59.las 363 45 104 8.1 3.5 5.9 120 25 100 5126-05-59.las 363 45 59 8.1 6.1 24 240 25 120
5126-05-60.las 287 21 68 13.5 4.2 4.1 87 20 58 5126-05-60.las 287 21 26 13.5 11.0 19 120 20 65
5126-05-61.las 286 21 68 13.4 4.2 4.1 87 20 58 5126-05-61.las 286 21 26 13.4 11.1 17 122 20 65
total 1,534 138 388 11.1 4.0 23 476 106 350 total 1,534 138 175 11.1 8.8 102 762 106 524
1942-29-59.las 486 83 144 5.9 3.4 18 242 35 156 1942-29-59.las 486 83 94 5.9 5.2 42 212 35 276
1942-29-60.las 485 81 142 6.0 3.4 15 246 33 154 1942-29-60.las 485 81 91 6.0 5.3 40 281 33 363
1942-29-61.las 480 80 140 6.0 3.4 14 234 33 153 1942-29-61.las 480 80 90 6.0 5.3 37 348 33 355
1942-29-62.las 464 77 135 6.0 3.4 13 224 31 143 1942-29-62.las 464 77 87 6.0 5.3 37 343 31 345
1958-23-23.las 539 86 156 6.3 3.4 14 268 37 173 1958-23-23.las 539 86 97 6.3 5.5 41 380 37 379
total 2,454 407 716 6.0 3.4 74 1,214 169 779 total 2,454 407 460 6.0 5.3 198 1,564 169 1,718
TABLE II TABLE III
P ERFORMANCE COMPARISON BETWEEN LAS ZIP (LAZ) AND THE P ERFORMANCE COMPARISON BETWEEN LAS ZIP (LAZ) AND THE
L IZARD T ECH L I DAR COMPRESSOR (SID) IN COMPRESSION RATIO AND LASC OMPRESSION CODER (LCMP) [21] IN COMPRESSION RATIO AND
ENCODING / DECODING TIMES FOR L I DAR OF THE M INNESOTA DNR [5]. ENCODING / DECODING TIMES FOR L I DAR OF THE M INNESOTA DNR [5].
B. Compressing GPSTIME10 (version 2) coded when the GPS times are identical. LASzip starts a
new context when the delta overflows a 32 bit integer. For
The GPS times of a single flight path are a monotonically
that it difference codes the 32 higher bits of the current GPS
increasing sequence of double-precision floating-point num-
time and the current context and stores the lower 32 bits raw.
bers where returns of the same pulse have the same GPS time
Otherwise it switches to the specified context (where the delta
and where subsequent pulses have a more or less constant
will not overflows a 32 bit integer) and recurses. The current
spacing in time. While the LASzip compressor is optimized
deltas stored with each context are updated to the actual delta
for compressing single flight paths it will handle any GPS time
when they were outside the predictable range more than 3
sequence. The compression ratio depends on how far the input
consecutive times (i.e. bigger than 500 times the current delta,
is from the expectations. For randomly permuted points it will
smaller than -10 times the current delta, or zero).
be terrible. For multiple flight paths that have been sorted into
Currently the LASzip compressor does not make use of
tiles one after another it will be excellent.
known data from the already compressed item POINT10
For compression purposes LASzip treats double-precision
for compressing GPSTIME10. However, if return counts and
floating-point GPS times as if they were signed 64 bit integers
point source IDs are populated correctly there is significant
and predicts the deltas between them. As prediction contexts,
correlation that can be exploited. For example, subsequent
it stores up to four previously compressed GPS times with
returns of the same pulse are likely to have the same exact
corresponding deltas. Keeping multiple prediction contexts
GPS time, and subsequent returns with different point source
can account for repeated jumps in GPS time that arise when
IDs are likely to require a context switch. We plan to exploit
multiple flight paths are merged with fine spatial granularity.
this when compressing the new point types of the recently
LASzip distinguishes several cases that are entropy coded
released LAS 1.4 specification which include the GPS time as
with 516 symbols depending on if the current GPS time is
an integral part of the point [1].
0 predicted with a delta of zero.
1–500 predicted using the current delta times 1 to 500. C. Compressing RGB12 (version 2)
501–510 predicted using the current delta times -1 to -10. LAS uses unsigned 16 bit integers for the R, G, and B
511 identical to the last. channel. Some files—incorrectly—populate only the lower 8
512 starting a new context. bits so that the upper 8 bits are zero. Other files—correctly—
513–515 predicted with one of the other three contexts. multiply 8-bit colors with 256 so that the lower 8 bits are zero.
For the first three cases LASzip subsequently difference codes The LASzip compressor therefore compresses the upper and
the delta prediction and the actual delta. Nothing further is lower byte of each channel separately. First it entropy codes 6
100.0
bits that specify which bytes have changed as one symbol. For
all bytes that have changed it then entropy codes the difference
to the respective previous byte modulo 256.
The channels are encoded in the order R, G, and B. 80.0
computed.
The LASzip compressor supports compression of point 5K 10K 20K 50K 75K 100K
types 4 and 5 that contain wave packet information. However,
the current scheme is still in version 1 as it has yet to be uncompressed compressed size [MB]
file name size 5K 10 K 20 K 50 K 75 K 100 K
optimized. So far there has been very little real-world demand 5126-05-57.las 287 24.3 22.9 22.0 21.3 21.2 21.2
for compressing LAS files containing waveform data simply 5126-05-58.las 312 33.4 31.5 30.2 29.4 29.1 29.1
due to a lack of data stored in this format. 5126-05-59.las 363 51.0 48.2 46.2 44.8 44.5 44.1
5126-05-60.las 287 24.4 22.9 22.0 21.3 21.2 21.1
LASzip simply entropy codes the wave packet descriptor 5126-05-61.las 286 24.4 23.0 22.1 21.4 21.2 21.1
index, an unsigned byte that is zero if a point has no waveform total 1,534 157 149 143 138 137 136
and indexes the variable length record describing the format 1942-29-59.las 486 93.7 88.9 85.5 82.7 82.0 81.6
1942-29-60.las 485 91.6 87.0 83.9 81.2 80.5 80.1
of the waveform otherwise. To compress the bytes offset to 1942-29-61.las 480 90.2 85.7 82.6 80.0 79.3 78.9
waveform data it entropy encodes one of 4 possible cases: 1942-29-62.las 464 87.0 82.6 79.5 77.1 76.2 75.9
1958-23-23.las 539 96.8 92.0 88.6 85.9 85.1 84.6
1) same as last offset total 2,454 459 436 420 407 403 401
2) use last offset plus last packet size TABLE IV
3) difference to last offset is less than 32 bits T HE EFFECT OF DIFFERENT CHUNK SIZES 5,000, 10,000, 20,000, 50,000,
4) difference to last offset is more than 32 bits 75,000, AND 100,000 POINTS ON THE COMPRESSION RATES OF LAS ZIP.
LAS/LAZ files with any LAStools tool or any application that [9] G. Taubin and J. Rossignac, “Geometric compression through topologi-
uses the LASlib API [2] to read or write LAS or LAZ files. cal surgery,” ACM Transactions on Graphics, 17 (2), pp. 84–115, 1998.
[10] O. Devillers and P.-M. Gandoin, “Geometric compression for interactive
ACKNOWLEDGMENT transmission,” in Proc. of IEEE Visualization 2000, 2000, pp. 319–326.
[11] M. Isenburg and S. Gumhold, “Out-of-core compression for gigantic
The author would like to thank those who have down- polygon meshes,” in SIGGRAPH 2003 Proceedings, 2003, pp. 935–942.
loaded “LAStools” and sent useful feature requests or bug [12] J. Peng and C. C. Kuo, “Geometry-guided progressive lossless 3d
reports, Howard Butler for suggesting “chunking”, and Tim mesh coding with octree (OT) decomposition,” in SIGGRAPH ’05
Proceedings, 2005, pp. 609–616.
Loesch from the DNR Minnesota for the first large-scale [13] M. Isenburg, P. Lindstrom, and J. Snoeyink, “Streaming compression
LAZ campaign. Financial support for version 2.0 of LASzip of triangle meshes,” in Proceedings of the 3rd Symposium on Geometry
was provided by Dave Finnegan from USACE Cold Regions Processing, 2005, pp. 111–118.
[14] O. Devillers and P.-M. Gandoin, “Progressive and lossless compression
Research and Engineering Laboratory and by Hobu Inc. in of arbitrary simplicial complexes,” in SIGGRAPH’2002, pp. 372–379.
conjunction with other open source efforts it operates including [15] M. Botsch, A. Wiratanaya, and L. Kobbelt, “Efficient high quality
libLAS and PDAL. Michael P. Gerlek from Flaxen Geo and rendering of point sampled geometry,” in Eurographics Rendering
Workshop, 2002, pp. 53–64.
Howard Butler from Hobu Inc. assisted with the integration
[16] R. Schnabel and R. Klein, “Octree-based point-cloud compression,” in
of LASzip versions 1.0 and 2.0 into libLAS. Eurographics Symposium on Point-Based Graphics, 2006, pp. 111–120.
[17] M. Waschbüsch, M. Gross, F. Eberhard, E. Lamboray, and S. Würmlin,
R EFERENCES “Progressive compression of point-sampled models,” in Eurographics
[1] ASPRS LAS format, “Specifications for the LASer exchange file Symposium on Point-Based Graphics, 2004, pp. 95–102.
format,” accessed on November 21th 2011. [Online]. Available: [18] S. Gumhold, Z. Karni, M. Isenburg, and H. P. Seidel, “Predictive point-
http://www.asprs.org/LAS Specification cloud compression,” in SIGGRAPH ’05 Sketches, 2005, p. 137.
[2] LAStools, “Efficient tools for LiDAR processing.” [Online]. Available: [19] B. Merry, P. Marais, and J. Gain, “Compression of dense and regular
http://www.lastools.org/ point clouds,” Computer Graphics Forum, 25 (4), pp. 709–716, 2006.
[3] libLAS, “A LAS 1.0/1.1/1.2 ASPRS LiDAR data translation toolset.” [20] LizardTech R , “LiDAR compressorTM ,” accessed on October 17th 2011.
[Online]. Available: http://www.liblas.org/ [Online]. Available: http://www.lizardtech.com/
[4] OpenTopography, “A portal to high-resolution topography data and [21] LASCompression, “A lossless compression algorithm for LiDAR
tools.” [Online]. Available: http://www.opentopography.org/ datasets,” accessed on November 19th 2011. [Online]. Available:
[5] Minnesota Department of Natural Resources, “Minnesota state LiDAR http://gemma.uni-mb.si/lascompression/
collect.” [Online]. Available: ftp://lidar.dnr.state.mn.us/ [22] D. Mongus and B. Zalik, “Efficient method for lossless LiDAR data
[6] M. Isenburg, P. Lindstrom, and J. Snoeyink, “Lossless compression compression,” International Journal of Remote Sensing, vol. 32, pp.
of predicted floating-point geometry,” Computer-Aided Design, vol. 37, 2507–2518, May 2011.
no. 8, pp. 869–877, 2005. [23] A. Moffat, R. M. Neal, and I. H. Witten, “Arithmetic coding revisited,”
[7] M. Deering, “Geometry compression,” in SIGGRAPH 95 Conference ACM Transactions on Information Systems, 16 (3), pp. 256–294, 1998.
Proceesings, 1995, pp. 13–20. [24] A. Said, “Fastac: Arithmetic coding implementation,” ac-
[8] C. Touma and C. Gotsman, “Triangle mesh compression,” in Graphics cessed on October 30th 2009. [Online]. Available:
Interface’98 Proceedings, 1998, pp. 26–34. http://www.cipr.rpi.edu/ said/FastAC.html