Lossless Data Compression Techniques and Their Performance
Abstract— This paper provides a study of various lossless data compression techniques and compares their performance and efficiency in terms of time and space complexity. Data compression is itself a huge field in computer science; it reduces the size of data by removing redundant characters or re-encoding the bits in the data. We have used only lossless data compression techniques for compressing various media file formats, namely text (txt), document (doc), image (bmp, png) and audio (wav). Huffman coding generates variable-length codes for each symbol, LZW is a dictionary-based technique, and Shannon-Fano coding generates binary codes based on the frequency of each symbol. We calculated the mean compression ratio, compression factor and compression time for the various media files, and the result analysis of these data compression techniques will help in selecting the most appropriate method for compressing any given file format in the real world.

Keywords— Lossless data compression; Huffman algorithm; Lempel-Ziv coding; Shannon-Fano coding
I. INTRODUCTION
Data compression is the art of producing information in a reduced form. It encodes original files and information into compressed ones using bit codes, saving time and space in the process. Generally, data compression techniques find and eliminate redundant data using different methodologies. There are broadly two data compression techniques:

• Lossless Compression: As the name suggests, lossless data compression shrinks the original data without any permanent loss of information. It is generally used to transmit digital data. ZIP archives and GIF images are formats which use this kind of technique.

• Lossy Compression: This kind of data compression permanently loses some information in the compression process. The technique is favored when prediction or regeneration of some of the information is easy and flexible. MP3 and JPG are audio and image formats which use this kind of technique.

In this paper we have discussed and analysed the three most common lossless data compression techniques: Huffman coding, LZW and Shannon-Fano coding. As a test bed we have taken several file formats: plain text (txt), document (doc), image (bmp, png) and audio (wav). The techniques are compared on the following efficiency factors (a small sketch computing them follows this list):

• Compression Ratio: Compression ratio measures how much, or to what extent, a file is compressed; it is the size after compression divided by the size before compression.

• Compression Factor: Compression factor is the inverse of the compression ratio, i.e. the size before compression divided by the size after compression.

• Compression Time: Compression time is the time a compressor takes while compressing a file.

• Saving Percentage: Saving percentage calculates how much a file shrinks in size as a percentage, i.e. the reduction in size divided by the size before compression, times 100.

We have examined and analyzed the above techniques, chosen for their easy and selective use, by implementing their algorithms in the C programming language.
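To make the factors concrete, the following is a minimal sketch, not taken from the paper, that computes the compression ratio, compression factor and saving percentage for a pair of file sizes; the two sizes are illustrative placeholders. Compression time is not computed from sizes and would instead be measured around the compressor call.

    /* Minimal sketch computing three of the efficiency factors for a
       pair of file sizes in bytes; the sizes below are illustrative. */
    #include <stdio.h>

    int main(void) {
        double before = 102400.0;   /* file size before compression */
        double after  =  40960.0;   /* file size after compression  */

        double ratio  = after / before;         /* compression ratio  */
        double factor = before / after;         /* compression factor */
        double saving = (1.0 - ratio) * 100.0;  /* saving percentage  */

        printf("compression ratio : %.3f\n", ratio);
        printf("compression factor: %.3f\n", factor);
        printf("saving percentage : %.1f%%\n", saving);
        return 0;
    }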
II. DATA COMPRESSION ALGORITHMS
Nowadays data transmission over a network is very time consuming because of the large size of the data being transmitted. Many lossless data compression algorithms exist to address this problem; among them we have taken three: the Huffman algorithm, LZW and the Shannon-Fano algorithm. These algorithms are used to compress large data so that it can be transmitted efficiently over the network in a given time. The algorithms are discussed below.
A. Huffman Algorithm
Huffman coding is a lossless data compression technique based on ASCII characters, developed by David Huffman in 1951. It is an entropy encoding algorithm: it converts the fixed-length codes of characters to variable-length codes using non-repeating prefix codes, such that no two characters are assigned the same code.

Huffman algorithm procedure is as follows (a brief C sketch follows the list):
• Parsing Input File: The input file is parsed via a file reader to obtain the distinct ASCII characters present in the file along with their frequencies.

• Arranging the Character List: The character list obtained from the parsed input is arranged in descending order of frequency.

• Constructing the Huffman Tree: Create a corresponding isolated tree node for each distinct character, with its frequency as data. Take the two least-frequent nodes and merge them into one common node X whose frequency is the sum of the two individual nodes. Replace the two nodes in the list with the common node X and readjust the list using insertion sort. Repeat this selection and merging process until only one node is left.

• Assigning Codes: Starting from the single parent node formed in the previous step, traverse the tree recursively, maintaining a character array for storing and assigning bit codes. When traversing right, concatenate 1 to the array; when traversing left, concatenate 0. When a leaf node is encountered, assign the array value to that node.

• Generating Compressed File: Create a new file and write the corresponding variable-length codes in place of the characters, in the same order as in the original file.
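The following is a minimal C sketch of the construction described above, using a small fixed frequency table in place of a parsed input file; the symbols and frequencies are illustrative assumptions, not values from the paper's test bed. It merges the two least-frequent nodes, reinserts the merged node by insertion, and finally assigns 0/1 codes by recursive traversal.

    /* Minimal Huffman sketch; the four symbols and their frequencies
       are illustrative only. Error handling and freeing are omitted. */
    #include <stdio.h>
    #include <stdlib.h>

    struct Node {
        char symbol;              /* valid only for leaf nodes     */
        unsigned freq;            /* sum of children for merged X  */
        struct Node *left, *right;
    };

    static struct Node *new_node(char s, unsigned f,
                                 struct Node *l, struct Node *r) {
        struct Node *n = malloc(sizeof *n);
        n->symbol = s; n->freq = f; n->left = l; n->right = r;
        return n;
    }

    /* Traverse the tree, concatenating 0 for left and 1 for right,
       and print the bit code whenever a leaf node is reached. */
    static void assign_codes(const struct Node *n, char *buf, int d) {
        if (!n->left && !n->right) {
            buf[d] = '\0';
            printf("%c -> %s\n", n->symbol, buf);
            return;
        }
        buf[d] = '0'; assign_codes(n->left,  buf, d + 1);
        buf[d] = '1'; assign_codes(n->right, buf, d + 1);
    }

    int main(void) {
        /* Character list in descending order of frequency (step 2). */
        struct Node *list[4] = {
            new_node('a', 45, NULL, NULL), new_node('b', 13, NULL, NULL),
            new_node('c', 12, NULL, NULL), new_node('d',  5, NULL, NULL)
        };
        int n = 4;
        /* Merge the two least-frequent nodes (last two in the sorted
           list) into a common node X until only one node is left. */
        while (n > 1) {
            struct Node *x = new_node(0, list[n-2]->freq + list[n-1]->freq,
                                      list[n-2], list[n-1]);
            n -= 1;
            int i = n - 1;        /* reinsert X, keeping list sorted */
            while (i > 0 && list[i-1]->freq < x->freq) {
                list[i] = list[i-1];
                i--;
            }
            list[i] = x;
        }
        char buf[16];
        assign_codes(list[0], buf, 0);
        return 0;
    }

For the example table this prints a -> 0, b -> 11, c -> 100, d -> 101, a valid prefix code in which no code is a prefix of another.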
B. Lempel-Ziv Algorithm

A universal lossless data compression algorithm developed by Terry Welch in 1984, building on earlier work by Abraham Lempel and Jacob Ziv. It is a dictionary-based dynamic compression algorithm which reads character substrings and assigns reference codes in the encoded output. It removes redundancy in the output file by emitting a dictionary reference whenever a repeated substring is found while reading the input stream. It is generally favored for compressing image formats such as GIF and BMP.

LZW algorithm procedure is as follows (a C sketch follows the list):

• Parse the input file using a file reader stream and obtain the distinct ASCII characters in it.

• Arrange the ASCII characters in ascending order of their respective ASCII values and assign reference codes to the character list starting from 0.

• Initialize an empty string S and an empty character ch (to read from the file character by character).

• Implement a loop:

    while (not End of File (EOF))
    {
        ch = read character from file;
        if (dictionary contains S+ch)
        {
            S = S + ch;
        }
        else
        {
            assign reference code of S to output stream;
            add (S+ch) to dictionary with the next reference code;
            S = ch;
        }
    }
    encode S to output stream.

• Flush the output stream to an output file.
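The following is a minimal C sketch of the LZW loop above, compressing an in-memory string rather than a file stream; the input string, the linear-array dictionary and its size limit are illustrative assumptions. For brevity it initializes the dictionary in order of first appearance rather than ascending ASCII order, which coincides for this input.

    /* Minimal LZW sketch over an in-memory string; the dictionary is
       a simple linear array, enough to illustrate the method. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_ENTRIES 4096

    static char *dict[MAX_ENTRIES];
    static int dict_size = 0;

    /* Return the reference code of s, or -1 if not in the dictionary. */
    static int find(const char *s) {
        for (int i = 0; i < dict_size; i++)
            if (strcmp(dict[i], s) == 0) return i;
        return -1;
    }

    static void add(const char *s) {
        if (dict_size < MAX_ENTRIES) {
            dict[dict_size] = malloc(strlen(s) + 1);
            strcpy(dict[dict_size++], s);
        }
    }

    int main(void) {
        const char *input = "abababab";       /* illustrative input */
        /* Initialize the dictionary with the distinct single
           characters, assigning reference codes starting from 0. */
        for (const char *p = input; *p; p++) {
            char one[2] = { *p, '\0' };
            if (find(one) < 0) add(one);
        }
        char S[256] = "";                     /* current string S */
        char T[258];
        for (const char *p = input; *p; p++) {
            snprintf(T, sizeof T, "%s%c", S, *p);   /* T = S + ch */
            if (find(T) >= 0) {
                strcpy(S, T);                 /* keep extending S */
            } else {
                printf("%d ", find(S));       /* emit code of S   */
                add(T);                       /* add S+ch to dict */
                S[0] = *p; S[1] = '\0';       /* S = ch           */
            }
        }
        if (S[0]) printf("%d", find(S));      /* encode the final S */
        printf("\n");
        return 0;
    }

For "abababab" with a=0 and b=1 this emits 0 1 2 4 1, where codes 2 and 4 reference the dictionary entries "ab" and "aba" added during the scan.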
C. Shannon-Fano Algorithm

Shannon-Fano is a lossless data compression algorithm named after Claude Shannon and Robert Fano. It is also an entropy encoding algorithm, assigning variable-length codes to characters from the input stream and writing them to a compressed output stream. Unlike Huffman, it follows a top-down tree construction and code assignment approach with symbol-by-symbol encoding.

Shannon-Fano algorithm procedure is as follows (a C sketch follows the list):

• Parsing Input File: The input file is parsed via a file reader to obtain the distinct ASCII characters (c1, c2, ..., ci) present in the file along with their frequencies.

• Arranging the Character List: The character list obtained from the parsed input is arranged in descending order of frequency (f1, f2, ..., fi), and the total frequency (F) is calculated.
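The recursive splitting step is not spelled out in the extracted steps above; the following minimal C sketch assumes the standard Shannon-Fano rule of dividing the sorted list into two parts of nearly equal total frequency, assigning 0 to the upper part and 1 to the lower part, and recursing on each part. The symbols and frequencies are illustrative.

    /* Minimal top-down Shannon-Fano sketch over a fixed frequency
       table already sorted in descending order of frequency. */
    #include <stdio.h>
    #include <string.h>

    struct Sym { char c; int f; char code[32]; };

    /* Assign codes to symbols lo..hi (inclusive) recursively. */
    static void shannon_fano(struct Sym *s, int lo, int hi) {
        if (lo >= hi) return;
        int total = 0, acc = 0, split = lo;
        for (int i = lo; i <= hi; i++) total += s[i].f;
        /* Find the split point where the running sum first reaches
           half of the total frequency of this group. */
        for (int i = lo; i < hi; i++) {
            acc += s[i].f;
            split = i;
            if (2 * acc >= total) break;
        }
        for (int i = lo; i <= hi; i++)    /* 0 for the upper part, */
            strcat(s[i].code, i <= split ? "0" : "1");  /* 1 below */
        shannon_fano(s, lo, split);
        shannon_fano(s, split + 1, hi);
    }

    int main(void) {
        /* Character list (c1..ci) with frequencies (f1..fi), already
           in descending order; the values are illustrative. */
        struct Sym s[] = { {'a', 15, ""}, {'b', 7, ""}, {'c', 6, ""},
                           {'d', 6, ""},  {'e', 5, ""} };
        int n = sizeof s / sizeof s[0];
        shannon_fano(s, 0, n - 1);
        for (int i = 0; i < n; i++)
            printf("%c (f=%2d) -> %s\n", s[i].c, s[i].f, s[i].code);
        return 0;
    }

For this table the sketch yields a=00, b=01, c=100, d=101, e=11, illustrating how the top-down splits produce the variable-length codes.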
VI. CONCLUSION
From the results and analysis of the above test bed (Table 1) we have concluded that for compressing plain text files (TXT), the Huffman compression algorithm is best suited, as it shows the lowest compression ratio and takes near-optimal compression time. The LZW compression algorithm works well for image file formats (BMP, PNG, GIF), achieving the best compression ratio and taking the least compression time.