Lect-8 (Hash Function)


Chapter (5)

Cryptography
Hash Functions

Hash Function
A hash function is a mathematical algorithm that
takes an input of arbitrary size and produces a fixed-size string of
bytes, called the hash value or digest.
It is commonly used in computer science for various
purposes, including data integrity verification, digital
signatures, and password hashing.
One key feature of a secure hash function is that it is
one-way (irreversible): it is computationally infeasible to
recover the original input from the hash value. Additionally, a small
change to the input data should produce a vastly
different hash value. These two properties are what make hashes useful
for protecting the integrity and security of information.
What are Security Requirements?

1 Integrity
Data protected by a hash value should remain unaltered during
transmission or storage; any change shows up as a different hash.

2 Non-repudiation
Hash functions are used to create digital signatures, thereby preventing
individuals from denying the authenticity of their digital messages.

3 Authentication
Combined with a key or a digital signature, the hash function serves as a
way to verify the identity of the sender and the data's origin.

4 Confidentiality
While hash functions mainly focus on ensuring data integrity, they can also
help protect confidentiality in a limited way, for example by storing the
hash of a password instead of the password itself. Hashing is not
encryption: the data cannot be recovered from the hash.
Why do we need Hash Functions?
1. Standard Length
When you hash a message, the hash function takes your file or
message of any size, runs it through a mathematical
algorithm, and produces an output of a fixed length.
In the table, we converted the same input message (the letters
CFI) into hash values using three different hash functions
(MD5, SHA-1, and SHA-256). Each of those
hash functions produces an output hash with a
fixed number of hexadecimal characters: 32 characters for MD5
(128 bits), 40 for SHA-1 (160 bits), and 64 for SHA-256 (256 bits).
It doesn't matter what we put in as an input: the same hash
function will always produce a hash value that has the
same number of characters. In Table 2, we change
the message each time, but using the same hash function
(SHA-1 in this case), the output is always 40 hexadecimal
characters long.
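The fixed-length behaviour described above can be checked directly. The following is a minimal sketch using Python's standard hashlib module; the input CFI comes from the text, while the other sample messages are assumptions chosen only for illustration.

```python
import hashlib

# Same input ("CFI", as in the text), three different hash functions.
message = b"CFI"
for name in ("md5", "sha1", "sha256"):
    hex_digest = hashlib.new(name, message).hexdigest()
    print(f"{name:7s} {len(hex_digest):3d} hex chars  {hex_digest}")

# Different inputs (illustrative), same hash function:
# the length of the output never changes.
samples = [b"", b"CFI", b"a much longer message " * 1000]
sha1_lengths = [len(hashlib.sha1(s).hexdigest()) for s in samples]
print(sha1_lengths)  # [40, 40, 40]
```

Each hexadecimal character encodes 4 bits, which is why a 160-bit SHA-1 digest is 40 characters long.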
Why do we need Hash Functions?
2. Ensure data integrity
Let's think of an example where you want to send a digital message
or document to someone, and you want to make sure that it hasn't
been tampered with along the way. You could send it multiple times
and have the recipient verify that each copy is the same, but that would
not be feasible if the file or message was very large.
It would be much easier if there was a shorter value with a
set number of characters for the sender and receiver to check. And
that's essentially what a hash function allows two computers to do.
Rather than compare the data in its original (and larger) form,
computers can compare the two hashes of the data and quickly
confirm that the data has not been tampered with or changed.
Hash functions, therefore, serve as a checksum, a way for
someone to identify whether digital data has been tampered with
after it's been created.
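The comparison of two hashes instead of two full messages can be sketched as follows. This is only an illustration: the message contents are hypothetical, and SHA-256 is an assumed choice of hash function.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

original = b"Transfer 100 to account 12345"      # hypothetical message
sent_digest = digest(original)                    # computed by the sender

received_ok = b"Transfer 100 to account 12345"   # arrived unchanged
received_bad = b"Transfer 900 to account 12345"  # one character altered

# The receiver only compares short, fixed-length digests.
print(digest(received_ok) == sent_digest)   # True
print(digest(received_bad) == sent_digest)  # False
```

This only detects changes if the digest itself reaches the receiver unaltered, for example over a separate trusted channel.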
Why do we need Hash Functions?
3. Verify authenticity
For example, if you send out an email, it can be intercepted easily
(especially if it is sent over an unsecured WiFi network).
Someone who intercepts the email can alter its contents along the way,
which is called a “Man-in-the-Middle” (MitM) attack, and the recipient
has no way of knowing.
However, if the sender hashes the email contents and signs that hash with
their private key, the resulting digital signature lets the receiver check
that the email contents have not been modified after
being digitally signed.
To do this, the receiver “re-generates” the hash of the received email
using the same hash function as the sender, and uses the
signer's public key to recover the hash value from the signature.
If the two values match, that means that no one has altered the message. If the
hashes are different, the receiver knows that the contents of the
email are not authentic, since even a small change to
the message produces a completely different hash.
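The hash-then-sign idea can be sketched with textbook RSA. Everything here is an assumption made for illustration: the key is built from two small, well-known Mersenne primes, there is no padding scheme, and the hash is reduced modulo N, so this is a toy and not a secure signature scheme. It needs Python 3.8 or later for the modular inverse.

```python
import hashlib

# Toy RSA key built from two Mersenne primes. Far too small and too
# structured to be secure; it only makes the arithmetic visible.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent

def message_hash(message: bytes) -> int:
    """Hash the message with SHA-256 and reduce it into the range of N."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Sender: sign the hash of the message with the private key."""
    return pow(message_hash(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: recompute the hash and compare it with the value
    recovered from the signature using the public key."""
    return pow(signature, E, N) == message_hash(message)

email = b"Meet at 10:00 in room 4"     # hypothetical email body
signature = sign(email)

print(verify(email, signature))                       # True
print(verify(b"Meet at 11:00 in room 4", signature))  # False
```

Signing the short hash instead of the whole message is what keeps the signature the same size regardless of how long the email is.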
Hash Functions Properties:
There are three central properties which hash functions need
to possess in order to be secure:
1. preimage resistance (or one-wayness)
2. second preimage resistance (or weak collision resistance)
3. collision resistance (or strong collision resistance)

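1. Preimage resistance:
Given a hash value h, it should be computationally infeasible to find
any input m such that H(m) = h.

2. Second preimage resistance:
Given an input m1, it should be computationally infeasible to find a
different input m2 such that H(m1) = H(m2).

3. Collision resistance:
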
Definition
Collision resistance refers to the property of a hash function that makes it
infeasible to find two distinct inputs that produce the same hash value.

Importance
This property is crucial in ensuring the security and robustness of hash
functions, particularly in digital signatures and data integrity verification.

Application
In real-world applications, collision-resistant hash functions are essential in
preventing unauthorized data tampering and forgery.
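Why the output length matters for collision resistance can be seen with a deliberately weakened hash. The sketch below truncates SHA-256 to 16 bits (an assumption made purely for the demonstration) and searches for two distinct inputs with the same truncated value. With only 65,536 possible outputs a collision typically appears after a few hundred inputs; with the full 256-bit output the same search is infeasible.

```python
import hashlib

def truncated_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """First n_bytes of SHA-256: a deliberately weak 16-bit hash by default."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Hash the counter strings b"0", b"1", ... until two share a digest."""
    seen = {}          # truncated digest -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, n_bytes)
        if h in seen:
            return seen[h], message, counter + 1
        seen[h] = message
        counter += 1

m1, m2, tries = find_collision()
print(m1, m2, tries)
print(truncated_hash(m1) == truncated_hash(m2))  # True
```

The search is guaranteed to stop within 65,537 inputs, because that many inputs cannot all map to different 16-bit values.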
How does a Hash Function Work?
The details depend on the algorithm, but generally, to get a hash value of a set length, a hash
function first divides the input data into fixed-size blocks, which are called data blocks.
This is because the core of a hash function processes data in fixed-length pieces. The size of the data block
differs from one algorithm to another.
If the last block is shorter than the block size, padding is added to fill it out. Regardless of which
hashing method you use, the output, or hash value, always has the same fixed length.
The core function is then applied once per data block, so the number of repetitions equals the number of blocks.
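The splitting and padding step can be sketched as follows. The padding rule used here (a single 0x80 byte, then zero bytes, then the message length in bits as an 8-byte big-endian integer, with 64-byte blocks) is modelled on the scheme used by SHA-256; the function name `pad_and_split` is hypothetical, and other algorithms use other block sizes and padding rules.

```python
def pad_and_split(message: bytes, block_size: int = 64) -> list:
    """Pad the message to a multiple of block_size and split it into blocks."""
    bit_length = len(message) * 8
    padded = message + b"\x80"                   # marks the end of the data
    while len(padded) % block_size != block_size - 8:
        padded += b"\x00"                        # zero fill
    padded += bit_length.to_bytes(8, "big")      # original length in bits
    return [padded[i:i + block_size]
            for i in range(0, len(padded), block_size)]

short_blocks = pad_and_split(b"abc")
long_blocks = pad_and_split(b"x" * 100)

print(len(short_blocks), [len(b) for b in short_blocks])  # 1 [64]
print(len(long_blocks), [len(b) for b in long_blocks])    # 2 [64, 64]
```

Every block has exactly the same size after padding, so the core function can be applied once per block.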

How Does a Hash Function Work?
The “Avalanche Effect”
The data blocks are processed one at a time. The output of the first data block is
fed as input along with the second data block. The output of the
second is then fed along with the third block, and so on.
The final output therefore depends on every block. If you change
one bit anywhere in the message, the hash value changes completely, with about
half of the output bits flipping on average. This is called the “avalanche effect”.
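The avalanche effect can be measured by flipping one input bit and counting how many output bits change. This is a small sketch with SHA-256 and an arbitrary sample message, both chosen only for illustration.

```python
import hashlib

def sha256_int(data: bytes) -> int:
    """SHA-256 digest of data interpreted as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

message = bytearray(b"The quick brown fox")   # illustrative input
flipped = bytearray(message)
flipped[0] ^= 0x01                            # flip one bit of the first byte

# XOR leaves a 1 in every position where the two digests differ.
diff = sha256_int(bytes(message)) ^ sha256_int(bytes(flipped))
changed_bits = bin(diff).count("1")
print(f"{changed_bits} of 256 output bits changed")
```

For a well-designed hash function the count is expected to be close to 128, that is, about half of the output bits.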
Uniqueness and Determinism
Hash functions must be deterministic, meaning that every time you put in the
same input, it will always create the same output.
The output, or hash value, should also be unique to the exact input for all practical purposes.
Because inputs can be of any size while outputs have a fixed length, two different inputs with the same
hash must exist, but it must be computationally infeasible to find them. If two different
pieces of data are found to produce the same output, it is known as a “hash collision,” and the
algorithm is considered broken for security purposes.
Irreversibility
Hash functions should be irreversible. This means that while it is quick and
easy to compute the hash if you know the input message,
it is computationally infeasible to go through the process in reverse and compute the
input message if you only know the hash value.
Design of Hashing Algorithms
The hash value of the first message block becomes an input to the second hash operation,
the output of which alters the result of the third operation, and so on. This is known as the
avalanche effect of hashing.
The avalanche effect results in substantially different hash values for two messages that differ
by even a single bit of data.
It is important to understand the difference between a hash function and a hashing algorithm. The hash
function (in this sense, the core compression step) generates a hash code by operating on two blocks of
fixed-length binary data: the previous result and the current message block.
The hashing algorithm is the process for using the hash function, specifying how the message will
be broken up and how the results from previous message blocks are chained together.
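The distinction between the two-input hash function and the hashing algorithm that chains it can be sketched as follows. This is a toy construction under stated assumptions: `compress` simply reuses SHA-256 as a stand-in for a real compression function, the all-zero initial value and the 64-byte block size are arbitrary, and `toy_hash` is a hypothetical name. It is not how any standard algorithm is actually implemented.

```python
import hashlib

BLOCK_SIZE = 64        # bytes per message block (assumed)
IV = bytes(32)         # fixed initial chaining value (assumed all zeros)

def compress(state: bytes, block: bytes) -> bytes:
    """The "hash function": combines two fixed-length inputs,
    the previous result and the current block. Stand-in only."""
    return hashlib.sha256(state + block).digest()

def toy_hash(message: bytes) -> bytes:
    """The "hashing algorithm": pads, splits, and chains the blocks."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK_SIZE)
    padded += (len(message) * 8).to_bytes(8, "big")

    state = IV
    for i in range(0, len(padded), BLOCK_SIZE):
        # The output for one block is the input for the next.
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state

print(toy_hash(b"hello world").hex())
print(toy_hash(b"hello world!").hex())
```

Because every block's result feeds the next call, a change in any block propagates to the final value.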

Popular Hash Functions
1. Message Digest (MD)
MD5 was the most popular and widely used hash function for
quite some years.
•The MD family comprises the hash functions MD2, MD4,
MD5 and MD6. MD5 was published as RFC
1321. It is a 128-bit hash function.
•MD5 digests have been widely used in the software world to
provide assurance about the integrity of transferred files. For
example, file servers often provide a pre-computed MD5
checksum for the files, so that a user can compare the
checksum of the downloaded file to it.
•In 2004, collisions were found in MD5. An analytical attack
was reported to succeed in only about an hour using a
computer cluster. Because of this collision attack, MD5 is
considered compromised and is no longer recommended
for use.
2. Secure Hash Algorithm (SHA)
The SHA family comprises four algorithms:
SHA-0, SHA-1, SHA-2, and SHA-3. Though from the
same family, they are structurally different.
•The original version, SHA-0, a 160-bit hash
function, was published by the National Institute of
Standards and Technology (NIST) in 1993. It had a few
weaknesses and did not become very popular. Later, in
1995, SHA-1 was designed to correct the alleged
weaknesses of SHA-0.
•SHA-1 was for many years the most widely used of the SHA
hash functions. It was employed in several widely used
applications and protocols, including Secure Sockets
Layer (SSL) security.
•In 2005, a method was found for uncovering
collisions for SHA-1 within a practical time frame,
making the long-term employability of SHA-1 doubtful.
•The SHA-2 family has four further variants, SHA-224,
SHA-256, SHA-384, and SHA-512, named after
the number of bits in their hash value. No
practical attacks have yet been reported on the full SHA-2
hash functions.
•SHA-2 is a strong hash function. Though
significantly different, its basic design still follows
the design of SHA-1. Hence, NIST called for new
competitive hash function designs.
•In October 2012, NIST chose the Keccak
algorithm as the new SHA-3 standard. Keccak offers
many benefits, such as efficient performance and
good resistance to attacks.
3. RIPEMD
RIPEMD is an acronym for RACE Integrity Primitives
Evaluation Message Digest. This set of hash functions was
designed by the open research community and is generally known
as a family of European hash functions.
•The set includes RIPEMD, RIPEMD-128, and RIPEMD-160.
There also exist 256- and 320-bit versions of this
algorithm.
•The original RIPEMD (128-bit) is based on the design
principles used in MD4 and was found to provide questionable
security. RIPEMD-128 came as a quick-fix
replacement to overcome the vulnerabilities of the original
RIPEMD.
•RIPEMD-160 is an improved version and the most widely
used version in the family. The 256- and 320-bit versions
reduce the chance of accidental collision, but do not have
higher levels of security than RIPEMD-128 and
RIPEMD-160 respectively.
4. Whirlpool
This is a 512-bit hash function.
•It is based on a modified version of the Advanced Encryption
Standard (AES). One of its designers is Vincent Rijmen, a
co-creator of AES.
•Three versions of Whirlpool have been released, namely
WHIRLPOOL-0, WHIRLPOOL-T, and WHIRLPOOL.

5. The Bottom Line
Cryptographic hash functions are programs that use a
mathematical algorithm to convert
information of any size into a fixed-length value, usually
written in hexadecimal form. These functions are
also used in cryptocurrency to secure blockchain
information.
Applications of Hash Functions
Hash functions have several direct applications based on
their cryptographic properties. The most common
cryptographic applications are:

1. Password Storage
Hash functions provide protection for password storage.
•Instead of storing passwords in the clear, almost all logon
processes store the hash values of passwords in a file.
•The password file consists of a table of pairs in
the form (user id, h(P)).
•The process of logon is depicted in the following
illustration.
•An intruder can only see the hashes of passwords, even if
he has accessed the password file. He can neither log on using a hash
nor derive the password from the hash value, since the hash
function possesses the property of pre-image resistance.
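A password table of (user id, h(P)) pairs can be sketched as follows. Two things go beyond what the text describes and are assumptions of this sketch: a random per-user salt is stored with each hash, and h is PBKDF2-HMAC-SHA256 from the standard library, which deliberately slows down guessing. The iteration count, user id, and passwords are illustrative only.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # illustrative work factor, not a recommendation

def hash_password(password: str, salt: bytes) -> bytes:
    """h(P): a slow, salted hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

# Stands in for the password file: user id -> (salt, h(P)).
password_file = {}

def register(user_id: str, password: str) -> None:
    salt = os.urandom(16)
    password_file[user_id] = (salt, hash_password(password, salt))

def logon(user_id: str, password: str) -> bool:
    """Hash the submitted password and compare it with the stored hash."""
    if user_id not in password_file:
        return False
    salt, stored = password_file[user_id]
    return hmac.compare_digest(stored, hash_password(password, salt))

register("alice", "correct horse")
print(logon("alice", "correct horse"))  # True
print(logon("alice", "wrong guess"))    # False
```

The file never contains the password itself, so reading it reveals only salts and hash values.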
Applications of Hash Functions
2. Data Integrity Check
The data integrity check is the most common application of hash functions. It is used to
generate checksums on data files. This application provides assurance to the user
about the correctness of the data. The process is depicted in the following illustration.
The integrity check helps the user to detect any changes made to the original file. It
does not, however, provide any assurance about where the file came from. The attacker, instead of
modifying the file data, can replace the entire file, compute an altogether new hash,
and send both to the receiver. This integrity check application is useful only if the user is
sure that the hash value itself comes from the genuine source.
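Generating a checksum for a data file can be sketched as follows. The file is read in chunks so that large files do not have to fit in memory. SHA-256, the chunk size, and the file name are assumptions of this sketch, and a temporary directory stands in for a real download.

```python
import hashlib
import os
import tempfile

def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file, reading it chunk by chunk."""
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "download.bin")   # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"original file contents")
    published = file_checksum(path)    # checksum published by the source

    with open(path, "ab") as f:
        f.write(b"!")                  # simulate a change to the file
    modified = file_checksum(path)     # checksum computed by the user

print(published == modified)  # False: the change is detected
```

The check only tells the user that the file differs from the one the checksum was computed for; it says nothing about who computed the checksum.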

Applications of Hash Functions
3. Signature Generation and Verification
Verifying signatures is a mathematical process
used to verify the authenticity of digital
documents or messages. A valid digital
signature, where the prerequisites are satisfied,
gives its receiver strong proof that a known
sender created the message and that it was not
altered in transit.
A digital signature scheme typically consists of
three algorithms: a key generation algorithm; a
signing algorithm that, given a message and a
private key, produces a signature; and a
signature verifying algorithm.
In practice the signing algorithm is applied to the
hash of the message rather than to the message itself.
Applications of Hash Functions
4. Verifying File and Message Integrity
Hashes can ensure that messages and files
transmitted from sender to receiver are not
tampered with during transit. The practice builds
a "chain of trust." For example, a user might
publish the hash of their data, together with the
key if a keyed hash is used, so that recipients can compare the hash
value they compute to the published value to
make sure they match.

Hash Function and Cryptographic Hash Function?

What Is the 256-Bit Cryptographic Hash Function?
A 256-bit hash function takes information and turns it into a
256-bit output, written as 64 hexadecimal digits, that is practically
impossible to reverse. No key is involved.

What's the Difference Between a Hash Function and a
Cryptographic Hash Function?
Cryptographic hash functions are designed to be collision-resistant,
whereas general-purpose hash functions (such as those used in
hash tables) are designed mainly to be fast to compute.

What Is SHA-512 Cryptographic Hash Function?
SHA-512 does the same thing as other secure hashing
algorithms. The difference is that in 256-bit, there are
2^256 possible hash values, but in 512-bit,
there are 2^512. That is the square of 2^256, not
twice as many, and the security level in bits roughly doubles.
Because 256-bit is already virtually impossible to
crack with modern computers, 512-bit is often unnecessary.
It also requires more storage and processing power
and could slow down processes that use it.
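The sizes of the two output spaces can be compared with plain integer arithmetic. This is a small check using Python's arbitrary-precision integers; the input CFI is reused from earlier in the lecture purely as a sample.

```python
import hashlib

# Number of possible hash values for each output length.
outputs_256 = 2 ** (hashlib.sha256().digest_size * 8)
outputs_512 = 2 ** (hashlib.sha512().digest_size * 8)

print(outputs_512 == outputs_256 ** 2)  # True: the square
print(outputs_512 == 2 * outputs_256)   # False: not merely double

# The extra length shows up as storage: 128 hex characters instead of 64.
print(len(hashlib.sha512(b"CFI").hexdigest()))  # 128
```

Doubling the number of output bits squares the number of possible hash values.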

Exercises

•Exercise 16: Explain the purpose of the collision-resistance requirement for
the hash function used in a digital signature scheme.
•Exercise 17: Your colleagues urgently need a collision-resistant hash
function. They already have an existing implementation of ECBC-MAC using a
block cipher with a 256-bit block size