Lect-8 (Hash Function)
Cryptography
Hash Functions
Hash Function
A hash function is a mathematical algorithm that
takes an input and produces a fixed-size string of
bytes.
It is commonly used in computer science for various
purposes, including data integrity verification, digital
signatures, and password hashing.
One key feature of a secure hash function is that it is
irreversible, meaning the original input cannot feasibly be
determined from the hash value. Additionally, a small
change to the input data should produce a vastly
different hash value.
1. Integrity
Data protected with a hash function can be checked to confirm that it has
remained unaltered during transmission or storage.
2. Non-repudiation
Hash functions are used to create digital signatures, thereby preventing
individuals from denying the authenticity of their digital messages.
3. Authentication
Combined with a key or a digital signature, the hash function serves as a way to
verify the identity of the sender and the data's origin.
4. Confidentiality
While hash functions mainly focus on ensuring data integrity, they can also
help protect the confidentiality of data, for example by storing only the hash
of a password instead of the password itself.
Why do we need Hash Functions ?
1. Standard Length
When you hash a message, the hash function takes your file or
message of any size, runs it through a mathematical
algorithm, and produces an output of a fixed length.
In the table, we converted the same input message (the letters
CFI) into hash values using three different hash functions
(MD5, SHA-1, and SHA-256). Each of those
hash functions produces an output hash with
a fixed number of hexadecimal characters: 32 characters
for MD5, 40 for SHA-1,
and 64 for SHA-256.
It doesn't matter what we put in as input; the same hash
function will always produce a hash value with the
same number of characters. In Table 2 above, we change
the message each time, but using the same hash function
(SHA-1 in this case), the output is always 40 hexadecimal
characters long.
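The fixed output length can be checked directly. The following is a minimal sketch using Python's standard hashlib module; the helper name hex_lengths and the second test message are illustrative assumptions, not part of the slides.

```python
import hashlib

def hex_lengths(message: bytes) -> dict:
    """Return the number of hex characters each algorithm outputs."""
    return {
        name: len(hashlib.new(name, message).hexdigest())
        for name in ("md5", "sha1", "sha256")
    }

# A 3-byte input and a much longer input give the same digest lengths.
for text in (b"CFI", b"a much longer message " * 1000):
    print(len(text), hex_lengths(text))
```

Whatever the input size, the lengths stay at 32, 40, and 64 hexadecimal characters, matching the figures quoted above.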
2. Ensure data integrity
Let’s think of an example where you want to send a digital message
or document to someone, and you want to make sure that it hasn’t
been tampered with along the way. You could send it multiple times
and have the recipient verify each copy is the same, but that would
not be feasible if the file or message was very large.
It would be much easier if there was a way of having a shorter and
set number of characters for the sender and receiver to check. And
that’s essentially what a hash function allows two computers to do.
Rather than compare the data in its original (and larger) form, by
comparing the two hashes of the data, computers can quickly
confirm that the data has not been tampered with and changed.
Hash functions, therefore, serve as a checksum: a way for
someone to identify whether digital data has been tampered with
after it was created.
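The comparison described above can be sketched as follows. This is an illustrative example only: SHA-256 is an assumed choice of hash function, and the function names and sample messages are hypothetical.

```python
import hashlib
import hmac

def digest(data: bytes) -> str:
    """Hex SHA-256 digest of the data."""
    return hashlib.sha256(data).hexdigest()

def is_unaltered(received: bytes, expected_digest: str) -> bool:
    """Compare the digest of the received data with the expected one."""
    return hmac.compare_digest(digest(received), expected_digest)

original = b"Pay Bob 100 dollars"
published = digest(original)  # short value the sender shares

print(is_unaltered(b"Pay Bob 100 dollars", published))  # True
print(is_unaltered(b"Pay Bob 900 dollars", published))  # False
```

Note that this check only helps if the expected digest reaches the receiver through a channel the attacker cannot modify.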
3. Verify authenticity
For example, if you send out an email, it can be intercepted easily
(especially if it is sent over an unsecured WiFi network).
The recipient of the email has no way of knowing whether someone has altered
the contents of the email along the way. Such tampering is called a “Man-in-the-Middle”
(MitM) attack.
However, if the sender hashes the email contents and signs that hash with
their private key, producing a digital signature, the receiver can
ensure that the email contents have not been modified after
being digitally signed.
To do this, the receiver “re-generates” the hash value of the received
email using the same hash function as the sender, and uses the
signer’s public key to check the signature against that hash value.
If it matches, that means that no one has altered the message. If the
hashes are different, the receiver knows that the contents of the
email are not authentic, because even a small change to the
message produces a completely different hash.
Hash Functions Properties:
There are three central properties which hash functions need
to possess in order to be secure:
1. preimage resistance (or one-wayness)
2. second preimage resistance (or weak collision resistance)
3. collision resistance (or strong collision resistance)
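1. Preimage resistance:
Given a hash value h, it should be computationally infeasible to find any
input x such that H(x) = h.
2. Second Preimage resistance:
Given an input x, it should be computationally infeasible to find a different
input x' such that H(x') = H(x).
3. Collision resistance: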
Collision resistance refers to the property of a hash function that makes it
infeasible to find two distinct inputs that produce the same hash value.
This property is crucial in ensuring the security and robustness of hash
functions, particularly in digital signatures and data integrity verification.
In real-world applications, collision-resistant hash functions are essential in
preventing unauthorized data tampering and forgery.
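To see why output length matters for collision resistance, the sketch below deliberately weakens a hash by truncating SHA-256 to 16 bits and then finds a collision by brute force. The truncated function tiny_hash is a hypothetical construction for illustration only; it is not a real algorithm from the slides.

```python
import hashlib

def tiny_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """Hypothetical weak hash: SHA-256 truncated to n_bytes."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2):
    """Try inputs "0", "1", "2", ... until two share a truncated hash."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode()
        h = tiny_hash(message, n_bytes)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1

a, b = find_collision()
print(a, b, tiny_hash(a).hex())
```

With only 2**16 possible outputs the search must succeed within 65,537 attempts and typically needs only a few hundred. A full-length digest makes the same search infeasible.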
How does Hash Function Work ?
The details depend on the algorithm, but generally, to produce a hash value of a set length, a hash function
first divides the input data into fixed-size blocks, which are called data blocks.
This is because the hash function processes a fixed length of data at a time. The size of the data block
differs from one algorithm to another.
If the final block is not full, padding is added to fill it out. Regardless of which
hashing method you use, the output, or hash value, always has the same fixed length for that method.
The hash function is then applied as many times as there are data blocks.
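The block-splitting step can be sketched as below. This is a simplified illustration: the 64-byte block size and the padding rule (a 0x80 marker followed by zero bytes) are assumptions, and real algorithms such as MD5 or SHA-256 also encode the message length in the padding.

```python
def split_into_blocks(data: bytes, block_size: int = 64) -> list:
    """Pad the data and cut it into equal-sized blocks."""
    padded = data + b"\x80"                          # padding marker
    padded += b"\x00" * (-len(padded) % block_size)  # zero-fill last block
    return [padded[i:i + block_size]
            for i in range(0, len(padded), block_size)]

blocks = split_into_blocks(b"x" * 100)
print(len(blocks), [len(b) for b in blocks])  # 2 [64, 64]
```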
The “Avalanche Effect”
The data blocks are processed one at a time. The output of the first data block is
fed as input along with the second data block. The output of the
second is then fed along with the third block, and so on.
The final output is therefore the combined value of all the blocks. If you change
one bit anywhere in the message, the hash value changes substantially. This is called “the
avalanche effect.”
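The avalanche effect can be observed by flipping a single input bit and counting how many output bits change. The sketch below assumes SHA-256 as the hash function; the helper name bit_difference and the sample message are illustrative.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    ha = int.from_bytes(hashlib.sha256(a).digest(), "big")
    hb = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(ha ^ hb).count("1")

msg = bytearray(b"hash functions")
flipped = bytearray(msg)
flipped[0] ^= 0x01  # flip the lowest bit of the first byte

print(bit_difference(bytes(msg), bytes(flipped)), "of 256 bits differ")
```

For a well-designed hash function, roughly half of the output bits are expected to differ.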
Uniqueness and Determinism
Hash functions must be deterministic, meaning that every time you put in the
same input, it will always create the same output.
In addition, the output, or hash value, should in practice be unique to the exact input.
Because the output has a fixed length, two different inputs with the same hash must exist,
but it should be computationally infeasible to find them. If two different
pieces of data producing the same output can be found, it is known as a “hash collision,” and
the algorithm is no longer considered secure.
Irreversibility
Ideally, hash functions should be irreversible. This means that while it is quick and
easy to compute the hash if you know the input message for any given hash
function, it is very difficult to go through the process in reverse and compute the
input message if you only know the hash value.
Design of Hashing Algorithms
The hash value of the first message block becomes an input to the second hash operation,
the output of which alters the result of the third operation, and so on, so a change in any
block propagates to the final result. This effect is known as the
avalanche effect of hashing.
The avalanche effect results in substantially different hash values for two messages that differ
by even a single bit of data.
It is important to understand the difference between a hash function and a hashing algorithm. The hash
function generates a hash code by operating on two blocks of fixed-length binary data.
The hashing algorithm is a process for using the hash function, specifying how the message will
be broken up and how the results from previous message blocks are chained together.
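The distinction between the hash function (operating on two fixed-length inputs) and the hashing algorithm (breaking up and chaining) can be sketched with a toy construction. Everything here is an assumption for illustration: the stand-in compress function simply reuses SHA-256, the block size and initial value are arbitrary, and the padding is simplified. It is not a secure or standard design.

```python
import hashlib

BLOCK_SIZE = 64  # bytes per message block (assumed)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in hash function: mixes the running state with one block."""
    return hashlib.sha256(state + block).digest()

def toy_hash(message: bytes) -> bytes:
    """Hashing algorithm: pad, split into blocks, chain the results."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK_SIZE)
    state = b"\x00" * 32  # fixed initial value (assumed)
    for i in range(0, len(padded), BLOCK_SIZE):
        # The previous output is fed in together with the next block.
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state

print(toy_hash(b"abc").hex())
```

Because each step depends on the previous state, a change in an early block alters every later state and therefore the final value.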
Popular Hash Functions
1. Message Digest (MD)
MD5 was the most popular and widely used hash function for
quite some years.
•The MD family comprises the hash functions MD2, MD4,
MD5 and MD6. MD5 was published as RFC
1321. It is a 128-bit hash function.
•MD5 digests have been widely used in the software world to
provide assurance about the integrity of transferred files. For
example, file servers often provide a pre-computed MD5
checksum for their files, so that a user can compare the
checksum of the downloaded file to it.
•In 2004, collisions were found in MD5. An analytical attack
was reported to succeed in only an hour using a
computer cluster. This collision attack
compromised MD5, and hence it is no longer recommended
for use.
2. Secure Hash Algorithm (SHA)
The SHA family comprises four SHA algorithms:
SHA-0, SHA-1, SHA-2, and SHA-3. Though from the
same family, they are structurally different.
•The original version, SHA-0, a 160-bit hash
function, was published by the National Institute of
Standards and Technology (NIST) in 1993. It had a few
weaknesses and did not become very popular. Later, in
1995, SHA-1 was designed to correct the alleged
weaknesses of SHA-0.
•SHA-1 was for many years the most widely used of the SHA
hash functions. It has been employed in several widely used
applications and protocols, including Secure Socket
Layer (SSL) security.
•In 2005, a method was found for uncovering
collisions for SHA-1 within a practical time frame,
making the long-term employability of SHA-1 doubtful.
Examples:
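One possible example, not necessarily the one shown on the original slides, lists the output size of several MD and SHA functions available in Python's hashlib. The helper name digest_bits and the sample input are assumptions.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Output size of the named hashlib algorithm, in bits."""
    return hashlib.new(name).digest_size * 8

for name in ("md5", "sha1", "sha224", "sha256",
             "sha384", "sha512", "sha3_256"):
    value = hashlib.new(name, b"hello").hexdigest()
    print(f"{name:9s} {digest_bits(name):4d} bits  {value}")
```

MD5 reports 128 bits and SHA-1 reports 160 bits, in line with the sizes given above.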
Applications of Hash Functions
Hash functions have several direct applications based on
their cryptographic properties. The most common
cryptographic applications are:
1. Password Storage
Hash functions provide protection to password storage.
•Instead of storing passwords in the clear, most logon
processes store the hash values of passwords in a file.
•The password file consists of a table of pairs of
the form (user id, h(P)).
•The process of logon is depicted in the following
illustration.
•An intruder can only see the hashes of passwords, even if
they gain access to the password file.
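The (user id, h(P)) table and the logon check can be sketched as follows. This is a minimal illustration with assumptions beyond the slides: h is implemented as salted PBKDF2-HMAC-SHA256 rather than a bare hash, the iteration count is arbitrary, and the in-memory dictionary stands in for the password file.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor

password_file = {}  # user id -> (salt, h(P)); stands in for the file

def h(password: str, salt: bytes) -> bytes:
    """Salted, deliberately slow password hash (assumed choice)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)

def register(user_id: str, password: str) -> None:
    salt = os.urandom(16)
    password_file[user_id] = (salt, h(password, salt))

def logon(user_id: str, password: str) -> bool:
    """Hash the supplied password and compare with the stored hash."""
    if user_id not in password_file:
        return False
    salt, stored = password_file[user_id]
    return hmac.compare_digest(stored, h(password, salt))

register("alice", "correct horse")
print(logon("alice", "correct horse"))  # True
print(logon("alice", "wrong guess"))    # False
```

The password itself is never stored; only the salt and the hash value are kept.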
3. Signature Generation and Verification
Verifying signatures is a mathematical process
used to verify the authenticity of digital
documents or messages. A valid digital
signature, where the prerequisites are satisfied,
gives its receiver strong proof that a known
sender created the message and that it was not
altered in transit.
A digital signature scheme typically consists of
three algorithms: a key generation algorithm; a
signing algorithm that, given a message and a
private key, produces a signature; and a
signature verifying algorithm. In practice, the
signing algorithm is applied to the hash of the
message rather than to the message itself.
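The three algorithms can be illustrated with a toy hash-then-sign sketch based on textbook RSA. This is for illustration only and is not secure: the primes are tiny, fixed, and publicly known, no padding scheme is used, and all function names are assumptions rather than any library's API.

```python
import hashlib

def generate_keys():
    """Key generation with two small, well-known Mersenne primes."""
    p, q = 2**61 - 1, 2**89 - 1        # far too small for real use
    n = p * q
    e = 65537
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse (Python 3.8+)
    return (n, e), (n, d)              # (public key, private key)

def message_hash(message: bytes, n: int) -> int:
    """SHA-256 of the message, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes, private_key) -> int:
    n, d = private_key
    return pow(message_hash(message, n), d, n)

def verify(message: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == message_hash(message, n)

public, private = generate_keys()
sig = sign(b"transfer 10 coins", private)
print(verify(b"transfer 10 coins", sig, public))
print(verify(b"transfer 99 coins", sig, public))
```

Signing the hash keeps the signed value short and fixed in size, whatever the length of the message.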
4. Verifying File and Message Integrity
Hashes can help ensure that messages and files
transmitted from sender to receiver are not
tampered with during transit. The practice builds
a "chain of trust." For example, a user might
publish a hashed version of their data and the
key so that recipients can compare the hash
value they compute to the published value to
make sure they match.
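For files, the digest is usually computed in chunks so that large files need not be loaded into memory. The following sketch assumes SHA-256 and uses a temporary file as a stand-in for a downloaded file; the function name file_sha256 and the file contents are hypothetical.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file chunk by chunk and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Stand-in for a downloaded file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release contents")
    path = tmp.name

published = hashlib.sha256(b"release contents").hexdigest()
print(file_sha256(path) == published)  # True
os.remove(path)
```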
Hash Function and Cryptographic Hash Function ?
Exercises
• Exercise : Explain the purpose of the collision-resistance requirement for the hash function
used in a digital signature scheme.
• Exercise : Your colleagues urgently need a collision-resistant hash function. They already have
an existing implementation of ECBC-MAC using a block cipher with a 256-bit block size