Cyber Law (Paper - I)


Paper- I

Basics of the Computer and Cyber world

Chapter 1: Computer organization and architecture

Hardware – Inside a Computer

Hardware components
Anything in a computer that we can touch and see is called hardware, e.g. Monitor, Keyboard,
Mouse, Tower (the enclosure for the computer's electronics), Motherboard, Processor and Storage
Media (HDD, Floppy, CDs, DVDs, Blu-ray Discs). We will discuss the parts that make up computer
hardware one by one.

Hardware is basically all the components used for interacting with the computer and for getting
output from devices such as a monitor or printer. Another example: you can touch and see a disc
(CD), and you can read what the label printed on it says, but you cannot read the information
stored on it. Thus the CD is hardware, while the information stored on it can be categorized as
software. A few important hardware devices are discussed below:

Display Unit
Also known as a Visual Display Unit (VDU) or Monitor. It is a device that reproduces images, both
moving and static. Here we refer to display units in the context of computers.
The basic function of a VDU is to show the output of a computer, which it receives through a VGA
(Video Graphics Array) port. This VGA port is either built into the motherboard, or comes on a
separate card, called a Display Card (or video graphics adaptor, PCIe x16 card, or simply
Graphics Card). A VDU usually has two cables attached to it: a power cable, which supplies power
for its circuits and screen, and a data cable, which carries the signals for the image to be
shown. The other end of the data cable is attached to the graphics card.

A display device is made up of many coloured dots arranged together to form an
image on the screen. Each dot comprises three basic colours: Red, Green and Blue. Each
colour can have a value from 0 to 255, i.e. each colour can have 256 shades. That is,
each dot can represent up to 256x256x256 (about 16.7 million) colours. It is the combination of
these three basic colours that finally shows one dot in a particular colour and shade. Each such
dot (Red, Green and Blue combined as one group) is called a pixel. A typical monitor has
1024x768 pixels on the screen, which means 786,432 pixels in total. It repaints each of these
pixels approx. 60 times in a second. The number of repaints per second is called the refresh
rate of the monitor. It is expressed in Hz (times per second) and typically varies from 60Hz to
100Hz. So, when we purchase a monitor, we have to decide on these basic properties (resolution
and refresh rate), and then on the size of the screen as per our requirement. The higher the
refresh rate, the smoother moving images will look.
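
The arithmetic in the paragraph above can be checked with a short sketch. The function names are made up for this illustration; the figures (256 shades per colour, a 1024x768 screen, 60 repaints per second) are the ones used in the text.

```python
# Display arithmetic for a 1024x768 screen with 256 shades per basic colour.
# All names below are illustrative only.

SHADES_PER_COLOUR = 256      # values 0..255 for each of Red, Green and Blue
WIDTH, HEIGHT = 1024, 768    # resolution used in the text
REFRESH_HZ = 60              # repaints per second


def colours_per_pixel(shades=SHADES_PER_COLOUR):
    """Number of colours one pixel can show (one shade each of R, G and B)."""
    return shades ** 3


def total_pixels(width=WIDTH, height=HEIGHT):
    """Number of pixels on the screen."""
    return width * height


def pixel_repaints_per_second(width=WIDTH, height=HEIGHT, refresh_hz=REFRESH_HZ):
    """How many individual pixel repaints happen in one second."""
    return total_pixels(width, height) * refresh_hz


print(colours_per_pixel())          # 16777216
print(total_pixels())               # 786432
print(pixel_repaints_per_second())  # 47185920
```

Changing WIDTH, HEIGHT or REFRESH_HZ shows how quickly the amount of work grows with resolution and refresh rate.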

Broadly, there are two categories of a monitor.

Flat Panel monitors


LCD Monitors: LCD stands for Liquid Crystal Display. It works on
the movement of liquid crystals. The crystals in each pixel change their
alignment many times in a second to block or pass the light for each colour
within the pixel.
TFT Monitors: TFT monitors are an advanced form of LCD monitors. TFT stands
for Thin Film Transistor. An array of transistors, one set for each pixel,
controls the crystals that do the job of blocking coloured light. LCD and TFT
monitors come very close to real colours, but are not exact. With
advancement of technology, these monitors are getting closer to real
colours.
LED Monitors: This is usually a TFT monitor, except that the light source is a
set of LEDs spread across the back of the screen. In other monitors, the light
comes from one edge and is distributed by a sheet of glass arranged in such a
way that it spreads the light evenly. Advanced LED monitors can dim or turn
off the LEDs in a certain area, which helps to increase the contrast ratio;
e.g. in a portion where the screen needs to show black, the LEDs in that area
are simply dimmed or switched off.
Plasma Monitors: These comprise tiny sealed cells filled with inert gas.
The gas gets converted into plasma when electricity is passed
through it. This excites phosphors, which then emit light. The technology is
very different from LCD/TFT and closer to that of CRT TVs. Plasma
displays are brighter, and usually come only in sizes greater than 32
inches. However, their power consumption is higher as compared to
LCD TVs, and it varies with the image being displayed.

CRT Monitors: CRT stands for Cathode Ray Tube. In this, an electron beam is fired
by an electron gun. The beam is deflected with the help of electromagnets and
circuits in such a way that it sweeps from left to right to complete
one line, then moves to the next line, again left to right, and in this way it
scans the whole screen surface from top to bottom. A typical computer monitor
completes 60 such cycles in one second.
The number of such cycles per second is the refresh rate. Monitors have refresh
rates of 60Hz to 100Hz.

Part Names:

1. Electron guns: there are three, one each for red, green and blue.
2. Electron Beams.
3. Focusing Coils.
4. Deflecting Electromagnetic coils
5. Anode Connection
6. Mask for separating beams for red, green and blue.
7. Phosphor layered with red, green and blue zones.
8. Enlarged view of the phosphor coating on the inner side of the screen.
Input Devices: There are various input devices that can be used with a computer, the basic
one being the keyboard.
Input devices act as a medium that enables a user to give instructions and data
to the computer, so that the necessary output (immediate or later) can be obtained. If a
computer's hardware parts are in working order, the computer supports basic
input/output devices without the need for extra software (drivers). This means
that even if no software is installed, your keyboard (the basic input device)
will still communicate with the computer and send instructions and signals to
it. Other input devices include: Mouse, light pens, touch screens, scanner,
web-cam, microphone, punch cards, etc.

The Tower / Computer Casing: It is an enclosure for the Power Supply (SMPS), Motherboard,
Processor, RAM, HDD, Optical Drives, Add-on cards and other internal parts that plug
directly into the motherboard. It is designed to maintain a proper flow of air so that the
internal parts stay cool enough to give optimum performance; it also keeps things tidy, which
they would not be if left in the open. Each device inside the tower has a specific space, which
helps in easy maintenance and upgradability, e.g. the HDD, FDD, optical drives, motherboard
and SMPS all have their respective bays. Towers are classified by several factors, the size of
the motherboard being the most important, and have different names for different sizes and
frames (the form factor of the motherboard). Various form factors of casing/motherboards, in
increasing order of size: Pico-ITX, Nano-ITX, Mini-ITX, Micro-ATX, Standard-ATX. The computer
case is often also called the CPU (Central Processing Unit). But this can be misleading, as CPU
is also the name for the Processor (Microprocessor). So it is recommended to say Computer
Casing or Tower, which specifies clearly which part of the computer we are referring to.
SMPS: SMPS stands for Switch Mode Power Supply.

This is a technology which is different from conventional transformer-based
electronics. It is designed to convert the normal domestic AC
(Alternating Current) supply into the DC (Direct Current) voltages that can be used by
computer parts, i.e. it not only converts from AC to DC, but also steps down the voltage.
Typically, a HDD, FDD or Optical Drive needs a different input voltage as compared
to Memory, the Processor and similar parts. Most drives operate their circuits on 5V
and run their motors (for spinning etc.) on 12V. A typical SMPS has a 20/24-pin connector
that connects to the motherboard to supply the various circuits on it.

[Figure: SMPS connectors: motherboard power connector; connectors for HDD / optical drives / FDD; connector for SATA drives]

Motherboard: A motherboard is also known as the Main Board or System Board. Its
main task is to distribute power amongst the various components embedded on the
motherboard, the USB ports, and expansion cards (if installed). It also gives power to the
processor and RAM. Please note that the motherboard distributes power to low-powered
devices only. Devices with heavier power needs have to be powered from an
SMPS output connector or another external power source. E.g. a typical USB
storage device may come with an additional power adaptor to meet its power
requirements, even though it is connected to one of the USB ports of the motherboard as
well. Some graphics cards also require additional power.
The other most important task of a motherboard is to provide
communication channels between the various components and chips. The wires that are
specifically designed to carry data are referred to as a bus. There are various buses on
a motherboard, e.g. the Front Side Bus or FSB (carries data between the main
chipset on the board and the processor), the RAM bus (carries data between RAM and the main
chipset) and the audio bus (carries audio data from the main processor to the audio ports).

A motherboard is identified by the name of the main chip installed on it, which is usually
referred to as the ‘Chipset’. There are various chipset manufacturers, e.g. Intel,
Apple (Macintosh), VIA, SiS etc. Since developing these kinds of chips is highly complex and
requires very high investment, only a few manufacturers develop them. There are
various assemblers (motherboard manufacturers) who use these chips on their
motherboards. E.g. Asus may use Intel or VIA chipsets in its motherboards. These
motherboards are referred to as Intel chipset or VIA chipset boards made by Asus,
respectively. Motherboard manufacturers include Asus, Intel, Gigabyte,
ASRock etc. A picture of a typical motherboard is shown below, for reference only.

In the picture above, we have shown an Intel Chipset based motherboard,
manufactured by Intel.

Processor: The processor is sometimes referred to as the brain of the computer
system. It is to be noted that computers are not intelligent, i.e. a computer is not able to
think or take any decision by itself. The term ‘Brain’ is used to denote that this is the main
part which carries out the calculations.

The processor is the main component in the computer that carries out and processes
the instructions it receives. In most cases, these instructions are received from the
operating system in binary form (machine language). They may be
invoked by a user or by a program. As of now, the few core manufacturers
of processors are Intel, AMD and ARM (whose designs are mostly used in mobile devices,
including Apple's A4/A5/A5X series) etc. The other manufacturers eventually merged into the
bigger groups, Intel & AMD. The processor is the most important part of the circuit that
governs the speed of a computer. Since the processor is the fastest chip in any computer,
instructions stored in RAM often face a speed mismatch and a fetch delay. To
overcome this, there is a small memory that is usually built into the processor.
Conceptually, it is similar to RAM, but much faster than the RAM installed on the
motherboard. This special memory is called cache memory. Cache memory is usually
Static RAM (SRAM), whereas main RAM is DRAM (Dynamic Random Access
Memory).
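
To see why a small, fast cache helps with the speed mismatch, here is a minimal sketch of the average time taken per memory access. The timings (1 ns for cache, 100 ns for RAM) and the hit rates are assumed round numbers chosen only for illustration, not measurements of any real processor, and the model is simplified: a miss is charged the RAM time only.

```python
# Average memory access time with and without a cache (simplified model).
# All timings are hypothetical round numbers, for illustration only.

def average_access_time_ns(hit_rate, cache_ns, ram_ns):
    """Average time per access when `hit_rate` of accesses are found in cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_ns + (1.0 - hit_rate) * ram_ns


CACHE_NS = 1.0    # assumed cache access time
RAM_NS = 100.0    # assumed RAM access time

for hit_rate in (0.0, 0.5, 0.9, 0.99):
    t = average_access_time_ns(hit_rate, CACHE_NS, RAM_NS)
    print(f"hit rate {hit_rate:.0%}: about {t:.1f} ns per access")
```

The more often the processor finds what it needs in cache, the closer the average comes to the cache speed rather than the RAM speed.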

However, it is the combination of other components like RAM, motherboard and OS that
governs the overall speed and efficiency of a computer.

New-generation processors are defined by speed and number of cores. Earlier, the technology
was very costly, but as it became smaller and more affordable, the
concept of more cores per unit became possible. A single core may be thought of as one
processor in itself. Nowadays, multiple cores are combined in a single
package; in other words, more than one processor is packed into one housing.
The initial range of multi-core processors launched by Intel was the Dual-Core
series; Intel also released another series, popular as Core 2 Duo. In
2006, Intel launched Quad-core processors. And now, its latest series for desktop and
high-end computing, i5 & i7, again have multiple cores and make use of
cache memory dynamically. AMD also has multi-core processors like the AMD Phenom II,
AMD Athlon II etc.

RAM: RAM stands for Random Access Memory. It comprises chips that store
data temporarily. The processor is known to be the fastest chip in any given
computer. But if the processor does not get instructions or data fast enough,
the computer will remain slow. In order to queue up the
instructions going to the processor, these instructions are stored in RAM first.
This process is known as loading of a program or software. Once the program is
loaded into RAM, the RAM and the processor work hand-in-hand to do the
required tasks. A typical example: suppose you were preparing some notes in
word processing software and had typed a few pages. While you are typing the seventh page,
the power fails. If you had not told the computer to save to a storage medium
(HDD/USB drive etc.), you would lose everything that you typed,
because until you gave the instruction to save, it existed in RAM only. One may
ask, then why do we use RAM? Here is a small explanation: you would have seen that
when we double-click a document file, it takes several seconds to come up on the
screen; this is because the whole document gets loaded into RAM. This
is done once, when you open it. If RAM were not there, every
time you modified a word or sentence, the change would have to go to the
hard disk and be updated there, causing delays due to
mechanical movement. This would result in very slow working, with the hard disk
as the bottleneck. Thus, memory is a key factor that governs the
efficiency of a computer. The more memory there is, the more applications the computer
can handle. One can open the Windows Task Manager to monitor processor
and RAM usage.

It can rightly be said that the processor is directly related to the speed of your computer,
and RAM is related to the ability of the computer to handle a number of programs.
A faster processor helps in processing calculations faster, and more RAM
helps more programs to run at the same time without slowing down.
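
The word-processor example above (typing happens in RAM, and nothing reaches the storage media until you save) can be imitated with a small sketch. The Document class and the file name notes.txt are hypothetical and exist only for this illustration; a real word processor is far more elaborate.

```python
# A toy "document" that lives in memory until it is explicitly saved.
# Class and file names are hypothetical, for illustration only.
import os
import tempfile


class Document:
    def __init__(self, path):
        self.path = path
        self.lines = []       # the typed text, held in RAM only
        self.disk_writes = 0  # how many times we actually touched the disk

    def type_line(self, text):
        # Editing changes memory only; the disk is not involved.
        self.lines.append(text)

    def save(self):
        # Saving copies the in-memory text to the storage media once.
        with open(self.path, "w", encoding="utf-8") as f:
            f.write("\n".join(self.lines))
        self.disk_writes += 1


def demo():
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "notes.txt")
        doc = Document(path)
        for page in range(1, 8):
            doc.type_line(f"page {page}")
        # Before saving, no file exists: a power failure now would lose all 7 pages.
        unsaved = not os.path.exists(path)
        doc.save()
        with open(path, encoding="utf-8") as f:
            saved_lines = f.read().split("\n")
        return unsaved, doc.disk_writes, saved_lines


print(demo())
```

Seven edits cost only one disk write here, which is the saving in slow mechanical work that the text describes.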

HDD: HDD stands for Hard Disk Drive. These devices are categorized as storage
devices. They are capable of holding data even when the computer is turned off.
Humans record data on paper by writing with ink and pen. We store music
on gramophone records, cassettes, CDs, Blu-ray discs etc. All these are
storage devices, which keep information for our use. We can record or re-record
on them, within their life span and the respective limitations of the media.
Similarly, the hard disk is one of the primary storage devices used for storing the
programs, data and information of a user on a computer.
A hard disk drive (HDD) is a type of data storage device that serves as the main
storage device on most computers as well as for a growing number of other
products, including video surveillance equipment, scientific instrumentation,
cameras, set top boxes for televisions, satellite TV receivers and portable music
players (such as Apple's iPod).

Hard Disk Drives may be classified into the following kinds:

Magnetic Storage HDD:

These are simply called Hard Disk Drives. A conventional magnetic hard disk
is a ferromagnetic type of data storage medium. It is similar
to the technology used in audio cassettes or VHS tapes, combined with
LP record technology. You may refer to a picture of an LP record. Storage in a
computer context refers to devices or media that can retain data for
relatively long periods of time, for example, years or even decades. This
type of HDD consists of a rigid metal case that contains one or more
identical platters, at least two magnetic heads for each platter, a spindle
motor for rotating the platters, an actuator mechanism for moving the
heads, and control circuitry.

In this hard disk, several LP-like discs (called platters) are stacked one above
another. In some drives nothing is recorded on the outer faces of the topmost
and bottommost platters, while on the inner platters data is recorded on both
sides with the help of multiple heads. A platter is a thin, high-precision aluminium or glass disk
that is coated on each side with a very thin layer (typically only a few
millionths of an inch thick) of a high-precision magnetic material in which
the actual data is stored. Data is written to and read from this coating by
magnetic heads, which are highly sensitive and high-precision
electromagnets. There is usually one head for each side of each platter.

The magnetic coating on each side of each platter is divided into tracks. A
track is any of the concentric circles on the magnetic media over which one
head passes while the head is stationary but the disk is spinning. Each track
on a modern HDD has a width of only a few microns (i.e., millionths of a
meter), and there can be tens of thousands of tracks on each platter. The
thinner the tracks, the greater the storage capacity of the disk. A single,
imaginary, concentric circle that cuts through all of the platters and includes
the same track on each side of each platter is called a cylinder.

Tracks are divided into a number of segments called sectors. Each sector
generally contains 512 bytes and is the smallest unit of data that can be
accessed by a disk drive (although software makes it possible to access
individual bytes and even individual bits). The operating system keeps track
of where data is stored by noting its track and sector numbers.
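
The way cylinders, heads (one per platter side) and sectors combine can be sketched as below. The geometry used (1024 cylinders, 16 heads, 63 sectors per track) is an assumed example, not the layout of any particular drive; modern drives do not keep the same number of sectors on every track and are addressed by a single running sector number instead. The second function shows the usual conversion from a cylinder/head/sector position to such a running number.

```python
# Capacity and sector numbering for an idealised disk geometry.
# The geometry values are assumed examples, for illustration only.

BYTES_PER_SECTOR = 512  # sector size mentioned in the text


def capacity_bytes(cylinders, heads, sectors_per_track,
                   bytes_per_sector=BYTES_PER_SECTOR):
    """Total capacity = cylinders x heads x sectors per track x bytes per sector."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


def chs_to_sector_number(cylinder, head, sector, heads, sectors_per_track):
    """Running sector number for a (cylinder, head, sector) position.

    Cylinders and heads are counted from 0; sectors are counted from 1.
    """
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


print(capacity_bytes(1024, 16, 63))            # 528482304 bytes (about 528 MB)
print(chs_to_sector_number(0, 0, 1, 16, 63))   # 0, the very first sector
print(chs_to_sector_number(1, 0, 1, 16, 63))   # 1008, first sector of cylinder 1
```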

The first commercial HDD was launched by IBM in September 1956.
Designated the IBM 350 RAMAC (random access method of accounting and
control), it had a capacity of about five megabytes, which was attained by
using 50 platters, each 24 inches in diameter. The average data access time
was very slow by today's standards, largely because there was only a single
head to access all the platters. IBM's 3340 Winchester disk system,
introduced in 1973, was the first HDD to use a sealed head/disk assembly,
which substantially improved performance and is now standard.

The history of HDDs has been one of continuous rapid progress, particularly
with regard to capacity, reliability, miniaturization and cost reduction. This
has been the result of advances in a number of areas, including the
development of improved magnetic media with greater areal density (i.e.,
increased data storage capacities per unit of area), increased precision of
the heads, motors and other mechanical parts, and improved control
circuitry. Please refer to a block diagram illustration of HDD:

Each hard disk drive has two main connectors: one for power, which is
drawn directly from the SMPS, and the other for the data cable. Based on the data
connector, there are two types of hard disk, PATA (Parallel ATA) and SATA (Serial ATA),
where ATA is short for Advanced Technology Attachment. [In earlier forms
of computing, computers were named XT (eXtended Technology) and AT
(Advanced Technology)].

Solid State Hard disk: There has been much interest for a number of years in
replacing HDDs with solid state storage, mainly flash memory, because solid
state devices offer huge advantages with regard to weight, power
consumption, shock resistance and longevity. In contrast to conventional
memory, flash memory retains its contents even in the absence of a power
supply. Flash is the kind of chip that is also widely used in USB pen drives.
However, because the cost on a per-bit basis is still far higher for flash
memory than for HDDs, and substantial improvements are continuing to be
made in HDD capacity and performance, no large-scale replacement is likely
for a few years. Rather, replacement will continue to be mainly in
applications for which miniaturization and durability are more important
than price, such as ultra-portable computers, portable music players,
scientific instrumentation and military laptops or other military equipment. SSDs
commonly come in capacities up to 128GB, and only a few offer up to 800GB, but
since the cost of flash is declining, it seems to be a good replacement for
the conventional magnetic HDD in future.

Optical Disc Drives (ODD): An optical disc drive (ODD) is a disk drive that uses laser
light (or electromagnetic waves near the light spectrum) to read or write data to or
from optical storage media (referred to as discs). Some drives can only read from
discs, but recent drives are commonly both readers and recorders (also known as
writers). Compact discs, DVDs, HD DVDs and Blu-ray discs are common types of
optical media which can be read and recorded by such drives. The earliest discs
were CDs (Compact Discs), with a storage capacity of up to 650MB. Then, with
advancement in laser technology, a narrower beam was used to read and write, so
more data could be stored on a disc of the same size; this was the DVD. A DVD can store
4.7GB of data, or 8.5GB on a single-sided, double-layer disc. Later, a laser of yet
another wavelength (blue-violet) came into use, and the latest form of optical disc
is the Blu-ray Disc. These are very expensive, but have massive storage capacity: a
single-layer Blu-ray disc can store up to 25GB and a dual-layer disc up to 50GB.

Connector ports on a computer


PS/2 Connector: Please do not confuse it with PlayStation 2; this is a
standard port that is primarily used for connecting a keyboard and
mouse. It is named after the computer series it was introduced with, the IBM
Personal System/2. A colour coding is followed: purple for the keyboard and green
for the mouse. This port serves as an attachment for dedicated primary
input devices. You may also find a few barcode scanners with PS/2
connectivity.
Audio Ports: These are input/output ports that receive analogue signals
from a mic or line-in and deliver analogue or digital sound, which needs to be
amplified with the help of amplifiers or amplified speakers (multimedia
speakers). However, headphones can be directly connected to these ports.
The analogue ports usually take the standard 3.5mm stereo plug, which has three
contacts. The same plug is used with other music devices like walkmans, MP3
players, some iPods and music systems.
Serial Ports: A serial port is a communication port which transfers information one bit
at a time. It is getting phased out and is being replaced by the USB port.
Serial ports were used with dial-up modems and other communication
devices.
Parallel port / LPT1 port: This is another form of communication port.
It can transfer several bits of information at the same time. It was primarily
used for printers, and was also known as the Printer Port. Through it, the
printer could also send feedback to the computer about a paper jam etc. This
technology is also getting phased out, and USB is being used in place of the
parallel port.
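
The difference between sending one bit at a time and sending several bits together can be shown with a small sketch. It is only a picture of the idea: the function names are made up, and the start/stop bits and handshaking used by real serial and parallel ports are left out.

```python
# One byte sent "serially" (bit by bit) versus "in parallel" (8 bits at once).
# Illustrative only: real ports add framing and control signals.

def send_serial(byte):
    """Yield the 8 bits of `byte` one at a time, lowest bit first."""
    for position in range(8):
        yield (byte >> position) & 1


def receive_serial(bits):
    """Rebuild the byte from bits arriving one at a time, lowest bit first."""
    value = 0
    for position, bit in enumerate(bits):
        value |= bit << position
    return value


def send_parallel(byte):
    """Return all 8 bits together, as if each were placed on its own wire."""
    return tuple((byte >> position) & 1 for position in range(8))


letter_a = 0x41                               # the character 'A'
print(list(send_serial(letter_a)))            # [1, 0, 0, 0, 0, 0, 1, 0]
print(send_parallel(letter_a))                # (1, 0, 0, 0, 0, 0, 1, 0)
print(receive_serial(send_serial(letter_a)))  # 65
```

The serial version needs eight steps on one wire, while the parallel version needs one step on eight wires.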
USB Ports: USB stands for Universal Serial Bus. It is yet another form of
communication port on a computer. Small devices can draw power at 5V
from the port itself and do not need to be attached to another
power source. E.g. a printer consumes more power, so an additional power
adaptor needs to be attached, while communication is done over a USB
cable; on the other hand, a USB pen drive can be powered by the USB port
itself. There are three versions of USB available. USB 1.x (where x is 0
or 1, i.e. 1.0 and 1.1) was the first release. It had transfer rates of
1.5Mbits/sec and 12Mbits/sec. USB 2.0 has a transfer rate of up to
480Mbits/sec. USB 3.0 has a signalling rate of up to 5Gbits/sec.
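
To get a feel for these rates, the sketch below estimates the best-case time to move a file over each USB version. It uses the nominal signalling rates (1.5 and 12 Mbit/s for USB 1.x, 480 Mbit/s for USB 2.0, and 5 Gbit/s for USB 3.0) and ignores protocol overhead, so real transfers take longer. The 700 MB file size is just an assumed example, roughly one CD's worth of data.

```python
# Best-case transfer time at the nominal USB signalling rates.
# Overheads are ignored, so these are lower bounds, not real-world timings.

USB_RATES_MBIT_PER_S = {
    "USB 1.x low speed": 1.5,
    "USB 1.x full speed": 12,
    "USB 2.0": 480,
    "USB 3.0": 5000,
}


def ideal_transfer_seconds(size_megabytes, rate_mbit_per_s):
    """Seconds needed to move `size_megabytes` (1 MB = 10^6 bytes) at the given rate."""
    size_megabits = size_megabytes * 8  # 8 bits in every byte
    return size_megabits / rate_mbit_per_s


FILE_SIZE_MB = 700  # assumed example size

for name, rate in USB_RATES_MBIT_PER_S.items():
    seconds = ideal_transfer_seconds(FILE_SIZE_MB, rate)
    print(f"{name}: about {seconds:.1f} seconds")
```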
Peripherals
Printers: A printer is a device/peripheral that is used to produce a hard copy of
digital information stored in the form of a file or picture on a computer storage
device. Printers usually have two connectors: one is the power port, the
other is the communication port. Communication may be through a
wire or wireless (Bluetooth or WiFi). A printer can be connected to
a personal computer via an LPT cable or USB cable, or it may be connected to a LAN
port (RJ45) through a network hub/switch. As printers are needed so often
in day-to-day work, they are sometimes combined with other
features in an All-In-One (AIO) device,
which serves as Scanner, Printer, Photocopier and fax.
There are several types of printers, categorized by their print
technology:

Impact Printers or DMP: This is one of the oldest technologies.
It strikes the paper with the help of tiny hammers (pins). Between the
paper and the hammers there is an inked ribbon (usually black), which
leaves an impression on the paper. Earlier it was used for legal
documents and wherever multiple copies were required together (e.g.
invoice printing). DMP stands for Dot Matrix Printer. As the name
states, it combines dots made by several small hammers to form a letter or an
image. This printer is very noisy and very slow on graphics, but is still in
use. Because it makes an impact on the paper, it is still widely
used in banks. ATM PINs etc. are still printed with this kind of
printer: the ribbon is removed, and a special self-carbon
envelope is used to do the required task.
Inkjet Printers: These are also known as bubble jet printers. An inkjet printer
sprays tiny ionized drops of ink on the paper to create an impression.
This is achieved by using magnetized plates which direct the path of each
tiny ink drop onto the paper. Almost all inkjet printers can print in colour
too. These printers are further sub-divided in terms of
resolution and speed of printing. The normal resolution of these
printers is 300 dots per inch, written as 300 dpi. This
is the most common household printer. The initial cost of the printer is
very low, but ink cartridges are comparatively expensive.

Laser Printers: These printers throw a laser beam on a drum to create
an image. The drum is then rolled through a reservoir of
toner, and the electrically charged portions of the drum pick up the toner.
Finally, using a combination of heat and pressure, the toner on the
drum is transferred to the page. Laser printers print very fast and
are best suited for SOHO (Small Office Home Office)
applications. Colour laser printers combine four different toner
colours. Another form of this printer is the LED/LCD printer, which uses
LEDs / LCDs instead of a laser. However, these (LED/LCD) printers are
not very common.

Plotters: These are very accurate at producing long line drawings.
They are commonly used for technical drawings, such as engineering
drawings or architectural blueprints. There are two types of
plotters: flatbed and drum plotters. Flatbed plotters are
horizontally aligned with a flat surface to which a piece of paper is
attached; the paper remains stationary and the printer moves pens
across the paper to draw the image. Drum plotters, also called
upright plotters, are vertically positioned. They have a drum that the
paper rolls on. Drum plotters usually make more noise and are more
compact than flatbed plotters.

Thermal Printers: These are usually label printers. One may notice
them in grocery shops, airports etc. They are very fast and
produce very high resolution prints, and are usually used for printing barcodes.
A thermal printer works like an impact printer, but instead of striking, it
heats the ribbon, which forms an image on special paper. In some
cases, the print fades from the paper over time.

Scanners: A scanner is a device that captures images from photographic
prints, posters, magazine pages, and similar sources for computer editing
and display. Scanners come in hand-held, feed-in, and flatbed types, and for
scanning in black-and-white only or in colour. Scanners usually come with
software that allows the user to resize and modify a captured image.

Joystick: A joystick is an input device consisting of a stick that pivots on a
base and reports its angle or direction to the device it is controlling. Joysticks
are often used to control video games, and usually have one or more push-buttons
whose state can also be read by the computer. A popular variation
of the joystick used on modern video game consoles is the analog stick.
Some joysticks also provide feedback to the user in the form of vibrations. There are
two popular ways to connect a joystick: a Game port or a USB port.

Multimedia Speakers: These are output devices that are categorized under
multimedia. They produce output in the form of sound, and are
connected through the sound port present on the computer. The sound
port may be on-board or on an additional sound card.

USB Pen Drives: These are the most convenient way to carry data
around. A pen drive is very portable and very reliable. It uses a USB port for
communication and can be plugged in or out even when the
computer is on. In other words, this device is hot plug and play. It gets
detected automatically by newer versions of OS (Windows XP, Vista, Windows 7,
Windows 8 etc.), so there is no need for extra software. However, a
few pen drives come with password protection features, which
require additional software to be installed / configured.

Web-Camera: Also called a Web-Cam. It is a device that captures video
and sends it to the computer. The most common use of a web-cam is video chat
through various messengers. Messengers like Windows Live, Yahoo
Messenger and Skype are the most common applications that use it.

Add-on Cards:
We use a wide variety of electronic gadgets. Depending upon their power consumption, they
have different kinds of plugs that go into the wall socket. The most commonly used in
household gadgets are 15amp and 5amp plugs. You just have to check the voltage and power
specifications provided by the manufacturer, and then you can plug in your device for
power. This works because an industry standard is followed by the various manufacturers,
which allows electrical appliances to be plugged into any wall socket (provided it meets the
technical specifications). Similarly, in computers, people wanted to achieve more
using the same hardware, e.g. adding a Fax-modem card so that the
computer can also be used as a fax machine, and a
word processor document can be sent by fax directly from the computer. In order to add
such specific hardware cards, a common design format is followed. The most common add-on
cards are listed below:

PCI Add-on Card:

PCI stands for Peripheral Component Interconnect. It is a facility provided on the
motherboard (through sockets) for attaching peripheral devices to the
motherboard. It is the most popular local input/output bus. PCI provides a
shared data path between the CPU and peripheral controllers in
most computer models. It supports 32-bit and 64-bit data paths, which can run at 33MHz
or 66MHz. PCI cards usually come with device driver software so that they can
be configured on the respective operating system. They may require additional
application software in order to make use of the card.
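
The data path width and clock speed mentioned above together decide how much data the PCI bus can carry at best. A minimal sketch of that calculation follows. It assumes one transfer per clock tick and uses the rounded clock values from the text, so a 32-bit, 33MHz bus comes out as 132 MB/s (it is commonly quoted as 133 MB/s because the clock is really about 33.33MHz). The function name is made up for this illustration.

```python
# Peak (theoretical) PCI bus throughput from data path width and clock speed.
# Assumes one transfer per clock tick; real throughput is lower.

def pci_peak_mb_per_s(bus_width_bits, clock_mhz):
    """Peak megabytes per second = bytes per transfer x million transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


print(pci_peak_mb_per_s(32, 33))  # 132.0
print(pci_peak_mb_per_s(64, 33))  # 264.0
print(pci_peak_mb_per_s(64, 66))  # 528.0
```

Since this path is shared, all the PCI cards in the computer divide this figure between them.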

TV-tuner card: It is a device that plugs into a PCI socket of the motherboard. It
helps in watching cable TV or broadcast TV on your computer. An optional remote may
also be provided, to change channels, volume etc. or to switch between TV and PC.
Some TV-tuner cards also offer a recording feature.

Graphic Enhancement Cards (PCIe x16, PCI): These add-on cards are usually
installed in high-speed slots on the motherboard. Motherboards these
days usually come with an on-board display graphics chip, but some users, like DTP
designers, architects and gamers, install accelerated graphics cards that
enhance the display with higher resolution and faster refresh rate
(which smoothens motion). These cards have their own memory (RAM) and their
own processor. Indirectly, these cards reduce ‘the load’ on the main processor to
some extent.

PCI dialup modem: This add-on card lets the computer perform fax machine functions.
It also helps in connecting to the Internet through a dial-up
connection. Modem is short for MOdulator and DEModulator. The
computer sends and receives data in digital form, whereas the telephone line carries
data in analogue form; the modem converts digital data to analogue and vice versa.
This technology has become obsolete and is hardly used these
days, as broadband has become a lot cheaper and faster, and allows both phone
and Internet to work simultaneously. Please note that broadband devices
are also modems, only much faster. These DSL modems do not come
on-board; ISPs (Internet Service Providers) usually provide their own version of a
high-speed modem.

PCMCIA Cards: PCMCIA stands for Personal Computer Memory Card International
Association. PCMCIA is an organization of around 500
companies that developed a standard for small, credit-card-sized devices (usually
for laptops). These cards are used in laptops to extend the
memory or to add modem functionality. They are
hot swappable, ie, they can be removed or plugged in even when the computer
is on. However, this technology is already obsolete and has largely been replaced by
USB.

USB Data Cards: These days, portable modems are more common. They use
radio waves through the air to connect to the provider's servers, much as a mobile phone does.
Just as a mobile phone connects wirelessly to the nearest tower and carries
voice, these portable modems carry data. Service
providers available in India include Tata-photon, Reliance Netconnect, MTNL
wireless 3G model etc.

Hardware devices make up just one part of a computer, which can do virtually nothing on its
own. Even if you assemble all these devices and turn the machine on, it will just wait for
instructions from software, ie the Operating System. This is covered in later sections.

Software – What makes a computer dynamic

In the early phases of computing, machines were built for the specific tasks they had to
perform. That is, to change the purpose, one had to re-design or modify the
system itself. In 1801, Joseph-Marie Jacquard developed a
loom in which the pattern being woven was controlled by punched cards. Changing the
pattern did not require any modification of the loom, only a different set of cards.
The punched card served the role that software does now.
It is software that makes our computers versatile. One can play a movie DVD, play games
or use word processing applications; the only thing a user has to do is choose the
appropriate software. Software is of various kinds, categorized by the role it plays. It
acts as an interface between the computer hardware (electronics) and the end
user who makes use of the computer for a specific purpose. Software may be categorized as
Operating System, Driver Software and Application Software.

Operating System: It is also referred to as the OS, and an OS which is dedicated
to working in a network environment is called an NOS (Network Operating System). The OS is
an interface between the user/application and the hardware. The user may be the end user or
an application program. Broadly, an OS offers one of two
kinds of interface to the user. An OS like DOS (Disk Operating System) has a CLI
(Command Line Interface); here the user needs to remember specific commands for each
purpose, and has to type them at a prompt provided by the
OS to accomplish a task.

These days, the interface is usually a GUI (Graphical User Interface), i.e. a
picture-based (graphical) interface. A GUI gives several ways to reach the same
application, e.g. we can open MS Paint either by clicking Start->All
Programs->Accessories->Paint, or by creating a shortcut
on the screen (desktop) and double-clicking the Paint icon. It gives us various
options for reaching the goal of running an application.

Above, a typical screen of Windows 7

A typical desktop of Windows 8

Desktop of a typical Mac OS

An OS also handles the allocation and sharing of hardware resources and prevents
unauthorized access to them. It also offers APIs (Application Programming
Interfaces) or system calls to application software. An API is basically a set of common
calls required for basic operations, made by the
application to the OS. Because of this, application programmers do not need to re-write
such code in their own program/application. Eg: If a programmer develops a
word processor application, he does not have to write code for how to save a file;
instead, he just needs to send a request call to the OS and supply the necessary
parameters, ie, the programmer requests a File-Save from the OS and provides the file
name, file size, and the location and drive where it has to be saved. The OS in turn checks
for available space, validates the drive and saves the file at a physical location on the
media. Many similar APIs are provided by the OS itself.

An Operating System (OS) is a computer program (software) that manages the
hardware and software resources of a computer. At the foundation of all system
software, the OS performs basic tasks such as controlling and allocating
memory, prioritizing system requests, controlling input and output devices,
facilitating networking, and managing files. It also provides a user interface,
either a command line interface (CLI) or a graphical user interface (GUI), for higher level
functions. General-purpose computers, including personal computers and
mainframes, have an operating system (a general purpose operating system) to
run other programs, i.e. application software. Examples of operating systems
for personal computers include Microsoft Windows, GNU/Linux, and Mac OS.

Some operating-system vendors also build many utility programs and auxiliary
functions into their operating systems (e.g.: Calculator, WordPad, Internet Explorer
(IE) etc. in Windows).

A few important functions of an OS are given below:

Processor Management: This part of the OS schedules the work to be done by the
processor. As a processor can work on only one task at any instant, scheduling involves the
computation and distribution of "timeshares" (picking different
tasks/processes/‘threads’ at fixed time intervals).
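
The idea of handing out timeshares can be sketched with a small simulation. This is only an illustration, assuming a simple "round-robin" policy in which every task gets an equal slice in turn; the task names and durations are made up, and a real OS scheduler is considerably more elaborate.

```python
from collections import deque

def round_robin(tasks, time_slice):
    """Simulate time-sharing: each task runs for at most `time_slice` units per turn.

    `tasks` maps a task name to the total CPU time it needs (hypothetical values).
    Returns the list of (task, units_run) slices in the order they were executed.
    """
    queue = deque(tasks.items())
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        timeline.append((name, run))
        if remaining > run:
            # Not finished yet: the task goes to the back of the queue.
            queue.append((name, remaining - run))
    return timeline

timeline = round_robin({"editor": 3, "music": 5, "download": 2}, time_slice=2)
print(timeline)
# [('editor', 2), ('music', 2), ('download', 2), ('editor', 1), ('music', 2), ('music', 1)]
```

Because every task gets a turn after a short wait, the user has the impression that all three programs are running at the same time.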

Thread: A thread is a single sequential flow of control within a program. A
thread itself is not a program. It cannot run on its own, but runs within a
program.

A process can be in various states; only a few of them are listed
below:

Suspend / Suspended: Many processes consume no CPU time until they get
some sort of input. This is a state in which the process is put ‘on hold’ for
some time, or is waiting for some further instruction. For example, a
process might be waiting on a keystroke from the user. While it is waiting for
the keystroke, it uses no CPU time.

Thrashing: The CPU cycles between different processes/threads, giving each a
slice of time. The operating
system itself requires some CPU cycles to perform the saving and swapping
of all the registers, queues and stacks of the application processes. If the user
tries to run too many processes at the same time, all requiring equal priority, the
system may spend most of its time switching between processes rather than running them; the
computer may seem to get ‘stuck’, and the mouse may not respond immediately.
This is called Thrashing. (The term is also widely used for the related case where the
system spends most of its time moving memory pages between RAM and disk.) To reduce this
overhead (to some extent), the OS can divide
processes into multiple threads, which are cheaper to switch between.

Memory (RAM) Management: The OS coordinates memory by tracking which parts
are available, which are to be allocated or de-allocated, and how to swap between the
main memory (RAM) and secondary memory like the HDD (Virtual Memory).

Virtual Memory: Virtual Memory allows software to use more memory than is physically
installed: memory pages stored in primary storage (RAM) are written to
secondary storage, usually a Hard Disk Drive (often to a swap file or swap
partition), thus freeing faster primary storage for other processes to use.
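
A toy model may help to picture how pages move between RAM and the swap area. The sketch below assumes that RAM has room for a fixed number of pages and that the least recently used page is the one written out when space runs short; that choice of policy is an assumption for illustration, as real operating systems use more refined strategies.

```python
from collections import OrderedDict

def simulate_paging(page_requests, ram_frames):
    """Count how often a page must be brought into RAM (a "page fault").

    RAM holds at most `ram_frames` pages. When it is full, the least recently
    used page is moved to the (simulated) swap area to make room.
    """
    ram = OrderedDict()      # pages currently in RAM, least recently used first
    swapped_out = []         # pages written to secondary storage, in order
    faults = 0
    for page in page_requests:
        if page in ram:
            ram.move_to_end(page)              # mark as most recently used
            continue
        faults += 1
        if len(ram) >= ram_frames:
            victim, _ = ram.popitem(last=False)
            swapped_out.append(victim)
        ram[page] = True
    return faults, swapped_out

faults, swapped = simulate_paging(["A", "B", "C", "A", "D", "B"], ram_frames=3)
print(faults, swapped)   # 5 ['B', 'C']
```

Every fault costs a slow disk access, which is why a computer with too little RAM feels sluggish even when its processor is fast.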

Device Management: The OS also manages devices with the help of drivers (Device Driver
Software). A driver is a specific type of computer software developed to allow interaction
with hardware devices. A driver's function is to be the translator between the
electrical signals of the hardware subsystems and the high-level programming
languages of the operating system and application programs. Eg: Managing external
hard drives, printers, scanners etc.

Storage Management: This part of the OS manages drives, storage systems, files and
file security. Operating systems have a variety of native file systems. Linux has a
greater range of native file systems, including ext2, ext3, ReiserFS, Reiser4, GFS,
GFS2, OCFS, OCFS2 and NILFS. Linux also has full support for XFS
and JFS, along with the FAT file systems, and NTFS. Windows, on the other hand, has
more limited file system support, mainly FAT12, FAT16, FAT32 and NTFS.
NTFS is the most efficient and reliable of these four. FAT
is older than NTFS and has limitations on partition and file size that can cause a
variety of problems.

The OS manages saving your files, e.g., when you save a file in a word
processing application, the application sends a
request to the OS with the following information: (a) location, (b) file name and
(c) file size. The OS checks whether there is enough space at the location; if yes,
it physically writes the data to the media (HDD/USB drive etc.) and
returns a confirmation. All such background activities are taken care of by the OS.

Application Interface (APIs): Just as drivers provide a way for applications to make
use of hardware subsystems without having to know every detail of the hardware's
operation, application program interfaces (APIs) let application programmers use
functions of the computer and operating system without having to directly keep
track of all the details of the CPU's operation. Eg: A programmer does not have to
write code for saving a file from the application; the application just sends a
request to the OS to save the file. When a file is saved in a word processor,
the word processor collects the information from the user and sends it to
one of the APIs of the OS.
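
The file-save request described above can be sketched in a few lines of Python. The function name `save_document` and the file name are invented for this illustration; `shutil.disk_usage` and `open` are standard Python calls that in turn rely on services of the operating system, so the program never deals with disk sectors itself.

```python
import os
import shutil
import tempfile

def save_document(folder, file_name, text):
    """Ask the OS to save `text`; the OS takes care of free space and disk layout."""
    data = text.encode("utf-8")
    free_bytes = shutil.disk_usage(folder).free   # the OS reports the available space
    if len(data) > free_bytes:
        raise OSError("not enough space on the target drive")
    path = os.path.join(folder, file_name)
    with open(path, "wb") as handle:              # open/write are passed on to the OS
        handle.write(data)
    return path

folder = tempfile.mkdtemp()                       # throwaway location for the demo
saved = save_document(folder, "letter.txt", "Dear Sir,")
print(os.path.getsize(saved))                     # 9
```

The application supplies only the location, the name and the content; where the bytes physically land on the media is decided by the OS.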

User Interface: It brings structure to the interaction between a user and the
computer. An OS with a GUI (Graphical User Interface) has a desktop environment; Windows,
for example, has the START menu, from which one can choose various applications, such as
Windows Explorer, which helps to browse folders and files. DOS, which has a CLI (Command
Line Interface), uses the DIR command to list files and folders.

Driver Software: A driver is a specially written program which understands
the operation of the device it interfaces to, such as a printer, video card, sound card or
CD ROM drive. It translates commands from the operating system or user into commands
understood by the hardware component it interfaces with. It also translates responses
from the component back into responses that can be understood by the
operating system, application program, or user.

Application Software: Software which is specifically designed to do
a particular job, i.e. programs that do real work for users. For example, word
processors, spreadsheets, and database management systems fall under the category of
application software. Other examples are graphics and multimedia
software like Photoshop and PowerPoint, and accounting software like Tally.

Basic working of Word Editor Software: It is computer application
software used for composing, editing, formatting and printing documents (on printable
material). The typewriter once used in offices was replaced by the electronic
typewriter and later by word processing software on computers.
The main advantage of this software is that one can check and correct the text before
getting a hard copy. It is also equipped with a library against which it checks
spelling, grammar etc. It can also save your work for later use and
reference. Functions like Mail-merge can be used where one needs to send
similar matter to a lot of people. Among the most popular word editing
software are MS Word and Writer from OpenOffice (free). A
sample screen shot is taken from Open Office Writer:

Basic working of Spreadsheet: A spreadsheet is application software
that maintains tabular records, such as accounts, much as they would be kept on paper. It
displays multiple cells that together make up a matrix consisting of rows and
columns; each cell can contain alphanumeric text or numeric values. A
spreadsheet cell may also contain a formula that defines how the content of
that cell is to be calculated from the contents of any other cell (or
combination of cells) each time any cell is updated. Spreadsheets are widely
used to keep track of inventory and to prepare and summarize management
reports, since with their basic tools numbers can be represented in the form
of charts (bar graphs, pie-charts etc). Spreadsheets are also used for
financial information because of their ability to re-calculate the entire sheet
automatically after a change to a single cell is made. You may refer to a
sample spreadsheet below.
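
The automatic re-calculation of formula cells can be imitated with a very small sketch. It assumes a sheet stored as a dictionary in which a cell holds either a number or a formula written as a Python function; the cell names and values are invented, and circular references are not handled.

```python
def recalculate(cells):
    """Evaluate every cell of the sheet.

    A cell is either a plain value or a formula, i.e. a function that
    receives a `get` helper for reading the values of other cells.
    """
    values = {}

    def get(name):
        if name not in values:
            content = cells[name]
            values[name] = content(get) if callable(content) else content
        return values[name]

    for name in cells:
        get(name)
    return values

sheet = {
    "A1": 100,                                   # unit price
    "A2": 3,                                     # quantity
    "A3": lambda get: get("A1") * get("A2"),     # formula: =A1*A2
}
print(recalculate(sheet)["A3"])   # 300
sheet["A2"] = 5                   # change a single cell ...
print(recalculate(sheet)["A3"])   # 500 (... and the total follows)
```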

The most popular spreadsheet software is from Microsoft, called MS Excel
(Microsoft Excel). As MS Excel is expensive, other open source applications
are available freely from the Internet. OpenOffice provides OpenOffice Calc
(short for Calculator).

Computer Languages
Have you ever wondered how the software you use on your computer, such as spreadsheets,
games and word processors, comes into being? These are Application Software, and
they have to be built before you can use them. Software like Operating Systems
(Windows XP, Linux etc) is also built using some tools. These tools are usually
programming languages: special software that
helps in constructing other software, which may be an OS, application software, driver
software or even another language. You may have heard of Assembly,
Borland C, Visual C++, FoxPro and Fortran; all these are examples of Computer
Languages. Computer languages are basically used to create a special file that
can be executed by the computer, either directly or with the help of the
Operating System. The executable programs they produce are built for a specific
hardware environment. Please note, hardware environment basically means the
processor architecture for which the program is made.

HIGH LEVEL LANGUAGE:   Fortran | C, Visual C++ | Pascal | Visual BASIC
LOW LEVEL LANGUAGE:    ASSEMBLY LANGUAGE
                       MACHINE LANGUAGE
HARDWARE

Binary Language. This is the most primitive form of language in digital
electronics, and is also used by computers. We use the decimal number
system, which comprises ten numerals, 0~9. We can make any number, small or
big, by arranging these numerals in combination. Eg: 1,234
= One Thousand Two Hundred and Thirty Four; 05 = Five; 50 = Fifty; and so
on. Similarly, computers use the binary number system, which has 2 numerals,
0 & 1. We can make any number, but using only these
numerals. Eg: 1111101000 is the equivalent of decimal 1,000. One may
ask why we should complicate the simple numbering system that we know (the decimal
system). It is easier for us
humans to understand the decimal number system, as we have been using it since
we learned to count. However, the main reason for using the
binary system lies in electrical
characteristics, especially in the context of digital
electronics. The concept is based on ON and OFF. If there is a voltage,
the condition is taken as ON, and if there is no voltage, it is taken as
OFF. In other words, ON=1 and OFF=0. Combinations of these ON and
OFF states make the digital electronic world possible. In layman’s terms, all digital
circuits are expressed in binary language. With the help of complex
electronic circuits and software (or firmware), it becomes possible to process
information. This includes instructions like add, subtract, multiply and
divide. Quite often, binary language is also called Machine
Language, as it is the most basic, or lowest-level, language that a digital
circuit (especially a computer) can understand. In the early phases of computer
development, the complete circuit had to be changed and re-engineered to
work on a different task. In the second phase, computers were modified
to take input from special punch cards; these punch cards served two
purposes, storing the program (set of instructions) and acting as the input device
for the system. Nowadays, we can use a keyboard, mouse or touch screen to
key in instructions or run programs that are stored or installed on permanent
storage media like a hard disk, USB drive or any other storage media.
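
The conversion between the decimal and binary number systems can be worked through in code. The two helper functions below are written only for this illustration (Python also offers the built-in `bin()` and `int(text, 2)` for the same purpose).

```python
def to_binary(number):
    """Convert a non-negative decimal integer to a string of 0s and 1s
    by repeatedly dividing by 2 and collecting the remainders."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        digits.append(str(number % 2))
        number //= 2
    return "".join(reversed(digits))

def to_decimal(bits):
    """Convert a binary string back to decimal: read the digits from left
    to right, doubling the running total before adding each digit."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total

print(to_binary(1000))            # 1111101000
print(to_decimal("1111101000"))   # 1000
```

Reading 1111101000 from the left, the ON positions stand for 512 + 256 + 128 + 64 + 32 + 8, which adds up to 1,000.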

Low-Level Language: Computers made in the initial stages had very
limited functionality and an almost fixed role. However, a need was
felt to make computers more dynamic. So instead of modifying the
electronic circuit each time, software was introduced, which is easier to
change and configure than the circuit or basic design of the
machine. Instead of giving commands through punch cards and switches,
programmers now wrote instructions. Of course, these instructions were different from
the instructions of day-to-day life, and people had to be trained to use them.
A program is basically a set of instructions that may be
stored or given in real time. To date, Assembly Language gives the programmer the
most direct control over the hardware. But since software written in
Assembly contains very hardware-specific instructions, a program has to be rewritten
and reassembled for a different platform (hardware). A
utility program called an assembler is used to translate assembly language
statements into the target computer’s machine code.
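
What an assembler does can be shown with a toy example. The instruction set below (LOAD, ADD, STORE, HALT with 4-bit codes) is entirely hypothetical and does not belong to any real processor; it only demonstrates how each readable statement is translated into a pattern of 0s and 1s.

```python
# Hypothetical 8-bit machine: a 4-bit operation code followed by a 4-bit operand.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}

def assemble(source):
    """Translate assembly text into a list of 8-bit machine-code strings."""
    machine_code = []
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:
            continue                              # skip blank lines
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        if not 0 <= operand <= 15:
            raise ValueError(f"operand out of range: {operand}")
        word = (OPCODES[mnemonic] << 4) | operand
        machine_code.append(format(word, "08b"))
    return machine_code

program = """
LOAD 5
ADD 3
STORE 9
HALT
"""
print(assemble(program))
# ['00010101', '00100011', '00111001', '11110000']
```

A different processor would use different codes for the same operations, which is why an assembly program is tied to the platform it was written for.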

High-Level Languages: Then came BASIC, Fortran, Cobol etc. These
languages were closer to the English language, and such languages are
called high-level languages. They are
easier for us to understand, but more translation is required before the
computer can execute the instructions in binary language. Because a program passes
through more stages of translation, it can be slower than one written in a low-level
language like Assembly. High-level languages are much simpler to understand
and much easier to manage. Their advantage over low-level
languages is that their programs are more flexible and can run on
different hardware platforms.

Software makes a computer dynamic and versatile. The same set of hardware can be used as a
gaming device, as an entertainment device, for web browsing, for searching information, etc.

Chapter 2: Harddisk cloning, Backup, restoration

Harddisk Backup

Backup-to-disk refers to technology that allows one to back up large amounts of data to a disk
storage unit. The backup-to-disk technology is often supplemented by tape drives for data archival
or replication to another facility for disaster recovery. Additionally, backup-to-disk has several
advantages over traditional tape backup for both technical and business reasons.

Types of Backup

Unstructured

An unstructured repository may simply be a stack of CD-Rs or DVD-Rs with minimal
information about what was backed up and when. This is the easiest to implement, but
probably the least likely to achieve a high level of recoverability as it lacks automation.

Full only / System imaging

A repository of this type contains complete system images taken at one or more specific
points in time. This technology is frequently used by computer technicians to record known
good configurations. Imaging[3] is generally more useful for deploying a standard
configuration to many systems rather than as a tool for making ongoing backups of diverse
systems.

Incremental

An incremental style repository aims to make it more feasible to store backups from more
points in time by organizing the data into increments of change between points in time. This
eliminates the need to store duplicate copies of unchanged data: with full backups a lot of
the data will be unchanged from what has been backed up previously. Typically, a full
backup (of all files) is made on one occasion (or at infrequent intervals) and serves as the
reference point for an incremental backup set. After that, a number of incremental backups
are made after successive time periods. Restoring the whole system to the date of the last
incremental backup would require starting from the last full backup taken before the data
loss, and then applying in turn each of the incremental backups since then.[4] Additionally,
some backup systems can reorganize the repository to synthesize full backups from a series
of incrementals.
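
The restore procedure for an incremental repository can be sketched as follows. The backup history is represented here as a simple list of (day, kind) pairs, which is an assumption made for this illustration; real backup software keeps a far richer catalogue.

```python
def restore_chain(backups, restore_day):
    """Return the backups needed, in order, to restore the state of `restore_day`.

    `backups` is a list of (day, kind) tuples sorted by day, where kind is
    "full" or "incremental". At least one full backup must exist on or
    before `restore_day`, otherwise a ValueError is raised.
    """
    usable = [b for b in backups if b[0] <= restore_day]
    # Position of the most recent full backup on or before the restore point.
    last_full = max(i for i, (_, kind) in enumerate(usable) if kind == "full")
    return usable[last_full:]

history = [(1, "full"), (2, "incremental"), (3, "incremental"),
           (8, "full"), (9, "incremental"), (10, "incremental")]
print(restore_chain(history, restore_day=10))
# [(8, 'full'), (9, 'incremental'), (10, 'incremental')]
```

If any one incremental in the chain is lost or damaged, the later ones can no longer be applied reliably, which is the price paid for the saving in storage.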

Differential

Each differential backup saves the data that has changed since the last full backup. It has the
advantage that only a maximum of two data sets is needed to restore the data. One
disadvantage, compared to the incremental backup method, is that as time from the last full
backup (and thus the accumulated changes in data) increases, so does the time to perform
the differential backup. Restoring an entire system would require starting from the most
recent full backup and then applying just the last differential backup since the last full
backup.

Note: Vendors have standardized on the meaning of the terms "incremental backup" and
"differential backup". However, there have been cases where conflicting definitions of these
terms have been used. The most relevant characteristic of an incremental backup is which
reference point it uses to check for changes. By standard definition, a differential backup
copies files that have been created or changed since the last full backup, regardless of
whether any other differential backups have been made since then, whereas an incremental
backup copies files that have been created or changed since the most recent backup of any
type (full or incremental). Other variations of incremental backup include multi-level
incremental and incremental backups that compare parts of files instead of just the whole
file.
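
The difference in reference point can be made concrete with a short sketch. Times are plain numbers and the file names are invented; the function merely applies the standard definitions given in the note above.

```python
def files_to_copy(files, last_full, last_backup, kind):
    """Pick the files changed since the reference point of the given backup kind.

    `files` maps a file name to its last-modified time.
    """
    if kind == "differential":
        reference = last_full        # always compare against the last full backup
    elif kind == "incremental":
        reference = last_backup      # compare against the most recent backup of any type
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(name for name, modified in files.items() if modified > reference)

files = {"report.doc": 5, "budget.xls": 12, "notes.txt": 20}
# Full backup taken at time 10, another backup taken at time 15.
print(files_to_copy(files, last_full=10, last_backup=15, kind="differential"))
# ['budget.xls', 'notes.txt']
print(files_to_copy(files, last_full=10, last_backup=15, kind="incremental"))
# ['notes.txt']
```

The differential run copies budget.xls again although it was already saved at time 15, whereas the incremental run copies only what is new since then.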

Reverse delta

A reverse delta type repository stores a recent "mirror" of the source data and a series of
differences between the mirror in its current state and its previous states. A reverse delta
backup will start with a normal full backup. After the full backup is performed, the system
will periodically synchronize the full backup with the live copy, while storing the data
necessary to reconstruct older versions. This can either be done using hard links, or using
binary diffs. This system works particularly well for large, slowly changing, data sets.
Examples of programs that use this method are rdiff-backup and Time Machine.
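
A much simplified model of a reverse delta repository is sketched below. It is not how rdiff-backup or Time Machine are implemented; it assumes that whole files (held as strings in a dictionary) are remembered instead of binary diffs, and the class and method names are invented for the illustration.

```python
class ReverseDeltaRepo:
    """Keep the latest state in full, plus deltas that lead back to older states."""

    def __init__(self, first_snapshot):
        self.mirror = dict(first_snapshot)   # file name -> content, current state
        self.reverse_deltas = []             # newest last; each one undoes a sync

    def sync(self, live):
        """Bring the mirror up to date and record how to undo this change."""
        delta = {}
        for name in set(self.mirror) | set(live):
            old = self.mirror.get(name)      # None means "file did not exist"
            if old != live.get(name):
                delta[name] = old
        self.reverse_deltas.append(delta)
        self.mirror = dict(live)

    def version(self, steps_back):
        """Rebuild the state as it was `steps_back` syncs ago."""
        state = dict(self.mirror)
        newest_first = reversed(self.reverse_deltas)
        for _, delta in zip(range(steps_back), newest_first):
            for name, old in delta.items():
                if old is None:
                    state.pop(name, None)    # file did not exist back then
                else:
                    state[name] = old
        return state

repo = ReverseDeltaRepo({"a.txt": "v1"})
repo.sync({"a.txt": "v2", "b.txt": "new"})
repo.sync({"b.txt": "new"})
print(repo.version(0))   # {'b.txt': 'new'}
print(repo.version(2))   # {'a.txt': 'v1'}
```

The most recent state is available immediately, while older states cost progressively more work to rebuild, the opposite of an incremental repository.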

Storage Medium

Regardless of the repository model that is used, the data has to be stored on some data storage
medium.

Magnetic tape

Magnetic tape has long been the most commonly used medium for bulk data storage,
backup, archiving, and interchange. Tape has typically had an order of magnitude better
capacity-to-price ratio when compared to hard disk, but recently the ratios for tape and hard
disk have become a lot closer. There are many formats, many of which are proprietary or
specific to certain markets like mainframes or a particular brand of personal computer. Tape
is a sequential access medium, so even though access times may be poor, the rate of
continuously writing or reading data can actually be very fast. Some new tape drives are
even faster than modern hard disks.

Hard disk

The capacity-to-price ratio of hard disk has been rapidly improving for many years. This is
making it more competitive with magnetic tape as a bulk storage medium. The main
advantages of hard disk storage are low access times, availability, capacity and ease of use.
External disks can be connected via local interfaces like SCSI, USB, FireWire, or eSATA, or via
longer distance technologies like Ethernet, iSCSI, or Fibre Channel. Some disk-based backup
systems, such as Virtual Tape Libraries, support data deduplication which can dramatically
reduce the amount of disk storage capacity consumed by daily and weekly backup data. The
main disadvantages of hard disk backups are that they are easily damaged, especially while
being transported (e.g., for off-site backups), and that their stability over periods of years is
a relative unknown.

Optical storage

Recordable CDs, DVDs, and Blu-ray Discs are commonly used with personal computers and
generally have low media unit costs. However, the capacities and speeds of these and other
optical discs are typically an order of magnitude lower than hard disk or tape. Many optical
disk formats are WORM type, which makes them useful for archival purposes since the data
cannot be changed. The use of an auto-changer or jukebox can make optical discs a feasible
option for larger-scale backup systems. Some optical storage systems allow for cataloged
data backups without human contact with the discs, allowing for longer data integrity.

Solid state storage

Also known as flash memory, thumb drives, USB flash drives, CompactFlash, SmartMedia,
Memory Stick, Secure Digital cards, etc., these devices are relatively expensive for their low
capacity in comparison to hard disk drives, but are very convenient for backing up relatively
low data volumes. A solid-state drive does not contain any movable parts unlike its magnetic
drive counterpart, making it less susceptible to physical damage, and can have huge
throughput in the order of 500Mbit/s to 6Gbit/s. The capacity offered from SSDs continues
to grow and prices are gradually decreasing as they become more common.

Remote backup service

As broadband Internet access becomes more widespread, remote backup services are
gaining in popularity. Backing up via the Internet to a remote location can protect against
some worst-case scenarios such as fires, floods, or earthquakes which would destroy any
backups in the immediate vicinity along with everything else. There are, however, a number
of drawbacks to remote backup services. First, Internet connections are usually slower than
local data storage devices. Residential broadband is especially problematic as routine
backups must use an upstream link that's usually much slower than the downstream link
used only occasionally to retrieve a file from backup. This tends to limit the use of such
services to relatively small amounts of high value data. Secondly, users must trust a third
party service provider to maintain the privacy and integrity of their data, although
confidentiality can be assured by encrypting the data before transmission to the backup
service with an encryption key known only to the user. Ultimately the backup service must
itself use one of the above methods so this could be seen as a more complex way of doing
traditional backups.
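
The effect of a slow upstream link is easy to estimate with a little arithmetic. The data volume and link speeds below are assumed figures chosen only for illustration, and protocol overhead is ignored.

```python
def transfer_hours(data_gigabytes, link_megabits_per_second):
    """Estimate the hours needed to move data over a link (no overhead)."""
    bits = data_gigabytes * 8 * 1000**3                  # decimal gigabytes -> bits
    seconds = bits / (link_megabits_per_second * 1000**2)
    return seconds / 3600

print(round(transfer_hours(100, 1), 1))     # backing up over a 1 Mbit/s upstream link: 222.2
print(round(transfer_hours(100, 20), 1))    # restoring over a 20 Mbit/s downstream link: 11.1
```

With these figures, a first backup of 100 GB would occupy the upstream link for more than nine days, which is why such services suit small amounts of high value data better.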

Floppy disk

During the 1980s and early 1990s, many personal/home computer users associated backing
up mostly with copying to floppy disks. However, the data capacity of floppy disks failed to
catch up with growing demands, rendering them effectively obsolete.

Managing the data repository

Regardless of the data repository model, or data storage media used for backups, a balance needs to
be struck between accessibility, security and cost. These media management methods are not
mutually exclusive and are frequently combined to meet the user's needs. Using on-line disks for
staging data before it is sent to a near-line tape library is a common example.

On-line

On-line backup storage is typically the most accessible type of data storage, which can begin
a restore within milliseconds. A good example is an internal hard disk or a disk array
(possibly connected to a SAN). This type of storage is very convenient and speedy, but is
relatively expensive. On-line storage is quite vulnerable to being deleted or overwritten,
either by accident, by intentional malevolent action, or in the wake of a data-deleting virus
payload.

Near-line

Near-line storage is typically less accessible and less expensive than on-line storage, but still
useful for backup data storage. A good example would be a tape library with restore times
ranging from seconds to a few minutes. A mechanical device is usually used to move media
units from storage into a drive where the data can be read or written. Generally, it has
safety properties similar to on-line storage.

Off-line

Off-line storage requires some direct human action to provide access to the storage media:
for example, inserting a tape into a tape drive or plugging in a cable. Because the data are
not accessible via any computer except during limited periods in which they are written or
read back, they are largely immune to a whole class of on-line backup failure modes. Access
time will vary depending on whether the media are on-site or off-site.

Off-site data protection

To protect against a disaster or other site-specific problem, many people choose to send
backup media to an off-site vault. The vault can be as simple as a system administrator's
home office or as sophisticated as a disaster-hardened, temperature-controlled, high-
security bunker with facilities for backup media storage. Importantly a data replica can be
off-site but also on-line (e.g., an off-site RAID mirror). Such a replica has fairly limited value
as a backup, and should not be confused with an off-line backup.

Backup site or disaster recovery center (DR center)

In the event of a disaster, the data on backup media will not be sufficient to recover.
Computer systems onto which the data can be restored and properly configured networks
are necessary too. Some organizations have their own data recovery centers that are
equipped for this scenario. Other organizations contract this out to a third-party recovery
center. Because a DR site is itself a huge investment, backing up is very rarely considered the
preferred method of moving data to a DR site. A more typical way would be remote disk
mirroring, which keeps the DR data as up to date as possible.

Objectives

Recovery point objective (RPO)

The point in time that the restarted infrastructure will reflect. Essentially, this is the roll-back
that will be experienced as a result of the recovery. The most desirable RPO would be the
point just prior to the data loss event. Making a more recent recovery point achievable
requires increasing the frequency of synchronization between the source data and the
backup repository.

Recovery time objective (RTO)

The amount of time elapsed between disaster and restoration of business functions.

Data security

In addition to preserving access to data for its owners, data must be restricted from
unauthorized access. Backups must be performed in a manner that does not compromise
the original owner's undertaking. This can be achieved with data encryption and proper
media handling policies.

Data retention period

Regulations and policy can lead to situations where backups are expected to be retained for
a particular period, but not any further. Retaining backups after this period can lead to
unwanted liability and sub-optimal use of storage media.
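
Enforcing a retention period comes down to comparing backup dates with a cut-off date. The dates and the 30-day period in this sketch are assumed values, not a recommendation.

```python
from datetime import date, timedelta

def expired_backups(backup_dates, today, retention_days):
    """Return the backups that are older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return [d for d in backup_dates if d < cutoff]

backups = [date(2024, 1, 1), date(2024, 2, 1), date(2024, 3, 1)]
for old in expired_backups(backups, today=date(2024, 3, 15), retention_days=30):
    print("delete backup of", old.isoformat())
# delete backup of 2024-01-01
# delete backup of 2024-02-01
```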

Limitations

An effective backup scheme will take into consideration the limitations of the situation.

Backup window

The period of time when backups are permitted to run on a system is called the backup window. This
is typically the time when the system sees the least usage and the backup process will have the least
amount of interference with normal operations. The backup window is usually planned with users'
convenience in mind. If a backup extends past the defined backup window, a decision is made
whether it is more beneficial to abort the backup or to lengthen the backup window.
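
Whether a backup fits its window can be estimated with simple arithmetic. The sketch below assumes a steady transfer rate and decimal units (1 GB = 8000 megabits); the function names and figures are illustrative only, and real throughput is usually lower than the nominal link speed.

```python
def transfer_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes at rate_mbps megabits per second."""
    megabits = size_gb * 8 * 1000
    return megabits / rate_mbps / 3600

def fits_window(size_gb: float, rate_mbps: float, window_hours: float) -> bool:
    """True if the backup is expected to finish inside the backup window."""
    return transfer_hours(size_gb, rate_mbps) <= window_hours

# Hypothetical case: 500 GB over a 100 Mbps link with an 8-hour window
print(round(transfer_hours(500, 100), 2), "hours needed")
print("Fits in window:", fits_window(500, 100, 8))
```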

Performance impact

All backup schemes have some performance impact on the system being backed up. For example,
for the period of time that a computer system is being backed up, the hard drive is busy reading files
for the purpose of backing up, and its full bandwidth is no longer available for other tasks. Such
impacts should be analyzed.

Costs of hardware, software, labor

All types of storage media have a finite capacity with a real cost. Matching the correct amount of
storage capacity (over time) with the backup needs is an important part of the design of a backup
scheme. Any backup scheme has some labor requirement, but complicated schemes have
considerably higher labor requirements. The cost of commercial backup software can also be
considerable.

Network bandwidth

Distributed backup systems can be affected by limited network bandwidth.

Implementation

Meeting the defined objectives in the face of the above limitations can be a difficult task. The tools
and concepts below can make that task more achievable.

Scheduling

Using a job scheduler can greatly improve the reliability and consistency of backups by
removing part of the human element. Many backup software packages include this
functionality.

Authentication

Over the course of regular operations, the user accounts and/or system agents that perform
the backups need to be authenticated at some level. The power to copy all data off of or
onto a system requires unrestricted access. Using an authentication mechanism is a good
way to prevent the backup scheme from being used for unauthorized activity.

Chain of trust

Removable storage media are physical items and must only be handled by trusted
individuals. Establishing a chain of trusted individuals (and vendors) is critical to defining the
security of the data.

Measuring the process

To ensure that the backup scheme is working as expected, key factors should be monitored and
historical data maintained.

Backup validation
(also known as "backup success validation") Provides information about the backup, and
proves compliance to regulatory bodies outside the organization: for example, an insurance
company in the USA might be required under HIPAA to demonstrate that its client data meet
records retention requirements.[18] Disaster, data complexity, data value and increasing
dependence upon ever-growing volumes of data all contribute to the anxiety around and
dependence upon successful backups to ensure business continuity. Thus many
organizations rely on third-party or "independent" solutions to test, validate, and optimize
their backup operations (backup reporting).

Reporting

In larger configurations, reports are useful for monitoring media usage, device status, errors,
vault coordination and other information about the backup process.

Logging

In addition to the history of computer generated reports, activity and change logs are useful
for monitoring backup system events.

Validation

Many backup programs use checksums or hashes to validate that the data was accurately
copied. These offer several advantages. First, they allow data integrity to be verified without
reference to the original file: if the file as stored on the backup medium has the same
checksum as the saved value, then it is very probably correct. Second, some backup
programs can use checksums to avoid making redundant copies of files, and thus improve
backup speed. This is particularly useful for the de-duplication process.
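
The two uses of checksums described above can be sketched with Python's standard `hashlib` module. SHA-256 is chosen here as an assumption; real backup programs may use other hash functions, and the helper names are hypothetical.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Checksum saved alongside the backup copy."""
    return hashlib.sha256(data).hexdigest()

def verify(stored: bytes, saved_checksum: str) -> bool:
    """Check integrity without needing the original file."""
    return sha256_of(stored) == saved_checksum

def deduplicate(files: dict) -> dict:
    """Keep one file name per distinct content, keyed by checksum."""
    unique = {}
    for name, content in files.items():
        unique.setdefault(sha256_of(content), name)
    return unique

original = b"payroll records"
checksum = sha256_of(original)
print(verify(original, checksum))            # intact copy
print(verify(b"payroll recordz", checksum))  # corrupted copy

files = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"different"}
print(len(deduplicate(files)), "unique files out of", len(files))
```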

Monitored backup

Backups are monitored by a third-party monitoring center, which alerts users to any errors that
occur during automated backups. Monitored backup requires software capable of
pinging the monitoring center's servers in the case of errors. Some monitoring
services also allow collection of historical metadata that can be used for Storage Resource
Management purposes such as projecting data growth and locating redundant primary storage capacity
and reclaimable backup capacity.

Disk Cloning

Disk cloning is a direct disk-to-disk method for creating an exact copy. There is no intermediary
process: simply connect both drives and clone the contents from the source drive to the destination
drive. Once cloning is complete, the two drives are interchangeable; both hold identical data (as of
the time of cloning) and both boot to the identical system.

How it is different from Hard Disk Imaging

Disk imaging, on the other hand, is NOT a direct disk-to-disk method and requires an intermediary
process. One cannot create a copy of a hard drive simply by placing a disk image file on it; the image
needs to be opened and installed (restored) on the drive via the same imaging software that was
used to create it.

1. Create a disk image and save it to external media.
2. Restore that image from external media to the selected drive.

Disk Cloning

• Creates a verbatim copy of the entire disk contents on a second hard drive.
• Takes immediate effect: once the cloning process has been completed, the new drive (or destination
drive) can be used to boot to the system straight away.
• Requires a good, working system: there is not much point in cloning a broken system, and it is not
much use with a failed hard drive.

Disk Imaging

• Copies the entire contents of a hard drive into a single compressed file, generally of a
proprietary format.
• Is not immediate: the image created by the imaging software needs to be restored (opened and
installed) to a drive before it can be used to boot to the system.
• Allows users to restore an image created previously, when the system was in a known good state,
either to the same (original) hard drive or to a different hard drive.
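
The difference between the two approaches can be sketched in a few lines. In this toy model a "disk" is just a byte string, and gzip stands in for the proprietary image formats real tools use; none of this reflects how any specific product works.

```python
import gzip

def clone_disk(source: bytes) -> bytes:
    """Cloning: the destination gets a verbatim copy, usable immediately."""
    return bytes(source)

def create_image(source: bytes) -> bytes:
    """Imaging, step 1: pack the whole disk into one compressed file."""
    return gzip.compress(source)

def restore_image(image: bytes) -> bytes:
    """Imaging, step 2: the image must be restored before the disk is usable."""
    return gzip.decompress(image)

# Toy "disk": a boot marker, mostly empty space, then some data
disk = b"BOOT" + bytes(4096) + b"DATA"

clone = clone_disk(disk)
image = create_image(disk)

print("Clone identical to source:", clone == disk)
print("Image identical to source:", image == disk)
print("Image size vs disk size:", len(image), "vs", len(disk))
print("Restored image identical:", restore_image(image) == disk)
```

The image is smaller than the disk and cannot be used as a disk until it has been restored, which mirrors the comparison above.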

Example of Disk Cloning using AOMEI Software

AOMEI Backupper is third-party software for backing up and restoring systems, partitions/volumes,
disks, and files/folders. It also supports cloning, such as disk clone, system clone, and
partition/volume clone. There is a free version, AOMEI Backupper Standard Edition, which contains
most features and can perform a disk clone.

Here, AOMEI Backupper Standard is used as an example to show how cloning works.
Before you start, download, install, and open AOMEI Backupper.

To Clone Disk:

1. In the left tab page, select the Clone option and then select Disk Clone.

2. Select the source disk that you want to clone. Click Next.

3. Select the destination disk to which the source disk will be cloned, and click Next.

Warning! The destination disk and all existing data will be overwritten.

4. Preview the information of your source and destination disk. In the wizard page, set desired
advanced settings as follows:

a) If you want to adjust the partition size or location on the destination disk, click the "Edit
partitions on the destination disk" button. Options available are:

• Copy without resizing partitions: Do not make any changes.
• Fit partition to entire disk: The destination disk partitions will be automatically resized to
fill the entire disk, as appropriate for the disk size.
• Edit partitions on this disk: Manually adjust the partition size and location by dragging a
slider bar.

b) Sector by sector clone: Copies all sectors of the disk to the destination disk, whether in use or not.
The destination disk must be equal to or larger than the source disk. Usually, to save destination
disk space, there is no need to tick this option. With Sector by sector clone unchecked, only the
used part of the source disk is cloned, so less room is required on the destination. The data and
system still remain intact, and the system is bootable after cloning.

c) Align partition to optimize for SSD: If your destination disk is an SSD (Solid-State Drive), it is
highly recommended to tick this option to optimize the performance of the SSD.

5. Finally, click Start Clone. Wait for the process to complete and then click Finish.

Disk to Disk Cloning

Step 1: Select "Disk Clone" option under the "Clone" tab.


Step 2: Select the source disk (Disk0) which you want to clone, and then click "Next".

Step 3: Select the destination disk (Disk1) where you want to clone source disk to, and then click
"Next".

Step 4: Confirm the settings of the source and destination disk, and then click "Start Clone".

Tips:

• The "Sector by sector clone" option is also available. It clones all sectors on the
source disk to the destination disk, whether they are used or not. If you clone a large hard drive
to a smaller SSD, do not choose this option.
• If the destination disk is an SSD, it is recommended to check the box before "Align partition
to optimize for SSD".
• If you clone a small disk to a larger disk, you can click the "Edit partitions on the
destination disk" button. There are three options for resizing partitions so that you can use the
full capacity.

Step 5: Click "Finish" when all the operations have been completed.

Disk clone is also called "Disk to Disk (Disk2Disk)". There is a similar function, disk imaging,
which stores all the data on a disk in an image file. That is the so-called "Disk to Image
(Disk2Image)". AOMEI Backupper can perform Disk2Disk as well as Disk2Image: it can back up your
disk to an image file so that you can restore the disk in the future.

Some other Cloning Software available in the market:

Hardware Cloning Devices:

Hard drive docking stations can be incredibly valuable to small businesses, the tech service industry,
and even enthusiasts who require access to a large amount of data. They combine the functionality of
an external hard drive enclosure with the features of partition imaging software, and there are three
primary reasons you might need to pick one up for yourself.

If you need on-hand access to large amounts of data, the docking feature of a hard drive duplicator
is incredibly handy. Currently, standard spinning-disk hard drives offer storage space at a very
economical cost per gigabyte. Multi-terabytes of storage space can be picked up for a couple
hundred dollars. Keeping those drives on hand, you’ll be able to pop them in or out of the docking
station as needed. If transfer rates are important to you, there are very few portable storage
solutions that can beat the throughput of a 2.5” SSD connected to an eSATA or USB 3.1 interface.

If you regularly need to provide upgrade or maintenance service for computer systems, these hard
drive duplicators can make your life much easier. Internal storage can be upgraded with as little as
five to ten minutes of actual work. Just remove the old drive, plug it into the duplicator next to the
new one, then press the button. After about 10 seconds, the light will begin flashing indicating that
the cloning process has begun.

These drive cloners are also very popular in enterprise environments. Critical drives can be backed
up daily with just the push of a button. Restoration images of the company’s computers can be kept
on hand, meaning setting up any new piece of equipment is as simple as copying your installation
source onto a new drive.

When it comes to hard drive duplicators, there are many options available on the market today.
Thanks to high bandwidth communication ports such as USB 3.1 and eSATA, these devices have
never been easier or more affordable. Let’s take a look at some of the best options.

StarTech SATA Hard Drive Duplicator Dock

The StarTech SATA Hard Drive Duplicator Dock comes in a fairly plain case. The black plastic frame
uses a top-loading method to connect to your hard drives. It’s sized for 3.5” drives, but a fold down
plastic insert has a small cutout allowing you to insert a 2.5” drive while keeping it secure.

Connectivity

To communicate with your computer, this drive duplicator gives you three options. The USB 2.0 /
eSATA combination is likely the most popular. eSATA is fast enough that you can access most drives
just as if they were internal. For all other purposes, USB 2.0 is fast enough to browse the drive
structure or perform simple file transfers. Those of you that do not have access to an eSATA port can
opt for the USB 3.0 model.

This will facilitate file transfers at up to a peak rate of 5 gigabits per second, compared with 480
megabits per second for USB 2.0. Finally, if you are on
the cutting edge of technology you can take advantage of the USB 3.1 model which has a peak
throughput of 10Gbps. Currently, there is no drive that can even come close to reaching that kind of
speed, but if you intend to keep the duplicator around for a long time you may wish to take
advantage of that feature.

For connecting to your hard drives, most models use SATA connections. Virtually every drive on the
market is SATA, so this is necessary for the majority of consumers. If you are performing data
recovery / duplication on very old drives there is an IDE model available, but it is triple the price of
the SATA ones.

Chapter 3: Networking Concepts

Network

A computer network or data network is a digital telecommunications network which allows nodes
to share resources. In computer networks, networked computing devices exchange data with each
other using a data link. The connections between nodes are established using either cable media or
wireless media.

A Few Important Terms

Connection: In networking, a connection is built before the data transfer (by following the procedures
laid out in a protocol) and is then torn down at the end of the data transfer.

Packet: A packet is, generally speaking, the most basic unit that is transferred over a network. When
communicating over a network, packets are the envelopes that carry your data (in pieces) from one
end point to the other.

Packets have a header portion that contains information about the packet including the source and
destination, timestamps, network hops, etc. The main portion of a packet contains the actual data
being transferred. It is sometimes called the body or the payload.

Network Interface: A network interface can refer to any kind of software interface to networking
hardware. For instance, if you have two network cards in your computer, you can control and
configure each network interface associated with them individually.

Protocol: A protocol is a set of rules and standards that basically define a language that devices can
use to communicate. There are a great number of protocols in use extensively in networking, and
they are often implemented in different layers.

Some low level protocols are TCP, UDP, IP, and ICMP. Some familiar examples of application layer
protocols, built on these lower protocols, are HTTP (for accessing web content), SSH, TLS/SSL, and
FTP.

Port: A port is an address on a single machine that can be tied to a specific piece of
software (a service). It is not a physical interface or location, but it allows your server to
communicate using more than one application. Two well-known examples:

SMTP port: 25
HTTP port: 80
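
The packet and port definitions above can be tied together in a small sketch that builds a toy packet: a fixed-size header carrying source and destination addresses and ports, followed by the payload. The header layout is invented for this example and is not the real IP or TCP header format.

```python
import socket
import struct

# Toy header (an assumption, not a real protocol): source IP, destination IP,
# source port, destination port, payload length; all in network byte order.
HEADER_FORMAT = "!4s4sHHH"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

def build_packet(src_ip, src_port, dst_ip, dst_port, payload: bytes) -> bytes:
    """Prefix the payload with a header describing where it comes from and goes to."""
    header = struct.pack(
        HEADER_FORMAT,
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
        src_port, dst_port, len(payload),
    )
    return header + payload

def parse_packet(packet: bytes) -> dict:
    """Split a packet back into its header fields and its payload (body)."""
    src, dst, src_port, dst_port, length = struct.unpack(
        HEADER_FORMAT, packet[:HEADER_SIZE]
    )
    return {
        "src": (socket.inet_ntoa(src), src_port),
        "dst": (socket.inet_ntoa(dst), dst_port),
        "payload": packet[HEADER_SIZE:HEADER_SIZE + length],
    }

# A web request addressed to port 80 (HTTP) on a hypothetical server
pkt = build_packet("192.168.1.10", 50000, "192.168.1.20", 80, b"GET /")
print(parse_packet(pkt))
```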

Network Types:

1. Peer to Peer Network

2. Client Server Model

Peer to Peer Network:

In a peer-to-peer network, a group of computers is connected together so that users can share
resources and information. There is no central location for authenticating users, storing files, or
accessing resources. It also means that users must log on to each computer to access the shared
resources on that computer.

In most peer-to-peer networks, it is difficult for users to track where information is located because
data is generally stored on multiple computers.

Client-Server network

In a server-based network, the server is the central location where users share and access network
resources. This dedicated computer controls the level of access that users have to shared resources.
Shared data is in one location, making it easy to back up critical business information. Each
computer that connects to the network is called a client computer. In a server-based network, users
have one user account and password to log on to the server and to access shared resources.

Network Types (Geographical)

We can distinguish network types on the basis of their geographical extent as follows:

LAN (Local Area Network)

A LAN is the form of computer network most familiar to the general public. It has a limited reach,
roughly a group of closely situated houses or a building. That is because we typically use
Ethernet technology (IEEE 802.3) to power our local area networks. The Ethernet cables we lay
across our houses and offices have practical limitations: beyond a certain length, the signal
degrades and the speed drops. The reach of a LAN can be extended using repeaters, bridges, etc.

Home Area Network (HAN)

A HAN (Home Area Network) is a kind of local area network. All the devices, such as smartphones,
computers, IoT devices, televisions, and gaming consoles, connected to a central router (wired or
wireless) placed in a home constitute a home area network.

What is a Wireless LAN (WLAN)?

This type of computer network is the wireless counterpart of the local area network. It uses the WiFi
technology defined by the IEEE 802.11 standards. WiFi and WLAN are often treated as the same
thing, but they are not: WiFi is the technology used to create a wireless local area network.

What is a Metropolitan Area Network (MAN)?

The area covered by a MAN is considerably larger than that of a LAN. In fact, a MAN can be used to link
several LANs spread across a city or a metro area. A wired backhaul spread across the city is used to
power a metropolitan area network. The city-wide WiFi networks found in different parts of the
world are examples.

What is a Wide Area Network (WAN)?

We can think of a WAN as the superset of all the small networks we find in our homes, offices, cities,
states, and countries. The router or modem placed at your home is a device used to connect to the
WAN. The internet is also a type of WAN, one that spans the entire earth. Various technologies like
ADSL, 4G LTE, fiber optic, and cable are used to connect to the internet. However, each of these
access technologies is mostly confined to a single country at most.

What is a Storage Area Network (SAN)?

Generally, a SAN is used to connect external storage devices to servers while making the servers
believe that the storage is attached directly. A technology commonly used to accomplish this is
Fibre Channel.

What is a Near-me Network (NAN)?

Although it sounds unfamiliar, you use a near-me network almost every day. Remember
chatting with your friends on Facebook while all of you were sitting in the same room? You were part
of a NAN, even though you might have been on the networks of different carriers.

A message from your device would travel all the way to the Facebook servers over the internet and
then come back to your friend’s device sitting right next to you. Logically, both devices are on some
sort of network. They don’t need to be connected to the same network; for instance, one can be
connected via WiFi and the other via cellular.

What is a Virtual Private Network (VPN)?

A VPN is a type of computer network that has no physical existence of its own. The devices that are
part of a VPN could be present anywhere on the earth, connected to each other over the internet.

VPNs are used by companies to interconnect their offices located in different places and to give their
remote employees access to the company’s resources. They have largely replaced another type of
network known as the Enterprise Private Network, a physical network created by organizations to link
their office locations.

Types of Network Topology

Network Topology is the schematic description of a network arrangement, connecting various nodes
(sender and receiver) through lines of connection.

BUS Topology

Bus topology is a network type in which every computer and network device is connected to a single
cable. When the cable has exactly two endpoints, it is called linear bus topology.

Features of Bus Topology

1. It transmits data only in one direction.
2. Every device is connected to a single cable.

Advantages of Bus Topology

1. It is cost effective.
2. It requires the least cable compared with other network topologies.
3. Used in small networks.
4. It is easy to understand.
5. Easy to expand by joining two cables together.

Disadvantages of Bus Topology

1. If the cable fails, the whole network fails.
2. If network traffic is heavy or there are many nodes, the performance of the network decreases.
3. The cable has a limited length.
4. It is slower than the ring topology.

RING Topology

It is called ring topology because it forms a ring: each computer is connected to another computer,
with the last one connected to the first. Each device has exactly two neighbors.

Features of Ring Topology

1. A number of repeaters are used for Ring topology with large number of nodes, because if
someone wants to send some data to the last node in the ring topology with 100 nodes,
then the data will have to pass through 99 nodes to reach the 100th node. Hence to prevent
data loss repeaters are used in the network.
2. The transmission is unidirectional, but it can be made bidirectional by having 2 connections
between each Network Node, it is called Dual Ring Topology.
3. In Dual Ring Topology, two ring networks are formed, and data flow is in opposite direction
in them. Also, if one ring fails, the second ring can act as a backup, to keep the network up.
4. Data is transferred sequentially, bit by bit. Transmitted data has to pass
through each node of the network until it reaches the destination node.
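
The effect of ring direction on path length can be sketched as follows. Nodes are numbered 0 to n-1 around the ring (a numbering assumed for this example), and the function counts the links a message crosses.

```python
def ring_hops(src: int, dst: int, n: int, dual: bool = False) -> int:
    """Number of links crossed from src to dst in a ring of n nodes.

    A unidirectional ring can only go one way round; a dual ring can
    take whichever direction is shorter.
    """
    forward = (dst - src) % n
    if not dual:
        return forward
    return min(forward, n - forward)

# First node to last node in a 100-node ring
print(ring_hops(0, 99, 100))             # unidirectional: the long way round
print(ring_hops(0, 99, 100, dual=True))  # dual ring: one hop the other way
```

In the unidirectional case the message crosses 99 links, which is why repeaters matter in large rings.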

Advantages of Ring Topology

1. Transmission is not affected by high traffic or by adding more nodes, as only the
node holding the token can transmit data.
2. Cheap to install and expand.

Disadvantages of Ring Topology

1. Troubleshooting is difficult in ring topology.
2. Adding or deleting computers disturbs the network activity.
3. Failure of one computer disturbs the whole network.

STAR Topology

In this type of topology, all the computers are connected to a single hub through cables. This hub is
the central node, and all other nodes are connected to it.

Features of Star Topology

1. Every node has its own dedicated connection to the hub.
2. Hub acts as a repeater for data flow.
3. Can be used with twisted pair, optical fibre, or coaxial cable.

Advantages of Star Topology

1. Fast performance with few nodes and low network traffic.
2. Hub can be upgraded easily.
3. Easy to troubleshoot.
4. Easy to set up and modify.
5. Only the node that has failed is affected; the rest of the nodes can work smoothly.

Disadvantages of Star Topology

1. Cost of installation is high.
2. Expensive to use.
3. If the hub fails, the whole network stops, because all the nodes depend on the hub.
4. Performance depends on the capacity of the hub.

MESH Topology

In a mesh topology, nodes are linked by point-to-point connections. In a full mesh, all the network
nodes are connected to each other, which takes n(n-1)/2 physical channels to link n devices.
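
The n(n-1)/2 figure can be checked by counting every distinct pair of devices. The sketch below compares the formula with an explicit enumeration; the function name is an illustrative choice.

```python
from itertools import combinations

def full_mesh_links(n: int) -> int:
    """Physical channels needed to connect n devices in a full mesh."""
    return n * (n - 1) // 2

for n in (2, 5, 10, 50):
    # Every unordered pair of devices needs its own link
    pairs = sum(1 for _ in combinations(range(n), 2))
    print(n, "devices:", full_mesh_links(n), "links, pairs counted:", pairs)
```

The count grows roughly with the square of the number of devices, which is why cabling cost dominates in large mesh networks.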

There are two techniques to transmit data over the Mesh topology, they are :

1. Routing
2. Flooding

Routing

In routing, the nodes have routing logic suited to the network requirements. For example, the routing
logic may direct data to the destination along the shortest path, or it may hold information about
broken links and avoid them. Routing logic can even be used to re-configure failed nodes.

Flooding

In flooding, the same data is transmitted to all the network nodes, so no routing logic is required.
The network is robust, and it is very unlikely to lose data, but flooding puts unwanted load on
the network.
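
The trade-off between the two techniques can be sketched on a small full mesh. Breadth-first search stands in for the routing logic (an assumption; real routing protocols differ), and the flooding rule used is that each node forwards the message once to every neighbour except the one it arrived from.

```python
from collections import deque

def shortest_path(graph, src, dst):
    """Routing: breadth-first search for the path with the fewest hops."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph[node]:
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # destination unreachable

def flood(graph, src):
    """Flooding: returns (nodes reached, total transmissions on the network)."""
    seen = {src}
    queue = deque([(src, None)])
    transmissions = 0
    while queue:
        node, came_from = queue.popleft()
        for neighbour in graph[node]:
            if neighbour == came_from:
                continue
            transmissions += 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, node))
    return seen, transmissions

# Full mesh of four nodes
mesh = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
print("Routed path A to D:", shortest_path(mesh, "A", "D"))
reached, sent = flood(mesh, "A")
print("Flooding reached", len(reached), "nodes using", sent, "transmissions")
```

Routing delivers the message over a single link here, while flooding reaches every node at the cost of many redundant transmissions.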

Types of Mesh Topology

1. Partial Mesh Topology: In this topology, some of the systems are connected in the same
fashion as in a full mesh, but some devices are connected to only two or three other devices.
2. Full Mesh Topology: Every node or device is connected to every other one.

Features of Mesh Topology

1. Fully connected.
2. Robust.
3. Not flexible.

Advantages of Mesh Topology

1. Each connection can carry its own data load.
2. It is robust.
3. Fault is diagnosed easily.
4. Provides security and privacy.

Disadvantages of Mesh Topology

1. Installation and configuration are difficult.
2. Cabling cost is higher.
3. Bulk wiring is required.

TREE Topology

It has a root node, and all other nodes are connected to it, forming a hierarchy. It is also called
hierarchical topology. The hierarchy should have at least three levels.

Features of Tree Topology

1. Ideal if workstations are located in groups.
2. Used in Wide Area Networks.

Advantages of Tree Topology

1. Extension of bus and star topologies.
2. Expansion of nodes is possible and easy.
3. Easily managed and maintained.
4. Error detection is easily done.

Disadvantages of Tree Topology

1. Heavily cabled.
2. Costly.
3. If more nodes are added, maintenance becomes difficult.
4. If the central hub fails, the network fails.

HYBRID Topology

It is a mixture of two or more different types of topologies. For example, if one department in
an office uses ring topology and another uses star topology, connecting
these topologies results in a hybrid topology (ring topology and star topology).

Features of Hybrid Topology

1. It is a combination of two or more topologies.
2. Inherits the advantages and disadvantages of the topologies included.

Advantages of Hybrid Topology

1. Reliable, as error detection and troubleshooting are easy.
2. Effective.
3. Scalable, as size can be increased easily.
4. Flexible.

Disadvantages of Hybrid Topology

1. Complex in design.
2. Costly.

Network Devices

Hubs
Logically, a hub is a single bus to which all computers are connected. A hub can have any number of
ports, depending on its size. Information sent by one computer, say ‘A’, reaches another computer,
say ‘B’, over this shared bus; the hub repeats it to every port. The hub gave rise to the star
topology physically, but its internal structure is like a bus. 10BASE-T cabling is used, where ‘T’
denotes twisted-pair cable. One hub can also be connected to another through one of its ports.
Suppose we have 15 users and one hub has only 10 ports: we then connect two hubs together.

Because all ports share the same bandwidth, each computer on a 10-port, 10 Mbps hub gets only about
1 Mbps when every computer is transmitting. Packets from different computers also collide with each
other and must be resent, which adds latency and reduces the data rate further.

Bridges

A bridge is a device that connects two networks. It works in two stages:

i) Learning

ii) Forwarding

A bridge learns the addresses of devices and keeps a table of them, which it uses to forward traffic
between the networks.

Switches
Hubs have now been replaced by switches. A switch is a multiple-bus device, i.e. there is a separate
transceiver chip for every port. Here, every computer gets the full 10 Mbps and no collision-related
latency is involved. The I/O chips on the different buses can interact with each other. Since there
are several buses, traffic between one pair of computers does not congest the others: computer A can
send packets to computer C without interruption, so each conversation is isolated. A switch is a
sophisticated device. There are two switching techniques.

SWITCHING

– CUT-THROUGH

– STORE&FORWARD

A switch has a CAM (Content Addressable Memory) table, a feature inherited from the bridge.

CAM Table:

Port No.    MAC Address
1           _______
2           _______

Whenever a PC boots, the link light for its port blinks on the switch, and the MAC address of that PC
is recorded automatically in the CAM table against its port number once the PC sends a frame. Entries
that are not refreshed are aged out of the table; a common default aging time is 300 seconds.
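
The learning behaviour shared by bridges and switches can be sketched as a small class: the source address of each incoming frame is recorded against its port, and a frame for an unknown destination is sent out of every other port. The class and method names are hypothetical, short strings stand in for MAC addresses, and entry aging is left out to keep the example short.

```python
class LearningSwitch:
    """Toy model of a switch that fills its CAM table from observed traffic."""

    def __init__(self, port_count: int):
        self.ports = list(range(1, port_count + 1))
        self.cam = {}  # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Learn the sender's port, then return the ports the frame goes out of."""
        self.cam[src_mac] = in_port
        if dst_mac in self.cam:
            out_port = self.cam[dst_mac]
            # A frame whose destination is on the arrival port needs no forwarding
            return [] if out_port == in_port else [out_port]
        # Unknown destination: send out of every port except the arrival port
        return [p for p in self.ports if p != in_port]

switch = LearningSwitch(4)
print(switch.receive(1, "AA", "CC"))  # CC unknown: goes out of all other ports
print(switch.receive(3, "CC", "AA"))  # AA already learned on port 1
print(switch.receive(1, "AA", "CC"))  # CC now learned on port 3
print(switch.cam)
```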

1. Cut-Through Switching: Here, the switch does not check the CRC. It reads only the source
and destination addresses and starts forwarding immediately. It has a latency (delay) of 35 ms.
Because frames are forwarded unchecked, delivery of an intact frame is left to chance, so this
technique is less reliable, particularly when offices are located far apart.

2. Store-and-Forward Switching: Here, the frame is stored in the memory of the switch, which reads
the whole frame, checks it for errors, looks up the CAM table, and only then forwards the frame.
Earlier, it had a latency of 51 ms, but this has now been improved to only 40 ms. This technique is
therefore more reliable than cut-through switching.
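
The reliability difference comes down to whether the checksum is verified before forwarding. The sketch below uses `zlib.crc32` and a simplified frame layout (payload followed by a 4-byte CRC); both are assumptions for illustration, not the actual Ethernet frame format.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def cut_through(frame: bytes) -> bytes:
    """Forward immediately without checking the CRC."""
    return frame

def store_and_forward(frame: bytes):
    """Buffer the whole frame, verify the CRC, and drop the frame on error."""
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) != received_crc:
        return None  # corrupted frame is discarded
    return frame

good = make_frame(b"hello")
corrupted = b"jello" + good[5:]  # payload damaged in transit, CRC unchanged

print("Cut-through forwards corrupted frame:", cut_through(corrupted) is not None)
print("Store-and-forward forwards corrupted frame:", store_and_forward(corrupted) is not None)
print("Store-and-forward forwards good frame:", store_and_forward(good) is not None)
```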

Routers
In technical terms, a router is a Layer 3 network gateway device, meaning that it connects two or
more networks and operates at the network layer of the OSI model. Routers contain
a processor (CPU), several kinds of digital memory, and input-output (I/O) interfaces. They function
as special-purpose computers that do not require a keyboard or display.

The router's memory stores an embedded operating system (O/S). Router operating systems limit
what kind of applications can be run on them and also need much smaller amounts of storage space.
Examples of popular router operating systems include Cisco Internetwork Operating System (IOS)
and DD-WRT. These operating systems are manufactured into a binary firmware image and are
commonly called router firmware.

By maintaining routing information in a part of memory called the routing table, routers can also
filter both incoming and outgoing traffic based on the addresses of senders and receivers.
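
A routing table lookup with address-based filtering can be sketched using Python's standard `ipaddress` module. The routes, interface names, and blocked range below are hypothetical, and longest-prefix matching is assumed as the selection rule.

```python
import ipaddress

# Hypothetical routing table: (destination network, outgoing interface)
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "wan0"),        # default route
    (ipaddress.ip_network("192.168.1.0/24"), "lan0"),
    (ipaddress.ip_network("10.0.0.0/8"), "vpn0"),
]

# Hypothetical filter: drop anything sent from this range
BLOCKED_SOURCES = [ipaddress.ip_network("203.0.113.0/24")]

def route(src: str, dst: str):
    """Return the outgoing interface for a packet, or None if it is filtered."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    if any(src_ip in net for net in BLOCKED_SOURCES):
        return None
    # Of all matching routes, the most specific (longest prefix) wins
    matches = [(net, iface) for net, iface in ROUTES if dst_ip in net]
    _net, iface = max(matches, key=lambda match: match[0].prefixlen)
    return iface

print(route("192.168.1.5", "192.168.1.77"))   # stays on the local network
print(route("192.168.1.5", "8.8.8.8"))        # falls through to the default route
print(route("203.0.113.9", "192.168.1.77"))   # filtered by sender address
```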

OSI Layers
Two parties that want to communicate must follow common specifications. ISO (the International
Organization for Standardization) was therefore asked to define such specifications (especially for
Ethernet, in the era of Bob Metcalfe), and the OSI layers were introduced. OSI stands for Open
Systems Interconnection. In OSI terms, the interface through which a layer offers its services is a
SAP (Service Access Point).

These 7 layers correspond to seven SAPs, and only networks that meet these specifications qualify
to interact with one another.

Application Layer: The application layer is the layer that the users and user-applications most often
interact with. Network communication is discussed in terms of availability of resources, partners to
communicate with, and data synchronization.

Presentation Layer: It defines how the information handed over by the application software is
represented and forwarded: character encoding, data formats, compression, and encryption.
Presentation may differ between operating systems (OS), so this layer translates between them.

Session Layer: It establishes, manages, and terminates sessions (dialogues) between communicating
applications and controls which side may send at a given time. The operating system typically limits
how many sessions can be open at once.

Transport Layer: It contains TCP (Transmission Control Protocol) and UDP (User Datagram Protocol).
TCP and UDP are carriers: blind transporters that have no idea of the route the data takes. They
just follow the sign boards. The data handed down by the application can be large, so slicing takes
place: the data is divided into segments (the units sent by UDP are called datagrams), and TCP puts
sequence numbers on them so that the receiver can reassemble them in order.

Network Layer: This layer tells TCP and UDP traffic where to go, i.e. segments are given a direction.
Protocols in the network layer include IP, ARP, IGMP, and RARP. Every segment is given a
destination IP address and is encapsulated into a packet. Thus, TCP relies on IP for delivery.

Data Link Layer: Now the information is handed to the LAN card of the PC. L-2 switches work here.
Packets are wrapped in frames, and the MAC addresses of the sending PC and the destination PC are
added in this layer. The data link layer does not know where the packets are ultimately going; it
only knows that the frame has to go to the next device, such as the switch.

Physical Layer: The frames are converted into bits and transmitted as electrical, optical, or radio
signals over the cable or wireless medium.
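
The slicing and sequence numbering done at the transport layer can be sketched as follows. The segment size and function names are arbitrary choices for illustration; real TCP numbers bytes rather than whole segments.

```python
def segment(data: bytes, size: int) -> list:
    """Slice the data into numbered segments of at most `size` bytes."""
    return [
        (seq, data[start:start + size])
        for seq, start in enumerate(range(0, len(data), size))
    ]

def reassemble(segments: list) -> bytes:
    """Put the segments back in order using their sequence numbers."""
    return b"".join(chunk for _seq, chunk in sorted(segments))

segments = segment(b"HELLO WORLD", 4)
print(segments)

# Segments may arrive out of order; sequence numbers restore the original data
arrived = list(reversed(segments))
print(reassemble(arrived))
```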

You can remember all seven layers by using following phrases:

All People Seem To Need Data Processing

Or

Apne Pyare Sachin Tendulkar Ne Doodh Piya

So, the 7 layers of OSI are:

Application, Presentation, Session, Transport, Network, Data Link, Physical


So, in overall we conclude that:

• Till the transportation layer, we have the work of operating system.

• At network layer we have Routers and some special types of switches called the L-3
switches.

• At Data Link Layer, we have the normal switches, LAN card etc.

• At Physical Layer we have cables, hubs etc.
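
As a quick revision aid, the seven layers and the summary above can be written down as a small lookup table. This is a sketch only: the data-unit names (segment, packet, frame, bits) are the usual textbook terms, and the helper names `mnemonic_initials` and `describe` are invented for this example.

```python
OSI_LAYERS = [
    # (number, name, data unit, where the work happens according to the summary)
    (7, "Application", "data", "operating system"),
    (6, "Presentation", "data", "operating system"),
    (5, "Session", "data", "operating system"),
    (4, "Transport", "segment", "operating system"),
    (3, "Network", "packet", "routers, L-3 switches"),
    (2, "Data Link", "frame", "switches, LAN card"),
    (1, "Physical", "bits", "cables, hubs"),
]


def mnemonic_initials():
    """First letters of the layers from top to bottom, matching the mnemonic."""
    return "".join(name[0] for _, name, _, _ in OSI_LAYERS)


def describe(layer_name):
    """Return a one-line summary of the named layer."""
    for number, name, unit, handled_by in OSI_LAYERS:
        if name.lower() == layer_name.lower():
            return f"Layer {number} ({name}): unit = {unit}, handled by {handled_by}"
    raise ValueError(f"unknown layer: {layer_name}")
```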

TCP/IP Model

This is also called the DARPA model. The TCP/IP model, more commonly known as the Internet
protocol suite, is another layering model that is simpler and has been widely adopted. It defines
four separate layers, some of which overlap with the OSI model:

• Application: In this model, the application layer is responsible for creating and transmitting
user data between applications. The applications can be on remote systems, and should
appear to operate as if locally to the end user. The communication is said to take place
between peers.

• Transport: The transport layer is responsible for communication between processes. This
level of networking utilizes ports to address different services. It can build up unreliable or
reliable connections depending on the type of protocol used.

• Internet: The internet layer is used to transport data from node to node in a network. This
layer is aware of the endpoints of the connections, but does not worry about the actual
connection needed to get from one place to another. IP addresses are defined in this layer
as a way of reaching remote systems in an addressable manner.

• Link: The link layer implements the actual topology of the local network that allows the
internet layer to present an addressable interface. It establishes connections between
neighboring nodes to send data.

What is Internet Protocol (IP)?

IP (short for Internet Protocol) specifies the technical format of packets and the addressing scheme
for computers to communicate over a network. Most networks combine IP with a higher-level
protocol called Transmission Control Protocol (TCP), which establishes a virtual connection
between a destination and a source.

IP by itself can be compared to something like the postal system. It allows you to address a package
and drop it in the system, but there's no direct link between you and the recipient. TCP/IP, on the
other hand, establishes a connection between two hosts so that they can send messages back and
forth for a period of time.
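
The difference between "dropping a package in the post" and "holding a conversation" can be tried out on a single machine with Python's standard `socket` module. The sketch below is only an illustration under simple assumptions: both ends run in the same program on the loopback address 127.0.0.1, the operating system picks free ports, and the function names `udp_demo` and `tcp_demo` are made up for this example.

```python
import socket


def udp_demo(payload: bytes) -> bytes:
    """Connectionless: the sender just addresses a datagram and drops it in."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # port 0 = let the OS choose a free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, receiver.getsockname())
        data, _ = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_demo(payload: bytes) -> bytes:
    """Connection-oriented: a connection is set up first, then both sides talk."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=2.0)
    server_side, _ = listener.accept()
    server_side.settimeout(2.0)
    try:
        client.sendall(payload)                   # client -> server
        received = server_side.recv(1024)
        server_side.sendall(received.upper())     # server -> client, same connection
        return client.recv(1024)
    finally:
        client.close()
        server_side.close()
        listener.close()
```

In the UDP case nothing links the two sockets beyond the address written on the datagram; in the TCP case the same connection carries messages in both directions.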

Internet Protocol Versions

There are currently two versions of Internet Protocol (IP): IPv4 and a new version called IPv6. IPv6 is
an evolutionary upgrade to the Internet Protocol. IPv6 will coexist with the older IPv4 for some time.

What is IPv4 (Internet Protocol Version 4)?

IPv4 (Internet Protocol Version 4) is the fourth revision of the Internet Protocol (IP) used to
identify devices on a network through an addressing system. The Internet Protocol is designed for
use in interconnected systems of packet-switched computer communication networks.

IPv4 is the most widely deployed Internet protocol used to connect devices to the Internet. IPv4 uses
a 32-bit address scheme allowing for a total of 2^32 addresses (just over 4 billion addresses). With
the growth of the Internet it is expected that the number of unused IPv4 addresses will eventually
run out because every device -- including computers, smartphones and game consoles -- that
connects to the Internet requires an address.

What is IPv6 (Internet Protocol Version 6)?

A new Internet addressing system Internet Protocol version 6 (IPv6) is being deployed to fulfill the
need for more Internet addresses. IPv6 (Internet Protocol Version 6) is also called IPng (Internet
Protocol next generation) and it is the newest version of the Internet Protocol (IP) reviewed in the
IETF standards committees to replace the current version of IPv4.

IPv6 is the successor to Internet Protocol Version 4 (IPv4). It was designed as an evolutionary
upgrade to the Internet Protocol and will, in fact, coexist with the older IPv4 for some time. IPv6 is
designed to allow the Internet to grow steadily, both in terms of the number of hosts connected and
the total amount of data traffic transmitted. IPv6 is often referred to as the "next generation"
Internet standard and has been under development now since the mid-1990s. IPv6 was born out of
concern that the demand for IP addresses would exceed the available supply.

The Benefits of IPv6

While increasing the pool of addresses is one of the most often-talked about benefit of IPv6, there
are other important technological changes in IPv6 that will improve the IP protocol:

• No more NAT (Network Address Translation)
• Auto-configuration
• No more private address collisions
• Better multicast routing
• Simpler header format
• Simplified, more efficient routing
• True quality of service (QoS), also called "flow labelling"
• Built-in authentication and privacy support
• Flexible options and extensions
• Easier administration (say good-bye to DHCP)

The Difference Between IPv4 and IPv6 Addresses

An IP address is a binary number but can be written as text for human readers. For example, a 32-bit
numeric address (IPv4) is written in decimal as four numbers separated by periods. Each number can
be zero to 255. For example, 1.160.10.240 could be an IP address.

IPv6 addresses are 128-bit IP addresses written in hexadecimal and separated by colons. An example
IPv6 address could be written like this: 3ffe:1900:4545:3:200:f8ff:fe21:67cf
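
Both example addresses can be inspected with Python's standard `ipaddress` module. This is a small sketch to show that the dotted and colon-separated forms are just readable spellings of a 32-bit and a 128-bit number; the variable names are chosen for this example.

```python
import ipaddress

v4 = ipaddress.ip_address("1.160.10.240")
v6 = ipaddress.ip_address("3ffe:1900:4545:3:200:f8ff:fe21:67cf")

v4_bits = v4.max_prefixlen            # address length in bits for IPv4
v6_bits = v6.max_prefixlen            # address length in bits for IPv6
v4_as_number = int(v4)                # the same address as one plain integer
v4_as_binary = format(v4_as_number, "032b")
v6_full = v6.exploded                 # every group padded to four hex digits

total_ipv4_addresses = 2 ** v4_bits   # just over 4 billion
```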

We have two questions:

1) How many networks are there in the world?

2) In every network how many hosts are there?

Once both these questions are answered, IP addresses can be laid out. Currently, we have an IP
addressing scheme of 32 bits, i.e. 4 octets.

We generally have three classes of IP addresses:

i) Class A :

Bit: 0        7         15        23        31
     | N/W ID | Host ID | Host ID | Host ID |

Here, the first octet is called the Network ID and the other three octets are Host IDs. In the
N/W ID the first bit is always ‘0’. Total number of networks available here is

2^(8-1) – 2 = 2^7 – 2 = 126 networks.

Thus, IP addresses of Class A are :

1.__.__.__
2.__.__.__
3.__.__.__
...
126.__.__.__

Total number of computers in one network is : 2^24 – 2 = 16,777,214, which gives
126 × 16,777,214 = 2,113,928,964 computers across all Class A networks.

Incredible !!

ii) CLASS B : Here, we have 2 octets as Network IDs and 2 octets as Host IDs.

Bit: 0        7        15        23        31
     | N/W ID | N/W ID | Host ID | Host ID |

Here, the first bit is ‘1’ and the second is ‘0’

Total Networks = 2^14 – 2
No. of PCs in 1 N/W = 2^16 – 2

Any IP address whose first octet is from ‘128’ to ‘191’ comes under the Class B scheme.

iii) CLASS C : Here, we have three octets as N/W IDs and one octet as Host ID. It is
the most commonly used IP addressing scheme. It is also the most economical class
to use, as its host portion is the smallest.

Bit: 0        7        15       23        31
     | N/W ID | N/W ID | N/W ID | Host ID |

Here, 1st bit is 1, 2nd bit is 1 and the 3rd bit is 0.

Total N/W = 2^21 – 2

Total no. of PCs in 1 N/W = 2^8 – 2 = 254

Thus, Class A is represented as : N.H.H.H

Class B : N.N.H.H
Class C : N.N.N.H
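
The class of an address can be read off its first octet. The sketch below assumes the standard classful ranges (1 to 126 for Class A, 128 to 191 for Class B, 192 to 223 for Class C); the names `address_class` and `hosts_per_network` are made up for this illustration.

```python
HOST_BITS = {"A": 24, "B": 16, "C": 8}   # N.H.H.H, N.N.H.H, N.N.N.H


def address_class(address: str) -> str:
    """Return 'A', 'B' or 'C' from the first octet, or 'other' otherwise."""
    first_octet = int(address.split(".")[0])
    if 1 <= first_octet <= 126:       # leading bit 0
        return "A"
    if 128 <= first_octet <= 191:     # leading bits 10
        return "B"
    if 192 <= first_octet <= 223:     # leading bits 110
        return "C"
    return "other"


def hosts_per_network(cls: str) -> int:
    """Usable hosts: all host-bit combinations minus all-zeros and all-ones."""
    return 2 ** HOST_BITS[cls] - 2
```

For example, `address_class("192.168.1.15")` gives `"C"`, and `hosts_per_network("C")` gives 254.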

Now, we need to do masking i.e. filtering of IP addresses so that only the relevant
information is kept. Network IDs are always checked first and not the Host IDs. Thus, Host
IDs need to be masked. All bits of Host IDs are made ‘0’ during masking and N/W IDs are
denoted by all ‘1s’.

Thus, Class A IP addresses are masked with : 255.0.0.0 where ‘255’ is the decimal value of a binary
number containing all ‘1s’

Class B : 255.255.0.0

Class C : 255.255.255.0

Example – Suppose we have a Class C IP address 192.168.1.15. Its mask is 255.255.255.0, and
after masking it will be seen as 192.168.1.0

We now verify the above masking:

i) We convert the given IP address into its binary form
11000000.10101000.00000001.00001111

ii) We now perform a ‘Logic AND’ operation of the above IP with the binary of the mask 255.255.255.0

Thus, we get :

    11000000.10101000.00000001.00001111
AND 11111111.11111111.11111111.00000000
  = 11000000.10101000.00000001.00000000
                               (Host ID)

Clearly, the Host ID becomes all zero. Hence, proved!

The Host PC therefore sees the network of the destination PC as : 192.168.1.0

Now, every router on the way checks this network address, and when the last network is finally
reached, the host is located through the broadcasting technique. In this example,
the Host ID is ‘15’.
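
The same check can be done with a bitwise AND in Python. This is a minimal sketch using the standard `ipaddress` module; the function name `apply_mask` is chosen for this example.

```python
import ipaddress


def apply_mask(address: str, mask: str):
    """Return (network address, host ID) after ANDing the address with the mask."""
    addr = int(ipaddress.IPv4Address(address))
    mask_bits = int(ipaddress.IPv4Address(mask))
    network = addr & mask_bits               # host bits forced to 0
    host_id = addr & ~mask_bits & 0xFFFFFFFF # only the host bits survive
    return str(ipaddress.IPv4Address(network)), host_id


network, host_id = apply_mask("192.168.1.15", "255.255.255.0")
```

Here `network` is `"192.168.1.0"` and `host_id` is 15, matching the worked example.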

Subnet Mask


SUB NETTING: IPs can be classified into

(i) Public IPs

(ii) Private IPs

Public IPs are generally allotted to companies that serve users: Internet Service
Providers like Reliance etc.; Network Service Providers, which provide telephone services,
internet services etc., e.g. MTNL; and Telecom Service Providers (authorized to send our info
to the outside world via cables and lines), e.g. Airtel, Tata, Reliance etc., which provide
cable landing services.

Public IPs have a valid passport to travel across the globe. Public IPs are given to users
only when they want Internet or other such services.

Generally, all websites are hosted on public IPs so that everyone can view these websites.
Private IPs are used by companies which do not provide any user services and only
carry out their own internal work, e.g. NTPC, oil companies etc. These are reserved IPs
and are also what we use when we want to make our own networks.

Following is the range of private IPs:

1. In Class A : 10.0.0.0 - 10.255.255.255

2. In Class B : 172.16.0.0 - 172.31.255.255

3. In Class C : 192.168.0.0 - 192.168.255.255
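
A quick way to test whether an address falls in one of these three private blocks is sketched below with Python's standard `ipaddress` module. The blocks are written in prefix notation (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), which covers exactly the three private ranges; the function name `in_private_range` is made up for this example.

```python
import ipaddress

PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),       # 10.0.0.0    - 10.255.255.255
    ipaddress.ip_network("172.16.0.0/12"),    # 172.16.0.0  - 172.31.255.255
    ipaddress.ip_network("192.168.0.0/16"),   # 192.168.0.0 - 192.168.255.255
]


def in_private_range(address: str) -> bool:
    """True if the address lies inside one of the three private blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in PRIVATE_BLOCKS)


# First and last address of every block, for comparison with the list above.
block_limits = [(str(block[0]), str(block[-1])) for block in PRIVATE_BLOCKS]
```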

Note : Cable landing services may be used directly, or we can go through MTNL, which
in turn links us to the outside world.

Did you know that if a cable landing provider such as Tata shut down, the ISD calls routed
through it would stop?

Also, if the international gateways like Airtel, Reliance etc. go down or go on strike, no
internet connection to the outside world would be possible. Amazing!!

Now the question arises that if we have private IPs, then how can we access the
internet?

There is a technique used for this called NAT. NAT stands for Network Address Translation.

A NAT device has two LAN cards: one for the private IP side and the other for the public IP
side. For one whole network, we require one NAT.
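
What the NAT box keeps track of can be sketched as a simple translation table. This is a toy model, not how any particular router is implemented: the class name `Nat`, the starting port 40000 and the public address 203.0.113.5 (taken from a range reserved for documentation) are all assumptions made for this illustration.

```python
class Nat:
    """Toy NAT: many private (IP, port) pairs share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.outbound = {}   # (private_ip, private_port) -> public_port
        self.inbound = {}    # public_port -> (private_ip, private_port)

    def translate_out(self, private_ip: str, private_port: int):
        """Rewrite the source of an outgoing packet to the public side."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int):
        """Find which private host a reply on this public port belongs to."""
        return self.inbound.get(public_port)


nat = Nat("203.0.113.5")
first = nat.translate_out("192.168.1.15", 5000)
second = nat.translate_out("192.168.1.16", 5000)
```

Both private PCs appear to the outside world under the same public IP, and the port number tells the NAT where to send each reply.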

* Effective use of IP addresses: If we have multiple offices then we can use Class A
scheme else Class B or Class C is used. Class C is generally used for fewer offices. Thus,
our aim is to have Class C scheme always. Thus, we convert the Host IDs of Class A or
Class B into Network IDs.

There is a decomposition method for this. Suppose we have an IP address of Class A, 10.0.0.0,
and we need to convert it to the Class C scheme. We cannot touch the Network ID. We
can only manipulate the Host IDs to get them converted to Network IDs.

10 . 0 . 0 . 0
 ↓   ↓   ↓   ↓
 N   H   H   H

Suppose this IP range is given to a certain Head Office and we have to assign different IPs
to its computers.

So, we play with the Host ID of the second octet.

N.N.H.H

10.0.0.0 (the second octet can now take values up to 254)

Thus, the host ID changes to Network ID. Subnet Mask of above IP address becomes
255.255.0.0

This is the subnet mask of Class B. But we need to obtain Class C. Thus, we play with the
next Host ID, and the previous Network ID becomes fixed.

10. 0. 0. 0

Becomes fixed 2

Thus, the subnet mask becomes 255.255.255.0 and we get a Class C IP scheme.

Note :

1. If we have 1000 PCs then we do not do subnetting. We then provide a DHCP server
instead of assigning IPs by hand.

2. When we want to keep track of the websites accessed from each PC, we need fixed IP
addresses instead of a DHCP server.

Suppose that we want to separate two classes, say BSc Maths and BSc
Chemistry, i.e. the traffic is to be divided so that a BSc Chemistry student cannot access the
data of a BSc Maths student and vice versa. For this to happen, we divide the Class C IP
address further.

For example, suppose we have 192.168.1.0. Zooming into the last octet and writing it in binary,
we get

192.168.1.00000000

Since in this example we want to have two sub networks from a given network of Class C,
we need to play with 1 bit, which gives two logic conditions. Thus, the two sub networks made are :

192.168.1.00000000 → N#1

192.168.1.10000000 → N#2

Note : We can only alter the Host ID and never the Network ID

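In each sub network the first and the last address cannot be given to hosts: the first is the
network address and the last is the broadcast address. Here, 192.168.1.127 is the broadcast
address of N#1 and 192.168.1.128 is the network address of N#2.

Note: If we want to do intentional broadcasting we make all bits of the Host ID ‘1’.

Now, the subnet mask is the same for both sub networks, because one host bit has become a
network bit :-

192.168.1.0 → 255.255.255.128

192.168.1.128 → 255.255.255.128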
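
The two halves can be listed with Python's standard `ipaddress` module. This is a short sketch; `prefixlen_diff=1` means "borrow one host bit", and the name `summary` is chosen for this example.

```python
import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")
halves = list(network.subnets(prefixlen_diff=1))   # borrow 1 host bit -> 2 subnets

# (network address, subnet mask, broadcast address, usable hosts) per subnet
summary = [
    (str(s.network_address), str(s.netmask), str(s.broadcast_address), s.num_addresses - 2)
    for s in halves
]
```

Each half has 128 addresses, of which 126 can be given to hosts once the network and broadcast addresses are set aside.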

► For making 6 networks:

Suppose we are given 192.168.5.0. For making six networks out of this given IP, we need to
manipulate 3 bits (2 bits give only 4 combinations, 3 bits give 8), i.e. 3-bit logic is applied
here. Thus, we get:-

N#1 : 00000000 → 192.168.5.0

N#2 : 00100000 → 192.168.5.32

N#3 : 01000000 → 192.168.5.64

N#4 : 01100000 → 192.168.5.96

N#5 : 10000000 → 192.168.5.128

N#6 : 10100000 → 192.168.5.160

The subnet mask is the same for all six networks, because the same three bits
(11100000 = 224) have been turned into network bits :-

N#1 to N#6 : 255.255.255.224

► For 12 Networks:- Given IP 192.168.5.0. For 12 networks, we need four-bit logic, so each
network starts 16 addresses after the previous one.

N#1 : 00000000 → 192.168.5.0

N#2 : 00010000 → 192.168.5.16

N#3 : 00100000 → 192.168.5.32

N#4 : 00110000 → 192.168.5.48

N#5 : 01000000 → 192.168.5.64

N#6 : 01010000 → 192.168.5.80

N#7 : 01100000 → 192.168.5.96

N#8 : 01110000 → 192.168.5.112

N#9 : 10000000 → 192.168.5.128

N#10 : 10010000 → 192.168.5.144

N#11 : 10100000 → 192.168.5.160

N#12 : 10110000 → 192.168.5.176
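
The bit counting and the resulting network addresses can be checked with a short Python sketch. It assumes at least two networks are wanted; the function name `split_network` is made up for this example, and the number of bits is computed as the smallest n with 2^n greater than or equal to the number of networks.

```python
import ipaddress


def split_network(cidr: str, wanted: int):
    """Return (borrowed bits, first `wanted` subnets) for the given network."""
    if wanted < 2:
        raise ValueError("ask for at least two networks")
    bits = (wanted - 1).bit_length()      # smallest n such that 2**n >= wanted
    subnets = list(ipaddress.ip_network(cidr).subnets(prefixlen_diff=bits))
    return bits, subnets[:wanted]


bits_6, six = split_network("192.168.5.0/24", 6)
bits_12, twelve = split_network("192.168.5.0/24", 12)

six_addresses = [str(s.network_address) for s in six]
twelve_addresses = [str(s.network_address) for s in twelve]
mask_6 = str(six[0].netmask)
mask_12 = str(twelve[0].netmask)
```

With 3 borrowed bits every network shares the mask 255.255.255.224; with 4 borrowed bits the mask is 255.255.255.240.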

Wireless technologies defined

Wireless is a term used to describe telecommunications in which electromagnetic waves -- rather
than some form of wire -- carry the signal over part or all of the communication path. The first
wireless transmitters went on the air in the early 20th century using radiotelegraphy (Morse code).

Later, as modulation made it possible to transmit voices and music via wireless, the medium came to
be called radio. With the advent of television, fax, data communication and the effective use of a
larger portion of the spectrum, the term "wireless" has been resurrected.

More information on wireless networking

Wireless technology is rapidly evolving and playing an increasing role in the lives of people
throughout the world. Various technologies and devices are being developed in response to the
growing use of wireless. In addition, larger numbers of people are relying on the technology directly
or indirectly.

Wireless access technologies are commonly divided into categories, based on speed and distance.

• Wireless Personal Area Network (WPAN) technologies are designed to reach only about 10
meters. IrDA and Bluetooth are two common WPAN examples. Emerging technologies in this
space include 802.15.4a (Zigbee) and 802.15.3c (UWB).
• Wireless Local Area Network (WLAN) technologies can deliver up to 200 Mbps at distances
up to 100 meters. 802.11a/b/g (Wi-Fi) are widely deployed WLAN examples. Proprietary
MIMO products and the new 802.11n high-speed WLAN standard are emerging technologies
in this category.
• Wireless Metropolitan Area Network (WMAN) technologies deliver up to 75 Mbps over
wireless "first mile" links that span several kilometers. There have been several iterations of
the 802.16 Broadband Wireless Access WMAN standard, certified under the brand WiMAX.
Fixed WiMAX is now being complemented by the emerging 802.16e Mobile WiMAX standard.
• Wireless Wide Area Network (WWAN) technologies now deliver up to a few hundred Kbps
over large service areas such as cities, regions or even countries. Commonly deployed
WWAN technologies include GSM/GPRS/EDGE and CDMA2000 1xRTT. These services are
gradually being complemented by newer third-generation technologies like UMTS/HSDPA
and CDMA EV-DO Rev.0/A. Future technologies here include HSUPA.

A wireless LAN (WLAN) is one in which a mobile user can connect to a local area network (LAN)
through a wireless (radio) connection. A wireless personal area network (WPAN) is a personal area
network for interconnecting devices centered around a person's workspace in which the
connections are wireless. Though IrDA and Bluetooth are quite advanced, WPAN technology
continues to develop rapidly.

Protocols and specifications

Wi-Fi is a term for certain types of WLAN that use specifications in the 802.11 family. The term Wi-Fi
was created by an organization called the Wi-Fi Alliance, which oversees tests that certify product
interoperability. A wireless LAN node that provides a public Internet connection via Wi-Fi from a
given location is called a hot spot. Many airports, hotels, and fast-food facilities offer public access to
Wi-Fi networks.

WiMAX (Worldwide Interoperability for Microwave Access) is a wireless industry coalition organized
to advance IEEE 802.16 standards for broadband wireless access (BWA) networks. WiMAX has a
range of up to 30 miles, presenting provider networks with a viable wireless last-mile solution.

Bluetooth is a telecommunications industry specification that describes how mobile phones,
computers and personal digital assistants (PDAs) can be easily interconnected using a short-range
wireless connection.

Ultra wideband (also known as UWB or digital pulse wireless) is a wireless technology for
transmitting large amounts of digital data over a wide spectrum of frequency bands with very low
power for a short distance (up to 230 feet) and carrying signals through doors and other obstacles
that tend to reflect signals at more limited bandwidths and a higher power.

WAP (Wireless Application Protocol) is a specification for a set of communication protocols to
standardize the way that wireless devices can be used for Internet access.

Wired Equivalent Privacy (WEP) is a security protocol, specified in the IEEE Wi-Fi standard, 802.11,
designed to provide a WLAN with a level of security and privacy comparable to what is usually
expected of a wired LAN. Another security standard for users of computers equipped with a Wi-Fi
wireless connection is Wi-Fi Protected Access (WPA). It is an improvement on, and is expected to
replace, the original Wi-Fi security standard, WEP.

802.11 is an evolving family of specifications for WLANs developed by a working group of the
Institute of Electrical and Electronics Engineers (IEEE). There are several specifications in the family,
and new ones are occasionally added.

802.11x refers to a group of evolving WLAN standards that are under development as elements of
the IEEE 802.11 family of specifications and have not yet been formally approved or deployed.

Chapter 4: Security Threats and vulnerabilities

Ethical Hacking & Concepts

Hacking is the act of finding the possible entry points that exist in a computer system or a computer
network and finally entering into them. Hacking is usually done to gain unauthorized access to a
computer system or a computer network, either to harm the systems or to steal sensitive
information available on the computer.

Hacking is legal only when it is done, with the permission of the system owner, to find weaknesses
in a computer or network system for testing purposes. This sort of hacking is what we call Ethical
Hacking.

A computer expert who does the act of hacking is called a "Hacker". Hackers are those who seek
knowledge, to understand how systems operate, how they are designed, and then attempt to play
with these systems.

Types of Hacking
We can segregate hacking into different categories −

• Website Hacking − Hacking a website means taking unauthorized control over a web server
and its associated software such as databases and other interfaces.

• Network Hacking − Hacking a network means gathering information about a network by
using tools like Telnet, NS lookup, Ping, Tracert, Netstat, etc. with the intent to harm the
network system and hamper its operation.

• Email Hacking − It includes getting unauthorized access to an Email account and using it
without taking the consent of its owner.

• Ethical Hacking − Ethical hacking involves finding weaknesses in a computer or network
system for testing purposes and finally getting them fixed.

• Password Hacking − This is the process of recovering secret passwords from data that has
been stored in or transmitted by a computer system.

• Computer Hacking − This is the process of stealing computer ID and password by applying
hacking methods and getting unauthorized access to a computer system.

Advantages of Hacking
Hacking is quite useful in the following scenarios −

• To recover lost information, especially in case you lost your password.

• To perform penetration testing to strengthen computer and network security.

• To put adequate preventative measures in place to prevent security breaches.

• To have a computer system that prevents malicious hackers from gaining access.

Disadvantages of Hacking
Hacking is quite dangerous if it is done with harmful intent. It can cause −

• Massive security breach.

• Unauthorized system access to private information.

• Privacy violation.

• Hampering system operation.

• Denial of service attacks.

• Malicious attack on the system.

Process of Ethical Hacking

Ethical hacking has a set of distinct phases. It helps hackers to make a structured ethical hacking
attack.

Different security training manuals explain the process of ethical hacking in different ways, but as a
standard the entire process can be categorized into the following six phases.

Reconnaissance
Reconnaissance is the phase where the attacker gathers information about a target using
active or passive means.

Scanning
In this process, the attacker begins to actively probe a target machine or network for
vulnerabilities that can be exploited.

Gaining Access
In this process, the vulnerability is located and you attempt to exploit it in order to enter
into the system.

Maintaining Access
This is the phase in which the hacker, having already gained access to a system, installs
some backdoors in order to re-enter this owned system whenever he needs access in future.

Clearing Tracks
This process is actually an unethical activity. It has to do with the deletion of logs of all the
activities that take place during the hacking process.

Reporting
Reporting is the last step of finishing the ethical hacking process. Here the Ethical Hacker
compiles a report with his findings and the job that was done such as the tools used, the
success rate, vulnerabilities found, and the exploit processes.

Foot Printing & Scanning, Enumeration

Foot printing is a part of reconnaissance process which is used for gathering possible information
about a target computer system or network. Foot printing could be both passive and active.
Reviewing a company’s website is an example of passive foot printing, whereas attempting to gain
access to sensitive information through social engineering is an example of active information
gathering.

Foot printing is basically the first step, where the hacker gathers as much information as possible to find
ways to intrude into a target system or at least decide what type of attacks will be more suitable for
the target.

During this phase, a hacker can collect the following information −

• Domain name
• IP Addresses
• Namespaces
• Employee information
• Phone numbers
• E-mails
• Job Information

Enumeration belongs to the first phase of Ethical Hacking, i.e., “Information Gathering”. This is a
process where the attacker establishes an active connection with the victim and tries to discover as
many attack vectors as possible, which can be used to exploit the systems further.

Enumeration can be used to gain information on −

• Network shares
• SNMP data, if they are not secured properly
• IP tables
• Usernames of different systems
• Password policy lists

Trojan & Viruses

Trojans are non-replicating programs; they don’t reproduce their own code by attaching
themselves to other executable code. They operate without the permission or knowledge of the
computer users.

Trojans hide themselves in healthy processes. However, we should underline that Trojans infect
outside machines only with the assistance of a computer user, like clicking a file that comes
attached with email from an unknown person, plugging USB without scanning, opening unsafe
URLs.

Trojans have several malicious functions −

• They create backdoors to a system. Hackers can use these backdoors to access a victim
system and its files. A hacker can use Trojans to edit and delete the files present on a victim
system, or to observe the activities of the victim.

• Trojans can steal all your financial data like bank accounts, transaction details, PayPal
related information, etc. These are called Trojan-Banker.

• Trojans can use the victim computer to attack other systems using Denial of Service attacks.

• Trojans can encrypt all your files and the hacker may thereafter demand money to decrypt
them. These are Ransomware Trojans.

Virus

A computer virus is a type of malicious software program ("malware") that, when
executed, replicates itself by modifying other computer programs and inserting its own
code. Infected targets can include, as well, data files, or the "boot" sector of the hard
drive. When this replication succeeds, the affected areas are then said to be "infected" with a
computer virus.

Virus writers use social engineering deceptions and exploit detailed knowledge of security
vulnerabilities to initially infect systems and to spread the virus. The vast majority of viruses target
systems running Microsoft Windows, employing a variety of mechanisms to infect new hosts, and
often using complex anti-detection/stealth strategies to evade antivirus software.

The term "virus" is also commonly, but erroneously, used to refer to other types of malware.
"Malware" encompasses computer viruses along with many other forms of malicious software, such
as computer "worms", ransomware, trojan horses, key loggers, rootkits, spyware, adware,
malicious Browser Helper Objects (BHOs) and other malicious software. The majority of active
malware threats are actually trojan horse programs or computer worms rather than computer
viruses.

The Creeper virus was first detected on ARPANET.

Sniffing

Sniffing is the process of monitoring and capturing all the packets passing through a given network
using sniffing tools. It is like tapping phone wires to listen in on a conversation, and is also called
wiretapping applied to computer networks.

It is quite possible that, if a set of enterprise switch ports is left open, one of the
employees can sniff the whole traffic of the network. Anyone in the same physical location can plug
into the network using an Ethernet cable, or connect wirelessly to that network, and sniff the total
traffic.

In other words, Sniffing allows you to see all sorts of traffic, both protected and unprotected. In the
right conditions and with the right protocols in place, an attacking party may be able to gather
information that can be used for further attacks or to cause other issues for the network or system
owner.

What can be sniffed?

One can sniff the following sensitive information from a network −

• Email traffic
• FTP passwords
• Web traffic
• Telnet passwords
• Router configuration
• Chat sessions
• DNS traffic

A sniffer normally turns the NIC of the system to the promiscuous mode so that it listens to all the
data transmitted on its segment.

Promiscuous mode refers to a special mode of Ethernet hardware, in particular network interface
cards (NICs), that allows a NIC to receive all traffic on the network, even if it is not addressed to
this NIC. By default, a NIC ignores all traffic that is not addressed to it, which is done by comparing
the destination address of the Ethernet frame with the hardware address (a.k.a. MAC) of the
device. While this makes perfect sense for networking, non-promiscuous mode makes it difficult to
use network monitoring and analysis software for diagnosing connectivity issues or traffic
accounting.
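
The filtering decision a NIC makes can be sketched as a single comparison. This is a simplified model written for illustration: the function name `nic_accepts` and the example MAC addresses are made up, and only unicast and broadcast frames are considered.

```python
BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def nic_accepts(frame_destination: str, nic_mac: str, promiscuous: bool = False) -> bool:
    """Decide whether the NIC hands a received frame up to the operating system."""
    if promiscuous:
        return True                       # sniffing mode: keep every frame seen
    destination = frame_destination.lower()
    # Normal mode: keep only frames addressed to this NIC or to everyone.
    return destination == nic_mac.lower() or destination == BROADCAST_MAC


my_mac = "00:1a:2b:3c:4d:5e"
someone_else = "00:1a:2b:3c:4d:5f"
```

With these values, `nic_accepts(someone_else, my_mac)` is false in normal mode and true once `promiscuous=True` is passed, which is the switch a sniffer flips.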

Types of Sniffing
Sniffing can be either Active or Passive in nature.

Passive Sniffing
In passive sniffing, the traffic is captured but it is not altered in any way. Passive sniffing allows
listening only. It works with Hub devices.

Active Sniffing
In active sniffing, the traffic is not only captured and monitored, but it may also be altered in
some way as determined by the attack. Active sniffing is used to sniff a switch-based
network.

Web Server, Application, SQL Injection

A server by definition is a dedicated computing system running services for users and other
computers on a network. Examples of services range from public services such as online gaming to
sharing sensitive files inside a large organization. In the context of client-server architecture, a
server is a computer program running to serve the requests of other programs, known as the
"clients". Thus, the server performs some computational task on behalf of "clients". The clients
either run on the same computer, or connect through the network. For example, a server would host
a game to the world while clients would access the game remotely. There are various forms of
providing services to clients, such as an Apache Web Server limited to HTTP or a BEA WebLogic
Application Server that does HTTP plus more.

Web applications have been created to perform practically every useful function you could possibly
implement online. Here are some web application functions that have risen to prominence in recent
years:

• Shopping (Amazon)
• Social networking (Facebook)
• Banking (Citibank)
• Web search (Google)
• Auctions (eBay)
• Gambling (Betfair)
• Web logs (Blogger)
• Web mail (Gmail)
• Interactive information (Wikipedia)

In addition to the public Internet, web applications have been widely adopted inside organizations to
support key business functions. Many of these provide access to highly sensitive data and
functionality like HR functions in an organization.

SQL injection is a technique in which SQL commands are placed in a URL string or in data
structures in order to retrieve a response that we want from the databases that are connected with
the web applications. This type of attack generally takes place on webpages developed using PHP or
ASP.NET.

An SQL injection attack can be done with the following intentions −

• To dump the whole database of a system,

• To modify the content of the databases, or

• To perform different queries that are not allowed by the application.

This type of attack works when the applications don’t validate the inputs properly before passing
them to an SQL statement. Injections are normally placed in address bars, search fields, or data
fields.

The easiest way to detect if a web application is vulnerable to an SQL injection attack is to use the
single-quote character ( ' ) in a string and see if you get any error.

SQLMAP is one of the best tools available to detect SQL injections. It can be downloaded
from http://sqlmap.org/
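
The idea can be tried safely on a throwaway in-memory database using Python's standard `sqlite3` module. This is a minimal sketch for a test setup you own: the table, the names `lookup_unsafe`, `lookup_safe` and `breaks_query`, and the sample data are all invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "a-secret"), ("bob", "b-secret")])


def lookup_unsafe(name: str):
    """Vulnerable: the input is pasted straight into the SQL statement."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(name: str):
    """Safer: the input is passed as a parameter and treated only as data."""
    return [row[0] for row in
            conn.execute("SELECT secret FROM users WHERE name = ?", (name,))]


def breaks_query(value: str) -> bool:
    """The single-quote test: does this input make the statement fail?"""
    try:
        lookup_unsafe(value)
    except sqlite3.Error:
        return True
    return False


payload = "x' OR '1'='1"
```

With the unsafe version, the payload turns the condition into one that is always true and every secret is returned; with the parameterized version the same payload matches no user at all.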

IDS, Fire Walls & Honey Pots

An IDS (Intrusion Detection System) only detects an intrusion, logs the attack and sends an alert to
the administrator. An IDS does not slow the network down like an IPS, as it is not inline. If not
fine-tuned, an IDS, just like an IPS, will also produce false positives. An IDS can be used initially
to see how the system behaves without actually blocking anything.

An IPS (Intrusion Prevention System) is deployed inline and actually takes action by blocking
the attack, as well as logging it and adding the source IP address to the block list for a
limited amount of time, or even permanently blocking the address, depending on the defined
settings. Hackers run lots of port scans and address scans, intending to find loopholes within
organizations. An IPS would recognize these types of scans and take actions such as block,
drop, quarantine and log traffic. This, however, is only the basic functionality of an IPS; IPS
systems have many advanced capabilities in sensing and stopping such attacks.

Firewall

A firewall monitors the traffic entering and leaving a system and blocks unwanted connections, such
as those used by viruses and other unwanted programs. Put simply, it regulates the connection
between your system and the Internet.
Firewalls are of two types: hardware and software. A hardware firewall is a piece of hardware that
sits between your modem and the system. Often these are wired or wireless routers or broadband
gateways. A software firewall is a piece of software installed in the system to protect your computer
from unauthorized access or entry.

What does a firewall do?

 A firewall blocks open ports through which an intruder can gain access to your system
and the valuable data you have stored in it.
 As all information passes through firewall, you can know what is happening in the
network.
 It allows you to create rules or set privileges for the type of traffic that can pass through
the firewall in both directions.
 It blocks malicious viruses from entering your system.
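The idea of rules for traffic in both directions can be sketched as a first-match rule table. This is only an illustration: the `Rule` fields, the example ports and the default-block policy are assumptions made here, not the rule syntax of any real firewall product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    action: str     # "allow" or "block"
    direction: str  # "in" or "out"
    port: int

# Hypothetical rule set: permit HTTPS both ways and outgoing DNS.
RULES = [
    Rule("allow", "in", 443),
    Rule("allow", "out", 443),
    Rule("allow", "out", 53),
]

def decide(direction, port, rules=RULES, default="block"):
    # The first matching rule wins; anything unmatched gets the default action.
    for rule in rules:
        if rule.direction == direction and rule.port == port:
            return rule.action
    return default

https_in = decide("in", 443)   # matches the first rule
telnet_in = decide("in", 23)   # no rule matches, so the default applies
```

Blocking by default and allowing only what is listed is one common design choice; it means an open port that nobody wrote a rule for stays unreachable.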

Honey Pots

A honeypot is a system that's put on a network so it can be probed and attacked. Because the
honeypot has no production value, there is no "legitimate" use for it. This means that any interaction
with the honeypot, such as a probe or a scan, is by definition suspicious.

There are two types of honeypots:

 Research: Most attention to date has focused on research honeypots, which are used to
gather information about the actions of intruders.
 Production: Less attention has been paid to production honeypots, which are actually used to
protect organizations. Production honeypots are being recognized for the detection
capabilities they can provide and for the ways they can supplement both network- and
host-based intrusion protection.

How honeypots work

Honeypots can also be described as being either low interaction or high interaction, a distinction
based on the level of activity that the honeypot allows an attacker. A low-interaction system offers
limited activity; in most cases, it works by emulating services and operating systems. The main
advantages of low-interaction honeypots are that they are relatively easy to deploy and maintain
and they involve minimal risk because an attacker never has access to a real operating system to
harm others.

In contrast, high-interaction honeypots involve real operating systems and applications, and nothing
is emulated. By giving attackers real systems to interact with, organizations can learn a great deal
about an attacker's behaviour. High-interaction honeypots make no assumptions about how an
attacker will behave, and they provide an environment that tracks all activity.

Penetration Testing goes beyond normal testing by evaluating identified vulnerabilities to verify
whether each one is real or a false positive. A Penetration Test attempts to attack those
vulnerabilities in the same manner as a malicious hacker would, to verify which vulnerabilities are
genuine, reducing the real list of system vulnerabilities to a handful of security weaknesses. The
most effective Penetration Tests are the ones that target a very specific system with a very specific
goal. Quality over quantity is the true test of a successful Penetration Test.

Some fundamentals for developing a scope of work for a Penetration Test are as follows:
 Definition of Target System(s): This specifies what systems should be tested. This includes
the location on the network, types of systems, and business use of those systems.

 Timeframe of Work Performed: When the testing should start and what is the timeframe
provided to meet specified goals. Best practice is NOT to limit the time scope to business
hours.

 How Targets Are Evaluated: What types of testing methods, such as scanning or exploitation,
are and are not permitted? What is the risk associated with the permitted testing methods?
What is the impact if targets become inoperable due to penetration attempts? Examples
include using social networking by pretending to be an employee, a denial of service attack
on key systems, or executing scripts on vulnerable servers.

 Tools and software: What tools and software are used during the Penetration Test? This is
important and a little controversial.

 Notified Parties: Who is aware of the Penetration Test? This is very important when looking
at web applications that may be hosted by another party, such as a cloud service provider,
that could be impacted by your services.

 Definition of Target Space: This defines the specific business functions included in the
Penetration Test.

 Identification of Critical Operation Areas: Define systems that should not be touched to
avoid a negative impact from the Penetration Testing services.

Vulnerability Assessment: This is the process in which network devices, operating systems and
application software are scanned in order to identify the presence of known and unknown
vulnerabilities. A vulnerability is a gap, error, or weakness in how a system is designed, used, and
protected. When a vulnerability is exploited, it can result in unauthorized access, escalation of
privileges, denial of service to the asset, or other outcomes. Vulnerability Assessments typically
stop once a vulnerability is found, meaning that the tester doesn't execute an attack against the
vulnerability to verify whether it's genuine. A Vulnerability Assessment deliverable describes the
potential risk associated with each vulnerability found, together with possible remediation steps.

Vulnerability scans are only useful if they calculate risk. The downside of many security audits is
vulnerability scan results that make security audits thicker without providing any real value. Many
vulnerability scanners have false positives or identify vulnerabilities that are not really there.
Assigning risk to vulnerabilities gives a true definition and sense of how vulnerable a system is.

Chapter 5: Cryptography

Introduction to Cryptography

The Basics of Cryptography

When Julius Caesar sent messages to his generals, he didn't trust his messengers. So he replaced
every A in his messages with a D, every B with an E, and so on through the alphabet. Only someone
who knew the "shift by 3" rule could decipher his messages.

Encryption and decryption

Data that can be read and understood without any special measures is called plaintext or cleartext.
The method of disguising plaintext in such a way as to hide its substance is called encryption.
Encrypting plaintext results in unreadable gibberish called ciphertext. You use encryption to ensure
that information is hidden from anyone for whom it is not intended, even those who can see the
encrypted data. The process of reverting ciphertext to its original plaintext is called decryption.
Figure 1-1 illustrates this process.

Figure 1-1. Encryption and decryption

What is cryptography?

Cryptography is the science of using mathematics to encrypt and decrypt data. Cryptography
enables you to store sensitive information or transmit it across insecure networks (like the Internet)
so that it cannot be read by anyone except the intended recipient.

While cryptography is the science of securing data, cryptanalysis is the science of analyzing and
breaking secure communication. Classical cryptanalysis involves an interesting combination of
analytical reasoning, application of mathematical tools, pattern finding, patience, determination,
and luck. Cryptanalysts are also called attackers.

Cryptology embraces both cryptography and cryptanalysis.

Strong cryptography

There are two kinds of cryptography in this world: cryptography that will stop your kid sister from
reading your files, and cryptography that will stop major governments from reading your files. This
book is about the latter.

--Bruce Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C.

OpenPGP is also about the latter sort of cryptography. Cryptography can be strong or weak, as
explained above. Cryptographic strength is measured in the time and resources it would require to
recover the plaintext. The result of strong cryptography is ciphertext that is very difficult to decipher
without possession of the appropriate decoding tool. How difficult? Given all of today's computing
power and available time — even a billion computers doing a billion checks a second — it is not
possible to decipher the result of strong cryptography before the end of the universe.
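The "billion computers doing a billion checks a second" figure can be turned into a rough estimate. The sketch below assumes that breaking a cipher means trying every key (exhaustive search) and that a year has 365.25 days; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# A billion computers, each doing a billion checks per second.
checks_per_second = 10**9 * 10**9

def years_to_search(key_bits):
    # Time to try every possible key of the given length, in years.
    return 2**key_bits / checks_per_second / SECONDS_PER_YEAR

years_128 = years_to_search(128)
seconds_56 = years_to_search(56) * SECONDS_PER_YEAR
```

Under these assumptions a 128-bit key space takes on the order of 10^13 years to exhaust, several hundred times the roughly 14-billion-year age of the universe, while a 56-bit key space falls in a fraction of a second.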

One would think, then, that strong cryptography would hold up rather well against even an
extremely determined cryptanalyst. Who's really to say? No one has proven that the strongest
encryption obtainable today will hold up under tomorrow's computing power. However, the strong
cryptography employed by OpenPGP is the best available today. Vigilance and conservatism will
protect you better, however, than claims of impenetrability.

How does cryptography work?

A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and
decryption process. A cryptographic algorithm works in combination with a key (a word, number,
or phrase) to encrypt the plaintext. The same plaintext encrypts to different ciphertext with
different keys. The security of encrypted data is entirely dependent on two things: the strength of
the cryptographic algorithm and the secrecy of the key.

A cryptographic algorithm, plus all possible keys and all the protocols that make it work comprise a
cryptosystem. OpenPGP is a cryptosystem.

Conventional cryptography

In conventional cryptography, also called secret-key or symmetric-key encryption, one key is used
both for encryption and decryption. The Data Encryption Standard (DES) is an example of a
conventional cryptosystem that was widely employed by the U.S. Federal Government. Figure 1-2 is an
illustration of the conventional encryption process.

Figure 1-2. Conventional encryption

Caesar's Cipher

An extremely simple example of conventional cryptography is a substitution cipher. A substitution
cipher substitutes one piece of information for another. This is most frequently done by offsetting
letters of the alphabet. Two examples are Captain Midnight's Secret Decoder Ring, which you may
have owned when you were a kid, and Julius Caesar's cipher. In both cases, the algorithm is to offset
the alphabet and the key is the number of characters to offset it.

For example, if we encode the word "SECRET" using Caesar's key value of 3, we offset the alphabet
so that the 3rd letter down (D) begins the alphabet.

So starting with

ABCDEFGHIJKLMNOPQRSTUVWXYZ

and sliding everything up by 3, you get

DEFGHIJKLMNOPQRSTUVWXYZABC

where D=A, E=B, F=C, and so on.

Using this scheme, the plaintext, "SECRET" encrypts as "VHFUHW." To allow someone else to read
the ciphertext, you tell them that the key is 3.

Obviously, this is exceedingly weak cryptography by today's standards, but hey, it worked for Caesar,
and it illustrates how conventional cryptography works.
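The shift described above fits in a few lines of code. This is a minimal sketch that handles only the letters A to Z; the function name `caesar` is chosen here for illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def caesar(text, key):
    # Slide the alphabet by `key` positions and substitute letter for letter.
    shift = key % 26
    shifted = ALPHABET[shift:] + ALPHABET[:shift]
    return text.upper().translate(str.maketrans(ALPHABET, shifted))

ciphertext = caesar("SECRET", 3)     # "VHFUHW"
plaintext = caesar(ciphertext, -3)   # shifting back by the same key decrypts
```

The same key value (3) is used in both directions, which is what makes this a conventional (symmetric) cipher. With only 25 useful keys, trying them all is trivial.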

Symmetric key encryption

Symmetric key encryption is a type of encryption that makes use of a single key for both the
encryption and decryption process. Some of the encryption algorithms that use symmetric keys
include: AES (Advanced Encryption Standard), Blowfish, DES (Data Encryption Standard), Triple DES,
Serpent, and Twofish.

If you want to apply symmetric key encryption to a file transfer environment, both the sender and
receiver should have a copy of the same key. The sender will use his copy of the key for encrypting
the file, while the receiver will use his copy for decrypting it.

So if you manage a secure file transfer server that only supports symmetric encryption and one of
your users wants to encrypt a file first before uploading it, one of you (either the user or you, the
server admin) should first generate a key and then send the other person a copy of that key.

Asymmetric key encryption

Asymmetric key encryption, on the other hand, makes use of two keys: a private key and a public
key. The public key is used for encrypting, while the private key is used for decrypting. Two of the
most widely used asymmetric key algorithms are RSA and DSA (the latter is used for digital
signatures rather than for encrypting data).

If you're going to use asymmetric key encryption in a file transfer environment, the sender would
need to hold the public key, while the receiver would need to hold the corresponding private key.

So, going back to the scenario given in the previous section, if you manage a file transfer server and
one of your users wants to encrypt a file first before uploading it, it would typically be your duty to
generate the key pair. You should then send the public key to your user and leave the private key on
the server.

Which is stronger?

Actually, it's difficult to compare the cryptographic strengths of symmetric and asymmetric key
encryption. Even though asymmetric key lengths are generally much longer (e.g. 1024 and 2048 bits)
than symmetric key lengths (e.g. 128 and 256 bits), it doesn't necessarily follow that, for example,
a file encrypted with a 2048-bit RSA key (an asymmetric key) is tougher to crack than a file
encrypted with a 256-bit AES key (a symmetric key).

Instead, it would be more appropriate to compare asymmetric and symmetric encryptions on the
basis of two properties:

 Their computational requirements, and


 Their ease of distribution

Symmetric key encryption doesn't require as many CPU cycles as asymmetric key encryption, so you
can say it's generally faster. Thus, when it comes to speed, symmetric trumps asymmetric. However,
symmetric keys have a major disadvantage especially if you're going to use them for securing file
transfers.

Because the same key has to be used for encryption and decryption, you will need to find a way to
get the key to your recipient if he doesn't have it yet. Otherwise, your recipient won't be able to
decrypt the files you send him. Whichever way you do it, it has to be done in a secure manner, or
else anyone who gets hold of that key can simply intercept your encrypted file and decrypt it with
the key.

The issue of key distribution becomes even more pronounced in a file transfer environment, which
can involve a large number of users who are likely to be distributed over a vast geographical area.
Some users, most of whom you may never have met, might even be located halfway around the world.
Distributing a symmetric key in a secure manner to each of these users would be nearly impossible.

Asymmetric key encryption doesn't have this problem. For as long as you keep your private key
secret, no one would be able to decrypt your encrypted file. So you can easily distribute the
corresponding public key without worrying about who gets a hold of it (well, actually, there are
spoofing attacks on public keys but that's for another story). Anyone who holds a copy of that public
key can encrypt a file prior to uploading to your server. Then once the file gets uploaded, you can
decrypt it with your private key.

Data Encryption Standard (DES):

DES is a symmetric block cipher (shared secret key), with a key length of 56-bits. Published as the
Federal Information Processing Standards (FIPS) 46 standard in 1977, DES was officially withdrawn in
2005 [although NIST has approved Triple DES (3DES) through 2030 for sensitive government
information].

The federal government originally developed DES encryption over 35 years ago to provide
cryptographic security for all government communications. The idea was to ensure government
systems all used the same, secure standard to facilitate interconnectivity.

To show that the DES was inadequate and should not be used in important systems anymore, a
series of challenges were sponsored to see how long it would take to decrypt a message. Two
organizations played key roles in breaking DES: distributed.net and the Electronic Frontier
Foundation (EFF).

 The DES I contest (1997) took 84 days to use a brute force attack to break the encrypted
message.
 In 1998, there were two DES II challenges issued. The first challenge took just over a month
and the decrypted text was "The unknown message is: Many hands make light work". The
second challenge took less than three days, with the plaintext message "It's time for those
128-, 192-, and 256-bit keys".
 The final DES III challenge in early 1999 took only 22 hours and 15 minutes. The Electronic
Frontier Foundation's Deep Crack computer (built for less than $250,000) and
distributed.net's computing network found the 56-bit DES key and deciphered the message,
and together EFF and distributed.net won the contest. The decrypted message read "See you
in Rome (Second AES Candidate Conference, March 22-23, 1999)", and was found after
checking about 30 percent of the key space, finally proving that DES belonged to the past.
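The DES III figures imply a search rate that can be worked out directly. The sketch below uses only the numbers quoted above (a 56-bit key, about 30 percent of the key space, 22 hours and 15 minutes) and assumes the search ran at a constant rate, which is a simplification.

```python
keyspace = 2**56                      # number of possible 56-bit DES keys
fraction_searched = 0.30              # share of the key space checked
elapsed_seconds = 22 * 3600 + 15 * 60

# Average number of keys tested per second during the challenge.
keys_per_second = keyspace * fraction_searched / elapsed_seconds

# Time to sweep the entire key space at that same rate, in hours.
full_search_hours = keyspace / keys_per_second / 3600
```

Under that assumption the combined effort tested roughly 2.7 x 10^11 keys per second, and even the worst case (trying every key) would have taken only about 74 hours.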

Triple DES (3DES), a way of applying DES encryption three times, extends the effective key length
but slows the process down substantially and keeps the 64-bit block size of DES, so it is treated as
a legacy stopgap rather than a long-term replacement.

Advanced Encryption Standard (AES):

AES was published as the FIPS 197 standard in 2001. AES is a more mathematically efficient and
elegant cryptographic algorithm, but its main strength rests in the option for various key lengths.
AES allows you to choose a 128-bit, 192-bit or 256-bit key, making it exponentially stronger than the
56-bit key of DES. In terms of structure, DES uses a Feistel network, which divides the block into
two halves before going through the encryption steps. AES, on the other hand, uses a
substitution-permutation network, which involves a series of substitution and permutation steps to
create the encrypted block. The original DES designers made a great contribution to data security,
but one could say that the aggregate effort of cryptographers for the AES algorithm has been far
greater.

One of the original requirements by the National Institute of Standards and Technology (NIST) for
the replacement algorithm was that it had to be efficient both in software and hardware
implementations (DES was originally practical only in hardware implementations). Java and C
reference implementations were used to do performance analysis of the algorithms. AES was chosen
through an open competition with 15 candidates from as many research teams around the world,
and the total amount of resources allocated to that process was tremendous. Finally, in October
2000, a NIST press release announced the selection of Rijndael as the proposed Advanced Encryption
Standard (AES).

Comparing DES and AES

               DES                       AES
Developed      1977                      2000
Key Length     56 bits                   128, 192, or 256 bits
Cipher Type    Symmetric block cipher    Symmetric block cipher
Block Size     64 bits                   128 bits
Security       Proven inadequate         Considered secure

Hash functions

Creating a digital signature by encrypting an entire message with a private key has some problems.
It is slow, and it produces an enormous volume of data, at least double the size of the original
information. An improvement on that scheme is the addition of a one-way hash function in the
process. A one-way hash function takes variable-length input (in this case, a message of any length,
even thousands or millions of bits) and produces a fixed-length output, say 160 bits. The hash
function ensures that, if the information is changed in any way, even by just one bit, an entirely
different output value is produced.
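Both properties, fixed-length output and sensitivity to a one-bit change, can be observed with Python's standard `hashlib` module. SHA-1 is used here only because it produces the 160-bit output mentioned above; the messages are made up for the illustration.

```python
import hashlib

# Inputs of very different lengths give digests of the same length.
short_digest = hashlib.sha1(b"hi").hexdigest()
long_digest = hashlib.sha1(b"x" * 1_000_000).hexdigest()

# "1" (0x31) and "9" (0x39) differ in exactly one bit.
original = hashlib.sha1(b"Pay Bob 100 rupees").hexdigest()
tampered = hashlib.sha1(b"Pay Bob 900 rupees").hexdigest()

digest_bits = len(short_digest) * 4   # each hex character encodes 4 bits
```

Comparing `original` and `tampered` character by character shows that the two digests are unrelated, even though the inputs differ by a single bit.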

OpenPGP uses a cryptographically strong hash function on the plaintext the user is signing. This
generates a fixed-length data item known as a message digest. (Again, any change to the
information results in a totally different digest.)

Then OpenPGP uses the digest and the private key to create the "signature." OpenPGP transmits the
signature and the plaintext together. Upon receipt of the message, the recipient uses OpenPGP to
recompute the digest, thus verifying the signature. OpenPGP can encrypt the plaintext or not;
signing plaintext is useful if some of the recipients are not interested in or capable of verifying the
signature.

As long as a secure hash function is used, there is no way to take someone's signature from one
document and attach it to another, or to alter a signed message in any way. The slightest change in a
signed document will cause the digital signature verification process to fail.

Figure 1-7. Secure digital signatures

Digital signatures play a major role in authenticating and validating other OpenPGP users' keys.

RSA

RSA is short for the surnames of its designers: Ron Rivest, Adi Shamir and Leonard Adleman.

 Not used to encrypt data directly, because of speed constraints and because the amount of
data it can encrypt in one operation is small.
o Usually RSA is used to share a secret key and then a symmetric key algorithm is used
for the actual encryption.
o RSA can be used for digital signing, but signing is slower than with DSA, which is
often preferred for that reason. However, RSA signatures are faster to verify. To sign
data, a hash is made of it and the hash is encrypted with the private key. (Note: RSA
requires that a hash be made rather than encrypting the data itself.)
o RSA does not require the use of any particular hash function.
 Public and private keys are based on two large prime numbers which must be kept secret.
RSA's security is based on the fact that factorization of large integers is difficult. (The public
and private keys are large integers which are derived from the two large prime numbers.)
 PKCS#1 is a standard for implementing the RSA algorithm. The RSA algorithm can be
attacked if certain criteria are met, so PKCS#1 defines things such that these criteria are
not met.
 The RSA algorithm was originally patented, but the patent has since expired (circa 2000).
 SSH v1 only uses RSA keys (for identity verification).
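The way the keys are derived from two primes, and the way a hash is signed with the private key, can be sketched with textbook RSA. The primes below are deliberately tiny and there is no padding, so this is a toy for illustration only and not a PKCS#1 implementation; the helper names are assumptions made here.

```python
import hashlib

# Two secret primes (far too small for real use).
p, q = 61, 53
n = p * q                    # public modulus
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (Python 3.8+ modular inverse)

def digest_as_int(message):
    # Hash the message, then reduce it so it fits under the toy modulus.
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message):
    # "Encrypt" the hash with the private key.
    return pow(digest_as_int(message), d, n)

def verify(message, signature):
    # Recover the hash with the public key and compare.
    return pow(signature, e, n) == digest_as_int(message)

msg = b"transfer approved"
sig = sign(msg)
ok = verify(msg, sig)
forged_ok = verify(msg, (sig + 1) % n)   # an altered signature fails
```

Anyone who can factor `n` back into `p` and `q` can recompute `d`, which is why real keys use primes hundreds of digits long.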

RC4

Rivest Cipher 4, or Ron’s Code 4 – also known as ARC4 or ARCFOUR (Alleged RC4).

 Used to be an unpatented trade secret of RSA Data Security Inc (RSADSI). Then someone
posted the source code online, anonymously, and it got into the public domain.
 Very fast, but less studied than other algorithms.
 RC4 is good if the key is never reused. Then it's considered secure by many.
 In practice RC4 is not recommended. RFC 7465 prohibits RC4 cipher suites in TLS, and both
CloudFlare and Microsoft recommend against it. Current recommendations overall are to use
TLS 1.2 or later with AES-GCM.
 RC4 is a stream cipher. It was for a long time the most widely used stream cipher.
 For a while, attacks on block ciphers in CBC mode (e.g. BEAST, Lucky13) caused RC4 to rise
in importance. Such attacks are now mitigated (by using GCM mode, for instance) and
RC4 is strongly recommended against.
 In 1994, Ronald Rivest designed RC5 for RSA Security. RC5 has a variable number of rounds,
ranging from 0 to 255, with a block size of 32, 64 or 128 bits. Keys can range from 0 to 2040
bits. Users can choose the number of rounds, the block size and the key length. When the output and input
blocks and the keys are all the same size, an RC5 block can match the same number of block
sizes from permutations to integers. RC5 is known for its technical flexibility and the security
it provides.

RC6

Rivest Cipher 6 or Ron’s Code 6 – designed by Ron Rivest and others

 Was one of the finalists in the AES competition


 Proprietary algorithm. Patented by RSA Security.
 RC5 is a predecessor of RC6. Other siblings include RC2 and RC4.
 RC5 and RC6 are block ciphers.

MD5

Message-Digest 5 – designed by Ron Rivest to replace MD4

 As with MD4 it creates a digest of 128-bits.


 MD5 too is no longer recommended as vulnerabilities have been found in it and actively
exploited.

SHA 0

(a.k.a. SHA) – Secure Hash Algorithm 0 – designed by the NSA

 Creates a 160-bit hash.


 Not widely used.

SHA-1

Secure Hash Algorithm 1 – designed by the NSA

 Creates a 160-bit hash.


 Is very similar to SHA-0 but corrects many alleged weaknesses. Is related to MD4 too.
 Is very widely used but is not recommended, as there are theoretical attacks on it that could
become practical as technology improves.
 SHA-2 is the new recommendation. Microsoft and Google, for instance, announced that they
would stop accepting certificates with SHA-1 hashes from January 2017.

SHA-2

Secure Hash Algorithm 2 – designed by the NSA.

 Significantly different from SHA-1.


 Patented. But royalty free.
 SHA-2 defines a family of hash functions.
 Creates hashes of 224, 256, 384 or 512 bits. These variants are called SHA-224, SHA-256,
SHA-384, SHA-512, SHA-512/224, and SHA-512/256.
 SHA-256 and SHA-512 are new hash functions. They are similar to each other. These are the
popular functions of this family.
 SHA-512 is supported by TrueCrypt.
 SHA-256 is used by DKIM signing.
 SHA-256 and SHA-512 are recommended for DNSSEC.
 SHA-224 and SHA-384 are truncated versions of the above two.
 SHA-512/224 and SHA-512/256 are also truncated versions of the above two with some
other differences.
 There are theoretical attacks against SHA-2 but no practical ones.
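The output lengths of the SHA-2 family can be checked with Python's standard `hashlib` module. The sketch below covers only the four variants that `hashlib` always provides; the input message is arbitrary.

```python
import hashlib

message = b"cyber law"

# Digest length in bits for each always-available SHA-2 variant.
sizes = {
    name: hashlib.new(name, message).digest_size * 8
    for name in ("sha224", "sha256", "sha384", "sha512")
}
```

Each entry in `sizes` matches the number in the variant's name, which is how the family members are told apart.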

SHA-3

Secure Hash Algorithm 3 – winner of the NIST hash function competition

 Not meant to replace SHA-2 currently.


 The actual algorithm name is Keccak.

PK Infrastructure, Digital Signature

A public key infrastructure (PKI) is a set of roles, policies, and procedures needed to create, manage,
distribute, use, store, and revoke digital certificates and manage public-key encryption. The purpose
of a PKI is to facilitate the secure electronic transfer of information for a range of network activities
such as e-commerce, internet banking and confidential email. It is required for activities where
simple passwords are an inadequate authentication method and more rigorous proof is required to
confirm the identity of the parties involved in the communication and to validate the information
being transferred.

In cryptography, a PKI is an arrangement that binds public keys with respective identities of entities
(like people and organizations). The binding is established through a process of registration and
issuance of certificates at and by a certificate authority (CA). Depending on the assurance level of the
binding, this may be carried out by an automated process or under human supervision.

The PKI role that assures valid and correct registration is called a registration authority (RA). An RA is
responsible for accepting requests for digital certificates and authenticating the entity making the
request. In a Microsoft PKI, a registration authority is usually called a subordinate CA.

An entity must be uniquely identifiable within each CA domain on the basis of information about
that entity. A third-party validation authority (VA) can provide this entity information on behalf of
the CA.

Digital Signature

Digital signatures allow us to verify the author and the date and time of a signature, and to
authenticate the message contents. A signature scheme can also include an authentication function
for additional capabilities.

A digital signature should not only be tied to the signing user, but also to the message.

Applications

There are several reasons to apply digital signatures to communications:

Authentication

Digital signatures help to authenticate the sources of messages. For example, suppose a bank's
branch office sends a message to the central office requesting a change in the balance of an
account. If the central office cannot authenticate that the message was sent from an authorized
source, acting on such a request could be a grave mistake.

Integrity

Once the message is signed, any change in the message would invalidate the signature.

Non-repudiation

By this property, any entity that has signed some information cannot at a later time deny having
signed it.

Introduction to SSL/TLS

SSL (Secure Socket Layer) is an old protocol deprecated in favor of TLS (Transport Layer Security).

TLS is a protocol for secure transmission of data heavily based on SSLv3. It offers confidentiality,
integrity and authentication. In layman’s terms that means that it:

 Confidentiality: hides the content of the messages

 Integrity: detects when they’ve been tampered with

 Authentication: ensures that whoever is sending them is who he says he is.

Additionally, it detects missing and duplicated messages.

TLS is the primary way to secure web traffic, and is mostly used for that purpose. A whole lot of
pages trust that TLS is secure (from the smallest online shop to Facebook), that is why things like
POODLE and Heartbleed receive so much press.

At this point I want to make something abundantly clear, SSL IS BROKEN, ALL THREE VERSIONS. The
latest one, version 3, had its confidentiality severely compromised by POODLE. DO NOT USE SSL.

Confidentiality and integrity in TLS

First I’ll briefly explain how TLS offers confidentiality and integrity. We will leave authentication at
the end as that will be the bulk of the tutorial.

I will be using a lot of crypto jargon here so do not fret if any of this seems too alien to you, generally
you won’t be fiddling with this directly as TLS negotiates much of this for you.

However, I do recommend getting familiar with the basics of public key encryption. Not the inner
workings mind you, just how they are used in practice to encrypt/authenticate.

If you are really interested in these subjects I’d recommend researching:

 AES

 Block cipher modes of operation

 Cryptographic hashes

 SHA-2

 HMAC

 Key exchange

 Diffie-Hellman

 Public key cryptography

The Stanford cryptography course on Coursera is an awesome starting point.

For confidentiality, TLS uses either Diffie-Hellman (elliptic curve mode supported) or RSA for the
key exchange. After the whole Heartbleed debacle, it is now recommended to use a key exchange
mechanism with forward secrecy, which implies Diffie-Hellman in this case. The actual encryption is
normally done using AES, with several modes available (CBC, CCM, GCM).
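The Diffie-Hellman exchange mentioned above can be sketched with plain modular arithmetic. The prime and generator below are toy parameters picked for this illustration, not values that TLS uses, and the variable names are assumptions.

```python
import secrets

# Public parameters, known to everyone (toy values).
p = 2**127 - 1   # a Mersenne prime
g = 3

# Each side picks a fresh secret exponent for this session only.
a = secrets.randbelow(p - 2) + 1   # client's secret
b = secrets.randbelow(p - 2) + 1   # server's secret

# Only these public values travel over the network.
A = pow(g, a, p)
B = pow(g, b, p)

# Each side combines its own secret with the other side's public value.
shared_client = pow(B, a, p)
shared_server = pow(A, b, p)
```

Both sides arrive at the same number without ever sending it. Because `a` and `b` are thrown away after the session, a later leak of the server's long-term private key does not reveal past session keys, which is the forward secrecy property.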

For integrity, HMAC is normally used; today it is pretty much mandatory to use it with SHA-256.
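An HMAC check can be sketched with Python's standard `hmac` module. The key and messages below are placeholders made up for the illustration; in TLS the key comes out of the handshake rather than being typed in.

```python
import hashlib
import hmac

key = b"shared-session-key"          # placeholder secret known to both ends
message = b"GET /account HTTP/1.1"

# The sender attaches this tag to the message.
tag = hmac.new(key, message, hashlib.sha256).digest()

def is_authentic(key, message, tag):
    # The receiver recomputes the tag and compares in constant time.
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

untouched = is_authentic(key, message, tag)
tampered = is_authentic(key, b"GET /admin HTTP/1.1", tag)
```

Without the key an attacker cannot produce a matching tag for an altered message, which is how tampering is detected.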

Basics of authentication in TLS

Authentication is the part that you will most likely have to fiddle with the most and the one that
actually costs you money, it is also essential for the security of your communications: You. Cannot.
Have. Secure. Communications. Without. Authentication.

How is that, you might ask? Well, the thing about confidentiality and integrity is that they are
worthless without authentication. If there is no way to ensure that the guy that says "I am gmail.com
I swear" is in fact gmail.com and not evil8yoldhacker.com, then you could potentially encrypt and
validate a spurious connection.

So without authentication you would be open to, for example, MitM (Man in the Middle) attacks.
But how can you authenticate a server if you have never seen it before, let alone exchanged any
credentials with it?

The answer is TLS certificates. Certificates are just a public key with a bunch of information attached
to it, such as the FQDN being authenticated, a contact email, the "issued on" and "expires on" dates,
among other things. The server stores and keeps secret the corresponding private key. TLS uses
these keys to authenticate the server to the client.

Recall that in public key cryptography, messages encrypted with the public key can only be
decrypted using the private key, while messages encrypted with the private key can be decrypted by
anyone who has the public key. The owner of the keys keeps the private key secret, and distributes
the public key freely.

Now, the usual way of authenticating someone using public key cryptography is the following
(assume that Bob wants to authenticate Alice):

1. Bob sends Alice a random message encrypted with Alice’s public key

2. Alice decrypts the message using her private key and sends it back to Bob

3. Bob compares the message from Alice against the one that it sends. If they match Alice is
who she says she is, because only her could have decrypted the message, because only she
has her private key

TLS uses a variation of this technique, but is essentially the same.
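The three steps above can be sketched with textbook RSA on tiny numbers. This is a toy under loud assumptions: the primes 61 and 53 and the exponent 17 are classroom values, there is no padding, and nothing here is secure or taken from a real TLS implementation.

```python
import secrets

# Toy RSA key pair for Alice (tiny textbook primes, NOT secure).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent, known only to Alice

def encrypt_with_public(m: int) -> int:
    return pow(m, E, N)

def decrypt_with_private(c: int) -> int:
    return pow(c, D, N)

def authenticate(responder, challenge=None) -> bool:
    """Bob's side: send an encrypted random number, check what comes back."""
    if challenge is None:
        challenge = secrets.randbelow(N - 2) + 2
    reply = responder(encrypt_with_public(challenge))
    return reply == challenge

alice = decrypt_with_private     # holds the private key
impostor = lambda c: c           # has no private key, can only echo the ciphertext

alice_ok = authenticate(alice)
impostor_ok = authenticate(impostor, challenge=65)   # fixed challenge for a repeatable demo
```

Alice always passes because she can undo the encryption; the impostor's echoed ciphertext does not match the challenge.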

Now, even with this trick, validation of a certificate is not straightforward, so for didactic purposes
we’ll begin with an oversimplified, naive and flawed solution.

Solution #1:

1. A hash of the certificate is made, encrypted with the private key, and then appended to the
certificate to create a new certificate

2. The server sends this new certificate to the clients that connect to it

3. To verify the certificate, the client decrypts the hash using the public key of the certificate,
then calculates its own hash and compares them; if they are equal, the certificate is valid

4. It then sends a random message to the server encrypted with the provided public key; if the
server sends the original unencrypted message back, then it is considered authenticated

This process ensures that:

• The provided public key corresponds to the private key used to encrypt the hash

• The server has access to the private key

The encrypted hash appended to the certificate is called a digital signature. In this example,
the server has digitally signed its own certificate; this is called a self-signed certificate.

Now, this scheme does not authenticate at all if you think about it. If an attacker manages to
intercept the communications or divert the traffic, they can replace the public key on the certificate
with their own, redo the digital signature with their own private key, and feed that to the client.

The problem lies in the fact that all the information necessary for verification is provided by the
server, so the only thing you can be sure of is that the party that you’re talking to has the private key
corresponding to the public key that it itself provided.
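To make the weakness concrete, here is a minimal sketch in which a certificate is just a dictionary and the signature is textbook RSA over a SHA-256 hash. All names (toy_keypair, self_sign, verify_self_signed) and the tiny primes are assumptions made for the illustration; real certificates are X.509 structures.

```python
import hashlib

def toy_keypair(p: int, q: int, e: int = 17):
    """Return (public, private) toy RSA keys; tiny primes, NOT secure."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))
    return (e, n), (d, n)

def digest(name: str, public) -> int:
    """Hash the certificate fields, reduced to fit the toy modulus."""
    data = f"{name}|{public[0]}|{public[1]}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % public[1]

def self_sign(name: str, public, private) -> dict:
    d, n = private
    return {"name": name, "public": public,
            "signature": pow(digest(name, public), d, n)}

def verify_self_signed(cert: dict) -> bool:
    """Check the signature using the key found inside the certificate itself."""
    e, n = cert["public"]
    return pow(cert["signature"], e, n) == digest(cert["name"], cert["public"])

server_pub, server_priv = toy_keypair(61, 53)
attacker_pub, attacker_priv = toy_keypair(89, 97)

genuine = self_sign("gmail.com", server_pub, server_priv)
forged = self_sign("gmail.com", attacker_pub, attacker_priv)

genuine_ok = verify_self_signed(genuine)
forged_ok = verify_self_signed(forged)
```

Both certificates verify, even though they carry different keys: the check only proves that the sender holds the private key matching the public key it supplied.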

This is why, when you connect to an entity with a self-signed certificate, browsers will give a warning:
they cannot ensure that whoever you are communicating with is who they say they are.

However, these kinds of certificates are really useful in some scenarios. They can be created free of
charge, quickly, and with little hassle. Thus they are good for some internal communications and
prototyping.

Solution #2:

So that did not work out. How can we solve it? Well, since the problem is that the server provides all
the information for authenticating the certificate, why don’t we just move some of that information
to the client?

Let us drive over to the client after lunch with a copy of the certificate on a USB drive, and store it
directly on the client. Later, when the server sends its certificate:

1. The client generates its own hash of the certificate and decrypts the provided hash with the
public key of the copy of the certificate that was provided earlier on the USB drive; this way it
ensures that: 1. the certificate has not changed, 2. whoever signed it has the correct private
key.

2. It then sends a random message to the server encrypted with the public key in the copy of
the certificate that was stored earlier, if the server sends back the original unencrypted
message we consider it authenticated

If someone intercepts or diverts the connection, they have no hope of passing our random number
challenge and providing a valid certificate: without the private key, that is computationally infeasible.

The “driving over to the client after lunch with a USB drive” is called an out of band process.
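One common way to use a copy delivered out of band is to store its fingerprint and compare it with whatever the server later presents. The sketch below assumes the certificate is available as raw bytes; the byte strings are placeholders, not real certificates, and the function names are invented for the example.

```python
import hashlib
import hmac

def fingerprint(cert_bytes: bytes) -> str:
    """SHA-256 fingerprint of the certificate bytes, as hex."""
    return hashlib.sha256(cert_bytes).hexdigest()

# Copy stored on the client earlier (placeholder bytes, not a real certificate).
stored_copy = b"-----hypothetical certificate for shop.example-----"
pinned = fingerprint(stored_copy)

def matches_pin(presented: bytes) -> bool:
    """True only if the presented certificate is byte-for-byte the stored one."""
    return hmac.compare_digest(fingerprint(presented), pinned)

same_cert_ok = matches_pin(stored_copy)
swapped_cert_ok = matches_pin(b"-----certificate swapped in by an attacker-----")
```

The comparison covers only the "certificate has not changed" half of the check; the random-message challenge is still needed to show the server holds the private key.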

Now this solution actually authenticates, and is sometimes used for internal communications.
However, it is not very practical for several use cases. It would be cumbersome to have to download
a certificate every time you need to access a new secure website like a bank or e-shop, or worse,
to wait for someone to drive over to your house after lunch with a USB drive.

Moreover, you have to ensure that the certificate is not tampered with on the way, which is one of
the problems TLS should be solving for us in the first place. There is also the problem of what to do
when the certificate expires, or how to revoke it if the private key becomes compromised.

Solution #3:

Well, we got something working, but it is not quite ready for use on the landing page of your online
store yet. How can we solve it? Well, here is where money comes in.

Meet Bob. He is a well-known member of the community, a truly responsible fellow, loyal to a
fault, and he comes up with a business. He will create a self-signed certificate and give it freely
to everybody using an out of band process (let's say, during a neighborhood-wide free-of-charge
barbecue). He will then charge you a fee to digitally sign your certificate using his private key.

See, up until this time every certificate was self-signed. This time it is Bob who will sign it, so now we’re
in a situation where:

• Everybody has the certificate that Bob so generously gave them

• Everybody just trusts Bob (who wouldn't, he saves orphan puppies from fires in his spare
time)

Therefore, anyone that has his certificate signed with Bob’s private key can have it authenticated by
anyone that has Bob’s certificate (because Bob’s certificate has his public key).

Bob will make sure nobody impersonates anyone. If he receives an email from you saying that you
want a certain certificate signed for your company, Bob will go to your house personally and ensure
that it is your certificate, that it is your company, and that you did send out that email... you can
never be too careful with all these 8-year-old evil hackers.

When Bob’s certificate is about to expire, he will make sure he gives you a new one.

If anyone wants to revoke his or her certificate, they can just tell Bob to put it in his list of revoked
certificates. When somebody tries to authenticate you with a certificate, they will call Bob to check if
that certificate has been revoked.

Well, this is how the real world kind of works, with a few major changes:

• Bob is a CA (Certification Authority) such as DigiCert, Comodo, Symantec, GoDaddy,
GlobalSign, etc. (and no, they don’t save puppies, and sometimes f!@# up really bad)

• The out of band process is not a barbecue. Their certificates already come bundled with
your operating system and browser (check /etc/ssl/certs if you’re on Linux)

• The revocation mechanisms are called CRL and OCSP (OCSP was supposed to supersede CRL,
but the whole thing is kind of a mess right now)

• You kind of HAVE to trust them

They do charge you though. Depending on the kind of certificate the prices vary.

The process where a third party that both the server and the client trust signs the certificate of the
server creates a chain of trust. With TLS, we create chains of trust using CAs as our third parties.
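A chain of trust can be sketched with the same kind of toy RSA arithmetic: the client holds only the CA's public key and uses it to check the signature on the server's certificate. Everything below is an assumption made for illustration (the dictionary layout, the function names, and keys built from well-known Mersenne primes, which are far too small and too public for real use).

```python
import hashlib

E = 65537  # public exponent shared by all toy keys

def toy_keypair(p: int, q: int):
    """Return (public, private) toy RSA keys; NOT secure."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (E, n), (d, n)

def digest(name: str, subject_public, modulus: int) -> int:
    data = f"{name}|{subject_public}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % modulus

def sign_cert(name: str, subject_public, signer_private) -> dict:
    d, n = signer_private
    return {"name": name, "public": subject_public,
            "signature": pow(digest(name, subject_public, n), d, n)}

def client_verify(cert: dict, trusted_ca_public) -> bool:
    """Verify with the CA key the client already has, not with a key from the server."""
    e, n = trusted_ca_public
    return pow(cert["signature"], e, n) == digest(cert["name"], cert["public"], n)

ca_pub, ca_priv = toy_keypair(2**61 - 1, 2**89 - 1)              # "Bob"
attacker_pub, attacker_priv = toy_keypair(2**107 - 1, 2**127 - 1)
server_pub = (E, (2**13 - 1) * (2**17 - 1))                      # placeholder server key

genuine = sign_cert("www.example.com", server_pub, ca_priv)
# The attacker does not have Bob's private key, so can only sign with their own.
forged = sign_cert("www.example.com", attacker_pub, attacker_priv)

genuine_ok = client_verify(genuine, ca_pub)
forged_ok = client_verify(forged, ca_pub)
```

Unlike the self-signed case, the forged certificate is now rejected, because the verification key comes from the client's own store rather than from the party being verified.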

Note that your data will not have stronger or weaker encryption depending on how much you pay:
most TLS connections use some form of AES, and how strong depends on what the client and server are able
and willing to handle/use. For example, many servers refuse to use SSLv3 since the whole POODLE
scandal, some older clients do not support the newest encryption algorithms, some servers have not
been updated, some clients use Internet Explorer.

Now, you might wonder, if every certificate gives me equally strong encryption, integrity and
authentication, why are some more costly than others?

Well, CAs generally charge you more for certificates that will be used on several machines, so for
example a certificate for *.talpor.com will cost more than one for www.talpor.com (notice the
shameless self-promotion). However, what CAs really sell is not certificates; it is trust.

Chapter 6: Forensics

Introduction to Forensic Analysis

Computer forensics is the practice of collecting, analysing and reporting on digital data in a way that
is legally admissible. It can be used in the detection and prevention of crime and in any dispute
where evidence is stored digitally. Computer forensics follows a similar process to other forensic
disciplines, and faces similar issues.

Uses of computer forensics

There are few areas of crime or dispute where computer forensics cannot be applied. Law
enforcement agencies have been among the earliest and heaviest users of computer forensics and
consequently have often been at the forefront of developments in the field.

Computers may constitute a ‘scene of a crime’, for example with hacking or denial of service attacks
or they may hold evidence in the form of emails, internet history, documents or other files relevant
to crimes such as murder, kidnap, fraud and drug trafficking.

It is not just the content of emails, documents and other files which may be of interest to
investigators but also the ‘metadata’ associated with those files. A computer forensic examination
may reveal when a document first appeared on a computer, when it was last edited, when it was last
saved or printed and which user carried out these actions.
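As a small illustration of file metadata, the sketch below reads the timestamps the file system keeps for a file, using Python's os.stat. It creates its own temporary file, so the path and content are invented for the example. Details such as the author or the last-printed time are stored inside the document format itself and are not covered here.

```python
import os
import tempfile
from datetime import datetime, timezone

def file_times(path: str) -> dict:
    """Return size and file-system timestamps (UTC) for a file."""
    st = os.stat(path)
    to_utc = lambda ts: datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "size_bytes": st.st_size,
        "last_modified": to_utc(st.st_mtime),
        "last_accessed": to_utc(st.st_atime),
    }

# Create a throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"draft contract")
sample_path = tmp.name

info = file_times(sample_path)
os.remove(sample_path)
```

On a real case this would be run against a forensic copy, never the original, since merely opening files can update access times.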

More recently, commercial organizations have used computer forensics to their benefit in a variety
of cases such as:

* Intellectual Property theft
* Industrial espionage
* Employment disputes
* Fraud investigations
* Forgeries
* Bankruptcy investigations
* Inappropriate email and internet use in the work place
* Regulatory compliance

The four main principles from the ACPO (Association of Chief Police Officers) Good Practice Guide for
Computer-Based Electronic Evidence (with references to law enforcement removed) are as follows:

• No action should change data held on a computer or storage media which may be
subsequently relied upon in court.
• In circumstances where a person finds it necessary to access original data held on a
computer or storage media, that person must be competent to do so and be able to give
evidence explaining the relevance and the implications of their actions.
• An audit trail or other record of all processes applied to computer-based electronic evidence
should be created and preserved. An independent third-party should be able to examine
those processes and achieve the same result.
• The person in charge of the investigation has overall responsibility for ensuring that the law
and these principles are adhered to.
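The audit trail in the third principle could be kept in many ways; one possible sketch is a log in which every entry includes the hash of the previous one, so later alterations become detectable. The field names, examiner identifier and timestamps below are hypothetical, and this is not a format prescribed by any guideline.

```python
import hashlib
import json

FIELDS = ("when", "examiner", "action", "prev")

def _hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def add_entry(log: list, examiner: str, action: str, when: str) -> None:
    """Append an entry that is chained to the hash of the previous entry."""
    prev = log[-1]["entry_hash"] if log else "0" * 64
    body = {"when": when, "examiner": examiner, "action": action, "prev": prev}
    log.append({**body, "entry_hash": _hash(body)})

def log_is_intact(log: list) -> bool:
    """Recompute every hash and check that the chain links up."""
    prev = "0" * 64
    for entry in log:
        body = {k: entry[k] for k in FIELDS}
        if entry["prev"] != prev or entry["entry_hash"] != _hash(body):
            return False
        prev = entry["entry_hash"]
    return True

audit_log = []
add_entry(audit_log, "examiner-01", "Attached write blocker to exhibit A", "2024-01-01T09:00:00Z")
add_entry(audit_log, "examiner-01", "Acquired bit copy of exhibit A", "2024-01-01T09:05:00Z")

intact_before = log_is_intact(audit_log)
audit_log[0]["action"] = "Nothing happened"   # simulate later tampering
intact_after = log_is_intact(audit_log)
```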

Live acquisition

In what situations would changes to a suspect’s computer by a computer forensic examiner be
necessary?

Traditionally, the computer forensic examiner would make a copy of (or acquire) information from a
device which is turned off. A write-blocker would be used while making an exact bit-for-bit copy of the
original storage medium, so that nothing is written to it. The examiner would work from this copy, leaving the original
demonstrably unchanged.
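A common way to demonstrate that a copy is exact is to compare cryptographic hashes of the source and the image. The sketch below is a simplified assumption of that workflow: it copies an ordinary file created for the example, whereas a real acquisition would read a raw device through a write blocker using dedicated imaging tools.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # read in 1 MiB blocks

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()

def acquire(source: str, image: str) -> str:
    """Copy source to image byte for byte and return the hash of what was written."""
    h = hashlib.sha256()
    with open(source, "rb") as src, open(image, "wb") as dst:
        for block in iter(lambda: src.read(CHUNK), b""):
            dst.write(block)
            h.update(block)
    return h.hexdigest()

# Stand-in for a storage medium: a small file of random bytes.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "exhibit.bin")
image_path = os.path.join(workdir, "exhibit.img")
with open(source_path, "wb") as f:
    f.write(os.urandom(4096))

image_hash = acquire(source_path, image_path)
verified = image_hash == sha256_of(source_path) == sha256_of(image_path)
```

Recording the hash at acquisition time lets the examiner show later that the working copy still matches the original.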

However, sometimes it is not possible or desirable to switch a computer off. It may not be possible if
doing so would, for example, result in considerable financial or other loss for the owner. The
examiner may also wish to avoid a situation whereby turning a device off may cause valuable
evidence to be permanently lost. In both these circumstances the computer forensic examiner
would need to carry out a ‘live acquisition’, which would involve running a small program on the
suspect computer in order to copy (or acquire) the data to the examiner’s hard drive.

By running such a program and attaching a destination drive to the suspect computer, the examiner
will make changes and/or additions to the state of the computer which were not present before their
actions. However, the evidence produced would still usually be considered admissible if the
examiner was able to show why such actions were considered necessary, that they recorded those
actions and that they are able to explain to a court the consequences of those actions.

Stages of an examination

We’ve divided the computer forensic examination process into six stages, presented in their usual
chronological order.

Readiness

Forensic readiness is an important and occasionally overlooked stage in the examination process. In
commercial computer forensics it can include educating clients about system preparedness; for
example, forensic examinations will provide stronger evidence if a device’s auditing features have
been activated prior to any incident occurring.

For the forensic examiner themselves, readiness will include appropriate training, regular testing and
verification of their software and equipment, familiarity with legislation, dealing with unexpected
issues (e.g., what to do if indecent images of children are found present during a commercial job)
and ensuring that the on-site acquisition (data extraction) kit is complete and in working order.

Evaluation

The evaluation stage includes the receiving of instructions, the clarification of those instructions if
unclear or ambiguous, risk analysis and the allocation of roles and resources. Risk analysis for law
enforcement may include an assessment of the likelihood of physical threat on entering a suspect’s
property and how best to counter it.

Commercial organizations also need to be aware of health and safety issues, conflict of interest
issues and of possible risks – financial and to their reputation – on accepting a particular project.

Collection

The main part of the collection stage, acquisition, has been introduced above.

If acquisition is to be carried out on-site rather than in a computer forensic laboratory, then this
stage would include identifying and securing devices which may store evidence and documenting
the scene. Interviews or meetings with personnel who may hold information relevant to the
examination (which could include the end users of the computer, and the manager and person
responsible for providing computer services, such as an IT administrator) would usually be carried
out at this stage.

The collection stage also involves the labelling and bagging of evidential items from the site, to be
sealed in numbered tamper-evident bags. Consideration should be given to securely and safely
transporting the material to the examiner’s laboratory.

Analysis

Analysis depends on the specifics of each job. The examiner usually provides feedback to the client
during analysis and from this dialogue the analysis may take a different path or be narrowed to
specific areas. Analysis must be accurate, thorough, impartial, recorded, repeatable and completed
within the time-scales available and resources allocated.

There are myriad tools available for computer forensics analysis. It is our opinion that the examiner
should use any tool they feel comfortable with as long as they can justify their choice. The main
requirements of a computer forensic tool are that it does what it is meant to do and the only way for
examiners to be sure of this is for them to regularly test and calibrate the tools they rely on before
analysis takes place.

Dual-tool verification can confirm result integrity during analysis (if with tool ‘A’ the examiner finds
artefact ‘X’ at location ‘Y’, then tool ‘B’ should replicate these results).
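Dual-tool verification amounts to comparing two sets of findings. A minimal sketch, assuming each tool's output has already been reduced to (artefact, location) pairs; the sample values are invented:

```python
def dual_tool_check(tool_a: set, tool_b: set) -> dict:
    """Split findings into those both tools agree on and those only one reports."""
    return {
        "confirmed": tool_a & tool_b,
        "only_tool_a": tool_a - tool_b,
        "only_tool_b": tool_b - tool_a,
    }

# Hypothetical (artefact, location) pairs reported by each tool.
tool_a_hits = {("invoice.docx", "sector 2048"), ("chat.db", "sector 9000")}
tool_b_hits = {("invoice.docx", "sector 2048")}

report = dual_tool_check(tool_a_hits, tool_b_hits)
```

Findings that appear under only one tool are the ones that call for a closer look before they go into the report.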

Presentation

This stage usually involves the examiner producing a structured report on their findings, addressing
the points in the initial instructions along with any subsequent instructions. It would also cover any
other information which the examiner deems relevant to the investigation.

The report must be written with the end reader in mind; in many cases the reader will be non-
technical, and so reader-appropriate terminology should be used. The examiner should also be
prepared to participate in meetings or telephone conferences to discuss and elaborate on the
report.

Review

As with the readiness stage, the review stage is often overlooked or disregarded. This may be due to
the perceived costs of doing work that is not billable, or the need ‘to get on with the next job’.

However, a review stage incorporated into each examination can help save money and raise the
level of quality by making future examinations more efficient and time effective.

A review of an examination can be simple, quick and can begin during any of the above stages. It
may include a basic analysis of what went wrong, what went well, and how the learning from this
can be incorporated into future examinations. Feedback from the instructing party should also be
sought.

Any lessons learnt from this stage should be applied to the next examination and fed into the
readiness stage.

Issues facing computer forensics

The issues facing computer forensics examiners can be broken down into three broad categories:
technical, legal and administrative.

Technical issues

Encryption – Encrypted data can be impossible to view without the correct key or password.
Examiners should consider that the key or password may be stored elsewhere on the computer or
on another computer which the suspect has had access to. It could also reside in the volatile
memory of a computer (known as RAM), which is usually lost on computer shut-down; this is another
reason to consider using live acquisition techniques, as outlined above.

Increasing storage space – Storage media hold ever greater amounts of data, which for the
examiner means that their analysis computers need to have sufficient processing power and
available storage capacity to efficiently deal with searching and analysing large amounts of data.

New technologies – Computing is a continually evolving field, with new hardware, software and
operating systems emerging constantly. No single computer forensic examiner can be an expert on
all areas, though they may frequently be expected to analyze something which they haven’t
previously encountered. In order to deal with this situation, the examiner should be prepared and
able to test and experiment with the behavior of new technologies. Networking and sharing
knowledge with other computer forensic examiners is very useful in this respect as it’s likely
someone else has already come across the same issue.

Anti-forensics – Anti-forensics is the practice of attempting to thwart computer forensic analysis.
This may include encryption, the over-writing of data to make it unrecoverable, the modification of
files’ metadata and file obfuscation (disguising files). As with encryption, the evidence that such
methods have been used may be stored elsewhere on the computer or on another computer which
the suspect has had access to. In our experience, it is very rare to see anti-forensics tools used
correctly and frequently enough to totally obscure either their presence or the presence of the
evidence that they were used to hide.
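One simple form of file obfuscation is renaming a file so its extension no longer matches its content. The sketch below checks a file's leading bytes against a short table of well-known signatures; the table is deliberately tiny and the helper names are invented, so treat it as an illustration rather than a detection tool.

```python
# A few well-known file signatures ("magic numbers").
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
    b"\xff\xd8\xff": "jpg",
}

def sniff_type(header: bytes):
    """Return the type suggested by the first bytes, or None if unknown."""
    for magic, kind in MAGIC.items():
        if header.startswith(magic):
            return kind
    return None

def looks_disguised(filename: str, header: bytes) -> bool:
    """True when the content signature contradicts the file extension."""
    claimed = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    actual = sniff_type(header)
    return actual is not None and actual != claimed

disguised = looks_disguised("holiday.txt", b"\xff\xd8\xff\xe0" + b"\x00" * 12)
honest = looks_disguised("report.pdf", b"%PDF-1.7\n")
```

A fuller version would need to allow for alternative extensions (for example .jpeg) and for container formats that share the ZIP signature.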

Legal issues

Legal issues may confuse or distract from a computer examiner’s findings. An example here would
be the ‘Trojan Defense’. A Trojan is a piece of computer code disguised as something benign but
which carries a hidden and malicious purpose. Trojans have many uses, including key-logging,
uploading and downloading of files and installation of viruses. A lawyer may be able to argue that
actions on a computer were not carried out by a user but were automated by a Trojan without the
user’s knowledge; such a Trojan Defense has been successfully used even when no trace of a Trojan
or other malicious code was found on the suspect’s computer. In such cases, a competent opposing
lawyer, supplied with evidence from a competent computer forensic analyst, should be able to
dismiss such an argument. A good examiner will have identified and addressed possible arguments
from the “opposition” while carrying out the analysis and in writing their report.

Administrative issues

Accepted standards – There are a plethora of standards and guidelines in computer forensics, few of
which appear to be universally accepted. The reasons for this include: standard-setting bodies being
tied to particular legislations; standards being aimed either at law enforcement or commercial
forensics but not at both; the authors of such standards not being accepted by their peers; or high
joining fees for professional bodies dissuading practitioners from participating.

Fit to practice – In many jurisdictions there is no qualifying body to check the competence and
integrity of computer forensics professionals. In such cases anyone may present themselves as a
computer forensic expert, which may result in computer forensic examinations of questionable
quality and a negative view of the profession as a whole.

Some Key Terms

1. Hacking: modifying a computer in a way which was not originally intended in order to
benefit the hacker’s goals.
2. Denial of Service attack: an attempt to prevent legitimate users of a computer system from
having access to that system’s information or services.
3. Metadata: data about data. It can be embedded within files or stored externally in a
separate file and may contain information about the file’s author, format, creation date and
so on.
4. Write blocker: a hardware device or software application which prevents any data from
being modified or added to the storage medium being examined.
5. Bit copy: ‘bit’ is a contraction of the term ‘binary digit’ and is the fundamental unit of
computing. A bit copy refers to a sequential copy of every bit on a storage medium, which
includes areas of the medium ‘invisible’ to the user.
6. RAM: Random Access Memory. RAM is a computer’s temporary workspace and is volatile,
which means its contents are lost when the computer is powered off.
7. Key-logging: the recording of keyboard input giving the ability to read a user’s typed
passwords, emails and other confidential information.

BIOS, BOOT Sequence & Boot Environment

BIOS is an acronym for basic input/output system, the built-in software that determines what a
computer can do without accessing programs from a disk. The BIOS is an important part of any
computer system. On personal computers (PCs), for example, the BIOS contains all the code
required to control the keyboard, display screen, disk drives, serial communications, and a
number of miscellaneous functions.

The ROM BIOS Explained

The BIOS is typically placed in a ROM chip that comes with the computer (it is often called a ROM
BIOS). This ensures that the BIOS will always be available and will not be damaged by disk failures. It
also makes it possible for a computer to boot itself. Because RAM is faster than ROM, though, many
computer manufacturers design systems so that the BIOS is copied from ROM to RAM each time the
computer is booted. This is known as shadowing.

Many modern PCs have a flash BIOS, which means that the BIOS has been recorded on a flash
memory chip, which can be updated if necessary.

PC BIOS Standardization

The PC BIOS is fairly standardized, so all PCs are similar at this level (although there are different
BIOS versions). Additional DOS functions are usually added through software modules. This means
you can upgrade to a newer version of DOS without changing the BIOS.

Newer PC BIOSes that can handle Plug-and-Play (PnP) devices are known as PnP BIOSes, or PnP-
aware BIOSes. These BIOSes are always implemented with flash memory rather than ROM.

Boot Sequence

Definition - What does Boot Sequence mean?

Boot sequence is the order in which a computer searches for nonvolatile data storage devices
containing program code to load the operating system (OS). Typically, a Macintosh system uses
ROM and a Windows PC uses the BIOS to start the boot sequence. Once the instructions are found, the CPU
takes control and loads the OS into system memory.

The devices that are usually listed as boot order options in the BIOS settings are hard disks, floppy
drives, optical drives, flash drives, etc. The user is able to change the boot sequence via the CMOS
setup.

Boot sequence is also called boot order or BIOS boot order.

Before the boot sequence comes the power-on self-test (POST), which is the initial diagnostic test performed
by a computer when it is switched on. When POST is finished, the boot sequence begins. If there are
problems during POST, the user is alerted by beep codes, POST codes or on-screen POST error
messages.

Unless programmed otherwise, the BIOS looks for the OS on drive A first, then on drive C.
It is possible to modify the boot sequence from BIOS settings. Different BIOS models have different
key combinations and onscreen instructions to enter the BIOS and change the boot sequence.
Normally, after the POST, BIOS will try to boot using the first device assigned in the BIOS boot order.
If that device is not suitable for booting, then the BIOS will try to boot from the second device listed,
and this process continues till the BIOS finds the boot code from the devices listed.
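The fallback behaviour described above can be modelled as a simple loop over the configured boot order. This is only a conceptual sketch: the device names and the "bootable" flag are invented for the example and do not correspond to any real firmware interface.

```python
def select_boot_device(boot_order, devices):
    """Return the first device in boot_order that is present and bootable."""
    for name in boot_order:
        device = devices.get(name)
        if device is not None and device["bootable"]:
            return name
    return None  # firmware would report a "no boot device" error here

# Hypothetical machine: empty floppy and optical drives, bootable hard disk, no USB stick.
devices = {
    "floppy": {"bootable": False},
    "optical": {"bootable": False},
    "hard_disk": {"bootable": True},
}

chosen = select_boot_device(["floppy", "optical", "hard_disk", "usb"], devices)
nothing = select_boot_device(["floppy", "usb"], devices)
```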

If the boot device is not found, an error message is displayed and the system crashes or freezes.
Errors can be caused by an unavailable boot device, boot sector viruses or an inactive boot partition.

A boot loader is a computer program that loads an operating system or some other system software
for the computer after completion of the power-on self-tests; it is the loader for the operating
system itself. Within the hard reboot process, it runs after completion of the self-tests, then loads
and runs the software. A boot loader is loaded into main memory from persistent memory, such as a
hard disk drive or, in some older computers, from a medium such as punched cards, punched tape,
or magnetic tape. The boot loader then loads and executes the processes that finalize the boot. Like
POST processes, the boot loader code comes from a "hard-wired" and persistent location; if that
location is too limited for some reason, that primary boot loader calls a second-stage boot loader or
a secondary program loader.

On modern general purpose computers, the boot up process can take tens of seconds, or even
minutes, and typically involves performing a power-on self-test, locating and initializing peripheral
devices, and then finding, loading and starting an operating system. The process of hibernating or
sleeping does not involve booting. Minimally, some embedded systems do not require a noticeable
boot sequence to begin functioning and when turned on may simply run operational programs that
are stored in ROM. All computing systems are state machines, and a reboot may be the only method
to return to a designated zero-state from an unintended, locked state.

In addition to loading an operating system or stand-alone utility, the boot process can also load a
storage dump program for diagnosing problems in an operating system.

Boot is short for bootstrap or bootstrap load and derives from the phrase to pull oneself up by one's
bootstraps. The usage calls attention to the requirement that, if most software is loaded onto a
computer by other software already running on the computer, some mechanism must exist to load
the initial software onto the computer. Early computers used a variety of ad-hoc methods to get a
small program into memory to solve this problem. The invention of read-only memory (ROM) of
various types solved this paradox by allowing computers to be shipped with a startup program that
could not be erased. Growth in the capacity of ROM has allowed ever more elaborate start up
procedures to be implemented.

Modern boot loaders

When a computer is turned off, its software—including operating systems, application code, and
data—remains stored on non-volatile memory. When the computer is powered on, it typically does
not have an operating system or its loader in random-access memory (RAM). The computer first
executes a relatively small program stored in read-only memory (ROM) along with a small amount of
needed data, to access the nonvolatile device or devices from which the operating system programs
and data can be loaded into RAM.

The small program that starts this sequence is known as a bootstrap loader, bootstrap or boot
loader. This small program's only job is to load other data and programs which are then executed
from RAM. Often, multiple-stage boot loaders are used, during which several programs of increasing
complexity load one after the other in a process of chain loading.

Some computer systems, upon receiving a boot signal from a human operator or a peripheral device,
may load a very small number of fixed instructions into memory at a specific location, initialize at
least one CPU, and then point the CPU to the instructions and start their execution. These
instructions typically start an input operation from some peripheral device (which may be switch-
selectable by the operator). Other systems may send hardware commands directly to peripheral
devices or I/O controllers that cause an extremely simple input operation (such as "read sector zero
of the system device into memory starting at location 1000") to be carried out, effectively loading a
small number of boot loader instructions into memory; a completion signal from the I/O device may
then be used to start execution of the instructions by the CPU.

Smaller computers often use less flexible but more automatic boot loader mechanisms to ensure
that the computer starts quickly and with a predetermined software configuration. In many desktop
computers, for example, the bootstrapping process begins with the CPU executing software
contained in ROM (for example, the BIOS of an IBM PC) at a predefined address (some CPUs,
including the Intel x86 series are designed to execute this software after reset without outside help).
This software contains rudimentary functionality to search for devices eligible to participate in
booting, and load a small program from a special section (most commonly the boot sector) of the
most promising device, typically starting at a fixed entry point such as the start of the sector.

Boot loaders may face peculiar constraints, especially in size; for instance, on the IBM PC and
compatibles, a boot sector should typically work in only 32 KB[24] (later relaxed to 64 KB[25]) of
system memory and not use instructions not supported by the original 8088/8086 processors. The
first stage of boot loaders (FSBL, first-stage boot loader) located on fixed disks and removable drives
must fit into the first 446 bytes of the Master Boot Record in order to leave room for the default 64-
byte partition table with four partition entries and the two-byte boot signature, which the BIOS
requires for a proper boot loader — or even less, when additional features like more than four
partition entries (up to 16 with 16 bytes each), a disk signature (6 bytes), a disk timestamp (6 bytes),
an Advanced Active Partition (18 bytes) or special multi-boot loaders have to be supported as well in
some environments. In floppy and super floppy Volume Boot Records, up to 59 bytes are occupied
for the Extended BIOS Parameter Block on FAT12 and FAT16 volumes since DOS 4.0, whereas the
FAT32 EBPB introduced with DOS 7.1 requires even 71 bytes, leaving only 441 bytes for the boot
loader when assuming a sector size of 512 bytes. Microsoft boot sectors therefore traditionally
imposed certain restrictions on the boot process, for example, the boot file had to be located at a
fixed position in the root directory of the file system and stored as consecutive sectors, conditions
taken care of by the SYS command and slightly relaxed in later versions of DOS. The boot loader was
then able to load the first three sectors of the file into memory, which happened to contain another
embedded boot loader able to load the remainder of the file into memory. When they added LBA
and FAT32 support, they even switched to a two-sector boot loader using 386 instructions. At the
same time other vendors managed to squeeze much more functionality into a single boot sector
without relaxing the original constraints on the only minimal available memory and processor
support. For example, DR-DOS boot sectors are able to locate the boot file in the FAT12, FAT16 and
FAT32 file system, and load it into memory as a whole via CHS or LBA, even if the file is not stored in
a fixed location and in consecutive sectors.
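
The size budget described above (446 bytes of boot code, a 64-byte partition table with four 16-byte entries, and a two-byte boot signature) adds up to one 512-byte sector. The following is a minimal sketch of reading that layout, not a complete MBR parser; the function names and the demo sector are illustrative assumptions, and it covers only the default four-entry table.

```python
import struct

SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446           # space left for the first-stage boot loader
ENTRY_SIZE = 16                # one partition table entry
ENTRY_COUNT = 4                # default table: 4 * 16 = 64 bytes
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of the sector

# Entry layout: status, 3 CHS bytes (skipped), type, 3 CHS bytes (skipped),
# first sector as LBA, sector count. All values are little-endian.
ENTRY_FORMAT = "<B3xB3xII"


def parse_mbr(sector: bytes) -> list[dict]:
    """Return the non-empty partition entries of a classic 512-byte MBR."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR sector must be exactly 512 bytes")
    if sector[-2:] != BOOT_SIGNATURE:
        raise ValueError("boot signature 0x55 0xAA not found")
    entries = []
    for index in range(ENTRY_COUNT):
        offset = BOOT_CODE_SIZE + index * ENTRY_SIZE
        status, ptype, first_lba, count = struct.unpack_from(ENTRY_FORMAT, sector, offset)
        if ptype != 0:  # type 0 marks an unused entry
            entries.append({
                "index": index,
                "bootable": status == 0x80,
                "type": ptype,
                "first_lba": first_lba,
                "sectors": count,
            })
    return entries


def make_demo_sector() -> bytes:
    """Build an artificial sector with one bootable entry (illustrative values)."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into(ENTRY_FORMAT, sector, BOOT_CODE_SIZE, 0x80, 0x0C, 2048, 1_000_000)
    sector[-2:] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    for entry in parse_mbr(make_demo_sector()):
        print(entry)
```

Checking the signature before reading the table mirrors the BIOS behaviour described in the text, where the two-byte signature is required for a proper boot loader.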

Examples of first-stage bootloaders include coreboot, Libreboot and Das U-Boot.

Second-stage boot loader

Second-stage boot loaders, such as GNU GRUB, BOOTMGR, Syslinux, NTLDR or BootX, are not
themselves operating systems, but are able to load an operating system properly and transfer
execution to it; the operating system subsequently initializes itself and may load extra device drivers.
The second-stage boot loader does not need drivers for its own operation, but may instead use
generic storage access methods provided by system firmware such as the BIOS or Open Firmware,
though typically with restricted hardware functionality and lower performance.

Many boot loaders (like GNU GRUB, Windows's BOOTMGR, and Windows NT/2000/XP's NTLDR) can
be configured to give the user multiple booting choices. These choices can include different
operating systems (for dual or multi-booting from different partitions or drives), different versions of
the same operating system (in case a new version has unexpected problems), different operating
system loading options (e.g., booting into a rescue or safe mode), and some standalone programs
that can function without an operating system, such as memory testers (e.g., memtest86+), a basic
shell (as in GNU GRUB), or even games (see List of PC Booter games). Some boot loaders can also
load other boot loaders; for example, GRUB loads BOOTMGR instead of loading Windows directly.
Usually a default choice is preselected with a time delay during which a user can press a key to
change the choice; after this delay, the default choice is automatically run so normal booting can
occur without interaction.

The boot process can be considered complete when the computer is ready to interact with the user,
or the operating system is capable of running system programs or application programs. Typical
modern personal computers boot in about one minute, of which about 15 seconds are taken by a
power-on self-test (POST) and a preliminary boot loader, and the rest by loading the operating
system and other software. Time spent after the operating system loading can be considerably
shortened to as little as 3 seconds[28] by bringing the system up with all cores at once, as with
coreboot. Large servers may take several minutes to boot and start all their services.

Many embedded systems must boot immediately. For example, waiting a minute for a digital
television or a GPS navigation device to start is generally unacceptable. Therefore, such devices have
software systems in ROM or flash memory so the device can begin functioning immediately; little or
no loading is necessary, because the loading can be precomputed and stored on the ROM when the
device is made.

Large and complex systems may have boot procedures that proceed in multiple phases until finally
the operating system and other programs are loaded and ready to execute. Because operating
systems are designed as if they never start or stop, a boot loader might load the operating system,
configure itself as a mere process within that system, and then irrevocably transfer control to the
operating system. The boot loader then terminates normally as any other process would.

FAT & NTFS File System

Windows 95 OSR2, Windows 98, and Windows Me include an updated version of the FAT file
system. This updated version is called FAT32. The FAT32 file system allows for a default cluster size
as small as 4 KB, and includes support for EIDE hard disk sizes larger than 2 gigabytes (GB).

NOTE: Microsoft Windows NT 4.0 does not support the FAT32 file system.

For additional information about supported file systems in Windows NT 4.0, see Microsoft Knowledge
Base article 100108, "Overview of FAT, HPFS, and NTFS File Systems."

FAT32 Features

FAT32 provides the following enhancements over previous implementations of the FAT file system:

• FAT32 supports drives up to 2 terabytes in size.

NOTE: Microsoft Windows 2000 only supports FAT32 partitions up to a size of 32 GB.

• FAT32 uses space more efficiently. FAT32 uses smaller clusters (that is, 4-KB clusters for
drives up to 8 GB in size), resulting in 10 to 15 percent more efficient use of disk space
relative to large FAT or FAT16 drives.
• FAT32 is more robust. FAT32 can relocate the root folder and use the backup copy of the file
allocation table instead of the default copy. In addition, the boot record on FAT32 drives is
expanded to include a backup copy of critical data structures. Therefore, FAT32 drives are
less susceptible to a single point of failure than existing FAT16 drives.
• FAT32 is more flexible. The root folder on a FAT32 drive is an ordinary cluster chain, so it can
be located anywhere on the drive. The previous limitations on the number of root folder
entries no longer exist. In addition, file allocation table mirroring can be disabled, allowing a
copy of the file allocation table other than the first one to be active. These features allow for
dynamic resizing of FAT32 partitions. Note, however, that although the FAT32 design allows
for this capability, it will not be implemented by Microsoft in the initial release.

FAT32 Compatibility Considerations

To maintain the greatest compatibility possible with existing programs, networks, and device drivers,
FAT32 was implemented with as little change as possible to the existing Windows architecture,
internal data structures, Application Programming Interfaces (APIs), and on-disk format. However,
because 4 bytes are now required to store cluster values, many internal and on-disk data structures
and published APIs have been revised or expanded. In some cases, existing APIs will not work on
FAT32 drives. Most programs will be unaffected by these changes. Existing tools and drivers should
continue to work on FAT32 drives. However, MS-DOS block device drivers (for example, Aspidisk.sys)
and disk tools will need to be revised to support FAT32 drives.

All of the Microsoft bundled disk tools (Format, Fdisk, Defrag, and MS-DOS-based and Windows-based
ScanDisk) have been revised to work with FAT32. In addition, Microsoft is working with
leading device driver and disk tool manufacturers to support them in revising their products to
support FAT32.

NOTE: A FAT32 volume cannot be compressed by using Microsoft DriveSpace or DriveSpace 3.

FAT32 Performance
Converting to the FAT32 file system is one of the biggest performance enhancements you can make
to your Windows 98-based computer.

Dual-Boot Computers
At this time, Windows 95 OSR2, Windows 98, Windows 2000, and Windows Me are the only
Microsoft operating systems that can access FAT32 volumes. MS-DOS, the original version of
Windows 95, and Windows NT 4.0 do not recognize FAT32 partitions, and are unable to boot from a
FAT32 volume. Also, FAT32 volumes cannot be accessed properly if the computer is started by using
another operating system (for example, a Windows 95 or MS-DOS boot disk).

Windows 95 OSR2 and Windows 98 can be started in Real mode (for example, to run a game) and
can use FAT32 volumes.

Creating FAT32 Drives

In Windows 95 OSR2, Windows 98, and Windows Me, if you run the Fdisk tool on a hard disk that is
over 512 megabytes (MB) in size, Fdisk prompts you whether or not to enable large disk support. If
you answer "Yes" (enabling large disk support), any partition you create that is larger than 512 MB is
marked as a FAT32 partition.

Windows 98 and Windows Me also include a FAT32 conversion tool that you can use to convert an
existing drive to the FAT32 file system. To use the conversion tool, follow these steps:

1. Click Start, point to Programs, point to Accessories, point to System Tools, and then click
Drive Converter (FAT32).
2. Click Next.
3. Click the drive that you want to convert to the FAT32 file system, and then click Next.
4. Follow the instructions on the screen.

Support Boundaries

Microsoft will support the functionality of the FAT32 file system for error-free reading and saving of
files in either Real mode or Protected mode. Microsoft supports the Real-mode and Protected-mode
tools that are included with Windows 95.

For legacy (older) programs that cannot be installed on a FAT32 volume, or do not properly save files
or read them, you must contact the manufacturer of the software package.

NOTE: Although the FAT32 file system supports hard disks up to 2 terabytes in size, some hard disks
may not be able to contain bootable partitions that are larger than 7.8 GB because of limitations in
your computer's basic input/output system (BIOS) INT13 interface. Please contact your hardware
manufacturer to determine if your computer's BIOS supports the updated INT13 extensions.
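
The 7.8 GB figure in the note above can be reproduced from the addressing limits of the original INT13 interface. The sketch below assumes the commonly cited maximum geometry of 1024 cylinders, 255 heads, and 63 sectors per track with 512-byte sectors; these values are stated here as an assumption for illustration and are not taken from the note itself.

```python
# Assumed maximum geometry of the original (non-extended) INT13 interface.
CYLINDERS = 1024
HEADS = 255
SECTORS_PER_TRACK = 63
BYTES_PER_SECTOR = 512


def chs_limit_bytes(cylinders: int = CYLINDERS,
                    heads: int = HEADS,
                    sectors_per_track: int = SECTORS_PER_TRACK,
                    bytes_per_sector: int = BYTES_PER_SECTOR) -> int:
    """Largest byte count addressable with the given CHS geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    limit = chs_limit_bytes()
    # 8,422,686,720 bytes, which is about 7.84 when divided by 2**30
    print(f"{limit:,} bytes = {limit / 2**30:.2f} GB (binary)")
```

A BIOS that supports the updated INT13 extensions addresses sectors by logical block number instead, which is why the limit does not apply to it.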


NTFS File System


NTFS (NT file system; sometimes New Technology File System) is the file system that the
Windows NT operating system uses for storing and retrieving files on a hard disk. NTFS is the
Windows NT equivalent of the Windows 95 file allocation table (FAT) and the OS/2 High
Performance File System (HPFS). However, NTFS offers a number of improvements over FAT and
HPFS in terms of performance, extendibility, and security.

Notable features of NTFS include:

• Use of a b-tree directory scheme to keep track of file clusters
• Information about a file's clusters and other data is stored with each cluster, not just in a
governing table (as in FAT)
• Support for very large files (up to 2 to the 64th power bytes, or approximately 16 exbibytes,
about 1.8 × 10^19 bytes, in size)
• An access control list (ACL) that lets a server administrator control who can access specific
files
• Integrated file compression
• Support for names based on Unicode
• Support for long file names as well as "8 by 3" (8.3) names
• Data security on both removable and fixed disks

How NTFS Works

When a hard disk is formatted (initialized), it is divided into partitions or major divisions of the
total physical hard disk space. Within each partition, the operating system keeps track of all the
files that are stored by that operating system. Each file is actually stored on the hard disk in one
or more clusters or disk spaces of a predefined uniform size. Using NTFS, the sizes of clusters
range from 512 bytes to 64 kilobytes. Windows NT provides a recommended default cluster size
for any given drive size. For example, for a 4 GB (gigabyte) drive, the default cluster size is 4 KB
(kilobytes). Note that clusters are indivisible. Even the smallest file takes up one cluster and a 4.1
KB file takes up two clusters (or 8 KB) on a 4 KB cluster system.

The selection of the cluster size is a trade-off between efficient use of disk space and the
number of disk accesses required to access a file. In general, using NTFS, the larger the hard disk
the larger the default cluster size, since it's assumed that a system user will prefer to increase
performance (fewer disk accesses) at the expense of some amount of space inefficiency.
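
The cluster arithmetic above can be checked with a few lines of code. This is a minimal sketch that follows the text's simplification that every file occupies at least one whole cluster; the function names are illustrative.

```python
def clusters_needed(file_size_bytes: int, cluster_size_bytes: int) -> int:
    """Number of whole clusters occupied by a file (at least one, per the text)."""
    if cluster_size_bytes <= 0:
        raise ValueError("cluster size must be positive")
    # Ceiling division without floating point: -(-a // b)
    return max(1, -(-file_size_bytes // cluster_size_bytes))


def allocated_bytes(file_size_bytes: int, cluster_size_bytes: int) -> int:
    """Disk space consumed once the file is rounded up to whole clusters."""
    return clusters_needed(file_size_bytes, cluster_size_bytes) * cluster_size_bytes


if __name__ == "__main__":
    cluster = 4 * 1024             # 4 KB clusters, as in the example above
    size = round(4.1 * 1024)       # a 4.1 KB file
    used = allocated_bytes(size, cluster)
    print(clusters_needed(size, cluster), "clusters,", used, "bytes allocated,",
          used - size, "bytes of slack")
```

The unused remainder of the last cluster (the slack) is the space inefficiency that larger clusters trade for fewer disk accesses.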

Validation, Forensic Acquisition

With the field of digital forensics growing at an almost warp-like speed, there are many issues out
there that can disrupt and discredit even the most experienced forensic examiner. One of the issues
that continues to be of utmost importance is the validation of the technology and software
associated with performing a digital forensic examination. The science of digital forensics is founded
on the principles of repeatable processes and quality evidence. Knowing how to design and properly
maintain a good validation process is a key requirement for any digital forensic examiner. This article
will attempt to outline the issues faced when drafting tool and software validations, the legal
standards that should be followed when drafting validations, and a quick overview of what should be
included in every validation.

Setting the Standard: Standards and Legal Baselines for Software/Tool Validation

According to the National Institute of Standards and Technology (NIST), test results must be
repeatable and reproducible to be considered admissible as electronic evidence. Digital forensics test
results are repeatable when the same results are obtained using the same methods in the same
testing environment. Digital forensics test results are reproducible when the same test results are
obtained using the same method in a different testing environment (different mobile phone, hard
drive, and so on). NIST specifically defines these terms as follows:

Repeatability refers to obtaining the same results when using the same method on identical test
items in the same laboratory by the same operator using the same equipment within short intervals
of time.

Reproducibility refers to obtaining the same results when using the same method on
identical test items in different laboratories with different operators utilizing different equipment.

In the legal community, the Daubert Standard can be used for guidance when drafting software/tool
validations. The Daubert Standard allows novel tests to be admitted in court, as long as certain
criteria are met. According to the ruling in Daubert v. Merrell Dow Pharmaceuticals, Inc., the following
criteria were identified to determine the reliability of a particular scientific technique:

1. Has the method in question undergone empirical testing?
2. Has the method been subjected to peer review?
3. Does the method have any known or potential error rate?
4. Do standards exist for the control of the technique's operation?
5. Has the method received general acceptance in the relevant scientific community?

The Daubert Standard requires an independent judicial assessment of the reliability of the scientific
test or method. This reliability assessment, however, does not require, although it does permit, explicit
identification of a relevant scientific community and an express determination of a particular degree
of acceptance within that community. Additionally, the Daubert Standard was quick to point out that
the fact that a theory or technique has not been subjected to peer review or has not been published
does not automatically render the tool/software inadmissible. The ruling recognizes that scientific
principles must be flexible and must be the product of reliable principles and methods. Although the
Daubert Standard was in no way directed toward digital forensics validations, the scientific baselines
and methods it suggests are a good starting point for drafting validation reports that will hold up in a
court of law and the digital forensics community.

The Scientific Method and Software/Tool Validations: A Perfect Fit


In the Daubert ruling, the Court defined scientific methodology as “the process of formulating
hypotheses and then conducting experiments to prove or falsify the hypothesis.” The Scientific
Method refers to a body of techniques for investigating phenomena, acquiring new knowledge, or
correcting and integrating previous knowledge. To be termed scientific, the method must be based
on gathering, observing, or investigating, and showing measurable and repeatable results. Most of
the time, the scientific process starts with a simple question that leads to a hypothesis, which then
leads to experimentation, and an ultimate conclusion. For example, if you are validating a particular
hardware write-blocking device, you may want to start with the simple question “Does this tool
successfully allow normal write-block operation to occur to source media?” Since it is assumed that
the write-blocking device supports various types of media (SATA, IDE, and so on), you may be
required to list the various requirements of the tool. Because of this, it is good practice for an
examiner to use the scientific method as a baseline for formulating digital forensic validations. It is
recommended that forensic examiners follow these four basic steps as a starting point for an
internal validation program:

1) Develop the Plan

Developing the scope of the plan may involve gathering background information and defining in detail
what the software or tool should do. Developing the scope of the plan also involves creating a protocol for
testing by outlining the steps, tools, and requirements of such tools to be used during the test. This
may include evaluation of multiple test scenarios for the same software or tool. To illustrate, if
validating a particular forensic software imaging tool, that tool could be tested to determine
whether or not it successfully creates, hashes, and verifies a particular baseline image that has been
previously set up. There are several publicly available resources and guides that can be useful in
establishing what a tool should do, such as those available from NIST’s Computer Forensic Tool
Testing Project (CFTT) at http://www.cftt.nist.gov. The CFTT also publishes detailed
validation reports on various types of forensic hardware and software ranging from mobile phones
to disk imaging tools. In addition to CFTT, Marshall University has published various software and
tool validation reports that are publicly available for download from:

http://forensics.marshall.edu/Digital/Digital-Publications.html

These detailed reports can be used to get a feel for how your own internal protocol should be
drafted. The scope of the plan may also include items such as: tool version, testing manufacturer,
and how often the tests will be done. These factors should be established based on your
organization’s standards. Typically, technology within a lab setting is re-validated quarterly or
biannually at the very least.
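
The imaging-tool scenario above (create, hash, and verify a baseline image) depends on comparing the hash of an acquired image with the hash recorded for the baseline. The following is a minimal sketch of that comparison, assuming SHA-256 as the hash algorithm and using illustrative function names; the plan itself does not prescribe either.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_baseline(image_path: str, baseline_sha256: str) -> bool:
    """True when the image hashes to the value recorded for the baseline."""
    return sha256_of_file(image_path) == baseline_sha256.strip().lower()


if __name__ == "__main__":
    # Stand-in for a real baseline image: a temporary file with known content.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"known baseline content")
        demo_path = tmp.name
    try:
        recorded = hashlib.sha256(b"known baseline content").hexdigest()
        print("verified:", image_matches_baseline(demo_path, recorded))
    finally:
        os.remove(demo_path)
```

In a validation protocol, the recorded hash would come from the documented baseline rather than being computed in the same run.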

2) Develop a Controlled Data Set

This step may be the longest and most difficult part of the validation process, as it is the most
involved. It requires setting up specific devices and baseline images and then adding
data to specific areas of the media or device. Acquisitions then need to be performed and
documented after each addition to validate the primary baseline. This baseline may include a
dummy mobile phone, USB thumb drive, or hard drive, depending on the software or hardware tool
you are testing. In addition to baseline images you build yourself, you can use several
publicly available disk images posted by Brian Carrier, which are designed to test specific tool
capabilities, such as the ability to recover deleted files, find keywords, and process images. These
data sets are documented and are available at http://dftt.sourceforge.net. Once baseline images are
created, tested, and validated, it is a good idea to document what is contained within these images.
This will not only assist in future validations, but may also be handy for internal competency and
proficiency examinations for digital examiners.

3) Conduct the Tests in a Controlled Environment

Outside all the recommendations and standards set forth by NIST and the legal community, it only
makes sense that a digital forensics examiner would perform an internal validation of the software
and tools being used in the laboratory. In some cases these validations are arbitrary and can occur
either in a controlled or uncontrolled environment. Since examiners are continuously bearing
enormous caseloads and work responsibilities, consistent and proper validations sometimes fall
through the cracks, and tools are validated in a somewhat uncontrolled “on-the-fly” manner. It’s also a
common practice in digital forensics for examiners to “borrow” validations from other laboratories
and fail to validate their own software and tools. Be careful not to let this happen. Keep in
mind that for digital forensics to practice true scientific principles, the processes used
must be proven to be repeatable and reproducible. In order for this to occur, the validation should
occur within a controlled environment within your laboratory with the tools that you will be using. If
the examiner uses a process, software, or even a tool that is haphazard or too varied from one
examination to the next, the science then becomes more of an arbitrary art. Simply put, validations
not only protect the integrity of the evidence, they may also protect your credibility. As stated
previously, using a repeatable, consistent, scientific method in drafting these validations is always
recommended.

4) Validate the Test Results against Known and Expected Results

At this point, testing is conducted against the requirements set forth for the software or tool in the
previous steps. Keep in mind that results generated through the experimentation and validation
stage must be repeatable. Validation should go beyond a simple surface scan when it comes to the
use of those technologies in a scientific process. With that said, it is recommended that each
requirement be tested at least three times. If there are any variables that may affect the outcome of
the validation (e.g. failure to write-block, software bugs) they should be determined after three test
runs. There may be cases, however, where more or fewer test runs may be required to generate
valid results.
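
The recommendation to test each requirement at least three times can be expressed as a small harness that runs the same test repeatedly and checks that every run produced the same result. This is a sketch only; `run_repeatability_check` and the stand-in test are hypothetical names, and a real test would drive the tool under validation against the controlled data set.

```python
import hashlib
from typing import Callable, Hashable


def run_repeatability_check(test: Callable[[], Hashable], runs: int = 3) -> dict:
    """Run one requirement test several times and compare the outcomes."""
    if runs < 1:
        raise ValueError("at least one run is required")
    results = [test() for _ in range(runs)]
    return {
        "runs": runs,
        "results": results,
        # Repeatable only if every run produced an identical result.
        "repeatable": len(set(results)) == 1,
    }


def demo_requirement_test() -> str:
    """Stand-in test: hash a fixed byte string in place of a real acquisition."""
    return hashlib.sha256(b"controlled data set").hexdigest()


if __name__ == "__main__":
    report = run_repeatability_check(demo_requirement_test)
    print("repeatable:", report["repeatable"])
```

Keeping the individual results in the report makes it possible to document which run diverged when a variable such as a write-block failure affects the outcome.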

It’s also important to realize that you are probably not the first to use and validate a particular
software or tool, so chances are that if you are experiencing inconsistent results, the community
may be experiencing the same results as well. Utilizing peer review may be a valuable asset when
performing these validations. Organizations such as the High Technology Crime Investigation
Association (HTCIA) and the International Association of Computer Investigative Specialists (IACIS)
maintain active member e-mail lists that can be leveraged for peer review. There are
also various lists and message boards pertaining to mobile phone forensics that can be quite helpful
when validating a new mobile technology. In addition, most forensic software vendors maintain
message boards for software, which can be used to research bugs or inconsistencies arising during
validation testing.

Sterilization & Write Blocking

Write blockers are devices that allow acquisition of information on a drive without creating the
possibility of accidentally damaging the drive contents. They do this by allowing read commands to
pass but by blocking write commands, hence their name.

There are two ways to build a write blocker: the blocker can pass only the commands that are on a
list of known-safe commands and block everything else. Alternatively, the blocker can
specifically block the write commands and let everything else through.
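
A blocker's filtering logic can be sketched as a policy over command names. The sketch below contrasts a blocker that stops only the commands on a block list with one that passes only the commands on an allow list; the command names are illustrative placeholders, not a complete drive command set, and a real write blocker implements this in hardware or in a driver.

```python
from enum import Enum


class Policy(Enum):
    BLOCK_LISTED = "block commands on the list, pass the rest"
    ALLOW_LISTED = "pass commands on the list, block the rest"


# Illustrative command names only (assumptions for this sketch).
WRITE_COMMANDS = frozenset({"WRITE SECTORS", "WRITE DMA", "FORMAT TRACK"})
READ_COMMANDS = frozenset({"READ SECTORS", "READ DMA", "IDENTIFY DEVICE"})


def is_forwarded(command: str, policy: Policy) -> bool:
    """Decide whether the blocker passes a command on to the drive."""
    if policy is Policy.BLOCK_LISTED:
        return command not in WRITE_COMMANDS
    return command in READ_COMMANDS


if __name__ == "__main__":
    for command in ("READ DMA", "WRITE DMA", "VENDOR SPECIFIC"):
        print(command,
              "| block list:", is_forwarded(command, Policy.BLOCK_LISTED),
              "| allow list:", is_forwarded(command, Policy.ALLOW_LISTED))
```

The two policies differ only for commands that appear on neither list: a block-list design forwards an unknown command, while an allow-list design stops it.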

Write blockers may also include drive protection which will limit the speed of a drive attached to the
blocker. Drives that run at higher speed work harder (the head moves back and forth more often
due to read errors). This added protection could allow drives that cannot be read at high speed
(UDMA modes) to be read at the slower modes (PIO).

There are two types of write blockers, Native and Tailgate. A Native device uses the same interface
for both input and output, for example an IDE-to-IDE write blocker. A Tailgate device uses one interface
for one side and a different one for the other, for example a Firewire-to-SATA write blocker.

Steve Bress and Mark Menz invented hard drive write blocking (US Patent 6,813,682).

There are both hardware and software write blockers. Some software write blockers are designed
for a specific operating system. One designed for Windows will not work on Linux. Most hardware
write blockers are software independent.

Hardware Write Blockers

Hardware write blockers can be either IDE-to-IDE or Firewire/USB-to-IDE. Simson prefers the IDE-to-
IDE because they deal better with errors on the drive and make it easier to access special
information that is only accessible over the IDE interface.

First Responder Process

Introduction to First Responder Procedures

The term first responder refers to a person who first arrives at a crime scene and accesses the
victim’s computer system once the incident has been reported. The first responder may be a
network administrator, law enforcement officer, or investigating officer. Generally, the first
responder is a person who comes from the forensic laboratory or from a particular agency for initial
investigation.

If a crime occurs that affects a company’s servers or individual workstations, the company first
contacts the forensic laboratory or agency for crime investigation. The laboratory or agency then
sends the first responder to the crime scene for initial investigation. The first responder is
responsible for protecting, integrating, and preserving the evidence obtained from the crime scene.
The first responder needs to have complete knowledge of computer forensic investigation
procedures. He or she preserves all evidence in a simple, protected, and forensically sound manner.
The first responder must investigate the crime scene in a lawful manner so that any obtained
evidence will be acceptable in a court of law.

Electronic Evidence

Electronic evidence is data relevant to an investigation that is transferred by or stored on an
electronic device. This type of evidence is found when data on any physical device is collected for
examination. Electronic evidence has the following properties:

• It may be hidden, similar to fingerprint evidence or DNA evidence.
• It can be broken, changed, damaged, or cracked by improper handling; therefore, particular
precautions must be taken to document, gather, safeguard, and examine this type of
evidence.
• It can expire after a period of time.

Sources of Electronic Evidence

Electronic information is usually stored on magnetic, optical, or flash storage devices, such as floppy disks,
flash drives, memory cards, backup tapes, CD-ROMs, and DVD-ROMs. Hard drives, including
removable drives and laptop drives, often contain significant information in hidden files. Computer
systems—in particular PCs and network servers in which electronic data are organized, stored,
deleted, and accessed—should not be ignored. All e-mail servers and their backup schedules are also
critical, and any Internet-related files should be obtained from Internet service providers or specific
network servers.

Role of the First Responder

As the first person to arrive at the crime scene, the first responder plays an important role in
computer forensic investigation. After all the evidence is collected from the crime scene, the
investigation process starts. If the evidence collected by the first responder is forensically sound, it is
significantly easier for the investigation team to find the actual cause of the crime.

The following are the main responsibilities of the first responder:

• Identifying the crime scene: After arriving at the crime scene, the first responder identifies the
scope of the crime scene and establishes a perimeter. The perimeter will include a particular area,
room, several rooms, or even an entire building, depending on whether the computers are
networked. The first responder should list the computer systems involved in the incident.

• Protecting the crime scene: Like any other case, a search warrant is required for the search and
seizure of digital and electronic evidence. Therefore, the first responder should protect all
computers and electronic devices while waiting for the officer in charge.

• Preserving temporary and fragile evidence: In the case of temporary and fragile evidence that
could change or disappear, such as screen information and running programs, the first responder
does not wait for the officer in charge. Rather, he or she takes immediate photographs of this
evidence.

• Collecting all information about the incident: The first responder conducts preliminary interviews
of all persons present at the crime scene and asks questions about the incident.

• Documenting all findings: The first responder starts documenting all information about the
collected evidence in the chain of custody document. The chain of custody document contains
information such as case number, name and title of the individual from whom the report is received,
address and telephone number, location where the evidence is obtained, date and time when the
evidence is obtained, and a complete description of each item.

• Packaging and transporting the electronic evidence: After collecting the evidence, the first
responder labels all the evidence and places it in evidence storage bags, which protect it from
sunlight and extreme temperatures. These bags also block wireless signals so that wireless devices
cannot acquire data from the evidence. The storage bags are then transported to the forensic
laboratory.
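
The fields that the chain of custody document is said to contain can be captured in a simple record structure. This is a minimal sketch; the class name, field names, and sample values are illustrative assumptions, and an actual form follows the agency's own template.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ChainOfCustodyRecord:
    """One chain of custody entry with the fields listed in the text."""
    case_number: str
    received_from_name: str
    received_from_title: str
    address: str
    telephone: str
    location_obtained: str
    obtained_at: datetime
    item_descriptions: list[str] = field(default_factory=list)

    def add_item(self, description: str) -> int:
        """Append an item description and return its 1-based item number."""
        self.item_descriptions.append(description)
        return len(self.item_descriptions)


if __name__ == "__main__":
    # All values below are made-up sample data.
    record = ChainOfCustodyRecord(
        case_number="CASE-0001",
        received_from_name="A. Example",
        received_from_title="Network Administrator",
        address="1 Sample Street",
        telephone="000-0000",
        location_obtained="Server room, rack 2",
        obtained_at=datetime(2020, 1, 1, 9, 30),
    )
    record.add_item("Laptop, serial number not recorded in this sample")
    print(record)
```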

Electronic Devices: Types and Collecting Potential Evidence

The following are some of the types of electronic devices relevant to a crime scene:
Computer systems: A computer system generally consists of the central processing unit (CPU),
motherboard, memory, case, data storage devices, monitor, keyboard, and mouse. Digital evidence
is found in files that are stored on memory cards, hard drives, USB drives, other removable storage
devices, and media such as floppy disks, CDs, DVDs, cartridges, and tapes.

Hard drives: A hard drive is an electronic storage device that stores data magnetically.

Thumb drives: A thumb drive is a removable data storage device with a USB connection.

Memory cards: A memory card is a removable electronic storage device that is used in many devices
such as digital cameras, computers, and PDAs.

Smart cards, dongles, and biometric scanners: Evidence is found in the data on the card or inside
the devices themselves.

Answering machines: These store voice messages, time and date information, and when messages
were left. To find the evidence, an investigator should check the voice recordings for deleted
messages, most recent numbers called, messages, recorded phone numbers, and tapes or digital
recording data.

Digital cameras: To find the evidence, an investigator should check the stored images, removable
media, and time and date stamps of the images.

MP3 players: To find the evidence, an investigator should check the information stored on the
device.

Pagers: To find the evidence, an investigator should check the stored addresses, text messages, e-
mails, voice messages, and phone numbers.

Personal digital assistants: PDAs are handheld devices that have computing, telephone or fax,
paging, and networking features. To find the evidence, an investigator should check the address
book, meeting calendar, documents, and e-mails.

Printers: To find the evidence, an investigator should check the usage logs, time and date
information, and network identity information.

Removable storage devices (tapes, CDs, DVDs, and floppies): Evidence is found on the devices
themselves.

Telephones: To find the evidence, an investigator should check stored names, stored phone
numbers, and caller identification information.

Modems: Evidence is found on the devices themselves.

Scanners: Evidence is found in user usage logs and time and date stamps.

Copiers: Evidence is found in user texts, user usage logs, and time and date stamps.

Credit card skimmers: To find the evidence, an investigator should check the card expiration date,
user’s address, card numbers, and user’s name.

Fax machines: To find the evidence, an investigator should check the documents, phone numbers,
film cartridges, and sent and received logs.

First Responder Toolkit

The first responder has to create a toolkit before a cybercrime event happens and prior to any
potential evidence collection. Once a crime is reported, someone should immediately report to the
site and should not have to waste any time gathering materials.

The first responder toolkit is a set of tested tools designed to help in collecting genuine presentable
evidence. It helps the first responder understand the limitations and capabilities of electronic
evidence at the time of collection. The act of creating a toolkit makes the first responder familiar
with computer forensic tools and their functionalities.

The first responder has to select trusted computer forensic tools that provide output-specific
information and determine system dependencies. For example, any program running on the victim’s
computer generally uses common libraries for routine system commands. If the first responder
starts collecting evidence with the trusted tools, it will be easy to determine the system
dependencies.

Creating a First Responder Toolkit

Creating a first responder toolkit includes the following procedures:

1. Create a trusted forensic computer or test bed: This trusted forensic computer or test bed will be
used to test the functionality of the collected tools. Prior to testing any tool, the investigator should
make sure that this is a trusted resource.

To create a trusted forensic computer, follow these steps:

• Choose the operating system type. Create two different test bed machines: one for Windows and
one for Linux.

• Completely sanitize the forensic computer. This includes formatting the hard disk completely to
remove any data, using software such as BCWipe for Windows or Wipe for Linux.

• Install the operating system and required software from trusted resources. If the operating system
is downloaded, verify the hashes prior to installation.

• Update and patch the forensic computer.

• Install a file integrity monitor to test the integrity of the file system.

2. Document the details of the forensic computer: Documenting the forensic computer or test bed is
the second step in creating a first responder toolkit. It helps the forensic expert easily understand
the situation and the tools used, and will help to reproduce results if they come into question for any
reason. The forensic computer or test bed documentation should include the following:

• Version name and type of the operating system

• Names and types of the different software

• Names and types of the installed hardware

3. Document the summary of collected tools: The third step in creating a first responder toolkit is to
document the summary of the collected tools. This allows the first responder to become more
familiar with and understand the working of each tool. Information about the following should be
included while documenting the summary of tools:

• Acquisition of the tool

• Detailed description of the tool

• Working of the tool

• Tool dependencies and system effects

4. Test the tools: After documenting the summary of the collected tools, the investigator should test
them on the forensic computer or test bed to examine the performance and output. He or she
should examine the effects of each tool on the forensic computer. He or she should also monitor any
changes in the forensic computer caused by the tools.
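Two of the steps above rely on cryptographic hashes: verifying a downloaded operating system image before installation (step 1) and detecting the changes a tool makes to the test bed (step 4). The following Python sketch illustrates both under stated assumptions. The function names, the choice of SHA-256, and the directory-snapshot approach are illustrative only and are not prescribed by any particular forensic product.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 digest of a file as a lowercase hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large installer images need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_hex: str) -> bool:
    """Compare the computed digest with the value published by the vendor."""
    computed = sha256_of_file(path)
    return hmac.compare_digest(computed, published_hex.strip().lower())


def snapshot(root: Path) -> dict:
    """Map every file under root (by relative path) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(before: dict, after: dict) -> dict:
    """List files added, removed, or modified between two snapshots."""
    shared = set(before) & set(after)
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "modified": sorted(name for name in shared if before[name] != after[name]),
    }
```

A responder might call snapshot() on a test bed directory before and after running a tool, then record the output of diff_snapshots() in the tool summary. A dedicated file integrity monitor tracks far more (permissions, timestamps, system configuration); this sketch compares file contents only.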

Evidence-Collecting Tools and Equipment

The investigator should have general crime scene processing tools, such as the following:

• Cameras

• Notepads

• Sketchpads

• Evidence forms

• Crime scene tape

• Markers

The following are some of the tools and equipment used to collect the evidence:

• Documentation tools:

• Cable tags

• Indelible felt-tip markers

• Stick-on labels

• Disassembly and removal tools:

• Flat-head and Phillips-head screwdrivers

• Hex-nut drivers

• Needlenose pliers

• Secure-bit drivers

• Small tweezers

• Specialized screwdrivers

• Standard pliers

• Star-type nut drivers

• Wire cutters

• Package and transport supplies:

• Antistatic bags

• Antistatic bubble wrap

• Cable ties

• Evidence bags

• Evidence tape

• Packing materials

• Sturdy boxes of various sizes

• Other tools:

• Gloves

• Hand truck

• Magnifying glass

• Printer paper

• Seizure disk

• Unused floppy disks

• Notebook computers:

• Licensed software

• Bootable CDs

• External hard drives

• Network cables

• Software tools:

• DIBS Mobile Forensic Workstation

• AccessData’s Ultimate Toolkit

• Teel Technologies SIM Tools

• Hardware tools:
• Paraben forensic hardware

• Digital Intelligence forensic hardware

• Tableau Hardware Accelerator

• WiebeTech forensic hardware tools

• Logicube forensic hardware tools

First Response Basics

The following are some first response basics:

• Under no circumstances should anyone except qualified forensic analysts make any attempts to
collect or recover data from any computer system or device that holds electronic information.

• Any information present inside the collected electronic devices is potential evidence and should be
treated accordingly.

• Any attempts to recover data by untrained persons could either compromise the integrity of the
files or result in the files being inadmissible in administrative or legal proceedings.

• The workplace or office must be secured and protected to maintain the integrity of the crime
scene and the electronic storage media.

Incident Response: Different Situations

The first response to an incident may involve one of three different groups of people, each of which
will have different tasks based on the circumstance of the incident. The three groups are as follows:

• System administrators
• Local managers or other non-forensic staff
• Laboratory forensic staff

First Response by System Administrators

The system administrator plays an important role in ensuring network protection and maintenance,
as well as playing a vital role in the investigation. Once a system administrator discovers an incident,
it must be reported according to the current organizational incident reporting procedures. The
system administrator should not touch the system unless directed to do so by either the
incident/duty manager or one of the forensic analysts assigned to the case.

First Response by Non-Forensic Staff

Non-forensic staff members are responsible for securing the crime scene and making sure that it is
retained in a secure state until the forensic team advises otherwise. They should also make notes
about the scene and those present to hand over to the attending forensic team. The surrounding
area of the suspect computer should be secured, not just the computer itself.

First Response by Laboratory Forensic Staff

The first response by laboratory forensic staff involves the following six stages:

1. Securing and evaluating the electronic crime scene: This ensures that all personnel are removed
from the crime scene area. At this point in the investigation, the states of any electronic devices are
not altered.
• Search warrant for search and seizure
• Plan the search and seizure

• Conduct the initial search of the scene
• Health and safety issues

2. Conducting preliminary interviews: All personnel, subjects, or any others at the crime scene are
identified. Their position at the time of entry and their reasons for being at the crime scene are
recorded.

• Ask questions
• Check consent issues
• Witness signatures
• Initial interviews

3. Documenting the electronic crime scene: Documentation of the electronic crime scene is a
continuous process during the investigation, creating a permanent record of the scene.

• Photograph the scene
• Sketch the scene

4. Collecting and preserving the electronic evidence: Electronic evidence is volatile in nature and
easily broken, so particular precautions must be taken to prevent damage.

• Collect evidence
• Deal with powered-off or powered-on computers at the time of seizure
• Seize portable computers
• Preserve the electronic evidence

5. Packaging the electronic evidence: All evidence should be well documented, and all containers
should be properly labeled and numbered.

6. Transporting the electronic evidence: Special precautions must be taken while transporting
electronic evidence. Make sure that proper transportation procedures are followed to avoid physical
damage.

• Ensure proper handling and transportation to the forensic laboratory
• Ensure the chain of custody is strictly followed

Securing and Evaluating the Electronic Crime Scene

The following checklist should be followed when securing and evaluating an electronic crime scene:

• Follow the policies of the legal authority for securing the crime scene.

• Verify the type of incident.

• Make sure that the scene is safe for the responders.

• Isolate other persons who are present at the scene.

• Locate and help the victim.

• Verify any data that is related to the offense.

• Transmit additional flash messages to other responding units.

• Request additional help at the scene if needed.

• Establish a security perimeter to see if the offenders are still present at the crime scene area.

• Protect evidence that is at risk of being easily lost.

• Protect perishable data held on devices such as pagers and caller ID boxes.

• Make sure that the devices that contain perishable data are secured, documented, and
photographed.

• Find the telephone lines that are connected to devices such as modems and caller ID boxes.

• Document, disconnect, and label telephone lines and network cables.

• Observe the present situation at the scene and record observations.

• Protect physical evidence or hidden fingerprints that may be found on keyboards, mice, diskettes,
and CDs.

Warrant for Search and Seizure

The investigating officer or first responder must perform the investigation process in a lawful
manner, which means a search warrant is required for search and seizure. The following are the two
types of relevant search warrants:

• Electronic storage device search warrant: This allows for search and seizure of computer
components such as the following:

• Hardware

• Software

• Storage devices

• Documentation

• Service provider search warrant: If the crime is committed through the Internet, the first
responder needs information about the victim’s computer from the service provider. A service
provider search warrant allows the first responder to get this information. The first responder can
get the following information from the service provider:

• Service records

• Billing records

• Subscriber information

Planning the Search and Seizure

A search and seizure plan should contain the following details:

• Description of the incident

• Incident manager

• Case name or title for the incident

• Location of the incident

• Applicable jurisdiction and relevant legislation

• Location of the equipment to be seized:

• Structure’s type and size

• Where the computers are located

• Who was present at the incident

• Whether the location is potentially dangerous


• Details of what is to be seized (make, model, location, ID, etc.):

• Types

• Serial numbers

• Whether the computers to be seized were running or powered down

• Whether the computers were networked, and if so, what type of network, where data is stored on
the network, where the backups are held, whether the system administrator is cooperative, whether
it is necessary to take the server down, and the business impact of this action

• Other work to be performed at the scene (e.g., full search and evidence required)

• Search and seizure type (overt/covert)

• Local management involvement

Initial Search of the Scene

Once the forensic team has arrived at the scene and unloaded their equipment, they will move to
the location of the incident and try to identify any evidence. A perpetrator may attempt to use a
self-destruct program or reformat the storage media upon the arrival of the team. If a suspected
perpetrator is using the system, an investigator should pull the power cord immediately.

An investigator should isolate the computer system (whether it is a workstation, a standalone
system, or a network server) or other forms of media so that digital evidence will not be lost. In
many cases computer systems are backed up on a regular basis. If attackers erase files from the
primary storage device, these files may still remain on the backup storage media.

Health and Safety Issues

In order to protect the staff and preserve evidence such as fingerprints, investigators should follow
these health and safety precautions:

• All elements of an agency’s health and safety plan should be clearly documented.

• Health and safety considerations should be followed at all stages of the investigation by
everyone involved.

• The health and safety program should be frequently monitored and documented by designated
agency representatives.
• All forensic teams should wear protective latex gloves for all search and seizure on-site operations.

Conducting Preliminary Interviews

Questions to Ask When a Client Calls the Forensic Investigator

When a client first calls the investigator, the investigator should ask the following questions:
• What happened?

• Who is the incident manager?

• What is the case name or title for the incident?

• What is the location of the incident?

• Under what jurisdiction are the case and seizure to be performed?

• What is to be seized (make, model, location, and ID)?

• What other work will need to be performed at the scene (e.g., full search and evidence required)?

• Is the search and seizure to be overt or covert, and will local management be informed?

Consent

A properly worded banner displayed at login and an acceptable-use policy informing users of
monitoring activities and how any collected information will be used will satisfy the consent burden
in the majority of cases. There are instances when the user is present and consent from the user is
required. It should never be taken as generally acceptable for system administrators to conduct
unplanned and random monitoring activities. In cases such as this, appropriate forms for the
jurisdiction should be used and must be carried in the first responder toolkit. Monitoring activities
should be a part of a well-documented procedure that is clearly detailed in the obtained consent.

Witness Signatures

Depending on the legislation in the jurisdiction, a signature (or two) may or may not be required to
certify collection of evidence. Typically, one witness signature is required if it is the forensic analyst
or law enforcement officer performing the seizure. Where two are required, guidance should be
sought to determine who the second signatory should be.

The witness signature verifies that the information in the consent form and other written documents
was correctly explained to, and supposedly understood by, the signatory or the signatory’s legally
authorized representative, and that informed consent was given freely. Whoever signs as a witness
must have a clear understanding of that role and may be called upon to provide a witness statement
or attend court proceedings.

Conducting Preliminary Interviews

When preparing a case, computer forensic professionals (CFPs) start their investigation by collecting
evidence and conducting preliminary interviews. As a part of their preliminary investigation, they
talk to everyone present at the site at the time of the offense. After identifying the persons present
at the time of the crime, the CFPs conduct individual interviews and note everyone’s physical
position and his or her reason for being there.

As part of the investigation process, the CFP first determines whether the suspect has committed a
crime or has violated any departmental policies. Usually, departments establish certain policies
regarding the usage of computers. Adhering to departmental policies and applicable laws, the CFP
gathers evidence and collects information from individuals, such as the following:

• Actual holders or users of any electronic devices present at the crime scene

• Usernames and Internet service providers

• Passwords required to access the system, software, or data

• Purpose of using the system

• Unique security schemes or destructive devices

• Any off-site data storage

• Hardware and software documents

If the evidence the CFP gathers suggests that the suspect has committed a crime, the evidence will
be presented in court. If the evidence suggests that the suspect has breached company policy, the
CFP will hand over the evidence at the corporate inquiry.

A CFP should keep the following points in mind during preliminary interviews:

• Identify all persons present.

• If the suspect is present at the time of the search and seizure, the incident manager or the
laboratory manager may consider asking some questions. However, they must comply with the
relevant human resources or legislative guidelines for their jurisdiction.

• During an initial interview suspects are often taken off guard, having been given little time to
create a false story. This means that they will often answer questions such as, “What are the
passwords for the account?” truthfully.

• If the system administrator is present at the time of the initial interview, he or she may help
provide important information such as how many systems are involved, who is associated with a
particular account, and what the relevant passwords are.

• A person having physical custody of evidence is responsible for the safety and security of that
evidence.

• Whenever possible, evidence must be secured in such a way that only a person with complete
authority is allowed access. Typical questions could include the following:

• Are there any physical keys to the system?

• What are the users’ IDs and passwords?

• What e-mail addresses are in use? What are the users’ IDs and passwords for them?

Documenting the Electronic Crime Scene


Documentation of the electronic crime scene is a continuous process during the investigation that
makes a permanent record of the scene. When documenting, an investigator should keep the
following points in mind:

• It is essential to properly take note of the physical location and states of computers, digital storage
media, and other electronic devices.

• Document the physical crime scene, noting the position of the mouse and the location of elements
found near the system.

• Document details of any related or difficult-to-find electronic components.

• Record the state of computer systems, digital storage media, and electronic devices, including the
power status of the computer.

• Take a photograph of the computer monitor’s screen and note what was on the screen.

• The crime scene should be documented in detail and comprehensively at the time of the
investigation.

Photographing the Scene

On arrival, the first step taken by the forensic team should be to photograph the scene. It is very
important that this be done in a way that will not alter or damage the scene, and everything should
be clearly visible.

The best course of action is to take various photographs of the crime scene. For example, an
investigator should first take a photograph of the building and/or office number. This should be
followed by an entry photograph (what is seen as one enters the crime scene) and then by a series
of 360-degree photographs. These are overlapping photographs depicting the entire crime scene. It
is important to proceed all the way from the entire scene down to the smallest piece of evidence.
Crime scene photographs should be taken of the work area, including things such as computer disks,
handwritten documents, and other components of the system. Photos should also be taken of the
back of the computer system to accurately show how cables are linked. If this cannot be done on-
site, then all cables must be labeled so the computer system can be reconnected at the forensic
laboratory and photographed.

Sketching the Scene

After securing the scene, the CFP has to prepare a sketch of the crime scene. This sketch should
include all details about the objects present and their locations within the office area. As with
photographs, forensic professionals prepare many sketches of the complete scene, all the way down
to the smallest piece of evidence.

Collecting and Preserving Electronic Evidence

When an incident is reported in which a computer is thought to have played a part, it is a common
mistake to seize that computer as the first and only item. The crime scene should be investigated
in a way that covers the entire area, with the computer at the center of the search.

All collected evidence should be marked clearly so that it can be easily identified later. Pieces of
evidence found at the crime scene should be first photographed, identified within documents, and
then properly gathered.

Markings on the evidence should, at the very least, include date and time of collection and the
initials of the collecting person. Evidence should be identified, recorded, seized, bagged, and tagged
on-site, with no attempts to determine contents or status.
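The minimum marking described above (date and time of collection plus the collector's initials) can be generated consistently rather than written freehand. The following Python sketch assumes a hypothetical exhibit-number format and a UTC timestamp convention; neither comes from the text, and local procedure may require a different layout.

```python
from datetime import datetime, timezone


def evidence_label(exhibit_number: str, collector_initials: str,
                   collected_at: datetime) -> str:
    """Build a one-line evidence marking.

    Assumption: times are normalized to UTC so labels from different
    responders can be compared directly.
    """
    if collected_at.tzinfo is None:
        # A naive timestamp is ambiguous, so refuse it instead of guessing.
        raise ValueError("collected_at must be timezone-aware")
    stamp = collected_at.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"{exhibit_number} | {stamp} | {collector_initials.upper()}"
```

For example, evidence_label("EX-001", "ab", datetime(2024, 1, 2, 3, 4, tzinfo=timezone.utc)) returns "EX-001 | 2024-01-02 03:04 UTC | AB".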

Order of Volatility

Volatility is the measure of how perishable electronically stored data are. When collecting evidence,
the order of collection should proceed from the most volatile to the least volatile. The following list
is the order of volatility for a typical system, beginning with the most volatile:

1. Registers and cache
2. Routing table, process table, kernel statistics, and memory
3. Temporary file systems
4. Disks or other storage media
5. Remote logging and monitoring data that is related or significant to the system in question
6. Physical configuration and network topology
7. Archival media
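The order of volatility above can be treated as a ranking that decides which evidence source to collect first. The following Python sketch encodes the seven levels and sorts a set of sources accordingly. The enum member names and the sample source names are hypothetical labels chosen for illustration.

```python
from enum import IntEnum


class Volatility(IntEnum):
    """Lower value means more volatile, so it is collected earlier."""
    REGISTERS_AND_CACHE = 1
    MEMORY_AND_KERNEL_TABLES = 2   # routing table, process table, kernel statistics, memory
    TEMPORARY_FILE_SYSTEMS = 3
    DISKS = 4
    REMOTE_LOGS = 5
    PHYSICAL_CONFIGURATION = 6     # includes network topology
    ARCHIVAL_MEDIA = 7


def collection_order(sources: dict) -> list:
    """Return source names sorted from most volatile to least volatile.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(sources, key=lambda name: (sources[name], name))


# Hypothetical sources found at a scene.
found = {
    "backup tape": Volatility.ARCHIVAL_MEDIA,
    "RAM dump": Volatility.MEMORY_AND_KERNEL_TABLES,
    "hard disk": Volatility.DISKS,
}
print(collection_order(found))
```

With these sample inputs the memory dump is listed first, then the hard disk, then the backup tape.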

Dealing with Powered-Off Computers

At this point in the investigation, an investigator should not change the state of any electronic
devices or equipment. If a computer is switched off, the investigator should leave it off and take
it into evidence.

Dealing with Powered-On Computers

When dealing with a powered-on computer, the investigator should stop and think before taking any
action. The contents of RAM may contain vital information. For example, data that is encrypted on
the hard disk may be unencrypted in RAM. Also, running process information is stored in RAM. All of
this vital information will be lost when the computer is shut down or when the power supply is
removed.

If a computer is switched on and the screen is viewable, the investigator should photograph the
screen and document the running programs. If a computer is on and the monitor shows a
screensaver, the investigator should move the mouse slowly without pressing any mouse button,
and then photograph and document the programs.

Dealing with a Networked Computer

If the victim’s computer is connected to the Internet, the first responder must follow this procedure
in order to protect the evidence:

• Unplug the network cable from the router and modem in order to prevent further attacks.

• Do not use the computer for the evidence search because it may alter or change the integrity of
existing evidence.

• Photograph all devices connected to the victim’s computer, particularly the router and modem,
from several angles. If any devices, such as a printer or scanner, are present near the computer, take
photographs of those devices as well.

• If a screensaver is visible, move the mouse slowly.

• If the computer is on, take a photograph of the screen and document any running programs.

• Unplug all cords and devices connected to the computer and label them for later identification.

• Unplug the main power cord from the wall socket.

• Pack the collected electronic evidence properly and place it in a static-free bag.

• Keep the collected evidence away from magnets, high temperatures, radio transmitters, and other
elements that may damage the integrity of the evidence.

• Document all steps involved in searching and seizing the victim’s computer for later investigation.

Dealing with Open Files and Startup Files

When malware attacks a computer system, some files are created in the startup folder to run the
malware program. The first responder can get vital information from these files by following this
procedure:

• Open any recently created documents from the startup or system32 folder in Windows and the
rc.local file in Linux.

• Document the date and time of the files.

• Examine the open files for sensitive data such as passwords or images.

• Search for unusual MAC (modified, accessed, or changed) times on vital folders and startup files.

• Use the dir command for Windows or the ls command for Linux to locate the actual access times
on the files and folders.
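As a complement to the dir and ls commands mentioned above, the MAC times of a file can also be read programmatically. The following Python sketch uses the standard os.stat fields; the function name is hypothetical. Two caveats apply: it is intended for a working copy or a read-only mounted image rather than the original evidence, and st_ctime means "metadata changed" on UNIX/Linux but "created" on Windows.

```python
from datetime import datetime, timezone
from pathlib import Path


def mac_times(path: Path) -> dict:
    """Return the modified, accessed, and changed times of a file in UTC."""
    info = path.stat()

    def iso(timestamp: float) -> str:
        # Convert seconds since the epoch to an ISO 8601 string in UTC.
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "modified": iso(info.st_mtime),
        "accessed": iso(info.st_atime),
        "changed": iso(info.st_ctime),  # creation time on Windows
    }
```

Comparing these values across startup files can highlight entries whose times cluster around the suspected moment of compromise.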

Operating System Shutdown Procedure

It is important to shut down the system in a manner that will not damage the integrity of any files.
Different operating systems have different shutdown procedures. Some operating systems can be
shut down by simply unplugging the power cord from the wall socket, while others have a more
elaborate shutdown procedure that must be followed, as detailed below:

MS-DOS/Windows 3.x/Windows 9x, Windows NT, Windows XP, Windows Vista, Windows 7:

• Take a photograph of the screen.

• Document any running programs.

• Unplug the power cord from the wall socket.

UNIX/Linux:

• Right-click on Menu and click Console.

• If the root user is logged in, enter the password and type sync;sync;halt to shut down the system.

• If the root user is not logged in and the password is available, type su to switch to the root user,
enter the password, and type sync;sync;halt to shut down the system.

• If the password is not available, unplug the power cord from the wall socket.

Mac OS:

• Record the time from the menu bar.

• Click Special and then Shut Down.

• Unplug the power cord from the wall socket.
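The per-system procedures above amount to a small decision table. The following Python sketch restates them as a lookup so that the correct steps can be retrieved on-site. The function name, the family labels, and the root_access flag are hypothetical, and the steps are only those given in the text.

```python
def shutdown_steps(os_family: str, root_access: bool = False) -> list:
    """Return the shutdown steps listed in the text for an operating system family.

    root_access means the root user is logged in or the root password is known.
    """
    family = os_family.strip().lower()
    pull_plug = "Unplug the power cord from the wall socket"
    if family == "windows":  # MS-DOS through Windows 7 in the text
        return ["Photograph the screen", "Document running programs", pull_plug]
    if family in ("unix", "linux"):
        if root_access:
            return ["Open a console", "Run: sync;sync;halt"]
        return [pull_plug]
    if family == "mac os":
        return ["Record the time from the menu bar",
                "Click Special, then Shut Down", pull_plug]
    raise ValueError(f"no documented procedure for {os_family!r}")
```

Raising an error for an unlisted system is a deliberate choice in this sketch: the responder should seek guidance rather than fall back on a default.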

Preserving Electronic Evidence

The following are the steps that should be taken to preserve electronic evidence:

• Document the actions and changes observed in the monitor, system, printer, and other
electronic devices.

• Verify whether the monitor is on, off, or in sleep mode.

• Remove the power cable if the device is off. Do not turn the device on.

• Take a photo of the monitor screen if the device is on.

• Check dial-up, cable, ISDN, and DSL connections.

• Remove the power cord from the router or modem.

• Remove any floppy disks that are available at the scene to safeguard the potential evidence.

• Keep tape on drive slots and power connectors.

• Photograph the connections between the computer system and related cables, and label them
individually.

• Label every connector and cable connected to peripheral devices.

Seizing Portable Computers

• Photograph the computer and connected equipment.

• Record which cables are connected to which ports.

• Photograph the connectors at the back of the computer and individually label them.

• Remove the battery.

Packaging and Transporting Electronic Evidence

Evidence Bag Contents List

The panel on the front of evidence bags must, at the very least, contain the following details:

• Date and time of seizure

• Investigator who seized the evidence

• Names of the officers who took photographs or prepared a sketch

• Exhibit number

• Where the evidence was seized from

• Sites where individual items were found

• Names of the suspected persons

• A short summary of the details of the seizure

• Details of the contents of the evidence bag

Packaging Electronic Evidence

Investigators should keep these items in mind when packaging electronic evidence:

• Make sure the gathered electronic evidence is correctly documented, labeled, and listed
before packaging.

• Pay special attention to hidden or trace evidence, and take the necessary actions to safeguard it.

• Pack magnetic media in antistatic packaging.

• Do not use materials such as plastic bags for packaging because they may produce static electricity.

• Avoid folding and scratching storage devices such as diskettes, CD-ROMs, and tapes.

• Make sure that all containers that contain evidence are labeled in the appropriate way.

Chain of Custody

The CFP must follow the correct chain of custody when documenting a case. The chain of custody is
a written description created by individuals who are responsible for the evidence from the beginning
until the end of the case. The chain of custody form is easy to use. The individual who takes
ownership of a piece of evidence has the responsibility to safeguard and preserve it so that it can be
later used for legal inquiry.

Chain of Custody Documentation

A chain of custody document contains the following information about the obtained evidence:

• Case number

• Name, title, address, and telephone number of the person from whom the evidence was received

• Location where obtained

• Reason for evidence being obtained

• Date/time evidence was obtained

• Item number/quantity/description

• Name of the evidence

• Color

• Manufacturing company name

• Marking information

• Packaging information
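A chain of custody document with the fields listed above can be modeled as a record plus an ordered log of hand-overs. The following Python sketch is one possible layout, not a standard form: the class names, the collected_by field (the first custodian), and the rule that transfers must be in chronological order are assumptions made for illustration, and only a subset of the listed fields is included.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class CustodyTransfer:
    released_by: str
    received_by: str
    transferred_at: datetime
    reason: str


@dataclass
class CustodyRecord:
    case_number: str
    item_number: str
    description: str
    received_from: str        # person from whom the evidence was received
    location_obtained: str
    reason_obtained: str
    obtained_at: datetime
    collected_by: str         # assumption: the first custodian
    transfers: list = field(default_factory=list)

    def current_custodian(self) -> str:
        """Return whoever holds the evidence after the latest transfer."""
        return self.transfers[-1].received_by if self.transfers else self.collected_by

    def hand_over(self, received_by: str, transferred_at: datetime, reason: str) -> None:
        """Append a transfer, rejecting one dated before the previous event."""
        last = self.transfers[-1].transferred_at if self.transfers else self.obtained_at
        if transferred_at < last:
            raise ValueError("transfer time precedes the previous custody event")
        self.transfers.append(
            CustodyTransfer(self.current_custodian(), received_by, transferred_at, reason)
        )
```

Keeping each transfer immutable (frozen=True) reflects the idea that a custody entry, once written, is not edited; corrections would be added as new entries.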

First Responder Common Mistakes

Often, when a computer crime incident occurs, the system or network administrator assumes the
role of the first responder at the crime scene. The system or network administrator might not know
the standard first responder procedure or have a complete knowledge of forensic investigation, so
he or she might make the following common mistakes:

• Shutting down or rebooting the victim’s computer. In this case, all volatile data is lost. The
processes that are running on the victim’s computer are also lost.

• Assuming that some components of the victim’s computer may be reliable and usable. In this case,
using some commands on the victim’s computer may activate Trojans, malware, and time bombs
that delete vital data.

• Not having access to baseline documentation about the victim’s computer.

• Not documenting the data collection process.

Chapter 7: Emerging Cyber Concept

Cloud Computing

Simply put, cloud computing is the delivery of computing services—servers, storage, databases,
networking, software, analytics and more—over the Internet (“the cloud”). Companies offering
these computing services are called cloud providers and typically charge for cloud computing
services based on usage, similar to how you are billed for water or electricity at home.

Advantages

Cloud computing is a big shift from the traditional way businesses think about IT resources. Here
are six common reasons organizations are turning to cloud computing services:

1. Cost

Cloud computing eliminates the capital expense of buying hardware and software and setting up and
running on-site datacenters—the racks of servers, the round-the-clock electricity for power and
cooling, the IT experts for managing the infrastructure. It adds up fast.

2. Speed

Most cloud computing services are provided self service and on demand, so even vast amounts of
computing resources can be provisioned in minutes, typically with just a few mouse clicks, giving
businesses a lot of flexibility and taking the pressure off capacity planning.

3. Global scale

The benefits of cloud computing services include the ability to scale elastically. In cloud speak, that
means delivering the right amount of IT resources (for example, more or less computing power,
storage, or bandwidth) right when it is needed and from the right geographic location.

4. Productivity

On-site datacenters typically require a lot of “racking and stacking”—hardware set up, software
patching and other time-consuming IT management chores. Cloud computing removes the need for
many of these tasks, so IT teams can spend time on achieving more important business goals.

5. Performance

The biggest cloud computing services run on a worldwide network of secure datacenters, which are
regularly upgraded to the latest generation of fast and efficient computing hardware. This offers
several benefits over a single corporate datacenter, including reduced network latency for
applications and greater economies of scale.

6. Reliability

Cloud computing makes data backup, disaster recovery and business continuity easier and less
expensive, because data can be mirrored at multiple redundant sites on the cloud provider’s
network.
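The elastic scaling described under "Global scale" above comes down to simple arithmetic: provision just enough capacity for the current load, within agreed limits. The following Python sketch shows that calculation. The function name, the request-rate figures, and the minimum and maximum bounds are hypothetical and do not reflect any specific provider's autoscaling service.

```python
import math


def instances_needed(requests_per_second: float, capacity_per_instance: float,
                     minimum: int = 1, maximum: int = 100) -> int:
    """Return how many identical servers are needed for the current load."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up: a fractional server still has to be a whole server.
    wanted = math.ceil(requests_per_second / capacity_per_instance)
    # Clamp to the configured floor and ceiling.
    return max(minimum, min(maximum, wanted))
```

For instance, a load of 950 requests per second on servers that each handle 200 gives instances_needed(950, 200) == 5, and the count falls back to the minimum when the load drops to zero.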

Types of cloud services: IaaS, PaaS, SaaS

Most cloud computing services fall into three broad categories: infrastructure as a service (IaaS),
platform as a service (PaaS) and software as a service (SaaS). These are sometimes called the cloud
computing stack, because they build on top of one another. Knowing what they are and how they
are different makes it easier to accomplish your business goals.

Infrastructure-as-a-service (IaaS)

IaaS is the most basic category of cloud computing services. With IaaS, you rent IT infrastructure
from a cloud provider on a pay-as-you-go basis: servers and virtual machines (VMs), storage,
networks and operating systems.

Platform as a service (PaaS)

Platform-as-a-service (PaaS) refers to cloud computing services that supply an on-demand
environment for developing, testing, delivering and managing software applications. PaaS is
designed to make it easier for developers to quickly create web or mobile apps, without worrying
about setting up or managing the underlying infrastructure of servers, storage, network and
databases needed for development.

Software as a service (SaaS)

Software-as-a-service (SaaS) is a method for delivering software applications over the Internet, on
demand and typically on a subscription basis. With SaaS, cloud providers host and manage the
software application and underlying infrastructure and handle any maintenance, like software
upgrades and security patching. Users connect to the application over the Internet, usually with a
web browser on their phone, tablet or PC.

Types of cloud deployments: public, private, hybrid

Not all clouds are the same. There are three different ways to deploy cloud computing resources:
public cloud, private cloud and hybrid cloud.

Public cloud

Public clouds are owned and operated by third-party cloud service providers, which deliver their
computing resources, such as servers and storage, over the Internet. Microsoft Azure is an example of a
public cloud. With a public cloud, all hardware, software and other supporting infrastructure is
owned and managed by the cloud provider. You access these services and manage your account
using a web browser.

Private cloud

A private cloud refers to cloud computing resources used exclusively by a single business or
organization. A private cloud can be physically located in the company’s on-site datacenter. Some
companies also pay third-party service providers to host their private cloud. A private cloud is one in
which the services and infrastructure are maintained on a private network.

Hybrid cloud

Hybrid clouds combine public and private clouds, bound together by technology that allows data and
applications to be shared between them. By allowing data and applications to move between private
and public clouds, hybrid cloud gives businesses greater flexibility and more deployment options.

How cloud computing works

Cloud computing services all work a little differently, depending on the provider. But many provide a
friendly, browser-based dashboard that makes it easier for IT professionals and developers to order
resources and manage their accounts. Some cloud computing services are also designed to work
with REST APIs and a command-line interface (CLI), giving developers multiple options.

Solid State Devices, Flash Memory

Flash memory is a solid-state chip that maintains stored data without any external power
source. It is commonly used in portable electronics and removable storage devices, and to
replace computer hard drives.

In computer lingo, there's a difference between memory and storage. Random-access memory, or
RAM (or simply memory), holds the program a computer is executing, as well as any data. Like a
person's short-term memory, RAM is fleeting and requires power to do its job. Storage, on the other
hand, holds all the stuff of your digital life -- apps, files, photos and music. It retains that stuff even if
the power is switched off. Both RAM and storage boast their capacity based on the number of bytes
they can hold. For a modern computer, RAM typically comes in 4, 6 or 8 gigabytes. Storage can have
almost 100 times more capacity -- the hard drive of a typical laptop, for example, can hold 500
gigabytes.

Here's where it gets a little sticky. Some storage devices have what's referred to as flash memory, a
confusing term that blurs the line between RAM and storage. Devices with flash memory still hold
lots of info, and they do it whether the power's on or not. But unlike hard drives, which contain
spinning platters and turntable-like arms bearing read-write heads, flash-memory devices have no
mechanical parts. They're built from transistors and other components you'd find on a computer
chip. As a result, they enjoy a label -- solid state -- reserved for devices that take advantage of
semiconductor properties.

Here are a few examples of flash memory:

• Your computer's BIOS chip
• CompactFlash (most often found in digital cameras)
• SmartMedia (most often found in digital cameras)
• Memory Stick (most often found in digital cameras)
• PCMCIA Type I and Type II memory cards (used as solid-state disks in laptops)
• Memory cards for video game consoles

Flash memory is a type of EEPROM chip, which stands for Electrically Erasable Programmable
Read-Only Memory. It has a grid of columns and rows with a cell that has two transistors at each
intersection.

The two transistors are separated from each other by a thin oxide layer. One of the transistors is
known as a floating gate, and the other one is the control gate. The floating gate's only link to
the row, or word line, is through the control gate. As long as this link is in place, the cell has a
value of 1. To change the value to a 0 requires a curious process called Fowler-Nordheim
tunneling.
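
The cell behaviour described above can be pictured with a toy model. This is only a sketch: the class and method names are made up for illustration, and the whole-block erase is an assumption that goes beyond the text (flash chips generally reset many cells to 1 at once).

```python
class FlashBlock:
    """Toy model of one flash block: a grid of one-bit cells."""

    def __init__(self, rows, cols):
        # An untouched cell reads as 1 (the floating-gate link is in place).
        self.cells = [[1] * cols for _ in range(rows)]

    def program(self, row, col):
        # Programming (tunneling, in a real chip) can only turn a 1 into a 0.
        self.cells[row][col] = 0

    def erase(self):
        # Assumed behaviour: erasing restores every cell in the block to 1.
        for row in self.cells:
            for col in range(len(row)):
                row[col] = 1

    def read_row(self, row):
        return list(self.cells[row])

block = FlashBlock(rows=2, cols=4)
block.program(0, 1)
block.program(0, 3)
print(block.read_row(0))  # [1, 0, 1, 0]
block.erase()
print(block.read_row(0))  # [1, 1, 1, 1]
```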

RAID Configurations

RAID

RAID is a technology that is used to increase the performance and/or reliability of data storage. The
abbreviation stands for Redundant Array of Inexpensive Disks (also read as Redundant Array of
Independent Disks). A RAID system consists of two or more drives working in parallel. These can be
hard disks, but there is a trend to also use the technology for SSDs (solid-state drives). There are
different RAID levels, each optimized for a specific
situation. These are not standardized by an industry group or standardization committee. This
explains why companies sometimes come up with their own unique numbers and implementations.
This section covers the following RAID levels:

• RAID 0 – striping
• RAID 1 – mirroring
• RAID 5 – striping with parity
• RAID 6 – striping with double parity
• RAID 10 – combining mirroring and striping

The software to perform the RAID-functionality and control the drives can either be located on a
separate controller card (a hardware RAID controller) or it can simply be a driver. Some versions of
Windows, such as Windows Server 2012 as well as Mac OS X, include software RAID functionality.
Hardware RAID controllers cost more than pure software, but they also offer better performance,
especially with RAID 5 and 6.

RAID systems can be used with a number of interfaces, including SCSI, IDE, SATA or FC (Fibre
Channel). There are systems that use SATA disks internally, but that have a FireWire or SCSI interface
for the host system.

Sometimes disks in a storage system are defined as JBOD, which stands for ‘Just a Bunch Of Disks’.
This means that those disks do not use a specific RAID level and act as stand-alone disks. This is
often done for drives that contain swap files or spooling data.

Below is an overview of the most popular RAID levels:

RAID level 0 – Striping

In a RAID 0 system data are split up into blocks that get written across all the drives in the array. By
using multiple disks (at least 2) at the same time, this offers superior I/O performance. This
performance can be enhanced further by using multiple controllers, ideally one controller per disk.
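
A minimal sketch of striping, assuming a simple round-robin layout and using Python lists to stand in for drives (the function names and block size are illustrative, not taken from any particular controller):

```python
def stripe(data, num_drives, block_size):
    """Split data into blocks and deal them round-robin across the drives."""
    drives = [[] for _ in range(num_drives)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for index, block in enumerate(blocks):
        drives[index % num_drives].append(block)
    return drives

def unstripe(drives):
    """Reassemble the data by reading the drives in the same round-robin order."""
    blocks = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                blocks.append(drive[row])
    return b"".join(blocks)

drives = stripe(b"ABCDEFGHIJ", num_drives=2, block_size=2)
print(drives)            # [[b'AB', b'EF', b'IJ'], [b'CD', b'GH']]
print(unstripe(drives))  # b'ABCDEFGHIJ'
```

Because consecutive blocks sit on different drives, both drives can work on one large read or write at the same time. The same layout also shows why losing either list makes the data unrecoverable.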

Advantages

• RAID 0 offers great performance, both in read and write operations. There is no overhead
caused by parity controls.
• All storage capacity is used; there is no overhead.
• The technology is easy to implement.

Disadvantages

• RAID 0 is not fault-tolerant. If one drive fails, all data in the RAID 0 array are lost. It should
not be used for mission-critical systems.

Ideal use

RAID 0 is ideal for non-critical storage of data that have to be read/written at a high speed, such as
on an image retouching or video editing station.

If you want to use RAID 0 purely to combine the storage capacity of two drives in a single volume,
consider mounting one drive in the folder path of the other drive. This is supported in Linux, OS X as
well as Windows and has the advantage that a single drive failure has no impact on the data of the
second disk or SSD drive.

RAID level 1 – Mirroring

Data are stored twice by writing them to both the data drive (or set of data drives) and a mirror
drive (or set of drives). If a drive fails, the controller uses either the data drive or the mirror drive for
data recovery and continues operation. You need at least 2 drives for a RAID 1 array.
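
The mirroring behaviour can be sketched as follows. This is a toy model with hypothetical names, in which each drive is a Python dictionary mapping block numbers to data:

```python
class Mirror:
    """Toy RAID 1 pair: every write goes to both drives."""

    def __init__(self):
        self.drives = [{}, {}]          # one dict per drive: block number -> data
        self.failed = [False, False]

    def write(self, block_no, data):
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                drive[block_no] = data

    def read(self, block_no):
        # Serve the read from the first drive that is still healthy.
        for i, drive in enumerate(self.drives):
            if not self.failed[i]:
                return drive[block_no]
        raise IOError("both drives have failed")

    def fail(self, i):
        self.failed[i] = True
        self.drives[i] = {}

    def replace(self, i):
        # Rebuilding is a plain copy from the surviving drive.
        self.drives[i] = dict(self.drives[1 - i])
        self.failed[i] = False

pair = Mirror()
pair.write(0, b"ledger")
pair.fail(0)
print(pair.read(0))   # b'ledger'
pair.replace(0)
print(pair.drives[0] == pair.drives[1])  # True
```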

Advantages

• RAID 1 offers excellent read speed and a write speed that is comparable to that of a single
drive.
• In case a drive fails, data do not have to be rebuilt; they just have to be copied to the
replacement drive.
• RAID 1 is a very simple technology.

Disadvantages

• The main disadvantage is that the effective storage capacity is only half of the total drive
capacity because all data get written twice.
• Software RAID 1 solutions do not always allow a hot swap of a failed drive. That means the
failed drive can only be replaced after powering down the computer it is attached to. For
servers that are used simultaneously by many people, this may not be acceptable. Such
systems typically use hardware controllers that do support hot swapping.

Ideal use

RAID 1 is ideal for mission-critical storage, for instance for accounting systems. It is also suitable for
small servers in which only two data drives will be used.

RAID level 5 – Striping with parity

RAID 5 is the most common secure RAID level. It requires at least 3 drives but can work with up to
16. Data blocks are striped across the drives, and for each stripe a parity checksum of the block data
is written to one drive. The parity data are not written to a fixed drive; they are spread across all
drives. Using the parity data, the computer can recalculate the data of one of the
other data blocks, should that data no longer be available. That means a RAID 5 array can withstand
a single drive failure without losing data or access to data. Although RAID 5 can be achieved in
software, a hardware controller is recommended. Often extra cache memory is used on these
controllers to improve the write performance.
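
The parity idea can be shown with a few bytes. The sketch below assumes the parity is a byte-wise XOR of the data blocks in a stripe, which is the usual RAID 5 scheme. The function names and the simple rotation rule are illustrative only; real controllers use various parity layouts.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def parity_drive(stripe_no, num_drives):
    """One possible rotation: the parity block moves one drive over per stripe."""
    return stripe_no % num_drives

data_blocks = [b"\x0f\xa0", b"\x33\x11", b"\xc4\x5e"]
parity = xor_blocks(data_blocks)

# Suppose the drive holding block 1 fails: XOR the survivors with the parity.
recovered = xor_blocks([data_blocks[0], data_blocks[2], parity])
print(recovered == data_blocks[1])  # True
```

XOR works here because applying it twice with the same value cancels out, so any single missing block equals the XOR of everything that is left.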

Advantages

• Read data transactions are very fast while write data transactions are somewhat slower (due
to the parity that has to be calculated).
• If a drive fails, you still have access to all data, even while the failed drive is being replaced
and the storage controller rebuilds the data on the new drive.

Disadvantages

• Drive failures have an effect on throughput, although this is still acceptable.
• This is complex technology. If one of the disks in an array using 4 TB disks fails and is
replaced, restoring the data (the rebuild time) may take a day or longer, depending on the
load on the array and the speed of the controller. If another disk goes bad during that time,
data are lost forever.
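
To see why a rebuild can take about a day, a back-of-the-envelope estimate helps. The rebuild rates below are assumptions chosen for illustration, not measurements:

```python
def rebuild_hours(drive_bytes, rebuild_bytes_per_second):
    """Hours needed to rewrite one whole drive at a steady rebuild rate."""
    return drive_bytes / rebuild_bytes_per_second / 3600

TB = 10**12
MB = 10**6

# Assumed rates: a busy array may only spare about 50 MB/s for rebuilding,
# while an idle one might sustain about 150 MB/s.
print(round(rebuild_hours(4 * TB, 50 * MB), 1))   # 22.2
print(round(rebuild_hours(4 * TB, 150 * MB), 1))  # 7.4
```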

Ideal use

RAID 5 is a good all-round system that combines efficient storage with excellent security and decent
performance. It is ideal for file and application servers that have a limited number of data drives.

RAID level 6 – Striping with double parity

RAID 6 is like RAID 5, but the parity data are written to two drives. That means it requires at least 4
drives and can withstand 2 drives dying simultaneously. The chances that two drives break down at
exactly the same moment are of course very small. However, if a drive in a RAID 5 system dies and
is replaced by a new drive, it takes hours or even more than a day to rebuild the swapped drive. If
another drive dies during that time, you still lose all of your data. With RAID 6, the RAID array will
even survive that second failure.
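
The text does not say how the second parity is computed. One widely used scheme, assumed here, keeps the XOR parity P from RAID 5 and adds a second parity Q in which block i is multiplied by 2^i in the finite field GF(2^8). The sketch below uses hypothetical function names and shows that two lost data blocks can then be recovered:

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the polynomial 0x11d."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    # Every non-zero element satisfies a^255 == 1, so a^254 is its inverse.
    return gf_pow(a, 254)

def pq_parity(blocks):
    """Return (P, Q) for equal-length blocks; block i is weighted by 2^i in Q."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        weight = gf_pow(2, i)
        for k, byte in enumerate(block):
            p[k] ^= byte
            q[k] ^= gf_mul(weight, byte)
    return bytes(p), bytes(q)

def recover_two(blocks, x, y, p, q):
    """Rebuild data blocks x and y (passed as None in blocks) from P and Q."""
    zeroed = [b if b is not None else bytes(len(p)) for b in blocks]
    p_rest, q_rest = pq_parity(zeroed)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    scale = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for k in range(len(p)):
        pd = p[k] ^ p_rest[k]   # equals Dx ^ Dy
        qd = q[k] ^ q_rest[k]   # equals gx*Dx ^ gy*Dy
        dx[k] = gf_mul(scale, qd ^ gf_mul(gy, pd))
        dy[k] = pd ^ dx[k]
    return bytes(dx), bytes(dy)

data = [b"cloud", b"flash", b"raid6"]
p, q = pq_parity(data)
print(recover_two([None, data[1], None], 0, 2, p, q))  # (b'cloud', b'raid6')
```

The extra field arithmetic for Q is the "additional parity data that have to be calculated" and is why RAID 6 writes cost more than RAID 5 writes.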

Advantages

• Like with RAID 5, read data transactions are very fast.
• If two drives fail, you still have access to all data, even while the failed drives are being
replaced. So RAID 6 is more secure than RAID 5.

Disadvantages

• Write data transactions are slower than RAID 5 due to the additional parity data that have to
be calculated. One report put the write performance 20% lower.
• Drive failures have an effect on throughput, although this is still acceptable.
• This is complex technology. Rebuilding an array in which one drive failed can take a long
time.

Ideal use
RAID 6 is a good all-round system that combines efficient storage with excellent security and decent
performance. It is preferable over RAID 5 in file and application servers that use many large drives
for data storage.

RAID level 10 – combining RAID 1 & RAID 0

It is possible to combine the advantages (and disadvantages) of RAID 0 and RAID 1 in one single
system. This is a nested or hybrid RAID configuration. It provides security by mirroring all data on
secondary drives while using striping across each set of drives to speed up data transfers.
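
A minimal sketch of how blocks might be placed in such a nested array, assuming drives are grouped into mirrored pairs and blocks are striped across the pairs (the numbering scheme and function name are hypothetical):

```python
def raid10_placement(block_no, num_pairs):
    """Return the two drive numbers that hold a given block.

    Drives are numbered so that pair k consists of drives 2k and 2k + 1.
    """
    pair = block_no % num_pairs        # striping: spread blocks over the pairs
    return (2 * pair, 2 * pair + 1)    # mirroring: both drives in the pair get a copy

for block_no in range(4):
    print(block_no, raid10_placement(block_no, num_pairs=2))
```

With two pairs (four drives), even-numbered blocks land on drives 0 and 1 and odd-numbered blocks on drives 2 and 3, so every block exists twice while consecutive blocks can be transferred in parallel.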

Advantages

• If something goes wrong with one of the disks in a RAID 10 configuration, the rebuild time is
very fast since all that is needed is copying all the data from the surviving mirror to a new
drive. This can take as little as 30 minutes for drives of 1 TB.

Disadvantages

• Half of the storage capacity goes to mirroring, so compared to large RAID 5 or RAID 6 arrays,
this is an expensive way to have redundancy.
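
The capacity trade-off can be made concrete with a small calculation. The sketch assumes an array of n equal-sized drives; the function name and the 8 x 4 TB example are illustrative:

```python
def usable_drives(level, n):
    """Number of drives' worth of usable space in an array of n equal drives."""
    if level == 0:
        return n            # no redundancy, everything is usable
    if level in (1, 10):
        return n // 2       # every block is stored twice
    if level == 5:
        return n - 1        # one drive's worth of parity
    if level == 6:
        return n - 2        # two drives' worth of parity
    raise ValueError("unsupported RAID level")

n, size_tb = 8, 4
for level in (5, 6, 10):
    usable = usable_drives(level, n) * size_tb
    print(f"RAID {level}: {usable} TB usable of {n * size_tb} TB")
```

For eight 4 TB drives this gives 28 TB with RAID 5, 24 TB with RAID 6 and 16 TB with RAID 10.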

What about RAID levels 2, 3, 4 and 7?

These levels do exist but are not that common. RAID 3 and RAID 4 resemble RAID 5, but the parity
data are always written to the same drive (RAID 3 stripes at byte level, RAID 4 at block level).
This is just a simple introduction to RAID systems.

RAID is no substitute for back-up!

All RAID levels except RAID 0 offer protection from a single drive failure. A RAID 6 system even
survives 2 disks dying simultaneously. For complete security, you still need to back up the data
from a RAID system.

• That back-up will come in handy if all drives fail simultaneously because of a power spike.
• It is a safeguard when the storage system gets stolen.
• Back-ups can be kept off-site at a different location. This can come in handy if a natural
disaster or fire destroys your workplace.
• The most important reason to back up multiple generations of data is user error. If someone
accidentally deletes some important data and this goes unnoticed for several hours, days or
weeks, a good set of back-ups ensures you can still retrieve those files.

Disclaimer: Different topics on this study material are referred from books and internet and credit
goes to author of those books/content. For any error/discrepancy, please contact me at
[email protected].
