Storage and Information Management (8 It 3)
Unit I
Introduction to Storage Technology: Data proliferation and the varying value of data
with time & usage, Sources of data and states of data creation, Data center
requirements and evolution to accommodate storage needs, Overview of basic storage
management skills and activities, The five pillars of technology, Overview of storage
infrastructure components, Evolution of storage, Information Lifecycle Management
concept, Data categorization within an enterprise, Storage and Regulations.
Unit II
Storage Systems Architecture: Intelligent disk subsystems overview, Contrast of
integrated vs. modular arrays, Component architecture of intelligent disk
subsystems, Disk physical structure components, properties, performance, and
specifications, Logical partitioning of disks, RAID & parity algorithms, hot sparing,
Physical vs. logical disk organization, protection, and back end management, Array
caching properties and algorithms, Front end connectivity and queuing properties,
Front end to host storage provisioning, mapping, and operation, Interaction of file
systems with storage, Storage system connectivity protocols.
Unit III
Introduction to Networked Storage: JBOD, DAS, SAN, NAS, & CAS evolution,
Direct Attached Storage (DAS) environments: elements, connectivity, & management,
Storage Area Networks (SAN): elements & connectivity, Fibre Channel principles,
standards, & network management principles, SAN management principles, Network
Attached Storage (NAS): elements, connectivity options, connectivity protocols (NFS,
CIFS, FTP), & management principles, IP SAN elements, standards (iSCSI, FCIP, iFCP),
connectivity principles, security, and management principles, Content Addressable
Storage (CAS): elements, connectivity options, standards, and management principles,
Hybrid Storage solutions overview including technologies like virtualization & appliances.
Unit IV
Introduction to Information Availability: Business Continuity and Disaster Recovery
Basics, Local business continuity techniques, Remote business continuity techniques,
Disaster Recovery principles & techniques.
Unit V
Managing & Monitoring: Management philosophies (holistic vs. system &
component), Industry management standards (SNMP, SMI-S, CIM), Standard
framework applications, Key management metrics (thresholds, availability,
capacity, security, performance), Metric analysis methodologies & trend analysis,
Reactive and proactive management best practices, Provisioning & configuration change
planning, Problem reporting, prioritization, and handling techniques, Management tools
overview.
Unit 1
Data Proliferation
As more companies begin measuring their data stores in petabytes, the realities of data
proliferation become very apparent. In fact, the proliferation of the data proliferation topic
itself is out of control. It’s ironic that Microsoft Word doesn’t recognize petabyte as a real
word, yet corporate data stores have reached petabyte levels. Information is
exploding, and it has become virtually impossible for companies to keep up.
Consider Wal-Mart. The chain has more than 6,000 stores, and some have almost a half-
million SKUs each. You think your Excel spreadsheets from finance are bad? Wal-Mart’s
database tables have literally 100 billion rows. The retailer’s POS systems have to ring up
some 276 million items – in one day.
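To put that figure in perspective, a quick back-of-the-envelope calculation (a sketch in Python; only the 276-million figure comes from the text above) converts it to a per-second rate:

```python
# Back-of-the-envelope scale check; only the 276-million figure is from the text
items_per_day = 276_000_000     # POS items rung up in one day
seconds_per_day = 24 * 60 * 60  # 86,400
print(f"about {items_per_day / seconds_per_day:,.0f} items per second, around the clock")
# -> about 3,194 items per second
```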
Data Source
A data source is any of the following types of sources for (mostly) digitized data:
a database
a computer file
a data stream
Data from such sources is usually formatted and contains a certain amount of metadata.
Often when data is captured in one electronic system and then transferred to another, the audit trail is lost or the data itself can no longer be absolutely verified. Some systems provide a complete data export, but the importing system must then be able to accept all of the available data fields. Similarly, many modern database systems keep transaction logs, and accepting these transaction records into a new system can be very important for verifying the imported data.
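As a minimal, hypothetical sketch of the three source types listed above (the table, fields, and values are invented for illustration):

```python
import csv
import io
import sqlite3

# 1. A database as a data source (in-memory SQLite; schema is illustrative)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sku TEXT, qty INTEGER)")
conn.execute("INSERT INTO sales VALUES ('A100', 3)")
print("database:", conn.execute("SELECT sku, qty FROM sales").fetchall())

# 2. A computer file as a data source (a CSV parsed from a string here;
#    a real file object would work the same way)
csv_text = "sku,qty\nB200,5\n"
print("file:", list(csv.DictReader(io.StringIO(csv_text))))

# 3. A data stream as a data source: records arrive one at a time
def stream():
    yield {"sku": "C300", "qty": 7}
    yield {"sku": "D400", "qty": 2}

for record in stream():
    print("stream:", record)
```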
Data Center
A data center is a facility used to house computer systems and associated components,
such as telecommunications and storage systems. It generally includes redundant or backup
power supplies, redundant data communications connections, environmental controls (e.g.,
air conditioning, fire suppression) and security devices.
IT operations are a crucial aspect of most organizational operations. One of the main
concerns is business continuity; companies rely on their information systems to run their
operations. If a system becomes unavailable, company operations may be impaired or
stopped completely. It is necessary to provide a reliable infrastructure for IT operations, in
order to minimize any chance of disruption. Information security is also a concern, and for
this reason a data center has to offer a secure environment which minimizes the chances of
a security breach. A data center must therefore keep high standards for assuring the
integrity and functionality of its hosted computer environment. This is accomplished through
redundancy of both fiber optic cables and power, which includes emergency backup power
generation.
Telcordia GR-3160, NEBS Requirements for Telecommunications Data Center Equipment
and Spaces, provides guidelines for data center spaces within telecommunications networks,
and environmental requirements for the equipment intended for installation in those spaces.
These criteria were developed jointly by Telcordia and industry representatives. They may
be applied to data center spaces housing data processing or Information Technology (IT)
equipment. The equipment may be used to:
operate and manage a carrier's telecommunication network
provide data-center-based applications directly to the carrier's customers
provide hosted applications for a third party to serve their customers
provide a combination of these and similar data center applications
Effective data center operation requires a balanced investment in both the facility and the
housed equipment. The first step is to establish a baseline facility environment suitable for
equipment installation. Standardization and modularity can yield savings and efficiencies in
the design and construction of telecommunications data centers.
Standardization means integrated building and equipment engineering. Modularity has the
benefits of scalability and easier growth, even when planning forecasts are less than
optimal. For these reasons, telecommunications data centers should be planned in repetitive
building blocks of equipment, and associated power and support (conditioning) equipment
when practical. The use of dedicated centralized systems requires more accurate forecasts of future needs to prevent expensive over-construction or, perhaps worse, under-construction that fails to meet future needs.
The "lights-out" data center, also known as a darkened or a dark data center, is a data
center that, ideally, has all but eliminated the need for direct access by personnel, except
under extraordinary circumstances. Because of the lack of need for staff to enter the data
center, it can be operated without lighting. All of the devices are accessed and managed by
remote systems, with automation programs used to perform unattended operations. In
addition to the energy savings, reduction in staffing costs and the ability to locate the site
further from population centers, implementing a lights-out data center reduces the threat of
malicious attacks upon the infrastructure.
Storage Management
The phrase storage management is a general storage industry term used to describe the
tools, processes, and policies used to manage storage networks and storage services such
as virtualization, replication, mirroring, security, compression, traffic analysis, and other
services. The phrase also encompasses other storage technologies, such as process
automation, storage management and real-time infrastructure products, and storage
provisioning.
Computer data storage, often called storage or memory, refers to computer components
and recording media that retain digital data used for computing for some interval of time.
Computer data storage provides one of the core functions of the modern computer, that of
information retention.
Primary storage (or main memory or internal memory), often referred to simply as memory,
is the only one directly accessible to the CPU. The CPU continuously reads instructions
stored there and executes them as required. Any data actively operated on is also stored
there in a uniform manner.
Secondary storage (also known as external memory or auxiliary storage), differs from
primary storage in that it is not directly accessible by the CPU. The computer usually uses
its input/output channels to access secondary storage, transferring the desired data via an intermediate area in primary storage. Secondary storage does not lose its data when the
device is powered down—it is non-volatile. Per unit, it is typically also two orders of
magnitude less expensive than primary storage.
Tertiary storage or tertiary memory, provides a third level of storage. Typically it involves a
robotic mechanism which will mount (insert) and dismount removable mass storage media
into a storage device according to the system's demands; this data is often copied to
secondary storage before use. It is primarily used for archiving rarely accessed information
since it is much slower than secondary storage (e.g. 5–60 seconds vs. 1–10 milliseconds).
This is primarily useful for extraordinarily large data stores, accessed without human
operators. Typical examples include tape libraries and optical jukeboxes.
Off-line storage is computer data storage on a medium or a device that is not under the
control of a processing unit. The medium is recorded, usually in a secondary or tertiary
storage device, and then physically removed or disconnected. It must be inserted or
connected by a human operator before a computer can access it again. Unlike tertiary
storage, it cannot be accessed without human interaction.
Characteristics of storage
Volatility
Non-volatile memory
Will retain the stored information even if it is not constantly supplied with electric power.
It is suitable for long-term storage of information.
Volatile memory
Requires constant power to maintain the stored information. The fastest memory technologies of today are volatile ones, although this is not a universal rule. Since primary storage is required to be very fast, it predominantly uses volatile memory.
Differentiation
Dynamic random-access memory (DRAM) is a form of volatile memory that also requires the stored information to be periodically refreshed (reread and rewritten), or it is lost. Static random-access memory (SRAM) retains its contents as long as power is applied and never needs to be refreshed.
Mutability
Read/write storage allows information to be overwritten at any time. Read-only storage retains information recorded at the time of manufacture. Slow-write, fast-read storage allows information to be overwritten multiple times, but the write operation is much slower than reading (e.g. CD-RW and flash memory).
Accessibility
Random access
Any location in storage can be accessed at any moment in approximately the same amount of time. This characteristic is well suited to primary and secondary storage.
Sequential access
Pieces of information are accessed in serial order, one after the other; therefore the time to access a particular piece of information depends upon which piece was last accessed. This characteristic is typical of off-line storage.
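The difference is easy to demonstrate with a file: random access seeks directly to an offset, while sequential access must pass over every record before the one wanted. A minimal sketch (the record layout is invented for illustration):

```python
import os
import tempfile

# Build a scratch file of 1,000 fixed-size records, 10 bytes each
path = os.path.join(tempfile.gettempdir(), "records.bin")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(f"{i:010d}".encode())

with open(path, "rb") as f:
    # Random access: jump straight to record 500 with one seek
    f.seek(500 * 10)
    print("random:    ", f.read(10))

    # Sequential access: start at the beginning and read record after
    # record; reaching record 500 means reading the 500 before it first
    f.seek(0)
    for _ in range(500):
        f.read(10)
    print("sequential:", f.read(10))
```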
Addressability
Location-addressable
Each individually accessible unit of information in storage is selected with its numerical memory address. In modern computers, location-addressable storage is usually limited to primary storage, accessed internally by computer programs, since location-addressability is very efficient but burdensome for humans.
File-addressable
Information is divided into files of variable length, and a particular file is selected with
human-readable directory and file names. The underlying device is still location-
addressable, but the operating system of a computer provides the file system abstraction to
make the operation more understandable. In modern computers, secondary, tertiary and
off-line storage use file systems.
Content-addressable
Each individually accessible unit of information is selected on the basis of (a part of) the contents stored there. Content-addressable storage can be implemented in software (a computer program) or hardware (a computer device), with hardware being the faster but more expensive option. Hardware content-addressable memory is often used in a computer's CPU cache.
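A minimal sketch of the software case, where the address of each unit of information is derived from the content itself rather than from a location (the helper names are hypothetical):

```python
import hashlib

store = {}  # address -> content

def put(content: bytes) -> str:
    # The address is a fingerprint of the content itself, not a location
    address = hashlib.sha256(content).hexdigest()
    store[address] = content
    return address

addr = put(b"quarterly report, final version")
print(addr[:16], "->", store[addr])  # retrieval by content fingerprint
```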
Capacity
Raw capacity
The total amount of stored information that a storage device or medium can hold. It is
expressed as a quantity of bits or bytes (e.g. 10.4 megabytes).
Memory storage density
The compactness of stored information. It is the storage capacity of a medium divided by a unit of length, area, or volume (e.g. 1.2 megabytes per square inch).
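As a worked example of the two measures (the medium's recordable area is an invented figure; only the 10.4-megabyte capacity comes from the example above):

```python
# Raw capacity: total bytes the medium can hold (10.4 MB, the example above)
raw_mb = 10.4

# Density: capacity divided by a unit of area (recordable area is invented)
area_sq_in = 8.0
print(f"raw capacity: {raw_mb} MB, density: {raw_mb / area_sq_in:.2f} MB per square inch")
# -> raw capacity: 10.4 MB, density: 1.30 MB per square inch
```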
Performance
Latency
The time it takes to access a particular location in storage. The relevant unit of
measurement is typically nanosecond for primary storage, millisecond for secondary
storage, and second for tertiary storage. It may make sense to separate read latency and write latency, and in the case of sequential access storage, minimum, maximum, and average latency.
Throughput
The rate at which information can be read from or written to the storage. In computer
data storage, throughput is usually expressed in terms of megabytes per second or MB/s,
though bit rate may also be used. As with latency, read rate and write rate may need to be
differentiated. Also accessing media sequentially, as opposed to randomly, typically yields
maximum throughput.
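Latency and throughput combine into total transfer time: each access pays the latency once, and the data then moves at the throughput rate. A sketch comparing one large sequential read against many small random reads (the 5 ms and 100 MB/s figures are illustrative hard-disk values, not from the text):

```python
MB = 1_000_000

def transfer_time(total_bytes, latency_s, throughput_bps, accesses):
    # Each access pays the latency once; data then moves at the throughput rate
    return accesses * latency_s + total_bytes / throughput_bps

latency = 0.005        # 5 ms per access (illustrative hard-disk figure)
throughput = 100 * MB  # 100 MB/s (illustrative)

# Reading 100 MB in one sequential pass vs. 25,000 random 4 KB reads
sequential = transfer_time(100 * MB, latency, throughput, accesses=1)
random_io = transfer_time(100 * MB, latency, throughput, accesses=25_000)
print(f"sequential: {sequential:.2f} s, random: {random_io:.2f} s")
# -> sequential: 1.00 s, random: 126.00 s; latency dominates random access
```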
Energy use
Storage devices that reduce fan usage or automatically shut down during inactivity, and low-power hard drives, can reduce energy consumption by up to 90 percent.
2.5-inch hard disk drives often consume less power than larger ones. Low-capacity solid-state drives have no moving parts and consume less power than hard disks. Also, memory may use more power than hard disks.
The Five Pillars of Technology
Operating Systems
An operating system (OS) is software, consisting of programs and data, that runs on
computers, manages computer hardware resources, and provides common services for
execution of various application software.
Operating system development is one of the most complicated activities in which a
computing hobbyist may engage. A hobby operating system may be classified as one whose
code has not been directly derived from an existing operating system, and has few users
and active developers.
Applications
Application software, also known as an application or an "app", is computer software designed to help the user perform one or more related specific tasks. Examples include enterprise software, accounting software, office suites, graphics software, and media players. Many application programs deal principally with documents.
Application software applies the power of a particular computing platform or system
software to a particular purpose. Some apps such as Microsoft Office are available in
versions for several different platforms; others have narrower requirements.
Database
Database research has been carried out since the early days of the database concept in the 1960s. It has taken place in the research and development groups of companies (notably at IBM Research), at research institutes, and in academia, both through theory and through prototypes. The interaction between research and database-related product development has been very productive for the database area, and many key concepts and technologies emerged from it. Notable examples are the relational and entity-relationship models, the atomic transaction concept and related concurrency control techniques, and query optimization methods.
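Of these concepts, the atomic transaction is the easiest to demonstrate: a group of changes either all take effect or none do. A minimal sketch using Python's built-in sqlite3 module (the accounts table and the simulated failure are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

try:
    with conn:  # one atomic transaction: commit on success, rollback on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'a'")
        raise RuntimeError("simulated crash mid-transfer")
except RuntimeError:
    pass

# The debit did not survive: the whole transaction rolled back as a unit
print(conn.execute("SELECT name, balance FROM accounts ORDER BY name").fetchall())
# -> [('a', 100), ('b', 0)]
```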
Networks
Networks are often classified as local area network (LAN), wide area network (WAN), metropolitan area network (MAN), personal area network (PAN), virtual private network (VPN), campus area network (CAN), storage area network (SAN), and others, depending on their scale, scope, and purpose. (The acronym CAN is also used for controller area network.) Usage, trust level, and access rights often differ between these types of networks. LANs tend to be designed for internal use by an organization's internal systems and employees in individual physical locations, such as a building, while WANs may connect physically separate parts of an organization and may include connections to third parties.
Information Lifecycle Management
Policy
ILM policy consists of the overarching storage and information policies that drive management processes. Policies are dictated by business goals and drivers. Therefore, policies generally tie into a framework of overall IT governance and management; change control processes; requirements for system availability and recovery times; and service level agreements (SLAs).
Operational
Operational aspects of ILM include backup and data protection; disaster recovery, restore,
and restart; archiving and long-term retention; data replication; and day-to-day processes
and procedures necessary to manage a storage architecture.
Infrastructure
Infrastructure facets of ILM include the logical and physical architectures; the applications
dependent upon the storage platforms; security of storage; and data center constraints.
Within the application realm, the relationship between applications and the production, test,
and development requirements are generally most relevant for ILM.
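As an illustration of how such policies might be captured in practice, here is a hypothetical sketch that maps data categories to storage tiers, retention periods, and recovery-time SLAs (all categories and values are invented; real policies derive from the business goals and SLAs described above):

```python
from dataclasses import dataclass

@dataclass
class IlmPolicy:
    storage_tier: str         # where the data lives
    retention_years: int      # how long it must be kept
    recovery_time_hours: int  # SLA: maximum time allowed for a restore

# Hypothetical policy table; real entries derive from business goals and SLAs
POLICIES = {
    "financial_records": IlmPolicy("primary_array", 7, 4),
    "email_archive":     IlmPolicy("nearline",      3, 24),
    "project_scratch":   IlmPolicy("capacity_tier", 1, 72),
}

print(POLICIES["financial_records"])
# -> IlmPolicy(storage_tier='primary_array', retention_years=7, recovery_time_hours=4)
```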
Definition
ILM includes every phase of a "record" from its beginning to its end. While it is generally applied to information that rises to the classic definition of a record (see records management), it applies to any and all informational assets. During its existence, information can become a record by being identified as documenting a business transaction or as satisfying a business need. In this sense ILM has been part of the overall approach of enterprise content management (ECM).
However, in a more general perspective, the term "business" must be taken in a broad sense, not necessarily tied to direct commercial or enterprise contexts. While most records are thought of as having a relationship to enterprise business, not all do. Much recorded
information serves to document an event or a critical point in history. Examples of these are
birth, death, medical/health and educational records. e-Science, for example, is an
emerging area where ILM has become relevant.
In 2004, the information technology and information storage industries (through the SNIA association) attempted to assign a new, broader definition to Information Lifecycle Management (ILM) according to this broad view:
Information Lifecycle Management comprises the policies, processes, practices, and tools
used to align the business value of information with the most appropriate and cost effective
IT infrastructure from the time information is conceived through its final disposition.
Information is aligned with business processes through management policies and service
levels associated with applications, metadata, information, and data.
Functionality
For the purposes of business records, there are five phases identified as being part of the lifecycle continuum, along with one exception. These are:
Creation and receipt deals with records from their point of origin, whether created by a member of the organization or received from an external source.
Distribution is the process of managing the information once it has been created or
received. This includes both internal and external distribution, as information that leaves an
organization becomes a record of a transaction with others.
Use takes place after information is distributed internally, and can generate business
decisions, document further actions, or serve other purposes.
Maintenance is the management of information. This can include processes such as filing,
retrieval and transfers. While the connotation of 'filing' presumes the placing of information
in a prescribed container and leaving it there, there is much more involved. Filing is actually
the process of arranging information in a predetermined sequence and creating a system to
manage it for its useful existence within an organization. Failure to establish a sound
method for filing information makes its retrieval and use nearly impossible. Transferring
information refers to the process of responding to requests, retrieval from files and
providing access to users authorized by the organization to have access to the information.
While removed from the files, the information is tracked by the use of various processes to
ensure it is returned and/or available to others who may need access to it.
Disposition is the practice of handling information that is less frequently accessed or has
met its assigned retention periods. Less frequently accessed records may be considered for
relocation to an 'inactive records facility' until they have met their assigned retention period.
"Although a small percentage of organizational information never loses its value, the value
of most information tends to decline over time until it has no further value to anyone for
any purpose. The value of nearly all business information is greatest soon after it is created
and generally remains active for only a short time (one to three years or so), after which
its importance and usage declines. The record then makes its life cycle transition to a semi-
active and finally to an inactive state." [1] Retention periods are based on the creation of an
organization-specific retention schedule, based on research of the regulatory, statutory and
legal requirements for management of information for the industry in which the organization
operates. Additional items to consider when establishing a retention period are any business
needs that may exceed those requirements and consideration of the potential historic,
intrinsic or enduring value of the information. If the information has met all of these needs
and is no longer considered to be valuable, it should be disposed of by means appropriate
for the content. This may include ensuring that others cannot obtain access to outdated or obsolete information, as well as measures for protecting privacy and confidentiality. Even information that must be retained does not automatically remain accessible: media is subject to both degradation and obsolescence over its lifespan, and therefore policies and procedures must be established for the periodic conversion and migration of electronically stored information to ensure it remains accessible for its required retention periods.
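A minimal sketch of the retention logic described above: a record's age is compared against the retention period from an organization-specific schedule before it becomes eligible for disposition (the categories and periods are invented for illustration):

```python
from datetime import date

# Hypothetical organization-specific retention schedule, in years
RETENTION_SCHEDULE = {"tax": 7, "contract": 10, "correspondence": 3}

def disposition(category: str, created: date, today: date) -> str:
    age_years = (today - created).days / 365.25
    if age_years < RETENTION_SCHEDULE[category]:
        return "retain"   # still within its assigned retention period
    return "dispose"      # retention met; eligible for appropriate disposal

print(disposition("tax", date(2001, 6, 1), date(2010, 6, 1)))       # -> dispose
print(disposition("contract", date(2005, 1, 1), date(2010, 1, 1)))  # -> retain
```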