Solaris Volume Manager Administration Guide
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No
part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S.
and other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks, registered trademarks, or service marks
of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks
of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun
Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the
pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a
non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs
and otherwise comply with Sun’s written license agreements.
Federal Acquisitions: Commercial Software–Government Users Subject to Standard License Terms and Conditions.
DOCUMENTATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE
DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Copyright 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.
Contents
Preface 17
How to Interact With Solaris Volume Manager 36
▼ How to Access the Solaris Volume Manager Graphical User Interface 37
Solaris Volume Manager Requirements 38
Overview of Solaris Volume Manager Components 38
Volumes 39
State Database and State Database Replicas 43
Hot Spare Pools 44
Disk Sets 44
Solaris Volume Manager Configuration Guidelines 45
General Guidelines 45
File System Guidelines 45
Overview of Creating Solaris Volume Manager Elements 46
Prerequisites for Creating Solaris Volume Manager Elements 46
Background Information for Changing RAID 1 Volume Options 90
How Booting Into Single-User Mode Affects RAID 1 Volumes 90
Scenario—RAID 1 Volumes (Mirrors) 90
16 Hot Spare Pools (Tasks) 153
Hot Spare Pools (Task Map) 153
Creating a Hot Spare Pool 154
▼ How to Create a Hot Spare Pool 154
▼ How to Add Additional Slices to a Hot Spare Pool 155
Associating a Hot Spare Pool With Volumes 156
▼ How to Associate a Hot Spare Pool With a Volume 156
▼ How to Change the Associated Hot Spare Pool 157
Maintaining Hot Spare Pools 159
▼ How to Check Status of Hot Spares and Hot Spare Pools 159
▼ How to Replace a Hot Spare in a Hot Spare Pool 160
▼ How to Delete a Hot Spare from a Hot Spare Pool 161
▼ How to Enable a Hot Spare 162
21 Maintaining Solaris Volume Manager (Tasks) 217
Solaris Volume Manager Maintenance (Task Map) 217
Viewing the Solaris Volume Manager Configuration 218
▼ How to View the Solaris Volume Manager Volume Configuration 218
Renaming Volumes 221
Background Information for Renaming Volumes 221
Exchanging Volume Names 222
▼ How to Rename a Volume 223
Working with Configuration Files 224
▼ How to Create Configuration Files 224
▼ How to Initialize Solaris Volume Manager from a Configuration File 224
Changing Solaris Volume Manager Defaults 226
▼ How to Increase the Number of Default Volumes 226
How to Increase the Number of Default Disk Sets 227
Growing a File System 228
Background Information for Expanding Slices and Volumes 228
▼ How to Grow a File System 229
Overview of Replacing and Enabling Components in RAID 1 and RAID 5 Volumes
230
Enabling a Component 230
Replacing a Component With Another Available Component 231
Maintenance and Last Erred States 232
Background Information For Replacing and Enabling Slices in Mirrors and RAID 5
Volumes 232
Index 285
Figures
FIGURE 3–1 View of the Enhanced Storage tool (Solaris Volume Manager) in the
Solaris Management Console 37
FIGURE 3–2 Relationship Among a Volume, Physical Disks, and Slices 41
FIGURE 4–1 Basic Hardware Diagram 48
FIGURE 7–1 Stripe Example 65
FIGURE 7–2 Concatenation Example 67
FIGURE 7–3 Concatenated Stripe Example 68
FIGURE 9–1 Mirror Example 82
FIGURE 9–2 RAID 1+ 0 Example 83
FIGURE 13–1 RAID 5 Volume Example 132
FIGURE 13–2 Expanded RAID 5 Volume Example 133
FIGURE 15–1 Hot Spare Pool Example 149
FIGURE 17–1 Transactional Volume Example 167
FIGURE 17–2 Shared Log Transactional Volume Example 168
FIGURE 19–1 Disk Sets Example 201
FIGURE 22–1 Small system configuration 236
Preface
The Solaris Volume Manager Administration Guide explains how to use Solaris Volume
Manager to manage your system’s storage needs, including creating, modifying, and
using RAID 0 (concatenation and stripe) volumes, RAID 1 (mirror) volumes, and
RAID 5 volumes, in addition to soft partitions and transactional log devices.
Chapter 1 provides a detailed “roadmap” to the concepts and tasks described in this
book and should be used solely as a navigational aid to the book’s content.
Chapter 4 provides an overall scenario for understanding the Solaris Volume Manager
product.
Chapter 5 describes concepts related to state databases and state database replicas.
Chapter 6 explains how to perform tasks related to state databases and state database
replicas.
Chapter 8 explains how to perform tasks related to RAID 0 (stripe and concatenation)
volumes.
Chapter 11 describes concepts related to the Solaris Volume Manager soft partitioning
feature.
Chapter 15 describes concepts related to hot spares and hot spare pools.
Chapter 16 explains how to perform tasks related to hot spares and hot spare pools.
Chapter 21 explains some general maintenance tasks that are not related to a specific
Solaris Volume Manager component.
Chapter 22 provides some “best practices” information about configuring and using
Solaris Volume Manager.
Chapter 23 provides concepts and instructions for using the Solaris Volume Manager
SNMP agent and for other error checking approaches.
Appendix B provides tables that summarize commands and other helpful information.
Related Books
Solaris Volume Manager is one of several system administration tools available for the
Solaris operating environment. Information about overall system administration
features and functions, as well as related tools, is provided in the following books:
■ System Administration Guide: Basic Administration
■ System Administration Guide: Advanced Administration
Typographic Conventions
The following table describes the typographic conventions used in this book.
TABLE P–1 Typographic Conventions
AaBbCc123 (computer font): The names of commands, files, and directories; on-screen computer output. Examples: Edit your .login file. Use ls -a to list all files. machine_name% you have mail.
AaBbCc123 (italic): Book titles, new words or terms, or words to be emphasized. Examples: Read Chapter 6 in User’s Guide. These are called class options. You must be root to do this.
Shell Prompt
Getting Started With Solaris Volume
Manager
The Solaris Volume Manager Administration Guide describes how to set up and maintain
systems using Solaris Volume Manager to manage storage for high availability,
flexibility, and reliability.
This chapter serves as a high-level guide to find information for certain Solaris Volume
Manager tasks, such as setting up storage capacity. This chapter does not address all
the tasks that you will need to use Solaris Volume Manager. Instead, it provides an
easy way to find procedures describing how to perform common tasks associated with
the following Solaris Volume Manager concepts:
■ Storage Capacity
■ Availability
■ I/O Performance
■ Administration
■ Troubleshooting
Caution – If you do not use Solaris Volume Manager correctly, you can destroy data.
Solaris Volume Manager provides a powerful way to reliably manage your disks and
data on them. However, you should always maintain backups of your data,
particularly before you modify an active Solaris Volume Manager configuration.
Set up storage: Create storage that spans slices by creating a RAID 0 or a RAID 5 volume. The RAID 0 or RAID 5 volume can then be used for a file system or any application, such as a database, that accesses the raw device. For instructions, see “How to Create a RAID 0 (Stripe) Volume” on page 74, “How to Create a RAID 0 (Concatenation) Volume” on page 75, “How to Create a RAID 1 Volume From Unused Slices” on page 95, “How to Create a RAID 1 Volume From a File System” on page 97, and “How to Create a RAID 5 Volume” on page 138.
Expand an existing file system: Increase the capacity of an existing file system by creating a RAID 0 (concatenation) volume, then adding additional slices. For instructions, see “How to Expand Space for Existing Data” on page 76.
Expand an existing RAID 0 (concatenation or stripe) volume: Expand an existing RAID 0 volume by concatenating additional slices to it. For instructions, see “How to Expand an Existing RAID 0 Volume” on page 78.
Expand a RAID 5 volume: Expand the capacity of a RAID 5 volume by concatenating additional slices to it. For instructions, see “How to Expand a RAID 5 Volume” on page 142.
Increase the size of a UFS file system on an expanded volume: Grow a file system by using the growfs command to expand the size of a UFS file system while it is mounted and without disrupting access to the data. For instructions, see “How to Grow a File System” on page 229.
Subdivide slices or logical volumes into smaller partitions, breaking the 8-slice hard partition limit: Subdivide logical volumes or slices by using soft partitions. For instructions, see “How to Create a Soft Partition” on page 126.
Create a file system: Create a file system on a RAID 0 (stripe or concatenation), RAID 1 (mirror), RAID 5, or transactional volume, or on a soft partition. For instructions, see “Creating File Systems (Tasks)” in System Administration Guide: Basic Administration.
Maximize data availability: Use Solaris Volume Manager’s mirroring feature to maintain multiple copies of your data. You can create a RAID 1 volume from unused slices in preparation for data, or you can mirror an existing file system, including root (/) and /usr. For instructions, see “How to Create a RAID 1 Volume From Unused Slices” on page 95 and “How to Create a RAID 1 Volume From a File System” on page 97.
Add data availability with minimum hardware cost: Increase data availability with a minimum of hardware by using Solaris Volume Manager’s RAID 5 volumes. For instructions, see “How to Create a RAID 5 Volume” on page 138.
Increase data availability for an existing RAID 1 or RAID 5 volume: Increase data availability for a RAID 1 or a RAID 5 volume by creating a hot spare pool, then associating it with a mirror’s submirrors or a RAID 5 volume. For instructions, see “Creating a Hot Spare Pool” on page 154 and “Associating a Hot Spare Pool With Volumes” on page 156.
Increase file system availability after reboot: Increase overall file system availability after reboot by adding UFS logging (transactional volume) to the system. Logging a file system reduces the amount of time that the fsck command has to run when the system reboots. For instructions, see “About File System Logging” on page 165.
Tune RAID 1 volume read and write policies: Specify the read and write policies for a RAID 1 volume to improve performance for a given configuration. For instructions, see “RAID 1 Volume Read and Write Policies” on page 86 and “How to Change RAID 1 Volume Options” on page 110.
Optimize device performance: Creating RAID 0 (stripe) volumes optimizes performance of the devices that make up the stripe. The interlace value can be optimized for random or sequential access. For instructions, see “Creating RAID 0 (Stripe) Volumes” on page 74.
Maintain device performance within a RAID 0 (stripe): Expand a stripe or concatenation that has run out of space by concatenating a new component to it. A concatenation of stripes is better for performance than a concatenation of slices. For instructions, see “Expanding Storage Space” on page 76.
Graphically administer your volume management configuration: Use the Solaris Management Console to administer your volume management configuration. For instructions, see the online help from within the Solaris Volume Manager (Enhanced Storage) node of the Solaris Management Console application.
Graphically administer slices and file systems: Use the Solaris Management Console graphical user interface to administer your disks and file systems, performing such tasks as partitioning disks and constructing UFS file systems. For instructions, see the online help from within the Solaris Management Console application.
Optimize Solaris Volume Manager: Solaris Volume Manager performance is dependent on a well-designed configuration. Once created, the configuration needs monitoring and tuning. For instructions, see “Solaris Volume Manager Configuration Guidelines” on page 45 and “Working with Configuration Files” on page 224.
Plan for future expansion: Because file systems tend to run out of space, you can plan for future growth by putting a file system into a concatenation. For instructions, see “Creating RAID 0 (Concatenation) Volumes” on page 75 and “Expanding Storage Space” on page 76.
Replace a failed slice: If a disk fails, you must replace the slices used in your Solaris Volume Manager configuration. In the case of a RAID 0 volume, you have to use a new slice, delete and re-create the volume, then restore data from a backup. Slices in RAID 1 and RAID 5 volumes can be replaced and resynchronized without loss of data. For instructions, see “Responding to RAID 1 Volume Component Failures” on page 112 and “How to Replace a Component in a RAID 5 Volume” on page 143.
Recover from boot problems: Special problems can arise when booting the system, due to a hardware problem or operator error. For instructions, see “How to Recover From Improper /etc/vfstab Entries” on page 258, “How to Recover From Insufficient State Database Replicas” on page 265, and “How to Recover From a Boot Device Failure” on page 261.
Work with transactional volume problems: Problems with transactional volumes can occur on either the master or logging device, and they can be caused by either data or device problems. All transactional volumes sharing the same logging device must be fixed before they return to a usable state. For instructions, see “How to Recover a Transactional Volume With a Panic” on page 192 and “How to Recover a Transactional Volume With Hard Errors” on page 193.
Storage Hardware
There are many different devices on which data can be stored. The selection of devices
to best meet your storage needs depends primarily on three factors:
■ Performance
■ Availability
■ Cost
You can use Solaris Volume Manager to help manage the tradeoffs in performance,
availability and cost. You can often mitigate many of the tradeoffs completely with
Solaris Volume Manager.
Solaris Volume Manager works well with any supported storage on any system that
runs the Solaris™ Operating Environment.
RAID Levels
RAID is an acronym for Redundant Array of Inexpensive (or Independent) Disks.
Basically, this term refers to a set of disks (called an array, or, more commonly, a
volume) that appears to the user as a single large disk drive. This array provides,
depending on the configuration, improved reliability, response time, and/or storage
capacity.
Technically, there are six RAID levels, 0-5. Each level refers to a method of distributing
data while ensuring data redundancy. (RAID level 0 does not provide data
redundancy, but is usually included as a RAID classification because it is the basis for
the majority of RAID configurations in use.) Very few storage environments support
RAID levels 2, 3, and 4, so they are not described here.
This section provides guidelines for working with Solaris Volume Manager RAID 0
(concatenation and stripe) volumes, RAID 1 (mirror) volumes, RAID 5 volumes, soft
partitions, transactional (logging) volumes, and file systems that are constructed on
volumes.
Note – The storage mechanisms listed are not mutually exclusive. You can use them in
combination to meet multiple goals. For example, you could create a RAID 1 volume
for redundancy, then create soft partitions on it to increase the number of discrete file
systems that are possible.
■ RAID 0 devices (stripes and concatenations), and soft partitions do not provide any
redundancy of data.
■ Concatenation works well for small random I/O.
■ Striping performs well for large sequential I/O and for random I/O distributions.
■ Mirroring might improve read performance; write performance is always
degraded.
■ Because of the read-modify-write nature of RAID 5 volumes, volumes with greater
than about 20 percent writes should probably not be RAID 5. If redundancy is
required, consider mirroring.
■ RAID 5 writes will never be as fast as mirrored writes, which in turn will never be
as fast as unprotected writes.
■ Soft partitions are useful for managing very large storage devices.
Note – In addition to these generic storage options, see “Hot Spare Pools” on page 44
for more information about using Solaris Volume Manager to support redundant
devices.
Performance Issues
In general, if you do not know if sequential I/O or random I/O predominates on file
systems you will be implementing on Solaris Volume Manager volumes, do not
implement these performance tuning tips. These tips can degrade performance if they
are improperly implemented.
Random I/O
If you have a random I/O environment, such as an environment used for databases
and general-purpose file servers, you want all disk spindles to spend approximately
equal amounts of time servicing I/O requests.
For example, assume that you have 40 Gbytes of storage for a database application. If
you stripe across four 10 Gbyte disk spindles, and if the I/O load is truly random and
evenly dispersed across the entire range of the table space, then each of the four
spindles will tend to be equally busy, which will generally improve performance.
The target for maximum random I/O performance on a disk is 35 percent or lower
usage, as reported by the iostat command. Disk use in excess of 65 percent on a
typical basis is a problem. Disk use in excess of 90 percent is a significant problem. The
solution to having disk use values that are too high is to create a new RAID 0 volume
with more disks (spindles).
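If you want to check how busy each spindle actually is, the %b (percent busy) column
of extended iostat output is a convenient indicator. The following command is a
general-purpose sketch rather than a procedure from this guide; the 30-second
interval is only an example.
# iostat -xn 30
Compare the %b values that are reported for each disk against the 35, 65, and 90
percent guidelines described above.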
Note – Simply attaching additional disks to an existing volume will not improve
performance. You must create a new volume with the ideal parameters to optimize
performance.
The interlace size of the stripe does not matter for random I/O because you just want
to spread the data across all the disks. Any interlace value greater than the typical I/O
request will do.
Sequential I/O
In sequential applications, the typical I/O size is usually large (greater than 128
Kbytes, often greater than 1 Mbyte). To improve sequential performance, set the
interlace value small relative to the typical I/O request. For example, assume an
application with a typical I/O request size of 256 Kbytes and striping across 4 disk
spindles: 256 Kbytes / 4 = 64 Kbytes, so a good choice for the interlace size would be
32 to 64 Kbytes, or smaller. This strategy ensures that the typical I/O request is spread
across multiple disk spindles, thus increasing the sequential bandwidth.
This chapter explains the overall structure of Solaris Volume Manager and provides
the following information:
■ “What Does Solaris Volume Manager Do?” on page 35
■ “Solaris Volume Manager Requirements” on page 38
■ “Overview of Solaris Volume Manager Components” on page 38
■ “Solaris Volume Manager Configuration Guidelines” on page 45
■ “Overview of Creating Solaris Volume Manager Elements” on page 46
In some instances, Solaris Volume Manager can also improve I/O performance.
A volume is functionally identical to a physical disk in the view of an application or a
file system (such as UFS). Solaris Volume Manager converts I/O requests directed at a
volume into I/O requests to the underlying member disks.
Solaris Volume Manager volumes are built from slices (disk partitions) or from other
Solaris Volume Manager volumes. An easy way to build volumes is to use the
graphical user interface built into the Solaris Management Console. The Enhanced
Storage tool within the Solaris Management Console presents you with a view of all
the existing volumes. By following the steps in wizards, you can easily build any kind
of Solaris Volume Manager volume or component. You can also build and modify
volumes by using Solaris Volume Manager command-line utilities.
If, for example, you want to create more storage capacity as a single volume, you
could use Solaris Volume Manager to make the system treat a collection of many small
slices as one larger slice or device. After you have created a large volume from these
slices, you can immediately begin using it just as any “real” slice or device.
Solaris Volume Manager can increase the reliability and availability of data by using
RAID 1 (mirror) volumes and RAID 5 volumes. Solaris Volume Manager hot spares
can provide another level of data availability for mirrors and RAID 5 volumes.
Once you have set up your configuration, you can use the Enhanced Storage tool
within the Solaris Management Console to report on its operation.
FIGURE 3–1 View of the Enhanced Storage tool (Solaris Volume Manager) in the Solaris Management Console
1. Start Solaris Management Console on the host system by using the following
command:
% /usr/sbin/smc
3. Double-click Storage.
5. If prompted to log in, log in as root or as a user who has equivalent access.
6. Double-click the appropriate icon to manage volumes, hot spare pools, state
database replicas, and disk sets.
Tip – To help with your tasks, all tools in the Solaris Management Console display
information in the bottom section of the page or at the left side of a wizard panel.
Choose Help at any time to find additional information about performing tasks in this
interface.
RAID 0 volumes (stripe, concatenation, concatenated stripe), RAID 1 (mirror) volumes, and RAID 5 volumes: A group of physical slices that appear to the system as a single, logical device, used to increase storage capacity, performance, or data availability. For more information, see “Volumes” on page 39.
State database (state database replicas): A database that stores information on disk about the state of your Solaris Volume Manager configuration. Solaris Volume Manager cannot operate until you have created the state database replicas. For more information, see “State Database and State Database Replicas” on page 43.
Hot spare pool: A collection of slices (hot spares) reserved to be automatically substituted in case of component failure in either a submirror or RAID 5 volume, used to increase data availability for RAID 1 and RAID 5 volumes. For more information, see “Hot Spare Pools” on page 44.
Disk set: A set of shared disk drives in a separate namespace that contain volumes and hot spares and that can be non-concurrently shared by multiple hosts, used to provide data redundancy and availability and to provide a separate namespace for easier administration. For more information, see “Disk Sets” on page 44.
Volumes
A volume is a name for a group of physical slices that appear to the system as a single,
logical device. Volumes are actually pseudo, or virtual, devices in standard UNIX®
terms.
You can use either the Enhanced Storage tool within the Solaris Management Console
or the command-line utilities to create and administer volumes.
RAID 0 (stripe or concatenation): Can be used directly, or as the basic building blocks for mirrors and transactional devices. By themselves, RAID 0 volumes do not provide data redundancy.
RAID 5: Replicates data by using parity information. In the case of disk failure, the missing data can be regenerated by using available data and the parity information. A RAID 5 volume is generally composed of slices. One slice’s worth of space is allocated to parity information, but the parity is distributed across all slices in the RAID 5 volume.
Transactional: Used to log a UFS file system. (UFS logging is a preferable solution to this need, however.) A transactional volume is composed of a master device and a logging device. Both of these devices can be a slice, RAID 0 volume, RAID 1 volume, or RAID 5 volume. The master device contains the UFS file system.
Soft partition: Divides a slice or logical volume into one or more smaller, extensible volumes.
You can use most file system commands (mkfs, mount, umount, ufsdump,
ufsrestore, and so forth) on volumes. You cannot use the format command,
however. You can read, write, and copy files to and from a volume, as long as the
volume contains a mounted file system.
Figure 3–2 shows how a volume relates to physical disks and slices: volume d0 is made up of slice c1t1d0s2 on physical Disk A and slice c2t2d0s2 on physical Disk B.
You can expand a mounted or unmounted UFS file system that is contained within a
volume without having to halt or back up your system. (Nevertheless, backing up
your data is always a good idea.) After you expand the volume, use the growfs
command to grow the file system.
Note – After a file system has been expanded, it cannot be shrunk. Not shrinking the
size of a file system is a UFS limitation. Similarly, after a Solaris Volume Manager
partition has been increased in size, it cannot be reduced.
Applications and databases that use the raw volume must have their own method to
“grow” the added space so that the application or database can recognize it. Solaris
Volume Manager does not provide this capability.
The file system can be expanded to use only part of the additional disk space by using
the -s size option to the growfs command.
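For example, the following commands are a minimal sketch of growing a file system
after its underlying volume has been expanded. The mount point /files, the volume
d10, and the size value are hypothetical, not taken from this guide.
# growfs -M /files /dev/md/rdsk/d10
To use only part of the added space, specify the total size (in sectors) of the expanded
file system with the -s option:
# growfs -M /files -s 2097152 /dev/md/rdsk/d10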
Note – When you expand a mirror, space is added to the mirror’s underlying
submirrors. Likewise, when you expand a transactional volume, space is added to the
master device. The growfs command is then run on the RAID 1 volume or the
transactional volume, respectively. The general rule is that space is added to the
underlying devices, and the growfs command is run on the top-level device.
Volume Names
■ Instead of specifying the full volume name, such as /dev/md/dsk/d1, you can
often use an abbreviated volume name, such as d1, with any meta* command.
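For example, checking the status of volume d1 with the metastat command requires
only the abbreviated name; this is a sketch, and d1 is simply a placeholder volume
name.
# metastat d1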
The state database is actually a collection of multiple, replicated database copies. Each
copy, referred to as a state database replica, ensures that the data in the database is
always valid. Having copies of the state database protects against data loss from single
points-of-failure. The state database tracks the location and status of all known state
database replicas.
Solaris Volume Manager cannot operate until you have created the state database and
its state database replicas. It is necessary that a Solaris Volume Manager configuration
have an operating state database.
When you set up your configuration, you can locate the state database replicas on
either of the following:
■ On dedicated slices
■ On slices that will later become part of volumes
You can keep more than one copy of a state database on one slice. However, you
might make the system more vulnerable to a single point-of-failure by doing so.
The system will continue to function correctly if all state database replicas are deleted.
However, the system will lose all Solaris Volume Manager configuration data if a
reboot occurs with no existing state database replicas on disk.
Hot Spare Pools
When errors occur, Solaris Volume Manager checks the hot spare pool for the first
available hot spare whose size is equal to or greater than the size of the slice being
replaced. If found, Solaris Volume Manager automatically resynchronizes the data. If a
slice of adequate size is not found in the list of hot spares, the submirror or RAID 5
volume is considered to have failed. For more information, see Chapter 15.
Disk Sets
A shared disk set, or simply disk set, is a set of disk drives that contain state database
replicas, volumes, and hot spares that can be shared exclusively but not at the same
time by multiple hosts.
A disk set provides for data availability in a clustered environment. If one host fails,
another host can take over the failed host’s disk set. (This type of configuration is
known as a failover configuration.) Additionally, disk sets can be used to help manage
the Solaris Volume Manager name space, and to provide ready access to
network-attached storage devices.
General Guidelines
■ Disk and controllers–Place drives in a volume on separate drive paths. For SCSI
drives, this means separate host adapters. Spreading the I/O load over several
controllers improves volume performance and availability.
■ System files–Never edit or remove the /etc/lvm/mddb.cf or /etc/lvm/md.cf
files.
Make sure these files are backed up on a regular basis.
■ Volume Integrity–After a slice is defined as a volume and activated, do not use it
for any other purpose.
■ Maximum volumes–The maximum number of volumes supported in a disk set is
8192 (though the default number of volumes is 128). To increase the number of
default volumes, edit the /kernel/drv/md.conf file, as sketched after this list. See
“System Files and Startup Files” on page 277 for more information on this file.
■ Information about disks and partitions–Keep a copy of output from the prtvtoc
and metastat -p commands (see the example after this list) in case you need to
reformat a bad disk or re-create your Solaris Volume Manager configuration.
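The following commands sketch the last two guidelines (increasing the default
number of volumes, and saving disk and configuration information). The nmd value
and the output file names are examples only; consult the md.conf(4) man page before
editing the file, and reboot after changing it.
# grep nmd /kernel/drv/md.conf
(Edit the file to raise nmd, for example to nmd=1024, then reboot)
# prtvtoc /dev/rdsk/c1t1d0s2 > /etc/lvm/c1t1d0s2.vtoc
# metastat -p > /etc/lvm/md.config.save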
Note – For suggestions on how to name volumes, see “Volume Names” on page 42.
Throughout the Solaris Volume Manager Administration Guide, the examples generally
relate to a single storage configuration, whenever that is possible. This chapter
explains what that configuration is and provides information about this broad storage
scenario for the rest of the book.
Background
Throughout this book, the scenarios and many examples relate to a single
configuration. Although this configuration is small (to simplify the documentation),
the concepts will scale to much larger storage environments.
Hardware Configuration
The hardware system is configured as follows:
■ There are 3 physically separate controllers (c0 — IDE, c1—SCSI, and c2 — SCSI).
■ Each SCSI controller connects to a MultiPack that contains 6 internal 9–Gbyte disks
(c1t1 through c1t6 and c2t1 through c2t6).
■ Each controller/terminator pair (cntn) has 8.49 Gbytes of usable storage space.
■ Storage space on the root (/) drive c0t0d0 is split into 6 partitions.
An alternative way to understand this configuration is shown in the following
diagram.
The diagram (Figure 4–1) shows controller c0 with disk c0t0d0 (the root disk), controller c1 with disks c1t1d0 through c1t6d0, and controller c2 with disks c2t1d0 through c2t6d0.
Storage Configuration
The storage configuration before Solaris Volume Manager is configured is as follows:
■ The SCSI controller/terminator pairs (cntn) have approximately 20 Gbytes of
storage space
■ Storage space on each disk (for example, c1t1d0) is split into 7 partitions
(cntnd0s0 through cntnd0s6).
To partition a disk, follow the procedures explained in “Formatting a Disk” in
System Administration Guide: Basic Administration.
This chapter provides conceptual information about state database replicas. For
information about performing related tasks, see Chapter 6.
The state database replicas ensure that the data in the state database is always valid.
When the state database is updated, each state database replica is also updated. The
updates take place one at a time (to protect against corrupting all updates if the
system crashes).
If your system loses a state database replica, Solaris Volume Manager must figure out
which state database replicas still contain valid data. Solaris Volume Manager
determines this information by using a majority consensus algorithm. This algorithm
requires that a majority (half + 1) of the state database replicas be available and in
agreement before any of them are considered valid. It is because of this majority
consensus algorithm that you must create at least three state database replicas when
you set up your disk configuration. A consensus can be reached as long as at least two
of the three state database replicas are available.
During booting, Solaris Volume Manager ignores corrupted state database replicas. In
some cases, Solaris Volume Manager tries to rewrite state database replicas that are
corrupted. Otherwise, they are ignored until you repair them. If a state database
replica becomes corrupted because its underlying slice encountered an error, you will
need to repair or replace the slice and then enable the replica.
If all state database replicas are lost, you could, in theory, lose all data that is stored on
your Solaris Volume Manager volumes. For this reason, it is good practice to create
enough state database replicas on separate drives and across controllers to prevent
catastrophic failure. It is also wise to save your initial Solaris Volume Manager
configuration information, as well as your disk partition information.
See Chapter 6 for information on adding additional state database replicas to the
system, and on recovering when state database replicas are lost.
State database replicas are also used for RAID 1 volume resynchronization regions.
Too few state database replicas relative to the number of mirrors might cause replica
I/O to impact RAID 1 volume performance. That is, if you have a large number of
mirrors, make sure that you have a total of at least two state database replicas per
RAID 1 volume, up to the maximum of 50 replicas per disk set.
Each state database replica occupies 4 Mbytes (8192 disk sectors) of disk storage by
default. Replicas can be stored on the following devices:
■ a dedicated disk partition
■ a partition that will be part of a volume
■ a partition that will be part of a UFS logging device
Note – Replicas cannot be stored on the root (/), swap, or /usr slices, or on slices that
contain existing file systems or data. After the replicas have been stored, volumes or
file systems can be placed on the same slice.
To protect data, Solaris Volume Manager will not function unless half of all state
database replicas are available. The algorithm, therefore, ensures against corrupt data.
If insufficient state database replicas are available, you will have to boot into
single-user mode and delete enough of the bad or missing replicas to achieve a
quorum. See “How to Recover From Insufficient State Database Replicas” on page 265.
Note – When the number of state database replicas is odd, Solaris Volume Manager
computes the majority by dividing the number in half, rounding down to the nearest
integer, then adding 1 (one). For example, on a system with seven replicas, the
majority would be four (seven divided by two is three and one-half, rounded down is
three, plus one is four).
When you work with state database replicas, consider the following
“Recommendations for State Database Replicas” on page 54 and “Guidelines for
State Database Replicas” on page 54.
The default state database replica size in Solaris Volume Manager is 8192 blocks,
while the default size in Solstice DiskSuite was 1034 blocks. If you delete a
default-sized state database replica from Solstice DiskSuite, then add a new
default-sized replica with Solaris Volume Manager, you will overwrite the first
7158 blocks of any file system that occupies the rest of the shared slice, thus
destroying the data.
The system can reboot multiuser when at least one more than half of the replicas
are available. If fewer than a majority of replicas are available, you must reboot into
single-user mode and delete the unavailable replicas (by using the metadb
command).
For example, assume you have four replicas. The system will stay running as long
as two replicas (half the total number) are available. However, to reboot the system,
three replicas (half the total plus one) must be available.
In a two-disk configuration, you should always create at least two replicas on each
disk. For example, assume you have a configuration with two disks, and you only
create three replicas (two replicas on the first disk and one replica on the second
disk). If the disk with two replicas fails, the system will panic because the
remaining disk only has one replica and this is less than half the total number of
replicas.
Note – If you create two replicas on each disk in a two-disk configuration, Solaris
Volume Manager will still function if one disk fails. But because you must have one
more than half of the total replicas available for the system to reboot, you will be
unable to reboot.
The sample system has one internal IDE controller and drive, plus two SCSI
controllers, which each have six disks attached. With three controllers, the system can
be configured to avoid any single point-of-failure. Any system with only two
controllers cannot avoid a single point-of-failure relative to Solaris Volume Manager.
By distributing replicas evenly across all three controllers and across at least one disk
on each controller (across two disks if possible), the system can withstand any single
hardware failure.
A minimal configuration could put a single state database replica on slice 7 of the root
disk, then an additional replica on slice 7 of one disk on each of the other two
controllers. To help protect against the admittedly remote possibility of media failure,
using two replicas on the root disk and then two replicas on two different disks on
each controller, for a total of six replicas, provides more than adequate security.
To round out the total, add 2 additional replicas for each of the 6 mirrors, on different
disks than the mirrors. This configuration results in a total of 18 replicas with 2 on the
root disk and 8 on each of the SCSI controllers, distributed across the disks on each
controller.
This chapter provides information about performing tasks that are associated with
Solaris Volume Manager state database replicas. For information about the concepts
involved in these tasks, see Chapter 5.
Create state database replicas: Use the Solaris Volume Manager GUI or the metadb -a command to create state database replicas. For instructions, see “How to Create State Database Replicas” on page 58.
Check the status of state database replicas: Use the Solaris Volume Manager GUI or the metadb command to check the status of existing replicas. For instructions, see “How to Check the Status of State Database Replicas” on page 60.
Delete state database replicas: Use the Solaris Volume Manager GUI or the metadb -d command to delete state database replicas. For instructions, see “How to Delete State Database Replicas” on page 61.
Creating State Database Replicas
Caution – If you upgraded from Solstice DiskSuite™ to Solaris Volume Manager and
you have state database replicas sharing slices with file systems or logical volumes (as
opposed to on separate slices), do not delete existing replicas and replace them with
new default replicas in the same location.
The default state database replica size in Solaris Volume Manager is 8192 blocks, while
the default size in Solstice DiskSuite was 1034 blocks. If you delete a default-sized
state database replica from Solstice DiskSuite, and then add a new default-sized
replica with Solaris Volume Manager, you will overwrite the first 7158 blocks of any
file system that occupies the rest of the shared slice, thus destroying the data.
Note – The metadb command without options reports the status of all replicas.
The -a option adds the additional state database replica to the system, and the -f
option forces the creation of the first replica (and may be omitted when you add
supplemental replicas to the system).
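For example, the following command is a sketch of creating the initial state database
replicas on two dedicated slices; the slice names are hypothetical.
# metadb -a -f c1t1d0s7 c2t1d0s7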
The -a option adds additional state database replicas to the system. The -c 2 option
places two replicas on the specified slice. The metadb command checks that the
replicas are active, as indicated by the -a.
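For example, the following sketch adds two replicas to a single slice; the slice name is
hypothetical.
# metadb -a -c 2 c1t2d0s7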
You can also specify the size of the state database replica with the -l option, followed
by the number of blocks. However, the default size of 8192 should be appropriate for
virtually all configurations, including those configurations with thousands of logical
volumes.
Caution – Do not replace default-sized (1034 block) state database replicas from
Solstice DiskSuite with default-sized Solaris Volume Manager replicas on a slice
shared with a file system. If you do, the new replicas will overwrite the beginning of
your file system and corrupt it.
The -a option adds the additional state database replica to the system, and the -l
option specifies the length in blocks of the replica to add.
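For example, the following sketch adds a replica that matches the smaller Solstice
DiskSuite default size of 1034 blocks; the slice name is hypothetical.
# metadb -a -l 1034 c1t3d0s7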
■ From the Enhanced Storage tool within the Solaris Management Console, open the
State Database Replicas node to view all existing state database replicas. For more
information, see the online help.
■ Use the metadb command to view the status of state database replicas. Add the -i
option to display a key to the status flags. See the metadb(1M) man page for more
information.
A legend of all the flags follows the status. The characters in front of the device name
represent the status. Uppercase letters indicate a problem status. Lowercase letters
indicate an “Okay” status.
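For example:
# metadb -i
The output lists each replica with its status flags and location, followed by the legend
of flags described above.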
■ From the Enhanced Storage tool within the Solaris Management Console, open the
State Database Replicas node to view all existing state database replicas. Select
replicas to delete, then choose Edit->Delete to remove them. For more information,
see the online help.
■ Use the following form of the metadb command:
metadb -d -f ctds-of-slice
Note that you need to specify each slice from which you want to remove the state
database replica. See the metadb(1M) man page for more information.
This example shows the last replica being deleted from a slice.
Note – You must add a -f option to force deletion of the last replica on the system.
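For example, the following sketch deletes the last replica from a slice; the slice name is
hypothetical.
# metadb -d -f c0t0d0s7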
This chapter describes RAID 0 volumes (both stripes and concatenations) that are
available in Solaris Volume Manager. For information about related tasks, see
Chapter 8.
Note – A component refers to any device, from a slice to a soft partition, that is used in
another logical volume.
A stripe spreads data equally across all components in the stripe, while a concatenated
volume writes data to the first available component until it is full, then moves to the
next available component. A concatenated stripe is simply a stripe that has been
expanded from its original configuration by adding additional components.
RAID 0 volumes allow you to quickly and simply expand disk storage capacity. The
drawback is that these volumes do not provide any data redundancy, unlike RAID 1
or RAID 5 volumes. If a single component fails on a RAID 0 volume, data is lost.
You can use a RAID 0 volume containing a single slice for any file system.
You can use a RAID 0 volume that contains multiple components for any file system
except the following:
■ root (/)
■ /usr
■ swap
■ /var
■ /opt
■ Any file system that is accessed during an operating system upgrade or installation
Note – When you mirror root (/), /usr, swap, /var, or /opt, you put the file system
into a one-way concatenation or stripe (a concatenation of a single slice) that acts as a
submirror. This one-way concatenation is mirrored by another submirror, which must
also be a concatenation.
Striping enables multiple controllers to access data at the same time, which is also
called parallel access. Parallel access can increase I/O throughput because all disks in
the volume are busy most of the time servicing I/O requests.
For sequential I/O operations on a stripe, Solaris Volume Manager reads all the blocks
in a segment of blocks (called an interlace) on the first component, then all the blocks in
a segment of blocks on the second component, and so forth.
For sequential I/O operations on a concatenation, Solaris Volume Manager reads all
the blocks on the first component, then all the blocks of the second component, and so
forth.
Note – RAID 5 volumes also use an interlace value. See “Overview of RAID 5
Volumes” on page 131 for more information.
When you create a stripe, you can set the interlace value or use the Solaris Volume
Manager default interlace value of 16 Kbytes. Once you have created the stripe, you
cannot change the interlace value. However, you could back up the data on it, delete
the stripe, create a new stripe with a new interlace value, and then restore the data.
When Solaris Volume Manager stripes data from the volume to the components, it
writes data from chunk 1 to Disk A, from chunk 2 to Disk B, and from chunk 3 to Disk
C. Solaris Volume Manager then writes chunk 4 to Disk A, chunk 5 to Disk B, chunk 6
to Disk C, and so forth.
The interlace value sets the size of each chunk. The total capacity of the stripe d2
equals the number of components multiplied by the size of the smallest component. (If
each slice in the following example were 2 Gbytes, d2 would equal 6 Gbytes.)
Figure 7–1 shows RAID 0 (stripe) volume d2 built from three physical slices, with consecutive interlace chunks distributed across the slices in turn.
A concatenation enables you to dynamically expand storage capacity and file system
sizes online. With a concatenation you can add components even if the other
components are currently active.
Note – To increase the capacity of a stripe, you need to build a concatenated stripe. See
“RAID 0 (Concatenated Stripe) Volume” on page 67.
A concatenation can also expand any active and mounted UFS file system without
having to bring down the system. In general, the total capacity of a concatenation is
equal to the total size of all the components in the concatenation. If a concatenation
contains a slice with a state database replica, the total capacity of the concatenation
would be the sum of the components minus the space that is reserved for the replica.
Note – You must use a concatenation to encapsulate root (/), swap, /usr, /opt, or
/var when mirroring these file systems.
Scenario—RAID 0 (Concatenation)
Figure 7–2 illustrates a concatenation that is made of three components (slices).
The data blocks, or chunks, are written sequentially across the components, beginning
with Disk A. Disk A can be envisioned as containing logical chunks 1 through 4.
Logical chunk 5 would be written to Disk B, which would contain logical chunks 5
through 8. Logical chunk 9 would be written to Disk C, which would contain chunks
9 through 12. The total capacity of volume d1 would be the combined capacities of the
three drives. If each drive were 2 Gbytes, volume d1 would have an overall capacity
of 6 Gbytes.
Figure 7–2 shows RAID 0 (concatenation) volume d1 built from three physical slices (A, B, and C), with interlace chunks 1 through 12 written sequentially, filling each slice before continuing to the next.
The first stripe consists of three slices, Slice A through C, with an interlace value of 16
Kbytes. The second stripe consists of two slices Slice D and E, and uses an interlace
value of 32 Kbytes. The last stripe consists of two slices, Slice F and G. Because no
interlace value is specified for the third stripe, it inherits the value from the stripe
before it, which in this case is 32 Kbytes. Sequential data chunks are addressed to the
first stripe until that stripe has no more space. Chunks are then addressed to the
second stripe. When this stripe has no more space, chunks are addressed to the third
stripe. Within each stripe, the data chunks are interleaved according to the specified
interlace value.
Figure 7–3 shows this RAID 0 (concatenated stripe) volume, built from a first stripe of Slices A, B, and C, a second stripe of Slices D and E, and a third stripe of Slices F and G.
Scenario—RAID 0 Volumes
RAID 0 volumes provide the fundamental building blocks for aggregating storage or
building mirrors. The following example, drawing on the sample system explained in
Chapter 4, describes how RAID 0 volumes can provide larger storage spaces and
allow you to construct a mirror of existing file systems, including root (/).
The sample system has a collection of relatively small (9 Gbyte) disks, and it is entirely
possible that specific applications would require larger storage spaces. To create larger
spaces (and improve performance), the system administrator can create a stripe that
spans multiple disks. For example, each of c1t1d0, c1t2d0, c1t3d0 and c2t1d0,
c2t2d0, c2t3d0 could be formatted with a slice 0 that spans the entire disk. Then, a
stripe including all three of the disks from the same controller could provide
approximately 27 Gbytes of storage and allow faster access. The second stripe, from the
second controller, can be used for redundancy, as described in Chapter 10 and
specifically in the “Scenario—RAID 1 Volumes (Mirrors)” on page 90.
This chapter contains information about tasks related to RAID 0 volumes. For
information about related concepts, see Chapter 7.
Create RAID 0 (stripe) volumes: Use the metainit command to create a new volume. For instructions, see “How to Create a RAID 0 (Stripe) Volume” on page 74.
Create RAID 0 (concatenation) volumes: Use the metainit command to create a new volume. For instructions, see “How to Create a RAID 0 (Concatenation) Volume” on page 75.
Expand storage space: Use the metainit command to expand an existing file system. For instructions, see “How to Expand Space for Existing Data” on page 76.
Creating RAID 0 (Stripe) Volumes
Caution – Do not create a stripe from an existing file system or data. Doing so will
destroy data. To create a stripe from existing data, you must dump and restore the
data to the volume.
See the following examples and the metainit(1M) man page for more
information.
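For example, the following sketch creates a three-slice stripe; the slice names are
hypothetical.
# metainit d20 1 3 c1t1d0s2 c1t2d0s2 c1t3d0s2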
The stripe, d20, consists of a single stripe (the number 1) that is made of three slices
(the number 3). Because no interlace value is specified, the stripe uses the default of 16
Kbytes. The system confirms that the volume has been set up.
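A sketch of the command for the second example, again with hypothetical slice
names, is:
# metainit d10 1 2 c1t1d0s2 c1t2d0s2 -i 32k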
The stripe, d10, consists of a single stripe (the number 1) that is made of two slices
(the number 2). The -i option sets the interlace value to 32 Kbytes. (The interlace
value cannot be less than 8 Kbytes, nor greater than 100 Mbytes.) The system verifies
that the volume has been set up.
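A sketch of the command for the next example, with a hypothetical slice name, is:
# metainit d25 1 1 c0t1d0s2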
This example shows the creation of a concatenation, d25, that consists of one stripe
(the first number 1) made of a single slice (the second number 1 in front of the slice).
The system verifies that the volume has been set up.
Note – This example shows a concatenation that can safely encapsulate existing data.
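A sketch of the command for the four-slice concatenation that follows, with
hypothetical slice names, is:
# metainit d40 4 1 c1t1d0s2 1 c1t2d0s2 1 c1t3d0s2 1 c1t4d0s2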
This example creates a concatenation called d40 that consists of four “stripes” (the
number 4), each made of a single slice (the number 1 in front of each slice). The system
verifies that the volume has been set up.
4. Edit the /etc/vfstab file so that the file system references the name of the
concatenation.
This example shows the creation of a concatenation called d25 out of two slices,
/dev/dsk/c0t1d0s2 (which contains a file system mounted on /docs) and
/dev/dsk/c0t2d0s2. The file system must first be unmounted.
Caution – The first slice in the metainit command must be the slice that contains the
file system. If not, you will corrupt your data.
Next, the entry for the file system in the /etc/vfstab file is changed (or entered for
the first time) to reference the concatenation. For example, the following line:
/dev/dsk/c0t1d0s2 /dev/rdsk/c0t1d0s2 /docs ufs 2 yes -
should be changed to reference the volume d25 instead:
/dev/md/dsk/d25 /dev/md/rdsk/d25 /docs ufs 2 yes -
An application, such as a database, that uses the raw concatenation must have its own
way of recognizing the concatenation, or of growing the added space.
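A minimal sketch of the command sequence for this example, using the slices and
mount point named above, is:
# umount /docs
# metainit d25 2 1 c0t1d0s2 1 c0t2d0s2
(Edit the /etc/vfstab file as shown above)
# mount /docs
To make the file system use the added space, see “How to Grow a File System” on
page 229.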
This procedure assumes that you are adding an additional stripe to an existing stripe.
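A sketch of the corresponding command, with hypothetical slice names for the
attached stripe, is:
# metattach d25 c1t4d0s2 c1t5d0s2 c1t6d0s2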
This example takes an existing three-way stripe, d25, and concatenates another
three-way stripe. Because no interlace value is given for the attached slices, they
inherit the interlace value configured for d25. The system verifies that the volume has
been set up.
An application, such as a database, that uses the raw volume must have its own way
of recognizing the volume, or of growing the added space.
To prepare a newly created concatenated stripe for a file system, see “Creating File
Systems (Tasks)” in System Administration Guide: Basic Administration.
Removing a Volume
See the following example and the metaclear(1M) man page for more
information.
Example—Removing a Concatenation
# umount d8
# metaclear d8
d8: Concat/Stripe is cleared
(Edit the /etc/vfstab file)
This example illustrates clearing the concatenation d8 that also contains a mounted
file system. The file system must be unmounted before the volume can be cleared. The
system displays a confirmation message that the concatenation is cleared. If there is an
entry in the /etc/vfstab file for this volume, delete that entry. You do not want to
confuse the system by asking it to mount a file system on a nonexistent volume.
This chapter explains essential Solaris Volume Manager concepts related to mirrors
and submirrors. For information about performing related tasks, see Chapter 10.
After you configure a mirror, it can be used just as if it were a physical slice.
You can mirror any file system, including existing file systems. You can also use a
mirror for any application, such as a database.
Tip – Use Solaris Volume Manager’s hot spare feature with mirrors to keep data safe
and available. For information on hot spares, see Chapter 15 and Chapter 16.
Overview of Submirrors
The RAID 0 volumes that are mirrored are called submirrors. A mirror is made of one or
more RAID 0 volumes (stripes or concatenations).
If you take a submirror “offline,” the mirror stops reading and writing to the
submirror. At this point, you could access the submirror itself, for example, to perform
a backup. However, the submirror is in a read-only state. While a submirror is offline,
Solaris Volume Manager keeps track of all writes to the mirror. When the submirror is
brought back online, only the portions of the mirror that were written while the
submirror was offline (resynchronization regions) are resynchronized. Submirrors can
also be taken offline to troubleshoot or repair physical devices which have errors.
Submirrors can be attached or detached from a mirror at any time, though at least one
submirror must remain attached at all times.
Normally, you create a mirror with only a single submirror. Then, you attach a second
submirror after you create the mirror.
Solaris Volume Manager software makes duplicate copies of the data on multiple
physical disks, and presents one virtual disk to the application. All disk writes are
duplicated; disk reads come from one of the underlying submirrors. The total capacity of a mirror is the size of the smallest of its submirrors (if they are not of equal size).
Note – Solaris Volume Manager cannot always provide RAID 1+0 functionality.
However, in a best practices environment, where both submirrors are identical to each
other and are made up of disk slices (and not soft partitions), RAID 1+0 will be
possible.
For example, with a pure RAID 0+1 implementation and a two-way mirror that
consists of three striped slices, a single slice failure could fail one side of the mirror.
And, assuming that no hot spares were in use, a second slice failure would fail the
mirror. Using Solaris Volume Manager, up to three slices could potentially fail without
failing the mirror, because each of the three striped slices is individually mirrored to its counterpart on the other half of the mirror.
RAID 1 Volume
Mirror d1 consists of two submirrors, each of which consists of three identical physical
disks and the same interlace value. A failure of three disks, A, B, and F, can be tolerated
because the entire logical block range of the mirror is still contained on at least one
good disk.
If, however, disks A and D fail, a portion of the mirror’s data is no longer available on
any disk and access to these logical blocks will fail.
When a portion of a mirror’s data is unavailable due to multiple slice errors, access to
portions of the mirror where data is still available will succeed. Under this situation,
the mirror acts like a single disk that has developed bad blocks. The damaged portions
are unavailable, but the rest is available.
You can define mirror options when you initially create the mirror, or after a mirror
has been set up. For tasks related to changing these options, see “How to Change
RAID 1 Volume Options” on page 110.
Round Robin (Default): Attempts to balance the load across the submirrors. All reads are made in a round-robin order (one after another) from all submirrors in a mirror.
First: Directs all reads to the first submirror. This policy should be used only when the device or devices that comprise the first submirror are substantially faster than those of the second submirror.
Parallel (Default): A write to a mirror is replicated and dispatched to all of the submirrors simultaneously.
Serial: Performs writes to submirrors serially, that is, a write to one submirror must complete before the next submirror write is initiated. The serial option is provided in case a submirror becomes unreadable, for example, due to a power failure.
While the resynchronization takes place, the mirror remains readable and writable by
users.
Full Resynchronization
When a new submirror is attached (added) to a mirror, all the data from another
submirror in the mirror is automatically written to the newly attached submirror. Once
the mirror resynchronization is done, the new submirror is readable. A submirror
remains attached to a mirror until it is explicitly detached.
Caution – A pass number of 0 (zero) should only be used on mirrors that are mounted
as read-only.
Partial Resynchronization
Following a replacement of a slice within a submirror, Solaris Volume Manager
performs a partial mirror resynchronization of data. Solaris Volume Manager copies the
data from the remaining good slices of another submirror to the replaced slice.
Pass Number
The pass number, a number in the range 0–9, determines the order in which a
particular mirror is resynchronized during a system reboot. The default pass number
is 1. Smaller pass numbers are resynchronized first. If 0 is used, the mirror
resynchronization is skipped. A pass number of 0 should be used only for mirrors that
are mounted as read-only. Mirrors with the same pass number are resynchronized at
the same time.
Caution – When you create a mirror for an existing file system, be sure that the
initial submirror contains the existing file system.
■ When creating a mirror, first create a one-way mirror, then attach a second
submirror. This strategy starts a resynchronization operation and ensures that data
is not corrupted.
■ You can create a one-way mirror for a future two-way or three-way mirror.
■ You can create up to a three-way mirror. However, two-way mirrors usually
provide sufficient data redundancy for most applications, and are less expensive in
terms of disk drive costs. A three-way mirror enables you to take a submirror
offline and perform a backup while maintaining a two-way mirror for continued
data redundancy.
■ Use components of identical size when creating submirrors. Using components of
different sizes leaves wasted space in the mirror.
■ Adding additional state database replicas before you create a mirror can improve
the mirror’s performance. As a general rule, add two additional replicas for each
mirror you add to the system. Solaris Volume Manager uses these additional
replicas to store the dirty region log (DRL), used to provide optimized
resynchronization. By providing an adequate number of replicas to prevent contention, or by using replicas on the same disks or controllers as the mirror they log, you will improve overall performance.
As described in “Interlace Values for Stripes” on page 65, the sample system has two
RAID 0 volumes, each of which is approximately 27 Gbytes in size and spans three
disks. By creating a RAID 1 volume to mirror these two RAID 0 volumes, a fully
redundant storage space can provide resilient data storage.
Within this RAID 1 volume, the failure of either of the disk controllers will not
interrupt access to the volume. Similarly, failure of up to three individual disks might
be tolerated without access interruption.
This chapter explains how to perform Solaris Volume Manager tasks related to RAID 1
volumes. For information about related concepts, see Chapter 9.
■ Create a mirror from unused slices: Use the Solaris Volume Manager GUI or the metainit command to create a mirror from unused slices. See “How to Create a RAID 1 Volume From Unused Slices” on page 95.
■ Create a mirror from an existing file system: Use the Solaris Volume Manager GUI or the metainit command to create a mirror from an existing file system. See “How to Create a RAID 1 Volume From a File System” on page 97.
■ Record the path to the alternate boot device for a mirrored root: Find the path to the alternate boot device and enter it in the boot instructions. See “How to Record the Path to the Alternate Boot Device” on page 102.
■ Attach a submirror: Use the Solaris Volume Manager GUI or the metattach command to attach a submirror. See “How to Attach a Submirror” on page 103.
■ Detach a submirror: Use the Solaris Volume Manager GUI or the metadetach command to detach the submirror. See “How to Detach a Submirror” on page 104.
■ Place a submirror online or take a submirror offline: Use the Solaris Volume Manager GUI or the metaonline command to put a submirror online. Use the Solaris Volume Manager GUI or the metaoffline command to take a submirror offline. See “How to Place a Submirror Offline and Online” on page 105.
■ Enable a component within a submirror: Use the Solaris Volume Manager GUI or the metareplace command to enable a slice in a submirror. See “How to Enable a Slice in a Submirror” on page 106.
■ Check mirror status: Use the Solaris Volume Manager GUI or the metastat command to check the status of RAID 1 volumes. See “How to Check the Status of Mirrors and Submirrors” on page 108.
■ Change mirror options: Use the Solaris Volume Manager GUI or the metaparam command to change the options for a specific RAID 1 volume. See “How to Change RAID 1 Volume Options” on page 110.
■ Expand a mirror: Use the Solaris Volume Manager GUI or the metattach command to expand the capacity of a mirror. See “How to Expand a RAID 1 Volume” on page 111.
■ Replace a slice within a submirror: Use the Solaris Volume Manager GUI or the metareplace command to replace a slice in a submirror. See “How to Replace a Slice in a Submirror” on page 112.
■ Replace a submirror: Use the Solaris Volume Manager GUI or the metattach command to replace a submirror. See “How to Replace a Submirror” on page 113.
■ Remove a mirror (unmirror): Use the Solaris Volume Manager GUI, the metadetach command, or the metaclear command to unmirror a file system. See “How to Unmirror a File System” on page 115.
■ Remove a mirror (unmirror) of a file system that cannot be unmounted: Use the Solaris Volume Manager GUI, the metadetach command, or the metaclear command to unmirror a file system that cannot be unmounted. See “How to Unmirror a File System That Cannot Be Unmounted” on page 116.
■ Use a mirror to perform backups: Use the Solaris Volume Manager GUI or the metaonline and metaoffline commands to perform backups with mirrors. See “How to Use a RAID 1 Volume to Make an Online Backup” on page 118.
See the following examples and the metainit(1M) and metattach(1M) man pages for more information.
This example shows the creation of a two-way mirror, d50. The metainit command
creates two submirrors (d51 and d52), which are RAID 0 volumes. The metainit -m
command creates the one-way mirror from the d51 RAID 0 volume. The metattach
command attaches d52, creating a two-way mirror and causing a resynchronization.
(Any data on the attached submirror is overwritten by the other submirror during the
resynchronization.) The system verifies that the objects are defined.
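A transcript along the following lines corresponds to this description; the slice names c0t0d0s2 and c1t0d0s2 are assumptions for illustration:
# metainit d51 1 1 c0t0d0s2
d51: Concat/Stripe is setup
# metainit d52 1 1 c1t0d0s2
d52: Concat/Stripe is setup
# metainit d50 -m d51
d50: Mirror is setup
# metattach d50 d52
d50: submirror d52 is attached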
This example creates a two-way mirror, d50. The metainit command creates two
submirrors (d51 and d52), which are RAID 0 volumes. The metainit -m command
with both submirrors creates the mirror from the d51 RAID 0 volume and avoids
resynchronization. It is assumed that all information on the mirror is considered
invalid and will be regenerated (for example, through a newfs operation) before the
mirror is used.
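A sketch of the same configuration created without a resynchronization follows; again, the slice names are assumptions:
# metainit d51 1 1 c0t0d0s2
d51: Concat/Stripe is setup
# metainit d52 1 1 c1t0d0s2
d52: Concat/Stripe is setup
# metainit d50 -m d51 d52
d50: Mirror is setup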
Note – When mirroring root (/), it is essential that you record the secondary root slice
name to reboot the system if the primary submirror fails. This information should be
written down, not recorded on the system, which might not be available. See
Chapter 24 for details on recording the alternate boot device, and on booting from the
alternate boot device.
If you are mirroring root on an IA system, install the boot information on the alternate boot disk before you create the RAID 0 or RAID 1 devices. See “IA: Booting a System (Tasks)” in System Administration Guide: Basic Administration.
2. Identify the slice that contains the existing file system to be mirrored (c1t0d0s0 in
this example).
3. Create a new RAID 0 volume on the slice from the previous step by using one of
the following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose Action->Create Volume and follow the instructions on
screen. For more information, see the online help.
■ Use the metainit -f raid-0-volume-name 1 1 ctds-of-slice command.
# metainit -f d1 1 1 c1t0d0s0
Note – When you create a mirror from an existing file system, you must follow the
next two steps precisely to avoid data corruption.
If you are mirroring any file system other than the root (/) file system, then edit the
/etc/vfstab file so that the file system mount instructions refer to the mirror, not to
the block device.
For more information about the /etc/vfstab file, see “Mounting and Unmounting File Systems (Tasks)” in System Administration Guide: Basic Administration.
6. Remount your newly mirrored file system according to one of the following
procedures:
■ If you are mirroring your root (/) file system, run the metaroot d0 command,
replacing d0 with the name of the mirror you just created, then reboot your system.
For more information, see the metaroot(1M) man page.
■ If you are mirroring a file system that can be unmounted, then unmount and
remount the file system.
■ If you are mirroring a file system other than root (/) that cannot be unmounted,
then reboot your system.
8. If you mirrored your root file system, record the alternate boot path.
See “How to Record the Path to the Alternate Boot Device” on page 102.
The -f option forces the creation of the first concatenation, d1, which contains the
mounted file system /master on /dev/dsk/c1t0d0s0. The second concatenation,
d2, is created from /dev/dsk/c1t1d0s0. (This slice must be the same size or greater
than that of d1.) The metainit command with the -m option creates the one-way
mirror, d0, from d1.
Next, the entry for the file system should be changed in the /etc/vfstab file to reference the mirror. For example, the following line:
/dev/dsk/c1t0d0s0 /dev/rdsk/c1t0d0s0 /master ufs 2 yes -
should be changed to read:
/dev/md/dsk/d0 /dev/md/rdsk/d0 /master ufs 2 yes -
Finally, the file system is remounted and submirror d2 is attached to the mirror,
causing a mirror resynchronization. The system confirms that the RAID 0 and RAID 1
volumes are set up, and that submirror d2 is attached.
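The commands below sketch the sequence just described; they assume the volume names given above and that the /master file system can be unmounted:
# metainit -f d1 1 1 c1t0d0s0
d1: Concat/Stripe is setup
# metainit d2 1 1 c1t1d0s0
d2: Concat/Stripe is setup
# metainit d0 -m d1
d0: Mirror is setup
(Edit the /etc/vfstab file so that /master references the mirror d0)
# umount /master
# mount /master
# metattach d0 d2
d0: submirror d2 is attached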
Note – Do not attach the second submirror before the system is rebooted. You must
reboot between running the metaroot command and attaching the second submirror.
The -f option forces the creation of the first RAID 0 volume, d1, which contains the
mounted file system root (/) on /dev/dsk/c0t0d0s0. The second concatenation, d2,
is created from /dev/dsk/c0t1d0s0. (This slice must be the same size or greater
than that of d1.) The metainit command with the -m option creates the one-way
mirror d0 using the concatenation that contains root (/).
Next, the metaroot command edits the /etc/vfstab and /etc/system files so
that the system can be booted with the root file system (/) on a volume. (It is a good
idea to run the lockfs -fa command before rebooting.) After a reboot, the
submirror d2 is attached to the mirror, causing a mirror resynchronization. (The
system confirms that the concatenations and the mirror are set up, and that submirror
d2 is attached.) The ls -l command is run on the root raw device to determine the
path to the alternate root device in case the system might later need to be booted from
it.
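A command outline for this root mirroring example follows; it assumes the volume names given above, and the ellipsis stands for the reboot:
# metainit -f d1 1 1 c0t0d0s0
d1: Concat/Stripe is setup
# metainit d2 1 1 c0t1d0s0
d2: Concat/Stripe is setup
# metainit d0 -m d1
d0: Mirror is setup
# metaroot d0
# lockfs -fa
# reboot
...
# metattach d0 d2
d0: submirror d2 is attached
# ls -l /dev/rdsk/c0t1d0s0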
The -f option forces the creation of the first concatenation, d12, which contains the mounted file system /usr on /dev/dsk/c0t3d0s6. The second concatenation, d22, is created from /dev/dsk/c1t0d0s6. (This slice must be the same size or greater than that of d12.)
After a reboot, the second submirror d22 is attached to the mirror, causing a mirror resynchronization. (The system confirms that the concatenation and the mirror are set up, and that submirror d22 is attached.)
The -f option forces the creation of the first concatenation, d11, which contains the
mounted file system swap on /dev/dsk/c0t0d0s1. The second concatenation, d21,
is created from /dev/dsk/c1t0d0s1. (This slice must be the same size or greater
than that of d11.) The metainit command with the -m option creates the one-way
mirror d1 using the concatenation that contains swap. Next, if there is an entry for
swap in the /etc/vfstab file, it must be edited to reference the mirror. For example, the following line:
/dev/dsk/c0t0d0s1 - - swap - no -
should be changed to read:
/dev/md/dsk/d1 - - swap - no -
After a reboot, the second submirror d21 is attached to the mirror, causing a mirror resynchronization. (The system confirms that the concatenations and the mirror are set up, and that submirror d21 is attached.)
To save the crash dump when you have mirrored swap, use the dumpadm command to
configure the dump device as a volume. For instance, if the swap device is named
/dev/md/dsk/d2, use the dumpadm command to set this device as the dump device.
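For example, assuming the swap mirror is /dev/md/dsk/d2 as described above:
# dumpadm -d /dev/md/dsk/d2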
Here you would record the string that follows the /devices directory:
/sbus@1,f8000000/esp@1,200000/sd@3,0:a.
Solaris Volume Manager users who are using a system with OpenBoot™ Prom can use
the OpenBoot nvalias command to define a “backup root” device alias for the
secondary root (/) mirror. For example:
ok nvalias backup_root /sbus@1,f8000000/esp@1,200000/sd@3,0:a
Then, redefine the boot-device alias to reference both the primary and secondary
submirrors, in the order in which you want them to be used, and store the
configuration.
ok printenv boot-device
boot-device = disk net
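For example, a setenv command along the following lines would add the backup_root alias to the boot order; the exact device list is an assumption based on the printenv output shown above:
ok setenv boot-device disk backup_root net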
In the event of primary root disk failure, the system would automatically boot to the
second submirror. Or, if you boot manually, rather than using auto boot, you would
only enter:
ok boot backup_root
Here, you would record the string that follows the /devices directory:
/eisa/eha@1000,0/cmdk@1,0:a
Example—Attaching a Submirror
# metastat d30
d30: mirror
Submirror 0: d60
State: Okay
...
# metattach d30 d70
d30: submirror d70 is attached
# metastat d30
d30: mirror
Submirror 0: d60
State: Okay
Submirror 1: d70
State: Resyncing
Resync in progress: 41 % done
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2006130 blocks
...
This example shows the attaching of a submirror, d70, to a one-way mirror, d30,
creating a two-way mirror. The mirror d30 initially consists of submirror d60. The
submirror d70 is a RAID 0 volume. You verify that the status of the mirror is “Okay”
with the metastat command, then attach the submirror. When the metattach
command is run, the new submirror is resynchronized with the existing mirror. When
you attach an additional submirror to the mirror, the system displays a message. To
verify that the mirror is resynchronizing, use the metastat command.
Example—Detaching a Submirror
# metastat
d5: mirror
Submirror 0: d50
...
# metadetach d5 d50
d5: submirror d50 is detached
In this example, mirror d5 has a submirror, d50, which is detached with the
metadetach command. The underlying slices from d50 are going to be reused
elsewhere. When you detach a submirror from a mirror, the system displays a
confirmation message.
Note – The metaoffline command’s capabilities are similar to that offered by the
metadetach command. However, the metaoffline command does not sever the
logical association between the submirror and the mirror.
1. Make sure that you have root privilege and that you have a current backup of all
data.
In this example, submirror d11 is taken offline from mirror d10. Reads will continue
to be made from the other submirror. The mirror will be out of sync as soon as the first
write is made. This inconsistency is corrected when the offlined submirror is brought
back online.
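Commands along these lines perform the offline and later online operations for this example; the output lines show the general format of the confirmation messages:
# metaoffline d10 d11
d10: submirror d11 is offlined
...
# metaonline d10 d11
d10: submirror d11 is onlined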
In this example, the mirror d11 has a submirror that contains slice c1t4d0s7, which
had a soft error. The metareplace command with the -e option enables the failed
slice.
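The enabling command for this example would be similar to the following:
# metareplace -e d11 c1t4d0s7
d11: device c1t4d0s7 is enabled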
Note – If a physical disk is defective, you can either replace it with another available
disk (and its slices) on the system as documented in “How to Replace a Slice in a
Submirror” on page 112. Alternatively, you can repair/replace the disk, format it, and
run the metareplace command with the -e option as shown in this example.
Needs Maintenance: A slice (or slices) in the submirror has encountered an I/O error or an open error. All reads and writes to and from this slice in the submirror have been discontinued.
Additionally, for each slice in a submirror, the metastat command shows the
“Device” (device name of the slice in the stripe); “Start Block” on which the slice
begins; “Dbase” to show if the slice contains a state database replica; “State” of the
slice; and “Hot Spare” to show the slice being used to hot spare a failed slice.
The slice state is perhaps the most important information when you are troubleshooting mirror errors. The submirror state only provides general status information, such as “Okay” or “Needs Maintenance.” If the submirror reports a “Needs Maintenance” state, refer to the slice state.
The following table explains the slice states for submirrors and possible actions to
take.
Resyncing: The component is actively being resynchronized. An error has occurred and been corrected, the submirror has just been brought back online, or a new submirror has been added. Action: If desired, monitor the submirror status until the resynchronization is done.
Maintenance: The component has encountered an I/O error or an open error. All reads and writes to and from this component have been discontinued. Action: Enable or replace the failed component. See “How to Enable a Slice in a Submirror” on page 106, or “How to Replace a Slice in a Submirror” on page 112. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command. You can also use the metareplace -e command.
Last Erred: The component has encountered an I/O error or an open error. However, the data is not replicated elsewhere due to another slice failure. I/O is still performed on the slice. If I/O errors result, the mirror I/O will fail. Action: First, enable or replace components in the “Maintenance” state. See “How to Enable a Slice in a Submirror” on page 106, or “How to Replace a Slice in a Submirror” on page 112. Usually, this error results in some data loss, so validate the mirror after it is fixed. For a file system, use the fsck command, then check the data. An application or database must have its own method of validating the device.
See “How to Change RAID 1 Volume Options” on page 110 to change a mirror’s
pass number, read option, or write option.
d1: Submirror of d0
State: Okay
Size: 5600 blocks
...
For each submirror in the mirror, the metastat command shows the state, an
“invoke” line if there is an error, the assigned hot spare pool (if any), size in blocks,
and information about each slice in the submirror.
See “RAID 1 Volume Options” on page 86 for a description of mirror options. Also
see the metaparam(1M) man page.
Each submirror in a mirror must be expanded. See the metattach(1M) man page
for more information.
This example shows how to expand a mirrored mounted file system by concatenating
two disk drives to the mirror’s two submirrors. The mirror is named d8 and contains
two submirrors named d9 and d10.
An application, such as a database, that uses the raw volume must have its own way
of growing the added space.
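A sketch of the commands follows; the attached slices and the mount point /files are assumptions for illustration:
# metattach d9 c0t2d0s5
d9: component is attached
# metattach d10 c0t3d0s5
d10: component is attached
# growfs -M /files /dev/md/rdsk/d8
The growfs command grows a mounted UFS file system into the new space; as noted above, an application that uses the raw volume must grow into the added space itself.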
See the following example and the metareplace(1M) man page for more information.
The metastat command confirms that mirror d6 has a submirror, d26, with a slice in
the “Needs maintenance” state. The metareplace command replaces the slice as
specified in the “Invoke” line of the metastat output with another available slice on
the system. The system confirms that the slice is replaced, and starts resynchronizing
the submirror.
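A replacement command of the following form corresponds to this description; the failed and replacement slice names are hypothetical:
# metareplace d6 c0t2d0s2 c2t2d0s2
d6: device c0t2d0s2 is replaced with c2t2d0s2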
Note – The specific configuration of the new volume d22 will depend on the
component you are replacing. A concatenation, as shown here, would be fine to
replace a concatenation, but would not be an ideal replacement for a stripe as it could
impact performance.
# metastat d20
d20: Mirror
Submirror 0: d21
State: Okay
Submirror 1: d22
State: Needs maintenance
...
# metadetach -f d20 d22
d20: submirror d22 is detached
# metaclear -f d22
d22: Concat/Stripe is cleared
# metainit d22 2 1 c1t0d0s2 1 c1t0d1s2
d22: Concat/Stripe is setup
# metattach d20 d22
d20: components are attached
The metastat command confirms that the two-way mirror d20 has a submirror, d22,
in the “Needs maintenance” state. In this case, the entire submirror will be cleared and
recreated. The metadetach command detaches the failed submirror from the mirror
by using the -f option, which forces the detach to occur. The metaclear command
clears the submirror. The metainit command recreates submirror d22, with new
slices. The metattach command attaches the rebuilt submirror, and a mirror
resynchronization begins automatically.
Note – You temporarily lose the capability for data redundancy while the mirror is a
one-way mirror.
1. Make sure that you have root privilege and that you have a current backup of all
data.
5. Detach the submirror that will continue to be used for the file system.
For more information, see the metadetach(1M) man page.
# metadetach d1 d10
7. Edit the /etc/vfstab file to use the component detached in Step 5, if necessary.
In this example, the /opt file system is made of a two-way mirror named d4; its submirrors are d2 and d3, made of slices /dev/dsk/c0t0d0s0 and /dev/dsk/c1t0d0s0, respectively. The metastat command verifies that at least one submirror is in the “Okay” state. (A mirror with no submirrors in the “Okay” state must be repaired first.) The file system is unmounted, then submirror d2 is detached. The metaclear -r command deletes the mirror and the other submirror, d3.
Next, the entry for /opt in the /etc/vfstab file is changed to reference the detached submirror. For example, if d4 were the mirror and d2 the submirror, the following line:
/dev/md/dsk/d4 /dev/md/rdsk/d4 /opt ufs 2 yes -
should be changed to read:
/dev/md/dsk/d2 /dev/md/rdsk/d2 /opt ufs 2 yes -
By using the submirror name, you can continue to have the file system mounted on a
volume. Finally, the /opt file system is remounted.
Note – By using d2 instead of d4 in the /etc/vfstab file, you have unmirrored the
mirror. Because d2 consists of a single slice, you can mount the file system on the slice
name (/dev/dsk/c0t0d0s0) if you do not want the device to support a volume.
1. Run the metastat command to verify that at least one submirror is in the “Okay”
state.
2. Run the metadetach command on the mirror that contains root (/), /usr, /opt, or
swap to make a one-way mirror.
In this example, root (/) is a two-way mirror named d0; its submirrors are d10 and
d20, which are made of slices /dev/dsk/c0t3d0s0 and /dev/dsk/c1t3d0s0,
respectively. The metastat command verifies that at least one submirror is in the
“Okay” state. (A mirror with no submirrors in the “Okay” state must first be repaired.)
Submirror d20 is detached to make d0 a one-way mirror. The metaroot command is
then run, using the rootslice from which the system is going to boot. This command
edits the /etc/system and /etc/vfstab files to remove information that specifies
the mirroring of root (/). After a reboot, the metaclear -r command deletes the
mirror and the other submirror, d10. The last metaclear command clears submirror
d20.
Example—Unmirroring swap
# metastat d1
d1: Mirror
Submirror 0: d11
State: Okay
Submirror 1: d21
State: Okay
...
# metadetach d1 d21
d1: submirror d21 is detached
(Edit the /etc/vfstab file to change the entry for swap from metadevice to slice name)
# reboot
...
# metaclear -r d1
d1: Mirror is cleared
In this example, swap is made of a two-way mirror named d1; its submirrors are d11
and d21, which are made of slices /dev/dsk/c0t3d0s1 and /dev/dsk/c1t3d0s1,
respectively. The metastat command verifies that at least one submirror is in the
“Okay” state. (A mirror with no submirrors in the “Okay” state must first be repaired.)
Submirror d21 is detached to make d1 a one-way mirror. Next, the /etc/vfstab file must be edited to change the entry for swap to reference the slice that is in submirror d21. For example, if d1 was the mirror, and d21 the submirror containing slice /dev/dsk/c0t3d0s1, the following line:
/dev/md/dsk/d1 - - swap - no -
should be changed to read:
/dev/dsk/c0t3d0s1 - - swap - no -
After a reboot, the metaclear -r command deletes the mirror and the other
submirror, d11. The final metaclear command clears submirror d21.
Note – If you use these procedures regularly, put them into a script for ease of use.
1. Run the metastat command to make sure the mirror is in the “Okay” state.
A mirror that is in the “Maintenance” state should be repaired first.
2. For all file systems except root (/), lock the file system from writes.
# /usr/sbin/lockfs -w mount-point
Only a UFS needs to be write-locked. If the volume is set up as a raw device for
database management software or some other application, running lockfs is not
necessary. (You might, however, want to run the appropriate vendor-supplied utility
to flush any buffers and lock access.)
Caution – Write-locking root (/) causes the system to hang, so it should never be
performed.
Note – To ensure a proper backup, use the raw volume, for example,
/dev/md/rdsk/d4. Using “rdsk” allows greater than 2 Gbyte access.
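The overall flow of the backup, under the assumption of a mirror d4 with submirror d3 and a file system mounted on /home2, with a tape drive at /dev/rmt/0, would resemble the following sketch:
# lockfs -w /home2
# metaoffline d4 d3
# lockfs -u /home2
# ufsdump 0ucf /dev/rmt/0 /dev/md/rdsk/d3
# metaonline d4 d3
Bringing the submirror back online with the metaonline command starts an optimized resynchronization of only the regions written while it was offline.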
This chapter provides information about Solaris Volume Manager soft partitions. For
information about related tasks, see Chapter 12.
Solaris Volume Manager can support up to 8192 logical volumes per disk set
(including the local, or unspecified, disk set), but is configured for 128 (d0–d127) by
default. To increase the number of logical volumes, see “Changing Solaris Volume
Manager Defaults” on page 226.
Note – Do not increase the number of possible logical volumes far beyond the number
that you will actually use. Solaris Volume Manager creates a device node
(/dev/md/dsk/*) and associated data structures for every logical volume that is
permitted by the maximum value. These additional possible volumes can result in a
substantial performance impact.
You use soft partitions to divide a disk slice or logical volume into as many partitions
as needed. You must provide a name for each division or soft partition, just like you do
for other storage volumes, such as stripes or mirrors. A soft partition, once named, can
be accessed by applications, including file systems, as long as it is not included in
another volume. Once included in a volume, the soft partition should no longer be
directly accessed.
Soft partitions can be placed directly above a disk slice, or on top of a mirror, stripe or
RAID 5 volume. Nesting of a soft partition between volumes is not allowed. For
example, a soft partition built on a stripe with a mirror built on the soft partition is not
allowed.
Although a soft partition appears, to file systems and other applications, to be a single
contiguous logical volume, it actually comprises a series of extents that could be
located at arbitrary locations on the underlying media. In addition to the soft
partitions, extent headers (also called system recovery data areas) on disk record
information about the soft partitions to facilitate recovery in the event of a catastrophic
system failure.
When you partition a disk and build file systems on the resulting slices, you cannot
later extend a slice without modifying or destroying the disk format. With soft
partitions, you can extend the soft partitions up to the amount of space on the
underlying device without moving or destroying data on other soft partitions.
Scenario—Soft Partitions
Soft partitions provide tools with which to subdivide larger storage spaces into more
manageable spaces. For example, in other scenarios (“Scenario—RAID 1 Volumes (Mirrors)” on page 90 or “Scenario—RAID 5 Volumes” on page 136), large storage aggregations provided redundant storage of many gigabytes. However, many
possible scenarios would not require so much space—at least at first. Soft partitions
allow you to subdivide that storage space into more manageable sections, each of
which can have a complete file system. For example, you could create 1000 soft
partitions on top of a RAID 1 or RAID 5 volume so that each of your users can have a
home directory on a separate file system. If a user needs more space, simply expand
the soft partition.
This chapter provides information about performing tasks that are associated with
Solaris Volume Manager soft partitions. For information about the concepts involved
in these tasks, see Chapter 11.
■ Create soft partitions: Use the Solaris Volume Manager GUI or the metainit command to create soft partitions. See “How to Create a Soft Partition” on page 126.
■ Check the status of soft partitions: Use the Solaris Volume Manager GUI or the metastat command to check the status of soft partitions. See “How to Check the Status of a Soft Partition” on page 127.
■ Expand soft partitions: Use the Solaris Volume Manager GUI or the metattach command to expand soft partitions. See “How to Expand a Soft Partition” on page 128.
■ Remove soft partitions: Use the Solaris Volume Manager GUI or the metaclear command to remove soft partitions. See “How to Remove a Soft Partition” on page 129.
Creating Soft Partitions
-s is used to specify which set is being used. If -s isn’t specified, the local (default)
disk set is used.
-e is used to specify that the entire disk should be reformatted to provide a slice 0,
taking most of the disk, and a slice 7 of a minimum of 4 Mbytes in size to contain a
state database replica.
soft-partition is the name of the soft partition. The name is of the form dnnn, where
nnn is a number in the range of 0 to 8192.
component is the disk, slice, or (logical) volume from which to create the soft
partition. All existing data on the component is destroyed because the soft partition
headers are written at the beginning of the component.
size is the size of the soft partition, and is specified as a number followed by one of
the following:
■ M or m for megabytes
■ G or g for gigabytes
■ T or t for terabyte
■ B or b for blocks (sectors)
See the following examples and the metainit(1M) man page for more information.
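Assuming the command form metainit soft-partition -p component size, a creation example looks like the following; the soft partition name, slice, and size are hypothetical:
# metainit d20 -p c1t3d0s2 4g
d20: Soft Partition is setup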
2. Use one of the following methods to check the status of a soft partition:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node. Choose the soft partition you want to monitor, then choose
Action->Properties, then follow the instructions on screen. For more information,
see the online help.
■ To view the existing configuration, use the following format of the metastat
command:
metastat soft-partition
See “Viewing the Solaris Volume Manager Configuration” on page 218 for more
information.
disk-set is the name of the disk set in which the soft partition exists.
where:
■ disk-set is the disk set in which the soft partition exists.
■ soft-partition is the soft partition to delete.
■ r specifies to recursively delete logical volumes, but not volumes on which
others depend.
■ p specifies to purge all soft partitions on the specified component, except those
soft partitions that are open.
■ component is the component from which to clear all of the soft partitions.
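Illustrative commands for the two removal forms described above, with hypothetical names, are:
# metaclear d20
d20: Soft Partition is cleared
# metaclear -p c1t3d0s2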
This chapter provides conceptual information about Solaris Volume Manager RAID 5
volumes. For information about performing related tasks, see Chapter 14.
A RAID 5 volume uses storage capacity equivalent to one component in the volume to
store redundant information (parity) about user data stored on the remainder of the
RAID 5 volume’s components. That is, if you have three components, the equivalent
of one will be used for the parity information. If you have five components, then the
equivalent of one will be used for parity information. The parity is distributed across
all components in the volume. Like a mirror, a RAID 5 volume increases data
availability, but with a minimum of cost in terms of hardware and only a moderate
penalty for write operations. However, you cannot use a RAID 5 volume for root (/),
/usr, and swap, or for existing file systems.
Example—RAID 5 Volume
Figure 13–1 shows a RAID 5 volume, d40.
The first three data chunks are written to Disks A through C. The next chunk that is
written is a parity chunk, written to Drive D, which consists of an exclusive OR of the
first three chunks of data. This pattern of writing data and parity chunks results in
both data and parity being spread across all disks in the RAID 5 volume. Each drive
can be read independently. The parity protects against a single disk failure. If each
disk in this example were 2 Gbytes, the total capacity of d40 would be 6 Gbytes. (One
drive’s worth of space is allocated to parity.)
The figure (not reproduced here) shows RAID 5 volume d40: data chunks interlace 1 through interlace 12 and the parity chunks P(1-3), P(4-6), P(7-9), and P(10-12) are distributed across Components A, B, C, and D, which Solaris Volume Manager presents as a single logical volume.
A second figure (not reproduced here) shows a RAID 5 volume that has been expanded by concatenating an additional component. The logical volume continues with interlace 13 and beyond, and the existing parity chunks expand to cover the new regions, as in P(7-9, 15) and P(10-12, 16).
Concatenated RAID 5 volumes are not suited for long-term use. Use a concatenated
RAID 5 volume until it is possible to reconfigure a larger RAID 5 volume and copy the
data to the larger volume.
Note – When you add a new component to a RAID 5 volume, Solaris Volume
Manager “zeros” all the blocks in that component. This process ensures that the parity
will protect the new data. As data is written to the additional space, Solaris Volume
Manager includes it in the parity calculations.
Scenario—RAID 5 Volumes
RAID 5 volumes allow you to have redundant storage without the overhead of RAID
1 volumes, which require two times the total storage space to provide data
redundancy. By setting up a RAID 5 volume, you can provide redundant storage of
greater capacity than you could achieve with RAID 1 on the same set of disk
components, and, with the help of hot spares (see Chapter 15 and specifically “How
Hot Spares Work” on page 148), nearly the same level of safety. The drawbacks are
increased write time and markedly impaired performance in the event of a component
failure, but those tradeoffs might be insignificant for many situations. The following
example, drawing on the sample system explained in Chapter 4, describes how RAID
5 volumes can provide extra storage capacity.
Other scenarios for RAID 0 and RAID 1 volumes used 6 slices (c1t1d0, c1t2d0,
c1t3d0, c2t1d0, c2t2d0, c2t3d0) on six disks, spread over two controllers, to
provide 27 Gbytes of redundant storage. By using the same slices in a RAID 5
configuration, 45 Gbytes of storage is available, and the configuration can withstand a
single component failure without data loss or access interruption. By adding hot
spares to the configuration, the RAID 5 volume can withstand additional component
failures. The most significant drawback to this approach is that a controller failure
would result in data loss to this RAID 5 volume, while it would not with the RAID 1
volume described in “Scenario—RAID 1 Volumes (Mirrors)” on page 90.
This chapter provides information about performing Solaris Volume Manager tasks
that are associated with RAID 5 volumes. For information about the concepts involved
in these tasks, see Chapter 13.
■ Create RAID 5 volumes: Use the Solaris Volume Manager GUI or the metainit command to create RAID 5 volumes. See “How to Create a RAID 5 Volume” on page 138.
■ Check the status of RAID 5 volumes: Use the Solaris Volume Manager GUI or the metastat command to check the status of RAID 5 volumes. See “How to Check the Status of RAID 5 Volumes” on page 139.
■ Expand a RAID 5 volume: Use the Solaris Volume Manager GUI or the metattach command to expand RAID 5 volumes. See “How to Expand a RAID 5 Volume” on page 142.
■ Enable a slice in a RAID 5 volume: Use the Solaris Volume Manager GUI or the metareplace command to enable slices in RAID 5 volumes. See “How to Enable a Component in a RAID 5 Volume” on page 142.
■ Replace a slice in a RAID 5 volume: Use the Solaris Volume Manager GUI or the metareplace command to replace slices in RAID 5 volumes. See “How to Replace a Component in a RAID 5 Volume” on page 143.
Creating RAID 5 Volumes
In this example, the RAID 5 volume d45 is created with the -r option from three
slices. Because no interlace value is specified, d45 uses the default of 16 Kbytes. The
system verifies that the RAID 5 volume has been set up, and begins initializing the
volume.
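A creation command matching this example would be similar to the following; the three slice names are assumptions:
# metainit d45 -r c2t3d0s2 c3t0d0s2 c4t0d0s2
d45: RAID is setup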
Note – You must wait for the initialization to finish before you can use the RAID 5
volume.
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node and view the status of the volumes. Choose a volume, then choose
Action->Properties to see more detailed information. For more information, see the
online help.
■ Use the metastat command.
For each slice in the RAID 5 volume, the metastat command shows the
following:
■ “Device” (device name of the slice in the stripe)
■ “Start Block” on which the slice begins
■ “Dbase” to show if the slice contains a state database replica
■ “State” of the slice
■ “Hot Spare” to show the slice being used to hot spare a failed slice
Initializing: Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping. Once the state changes to “Okay,” the initialization process is complete and you are able to open the device. Up to this point, applications receive error messages.
Okay: The device is ready for use and is currently free from errors.
Maintenance: A slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation.
The slice state is perhaps the most important information when you are
troubleshooting RAID 5 volume errors. The RAID 5 state only provides general status
information, such as “Okay” or “Needs Maintenance.” If the RAID 5 reports a “Needs
Maintenance” state, refer to the slice state. You take a different recovery action if the
slice is in the “Maintenance” or “Last Erred” state. If you only have a slice in the
“Maintenance” state, it can be repaired without loss of data. If you have a slice in the
“Maintenance” state and a slice in the “Last Erred” state, data has probably been
corrupted. You must fix the slice in the “Maintenance” state first then the “Last Erred”
slice. See “Overview of Replacing and Enabling Components in RAID 1 and RAID 5
Volumes” on page 230.
The following table explains the slice states for a RAID 5 volume and possible actions
to take.
Initializing: Slices are in the process of having all disk blocks zeroed. This process is necessary due to the nature of RAID 5 volumes with respect to data and parity interlace striping. Action: Normally none. If an I/O error occurs during this process, the device goes into the “Maintenance” state. If the initialization fails, the volume is in the “Initialization Failed” state, and the slice is in the “Maintenance” state. If this happens, clear the volume and re-create it.
Okay: The device is ready for use and is currently free from errors. Action: None. Slices can be added or replaced, if necessary.
Resyncing: The slice is actively being resynchronized. An error has occurred and been corrected, a slice has been enabled, or a slice has been added. Action: If desired, monitor the RAID 5 volume status until the resynchronization is done.
Maintenance: A single slice has been marked as failed due to I/O or open errors that were encountered during a read or write operation. Action: Enable or replace the failed slice. See “How to Enable a Component in a RAID 5 Volume” on page 142, or “How to Replace a Component in a RAID 5 Volume” on page 143. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command.
Maintenance / Last Erred: Multiple slices have encountered errors. The state of the failed slices is either “Maintenance” or “Last Erred.” In this state, no I/O is attempted on the slice that is in the “Maintenance” state, but I/O is attempted on the slice marked “Last Erred” with the outcome being the overall status of the I/O request. Action: Enable or replace the failed slices. See “How to Enable a Component in a RAID 5 Volume” on page 142, or “How to Replace a Component in a RAID 5 Volume” on page 143. The metastat command will show an invoke recovery message with the appropriate action to take with the metareplace command, which must be run with the -f flag. This state indicates that data might be fabricated due to multiple failed slices.
This example shows the addition of slice c2t1d0s2 to an existing RAID 5 volume
named d2.
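The attach command for this example would take a form like the following; use the metastat command afterward to confirm that the new component appears in the volume:
# metattach d2 c2t1d0s2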
An application, such as a database, that uses the raw volume must have its own way
of growing the added space.
In this example, the RAID 5 volume d20 has a slice, c2t0d0s2, which had a soft
error. The metareplace command with the -e option enables the slice.
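A command of this form enables the slice in this example:
# metareplace -e d20 c2t0d0s2
d20: device c2t0d0s2 is enabled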
Note – If a disk drive is defective, you can either replace it with another available disk
(and its slices) on the system as documented in “How to Replace a Component in a
RAID 5 Volume” on page 143. Alternatively, you can repair/replace the disk, label it,
and run the metareplace command with the -e option.
Caution – Replacing a failed slice when multiple slices are in error might cause data to
be fabricated. The integrity of the data in this instance would be questionable.
1. Make sure that you have a current backup of all data and that you have root access.
2. Use one of the following methods to determine which slice of the RAID 5 volume
needs to be replaced:
3. Use one of the following methods to replace the failed slice with another slice:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then open the RAID 5 volume. Choose the Components pane, then
choose the failed component. Click Replace Component and follow the
instructions. For more information, see the online help.
■ Use the following form of the metareplace command:
metareplace volume-name failed-component new-component
4. To verify the status of the replacement slice, use one of the methods described in
Step 2.
The state of the replaced slice should be “Resyncing” or “Okay”.
In this example, the metastat command displays the action to take to recover from
the failed slice in the d1 RAID 5 volume. After locating an available slice, the
metareplace command is run, specifying the failed slice first, then the replacement
slice. (If no other slices are available, run the metareplace command with the -e
option to attempt to recover from possible soft errors by resynchronizing the failed
device.) If multiple errors exist, the slice in the “Maintenance” state must first be
replaced or enabled. Then the slice in the “Last Erred” state can be repaired. After the
metareplace command, the metastat command monitors the progress of the
resynchronization. During the replacement, the state of the volume and the new slice is “Resyncing.” You can continue to use the volume while it is in this state.
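A replacement sequence consistent with this description, using hypothetical slice names, is:
# metareplace d1 c3t0d0s2 c5t0d0s2
d1: device c3t0d0s2 is replaced with c5t0d0s2
# metastat d1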
Note – You can use the metareplace command on non-failed devices to change a
disk slice or other component. This procedure can be useful for tuning the
performance of RAID 5 volumes.
This chapter explains how Solaris Volume Manager uses hot spare pools. For
information about performing related tasks, see Chapter 16.
Note – Hot spares do not apply to RAID 0 volumes or one-way mirrors. For automatic
substitution to work, redundant data must be available.
A hot spare cannot be used to hold data or state database replicas while it is idle. A hot
spare must remain ready for immediate use in the event of a slice failure in the volume
with which it is associated. To use hot spares, you must invest in additional disks
beyond those disks that the system actually requires to function.
Hot Spares
A hot spare is a slice (not a volume) that is functional and available, but not in use. A
hot spare is reserved, meaning that it stands ready to substitute for a failed slice in a
submirror or RAID 5 volume.
Hot spares provide protection from hardware failure because slices from RAID 1 or
RAID 5 volumes are automatically replaced and resynchronized when they fail. The
hot spare can be used temporarily until a failed submirror or RAID 5 volume slice can
be either fixed or replaced.
You create hot spares within hot spare pools. Individual hot spares can be included in
one or more hot spare pools. For example, you might have two submirrors and two
hot spares. The hot spares can be arranged as two hot spare pools, with each pool
having the two hot spares in a different order of preference. This strategy enables you
to specify which hot spare is used first, and it improves availability by having more
hot spares available.
A submirror or RAID 5 volume can use only a hot spare whose size is equal to or
greater than the size of the failed slice in the submirror or RAID 5 volume. If, for
example, you have a submirror made of 1 Gbyte drives, a hot spare for the submirror
must be 1 Gbyte or greater.
Tip – When you add hot spares to a hot spare pool, add them from smallest to largest.
This strategy avoids potentially wasting “large” hot spares as replacements for small
slices.
When the slice experiences an I/O error, the failed slice is placed in the “Broken” state.
To fix this condition, first repair or replace the failed slice. Then, bring the slice back to
the “Available” state by using the Enhanced Storage tool within the Solaris
Management Console or the metahs -e command.
When a submirror or RAID 5 volume is using a hot spare in place of a failed slice
and that failed slice is enabled or replaced, the hot spare is then marked “Available” in
the hot spare pool, and is again ready for use.
You can place hot spares into one or more pools to get the most flexibility and
protection from the fewest slices. That is, you could put a single slice designated for
use as a hot spare into multiple pools, each hot spare pool having different slices and
characteristics. Then, you could assign a hot spare pool to any number of submirror
volumes or RAID 5 volumes.
Note – You can assign a single hot spare pool to multiple submirrors or RAID 5
volumes. On the other hand, a submirror or a RAID 5 volume can be associated with
only one hot spare pool.
When I/O errors occur, Solaris Volume Manager checks the hot spare pool for the first
available hot spare whose size is equal to or greater than the size of the slice that is
being replaced. If found, Solaris Volume Manager changes the hot spare’s status to
“In-Use” and automatically resynchronizes the data. In the case of a mirror, the hot
spare is resynchronized with data from a good submirror. In the case of a RAID 5
volume, the hot spare is resynchronized with the other slices in the volume. If a slice
of adequate size is not found in the list of hot spares, the submirror or RAID 5 volume
that failed goes into a failed state and the hot spares remain unused. In the case of the
submirror, the submirror no longer replicates the data completely. In the case of the
RAID 5 volume, data redundancy is no longer available.
This chapter explains how to work with Solaris Volume Manager’s hot spares and hot
spare pools. For information about related concepts, see Chapter 15.
■ Create a hot spare pool: Use the Solaris Volume Manager GUI or the metainit command to create a hot spare pool. See “How to Create a Hot Spare Pool” on page 154.
■ Add slices to a hot spare pool: Use the Solaris Volume Manager GUI or the metahs command to add slices to a hot spare pool. See “How to Add Additional Slices to a Hot Spare Pool” on page 155.
■ Associate a hot spare pool with a volume: Use the Solaris Volume Manager GUI or the metaparam command to associate a hot spare pool with a volume. See “How to Associate a Hot Spare Pool With a Volume” on page 156.
■ Change which hot spare pool is associated with a volume: Use the Solaris Volume Manager GUI or the metaparam command to change which hot spare pool is associated with a volume. See “How to Change the Associated Hot Spare Pool” on page 157.
■ Check the status of hot spares and hot spare pools: Use the Solaris Volume Manager GUI, or the metastat or metahs -i commands to check the status of a hot spare or hot spare pool. See “How to Check Status of Hot Spares and Hot Spare Pools” on page 159.
■ Replace a hot spare in a hot spare pool: Use the Solaris Volume Manager GUI or the metahs command to replace a hot spare in a hot spare pool. See “How to Replace a Hot Spare in a Hot Spare Pool” on page 160.
■ Delete a hot spare from a hot spare pool: Use the Solaris Volume Manager GUI or the metahs command to delete a hot spare from a hot spare pool. See “How to Delete a Hot Spare from a Hot Spare Pool” on page 161.
■ Enable a hot spare: Use the Solaris Volume Manager GUI or the metahs command to enable a hot spare in a hot spare pool. See “How to Enable a Hot Spare” on page 162.
where ctds-for-slice is repeated for each slice in the hot spare pool. See the
metainit(1M) man page for more information.
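Assuming the command form metainit hot-spare-pool-name ctds-for-slice ..., a creation example with hypothetical slice names is:
# metainit hsp001 c2t2d0s2 c3t2d0s2
hsp001: Hotspare pool is setup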
Note – The metahs command can also be used to create hot spare pools.
Caution – Solaris Volume Manager will not warn you if you create a hot spare that is
not large enough. If the hot spare is not equal to, or larger than, the volume to which it
is attached, the hot spare will not work.
2. To add a slice to an existing hot spare pool, use one of the following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Hot Spare Pools node, then choose the hot spare pool you want to change. Choose
Action->Properties, then choose the Components panel. For more information, see
the online help.
■ Use the following form of the metahs command:
metahs -a hot-spare-pool-name slice-to-add
Use -a for hot-spare-pool-name to add the slice to the specified hot spare pool.
Use -all for hot-spare-pool-name to add the slice to all hot spare pools. See the
metahs(1M) man page for more information.
Note – You can add a hot spare to one or more hot spare pools. When you add a hot
spare to a hot spare pool, it is added to the end of the list of slices in the hot spare
pool.
In this example, the -a and -all options add the slice /dev/dsk/c3t0d0s2 to all
hot spare pools configured on the system. The system verifies that the slice has been
added to all hot spare pools.
2. To associate a hot spare pool with a RAID 5 volume or submirror, use one of the
following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes and choose a volume. Choose Action->Properties, then choose the Hot
Spare Pool panel and Attach HSP. For more information, see the online help.
■ Use the following form of the metaparam command:
metaparam -h hot-spare-pool component
d10: Submirror of d0
State: Okay
Hot spare pool: hsp100
...
d11: Submirror of d0
State: Okay
Hot spare pool: hsp100
...
The -h option associates a hot spare pool, hsp100, with two submirrors, d10 and
d11, of mirror, d0. The metastat command shows that the hot spare pool is
associated with the submirrors.
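The association described here would be made with commands such as the following, then confirmed with metastat:
# metaparam -h hsp100 d10
# metaparam -h hsp100 d11
# metastat d0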
The -h option associates a hot spare pool named hsp001 with a RAID 5 volume
named d10. The metastat command shows that the hot spare pool is associated with
the RAID 5 volume.
2. To change a volume’s associated hot spare pool, use one of the following methods:
In this example, the hot spare pool hsp001 is initially associated with a RAID 5
volume named d4. The hot spare pool association is changed to hsp002. The
metastat command shows the hot spare pool association before and after this
change.
In this example, the hot spare pool hsp001 is initially associated with a RAID 5
volume named d4. The hot spare pool association is changed to none, which indicates
that no hot spare pool should be associated with this device. The metastat command
shows the hot spare pool association before and after this change.
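The two changes described in these examples correspond to commands of this form:
# metaparam -h hsp002 d4
# metaparam -h none d4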
Note – The metahs command can also be used to check the status of hot spare pools.
In-use: This hot spare pool includes slices that have been used to replace failed components in a redundant volume. Action: Diagnose how the hot spares are being used. Then, repair the slice in the volume for which the hot spare is being used.
Broken: There is a problem with a hot spare or hot spare pool, but there is no immediate danger of losing data. This status is also displayed if all the hot spares are in use or if any hot spares are broken. Action: Diagnose how the hot spares are being used or why they are broken. You can add more hot spares to the hot spare pool, if desired.
In this example, the metastat command makes sure that the hot spare is not in use.
The metahs -r command replaces hot spare /dev/dsk/c0t2d0s2 with
/dev/dsk/c3t1d0s2 in the hot spare pool hsp003.
In this example, the keyword all replaces hot spare /dev/dsk/c1t0d0s2 with
/dev/dsk/c3t1d0s2 in all its associated hot spare pools.
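The two replacements described above correspond to commands of the following form:
# metahs -r hsp003 c0t2d0s2 c3t1d0s2
# metahs -r all c1t0d0s2 c3t1d0s2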
-d Specifies to delete a hot spare from the hot spare pool named.
hot-spare-pool Is the name of the hot spare pool, or the special keyword all
to delete from all hot spare pools.
current-hot-spare Is the name of the current hot spare that will be deleted.
In this example, the metastat command makes sure that the hot spare is not in use.
The metahs -d command deletes hot spare /dev/dsk/c0t2d0s2 in the hot spare
pool hsp003.
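The deletion in this example corresponds to a command of the form:
# metahs -d hsp003 c0t2d0s2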
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Hot Spare Pools node and select a hot spare pool. Choose Action->Properties, then
the Hot Spares panel and follow the instructions. For more information, see the
online help.
■ Use the following form of the metahs command:
metahs -e hot-spare-slice
In this example, the command places the hot spare /dev/dsk/c0t0d0s2 in the
“Available” state after it has been repaired. It is unnecessary to specify a hot spare
pool.
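The command for this example might look like the following sketch:
# metahs -e c0t0d0s2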
This chapter provides conceptual information about two types of file system logging,
transactional volumes and UFS logging. For information about performing tasks
related to transactional volumes, see Chapter 18. For more information about UFS
logging, see “Mounting and Unmounting File Systems (Tasks)” in System
Administration Guide: Basic Administration.
Note – Transactional volumes are scheduled to be removed from the Solaris operating
environment in an upcoming Solaris release. UFS logging, available since the Solaris 8
release, provides the same capabilities but superior performance, as well as lower
system administration requirements and overhead. These benefits provide a clear
choice for optimal performance and capabilities.
File system logging is the process of writing file system updates to a log before applying the
updates to a UFS file system. Once a transaction is recorded in the log, the transaction
information can be applied to the file system later. For example, if a user creates a new
directory, the mkdir command will be logged, then applied to the file system.
At reboot, the system discards incomplete transactions, but applies the transactions for
completed operations. The file system remains consistent because only completed
transactions are ever applied. Because the file system is never inconsistent, it does not
need checking by the fsck command.
A system crash can interrupt current system calls and introduce inconsistencies into an
unlogged UFS. If you mount a UFS without running the fsck command, these
inconsistencies can cause panics or corrupt data.
Checking large file systems takes a long time, because it requires reading and
verifying the file system data. With UFS logging, UFS file systems do not have to be
checked at boot time because the changes from unfinished system calls are discarded.
Note – Transactional volumes are scheduled to be removed from the Solaris operating
environment in an upcoming Solaris release. UFS logging, available since the Solaris 8
release, provides the same capabilities but superior performance, as well as lower
system administration requirements and overhead. These benefits provide a clear
choice for optimal performance and capabilities.
To enable UFS logging, mount the file system with the logging option, or add
logging to the mount options for the file system in the /etc/vfstab file. For
more information about mounting file systems with UFS logging enabled, see
“Mounting and Unmounting File Systems (Tasks)” in System Administration Guide:
Basic Administration and the mount_ufs(1M) man page.
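For example, an /etc/vfstab entry with UFS logging enabled might look like the following sketch (the device and mount point are hypothetical):
/dev/dsk/c0t0d0s6 /dev/rdsk/c0t0d0s6 /export ufs 2 yes logging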
To learn more about using transactional volumes, continue reading this document.
Note – If you are not currently logging UFS file systems but want to use this feature,
choose UFS logging, rather than transactional volumes.
After you configure a transactional volume, you can use it as though it were a physical
slice or another logical volume. For information about creating a transactional volume,
see “Creating Transactional Volumes” on page 175.
Example—Transactional Volume
The following figure shows a transactional volume, d1, which consists of a master
device, d3, and a mirrored log device, d30.
(Figure: Solaris Volume Manager routes logging data and file system data through the transactional volume to its log device and master device.)
Note – Mirroring log devices is strongly recommended. Losing the data in a log device
because of device errors can leave a file system in an inconsistent state that fsck
might be unable to fix without user intervention. Using a RAID 1 volume for the
master device is a good idea to ensure data redundancy.
Caution – When one master device of a shared log device goes into a failed state, the
log device is unable to roll its changes forward. This problem causes all master devices
sharing the log device to go into the hard error state.
Scenario—Transactional Volumes
Transactional volumes provide logging capabilities for UFS file systems, similar to
UFS Logging. The following example, drawing on the sample system explained in
Chapter 4, describes how transactional volumes can help speed reboot by providing
file system logging.
Note – Unless your situation requires the special capabilities of transactional volumes,
specifically the ability to log to a different device than the logged device, consider
using UFS logging instead. UFS logging provides superior performance to
transactional volumes.
The sample system has several logical volumes that should be logged to provide
maximum uptime and availability, including the root (/) and /var mirrors. By
configuring transactional volumes to log to a third RAID 1 volume, you can provide
redundancy and speed the reboot process.
This chapter provides information about performing tasks that are associated with
transactional volumes. For information about the concepts involved in these tasks, see
Chapter 17.
Note – Transactional volumes are scheduled to be removed from the Solaris operating
environment in an upcoming Solaris release. UFS logging, available since the Solaris 8
release, provides the same capabilities but superior performance, as well as lower
system administration requirements and overhead. These benefits provide a clear
choice for optimal performance and capabilities.
Create a transactional volume: Use the Solaris Volume Manager GUI or the metainit command to create a transactional volume. See “How to Create a Transactional Volume” on page 175.
Check the status of transactional volumes: Use the Solaris Volume Manager GUI or the metastat command to check the status of a transactional volume. See “How to Check the State of Transactional Volumes” on page 182.
Attach a log device to a transactional volume: Use the Solaris Volume Manager GUI or the metattach command to attach a log device. See “How to Attach a Log Device to a Transactional Volume” on page 184.
Detach a log device from a transactional volume: Use the Solaris Volume Manager GUI or the metadetach command to detach a log device. See “How to Detach a Log Device from a Transactional Volume” on page 185.
Expand a transactional volume: Use the Solaris Volume Manager GUI or the metattach command to expand a transactional volume. See “How to Expand a Transactional Volume” on page 186.
Delete a transactional volume: Use the Solaris Volume Manager GUI, the metadetach command, or the metarename command to delete a transactional volume. See “How to Remove a Transactional Volume” on page 187.
Delete a transactional volume and retain the mount point: Use the Solaris Volume Manager GUI or the metadetach command to delete a transactional volume. See “How to Remove a Transactional Volume and Retain the Mount Device” on page 188.
Share a log device: Use the Solaris Volume Manager GUI or the metainit command to share a transactional volume log device. See “How to Share a Log Device Among File Systems” on page 191.
2. If possible, unmount the UFS file system for which you want to enable logging.
# umount /export
Note – If the file system cannot be unmounted, you can continue, but you will have to
reboot the system before the transactional volume can be active.
The master device and log device can be either slices or logical volumes. See the
metainit(1M) man page for more information.
For example, to create a transactional volume (d10) logging the file system on slice
c0t0d0s6 to a log on c0t0d0s7, use the following syntax:
# metainit d10 -t c0t0d0s6 c0t0d0s7
4. Edit the /etc/vfstab file so that the existing UFS file system information is
replaced with that of the created transactional volume.
For example, if /export was on c0t0d0s6, and the new transactional volume is d10,
edit /etc/vfstab as shown here, so the mount points to the transactional volume
rather than to the raw disk slice:
#/dev/dsk/c0t0d0s5 /dev/rdsk/c0t0d0s5 /export ufs 2 yes -
/dev/md/dsk/d10 /dev/md/rdsk/d10 /export ufs 2 yes -
Note – If you are creating a transactional volume for a file system that cannot be
unmounted, such as /usr, then reboot the system now to remount the transactional
volume and start logging.
The slice /dev/dsk/c0t2d0s2 contains a file system mounted on /home1. The slice
that will contain the log device is /dev/dsk/c2t2d0s1. First, the file system is
unmounted. The metainit command with the -t option creates the transactional
volume, d63.
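Using the names from this example, the commands might look like the following sketch:
# umount /home1
# metainit d63 -t c0t2d0s2 c2t2d0s1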
Next, the /etc/vfstab file must be edited to change the entry for the file system to
reference the transactional volume. For example, the following line:
/dev/dsk/c0t2d0s2 /dev/rdsk/c0t2d0s2 /home1 ufs 2 yes -
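would be changed to read approximately as follows (a sketch that follows the pattern shown earlier):
/dev/md/dsk/d63 /dev/md/rdsk/d63 /home1 ufs 2 yes -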
On subsequent reboots, instead of checking the file system, the fsck command
displays a log message for the transactional volume:
# reboot
...
Slice /dev/dsk/c0t3d0s6 contains the /usr file system. The slice that will contain
the log device is /dev/dsk/c1t2d0s1. Because /usr cannot be unmounted, the
metainit command is run with the -f option to force the creation of the
transactional volume, d20. Next, the line in the /etc/vfstab file that mounts the file
system must be changed to reference the transactional volume. For example, the
following line:
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 1 no -
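would be changed to read approximately as follows (a sketch):
/dev/md/dsk/d20 /dev/md/rdsk/d20 /usr ufs 1 no -
The creation command for this example might look like the following sketch:
# metainit -f d20 -t c0t3d0s6 c1t2d0s1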
Logging becomes effective for the file system when the system is rebooted.
RAID 1 volume d30 contains a file system that is mounted on /home1. The mirror
that will contain the log device is d12. First, the file system is unmounted. The
metainit command with the -t option creates the transactional volume, d64.
Next, the line in the /etc/vfstab file that mounts the file system must be changed to
reference the transactional volume. For example, the following line:
/dev/md/dsk/d30 /dev/md/rdsk/d30 /home1 ufs 2 yes -
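would be changed to read approximately as follows (a sketch):
/dev/md/dsk/d64 /dev/md/rdsk/d64 /home1 ufs 2 yes -
The commands for this example might look like the following sketch:
# umount /home1
# metainit d64 -t d30 d12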
Logging becomes effective for the file system when the file system is remounted.
On subsequent file system remounts or system reboots, instead of checking the file
system, the fsck command displays a log message for the transactional volume:
Note – To avoid editing the /etc/vfstab file, you can use the metarename(1M)
command to exchange the name of the original logical volume and the new
transactional volume. For more information, see “Renaming Volumes” on page 221.
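For the preceding example, such an exchange might look like the following sketch (see the metarename(1M) man page for when the -f flag is required):
# metarename -x d64 d30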
Note – You must have at least one Mbyte of free space (using default system settings)
to convert to UFS logging, because the log requires some space and resides on the
logged volume. If you do not have sufficient free space, you will have to remove files
or grow your file system before you can complete this conversion process.
d20: Concat/Stripe
Size: 28728 blocks
Stripe 0: (interlace: 32 blocks)
Device Start Block Dbase State Reloc Hot Spare
d10 0 No Okay No
d11 0 No Okay No
d12
Note the names for these devices for later use.
2. Check to see if the Trans device is currently mounted by using the df command
and searching for the name of the transactional volume in the output. If the
transactional volume is not mounted, go to Step 7.
# df | grep d2
/mnt/transvolume (/dev/md/dsk/d2 ): 2782756 blocks 339196 files
4. Stop all activity on the file system, either by halting applications or bringing the
system to the single user mode.
# init s
[root@lexicon:lexicon-setup]$ init s
INIT: New run level: S
The system is coming down for administration. Please wait.
Dec 11 08:14:43 lexicon syslogd: going down on signal 15
Killing user processes: done.
5. Flush the log for the file system that is logged with lockfs -f.
# /usr/sbin/lockfs -f /mnt/transvolume
The Logging device, identified at the beginning of this procedure, is now unused
and can be reused for other purposes. The master device, also identified at the
beginning of this procedure, contains the file system and must be mounted for use.
8. Edit the /etc/vfstab file to update the mount information for the file system.
You must change the raw and block mount points, and add logging to the options
for that file system. With the transactional volume in use, the /etc/vfstab entry
looks like this:
/dev/md/dsk/d2 /dev/md/rdsk/d2 /mnt/transvolume ufs 1 no -
After you update the file to change the mount point from the transactional volume d2
to the underlying device d0, and add the logging option, that part of the
/etc/vfstab file looks like this:
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
/dev/md/dsk/d0 /dev/md/rdsk/d0 /mnt/transvolume ufs 1 no logging
Note – The mount command might report an error, similar to “the state of
/dev/md/dsk/d0 is not okay and it was attempted to be mounted read/write. Please
run fsck and try again.” If this happens, run fsck on the raw device (fsck
/dev/md/rdsk/d0 in this case), answer y to fixing the file system state in the
superblock, and try again.
10. Verify that the file system is mounted with logging enabled by examining the
/etc/mnttab file and confirming that the file system has logging listed as one of
the options.
# grep mnt /etc/mnttab
mnttab /etc/mnttab mntfs dev=43c0000 1007575477
/dev/md/dsk/d0 /mnt/transvolume ufs rw,intr,largefiles,
logging,xattr,onerror=panic,suid,dev=1540000 1008085006
Make note of the ’master’ and ’log’ devices as you will need this information in subsequent steps.
# df | grep d50
/home1 (/dev/md/dsk/d50 ): 161710 blocks 53701 files
Go to single-user mode.
# /usr/sbin/lockfs -f /home1
# /usr/sbin/umount /home1
# /usr/sbin/metaclear d50
d50: Trans is cleared
Update the /etc/vfstab file to mount the underlying volume and add the logging option.
# cat /etc/vfstab
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
# mount /home1
# /usr/bin/grep /home1 /etc/mnttab
/dev/dsk/c1t14d0s0 /home1 ufs
rw,intr,largefiles,logging,xattr,onerror=panic,suid,dev=740380
1008019906
Return to multi-user mode.
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then view the status of the volumes. Right-click a transactional
volume and choose Properties for more detailed status information. For more
information, see the online help.
■ Use the metastat command.
For more information, see the metastat(1M) man page.
The following table explains transactional volume states and possible actions to take.
State: Detached
Meaning: The transactional volume does not have a log device. All benefits from UFS logging are disabled.
Action: The fsck command automatically checks the device at boot time. See the fsck(1M) man page.

State: Hard Error
Meaning: A device error or panic has occurred while the device was in use. An I/O error is returned for every read or write until the device is closed or unmounted. The first open causes the device to transition to the Error state.
Action: Fix the transactional volume. See “How to Recover a Transactional Volume With a Panic” on page 192, or “How to Recover a Transactional Volume With Hard Errors” on page 193.

State: Error
Meaning: The device can be read and written to. The file system can be mounted read-only. However, an I/O error is returned for every read or write that actually gets a device error. The device does not transition back to the Hard Error state, even when a later device error occurs.
Action: Fix the transactional volume. See “How to Recover a Transactional Volume With a Panic” on page 192, or “How to Recover a Transactional Volume With Hard Errors” on page 193. Successfully completing the fsck or newfs commands transitions the device into the Okay state. When the device is in the Hard Error or Error state, the fsck command automatically checks and repairs the file system at boot time. The newfs command destroys whatever data might be on the device.
2. Unmount the UFS file system for which you want to enable logging.
3. Attach a log device to the transactional volume by using one of the following
methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Properties. For more information, see the online help.
■ Use the following form of the metattach command:
metattach master-volume logging-volume
master-volume is the name of the transactional volume that contains the file system
to be logged.
logging-volume is the name of the volume or slice that should contain the log.
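For example, with hypothetical volume names d1 (the transactional volume) and d23 (the log):
# metattach d1 d23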
2. Unmount the UFS file system for which you want to disable logging or change the
log device.
3. Detach the log device from the transactional volume by using one of the following
methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Properties. For more information, see the online help.
■ Use the following form of the metadetach command:
metadetach master-volume
master-volume is the name of the transactional volume that contains the file system
that is being logged.
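For example, with a hypothetical transactional volume named d1:
# metadetach d1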
Note – You can expand a master device within a transactional volume only when the
master device is a volume (RAID 0, RAID 1, or RAID 5).
2. If the master device is a volume (rather than a basic slice), attach additional slices to
the master device by using one of the following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Properties, then the Components panel. For more
information, see the online help.
■ Use the following form of the metattach command:
metattach master-volume component
master-volume is the name of the transactional volume that contains the file system
to be logged.
Note – If the master device is a mirror, you need to attach additional slices to each
submirror.
3. If the master device is a slice, you cannot expand it directly. Instead, you must do
the following:
■ Clear the existing transactional volume.
■ Put the master device’s slice into a volume.
■ Recreate the transactional volume.
Once you have completed this process, you can expand the master device as explained
in the previous steps of this procedure.
This example shows the expansion of a transactional device, d10, whose master
device consists of a two-way RAID 1 volume, d0, which contains two submirrors, d11
and d12. The metattach command is run on each submirror. The system confirms
that each slice was attached.
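Using the names from this example, the attach commands might look like the following sketch (the added slices are hypothetical):
# metattach d11 c0t2d0s5
# metattach d12 c1t2d0s5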
An application, such as a database, that uses the raw volume must have its own way
of growing the added space.
2. Unmount the UFS file system for which you want to remove the transactional
volume and disable logging.
# umount /filesystem
3. Detach the log device from the transactional volume by using one of the following
methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Properties. For more information, see the online help.
■ Use the following form of the metadetach command:
metadetach master-volume
master-volume is the name of the transactional volume that contains the file system
that is being logged.
4. Remove (clear) the transactional volume by using one of the following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Delete. For more information, see the online help.
■ Use the following form of the metaclear command:
metaclear master-volume
5. If necessary, update /etc/vfstab to mount the underlying volume, rather than the
transactional volume you just cleared.
(Edit /etc/vfstab to update the mount point for /fs2 so that it mounts on c1t1d0s1, not d1.)
# mount /fs2
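The earlier steps of this example, which are not shown above, might look like the following sketch using the same names:
# umount /fs2
# metadetach d1
# metaclear d1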
2. Unmount the UFS file system for which you want to remove the transactional
volume and disable logging.
3. Detach the log device from the transactional volume by using one of the following
methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Properties. For more information, see the online help.
master-volume is the name of the transactional volume that contains the file system
that is being logged.
4. Exchange the name of the transactional volume with that of the master device.
5. Remove (clear) the transactional volume by using one of the following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Delete. For more information, see the online help.
■ Use the following form of the metaclear command:
metaclear master-volume
d21: Mirror
Submirror 0: d20
State: Okay
Submirror 1: d2
State: Okay
...
d1: Mirror
Submirror 0: d20
State: Okay
Submirror 1: d2
State: Okay
# metaclear d21
# fsck /dev/md/dsk/d1
** /dev/md/dsk/d1
** Last Mounted on /fs2
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
The metastat command confirms that the transactional volume, d1, is in the “Okay”
state. The file system is unmounted before detaching the transactional volume’s log
device. The transactional volume and its mirrored master device are exchanged by
using the -f (force) flag. Running the metastat command again confirms that the
exchange occurred. The transactional volume and the log device (if desired) are
cleared, in this case, d21 and d0, respectively. Next, the fsck command is run on the
mirror, d1, and the prompt is answered with a y. After the fsck command is done,
the file system is remounted. Note that because the mount device for /fs2 did not
change, the /etc/vfstab file does not require editing.
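A consolidated sketch of the command sequence described above, using the names from this example (see the metarename(1M) man page for the exact usage of the -f and -x flags):
# umount /fs2
# metadetach d1
# metarename -f -x d1 d21
# metaclear d21
# metaclear d0
# fsck /dev/md/dsk/d1
# mount /fs2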
2. If possible, unmount the file system for which you want to enable logging.
3. If you already have an existing log device, detach it from the transactional volume
by using one of the following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Properties. For more information, see the online help.
■ Use the following form of the metadetach command:
metadetach master-volume
4. Attach a log device to the transactional volume by using one of the following
methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node, then choose the transactional volume from the listing. Right-click
the volume, and choose Properties. For more information, see the online help.
■ Use the following form of the metattach command:
metattach master-volume logging-volume
5. Edit the /etc/vfstab file to modify (or add) the entry for the file system to
reference the transactional volume.
6. Remount the file system. If the file system cannot be unmounted, reboot the system
to force your changes to take effect.
This example shows the sharing of a log device (d10) defined as the log for a previous
transactional volume, with a new transactional volume (d64). The file system to be set
up as the master device is /xyzfs and is using slice /dev/dsk/c0t2d0s4. The
metainit -t command specifies the configuration is a transactional volume. The
/etc/vfstab file must be edited to change (or enter for the first time) the entry for
the file system to reference the transactional volume. For example, the following line:
/dev/dsk/c0t2d0s4 /dev/rdsk/c0t2d0s4 /xyzfs ufs 2 yes -
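would be changed to read approximately as follows (a sketch):
/dev/md/dsk/d64 /dev/md/rdsk/d64 /xyzfs ufs 2 yes -
The creation command for this example might look like the following sketch:
# metainit d64 -t c0t2d0s4 d10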
The metastat command verifies that the log is being shared. Logging becomes
effective for the file system when the system is rebooted.
Upon subsequent reboots, instead of checking the file system, the fsck command
displays these messages for the two file systems:
/dev/md/rdsk/d63: is logging.
/dev/md/rdsk/d64: is logging.
See “How to Check the State of Transactional Volumes ” on page 182 to check the
status of a transactional volume.
If either the master device or log device encounters errors while processing logged
data, the device transitions from the “Okay” state to the “Hard Error” state. If the
device is in the “Hard Error” or “Error” state, either a device error or panic occurred.
Recovery from both scenarios is the same.
Note – If a log (log device) is shared, a failure in any of the slices in a transactional
volume will result in all slices or volumes that are associated with the transactional
volume switching to a failed state.
3. Run the lockfs command to determine which file systems are locked.
# lockfs
Affected file systems are listed with a lock type of hard. Every file system that shares
the same log device would be hard locked.
7. Run the fsck command to repair the file system, or the newfs command if you
need to restore data.
Run the fsck command on all of the transactional volumes that share the same log
device. When all transactional volumes have been repaired by the fsck command,
they then revert to the “Okay” state.
The newfs command will also transition the file system back to the “Okay” state, but
the command will destroy all of the data on the file system. The newfs command is
generally used when you plan to restore file systems from backup.
The fsck or newfs commands must be run on all of the transactional volumes that
share the same log device before these devices revert back to the “Okay” state.
8. Run the metastat command to verify that the state of the affected devices has
reverted to “Okay.”
d4: Mirror
State: Okay
...
c0t0d0s6: Logging device for d5
State: Hard Error
Size: 5350 blocks
...
# fsck /dev/md/rdsk/d5
** /dev/md/rdsk/d5
** Last Mounted on /fs1
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
WARNING: md: log device: /dev/dsk/c0t0d0s6 changed state to
Okay
4 files, 11 used, 4452 free (20 frags, 554 blocks, 0.4%
fragmentation)
d4: Mirror
State: Okay
...
This example shows a transactional volume, d5, which has a log device in the “Hard
Error” state, being fixed. You must run the fsck command on the transactional
volume itself, which transitions the state of the transactional volume to “Okay.” The
metastat command confirms that the state is “Okay.”
This chapter provides conceptual information about disk sets. For information about
performing related tasks, see Chapter 20.
A disk set supports data redundancy and data availability. If one host fails, the other
host can take over the failed host’s disk set. (This type of configuration is known as a
failover configuration.) Although each host can control the set of disks, only one host
can control it at a time.
Note – Disk sets are supported on both SPARC based and IA based platforms.
Note – Disk sets are intended primarily for use with Sun Cluster, Solstice HA (High
Availability), or another supported third-party HA framework. Solaris Volume
Manager by itself does not provide all the functionality necessary to implement a
failover configuration.
Volumes and hot spare pools in a shared disk set must be built on drives from within
that disk set. Once you have created a volume within the disk set, you can use the
volume just as you would a physical slice. However, disk sets do not support
mounting file systems from the /etc/vfstab file.
A file system that resides on a volume in a disk set cannot be mounted automatically
at boot with the /etc/vfstab file. The necessary disk set RPC daemons (rpc.metad
and rpc.metamhd) do not start early enough in the boot process to permit this.
Additionally, the ownership of a disk set is lost during a reboot.
Similarly, volumes and hot spare pools in the local disk set can consist only of drives
from within the local disk set.
When you add disks to a disk set, Solaris Volume Manager automatically creates the
state database replicas on the disk set. When a drive is accepted into a disk set, Solaris
Volume Manager might repartition the drive so that the state database replica for the
disk set can be placed on the drive (see “Automatic Disk Partitioning” on page 199).
Unlike local disk set administration, you do not need to manually create or delete disk
set state databases. Solaris Volume Manager places one state database replica (on slice
7) on each drive across all drives in the disk set, up to a maximum of 50 total replicas
in the disk set.
If you have disk sets that you upgraded from Solstice DiskSuite software, the default
state database replica size on those sets will be 1034 blocks, not the 8192 block size
from Solaris Volume Manager. Also, slice 7 on the disks that were added under
Solstice DiskSuite will be correspondingly smaller than slice 7 on disks that were
added under Solaris Volume Manager.
After you add the disk to a disk set, the output of prtvtoc looks like the following:
[root@lexicon:apps]$ prtvtoc /dev/rdsk/c1t6d0s0
* /dev/rdsk/c1t6d0s0 partition map
*
* Dimensions:
* 512 bytes/sector
* 133 sectors/track
* 27 tracks/cylinder
* 3591 sectors/cylinder
* 4926 cylinders
* 4924 accessible cylinders
*
* Flags:
* 1: unmountable
* 10: read-only
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 0 00 10773 17671311 17682083
7 0 01 0 10773 10772
[root@lexicon:apps]$
If disks you add to a disk set have acceptable slice 7s (that start at cylinder 0 and that
have sufficient space for the state database replica), they will not be reformatted.
Similarly, hot spare pools have the disk set name as part of the hot spare name.
In this configuration, Host A and Host B share disk sets A and B. They each have their
own local disk set, which is not shared. If Host A fails, Host B can take over control of
Host A’s shared disk set (Disk set A). Likewise, if Host B fails, Host A can take control
of Host B’s shared disk set (Disk set B).
(Figure: Host A and Host B each have their own local disk set on local disks; the hosts also share access to the shared disk sets red and blue, which are built on shared drives.)
After drives are added to a disk set, the disk set can be reserved (or taken) and released
by hosts in the disk set. When a disk set is reserved by a host, the other host in the
disk set cannot access the data on the drives in the disk set. To perform maintenance
on a disk set, a host must be the owner of the disk set or have reserved the disk set. A
host takes implicit ownership of the disk set by putting the first drives into the set.
For more information about releasing a disk set, see “How to Release a Disk Set”
on page 214.
Scenario—Disk Sets
The following example, drawing on the sample system shown in Chapter 4, describes
how disk sets should be used to manage storage that resides on a SAN (Storage Area
Network) fabric.
Assume that the sample system has an additional controller that connects to a fiber
switch and SAN storage. Storage on the SAN fabric is not available to the system as
early in the boot process as other devices, such as SCSI and IDE disks, and Solaris
Volume Manager would report logical volumes on the fabric as unavailable at boot.
However, by adding the storage to a disk set, and then using the disk set tools to
manage the storage, this problem with boot time availability is avoided (and the
fabric-attached storage can be easily managed within a separate, disk set controlled,
namespace from the local storage).
This chapter provides information about performing tasks that are associated with
disk sets. For information about the concepts involved in these tasks, see Chapter 19.
Create a disk set: Use the Solaris Volume Manager GUI or the metaset command to create a disk set. See “How to Create a Disk Set” on page 206.
Add drives to a disk set: Use the Solaris Volume Manager GUI or the metaset command to add drives to a disk set. See “How to Add Drives to a Disk Set” on page 207.
Add a host to a disk set: Use the Solaris Volume Manager GUI or the metaset command to add a host to a disk set. See “How to Add a Host to a Disk Set” on page 209.
Create Solaris Volume Manager volumes in a disk set: Use the Solaris Volume Manager GUI or the metainit command to create volumes in a disk set. See “How to Create Solaris Volume Manager Components in a Disk Set” on page 210.
Check the status of a disk set: Use the Solaris Volume Manager GUI or the metaset and metastat commands to check the status of a disk set. See “How to Check the Status of a Disk Set” on page 211.
Remove disks from a disk set: Use the Solaris Volume Manager GUI or the metaset command to remove drives from a disk set. See “How to Remove Disks from a Disk Set” on page 212.
Take a disk set: Use the Solaris Volume Manager GUI or the metaset command to take a disk set. See “How to Take a Disk Set” on page 213.
Release a disk set: Use the Solaris Volume Manager GUI or the metaset command to release a disk set. See “How to Release a Disk Set” on page 214.
Delete a host from a disk set: Use the Solaris Volume Manager GUI or the metaset command to delete hosts from a disk set. See “How to Delete a Host or Disk Set” on page 215.
Delete a disk set: Use the Solaris Volume Manager GUI or the metaset command to delete the last host from a disk set, thus deleting the disk set. See “How to Delete a Host or Disk Set” on page 215.
3. Check the status of the new disk set by using the metaset command.
# metaset
Host Owner
lexicon
In this example, you create a shared disk set called blue, from the host lexicon. The
metaset command shows the status. At this point, the set has no owner. The host that
adds disks to the set will become the owner by default.
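The command that creates the disk set in this example might look like the following sketch:
# metaset -s blue -a -h lexicon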
Caution – Do not add a disk with data; the process of adding it to the disk set might
repartition the disk, destroying any data. For more information, see “Example—Two
Shared Disk Sets” on page 201.
3. Use the metaset command to verify the status of the disk set and drives.
# metaset
Host Owner
lexicon Yes
Drive Dbase
c1t6d0 Yes
In this example, the host name is lexicon. The shared disk set is blue. At this point,
only one disk has been added to the disk set blue.
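The command that adds the drive in this example might look like the following sketch:
# metaset -s blue -a c1t6d0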
3. Verify that the host has been added to the disk set by using the metaset command
without any options.
# metaset
Host Owner
lexicon Yes
idiom
This example shows the addition of host idiom to the disk set blue.
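The command that adds the host in this example might look like the following sketch:
# metaset -s blue -a -h idiom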
● To create volumes or other Solaris Volume Manager devices within a disk set, use
one of the following methods:
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes, State Database Replicas, or Hot Spare Pools node. Choose
Action->Create, then follow the instructions in the wizard. For more information,
see the online help.
■ Use the command line utilities with the same basic syntax you would without a
disk set, but add -s diskset-name immediately after the command for every
command.
# metastat -s blue
blue/d10: Mirror
Submirror 0: blue/d11
State: Okay
Submirror 1: blue/d12
State: Resyncing
Resync in progress: 0 % done
Pass: 1
Read option: roundrobin (default)
This example shows the creation of a mirror, d10, in disk set blue, that consists of
submirrors (RAID 0 devices) d11 and d12.
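The mirror in this example might be built with commands like the following sketch (the underlying slices are hypothetical):
# metainit -s blue d11 1 1 c1t6d0s0
# metainit -s blue d12 1 1 c2t6d0s0
# metainit -s blue d10 -m d11
# metattach -s blue d10 d12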
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Disk Sets node. Right-click the Disk Set you want to monitor, then choose
Properties from the menu. For more information, see the online help.
■ Use the metaset command to view disk set status.
See metaset(1M) for more information.
Host Owner
idiom Yes
Drive Dbase
c1t6d0 Yes
c2t6d0 Yes
The metaset command with the -s option followed by the name of the blue disk set
displays status information for that disk set. By issuing the metaset command from
the owning host, idiom, it is determined that idiom is in fact the disk set owner. The
metaset command also displays the drives in the disk set.
The metaset command by itself displays the status of all disk sets.
● Verify that the disk has been deleted from the disk set by using the metaset -s
diskset-name command.
# metaset -s blue
Host Owner
Drive Dbase
c2t6d0 Yes
This example deletes the disk from the disk set blue.
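The command that deletes the drive in this example might look like the following sketch:
# metaset -s blue -d c1t6d0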
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Disk Sets node. Right-click the Disk Set you want to take, then choose Take
Ownership from the menu. For more information, see the online help.
■ Use the following form of the metaset command.
metaset -s diskset-name -t
-s diskset-name Specifies the name of a disk set on which the metaset command
will work.
-t Specifies to take the disk set.
-f Specifies to take the disk set forcibly.
See the metaset(1M) man page for more information.
When one host in a disk set takes the disk set, the other host in the disk set cannot
access data on drives in the disk set.
The default behavior of the metaset command takes the disk set for your host only if
a release is possible on the other host.
Use the -f option to forcibly take the disk set. This option takes the disk set whether
or not another host currently has the set. Use this method when a host in the disk set
is down or not communicating. If the other host had the disk set taken at this point, it
would panic when it attempts to perform an I/O operation to the disk set.
Host Owner
Host Owner
lexicon Yes
idiom
...
In this example, host lexicon communicates with host idiom and ensures that host
idiom has released the disk set before host lexicon attempts to take the set.
Note – In this example, if host idiom owned the set blue, the “Owner” column in the
above output would still have been blank. The metaset command only shows
whether the issuing host owns the disk set, and not the other host.
In this example, the host that is taking the disk set does not communicate with the
other host. Instead, the drives in the disk set are taken without warning. If the other
host had the disk set, it would panic when it attempts an I/O operation to the disk set.
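Hedged sketches of the two forms of the take operation described above, assuming the disk set is named blue:
# metaset -s blue -t
# metaset -s blue -t -f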
3. Verify that the disk set has been released on this host by using the metaset
command without any options.
# metaset
Host Owner
lexicon
idiom
Drive Dbase
c1t6d0 Yes
c2t6d0 Yes
This example shows the release of the disk set blue. Note that there is no owner of
the disk set. Viewing status from host lexicon could be misleading. A host can only
determine if it does or does not own a disk set. For example, if host idiom were to
reserve the disk set, it would not appear so from host lexicon. Only host idiom
would be able to determine the reservation in this case.
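The release command for this example might look like the following sketch:
# metaset -s blue -r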
2. Verify that the host has been deleted from the disk set by using the metaset
command. Note that only the current (owning) host is shown. Other hosts have
been deleted.
# metaset -s blue
Set name = blue, Set number = 1
Host Owner
lexicon Yes
Drive Dbase
c1t2d0 Yes
c1t3d0 Yes
c1t4d0 Yes
c1t5d0 Yes
c1t6d0 Yes
c2t1d0 Yes
This example shows the deletion of the last host from the disk set blue.
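The command that deletes a host from a disk set might look like the following sketch (deleting the last host can require the -f flag; see the metaset(1M) man page):
# metaset -s blue -d -h idiom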
View the Solaris Volume Manager configuration: Use the Solaris Volume Manager GUI or the metastat command to view the system configuration. See “How to View the Solaris Volume Manager Volume Configuration” on page 218.
Rename a volume: Use the Solaris Volume Manager GUI or the metarename command to rename a volume. See “How to Rename a Volume” on page 223.
Initialize Solaris Volume Manager from configuration files: Use the metainit command to initialize Solaris Volume Manager from configuration files. See “How to Initialize Solaris Volume Manager from a Configuration File” on page 224.
Increase the number of possible volumes: Edit the /kernel/drv/md.conf file to increase the number of possible volumes. See “How to Increase the Number of Default Volumes” on page 226.
Increase the number of possible disk sets: Edit the /kernel/drv/md.conf file to increase the number of possible disk sets. See “How to Increase the Number of Default Disk Sets” on page 227.
Grow a file system: Use the growfs command to grow a file system. See “How to Grow a File System” on page 229.
Enable components: Use the Solaris Volume Manager GUI or the metareplace command to enable components. See “Enabling a Component” on page 230.
Replace components: Use the Solaris Volume Manager GUI or the metareplace command to replace components. See “Replacing a Component With Another Available Component” on page 231.
■ From the Enhanced Storage tool within the Solaris Management Console, open the
Volumes node. For more information, see the online help.
■ Use the following format of the metastat command:
Tip – The metastat command does not sort output. Pipe the output of the
metastat -p command to the sort or grep commands for a more manageable
listing of your configuration.
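For example:
# metastat -p | sort
# metastat -p | grep d70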
d1: Concat/Stripe
Size: 4197879 blocks
Stripe 0:
Device Start Block Dbase Reloc
c1t2d0s3 0 No Yes
d2: Concat/Stripe
Size: 4197879 blocks
Stripe 0:
Device Start Block Dbase Reloc
c2t2d0s3 0 No Yes
d70: Mirror
Submirror 0: d71
State: Okay
Submirror 1: d72
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 12593637 blocks
hsp010: is empty
Renaming Volumes
Solaris Volume Manager enables you to rename most types of volumes at any time,
subject to some constraints.
Before you rename a volume, make sure that it is not currently in use. For a file
system, make sure it is not mounted or being used as swap. Other applications using
the raw device, such as a database, should have their own way of stopping access to
the data.
You can use either the Enhanced Storage tool within the Solaris Management Console
or the command line (the metarename(1M) command) to rename volumes.
Note – You must use the command line to exchange volume names. This functionality
is currently unavailable in the Solaris Volume Manager GUI. However, you can
rename a volume with either the command line or the GUI.
4. Edit the /etc/vfstab file to refer to the new volume name, if necessary.
In this example, the volume d10 is renamed to d100. Because d10 contains a mounted
file system, the file system must be unmounted before the rename can occur. If the
volume is used for a file system with an entry in the /etc/vfstab file, the entry
must be changed to reference the new volume name. For example, the following line:
/dev/md/dsk/d10 /dev/md/rdsk/d10 /docs ufs 2 yes -
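would be changed to read approximately as follows (a sketch):
/dev/md/dsk/d100 /dev/md/rdsk/d100 /docs ufs 2 yes -
The complete sequence for this example might look like the following sketch:
# umount /docs
# metarename d10 d100
(Edit /etc/vfstab as shown above.)
# mount /docs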
Caution – Use this procedure only if you have experienced a complete loss of your
Solaris Volume Manager configuration, or if you have no configuration yet and you
want to create a configuration from a saved configuration file.
If your system loses the information maintained in the state database (for example,
because the system was rebooted after all state database replicas were deleted), and as
long as no volumes were created since the state database was lost, you can use the
md.cf or md.tab files to recover your Solaris Volume Manager configuration.
For more information about these files, see md.cf(4) and md.tab(4).
4. Check the syntax of the md.tab file entries without committing changes by using
the following form of the metainit command:
# metainit -n -a component-name
■ -n specifies not to actually create the devices. Use this option to verify that the
results will be as you expect.
■ -a specifies to activate the devices.
■ component-name specifies the name of the component to initialize. If no component
is specified, all components will be created.
5. If no problems were apparent from the previous step, re-create the volumes and hot
spare pools from the md.tab file:
# metainit -a component-name
The values of total volumes and number of disk sets can be changed if necessary, and
the tasks in this section tell you how.
Caution – If you lower this number at any point, any volume existing between the old
number and the new number might not be available, potentially resulting in data loss.
If you see a message such as “md: d200: not configurable, check
/kernel/drv/md.conf” you must edit the md.conf file and increase the value, as
explained in this task.
3. Change the value of the nmd field. Values up to 8192 are supported.
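A minimal sketch of what the edited line might look like, assuming the usual layout of the md.conf file (keep the other fields as they appear in your existing file):
name="md" parent="pseudo" nmd=256 md_nsets=4;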
Caution – Do not decrease the number of default disk sets if you have already
configured disk sets. Lowering this number could make existing disk sets unavailable
or unusable.
Example—md.conf File
Here is a sample md.conf file that is configured for five shared disk sets. The value of
md_nsets is six, which results in five shared disk sets and the one local disk set.
#
#
#pragma ident "@(#)md.conf 2.1 00/07/07 SMI"
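The line that sets these values might look like the following sketch, assuming the usual layout of the md.conf file (verify against your existing file):
name="md" parent="pseudo" nmd=128 md_nsets=6;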
An application, such as a database, that uses the raw device must have its own
method to grow added space. Solaris Volume Manager does not provide this
capability.
The growfs command will “write-lock” a mounted file system as it expands the file
system. The length of time the file system is write-locked can be shortened by
expanding the file system in stages. For instance, to expand a 1 Gbyte file system to 2
Gbytes, the file system can be grown in 16 Mbyte stages using the -s option to specify
the total size of the new file system at each stage.
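A hedged sketch of staged growth with the -s option (the mount point, size, and volume name are hypothetical); a final invocation without -s grows the file system to the full size of the volume:
# growfs -M /files -s 2064384 /dev/md/rdsk/d10
# growfs -M /files /dev/md/rdsk/d10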
During the expansion, the file system is not available for write access because of
write-lock. Write accesses are transparently suspended and are restarted when the
growfs command unlocks the file system. Read accesses are not affected, though
access times are not kept while the lock is in effect.
Note – Solaris Volume Manager volumes can be expanded, but not shrunk.
Note – For mirror and transactional volumes, always run the growfs command on
the top-level volume, not a submirror or master device, even though space is added to
the submirror or master device.
Note – When recovering from disk errors, scan /var/adm/messages to see what
kind of errors occurred. If the errors are transitory and the disks themselves do not
have problems, try enabling the failed components. You can also use the format
command to test a disk.
Enabling a Component
You can enable a component when any of the following conditions exist:
■ Solaris Volume Manager could not access the physical drive. This problem might
have occurred, for example, due to a power loss, or a loose drive cable. In this case,
Solaris Volume Manager puts the components in the “Maintenance” state. You
need to make sure that the drive is accessible (restore power, reattach cables, and
so on), and then enable the component.
Note – Always check for state database replicas and hot spares on the drive being
replaced. Any state database replica shown to be in error should be deleted before
you replace the disk. Then, after you enable the component, re-create the replicas at
the same size. You should treat hot spares in the same manner.
You can use this command when any of the following conditions exist:
■ A disk drive has problems, and you do not have a replacement drive, but you do
have available components elsewhere on the system.
You might want to use this strategy if a replacement is absolutely necessary but
you do not want to shut down the system.
■ You are seeing soft errors.
Physical disks might report soft errors even though Solaris Volume Manager shows
the mirror/submirror or RAID 5 volume in the “Okay” state. Replacing the
component in question with another available component enables you to perform
preventative maintenance and potentially prevent hard errors from occurring.
■ You want to do performance tuning.
For example, by using the performance monitoring feature available from the
Enhanced Storage tool within the Solaris Management Console, you see that a
particular component in a RAID 5 volume is experiencing a high load average,
even though it is in the “Okay” state. To balance the load on the volume, you can
replace that component with a component from a disk that is less utilized. You can
perform this type of replacement online without interrupting service to the
volume.
When a component in a RAID 0 or RAID 5 volume experiences errors and there are no
redundant components to read from, the component goes into the “Last Erred” state. For
example, in a RAID 5 volume, after one component goes into the “Maintenance” state,
no redundancy is available, so the next component to fail goes into the “Last Erred” state.
When either a mirror or RAID 5 volume has a component in the “Last Erred” state, I/O is
still attempted to the component marked “Last Erred.” This happens because a “Last
Erred” component contains the last good copy of data from Solaris Volume Manager’s
point of view. With a component in the “Last Erred” state, the volume behaves like a
normal device (disk) and returns I/O errors to an application. Usually, at this point some
data has been lost.
Always replace components in the “Maintenance” state first, followed by those in the
“Last Erred” state. After a component is replaced and resynchronized, use the
metastat command to verify its state, then validate the data to make sure it is good.
Mirrors – If components are in the “Maintenance” state, no data has been lost. You can
safely replace or enable the components in any order. If a component is in the “Last
Erred” state, you cannot replace it until you first replace all the other mirrored
components in the “Maintenance” state. Replacing or enabling a component in the
“Last Erred” state usually means that some data has been lost. Be sure to validate the
data on the mirror after you repair it.
RAID 5 Volumes – A RAID 5 volume can tolerate a single component failure. You can
safely replace a single component in the “Maintenance” state without losing data. If an
error on another component occurs, it is put into the “Last Erred” state. At this point,
the RAID 5 volume is a read-only device. You need to perform some type of error
recovery so that the state of the RAID 5 volume is stable and the possibility of data
loss is reduced. If a RAID 5 volume reaches a “Last Erred” state, there is a good
chance it has lost data. Be sure to validate the data on the RAID 5 volume after you
repair it.
Note – A submirror or RAID 5 volume might be using a hot spare in place of a failed
component. When that failed component is enabled or replaced by using the
procedures in this section, the hot spare is marked “Available” in the hot spare pool,
and is ready for use.
This chapter provides general best practices information from real world storage
scenarios using Solaris Volume Manager. In this section, you will see a typical
configuration, followed by an analysis, followed by a recommended (“Best Practices”)
configuration to meet the same needs.
As a starting point, consider a Netra with a single SCSI bus and two internal
disks—an off-the-shelf configuration, and a good starting point for distributed servers.
Solaris Volume Manager could easily be used to mirror some or all of the slices, thus
providing redundant storage to help guard against disk failure. See the following
figure for an example.
(Figure: A system with a single SCSI controller and two internal disks, c0t0d0 and c0t1d0.)
A configuration like this example might include mirrors for the root (/), /usr, swap,
/var, and /export file systems, plus state database replicas (one per disk). As such, a
failure of either side of any of the mirrors would not necessarily result in system
failure, and up to five discrete failures could possibly be tolerated. However, the
system is not sufficiently protected against disk or slice failure. A variety of potential
failures could result in a complete system failure, requiring operator intervention.
While this configuration does help provide some protection against catastrophic disk
failure, it exposes key possible single points of failure:
■ The single SCSI controller represents a potential point of failure. If the controller
fails, the system will be down, pending replacement of the part.
■ The two disks do not provide adequate distribution of state database replicas. The
majority consensus algorithm requires that half of the state database replicas be
available for the system to continue to run, and half plus one replica for a reboot.
So, if one state database replica were on each disk and one disk or the slice
containing the replica failed, the system could not reboot (thus making a mirrored
root ineffective). If two or more state database replicas were on each disk, a single
slice failure would likely not be problematic, but a disk failure would still prevent a
reboot. If a different number of replicas were on each disk, one disk would have more
than half and the other fewer than half. If the disk with fewer replicas failed, the system
could reboot and continue. However, if the disk with more replicas failed, the
system would immediately panic.
Generally, do not establish Solaris Volume Manager RAID 5 volumes on any hardware
storage devices that provide redundancy (for example, RAID 1 and RAID 5 volumes).
Unless you have a very unusual situation, performance will suffer, and you will gain
very little in terms of redundancy or higher availability.
Configuring underlying hardware storage devices with RAID 5 volumes, on the other
hand, is very effective, as it provides a good foundation for Solaris Volume Manager
volumes. Hardware RAID 5 provides some additional redundancy for Solaris Volume
Manager RAID 1 volumes, soft partitions, or other volumes.
Note – Do not configure similar software and hardware devices. For example, do not
build software RAID 1 volumes on top of hardware RAID 1 devices. Configuring
similar devices in hardware and software results in performance penalties without
offsetting any gains in reliability.
Configuring soft partitions on top of a Solaris Volume Manager RAID 1 volume, built
in turn on a hardware RAID 5 device, is a very flexible and resilient configuration.
When Solaris Volume Manager encounters a problem, such as being unable to write to
a volume due to physical errors at the slice level, it changes the status of the volume so
system administrators can stay informed. However, unless you regularly check the
status in the Solaris Volume Manager graphical user interface through the Solaris
Management Console, or by running the metastat command, you might not see
these status changes in a timely fashion.
This chapter provides information about various monitoring tools available for Solaris
Volume Manager, including the Solaris Volume Manager SNMP agent, which is a
subagent of the Solstice Enterprise Agents™ monitoring software. In addition to
configuring the Solaris Volume Manager SNMP agent to report SNMP traps, you can
create a shell script to actively monitor many Solaris Volume Manager functions. Such
a shell script can run as a cron job and be valuable in identifying issues before they
become problems.
Solaris Volume Manager Monitoring and
Reporting (Task Map)
The following task map identifies the procedures needed to manage Solaris Volume
Manager error reporting.
Set the mdmonitord daemon to periodically check for errors: Set the error-checking interval used by the mdmonitord daemon by editing the /etc/rc2.d/S95svm.sync file. See “Setting the mdmonitord Command for Periodic Error Checking” on page 240.
Configure the Solaris Volume Manager SNMP agent: Edit the configuration files in the /etc/snmp/conf directory so that Solaris Volume Manager will throw traps appropriately, to the correct system. See “Configuring the Solaris Volume Manager SNMP Agent” on page 242.
Monitor Solaris Volume Manager with scripts run by the cron command: Create or adapt a script to check for errors, then run the script from the cron command. See “Monitoring Solaris Volume Manager with a cron Job” on page 245.
2. Edit the /etc/rc2.d/S95svm.sync script and change the line that starts the
mdmonitord command by adding a -t flag and the number of seconds between
checks.
if [ -x $MDMONITORD ]; then
$MDMONITORD -t 3600
error=$?
case $error in
0) ;;
*) echo "Could not start $MDMONITORD. Error $error."
;;
esac
fi
These packages are part of the Solaris operating environment and are normally
installed by default unless the package selection was modified at install time or a
minimal set of packages was installed. After you confirm that all five packages are
available (by using the pkginfo pkgname command, as in pkginfo SUNWsasnx),
you need to configure the Solaris Volume Manager SNMP agent, as described in the
following section.
Note – Make sure that you have the same number of opening and closing brackets in
the /etc/snmp/conf/snmpdx.acl file.
Note – Whenever you upgrade your Solaris operating environment, you will probably
need to edit the /etc/snmp/conf/enterprises.oid file and append the line in
Step 6 again, then restart the Solaris Enterprise Agents server.
After you have completed this procedure, your system will issue SNMP traps to the
host or hosts that you specified. You will need to use an appropriate SNMP monitor,
such as Solstice Enterprise Agents software, to view the traps as they are issued.
Note – Set the mdmonitord command to probe your system regularly to help ensure
that you receive traps if problems arise. See “Setting the mdmonitord Command for
Periodic Error Checking” on page 240. Also, refer to “Monitoring Solaris Volume
Manager with a cron Job” on page 245 for additional error-checking options.
Many problematic situations, such as an unavailable disk with RAID 0 volumes or soft
partitions on it, do not result in SNMP traps, even when reads and writes to the device
are attempted. SCSI or IDE errors are generally reported in these cases, but other
SNMP agents must issue traps for those errors to be reported to a monitoring console.
Note – This script serves as a starting point for automating Solaris Volume Manager
error checking. You will probably need to modify this script for your own
configuration.
#!/bin/ksh
#
#ident "@(#)metacheck.sh 1.3 96/06/21 SMI"
# ident='%Z%%M% %I% %E% SMI'
#
# Alternate interpreter lines, useful when debugging:
# #!/bin/ksh -x
# #!/bin/ksh -v
#
# Copyright (c) 1999 by Sun Microsystems, Inc.
#
# metacheck
#
# Check on the status of the metadevice configuration. If there is a problem
# return a non zero exit code. Depending on options, send email notification.
#
# -h
# help
# -s setname
# Specify the set to check. By default, the 'local' set will be checked.
# -m recipient [recipient...]
# Send email notification to the specified recipients. This
# must be the last argument. The notification shows up as a short
# email message with a subject of
# "Solaris Volume Manager Problem: metacheck.who.nodename.setname"
# which summarizes the problem(s) and tells how to obtain detailed
# information. The "setname" is from the -s option, "who" is from
# the -w option, and "nodename" is reported by uname(1).
# Email notification is further affected by the following options:
# -f to suppress additional messages after a problem
# has been found.
# -d to control the suppression.
# -w to identify who generated the email.
# -t to force email even when there is no problem.
shift
# decho "strstr LOOK .$look. FIRST .$1."
while [ $# -ne 0 ] ; do
if [ "$look" = "$1" ] ; then
ret="$look"
fi
shift
done
echo "$ret"
}
shift
# decho "strdstr LOOK .$look. FIRST .$1."
while [ $# -ne 0 ] ; do
if [ "$look" != "$1" ] ; then
ret="$ret $1"
fi
shift
done
echo "$ret"
}
merge_continued_lines()
{
awk -e '\
BEGIN { line = "";} \
$NF == "\\" { \
$NF = ""; \
line = line $0; \
next; \
} \
$NF != "\\" { \
if ( line != "" ) { \
print line $0; \
line = ""; \
} else { \
print $0; \
} \
}'
}
#
# - MAIN
#
METAPATH=/usr/sbin
PATH=/usr/bin:$METAPATH
USAGE="usage: metacheck [-s setname] [-h] [[-t] [-f [-d datefmt]] \
[-w who] -m recipient [recipient...]]"
curdate_filter=`date +$datefmt`
curdate=`date`
node=`uname -n`
#
# Check replicas for problems; capital letters in the flags
# indicate an error, and fields are separated by tabs.
#
problem=`awk < $metadb_f -F\t '{if ($1 ~ /[A-Z]/) print $1;}'`
if [ -n "$problem" ] ; then
retval=`expr $retval + 64`
echo "\
metacheck: metadb problem, for more detail run:\n\t$METAPATH/metadb$setarg -i" \
>> $msgs_f
fi
#
# Check the metadevice state
#
problem=`awk < $metastat_f -e \
#
# Check the hotspares to see if any have been used.
#
problem=""
grep "no hotspare pools found" < $metahs_f > /dev/null 2>&1
if [ $? -ne 0 ] ; then
problem=`awk < $metahs_f -e \
'/blocks/ { if ( $2 != "Available" ) print $0;}'`
fi
if [ -n "$problem" ] ; then
retval=`expr $retval + 256`
echo "\
metacheck: hot spare in use, for more detail run:\n\t$METAPATH/metahs$setarg -i" \
>> $msgs_f
fi
fi
For information on invoking scripts by using the cron utility, see the cron(1M) man
page.
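For example, a root crontab entry along the following lines would run such a script nightly. The installation path /usr/local/bin/metacheck.sh is an assumption, and the -m option is the mail-notification option described in the script header.

# Run metacheck at 2:10 a.m. every day and mail any problems to root.
# Add this entry by running "crontab -e" as root.
10 2 * * * /usr/local/bin/metacheck.sh -m root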
This chapter describes some Solaris Volume Manager problems and their appropriate
solutions. It is not intended to be all-inclusive but rather to present common scenarios
and recovery procedures.
Task: Replace a failed disk.
Description: Replace a disk, then update state database replicas and logical volumes on the new disk.
Instructions: "How to Replace a Failed Disk" on page 255

Task: Recover from improper /etc/vfstab entries.
Description: Use the fsck command on the mirror, then edit the /etc/vfstab file so the system will boot correctly.
Instructions: "How to Recover From Improper /etc/vfstab Entries" on page 258

Task: Recover from a boot device failure.
Description: Boot from a different submirror.
Instructions: "How to Recover From a Boot Device Failure" on page 261

Task: Recover from insufficient state database replicas.
Description: Delete unavailable replicas by using the metadb command.
Instructions: "How to Recover From Insufficient State Database Replicas" on page 265

Task: Recover a Solaris Volume Manager configuration from salvaged disks.
Description: Attach disks to a new system and have Solaris Volume Manager rebuild the configuration from the existing state database replicas.
Instructions: "How to Recover a Configuration" on page 271
Tip – Any time you update your Solaris Volume Manager configuration, or make
other storage or operating environment-related changes to your system, generate fresh
copies of this configuration information. You could also generate this information
automatically with a cron job.
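A hedged sketch of such a cron job follows, assuming that the concise metastat -p output is an acceptable snapshot format and that /var/adm/svm.config is an arbitrary destination of your choosing; the % characters must be escaped because they are special in crontab entries.

# Capture the current configuration nightly at 1:00 a.m. in a dated file.
0 1 * * * /usr/sbin/metastat -p > /var/adm/svm.config.`date +\%Y\%m\%d` 2>&1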
Replacing Disks
This section describes how to replace disks in a Solaris Volume Manager environment.
3. Record the slice name where the state database replicas reside and the number of
state database replicas, then delete the state database replicas.
The number of state database replicas is obtained by counting the number of
appearances of a slice in the metadb command output in Step 2. In this example, the
three state database replicas that exist on c0t1d0s4 are deleted.
# metadb -d c0t1d0s4
Caution – If, after deleting the bad state database replicas, you are left with three or
fewer, add more state database replicas before continuing. This will help ensure that
configuration information remains intact.
4. Locate any submirrors that use slices on the failed disk and detach them.
The metastat command can show the affected mirrors. In this example, one
submirror, d10, is using c0t1d0s4. The mirror is d20.
# metadetach d20 d10
d20: submirror d10 is detached
9. If you deleted state database replicas in Step 3, add the same number back to the
appropriate slice.
In this example, /dev/dsk/c0t1d0s4 is used.
# metadb -a -c 3 c0t1d0s4
10. Depending on how the disk was used, you have a variety of things to do. Use the
following table to decide what to do next.
Unmirrored RAID 0 volume or soft partition: If the volume is used for a file system, run the newfs command, mount the file system, then restore data from backup. If the RAID 0 volume is used for an application that uses the raw device, that application must have its own recovery procedures.

RAID 5 volume: Run the metareplace command to re-enable the slice, which causes the resynchronization to start.
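For the RAID 5 case, the command takes the same metareplace -e form shown later in this chapter. In the sketch below, c0t1d0s4 is the slice from this example, while the volume name d40 is an assumption used only for illustration.

# Re-enable the slice on the replaced disk so that the RAID 5 volume
# resynchronizes it (d40 is a hypothetical volume name).
metareplace -e d40 c0t1d0s4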
11. Replace hot spares that were deleted, and add them to the appropriate hot spare
pool or pools.
# metahs -a hsp000 c0t0d0s6
hsp000: Hotspare is added
The following table describes these problems and points you to the appropriate
solution.
There are not enough state database replicas: "How to Recover From Insufficient State Database Replicas" on page 265

A boot device (disk) has failed: "How to Recover From a Boot Device Failure" on page 261
The incorrect /etc/vfstab file would look something like the following:
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
/dev/dsk/c0t3d0s0 /dev/rdsk/c0t3d0s0 / ufs 1 no -
/dev/dsk/c0t3d0s1 - - swap - no -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 2 no -
#
/proc - /proc proc - no -
floppy - /dev/floppy floppy - no -
swap - /tmp tmpfs - yes -
Because of the errors, you automatically go into single-user mode when the system is
booted:
ok boot
...
configuring network interfaces: hme0.
Hostname: lexicon
mount: /dev/dsk/c0t3d0s0 is not this fstype.
setmnt: Cannot open /etc/mnttab for writing
At this point, root (/) and /usr are mounted read-only. Follow these steps:
1. Run the fsck command on the root (/) mirror.
# fsck /dev/md/rdsk/d0
** /dev/md/rdsk/d0
** Currently Mounted on /
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
2274 files, 11815 used, 10302 free (158 frags, 1268 blocks,
0.7% fragmentation)
2. Remount root (/) read/write so you can edit the /etc/vfstab file.
# mount -o rw,remount /dev/md/dsk/d0 /
mount: warning: cannot lock temp file </etc/.mnt.lock>
4. Verify that the /etc/vfstab file contains the correct volume entries.
The root (/) entry in the /etc/vfstab file should appear as follows so that the entry
for the file system correctly references the RAID 1 volume:
#device device mount FS fsck mount mount
#to mount to fsck point type pass at boot options
#
/dev/md/dsk/d0 /dev/md/rdsk/d0 / ufs 1 no -
/dev/dsk/c0t3d0s1 - - swap - no -
/dev/dsk/c0t3d0s6 /dev/rdsk/c0t3d0s6 /usr ufs 2 no -
#
/proc - /proc proc - no -
floppy - /dev/floppy floppy - no -
swap - /tmp tmpfs - yes -
In the following example, the boot device contains two of the six state database
replicas, and the root (/), swap, and /usr submirrors fail.
Initially, when the boot device fails, you’ll see a message similar to the following. This
message might differ among various architectures.
Rebooting with command:
Boot device: /iommu/sbus/dma@f,81000/esp@f,80000/sd@3,0
The selected SCSI device is not responding
Can’t open boot device
...
When you see this message, note the device. Then, follow these steps:
2. Determine that two state database replicas have failed by using the metadb
command.
# metadb
flags first blk block count
M p unknown unknown /dev/dsk/c0t3d0s3
3. Determine that half of the root (/), swap, and /usr mirrors have failed by using the
metastat command.
# metastat
d0: Mirror
Submirror 0: d10
State: Needs maintenance
Submirror 1: d20
State: Okay
...
d10: Submirror of d0
State: Needs maintenance
Invoke: "metareplace d0 /dev/dsk/c0t3d0s0 <new device>"
Size: 47628 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t3d0s0 0 No Maintenance
d20: Submirror of d0
State: Okay
Size: 47628 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t2d0s0 0 No Okay
d1: Mirror
Submirror 0: d11
State: Needs maintenance
Submirror 1: d21
State: Okay
...
d11: Submirror of d1
State: Needs maintenance
Invoke: "metareplace d1 /dev/dsk/c0t3d0s1 <new device>"
Size: 69660 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t3d0s1 0 No Maintenance
d21: Submirror of d1
State: Okay
Size: 69660 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
d2: Mirror
Submirror 0: d12
State: Needs maintenance
Submirror 1: d22
State: Okay
...
d12: Submirror of d2
State: Needs maintenance
Invoke: "metareplace d2 /dev/dsk/c0t3d0s6 <new device>"
Size: 286740 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t3d0s6 0 No Maintenance
d22: Submirror of d2
State: Okay
Size: 286740 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
/dev/dsk/c0t2d0s6 0 No Okay
In this example, the metastat command shows that the following submirrors need
maintenance:
■ Submirror d10, device c0t3d0s0
■ Submirror d11, device c0t3d0s1
■ Submirror d12, device c0t3d0s6
4. Halt the system, replace the disk, and use the format command or the fmthard
command, to partition the disk as it was before the failure.
Tip – If the new disk is identical to the existing disk (the intact side of the mirror in
this example), quickly format the new disk (c0t3d0 in this example) with
prtvtoc /dev/rdsk/c0t2d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2.
# halt
...
Halted
...
ok boot
...
# format /dev/rdsk/c0t3d0s0
6. To delete the failed state database replicas and then add them back, use the metadb
command.
# metadb
flags first blk block count
M p unknown unknown /dev/dsk/c0t3d0s3
M p unknown unknown /dev/dsk/c0t3d0s3
a m p luo 16 1034 /dev/dsk/c0t2d0s3
a p luo 1050 1034 /dev/dsk/c0t2d0s3
a p luo 16 1034 /dev/dsk/c0t1d0s3
a p luo 1050 1034 /dev/dsk/c0t1d0s3
# metadb -d c0t3d0s3
# metadb -c 2 -a c0t3d0s3
# metadb
flags first blk block count
a m p luo 16 1034 /dev/dsk/c0t2d0s3
a p luo 1050 1034 /dev/dsk/c0t2d0s3
a p luo 16 1034 /dev/dsk/c0t1d0s3
a p luo 1050 1034 /dev/dsk/c0t1d0s3
a u 16 1034 /dev/dsk/c0t3d0s3
a u 1050 1034 /dev/dsk/c0t3d0s3
# metareplace -e d1 c0t3d0s1
Device /dev/dsk/c0t3d0s1 is enabled
# metareplace -e d2 c0t3d0s6
Device /dev/dsk/c0t3d0s6 is enabled
After some time, the resynchronization will complete. You can now return to booting
from the original device.
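If you want to confirm that the resynchronization has finished before rebooting, a simple check such as the following can be run periodically; it assumes the "Resync in progress" line that metastat prints while a resynchronization is under way.

# The grep output disappears once the resynchronization is complete.
metastat d1 | grep -i "resync in progress"
metastat d2 | grep -i "resync in progress"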
1. Boot the system to determine which state database replicas are down.
3. If one or more disks are known to be unavailable, delete the state database replicas
on those disks. Otherwise, delete enough errored state database replicas (W, M, D, F,
or R status flags reported by metadb) to ensure that a majority of the existing state
database replicas are not errored.
Delete the state database replica on the bad disk using the metadb -d command.
Tip – State database replicas with a capitalized status flag are in error, while those
with lowercase status flags are functioning normally.
4. Verify that the replicas have been deleted by using the metadb command.
5. Reboot.
6. If necessary, you can replace the disk, format it appropriately, then add any state
database replicas needed to the disk, following the instructions in "Creating State
Database Replicas" on page 58.
Once you have a replacement disk, halt the system, replace the failed disk, and once
again, reboot the system. Use the format command or the fmthard command to
partition the disk as it was configured before the failure.
panic:
stopped at edd000d8: ta %icc,%g0 + 125
Type ’go’ to resume
ok boot -s
Resetting ...
The system paniced because it could no longer detect state database replicas on slice
/dev/dsk/c1t1d0s0, which is part of the failed disk or attached to a failed
controller. The first metadb -i command identifies the replicas on this slice as having
a problem with the master blocks.
When you delete the stale state database replicas, the root (/) file system is read-only.
You can ignore the mddb.cf error messages displayed.
At this point, the system is again functional, although it probably has fewer state
database replicas than it should, and any volumes that used part of the failed storage
are also either failed, errored, or hot-spared; those issues should be addressed
promptly.
Any device errors or panics must be managed by using the command line utilities.
At reboot, fsck checks and repairs the file system and transitions the file system back
to the “Okay” state. fsck completes this process for all transactional volumes listed in
the /etc/vfstab file for the affected log device.
Any devices sharing the failed log device also go to the “Error” state.
This procedure is a last option to recover lost soft partition configuration information.
The metarecover command should only be used when you have lost both your
metadb and your md.cf files, and your md.tab is lost or out of date.
Note – If your configuration included other Solaris Volume Manager volumes that
were built on top of soft partitions, you should recover the soft partitions before
attempting to recover the other volumes.
Configuration information about your soft partitions is stored on your devices and in
your state database. Since either of these sources could be corrupt, you must tell the
metarecover command which source is reliable.
First, use the metarecover command to determine whether the two sources agree. If
they do agree, the metarecover command cannot be used to make any changes. If
the metarecover command reports an inconsistency, however, you must examine its
output carefully to determine whether the disk or the state database is corrupt, then
you should use the metarecover command to rebuild the configuration based on the
appropriate source.
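The following is a hedged sketch of this two-step approach for a soft partition on slice c1t1d0s1, an assumed component; verify the exact options against the metarecover(1M) man page before running them on your system.

# Compare the soft partition information on the disk with the state
# database; no changes are made if the two sources agree.
metarecover c1t1d0s1 -p
# If the state database turns out to be the corrupt source, rebuild the
# configuration from the information stored on the disk itself.
metarecover c1t1d0s1 -p -d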
1. Read the “Background Information About Soft Partitions” on page 122.
This example recovers three soft partitions from disk, after the state database replicas
were accidentally deleted.
Note – Only recover a Solaris Volume Manager configuration onto a system with no
preexisting Solaris Volume Manager configuration. Otherwise, you risk replacing a
logical volume on your system with a logical volume that you are recovering, and
possibly corrupting your system.
Note – This process only works to recover volumes from the local disk set.
2. Do a reconfiguration reboot to ensure that the system recognizes the newly added
disks.
# reboot -- -r
3. Determine the major/minor number for a slice containing a state database replica
on the newly added disks.
Use ls -lL, and note the two numbers between the group name and the date. Those
are the major/minor numbers for this slice.
# ls -Ll /dev/dsk/c1t9d0s7
brw-r----- 1 root sys 32, 71 Dec 5 10:05 /dev/dsk/c1t9d0s7
4. If necessary, determine the major name corresponding with the major number by
looking up the major number in /etc/name_to_major.
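A minimal sketch of that lookup, using the major number 32 from the example above:

# Print the driver name whose major number is 32 (for example, sd).
awk '$2 == 32 { print $1 }' /etc/name_to_major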
5. Update the /kernel/drv/md.conf file with two commands: one command to tell
Solaris Volume Manager where to find a valid state database replica on the new
disks, and one command to tell it to trust the new replica and ignore any conflicting
device ID information on the system.
In the line in this example that begins with mddb_bootlist1, replace the sd in the
example with the major name you found in the previous step. Replace 71 in the
example with the minor number you identified in Step 3.
#pragma ident "@(#)md.conf 2.1 00/07/07 SMI"
#
# Copyright (c) 1992-1999 by Sun Microsystems, Inc.
# All rights reserved.
#
name="md" parent="pseudo" nmd=128 md_nsets=4;
# Begin MDD database info (do not edit)
mddb_bootlist1="sd:71:16:id0";
md_devid_destroy=1;
# End MDD database info (do not edit)
d10: Mirror
Submirror 0: d0
State: Okay
Submirror 1: d1
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 82593 blocks
This appendix contains information about Solaris Volume Manager files for reference
purposes. It contains the following:
■ “System Files and Startup Files” on page 277
■ “Manually Configured Files” on page 279
Caution – Do not edit this file. If you change this file, you could corrupt your
Solaris Volume Manager configuration.
The /etc/lvm/mddb.cf file records the locations of state database replicas. When
state database replica locations change, Solaris Volume Manager makes an entry in
the mddb.cf file that records the locations of all state databases. See mddb.cf(4)
for more information.
■ /etc/lvm/md.cf
The /etc/lvm/md.cf file contains automatically generated configuration
information for the default (unspecified or local) disk set. When you change the
Solaris Volume Manager configuration, Solaris Volume Manager automatically
updates the md.cf file (except for information about hot spares in use). See
md.cf(4) for more information.
Caution – Do not edit this file. If you change this file, you could corrupt your
Solaris Volume Manager configuration or be unable to recover your Solaris Volume
Manager configuration.
If your system loses the information maintained in the state database, and as long
as no volumes were changed or created in the meantime, you can use the md.cf
file to recover your configuration. See “How to Initialize Solaris Volume Manager
from a Configuration File” on page 224.
■ /kernel/drv/md.conf
The md.conf configuration file is read by Solaris Volume Manager at startup. You
can edit two fields in this file: nmd, which sets the number of volumes
(metadevices) that the configuration can support, and md_nsets, which is the
number of disk sets. The default value for nmd is 128, which can be increased to
8192. The default value for md_nsets is 4, which can be increased to 32. The total
number of named disk sets is always one less than the md_nsets value, because
the default (unnamed or local) disk set is included in md_nsets.
Note – Keep the values of nmd and md_nsets as low as possible. Memory
structures exist for all possible devices as determined by nmd and md_nsets, even
if you have not created those devices. For optimal performance, keep nmd and
md_nsets only slightly higher than the number of volumes you will use.
■ /etc/rcS.d/S35svm.init
This file configures and starts Solaris Volume Manager at boot and allows
administrators to start and stop the daemons.
■ /etc/rc2.d/S95svm.sync
This file checks the Solaris Volume Manager configuration at boot, starts
resynchronization of mirrors if necessary, and starts the active monitoring daemon.
(For more information, see mdmonitord(1M)).
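Returning to the nmd and md_nsets limits described above for /kernel/drv/md.conf, a configuration that needs more volumes but no additional disk sets might raise only nmd, as in the following sketch; a reconfiguration reboot (boot -r) is assumed to be needed before the new limit takes effect.

# /kernel/drv/md.conf (excerpt) - raise the volume limit from the default 128 to 256
name="md" parent="pseudo" nmd=256 md_nsets=4;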
Note – The configuration information in the /etc/lvm/md.tab file might differ from
the current volumes, hot spares, and state database replicas in use. The file is
maintained manually by the system administrator to capture the intended
configuration. After you change your Solaris Volume Manager configuration,
re-create this file and preserve a backup copy.
Once you have created and updated the file, the metainit, metahs, and metadb
commands then activate the volumes, hot spare pools, and state database replicas
defined in the file.
In the /etc/lvm/md.tab file, one complete configuration entry for a single volume
appears on each line using the syntax of the metainit, metadb, and metahs
commands.
You then run the metainit command with either the -a option, to activate all
volumes in the /etc/lvm/md.tab file, or with the volume name that corresponds to
a specific entry in the file.
Note – Solaris Volume Manager does not write to or store configuration information in
the /etc/lvm/md.tab file. You must edit the file by hand and run the metainit,
metahs, or metadb commands to create Solaris Volume Manager components.
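As a hedged illustration of the file format, the following hypothetical entries define two single-slice concatenations and a mirror that initially contains one of them; the slice and volume names are assumptions.

# Hypothetical /etc/lvm/md.tab entries
d51 1 1 c0t0d0s5
d52 1 1 c0t1d0s5
d50 -m d51

Running metainit -a would then activate d51, d52, and d50, after which the second submirror could be attached with metattach d50 d52 so that it is synchronized from the first.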
This appendix provides quick access information about the features and functions
available with Solaris Volume Manager.
TABLE B–1, Solaris Volume Manager Commands, lists each Solaris Volume Manager command with its man page and a description.
CIM defines the data model, referred to as the "schema," which describes the
following:
■ attributes of and the operations against SVM devices
■ relationships among the various SVM devices
■ relationships among the SVM devices and other aspects of the operating system,
such as file systems
This model is made available through the Solaris Web Based Enterprise Management
(WBEM) SDK. The WBEM SDK is a set of Java™ technology-based APIs that allow
access to system management capabilities that are represented by CIM.
For more information about the CIM/WBEM SDK, see the Solaris WBEM SDK
Developer’s Guide.
Index
A
adding hot spares, 156
alternate boot device
  IA, 103
alternate boot path, 98

B
boot device
  recovering from failure, 261
boot problems, 258
booting into single-user mode, 90

C
concatenated stripe
  definition, 67
  example with three stripes, 68
  removing, 80
concatenated volume, See concatenation
concatenation
  creating, 76
  definition, 66
  example with three slices, 67
  expanding, 79
  expanding UFS file system, 66
  information for creating, 70
  information for recreating, 70
  removing, 80
  usage, 66
configuration planning
  guidelines, 29
  overview, 29
  trade-offs, 30
cron command, 252

D
disk set, 197
  adding another host to, 209
  adding disks to, 198
  adding drives to, 207
  administering, 203
  checking status, 211, 216
  creating, 207
  definition, 39, 44
  displaying owner, 212
  example with two shared disk sets, 201
  inability to use with /etc/vfstab file, 198
  increasing the default number, 227
  intended usage, 198
  placement of replicas, 198
  relationship to volumes and hot spare pools, 198
  releasing, 204, 212, 214
  reservation behavior, 203
  reservation types, 203
  reserving, 203, 213
  Solstice HA, 198
  usage, 197
DiskSuite Tool, See graphical interface

E
enabling a hot spare, 162
enabling a slice in a RAID 5 volume, 143
enabling a slice in a submirror, 107
Enhanced Storage, See graphical interface
errors
  checking for using a script, 245
/etc/lvm/md.cf file, 277
/etc/lvm/mddb.cf file, 277
/etc/rc2.d/S95lvm.sync file, 278
/etc/rcS.d/S35lvm.init file, 278
/etc/vfstab file, 116, 177, 192
  recovering from improper entries, 258

F
failover configuration, 44, 197
file system
  expanding by creating a concatenation, 77
  expansion overview, 41
  growing, 228
  guidelines, 45
  panics, 268
  unmirroring, 118
fmthard command, 263, 265
format command, 263, 265
fsck command, 193

G
general performance guidelines, 30
graphical interface
  overview, 36
growfs command, 41, 229, 281
GUI
  sample, 37

H
hot spare, 148
  adding to a hot spare pool, 156
  conceptual overview, 148
  enabling, 163
  replacing in a hot spare pool, 161
hot spare pool, 44
  administering, 150
  associating, 157
  basic operation, 44
  changing association, 158
  conceptual overview, 147, 149
  creating, 155
  definition, 39, 44
  example with mirror, 149
  states, 160

I
I/O, 31
interfaces, See Solaris Volume Manager interfaces
interlace
  specifying, 74

K
/kernel/drv/md.conf file, 226, 278

L
local disk set, 198
lockfs command, 119, 193
log device
  definition, 167
  problems when sharing, 193
  recovering from errors, 194
  shared, 167, 170
  sharing, 192
  space required, 170
logging device
  hard error state, 268

M
majority consensus algorithm, 52

R
RAID 5 volume
  initializing slices, 131
  maintenance vs. last erred, 232
  overview of replacing and enabling slices, 230
  parity information, 131, 134
  replacing a failed slice, 145
  resynchronizing slices, 131
random I/O, 31
raw volume, 75, 96, 138
read policies overview, 86
releasing a disk set, 212, 214
renaming volumes, 221
replica, 43
reserving a disk set, 213
resynchronization
  full, 87
  optimized, 88
  partial, 88
root (/)
  mirroring, 100
  unmirroring, 116

S
SCSI disk
  replacing, 255, 257
sequential I/O, 32
shared disk set, 44
simple volume
  See RAID 0 volume
  definition, 40
slices
  adding to a RAID 5 volume, 142
  expanding, 78
soft partition
  checking status, 127
  creating, 126
  deleting, 129
  expanding, 128
  growing, 128
  recovering configuration for, 268
  removing, 129
soft partitioning
  definition, 122
  guidelines, 122
  locations, 122
Solaris Volume Manager
  See Solaris Volume Manager
  configuration guidelines, 45
  recovering the configuration, 224
Solaris Volume Manager elements
  overview, 38
Solaris Volume Manager interfaces
  command line, 36
  sample GUI, 37
  Solaris Management Console, 36
state database
  conceptual overview, 43, 52
  corrupt, 52
  definition, 39, 43
  recovering from stale replicas, 265
state database replicas, 43
  adding larger replicas, 61
  basic operation, 52
  creating additional, 58
  creating multiple on a single slice, 53
  definition, 43
  errors, 56
  location, 44, 54
  minimum number, 54
  recovering from stale replicas, 265
  two-disk configuration, 55
  usage, 51
status, 212
stripe
  creating, 74
  definition, 64
  example with three slices, 65
  expanding, 79
  information for creating, 70
  information for recreating, 70
  removing, 80
striped volume, See stripe
striping
  definition, 64
submirror, 82
  attaching, 82
  detaching, 82
  enabling a failed slice, 107
  operation while offline, 82
  placing offline and online, 105
  replacing a failed slice, 113
  replacing entire, 114

U
UFS logging
  definition, 165
/usr
  logging, 177
  mirroring, 100
  unmirroring, 116

V
/var/adm/messages file, 230, 255
volume
  checking status, 183
  conceptual overview, 39
  default number, 226
  definition, 39
  expanding disk space, 41