ZFS Tutorial Part 1: Getting Started
In this tutorial I hope to give you a brief overview of ZFS and show you how to manage ZFS pools, the
foundation of ZFS. In subsequent parts we will look at ZFS filesystems in more depth.
Let your hook be always cast; in the pool where you least expect it, there will be a fish. — Ovid
Getting Started
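You need: an operating system with ZFS support, root privileges (the examples are run from a root shell, shown
by the # prompt) and some storage, either four 128 MB files on an existing filesystem or four spare disks.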
Using Files
To use files on an existing filesystem, create four 128 MB files, eg.:
# ls -lh /home/ocean
total 1049152
Using Disks
To use real disks in the tutorial make a note of their names (eg. c2t1d0 or c1d0 under Solaris). You will be
destroying all the partition information and data on these disks, so be sure they're not needed.
In the examples I will be using files named disk1, disk2, disk3, and disk4; substitute your disks or files for
them as appropriate.
ZFS Overview
The architecture of ZFS has three levels. One or more ZFS filesystems exist in a ZFS pool, which consists of
one or more devices* (usually disks). Filesystems within a pool share its resources and are not restricted to
a fixed size. Devices may be added to a pool while it's still running, eg. to increase the size of the pool. New
filesystems can be created within a pool without taking other filesystems offline. ZFS supports filesystem
snapshots and cloning of existing filesystems. ZFS manages all aspects of the storage: volume management
software (such as SVM or Veritas) is not needed.
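*Technically a virtual device (vdev); see the zpool(1M) man page for more.
ZFS is administered with two commands: zpool manages pools and zfs manages the filesystems within them. If you
run either command with no options it gives you a handy options summary.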
Pools
All ZFS filesystems live in a pool, so the first step is to create a pool. ZFS pools are administered using
the zpool command.
Before creating new pools you should check for existing pools to avoid confusing them with your tutorial
pools. You can check what pools exist with zpool list:
# zpool list
no pools available
NB. OpenSolaris now uses ZFS, so you will likely have an existing ZFS pool called syspool on this OS.
# zpool list
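A pool is created with zpool create, giving the pool a name and listing the devices it should use. A minimal
sketch for a single-device pool named herring (using disk1 is an assumption; any one of the files will do, but
file devices must be given as absolute paths):

```shell
# Create the pool herring from one file device; ZFS also creates and mounts
# a filesystem of the same name (at /herring by default).
zpool create herring /home/ocean/disk1
# The new pool should now be listed with a HEALTH of ONLINE.
zpool list
```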
No volume management, configuration, newfs or mounting is required. You now have a working pool
complete with a mounted ZFS filesystem under /herring (/Volumes/herring on Mac OS X, where you can also see it
mounted on your desktop). We will learn about adjusting mount points in part 2 of the tutorial.
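To see the pool at work we can write a 32 MB file into it, about a quarter of its capacity. A sketch using dd;
the name foo matches the listing that follows, and random data is used so that compression, if it happens to be
enabled, cannot shrink the file:

```shell
# Write 32 MB of random (incompressible) data to /herring/foo.
# On Mac OS X the filesystem is mounted at /Volumes/herring instead.
dd if=/dev/urandom of=/herring/foo bs=1048576 count=32
```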
# ls -lh /herring/foo
# zpool list
The new file is using about a quarter of the pool capacity (indicated by the CAP value). NB. If you run the
list command before ZFS has finished writing to the disk, you will see lower USED and CAP values than
expected; wait a few moments and try again.
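When you have finished with a pool, remove it with zpool destroy. Destroying a pool also destroys the
filesystems and data in it, so double-check the name. A sketch:

```shell
# Destroy the herring pool together with its filesystem and the file foo.
zpool destroy herring
# With no other pools present this reports: no pools available
zpool list
```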
# zpool list
no pools available
On Mac OS X you need to force an unmount of the filesystem (using umount -f /Volumes/herring) before
destroying the pool, as it will be in use by fseventsd.
You will only receive a warning about destroying your pool if it's in use. We'll see in a later tutorial how you
can recover a pool you've accidentally destroyed.
Mirrored Pool
A pool composed of a single disk doesn't offer any redundancy. One method of providing redundancy is to
use a mirrored pair of disks as a pool:
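A mirrored pool is created by placing the keyword mirror before the devices that should hold identical copies
of the data. A sketch using the first two files:

```shell
# Create the pool trout with one top-level vdev: a two-way mirror.
zpool create trout mirror /home/ocean/disk1 /home/ocean/disk2
zpool list
# Show the vdev layout and the health of each device.
zpool status trout
```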
# zpool list
  pool: trout
 state: ONLINE
config:

        NAME                   STATE     READ WRITE CKSUM
        trout                  ONLINE       0     0     0
          mirror               ONLINE       0     0     0
            /home/ocean/disk1  ONLINE       0     0     0
            /home/ocean/disk2  ONLINE       0     0     0
We can see our pool contains one mirror of two disks. Let's create a file and see how USED changes:
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
As before, about a quarter of the pool has been used, but the data is now stored redundantly on two disks.
Let's test it by overwriting the first disk label with random data (if you are using real disks you could
physically disable or remove a disk instead):
ZFS automatically checks for errors when it reads or writes files, but we can force a check of every block in
the pool with the zpool scrub command.
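Overwriting the label and scrubbing the pool can be sketched as follows. The sketch overwrites the first 1 MB of
disk1, which covers the two copies of the ZFS label kept at the front of a device. ZFS keeps two more copies at
the end of the device, so depending on your ZFS version the status you see may differ from the output shown
here (for example, checksum errors being reported and repaired instead of the device becoming UNAVAIL).

```shell
# Overwrite the first 1 MB of disk1 with random data; conv=notrunc leaves the
# rest of the file, and its size, unchanged.
dd if=/dev/urandom of=/home/ocean/disk1 bs=1048576 count=1 conv=notrunc
# Read and verify every block in the pool, repairing from the mirror if needed.
zpool scrub trout
# Check the result (a scrub of a pool this small finishes within seconds).
zpool status trout
```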
# zpool status
  pool: trout
 state: DEGRADED
status: One or more devices could not be used because the label is
        missing or invalid.
   see: http://www.sun.com/msg/ZFS-8000-4J
config:

        NAME                   STATE     READ WRITE CKSUM
        trout                  DEGRADED     0     0     0
          mirror               DEGRADED     0     0     0
            /home/ocean/disk1  UNAVAIL      0     0     0  corrupted data
            /home/ocean/disk2  ONLINE       0     0     0
The disk we used dd on is showing as UNAVAIL with corrupted data, but no data errors are reported for the
pool as a whole, and we can still read and write to the pool:
# ls -l /trout/
total 131112
To maintain redundancy we should replace the broken disk with another. If you are using a physical disk
you can use the zpool replace command (the zpool man page has details). However, in this file-based
example I remove the disk file from the mirror and recreate it.
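Detaching the damaged file and replacing it with a fresh one can be sketched as follows. zpool detach removes a
device from a mirror, and the pool keeps running on the remaining device:

```shell
# Remove the damaged file from the mirror; trout carries on with disk2 alone.
zpool detach trout /home/ocean/disk1
# Replace the damaged backing file with a new 128 MB file of zeros.
rm /home/ocean/disk1
dd if=/dev/zero of=/home/ocean/disk1 bs=1048576 count=128
zpool status trout
```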
  pool: trout
 state: ONLINE
config:

        NAME                 STATE     READ WRITE CKSUM
        trout                ONLINE       0     0     0
          /home/ocean/disk2  ONLINE       0     0     0
To attach another device we use zpool attach, naming an existing device in the pool for the new device to
mirror:
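zpool attach takes the pool name, the existing device and the new device; the new device becomes a mirror of
the existing one. A sketch for our files:

```shell
# Mirror disk2 onto the freshly created disk1; ZFS starts resilvering at once.
zpool attach trout /home/ocean/disk2 /home/ocean/disk1
zpool status trout
```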
If you're quick enough, after you attach the new disk you will see a resilver (remirroring) in progress with
zpool status.
  pool: trout
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
config:

        NAME                   STATE     READ WRITE CKSUM
        trout                  ONLINE       0     0     0
          mirror               ONLINE       0     0     0
            /home/ocean/disk2  ONLINE       0     0     0
            /home/ocean/disk1  ONLINE       0     0     0
Once the resilver is complete, the pool is healthy again (you can also use ls to check the files are still
there):
 state: ONLINE
config:

        NAME                   STATE     READ WRITE CKSUM
        trout                  ONLINE       0     0     0
          mirror               ONLINE       0     0     0
            /home/ocean/disk2  ONLINE       0     0     0
            /home/ocean/disk1  ONLINE       0     0     0
# zpool list
# zpool list
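A second mirrored pair is added to a running pool with zpool add, which creates a new top-level vdev. A sketch
using the two remaining files:

```shell
# Add a second two-way mirror to trout; the pool's capacity roughly doubles.
# Check the device names first: on many ZFS versions a top-level vdev
# cannot be removed from a pool once it has been added.
zpool add trout mirror /home/ocean/disk3 /home/ocean/disk4
zpool list trout
```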
This happens almost instantly, and the filesystem within the pool remains available. Looking at the status
now shows the pool consists of two mirrors:
# zpool status trout
  pool: trout
 state: ONLINE
config:

        NAME                   STATE     READ WRITE CKSUM
        trout                  ONLINE       0     0     0
          mirror               ONLINE       0     0     0
            /home/ocean/disk2  ONLINE       0     0     0
            /home/ocean/disk1  ONLINE       0     0     0
          mirror               ONLINE       0     0     0
            /home/ocean/disk3  ONLINE       0     0     0
            /home/ocean/disk4  ONLINE       0     0     0
We can see where the data is currently written in our pool using zpool iostat -v:
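A sketch of the command; -v breaks the statistics down by vdev and by device:

```shell
# Per-vdev capacity, operations and bandwidth for the trout pool.
zpool iostat -v trout
```

Adding an interval in seconds (eg. zpool iostat -v trout 5) repeats the report until you interrupt it.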
                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
  mirror                 0   123M      0      0      0      0
    /home/ocean/disk3      -      -      0      0      0    768
    /home/ocean/disk4      -      -      0      0      0    768
All the data is currently written on the first mirror pair, and none on the second. This makes sense as the
second pair of disks was added after the data was written. If we write some new data to the pool the new
mirror will be used:
                        capacity     operations    bandwidth
pool                  used  avail   read  write   read  write
    /home/ocean/disk1      -      -      0      0      0  28.2K
    /home/ocean/disk3      -      -      0      0      0  11.1K
    /home/ocean/disk4      -      -      0      0      0  11.1K
Note how a little more of the data has been written to the new mirror than the old: ZFS tries to make best
use of all the resources in the pool.
That's it for part 1. In part 2 we will look at managing ZFS filesystems themselves and creating multiple
filesystems within a pool. We'll create a new pool for part 2, so feel free to destroy the trout pool.
ZFS Tutorial Part 2
Learning to use ZFS, Sun's new filesystem.
ZFS is an open source filesystem used in Solaris 10, with growing support from other operating systems.
This series of tutorials shows you how to use ZFS with simple hands-on examples that require a minimum of
resources.
In ZFS Tutorial part 1 we looked at ZFS pool management. In this tutorial we look at ZFS filesystem
management, including creating new filesystems, destroying them and adjusting their properties.
As the poet said, 'Only God can make a tree' - probably because it's so hard to figure out how to get the
bark on. — Woody Allen
Getting Started
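You need: an operating system with ZFS support, root privileges and some storage, either two 128 MB files on an
existing filesystem or two spare disks.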
Using Files
To use files on an existing filesystem, create two 128 MB files, eg.:
# ls -lh /home/ocean
total 1049152
-rw------T 1 root root 128M Mar 7 19:48 disk1
Using Disks
To use real disks in the tutorial make a note of their names (eg. c2t1d0 or c1d0 under Solaris). You will be
destroying all the partition information and data on these disks, so be sure they're not needed.
In the examples I will be using a pair of 146 GB disks named c3t2d0 and c3t3d0; substitute your disks or
files for them as appropriate.
ZFS Filesystems
ZFS filesystems within a pool are managed with the zfs command. Before you can manipulate filesystems
you need to create a pool (you can learn about ZFS pools in part 1). When you create a pool, a ZFS
filesystem is created and mounted for you.
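A sketch of creating the pool for this part: a mirror named salmon built from the two example disks (substitute
your own device names, or absolute paths to the disk files):

```shell
# Create the mirrored pool salmon; its top-level filesystem is mounted at /salmon.
zpool create salmon mirror c3t2d0 c3t3d0
zpool list
# The pool's top-level filesystem is already listed.
zfs list
```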
# zpool list
# zfs list
We can create an arbitrary number (2^64) of new filesystems within our pool. Let's add filesystems for
three users with zfs create:
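A sketch, using the user names that appear later in this tutorial (kent, dennisr and billj):

```shell
# One filesystem per user, each mounted automatically under /salmon.
zfs create salmon/kent
zfs create salmon/dennisr
zfs create salmon/billj
zfs list
```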
Note how all four filesystems share the same pool space and all report 134 GB available. We'll see how to
set quotas and reserve space for filesystems later in this tutorial.
We can create arbitrary levels of filesystems, so you could create a whole tree of filesystems inside
/salmon/kent.
We can also see our filesystems using df (output trimmed for brevity):
# df -h
You can remove filesystems with zfs destroy. User billj has stopped working on salmon, so let's remove
him:
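A sketch; note that zfs destroy does not ask for confirmation, and the filesystem's data is gone once the
command returns:

```shell
# Destroy billj's filesystem and everything stored in it.
zfs destroy salmon/billj
zfs list
```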
# zfs list
You can set the mount point of a ZFS filesystem using zfs set mountpoint. For example, if we want to
move salmon under the /projects directory:
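A sketch; child filesystems that inherit their mount point move too, so salmon/dennisr ends up at
/projects/salmon/dennisr:

```shell
# Mount salmon at /projects/salmon; ZFS unmounts it and remounts it there.
zfs set mountpoint=/projects/salmon salmon
zfs list
```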
# zfs list
On Mac OS X you need to force an unmount of the filesystem (using umount -f /Volumes/salmon) before
changing the mount point, as it will be in use by fseventsd. To mount it again after setting a new mount
point, use 'zfs mount salmon'.
Mount points of filesystems are not limited to those of the pool as a whole, for example:
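For example, a sketch giving kent's filesystem its own top-level mount point:

```shell
# salmon/kent no longer inherits its mount point from salmon.
zfs set mountpoint=/fishing salmon/kent
zfs list
```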
# zfs list
To mount and unmount ZFS filesystems you use zfs mount and zfs unmount*. ZFS filesystems are
entirely managed by ZFS by default, and don't appear in /etc/vfstab. In a future tutorial we will look at
using 'legacy' mount points to manage filesystems the traditional way.
*Old school Unix users will be pleased to know 'zfs umount' also works.
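For example, a sketch that unmounts kent's filesystem and mounts it again, checking the mount table after each
step (the exact format of mount's output varies between operating systems):

```shell
# Unmount kent's filesystem; it disappears from the mount table.
zfs unmount salmon/kent
mount | grep salmon
# Mount it again at its configured mount point.
zfs mount salmon/kent
mount | grep salmon
```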
/projects/salmon on salmon
/projects/salmon/dennisr on salmon/dennisr
/projects/salmon on salmon
/projects/salmon/dennisr on salmon/dennisr
/fishing on salmon/kent
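All of a filesystem's properties, with their values and sources, are listed with zfs get all. A sketch for
kent's filesystem:

```shell
# Show every property of salmon/kent with its VALUE and SOURCE.
zfs get all salmon/kent
# Or just a single property.
zfs get mountpoint salmon/kent
```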
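The first set of properties, with a SOURCE of '-', are read-only and give information about your filesystem;
the rest of the properties can be set with 'zfs set'. The SOURCE value shows where a property gets its value
from. Other than '-', there are three sources for a property: default (the property has its default value),
local (the property was set on this filesystem with 'zfs set') and inherited (the value comes from a parent
filesystem and is shown as 'inherited from' followed by the parent's name).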
The mountpoint property is shown as coming from a local source because we set the mountpoint for this
filesystem above. We'll see an example of an inherited property in the section on compression (below).
I'm going to look at three properties in this section: quota, reservation and compression (sharenfs will be
covered in a future tutorial). You can read about the remaining properties in the Sun ZFS Administration
Guide.
For example, let's say we want to set a quota of 10 GB on dennisr and kent to ensure there's space for
other users to be added to salmon (if you are using disk files or small disks, just substitute a suitable value,
eg. quota=10M):
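A sketch of the quota commands, together with the 1 GB reservation for salmon/jeffb that the zfs list output
further on reflects. Creating salmon/jeffb is not shown in this tutorial, so the sketch assumes it may still
need to be created:

```shell
# Limit dennisr and kent to 10 GB each (with disk files use eg. 10M).
zfs set quota=10G salmon/dennisr
zfs set quota=10G salmon/kent
# Assumption: create jeffb's filesystem only if it does not exist yet.
zfs list salmon/jeffb >/dev/null 2>&1 || zfs create salmon/jeffb
# Set aside 1 GB for jeffb (with disk files use eg. 1M); the space counts as
# used in salmon whether or not jeffb writes anything.
zfs set reservation=1G salmon/jeffb
zfs get quota salmon salmon/dennisr salmon/kent
```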
You can see how we used zfs get to retrieve a particular property for a set of filesystems. There are some
useful options we can use with get:
salmon quota 0
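A sketch of some of them: -r recurses into child filesystems, -H prints script-friendly output with no header
and tab-separated fields, -o chooses the columns to print, -p prints exact numeric values and -s limits the
output to properties from the given sources:

```shell
# Quota of salmon and of every filesystem beneath it.
zfs get -r quota salmon
# Just name and value, no header: handy in scripts.
zfs get -H -o name,value quota salmon/kent
# Exact values in bytes rather than rounded sizes such as 10G.
zfs get -p quota salmon/kent
# Only the properties that were set locally on salmon.
zfs get -s local all salmon
```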
If we look at our list of filesystems with zfs list we can see the effect of the quotas and reservation:
# zfs list
As expected, the space available to salmon/dennisr and salmon/kent is now limited to 10 GB, but there
appears to be no change to salmon/jeffb. However, if we look at the used space for salmon as a whole, we
can see this has risen to 1 GB. This space isn't actually used, but because it has been reserved for
salmon/jeffb it isn't available to the rest of the pool. Reservations could therefore lead you to over-estimate
the space used in your pool. The df command always displays the actual usage, so it can be handy in such
situations.
Compression
ZFS has built-in support for compression. Not only does this save disk space, but it can actually improve
performance on systems with plenty of CPU and highly compressible data, as it saves disk I/O. An obvious
candidate for compression is a logs directory.
This section hasn't been completed, but you can find details of ZFS compression in the zfs man page.
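As a starting point, here is a sketch using a hypothetical salmon/logs filesystem (not part of the tutorial so
far). It also shows an inherited property: a hypothetical child filesystem created under salmon/logs picks up
the compression setting from its parent.

```shell
# Hypothetical filesystem for log files, with compression turned on.
zfs create salmon/logs
zfs set compression=on salmon/logs
# Hypothetical child filesystem; it inherits compression from salmon/logs.
zfs create salmon/logs/web
# SOURCE shows 'local' for salmon/logs and 'inherited from salmon/logs' for the child.
zfs get -r compression salmon/logs
# compressratio reports how well the data written so far has compressed;
# only data written after compression is enabled gets compressed.
zfs get compressratio salmon/logs
```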