IBM SONAS
Mary Lovelace
Vincent Boucher
Shradha Nayak
Curtis Neal
Lukasz Razmuk
John Sing
John Tarella
ibm.com/redbooks
International Technical Support Organization
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
December 2010
SG24-7875-00
Note: Before using this information and the product it supports, read the information in Notices on page xiii.
First Edition (December 2010)
This edition applies to IBM Scale Out Network Attached Storage V1.1.1.
© Copyright International Business Machines Corporation 2010. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Notices . . . xiii
Trademarks . . . xiv
Preface . . . xv
The team who wrote this book . . . xv
Now you can become a published author, too! . . . xvii
Comments welcome . . . xvii
Stay connected to IBM Redbooks publications . . . xviii
Chapter 1. Introduction to IBM Scale Out Network Attached Storage . . . 1
1.1 Marketplace requirements . . . 2
1.2 Understanding I/O . . . 4
1.2.1 File I/O . . . 4
1.2.2 Block I/O . . . 5
1.2.3 Network Attached Storage (NAS) . . . 6
1.3 Scale Out Network Attached Storage (SONAS) . . . 7
1.3.1 SONAS architecture . . . 8
1.3.2 SONAS scale out capability . . . 10
1.3.3 SONAS Software . . . 11
1.3.4 High availability design . . . 12
1.4 SONAS architectural concepts and principles . . . 13
1.4.1 Create, write, and read files . . . 14
1.4.2 Creating and writing a file . . . 14
1.4.3 Scale out more performance . . . 19
1.4.4 Reading a file . . . 20
1.4.5 Scale out parallelism and high concurrency . . . 22
1.4.6 Manage storage centrally and automatically . . . 24
1.4.7 SONAS logical storage pools for tiered storage . . . 24
1.4.8 SONAS Software central policy engine . . . 26
1.4.9 High performance SONAS scan engine . . . 28
1.4.10 High performance physical data movement for ILM / HSM . . . 32
1.4.11 HSM backup/restore to external storage . . . 33
1.4.12 Requirements for high performance external HSM and backup restore . . . 34
1.4.13 SONAS high performance HSM using Tivoli Storage Manager . . . 34
1.4.14 SONAS high performance backup/restore using Tivoli Storage Manager . . . 35
1.4.15 SONAS and Tivoli Storage Manager integration in more detail . . . 36
1.4.16 Summary: Lifecycle of a file using SONAS Software . . . 39
1.4.17 Chapter summary . . . 40
Chapter 2. Hardware architecture . . . 41
2.1 Nodes . . . 42
2.1.1 Interface nodes . . . 43
2.1.2 Storage nodes . . . 45
2.1.3 Management nodes . . . 45
2.2 Switches . . . 47
2.2.1 Internal InfiniBand switch . . . 47
2.2.2 Internal private Ethernet switch . . . 48
2.2.3 External Ethernet switches . . . 49
2.2.4 External ports: 1 GbE / 10 GbE . . . 49
2.3 Storage pods . . . 50
2.3.1 SONAS storage controller . . . 51
2.3.2 SONAS storage expansion unit . . . 53
2.4 Connection between components . . . 53
2.4.1 Interface node connections . . . 53
2.4.2 Storage node connections . . . 56
2.4.3 Management node connections . . . 57
2.4.4 Internal POD connectivity . . . 58
2.4.5 Data InfiniBand network . . . 59
2.4.6 Management Ethernet network . . . 60
2.4.7 Connection to the external customer network . . . 60
2.5 SONAS configurations available . . . 61
2.5.1 Rack types: How to choose the correct rack for your solution . . . 61
2.5.2 Drive types: How to choose between various drive options . . . 66
2.5.3 External ports: 1 GbE / 10 GbE . . . 67
2.6 SONAS with XIV storage overview . . . 68
2.6.1 Differences between SONAS with XIV and standard SONAS system . . . 68
2.6.2 SONAS with XIV configuration overview . . . 69
2.6.3 SONAS base rack configuration when used with XIV storage . . . 70
2.6.4 SONAS with XIV configuration and component considerations . . . 70
Chapter 3. Software architecture . . . 73
3.1 SONAS Software . . . 74
3.2 SONAS data access layer: File access protocols . . . 75
3.2.1 File export protocols: CIFS . . . 76
3.2.2 File export protocols: NFS . . . 77
3.2.3 File export protocols: FTP . . . 77
3.2.4 File export protocols: HTTPS . . . 78
3.2.5 SONAS locks and oplocks . . . 78
3.3 SONAS Cluster Manager . . . 79
3.3.1 Introduction to the SONAS Cluster Manager . . . 80
3.3.2 Principles of SONAS workload allocation to interface nodes . . . 81
3.3.3 Principles of interface node failover and failback . . . 83
3.3.4 Principles of storage node failover and failback . . . 84
3.3.5 Summary . . . 85
3.3.6 SONAS Cluster Manager manages multi-platform concurrent file access . . . 86
3.3.7 Distributed metadata manager for concurrent access and locking . . . 88
3.3.8 SONAS Cluster Manager components . . . 88
3.4 SONAS authentication and authorization . . . 91
3.4.1 SONAS authentication concepts and flow . . . 92
3.4.2 SONAS authentication methods . . . 93
3.5 Data repository layer: SONAS file system . . . 96
3.5.1 SONAS file system scalability and maximum sizes . . . 97
3.5.2 Introduction to SONAS file system parallel clustered architecture . . . 98
3.5.3 SONAS File system performance and scalability . . . 98
3.6 SONAS data management services . . . 107
3.6.1 SONAS: Using the central policy engine and automatic tiered storage . . . 107
3.6.2 Using and configuring Tivoli Storage Manager HSM with SONAS basics . . . 111
3.7 SONAS resiliency using Snapshots . . . 114
3.7.1 SONAS Snapshots . . . 114
3.7.2 Integration with Windows . . . 116
3.8 SONAS resiliency using asynchronous replication . . . 116
3.9 SONAS and Tivoli Storage Manager integration . . . 119
3.9.1 General Tivoli Storage Manager and SONAS guidelines . . . 121
3.9.2 Basic SONAS to Tivoli Storage Manager setup procedure . . . 124
3.9.3 Tivoli Storage Manager software licensing . . . 125
3.9.4 How to protect SONAS files without Tivoli Storage Manager . . . 126
3.10 SONAS system management services . . . 126
3.10.1 Management GUI . . . 127
3.10.2 Health Center . . . 131
3.10.3 Command Line Interface . . . 131
3.10.4 External notifications . . . 132
3.11 Grouping concepts in SONAS . . . 133
3.11.1 Node grouping . . . 134
3.11.2 Node grouping and async replication . . . 137
3.12 Summary: SONAS Software . . . 137
3.12.1 SONAS features . . . 137
3.12.2 SONAS goals . . . 139
Chapter 4. Networking considerations . . . 141
4.1 Review of network attached storage concepts . . . 142
4.1.1 File systems . . . 142
4.1.2 Redirecting I/O over the network to a NAS device . . . 142
4.1.3 Network file system protocols . . . 144
4.1.4 Domain Name Server . . . 145
4.1.5 Authentication . . . 145
4.2 Domain Name Server as used by SONAS . . . 145
4.2.1 Domain Name Server configuration best practices . . . 146
4.2.2 Domain Name Server balances incoming workload . . . 147
4.2.3 Interface node failover / failback . . . 148
4.3 Bonding . . . 150
4.3.1 Bonding modes . . . 150
4.3.2 Monitoring bonded ports . . . 151
4.4 Network groups . . . 151
4.5 Implementation networking considerations . . . 153
4.5.1 Network interface names . . . 153
4.5.2 Virtual Local Area Networks . . . 153
4.5.3 IP address ranges for internal connectivity . . . 153
4.5.4 Use of Network Address Translation . . . 154
4.5.5 Management node as NTP server . . . 154
4.5.6 Maximum transmission unit . . . 155
4.5.7 Considerations and restrictions . . . 155
4.6 The impact of network latency on throughput . . . 155
Chapter 5. SONAS policies . . . 159
5.1 Creating and managing policies . . . 160
5.1.1 File policy types . . . 160
5.1.2 Rule overview . . . 160
5.1.3 Rule types . . . 160
5.1.4 SCAN engine . . . 161
5.1.5 Threshold implementation . . . 162
5.2 SONAS CLI policy commands . . . 162
5.2.1 mkpolicy command . . . 163
5.2.2 Changing policies using chpolicy command . . . 164
5.2.3 Listing policies using the lspolicy command . . . 164
5.2.4 Applying policies using the setpolicy command . . . 165
5.2.5 Running policies using runpolicy command . . . 165
5.2.6 Creating policies using mkpolicytask command . . . 165
5.3 SONAS policy best practices . . . 165
5.3.1 Cron job considerations . . . 165
5.3.2 Policy rules . . . 167
5.3.3 Peered policies . . . 168
5.3.4 Tiered policies . . . 168
5.3.5 HSM policies . . . 169
5.3.6 Policy triggers . . . 169
5.3.7 Weight expressions . . . 170
5.3.8 Migration filters . . . 170
5.3.9 General considerations . . . 170
5.4 Policy creation and execution walkthrough . . . 171
5.4.1 Creating a storage pool using the GUI . . . 171
5.4.2 Creating a storage pool using the CLI . . . 173
5.4.3 Creating and applying policies using the GUI . . . 175
5.4.4 Creating and applying policies using the CLI . . . 176
5.4.5 Testing policy execution . . . 178
Chapter 6. Backup and recovery, availability, and resiliency functions . . . 181
6.1 High availability and data protection in base SONAS . . . 182
6.1.1 Cluster Trivial Database . . . 182
6.1.2 DNS performs IP address resolution and load balancing . . . 184
6.1.3 File sharing protocol error recovery . . . 185
6.2 Backup and restore of file data . . . 185
6.2.1 Tivoli Storage Manager terminology and operational overview . . . 185
6.2.2 Methods to back up a SONAS cluster . . . 186
6.2.3 Tivoli Storage Manager client and server considerations . . . 186
6.2.4 Configuring interface nodes for Tivoli Storage Manager . . . 187
6.2.5 Performing Tivoli Storage Manager backup and restore operations . . . 189
6.2.6 Using Tivoli Storage Manager HSM client . . . 190
6.3 Snapshots . . . 193
6.3.1 Snapshot considerations . . . 193
6.3.2 VSS snapshot integration . . . 194
6.3.3 Snapshot creation and management . . . 194
6.4 Local and remote replication . . . 198
6.4.1 Synchronous versus asynchronous replication . . . 199
6.4.2 Block level versus file level replication . . . 199
6.4.3 SONAS cluster replication . . . 199
6.4.4 Local synchronous replication . . . 200
6.4.5 Remote async replication . . . 205
6.5 Disaster recovery methods . . . 217
6.5.1 Backup of SONAS configuration information . . . 217
6.5.2 Restore data from a traditional backup . . . 219
6.5.3 Restore data from a remote replica . . . 219
Chapter 7. Configuration and sizing . . . 221
7.1 Tradeoffs between configurations . . . 222
7.1.1 Rack configurations . . . 222
7.1.2 InfiniBand switch configurations . . . 223
7.1.3 Storage Pod configuration . . . 223
7.1.4 Interface node configuration . . . 224
7.1.5 Rack configurations . . . 226
7.2 Considerations for sizing your configuration . . . 231
7.3 Inputs for SONAS sizing . . . 233
7.3.1 Application characteristics . . . 234
7.3.2 Workload characteristics definition . . . 234
7.3.3 Workload characteristics impact . . . 235
7.3.4 Workload characteristics measurement . . . 240
7.4 Powers of two and powers of ten: The missing space . . . 241
7.5 Sizing the SONAS appliance . . . 241
7.5.1 SONAS disk drives and capacities . . . 242
7.5.2 SONAS disk drive availabilities over time . . . 242
7.5.3 Storage subsystem disk type . . . 243
7.5.4 Interface node connectivity and memory configuration . . . 243
7.5.5 Base rack sizing . . . 244
7.6 Tools . . . 245
7.6.1 Workload analyzer tools . . . 245
Chapter 8. Installation planning . . . 249
8.1 Physical planning considerations . . . 250
8.1.1 Space and floor requirements . . . 250
8.1.2 Power consumption . . . 252
8.1.3 Noise . . . 253
8.1.4 Heat and cooling . . . 253
8.2 Installation checklist questions . . . 254
8.3 Storage considerations . . . 260
8.3.1 Storage . . . 260
8.3.2 Async replication considerations . . . 261
8.3.3 Block size . . . 262
8.3.4 File system overhead and characteristics . . . 263
8.3.5 SONAS master file system . . . 264
8.3.6 Failure groups . . . 264
8.3.7 Setting up storage pools . . . 265
8.4 SONAS integration into your network . . . 265
8.4.1 Authentication using AD or LDAP . . . 266
8.4.2 Planning IP addresses . . . 267
8.4.3 Data access and IP address balancing . . . 268
8.5 Attachment to customer applications . . . 278
8.5.1 Redundancy . . . 278
8.5.2 Share access . . . 278
8.5.3 Caveats . . . 278
8.5.4 Backup considerations . . . 278
Chapter 9. Installation and configuration . . . 279
9.1 Pre-Installation . . . 280
9.2 Installation . . . 280
9.2.1 Hardware installation . . . 280
9.2.2 Software installation . . . 281
9.2.3 Checking health of the node hardware . . . 281
9.2.4 Additional hardware health checks . . . 281
9.3 Post installation . . . 282
9.4 Software configuration . . . 282
9.5 Sample environment . . . 282
9.5.1 Initial hardware installation . . . 283
9.5.2 Initial software configuration . . . 291
9.5.3 Understanding the IP addresses for internal networking . . . 292
9.5.4 Configuring the Cluster Manager . . . 293
9.5.5 Listing all available disks . . . 294
9.5.6 Adding a second failure group . . . 294
9.5.7 Creating the GPFS file system . . . 295
9.5.8 Configuring the DNS Server IP addresses and domains . . . 297
9.5.9 Configuring the NAT Gateway . . . 299
9.5.10 Configuring authentication: AD and LDAP . . . 301
9.5.11 Configuring Data Path IP Addresses . . . 302
9.5.12 Configuring Data Path IP address group . . . 303
9.5.13 Attaching the Data Path IP Address Group . . . 304
9.6 Creating Exports for data access . . . 304
9.7 Modifying ACLs to the shared export . . . 305
9.8 Testing access to the SONAS . . . 307
Chapter 10. SONAS administration . . . 313
10.1 Using the management interface . . . 314
10.1.1 GUI tasks . . . 314
10.1.2 Accessing the CLI . . . 345
10.2 SONAS administrator tasks list . . . 349
10.2.1 Tasks that can be performed only by the SONAS GUI . . . 349
10.2.2 Tasks that can be performed only by the SONAS CLI . . . 350
10.2.3 Tasks that can be performed by the SONAS GUI and SONAS CLI . . . 350
10.3 Cluster management . . . 351
10.3.1 Adding or deleting a cluster to the GUI . . . 351
10.3.2 Viewing cluster status . . . 351
10.3.3 Viewing interface node and storage node status . . . 352
10.3.4 Modifying the status of interface nodes and storage nodes . . . 352
10.4 File system management . . . 354
10.4.1 Creating a file system . . . 354
10.4.2 Listing the file system status . . . 361
10.4.3 Mounting the file system . . . 361
10.4.4 Unmounting the file system . . . 363
10.4.5 Modifying the file system configuration . . . 364
10.4.6 Deleting a file system . . . 369
10.4.7 Master and non-master file systems . . . 370
10.4.8 Quota management for file systems . . . 370
10.4.9 File set management . . . 372
10.5 Creating and managing exports . . . 378
10.5.1 Creating exports . . . 378
10.5.2 Listing and viewing status of exports created . . . 383
10.5.3 Modifying exports . . . 383
10.5.4 Removing service/protocols . . . 386
10.5.5 Activating exports . . . 387
10.5.6 Deactivating exports . . . 388
10.5.7 Removing exports . . . 389
10.5.8 Testing accessing the exports . . . 390
10.6 Disk management . . . 394
10.6.1 List Disks and View Status . . . 394
10.6.2 Changing properties of disks . . . 397
10.6.3 Starting disks . . . 399
10.6.4 Removing disks . . . 399
10.7 User management . . . 399
10.7.1 SONAS administrator . . . 399
10.7.2 SONAS end users . . . 403
10.8 Services Management . . . 405
10.8.1 Management Service administration . . . 405
10.8.2 Managing services on the cluster . . . 406
10.9 Real-time and historical reporting . . . 411
10.9.1 System utilization . . . 411
10.9.2 File System utilization . . . 413
10.9.3 Utilization Thresholds and Notification . . . 415
10.10 Scheduling tasks in SONAS . . . 416
10.10.1 Listing tasks . . . 416
10.10.2 Removing tasks . . . 418
10.10.3 Modifying the schedule tasks . . . 419
10.11 Health Center . . . 420
10.11.1 Topology . . . 420
10.11.2 Default Grid view . . . 424
10.11.3 Event logs . . . 432
10.12 Call home . . . 433
Chapter 11. Migration overview . . . 435
11.1 SONAS file system authentication . . . 436
11.1.1 SONAS file system ACLs . . . 436
11.1.2 File sharing protocols in SONAS . . . 437
11.1.3 Windows CIFS and SONAS considerations . . . 439
11.2 Migrating files and directories . . . 440
11.2.1 Data migration considerations . . . 440
11.2.2 Metadata migration considerations . . . 441
11.2.3 Migration tools . . . 442
11.3 Migration of CIFS shares and NFS exports . . . 444
11.4 Migration considerations . . . 444
11.4.1 Migration data collection . . . 445
11.4.2 Types of migration approaches . . . 445
11.4.3 Sample throughput estimates . . . 447
11.4.4 Migration throughput example . . . 447
Chapter 12. Getting started with SONAS . . . 449
12.1 Quick start . . . 450
12.2 Connecting to the SONAS appliance . . . 450
12.2.1 Connecting to the SONAS appliance using the GUI . . . 450
12.2.2 Connecting to the SONAS appliance using the CLI . . . 451
12.3 Creating SONAS administrators . . . 452
12.3.1 Creating a SONAS administrator using the CLI . . . 452
12.3.2 Creating a SONAS administrator using the GUI . . . 452
12.4 Monitoring your SONAS environment . . . 453
12.4.1 Topology view . . . 454
12.4.2 SONAS logs . . . 457
12.4.3 Performance and reports . . . 458
12.4.4 Threshold monitoring and notification . . . 459
12.5 Creating a filesystem . . . 462
12.5.1 Creating a filesystem using the GUI . . . 462
12.5.2 Creating a filesystem using the CLI . . . 465
12.6 Creating an export . . . 466
12.6.1 Configuring exports using the GUI . . . 466
12.6.2 Configuring exports using the CLI . . . 469
12.7 Accessing an export . . . 470
12.7.1 Accessing a CIFS share from Windows . . . 470
12.7.2 Accessing a CIFS share from a Windows command prompt . . . 471
12.7.3 Accessing a NFS share from Linux . . . 471
12.8 Creating and using snapshots . . . 472
12.8.1 Creating snapshots with the GUI . . . 472
12.8.2 Creating snapshots with the CLI . . . 473
12.8.3 Accessing and using snapshots . . . 473
12.9 Backing up and restoring data with Tivoli Storage Manager . . . 475
Chapter 13. Hints, tips, and how to information . . . 479
13.1 What to do when you receive an EFSSG0026I error message . . . 480
13.1.1 EFSSG0026I error: Management service stopped . . . 480
13.1.2 Commands to use when management service not running . . . 480
13.2 Debugging SONAS with logs . . . 480
13.2.1 CTDB health check . . . 480
13.2.2 GPFS logs . . . 481
13.2.3 CTDB logs . . . 481
13.2.4 Samba and Winbind logs . . . 481
13.3 When CTDB goes unhealthy . . . 482
13.3.1 How CTDB manages services . . . 482
13.3.2 Master file system unmounted . . . 482
13.3.3 CTDB manages GPFS . . . 482
13.3.4 GPFS unable to mount . . . 483
Appendix A. Additional component details . . . 485
CTDB . . . 486
Introduction to Samba . . . 486
Cluster implementation requirements . . . 486
Clustered Trivial Database . . . 486
CTDB architecture . . . 487
How CTDB works to synchronize access to data . . . 490
Providing high availability for node failure . . . 492
CTDB features . . . 493
CTDB Node recovery mechanism . . . 494
IP failover mechanism . . . 495
How CTDB manages the cluster . . . 495
CTDB tunables . . . 496
CTDB databases . . . 496
File system concepts and access permissions . . . 497
Permissions and access control lists . . . 498
Traditional UNIX permissions . . . 498
Access control lists . . . 498
Permissions and ACLs in Windows operating systems . . . 498
GPFS overview . . . 499
GPFS architecture . . . 502
GPFS file management . . . 503
GPFS performance . . . 504
GPFS High Availability solution . . . 507
GPFS failure group . . . 508
Other GPFS features . . . 508
Tivoli Storage Manager overview . . . 509
Tivoli Storage Manager concepts . . . 509
Tivoli Storage Manager architectural overview . . . 510
Tivoli Storage Manager storage management . . . 516
Policy management . . . 519
Hierarchical Storage Management . . . 522
Related publications . . . 527
IBM Redbooks publications . . . 527
Other publications . . . 527
Online resources . . . 527
How to get Redbooks publications . . . 528
Help from IBM . . . 528
Index . . . 529
Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AFS, AIX, BladeCenter, DB2, Domino, Enterprise Storage Server, eServer, FlashCopy, GPFS, HACMP, IBM, Lotus, PowerVM, pSeries, Redbooks, Redbooks (logo), System i, System p5, System Storage, System x, Tivoli, XIV, xSeries, z/OS
The following terms are trademarks of other companies:
Snapshot, and the Network Appliance logo are trademarks or registered trademarks of Network Appliance, Inc. in the U.S. and other countries.
Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft, Windows NT, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Preface
IBM Scale Out Network Attached Storage (IBM SONAS) is a Scale Out NAS offering designed to manage vast repositories of information in enterprise environments that require very large capacities, high levels of performance, and high availability. IBM SONAS provides a range of reliable, scalable storage solutions for a variety of storage requirements. Data is accessed through network protocols such as NFS, CIFS, HTTP, and FTP. Using built-in RAID technologies, SONAS protects all data, with options to add further protection through mirroring, replication, snapshots, and backup. These storage systems are also characterized by simple management interfaces that make installation, administration, and troubleshooting uncomplicated and straightforward.

In this IBM Redbooks publication, we give you details of the hardware and software architecture that make up the SONAS appliance, along with configuration, sizing, and performance considerations. We provide information about the integration of the SONAS appliance into an existing network. We demonstrate the administration of the SONAS appliance through the GUI and CLI, and show backup and availability scenarios. Using a quick start scenario, we take you through common SONAS administration tasks to familiarize you with the SONAS system.
Curtis Neal is an Executive IT Specialist working for the IBM System Storage Group in San Jose, California. He has over 25 years of experience in various technical capacities, including mainframe and open system test, design, and implementation. For the past eight years, he has led the Open Storage Competency Center, which helps customers and IBM Business Partners with the planning, demonstration, and integration of IBM System Storage Solutions.

Lukasz Razmuk is an IT Architect at IBM Global Technology Services in Warsaw, Poland. He has six years of IBM experience in designing, implementing, and supporting solutions in IBM AIX, Linux, IBM pSeries, virtualization, high availability, General Parallel File System (GPFS), Storage Area Networks (SAN), storage for open systems, and IBM Tivoli Storage Manager. Moreover, he acts as a Technical Account Advocate for Polish clients. He holds a Master of Science degree in Information Technology from the Polish-Japanese Institute of Information Technology in Warsaw as well as many technical certifications, including IBM Certified Advanced Technical Expert on IBM System p5, IBM Certified Technical Expert pSeries, IBM HACMP, Virtualization Technical Support, and Enterprise Technical Support AIX 5.3.

John Sing is an Executive IT Consultant with IBM Systems and Technology Group. John has specialties in large Scale Out NAS, in IT Strategy and Planning, and in IT High Availability and Business Continuity. Since 2001, John has been an integral member of the IBM Systems and Storage worldwide planning and support organizations. He started in the Storage area in 1994 while on assignment to IBM Hong Kong (S.A.R. of China) and IBM China. In 1998, John joined the IBM Enterprise Storage Server Planning team for PPRC, XRC, and IBM FlashCopy. He has been the marketing manager for these products, and in 2002, began working in Business Continuity and IT Strategy and Planning. Since 2009, John has also added focus on IT Competitive Advantage strategy, including Scale Out NAS and Cloud Storage. John is the author of three Redbooks publications on these topics, and in 2007, celebrated his 25th anniversary of joining IBM.

John Tarella is a Senior Consulting IT Specialist who works for IBM Global Services in Italy. He has 25 years of experience in storage and performance management on mainframe and distributed environments. He holds a degree in Seismic Structural Engineering from Politecnico di Milano, Italy. His areas of expertise include IBM Tivoli Storage Manager and storage infrastructure consulting, design, implementation services, open systems storage, and storage performance monitoring and tuning. He is presently focusing on storage solutions for business continuity, information lifecycle management, and infrastructure simplification. He has written extensively on z/OS DFSMS, IBM Tivoli Storage Manager, SANs, storage business continuity solutions, content management, and ILM solutions. John is currently focusing on cloud storage delivery. He also has an interest in Web 2.0 and social networking tools and methodologies.

Thanks to the following people for their contributions to this project:

Mark Doumas, Desiree Strom, Sven Oehme, Mark Taylor, Alexander Saupp, Mathias Dietz, Jason Auvenshine, Greg Kishi, Scott Fadden, Leonard Degallado, Todd Neville, Warren Saltzman, Wen Moy
Tom Beglin, Adam Childers, Frank Sowin, Pratap Banthia, Dean Hanson, Everett Bennally, Ronnie Sahlberg, Christian Ambach, Andreas Luengen, Bernd Baeuml
Comments welcome
Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks publications form, found at: ibm.com/redbooks
Send your comments in an email to: [email protected]
Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. HYTD, Mail Station P099, 2455 South Road, Poughkeepsie, NY 12601-5400
Chapter 1. Introduction to IBM Scale Out Network Attached Storage
Today's businesses demand the ability to create, manage, retrieve, protect, and share business and social digital content or large rich media files over a broadband Internet that reaches to every corner of the globe (Figure 1-2). Users are creating and using data that is redefining our business and social world in real time. Unlike traditional IT data, this rich digital content is almost entirely file-based or object-based, and it is growing ever larger in size, with highly diverse and unpredictable usage patterns.
Innovative applications in business analytics, digital media, medical data, and cloud storage are creating requirements for data access rates and response times to individual files that were previously unique to high-performance computing environments, and all of this is driving a continuing explosion of business data. While many factors are contributing to data growth, these trends are significant contributors:
Digital representation of physical systems and processes
Capture of digital content from physical systems and sources
Deliveries of digital content to a global population
Additional trends are driven by the following kinds of applications:
Product Life Cycle Management (PLM) systems, which include Product Data Management systems and mechanical, electronic, and software design automation
Service Life Cycle Management (SLM) systems
Information Life Cycle Management (ILM), including email archiving
Video on demand: online, broadcast, and cable
Digital Video Surveillance (DVS): government and commercial
Video animation rendering
Seismic modeling and reservoir analysis
Pharmaceutical design and drug analysis
Digital health care systems
Web 2.0 and service-oriented architecture
When it comes to traditional IT workloads, traditional storage will continue to excel for the applications for which it was designed. But solutions such as traditional Network Attached Storage (NAS) were not intended to scale to the high levels and extremely challenging workload characteristics required by today's Internet-driven, Petabyte Age applications.
When files are shared across a network, something must control when writes can occur. The operating system fills this role: it serializes writes so that multiple writes are not applied at the same time, even when many write requests arrive. Databases are able to control this write coordination on their own, so in general they run faster by bypassing the file system layer, although this depends on the efficiency of the file system and database implementations.
However, traditional NAS filers do not scale to high capacities. When one filer is fully utilized, a second, third, and more filers are installed. The result is that administrators find themselves managing silos of filers. Capacity cannot be shared across individual filers: some filers are heavily accessed, while others sit mostly idle. Figure 1-5 shows a summary of traditional NAS limitations.
This situation is compounded by the fact that at hundreds of terabytes or more, conventional backup of such a large storage farm is difficult, if not impossible. Even with incremental-only backup, scanning hundreds of terabytes to identify the changed files or changed blocks might in itself take too long and incur too much overhead. A further issue is that there might not be any way to apply file placement, migration, deletion, and management policies automatically from one centrally managed, centrally deployed control point. Manual management of tens or hundreds of filers was proving to be neither timely nor cost-effective, and effectively prohibited any feasible way to globally implement automated tiered storage.
Utilizing mature technology from IBM High Performance Computing experience, and based upon the IBM flagship General Parallel File System (GPFS), SONAS is an easy-to-install, turnkey, modular, scale out NAS solution that provides the performance, clustered scalability, high availability, and functionality that are essential to meeting strategic Petabyte Age and cloud storage requirements. Simply put, SONAS combines high-speed interface nodes, storage capacity, and GPFS into a scale out storage system that enables organizations to scale performance alongside capacity in an integrated, highly available system. The high-density, high-performance SONAS can help your organization consolidate and manage data affordably, reduce crowded floor space, and reduce the management expense associated with administering an excessive number of disparate storage systems.
Assuming 2 TB disk drives, such a system has 14.4 petabytes (PB) of raw storage and billions of files in a single large file system (see the capacity sketch after this list). You can have as few as eight file systems in a fully configured 14.4 PB SONAS system, or as many as 256 file systems. SONAS provides automated policy-based file management that controls backups and restores, snapshots, and remote replication. It also provides:
A single global namespace with logical paths that do not change because of physical data movement
Support for Serial Attached SCSI (SAS), Nearline SAS, and, in the past, Serial Advanced Technology Attachment (SATA) drives
High availability and load balancing
Centralized management
Centralized backup
An interconnected cluster of file-serving and network-interfacing nodes in a redundant high-speed data network
Virtually no capacity limits
Virtually no scalability limits
IBM Call Home trouble reporting and IBM Tivoli Assist On Site (AOS) remote support capabilities
Enhanced support for your Tivoli Storage Manager Server product, with a preinstalled Tivoli Storage Manager client
Support for the cloud environment, where a controlled set of end users, projects, and applications can perform the following functions:
Share files with other users within one or more file spaces
Control access to their files using access control lists (Microsoft Windows clients) and user groups
Manage each file space with a browser-based tool
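The 14.4 PB raw figure quoted above can be reproduced from the hardware building blocks described in Chapter 2: two storage nodes per storage pod, up to two storage controllers and two expansion drawers of 60 disks each per pod, and a maximum of 60 storage nodes. The short Python sketch below only restates that arithmetic; it is illustrative, not a configuration tool.

# Illustrative only: derive the maximum raw SONAS capacity quoted in the text.
# Building-block counts are taken from the hardware description in Chapter 2.

STORAGE_NODES_MAX = 60          # largest configuration discussed in this book
STORAGE_NODES_PER_POD = 2       # each storage pod contains two storage nodes
DRAWERS_PER_FULL_POD = 4        # 2 storage controllers + 2 expansion units
DRIVES_PER_DRAWER = 60          # every controller/expansion drawer holds 60 disks
DRIVE_CAPACITY_TB = 2           # assuming 2 TB disk drives, as in the text

pods = STORAGE_NODES_MAX // STORAGE_NODES_PER_POD          # 30 storage pods
drives = pods * DRAWERS_PER_FULL_POD * DRIVES_PER_DRAWER   # 7200 disk drives
raw_capacity_pb = drives * DRIVE_CAPACITY_TB / 1000        # 14.4 PB raw

print(f"{pods} pods, {drives} drives, {raw_capacity_pb:.1f} PB raw capacity")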
Global namespace
SONAS provides a global namespace that enables your storage infrastructure to scale to extreme amounts of data, from terabytes to petabytes. Within the solution, centralized management, provisioning, control, and automated information life-cycle management (ILM) are integrated as standard features to provide the foundation for a truly cloud storage enabled solution.
Interface nodes
The high-performance interface nodes provide connectivity to your Internet Protocol (IP) network for file access and support both 1-gigabit Ethernet (GbE) and 10 GbE connection speeds. Each interface node can connect to the IP network with up to eight separate data-path connections. Performance and bandwidth scalability are achieved by adding interface nodes, up to the maximum of 30 nodes, each of which has access to all files in all file systems. Each interface node has its own cache memory, so by adding an interface node you increase caching memory, data paths, and file-serving processor capacity. If raw storage capacity is the prime constraint in the current system, the SONAS system scales out to as much as 14.4 petabytes (PB) with 2 terabyte (TB) disk drives, with up to 256 file systems that can each have up to 256 file-system snapshots. Most systems that a SONAS system typically displaces cannot provide clients with access to so much storage from a single file-serving head. Every interface node has access to all of the storage capacity in the SONAS system.
In the top half of this diagram, we see the logical file directory structure as seen by the users. SONAS presents and preserves this same logical appearance to the users, no matter what we do to physically manage these files, and all files in the SONAS, from creation to deletion. The user sees only his global namespace, his user directories and files. As a SONAS expands, manages, and changes the physical data location and supporting physical infrastructure, the users will still have the unchanged appearance of one single logical global namespace, and maintain their logical file structure without change. In the lower half of this diagram, we see a representation of the SONAS internal architectural components. SONAS has interface nodes, which serve data to and from the users, over the network. SONAS also has storage nodes, which service the storage for the SONAS clustered file system. All SONAS nodes are in a global cluster, connected by InfiniBand. All interface nodes have full read/write access to all storage nodes. All storage nodes have full read/write access to all interface nodes. Each of the nodes runs a copy of IBM SONAS Software (5639-SN1), which provides all the functions of SONAS, including a Cluster Manager, which manages the cluster and dispatches workload evenly across the cluster.
Also included is the SONAS central storage policy engine, which runs in a distributed fashion across all the nodes in the SONAS. The SONAS policy engine provides central management of the lifecycle of all files, in a centrally deployed, centrally controlled, enforceable manner. The policy engine function is not tied to a particular node; it executes in a distributed manner across all nodes. Not shown are the SONAS management nodes, which monitor the health of the SONAS. IBM SONAS Software manages the cluster and maintains the coherency and consistency of the file system, providing file-level and byte-level locking, using a sophisticated distributed token (lock) management architecture that is derived from IBM General Parallel File System (GPFS) technology. As we shall see, the SONAS clustered grid architecture provides the foundation for automatic load balancing, high availability, and scale out high performance, with multiple parallel, concurrent writers and readers.
Physical disk drives are allocated to SONAS logical storage pools. Typically, we might allocate a high performance pool of storage (which uses the fastest disk drives), and a lower tier of storage for capacity (less expensive, slower spinning drives). In the previous example, we have allocated three logical storage pools.
As shown in Figure 1-11, an interface node has been selected and the policy engine determines the file placement.
All incoming create file requests pass through the SONAS central policy engine in order to determine file placement. The interface node takes the incoming create file request and, based on the logical storage pool chosen for the file, passes the write request to the appropriate storage nodes. A logical storage pool can, and often does, span storage nodes. The storage nodes, in parallel, perform a large striped write into the appropriate logical storage pool, exploiting the parallelism of writing the data simultaneously across multiple physical disk drives.

SONAS data writes are done as a wide parallel data stripe write across all disk drives in the logical storage pool. In this way, the SONAS Software architecture aggregates the write and read throughput of multiple disk drives, thus providing high performance. SONAS Software writes the file in physical blocksize chunks, according to the blocksize specified at the file system level. The default blocksize for a SONAS file system is 256 KB, which is a good blocksize for the large majority of workloads, especially where there will be a mix of small random I/Os and large sequential workloads within the same file system. You can choose to define the file system with other blocksizes; for example, where the workload is known to be highly sequential in nature, you can define the file system with a large 1 MB or even 4 MB blocksize. See the detailed sizing sections of this book for further guidance on settings.

This wide data striping architecture has algorithms that determine where the data blocks must physically reside; this gives the SONAS Software the ability to automatically tune and equally load balance all disk drives in the storage pool. This is shown in Figure 1-12.
Figure 1-12 Create and write file 1 - step 2 - wide parallel data stripe write
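The wide-stripe write can be pictured with a small sketch. The following Python fragment is purely illustrative (the pool size is an arbitrary example value, and this is not a SONAS interface): it splits a file into file-system blocks and assigns them round-robin across the disks of a logical storage pool, which is the essence of why throughput aggregates across all drives in the pool.

# Illustrative sketch of wide striping: split a file into file-system blocks
# and spread the blocks round-robin across all disks in a logical storage pool.

BLOCK_SIZE = 256 * 1024          # 256 KB default SONAS file-system block size
DISKS_IN_POOL = 8                # example pool size, chosen arbitrarily here

def stripe_blocks(file_size_bytes: int, disks: int = DISKS_IN_POOL):
    """Return a mapping disk_index -> list of block numbers for one file."""
    num_blocks = -(-file_size_bytes // BLOCK_SIZE)   # ceiling division
    layout = {d: [] for d in range(disks)}
    for block in range(num_blocks):
        layout[block % disks].append(block)          # round-robin placement
    return layout

layout = stripe_blocks(10 * 1024 * 1024)             # a 10 MB file -> 40 blocks
for disk, blocks in layout.items():
    print(f"disk {disk}: {len(blocks)} blocks")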
Now, let us write another file to the SONAS. Another interface node is appropriately selected for this incoming work request by the Domain Name Server, and the file is passed to that interface node for writing, as shown in Figure 1-13.
Notice that another interface node has been chosen; this illustrates the automatic balancing of the incoming workload across the interface nodes. The interface node is told by the policy engine that this file is to be written to another logical storage pool. In the same manner as previously described, the file is written in a wide data stripe, as shown in Figure 1-14.
Figure 1-14 Create and write file 2 - step 2 - wide parallel data stripe write
Finally, let us write a third file. As shown in Figure 1-15, a third interface node is selected by the Domain Name Server.
The SONAS policy engine has specified that this file is to be written into logical storage pool 3. A wide data stripe parallel write is done as shown in Figure 1-16.
Figure 1-16 Create and write file 3 - step 2 - wide parallel data stripe write
With these illustrations, we can now see how the components of the SONAS Software use the SONAS policy engine together with the Domain Name Server to drive workload equally across interface nodes. The SONAS Software then appropriately distributes workload among the storage nodes and physical disk drives. This is summarized in Figure 1-17.
Figure 1-17 summarizes the result: all three files reside in the same logical directory, but each is allocated to a different physical storage pool, and the data is striped across all disks in each storage pool, providing high performance with automatic tuning and automatic load balancing.
In summary, SONAS Software automatically balances workload across all interface nodes. SONAS Software writes all data in wide stripes, across all disks in the logical storage pool, providing high performance, automatic tuning, and automatic load balancing. Most importantly, notice that from the user's perspective, these three files can all reside in the same logical path and directory. Users do not know that their files are physically located on various classes of storage (or that the physical location can change over time). This provides the ability to implement automatic physical tiered storage without impact to users, and without necessitating time-consuming, process-intensive application-level changes and change control. The SONAS Software continues to maintain this same logical file structure and path, regardless of physical file location changes, as the file is managed from creation through its life cycle using SONAS automatic tiered storage.
Figure 1-18 Scale out more disk drives for more write performance
By simply adding more disk drives, the SONAS Software architecture provides the ability to scale out both the number of disk drives and the number of storage nodes that can be applied to support a higher amount of parallel physical data writes. The logical storage pool can be as large as the entire file system, and the SONAS file system can be as large as the entire SONAS machine. In this way, SONAS provides an extremely scalable and flexible architecture for serving large scale NAS storage. The SONAS Software architecture provides the ability to expand the scale and capacity of the system in any direction that is desired. The additional disk drives and storage nodes can be added non-disruptively to the SONAS. Immediately upon doing so, SONAS Software starts to automatically balance and tune new workload onto the additional disks, taking advantage of the additional resources.
When a file is read, it is read in parallel across all of the disk drives in its logical storage pool, so read performance is the aggregate of all disk drives in that storage pool.
Furthermore, the interface node is designed to utilize advanced algorithms that improve read-ahead and write-behind file functions, and recognizes and does intelligent pre-fetch caching of typical access patterns such as sequential, reverse sequential, and random.
In the same way that write performance can be enhanced by simply adding more disk drives to the logical storage pool, read performance can be enhanced in the same way, as shown in Figure 1-21.
Figure 1-21 Scale out more disk drives for read performance - parallel data stripe read
Notice that the parallelism in the SONAS for an individual client is in the storage read/write; the connection from the interface node to the client is a single connection and single stream. This is done on purpose, so that any standard CIFS, NFS, FTP, or HTTPS client can access the IBM SONAS interface nodes without requiring any modification or any special code. Throughput between the interface nodes and the users is enhanced by sophisticated read-ahead and pre-fetching and by large memories on each interface node, to provide very high capacity and throughput on the network connection to the user. As requirements for NAS storage capacity or performance increase, the SONAS Software scale out architecture provides linearly scalable, high performance, parallel disk I/O capabilities as follows (a rough throughput model is sketched after this list):
Striping data across multiple disks, across multiple storage nodes and storage pods
Reading and writing data in parallel wide data stripes; increasing the number of disk drives in the logical storage pool can increase performance
Supporting a large block size, configurable by the administrator, to fit I/O requirements
Utilizing advanced algorithms that improve read-ahead and write-behind file functions; SONAS recognizes typical access patterns such as sequential, reverse sequential, and random, and optimizes I/O access for these patterns
This scale out architecture of SONAS Software provides superb parallel performance, especially for larger data objects, and excellent performance for large aggregates of smaller objects.
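The following rough model shows why adding drives and storage nodes raises aggregate throughput until another component becomes the bottleneck. The per-drive and per-node rates are placeholder assumptions chosen only to illustrate the arithmetic; they are not SONAS specifications.

# Rough, illustrative model of aggregate streaming throughput as the pool scales.
# The per-component rates are placeholder assumptions, not SONAS specifications.

PER_DRIVE_MBPS = 80      # assumed sequential throughput of one disk drive
PER_NODE_MBPS = 1000     # assumed throughput ceiling of one storage node

def aggregate_throughput(drives: int, storage_nodes: int) -> float:
    """Aggregate streaming rate is limited by either the drives or the nodes."""
    return min(drives * PER_DRIVE_MBPS, storage_nodes * PER_NODE_MBPS)

for drives, nodes in [(60, 2), (120, 4), (240, 8)]:
    rate = aggregate_throughput(drives, nodes)
    print(f"{drives} drives / {nodes} storage nodes -> ~{rate:.0f} MB/s aggregate")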
With the SONAS clustered node architecture, the larger the machine, the greater the parallel capability to concurrently scale out capacity and performance across many individual nodes, many concurrent users, and their storage requests, in parallel, as shown in Figure 1-22.
The value of the SONAS scale out architecture is the ability to flexibly and dynamically add as many nodes as needed, to increase the number of concurrent users that can be supported in parallel. Each individual node works in parallel to service clients, as shown in Figure 1-23.
SONAS has the same operational procedures and read/write file system architectural philosophy whether you have a small SONAS with two interface nodes and two storage nodes, or a very large SONAS with 30 interface nodes and 60 storage nodes.
Multiple logical storage pools have been set up, as shown in Figure 1-24.
Logical storage pool #1 can be high performance SAS disks, and logical storage pool #2 might be more economical Nearline SAS large-capacity disk drives. Logical storage pool #3 might be another large-capacity drive storage pool defined with external HSM, for data that is intended to be staged in and out of the SONAS to external storage, external tape or tape libraries, or external data de-duplication technology. Within the internal SONAS logical storage pools, all of the data management, from creation to physical data movement to deletion, is done by SONAS Software. In addition to internal storage pools, SONAS also supports external storage pools that are managed through an external Tivoli Storage Manager server. When moving data to an external pool, SONAS Software utilizes a high performance scan engine to locate and identify files that need to be managed, and then hands the list of files either to the SONAS Software data movement functions (for moving data internally within the SONAS), or to Tivoli Storage Manager for backup and restore, or for HSM storage on alternate media such as tape, tape library, virtual tape library, or data de-duplication devices. If the data is moved for HSM purposes, a stub file is left on the disk, and this HSM data can be retrieved from the external storage pool on demand, as a result of an application opening a file. HSM data can also be retrieved in a batch operation if desired.
Note that if all data movement is within a SONAS, there is no need for an external Tivoli Storage Manager server.
SONAS Software provides the file management concept of a fileset, which is a sub-tree of the file system namespace, providing a way to partition the global namespace into smaller, more manageable units. A fileset is basically a named collection of files and directories that you want to operate upon or maintain as a common unit. Filesets provide an administrative boundary that can be used to set quotas and can be specified in a user-defined policy to control initial data placement or data migration. Currently, up to 1,000 filesets can be defined per file system, and it is a known requirement to increase this number in the future. Data and files in a single SONAS fileset can reside in one or more logical storage pools. As the data is physically migrated according to storage policy, the fileset grouping is maintained. Where the file data physically resides, and how and when it is migrated, is based on a set of rules in a user-defined policy that is managed by the SONAS Software policy engine. Let us next overview this SONAS central policy engine.
Automated Tiered Storage policy statement examples (migration policies, evaluated periodically):

rule 'cleangold' migrate from pool 'TIER1' threshold(90,70) to pool 'TIER2'
rule 'hsm' migrate from pool 'TIER3' threshold(90,85) weight(current_timestamp - access_time) to pool 'HSM' where file_size > 1024kb
rule 'cleansilver' when day_of_week()=Monday migrate from pool 'silver' to pool 'bronze' where access_age > 30 days
rule 'purgebronze' when day_of_month()=1 delete from pool 'bronze' where access_age>365 days
After files exist in a SONAS file system, SONAS Software file management policies allow you to move files, change their replication status, or delete them. You can use file management policies to move data from one pool to another without changing the file's location in the directory structure. The rules are very flexible; as an example, you can write a rule that says: replicate all files in /database/payroll which have the extension *.dat and are greater than 1 MB in size to storage pool 2. In addition, file management policies allow you to prune the file system, deleting files as defined by policy rules. File management policies can use more attributes of a file than file placement policies, because after a file exists, more is known about the file. In addition to the file placement attributes, the policies can now utilize attributes such as last access time, size of the file, or a mix of user and file size. This can result in policy statements such as: delete all files with a name ending in .temp that have not been accessed in 30 days; move all files that are larger than 2 GB to pool2; or migrate all files owned by GroupID=Analytics that are larger than 4 GB to the SATA storage pool.
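To illustrate how such a rule selects candidate files, here is a small Python sketch that mimics the example rule "delete all files with a name ending in .temp that have not been accessed in 30 days". It operates on plain file metadata and is only a conceptual stand-in for the SONAS policy engine, not its actual implementation.

# Conceptual stand-in for a file management rule: select *.temp files that
# have not been accessed for 30 days. This is not the SONAS policy engine.
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class FileMeta:
    path: str
    size_bytes: int
    last_access: datetime

def select_stale_temp_files(files, days=30):
    cutoff = datetime.now() - timedelta(days=days)
    return [f for f in files
            if f.path.endswith(".temp") and f.last_access < cutoff]

catalog = [
    FileMeta("/home/appl/data/web/scratch.temp", 4096,
             datetime.now() - timedelta(days=90)),
    FileMeta("/home/appl/data/web/report.pdf", 2_000_000,
             datetime.now() - timedelta(days=90)),
]
for f in select_stale_temp_files(catalog):
    print("candidate for deletion:", f.path)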
Rules can include attributes related to a pool instead of a single file, using the threshold option. Using thresholds, you can create a rule that, for example, moves files out of the high performance pool if it is more than 80% full. The threshold option comes with the ability to set high, low, and pre-migrate thresholds. This means that SONAS Software begins migrating data at the high threshold and continues until the low threshold is reached. If a pre-migrate threshold is set, SONAS Software begins copying data until the pre-migrate threshold is reached. Because a copy of the pre-migrated data already exists in the target pool, the data can continue to be accessed in the original pool and then be quickly deleted to free up space the next time the high threshold is reached. Policy rule syntax is based on the SQL 92 syntax standard and supports multiple complex statements in a single rule, enabling powerful policies. Multiple levels of rules can be applied, because the complete policy rule set is evaluated for each file when the policy engine executes.
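The high/low threshold behavior can be sketched as a small control loop. The pool occupancy numbers and the per-file capacity gain in the sketch below are hypothetical, used only to show the control flow that the text describes.

# Conceptual sketch of threshold-driven migration: start moving files when pool
# occupancy crosses the high threshold, stop once it falls to the low threshold.
# Pool occupancy values and the per-file gain are hypothetical illustrations.

def migrate_until_low(pool_used_pct, candidates, high=90, low=70, pct_per_file=1.0):
    """Return the files that would be migrated in one threshold cycle."""
    migrated = []
    if pool_used_pct < high:           # nothing to do until the high threshold trips
        return migrated
    for f in candidates:               # candidates are assumed pre-sorted by weight
        if pool_used_pct <= low:
            break
        migrated.append(f)
        pool_used_pct -= pct_per_file  # each migrated file frees some capacity
    return migrated

moved = migrate_until_low(92.0, [f"file{i}.dat" for i in range(40)])
print(f"{len(moved)} files migrated to reach the low threshold")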
Let us see how this works in more detail. To begin a scan to identify files, we submit a job to the SONAS Software central policy engine to evaluate a set of policy rules, as shown in Figure 1-26.
Figure 1-26 High performance scan engine - start scan by reading policies
The SONAS Software high performance scan engine is designed to utilize the multiple hardware nodes of the SONAS in parallel to scan the internal file system metadata. The multiple nodes equally spread the policy engine rule evaluation, file scan identification, and subsequent data movement responsibilities over the multiple nodes in the SONAS cluster. If greater scan speed is required, more SONAS nodes can be allocated to the scan, and each node scans only its equal portion of the total scan.
This architectural aspect of SONAS Software provides a very scalable, high performance, scale out rule processing engine that can provide the speed and parallelism required to address petabyte file system scan requirements. This is shown in Figure 1-27.
The parallel metadata scan can exceed 10 million files per minute, and some or all nodes (both storage and interface) can participate in the parallel scan engine.
Figure 1-27 High performance scan engine - parallel scan of metadata by all nodes
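A simplified way to picture the parallel scan is each node taking an equal slice of the metadata, as described above. The sketch below partitions a list of inode records across several scanner processes and evaluates a trivial predicate; it is conceptual only and does not use any SONAS or GPFS interface.

# Conceptual sketch: divide the file system metadata into equal slices so that
# every node scans only its own portion, then merge the per-node results.
from concurrent.futures import ProcessPoolExecutor

def is_large(record):
    """Example policy predicate: files larger than 512 MB."""
    return record["size"] > 512 * 1024 * 1024

def scan_slice(records):
    """One node's share of the scan: evaluate the predicate locally."""
    return [r for r in records if is_large(r)]

def parallel_scan(all_records, nodes=6):
    slices = [all_records[i::nodes] for i in range(nodes)]   # equal portions
    with ProcessPoolExecutor(max_workers=nodes) as pool:
        results = pool.map(scan_slice, slices)
    return [r for part in results for r in part]             # aggregate results

if __name__ == "__main__":
    records = [{"inode": i, "size": i * 1024 * 1024} for i in range(100_000)]
    matches = parallel_scan(records)
    print(len(matches), "files matched the policy predicate")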
The results of the parallel scan are aggregated, and returned as the actionable list of candidate files, as shown in Figure 1-28.
Figure 1-28 High performance scan engine - return results of parallel scan
Notice that the SONAS scan engine is not limited to tiered storage management. The scan engine can also be used to:
Reset file attributes according to policy (change deletions, change storage pool allocation, and so on)
Run reports on file system usage and user activities
Identify changed data blocks for asynchronous replication to a remote site
According to the results of the scan, files are physically moved from storage pool 1 to storage pool 2, and all nodes (both storage and interface) can participate in this parallel data movement, as shown in Figure 1-29.
Figure 1-29 High performance parallel data movement for ILM - pool 1 to pool 2
All files remain online and fully accessible during this physical data movement; the logical appearance of the file path and location to the user does not change. The user has no idea that the physical location of his file has moved. This is one of the design objectives of the SONAS.
Based on the results of the scan, SONAS continues with other physical file movement. According to policy, data can be up-staged as well as down-staged, as shown in Figure 1-30.
Figure 1-30 High performance parallel data movement for ILM - pool 2 to pool 1
As the SONAS grows in capacity over time, it is a straightforward matter to add additional nodes to the parallel cluster, thus maintaining the ability to perform and complete file system scans and physical data movement in a timely manner, even as the file system grows into hundreds of terabytes and petabytes.
1.4.12 Requirements for high performance external HSM and backup restore
SONAS can support any standard HSM and backup/restore software. These conventional solutions use normal "walk the directory tree" methods to identify files that need to be managed and moved, and then copy these files using conventional methods. However, as file systems continue to grow to the hundreds of terabytes to petabyte level, the following requirements have arisen:
The elapsed time for traditional scans to identify files that need to be moved for HSM or backup/restore purposes is becoming too long. Due to the scale of the search, walking the directory tree takes too long and incurs a very large amount of small block IOPS. These long scan times can severely inhibit the ability to manage a large amount of storage; in many cases, the scan time alone can be longer than the backup or tiered storage management window.
In addition, after the files are identified, the large amount of data that they can represent often drives the need for very high data rates in order to accomplish the needed amount of HSM or backup/restore data movement within a desired (and continually shrinking) time window.
Therefore, in order to address these issues and make automated tiered storage feasible at large scale, SONAS provides a specific set of technology exploitations that significantly reduce this overly long scan time and perform efficient data movement as well as HSM to external storage. SONAS does this by providing optional (yet highly desirable) exploitation of, and integration with, IBM Tivoli Storage Manager. SONAS Software has specific high performance integration with Tivoli Storage Manager to provide accelerated backup/restore and accelerated, more functional HSM to external storage.
The architecture of the SONAS HSM to external storage is shown in Figure 1-31.
In this SONAS + Tivoli Storage Manager HSM scenario, a stub file is left on disk, allowing the file to appear active in the file system. Many operations, such as listing files, are satisfied by the stub file without any need for recall. You have flexible control over the HSM implementation, such as specifying the size of the stub file, the minimum size of a file to be eligible for migration, and so on. If a file that resides only on external storage is accessed, it is transparently auto-recalled from the external storage through the Tivoli Storage Manager server. Data movement to and from external storage is done in parallel through as many SONAS interface nodes as desired, maximizing throughput through parallelism. Data can be pre-migrated, re-staged, and de-staged according to policy. In this manner, SONAS provides the ability to store petabytes of data in the online file system, yet stage only the desired portions of the file system on the actual SONAS disk. The external storage can be any Tivoli Storage Manager-supported storage, including external disk, tape, virtual tape libraries, or data de-duplication devices.
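The stub-and-recall behavior can be pictured with a tiny conceptual model. The class below is purely illustrative (the class name, the stub size, and the recall step are invented for the sketch); it only mirrors the sequence the text describes: migration leaves a stub, metadata operations are satisfied from the stub, and a read triggers a transparent recall.

# Conceptual model of HSM migrate/recall, mirroring the behavior described in
# the text. Class and method names are invented for illustration only.

class HsmManagedFile:
    STUB_BYTES = 4096                      # assumed stub size kept on disk

    def __init__(self, path, data: bytes):
        self.path = path
        self._data = data
        self.migrated = False

    def migrate_to_external_pool(self):
        """Move the data to external storage, leaving only a stub on disk."""
        self._external_copy = self._data
        self._data = self._data[:self.STUB_BYTES]
        self.migrated = True

    def list_entry(self):
        """Metadata operations (such as listing) are satisfied by the stub."""
        size = len(self._external_copy) if self.migrated else len(self._data)
        return f"{self.path} ({size} bytes)"

    def read(self) -> bytes:
        """Opening the file triggers a transparent recall if it was migrated."""
        if self.migrated:
            self._data = self._external_copy   # recall from the external pool
            self.migrated = False
        return self._data

f = HsmManagedFile("/home/appl/data/web/unstructured_big_video.mpg", b"x" * 10_000_000)
f.migrate_to_external_pool()
print(f.list_entry())          # served from the stub, no recall needed
print(len(f.read()), "bytes")  # access recalls the full file transparently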
Parallel data streams flow to any Tivoli Storage Manager-supported devices, including IBM ProtecTIER data de-duplication, virtual tape libraries, and tape.
Figure 1-32 Backup and restore acceleration using Tivoli Storage Manager
Now, let us examine how the SONAS and Tivoli Storage Manager exploitation works in a little more detail.
The scan engine identifies files to be restaged (up or down), or to be backed up.
Rather than using the walk the directory tree method, the SONAS scan engine uses multiple SONAS interface nodes to scan the file system in parallel and identify the list of changed files. SONAS then passes the list of changed files directly to the Tivoli Storage Manager server. In this way, the SONAS scan engine avoids the need to walk the directory tree, and avoids the associated traditional time-consuming small block directory I/Os. The results of the scan are divided up among multiple interface nodes. These multiple interface nodes then work in parallel with the Tivoli Storage Manager servers to initiate the HSM or backup/restore data movement, creating parallel data streams. The Tivoli Storage Manager software implements a virtual node function, which allows the multiple SONAS interface nodes to stream the data in parallel to a Tivoli Storage Manager server, as shown in Figure 1-34.
Figure 1-34 Parallel data streams between SONAS and Tivoli Storage Manager
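A simple way to picture how the scan results are spread across interface nodes for parallel data movement is to shard the changed-file list by size, so that each node streams a roughly equal number of bytes. The sketch below is conceptual only; it does not call any SONAS or Tivoli Storage Manager interface.

# Conceptual sketch: divide a changed-file list among interface nodes so that
# each node streams a roughly equal number of bytes in parallel to the TSM
# server. No actual SONAS or Tivoli Storage Manager interfaces are used.
import heapq

def shard_by_size(changed_files, interface_nodes=3):
    """changed_files: list of (path, size_bytes). Returns one shard per node."""
    shards = [[] for _ in range(interface_nodes)]
    load = [(0, i) for i in range(interface_nodes)]         # (bytes assigned, node)
    heapq.heapify(load)
    for path, size in sorted(changed_files, key=lambda f: -f[1]):
        bytes_assigned, node = heapq.heappop(load)           # least-loaded node
        shards[node].append(path)
        heapq.heappush(load, (bytes_assigned + size, node))
    return shards

files = [(f"/home/appl/data/web/file{i}.dat", (i % 7 + 1) * 100_000_000)
         for i in range(20)]
for node, shard in enumerate(shard_by_size(files)):
    print(f"interface node {node}: {len(shard)} files to stream")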
In this way, the SONAS Software and Tivoli Storage Manager work together to exploit the SONAS scale out architecture to perform these functions at petabyte levels of scalability and performance. As higher data rates are required, more interface nodes can be allocated to scale out the performance in a linear fashion, as shown in Figure 1-35.
Figure 1-35 High performance parallel data movement at scale, from SONAS to external storage
The SONAS scale out architecture combined with Tivoli Storage Manager can be applied to maintain desired time windows for automated tiered storage, HSM, and backup/restore, even as file systems grow into hundreds of terabytes to petabytes.

Integration: SONAS requires external Tivoli Storage Manager servers only to exploit:
Accelerated HSM to external storage pools
Accelerated backup/restore and HSM that exploit the SONAS Software scan engine
Accelerated external data movement that exploits multiple parallel interface nodes to raise the backup/restore and HSM data rates
All internal data movement within a SONAS (between internal SONAS logical storage pools) or between SONAS machines (SONAS async replication) is done by the SONAS Software itself, and does not require any involvement of an external Tivoli Storage Manager server. Of course, SONAS also supports conventional external software that performs backup/restore and HSM through normal walking of the directory tree and normal copying of files. You can find more information about SONAS and Tivoli Storage Manager integration in SONAS and Tivoli Storage Manager integration on page 119 and Backup and restore of file data on page 185.
During all of these physical data movement and management operations, the user's logical file path and appearance remain untouched. Users are unaware that this large scale, high performance physical data management is being automatically performed on their behalf.
Chapter 2. Hardware architecture
In this chapter, we discuss the basic hardware structure of the SONAS appliance product. The configuration consists of a collection of interface nodes that provide file services to external application machines running standard file access protocols such as NFS or CIFS, a collection of storage nodes that provide a gateway to the storage, and at least one management node that provides a management interface to the configuration. In addition to the nodes, there are switches and storage pods. We describe the hardware components of the SONAS appliance, providing the basis for the configuration, sizing, and performance considerations discussed throughout the book.
2.1 Nodes
The SONAS system consists of three types of server nodes:
A set of interface nodes that provide connectivity to your Internet Protocol (IP) network for file services to external application machines running standard file access protocols such as NFS or CIFS
A management node that provides a management interface to the SONAS configuration
Storage nodes that provide a gateway to the SONAS storage
The management node, the interface nodes, and the storage nodes all run the SONAS Software product on a Linux operating system. Product software updates applied to the management node are distributed and installed on each of the interface nodes and storage nodes in the system. The interface nodes, management nodes, and storage nodes are connected through a scalable, redundant InfiniBand fabric, allowing data to be transferred between the interface nodes, which provide access to the applications, and the storage nodes, which have direct attachments to the storage. InfiniBand was chosen for its low overhead and high speed: 20 Gbit/sec for each port on the switches. The basic SONAS hardware structure is shown in Figure 2-1.
One of the bottom two adapter slots, which can optionally contain zero or one dual-port 10 Gb Converged Network Adapter (CNA) (FC 1101)
The bottom two adapter slots, which can have zero, one, or two optional Ethernet cards, but only one of each kind of adapter, with a maximum total of six additional ports
Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM)
Two redundant hot-swappable power supplies
Six redundant hot-swappable cooling fans
A single management node is required. The SONAS system will continue to operate without a management node, but configuration changes can only be performed from an active management node.

Nodes: SONAS administrators only interact with the storage and interface nodes directly for the purpose of debugging under the guidance of IBM service. You have no need to access the underlying SONAS technology components for SONAS management functions, and no need to directly access the interface or storage nodes.

The management node contains two hot-swappable 300 GB 2.5-inch 10K RPM SAS hard disk drives with mirroring between the two HDDs for high availability. These hard disk drives contain the SONAS System Software product, which includes the operating system and all other software needed for an operational SONAS system. A third hot-swappable 300 GB 2.5-inch 10K RPM SAS hard disk drive stores the logging and trace information for the entire SONAS system. Two of the PCIe x8 adapter slots are already populated with two single-port 4X Double Data Rate (DDR) InfiniBand Host Channel Adapters (HCA). The two HCAs attach to two independent InfiniBand switches in the SONAS system and interconnect the management node to the other components of the SONAS system. A management node contains the following components:
Two Intel Nehalem EP quad-core processors
32 GB of Double Data Rate (DDR3) memory standard
Four onboard 10/100/1000 Ethernet ports: two used to connect to your Internet Protocol (IP) network for health monitoring and configuration, and two that connect to the internal private management network within the SONAS system for health monitoring and configuration
Two 300 GB 2.5-inch Small Form Factor (SFF) 10K RPM SAS Slim-HS hard disk drives with mirroring between the two HDDs (RAID 1) for high availability
One non-mirrored 300 GB 2.5-inch SFF 10K RPM SAS Slim-HS hard disk drive for centralized collection of log files and trace data
Four PCIe Gen 2.0 x8 adapter slots: two each containing a single-port 4X Double Data Rate (DDR) InfiniBand Host Channel Adapter (HCA) for use within the system, and two available for your use to add more adapters for host IP interface connectivity
Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM)
Two redundant hot-swappable power supplies
Six redundant hot-swappable cooling fans
The management node comes with all of the cables that are required to connect it to the switches within the base rack. The management node is assumed to be in the SONAS base rack with the two InfiniBand switches.
2.2 Switches
The SONAS system contains internal InfiniBand switches, internal Ethernet switches, and external, customer-supplied Ethernet switches.
The 96-port InfiniBand switch comes standard with the following components:
Two Voltaire sFB-2004 Switch Fabric boards
One Voltaire sLB-2024 24-port 4X DDR InfiniBand Line Board
One Voltaire sMB-HM Hi-memory Management board containing an embedded InfiniBand fabric manager
Two sPSU power supplies
All fan assemblies
The 96-port switch comes standard with one 24-port 4X DDR InfiniBand line board providing 24 4X DDR (20 Gbps) IB ports. Up to three additional sLB-2024 24-port 4X DDR InfiniBand line boards can be added for a total of 96 ports. The 96-port InfiniBand switch comes standard with two sFB-2004 Switch Fabric boards; up to two additional sFB-2004 Switch Fabric boards can be added to provide additional backplane bandwidth. The 96-port InfiniBand switch comes standard with two power supplies; up to two additional sPSU power supplies can be added for redundancy. The two standard power supplies are capable of powering a fully configured 96-port InfiniBand switch with the following components:
Four sFB-2004 Switch Fabric boards
Four sLB-2024 24-port 4X DDR line boards
Two sMB-HM Hi-memory Management boards
You can upgrade the 96-port switch non-disruptively.
InfiniBand backplane bandwidth:
InfiniBand switches = 20 Gbit/sec per port (2 GBytes/sec per port)
InfiniBand 36-port switch backplane = 1.44 Tbits/sec (144 GBytes/sec total)
InfiniBand 96-port switch backplane = 3.84 Tbits/sec (384 GBytes/sec total)
The InfiniBand switches have sufficient bandwidth capability to handle a fully configured SONAS solution.
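The backplane totals above follow from simple arithmetic, assuming the published figures count both directions of each full-duplex 20 Gbit/s port; the short Python check below only restates that calculation.

# Restate the InfiniBand bandwidth arithmetic: 20 Gbit/s per 4X DDR port,
# counted in both directions, summed over all ports on the switch.
PORT_GBPS = 20            # 4X DDR InfiniBand rate per port
DIRECTIONS = 2            # full duplex: both directions counted

for ports in (36, 96):
    backplane_tbps = ports * PORT_GBPS * DIRECTIONS / 1000
    print(f"{ports}-port switch backplane: {backplane_tbps:.2f} Tbit/s")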
Internal IP addresses
During installation you can choose one of the IP address ranges listed next. The range that you select must not conflict with the IP addresses used for the customer Ethernet connections to the management node(s) or interface nodes. These are the available IP address ranges:
172.31.*.*
192.168.*.*
10.254.*.*
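When choosing one of these ranges, the only requirement is that it must not overlap the subnets you already use for the management and interface node connections. A quick, generic way to check this (shown here with a hypothetical customer subnet) is Python's standard ipaddress module:

# Check that a candidate internal range does not overlap an existing customer
# subnet. The 10.0.0.0/16 customer subnet below is just a hypothetical example.
import ipaddress

internal_ranges = ["172.31.0.0/16", "192.168.0.0/16", "10.254.0.0/16"]
customer_subnets = [ipaddress.ip_network("10.0.0.0/16")]   # hypothetical

for candidate in internal_ranges:
    net = ipaddress.ip_network(candidate)
    conflict = any(net.overlaps(existing) for existing in customer_subnets)
    print(f"{candidate}: {'conflicts' if conflict else 'safe to use'}")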
Figure 2-2 Adding additional storage controllers and expansion units to storage pod
Figure 2-3 Adding additional storage controllers and expansion units to storage pod.
Pods: The storage within the IBM SONAS is arranged in storage pods, where each storage pod contains:
Two storage nodes
One or two high density storage controllers
Zero, one, or two high density disk storage expansion units
All 60 disk drives in a storage controller drawer must be the same type and capacity; that is, a drawer of 60 drives must be all SAS or all Nearline SAS, and all 60 drives must be the same capacity. You cannot mix drive types or sizes within an enclosure. However, a controller and its attached expansion drawer can contain different disk types. You can order one high-density disk storage expansion unit to attach to each storage controller. The expansion unit also contains a drawer of 60 disk drives, and these 60 drives must likewise be the same size and type, although the size and type used in the expansion unit drawer can differ from the size and type used in the storage controller drawer.
Each SONAS storage controller supports up to four Fibre Channel host connections, two per internal RAID controller. Each connection is auto-sensing and supports 2 Gbps, 4 Gbps, or 8 Gbps. Each RAID controller contains:
- 4 GB of cache
- Two 8 Gbps Fibre Channel host ports
- One drive-side SAS expansion port
The storage controller is configured by default to work only with RAID 5 or RAID 6 arrays, according to the hard disk drive type. Currently you cannot change the predefined RAID levels. The storage controller's automatic drive failure recovery procedures ensure that absolute data integrity is maintained while operating in degraded mode. Both full and partial (fractional) rebuilds are supported in the storage controller, and rebuilds are done at the RAID array level. Partial rebuilds reduce the time needed to return the RAID array to full redundancy. A timer begins when a disk in the RAID array is declared missing. If the disk reappears before the timer expires, a fractional rebuild is done; otherwise, the disk is declared failed, replaced by a spare, and a full rebuild begins to return the RAID array to full redundancy. The default partial rebuild timer (Disk Timeout) setting is 10 minutes. The controller supports a limit between 0 and 240 minutes, but currently the only supported value is the default. Under heavy write workloads, the number of stripes that need to be rebuilt can exceed the system's internal limits before the timer expires; when this happens, a full rebuild is started automatically instead of waiting for the partial rebuild timeout (see Table 2-1 on page 52).
Table 2-1 Configured and supported RAID arrays
Disk drive type              RAID level   RAID arrays per controller or expansion unit   RAID configuration            Total drives   Raw usable capacity (bytes)
1 TB 7.2K RPM SATA           RAID 6       6                                              8 data + 2 parity             60             46 540 265 619 456
2 TB 7.2K RPM SATA           RAID 6       6                                              8 data + 2 parity             60             93 956 704 567 296
2 TB 7.2K RPM Nearline SAS   RAID 6       6                                              8 data + 2 parity             60             93 956 704 567 296
450 GB 15K RPM SAS           RAID 5       6                                              8 data + 1 parity + 1 spare   60             20 564 303 413 248
600 GB 15K RPM SAS           RAID 5       6                                              8 data + 1 parity + 1 spare   60             27 419 071 217 664
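To make the rebuild decision described before Table 2-1 easier to follow, here is a small conceptual sketch in Python of the choice between a fractional and a full rebuild. The names (MissingDisk, STRIPE_LIMIT, and so on) and the numeric stripe limit are illustrative assumptions, not the controller firmware logic.

from dataclasses import dataclass

DISK_TIMEOUT_MINUTES = 10      # default partial rebuild timer (Disk Timeout)
STRIPE_LIMIT = 100_000         # hypothetical internal limit on stripes awaiting rebuild

@dataclass
class MissingDisk:
    minutes_missing: float     # time since the disk was declared missing
    reappeared: bool           # True if the disk came back online
    stripes_pending: int       # stripes written while the disk was missing

def choose_rebuild(disk: MissingDisk) -> str:
    """Return the rebuild action a controller would take (conceptual model only)."""
    # Heavy write workloads can exceed internal limits before the timer expires,
    # which forces a full rebuild immediately.
    if disk.stripes_pending > STRIPE_LIMIT:
        return "full rebuild (stripe limit exceeded)"
    # If the disk reappears before the Disk Timeout expires, only the stripes
    # written while it was missing are rebuilt.
    if disk.reappeared and disk.minutes_missing < DISK_TIMEOUT_MINUTES:
        return "fractional (partial) rebuild"
    # Otherwise the disk is declared failed, a spare takes its place,
    # and a full rebuild restores the array to full redundancy.
    return "full rebuild to spare"

print(choose_rebuild(MissingDisk(minutes_missing=3, reappeared=True, stripes_pending=500)))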
Two of the PCIe adapter slots are available for customer use to add more adapters for host IP interface connectivity. Six additional Ethernet connections to the customer TCP/IP data network are possible for each interface node. A 4-port Ethernet adapter card feature can provide four 1 GbE connections. A 2-port Ethernet adapter card feature can provide two 10 GbE connections. You can have zero or one of each feature in a single interface node.
1. PCI slot 1 (SONAS single-port 4X DDR InfiniBand HCA)
2. PCI slot 2 (SONAS quad-port 1 GbE or dual-port 10 GbE feature for additional TCP/IP data path connectors)
3. PCI slot 3 (SONAS single-port 4X DDR InfiniBand HCA)
4. PCI slot 4 (SONAS quad-port 1 GbE or dual-port 10 GbE feature for additional TCP/IP data path connectors)
5. Ethernet 2 (SONAS GbE management network connector)
6. Ethernet 1 (SONAS GbE management network connector)
7. Ethernet 4 (TCP/IP data path connector)
8. Ethernet 3 (TCP/IP data path connector)
9. Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM)
Failover: If you are using only Ethernet ports 3 (point 8) and 4 (point 7) for external network connections to an interface node, then that daughter card is a single point of failure for that node. If an entire network card fails, the interface node that contains it is taken offline, and the workload running on that interface node is failed over, by SONAS Software, to another interface node. Failure of a single interface node is therefore not a significant concern, although very small systems can see a performance impact. As an example, in a system with the minimum of two interface nodes, the workload on the remaining interface node can double if one interface node fails.
2.5.1 Rack types: How to choose the correct rack for your solution
A SONAS system can consist of one or more racks, into which the components of the system are installed. A 42U enterprise class rack is available; installation of SONAS components in customer-supplied racks is not permitted. The rack can have two or four power distribution units (PDUs) mounted inside of it. The PDUs do not consume any of the rack's 42U of space: the first pair of PDUs is mounted in the lower left and lower right sidewalls, and the second pair is mounted in the upper left and upper right sidewalls. The rack supports either Base PDUs or Intelligent PDUs (iPDUs). The iPDUs can be used with the Active Energy Manager component of IBM Systems Director to monitor the energy consumption of the components in the rack. When installed in the rack, the iPDUs are designed to collect energy usage information about the components in the rack and report it to IBM Active Energy Manager over an attached customer-provided local area network (LAN). Using iPDUs and IBM Systems Director Active Energy Manager, you can gain a more complete view of the energy used within the data center. There are three variations of the SONAS rack:
- Base rack
- Interface expansion rack
- Storage expansion rack
Base rack
The Scale Out Network Attached Storage (SONAS) system always contains a base rack that holds the management node, the InfiniBand switches, a minimum of two interface nodes, and a keyboard, video, and mouse (KVM) unit. The capacity of the SONAS system that you order affects the number of racks in your system and the configuration of the base rack. Figure 2-13 shows the three basic SONAS racks.
There are three available options of the SONAS base rack: 2851-RXA feature codes 9003, 9004, and 9005. The choice of your first base rack depends on how you plan to scale out the SONAS system in the future.
Rack specifications: The rack must have two 50-port 10/100/1000 Ethernet switches for the internal IP management network. The rack must have at least one management node installed, and a minimum of two interface nodes, with the remaining interface node bays being expandable options for a total of 16 interface nodes.
Rack: The rack must have two 50-port 10/100/1000 Ethernet switches for an internal IP management network, and a minimum of one interface node, with the remaining interface node bays being expandable options for a total of 20 interface nodes.
Rack specifications: The rack must have two 50-port 10/100/1000 Ethernet switches for the internal IP management network, two storage nodes, and a minimum of one storage controller. It can be expanded as follows:
- Add a disk storage expansion unit #1.2 to the first storage controller #1.1.
- Add a second storage controller #2.1 to the first storage pod.
- Add a disk storage expansion unit #2.2 to the second storage controller in the first storage pod, if the first storage controller #1.1 also has a disk storage expansion unit.
- Add the start of a second storage pod, which includes two storage nodes (#3 and #4) and another storage controller #3.1 attached to these storage nodes.
- Add a disk storage expansion unit #3.2 to storage controller #3.1.
- Add a second storage controller #4.1 to the second storage pod.
- Add a disk storage expansion unit #4.2 to storage controller #4.1, if storage controller #3.1 also has a disk storage expansion unit.
Power limitations affect the number of SAS drives per rack. At the current time, SONAS does not yet support a 60 A service option, which can limit the total amount of hardware that can be installed in the expansion rack, 2851-RXB. Because of the power consumption of a storage controller (MTM 2851-DR1) fully populated with sixty 15K RPM SAS hard disk drives, and of a disk storage expansion unit (MTM 2851-DE1) fully populated with sixty 15K RPM SAS hard disk drives, a storage expansion rack (2851-RXB) is limited to a combined total of six storage controllers and disk storage expansion units when they are fully populated with 15K RPM SAS hard disk drives.
Tip: There is a known requirement to provide a 60 amp power option. When that option becomes available, it will provide enough electrical power to fully populate a SONAS storage expansion rack with all SAS drives.
SAS disk drives require more power than SATA drives; at the current time, each storage expansion rack can hold up to 360 SAS drives, or up to 480 Nearline SAS or SATA drives (see Figure 8-4 on page 253). Nearline SAS or SATA drives are always configured within a storage controller or a storage expansion unit as RAID 6 arrays. There are eight data drives and two parity drives per array, which means that within a storage controller or a storage expansion unit there are 48 data drives. SAS drives are always configured within the storage controller or storage expansion unit as RAID 5 arrays. There are eight data drives, one parity drive, and one spare drive per array, which again means that within a storage controller or a storage expansion unit there are 48 data drives. Table 2-3 shows a summary of the possible configurations. This is preconfigured and cannot be changed.
Table 2-3 Drive types configuration summary
Drive type     Drive capacity   RAID array   Total drives   Data drives   Parity drives   Spare drives
SATA           1 or 2 TB        RAID 6       60             48            12              0
Nearline SAS   2 TB             RAID 6       60             48            12              0
SAS            450 GB           RAID 5       60             48            6               6
SAS            600 GB           RAID 5       60             48            6               6
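As a cross-check of Table 2-3 and the per-rack drive limits mentioned in the text, the following sketch derives the drive counts from the fixed RAID layouts (six arrays per 60-drive enclosure). It is plain arithmetic; the function name and dictionary layout are illustrative only.

# Per 60-drive enclosure, SONAS preconfigures six RAID arrays of ten drives each.
def enclosure_layout(drive_type: str) -> dict:
    arrays = 6
    if drive_type in ("SATA", "Nearline SAS"):
        data, parity, spare = 8, 2, 0          # RAID 6: 8 data + 2 parity, no spares
        raid = "RAID 6"
    elif drive_type == "SAS":
        data, parity, spare = 8, 1, 1          # RAID 5: 8 data + 1 parity + 1 spare
        raid = "RAID 5"
    else:
        raise ValueError("unknown drive type")
    return {
        "raid": raid,
        "total drives": arrays * (data + parity + spare),   # always 60
        "data drives": arrays * data,                       # always 48
        "parity drives": arrays * parity,
        "spare drives": arrays * spare,
    }

for t in ("SATA", "Nearline SAS", "SAS"):
    print(t, enclosure_layout(t))

# Current per-rack power limits: six fully populated SAS enclosures (6 x 60 = 360 drives)
# versus eight Nearline SAS or SATA enclosures (8 x 60 = 480 drives).
print(6 * 60, 8 * 60)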
Generally speaking, SAS drives have shorter seek times, higher data transfer rates, and a higher Mean Time Between Failures (MTBF) than the cheaper but higher-capacity Nearline SAS or SATA drives. The SONAS internal storage performs disk scrubbing, as well as isolation of failed disks for diagnosis and attempted repair. The storage can conduct low-level formatting of drives, power-cycle individual drives if they become unresponsive, correct data using checksums on the fly, rewrite corrected data back to the disk, and use smart diagnostics on Nearline SAS or SATA disks to determine whether the drives need to be replaced. SONAS supports drive intermix: within the same storage pod, it is possible to have one storage enclosure with high performance SAS disks and another storage enclosure with high capacity Nearline SAS or SATA disks. Non-enterprise class application data or rarely used data can be automatically migrated within SONAS from the faster, but smaller and more expensive, SAS disks to the slower, but larger and cheaper, Nearline SAS or SATA disks.
Note: When a 60 amp power option is available for the SONAS Storage Expansion rack, the SAS drive limitation described above will be lifted.
2.6.1 Differences between SONAS with XIV and standard SONAS system
The SONAS with XIV will be similar to a standard SONAS system, with the following exceptions:
- It is limited to the SONAS Base Rack (2851-RXA) Configuration #3 (FC 9005), with two 36-port InfiniBand switches (2851-I36), a management node (2851-SM1), a keyboard-video-mouse (KVM) unit, and two 10/100/1000 Level 2 50-port Ethernet switches.
- It can have between two and six SONAS interface nodes (2851-SI1) within the base rack.
- It has one pair of storage nodes (2851-SS1) within the base rack.
- The SONAS Software Management GUI and Health Center do not provide monitoring or management functionality with regard to the XIV, or the SAN switch(es) by which it is connected.
- The requirement for built-in SONAS storage is removed as part of this specialized configuration.
- This specialized configuration of SONAS does not support a mixed storage environment; there cannot be a combination of internal SONAS storage and external XIV storage.
2.6.3 SONAS base rack configuration when used with XIV storage
Figure 2-17 shows the maximum configuration of the SONAS Base Rack (2851-RXA) when ordered with specify code #9006 (indicating configuration #3) and the SONAS i-RPQ number #8S1101. Note that to mitigate tipping concerns, the SONAS interface nodes will be moved to the bottom of the rack. Also notice that components that are not part of the SONAS appliance (including SAN switches) cannot be placed in the empty slots.
Figure 2-17 Maximum configuration for SONAS base rack for attaching XIV storage
This specialized SONAS configuration is available for original order plant manufacture only; it is not available as a field MES. The SAN switches must be mounted in a customer-supplied rack and cannot be mounted in the SONAS rack. The open slots in the SONAS base rack will be covered with filler panels for aesthetic and airflow reasons. Components that are not part of the SONAS (including SAN switches) cannot be placed in the empty slots.
One or two XIV storage systems can be attached. Any model of XIV storage can be used, and all available XIV configurations starting from 6 modules are supported. The firmware code level on the XIV must be version 10.2 or higher. Larger SAN switches, such as the SAN40B, can also be used; any switch on the XIV supported switch list can be used, provided there are sufficient open, active ports with SFPs to support the required connectivity. Connectivity for external block device users sharing the XIVs is beyond the scope of this specialized offering, and must be planned and set up by the end user so as not to interfere with the connectivity requirements of the IBM SONAS and XIV.
The following SONAS file system settings have been tested and are intended to be used:
- 256 KB block size
- Scatter block allocation
- One failure group if only one XIV is present
- Two failure groups (supporting the metadata replication requirement) if two XIVs are present
- Metadata replication: if two XIVs are present, the SONAS file system metadata is replicated across the XIV systems
It is supported to add more storage to the XIV system, provision LUNs on that additional storage, have the LUNs recognized by the SONAS system, and add them to an existing or new file system in the SONAS system. It is supported to share the XIV systems between SONAS and other block storage applications, provided that the LUNs allocated to SONAS are hard allocated (no thin provisioning). While the current specialized offering supports only one storage pod with one pair of storage nodes, and only one or two XIV systems, this is not an architectural limitation; it is only a testing and support limitation. IBM requires the use of IBM Services to install this specialized configuration, which ensures that the proper settings and configuration are done on both the XIV and the SONAS appliance for this offering. The intent of this specialized SONAS configuration offering is to allow existing or aspiring users of the IBM XIV storage system to attach XIV to an IBM SONAS appliance.
Chapter 3. Software architecture
This chapter provides a description of the software architecture, operational characteristics, and components of the IBM Scale Out Network Attached Storage (IBM SONAS) software. We review the design and concepts of the SONAS Software licensed program product that operates the SONAS parallel clustered architecture. We present an overview of the SONAS Software functionality stack, the file access protocols, the SONAS Cluster Manager, the parallel file system, the central policy engine, the scan engine, automatic tiered storage, workload allocation, availability, administration, snapshots, asynchronous replication, and system management services. This is an excellent chapter in which to gain an overview of all of these SONAS Software concepts and to acquire the base knowledge for the more detailed discussions of these topics in subsequent chapters.
(Figure: the SONAS Software stack - CIFS, NFS, FTP, HTTPS, and future file access protocols, running on Enterprise Linux on IBM servers.)
This chapter describes the following SONAS components:
- SONAS data access layer: the CIFS, NFS, FTP, and HTTPS file protocols
- SONAS Cluster Manager for workload allocation and high availability
- SONAS authentication and authorization
- SONAS data repository layer: the parallel clustered file system
- SONAS data management services:
  - Automated data placement and management: Information Lifecycle Management (ILM) and Hierarchical Storage Management (HSM)
  - Backup and restore data protection and HSM, using integration with Tivoli Storage Manager, as discussed in SONAS and Tivoli Storage Manager integration on page 119
  - Snapshots for local data resiliency
  - Remote async replication for remote recovery
- SONAS system management services: GUI, Health Center, CLI, and management interfaces
- Monitoring agents, security, and access control lists
We review the functions of each of the SONAS Software components as shown in Figure 3-1, starting at the top and working our way down.
(Figure 3-1: the SONAS Software functional stack - CIFS, NFS, FTP, HTTPS, and future file access protocols, with the scan engine.)
The network file access protocols that are supported by SONAS today are CIFS, NFS, FTP, and HTTPS. These file access protocols provide the mapping of client file requests onto the SONAS parallel file system: the file requests are translated from the network file access protocol to the SONAS native file system protocol. The SONAS Cluster Manager provides cross-node and cross-protocol locking services for the file serving functions in CIFS, NFS, FTP, and HTTPS. The CIFS file serving function maps CIFS semantics and security onto the POSIX-based parallel file system with native NFSv4 Access Control Lists.
Following this section, we discuss the role the SONAS Cluster Manager plays in concurrent access to a file from multiple platforms (for example, concurrent access to a file from both CIFS and NFS). For additional information about creating exports for file sharing protocols, refer to Creating and managing exports on page 378.
See 3.3.3, Principles of interface node failover and failback on page 83 for details.
Recall to disk is transparent to the application, so no additional operation besides the file open is needed. Directory browsing using Windows Explorer supports file property display without the need to recall offline or migrated files. SONAS Snapshot support is integrated into the Windows Explorer VSS (Volume Shadow Copy Service) interface, allowing users with the proper authority to recall files from SONAS Snapshots; this file version history support is for versions created by SONAS Snapshots. The standard CIFS timestamps are made available:
- Created timestamp: The time when the file was created in the current directory. When the file is copied to a new directory, a new value is set.
- Modified timestamp: The time when the file was last modified. When the file is copied elsewhere, the same value is carried over to the new directory.
- Accessed timestamp: The time when the file was last accessed. This value is set by the application program that sets or revises the value (this is application dependent; unfortunately, various applications do not revise this value).
See 3.3.3, Principles of interface node failover and failback on page 83 for details.
The SONAS Cluster Manager has the following responsibilities:
1. Coordinates the mapping of the various file sharing protocols onto the SONAS parallel file system. The CIFS file serving function maps CIFS semantics and security onto the POSIX-based parallel file system and NFSv4 Access Control Lists.
2. Provides the clustered implementation and management of the interface nodes, including tracking and distributing record updates across the interface nodes in the cluster.
3. Controls the interface nodes in the cluster. The SONAS Cluster Manager controls the public IP addresses used to publish the NAS services, and moves them as necessary between nodes. Using monitoring scripts, the SONAS Cluster Manager determines the health state of each individual interface node. If an interface node has problems, such as hardware failures, software failures such as broken services or network links, or otherwise becomes unhealthy, the SONAS Cluster Manager dynamically migrates the affected public IP addresses and in-flight workloads to healthy interface nodes, and uses tickle-ACK technology with the affected user clients so that they re-establish connections to their new interface node.
4. Provides the interface to manage cluster IP addresses, add and remove nodes, and ban and disable nodes.
5. Coordinates advanced functions such as the byte-range locking available in the SONAS parallel file system. The SONAS Cluster Manager manages the interface nodes and coordinates the multiple file sharing protocols to work in conjunction with the SONAS parallel file system base technology, so as to allow concurrent access, with parallel read and write access, for multiple protocols and multiple platforms across multiple SONAS interface nodes. It is the key to guaranteeing full data integrity for all files, anywhere within the file system.
For information about how to administer the SONAS Cluster Manager, see Cluster management on page 351.
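As a conceptual illustration of the IP failover responsibility described in the list above, the following sketch redistributes the public IP addresses of an unhealthy interface node across the remaining healthy nodes. This is a simplified model of ours; the actual CTDB-based Cluster Manager logic is more involved and also drives the tickle-ACK client notification.

def rebalance_public_ips(node_ips: dict[str, set[str]], unhealthy: str) -> dict[str, set[str]]:
    """Move the public IPs of an unhealthy interface node to the healthy nodes (round robin)."""
    orphaned = sorted(node_ips.pop(unhealthy, set()))
    healthy = sorted(node_ips)                      # remaining healthy interface nodes
    if not healthy:
        raise RuntimeError("no healthy interface nodes left")
    for i, ip in enumerate(orphaned):
        target = healthy[i % len(healthy)]          # spread the orphaned IPs evenly
        node_ips[target].add(ip)
        # In SONAS, clients of this IP would now be notified so they reconnect to the new node.
    return node_ips

cluster = {
    "int1": {"10.0.0.10", "10.0.0.11"},
    "int2": {"10.0.0.12", "10.0.0.13"},
    "int3": {"10.0.0.14", "10.0.0.15"},
}
print(rebalance_public_ips(cluster, unhealthy="int2"))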
(Figure 3-4: the SONAS global namespace - management node, interface nodes, and storage pods, each with two storage nodes and storage expansion units, connected by the internal IP management network, with external tape attachment.)
There are three types of nodes in a SONAS. The nodes are divided and configured according to one of three roles. All nodes are in a global cluster, and a copy of SONAS Software runs on each of the nodes.
A node performs only one of the three roles:
- Interface node: Provides the connections to the customer IP network for file serving. These nodes establish and maintain the connections to CIFS, NFS, FTP, or HTTP users, and serve the file requests. All four of these protocols can and do co-exist on the same interface node. Each interface node can talk to any of the storage nodes.
- Storage node: Acts as a storage server, and reads and writes data to and from the actual storage controllers and disks. Each storage node can talk to any of the interface nodes, and serves file and data requests from any requesting interface node. SONAS Software writes data in a wide stripe across multiple disk drives in a logical storage pool; if the logical storage pool is configured to span multiple storage nodes and storage pods, the data striping also spans storage nodes and storage pods.
- Management node: Monitors and manages the SONAS global cluster of nodes, and provides the Command Line Interface and GUI for administration. Command Line Interface commands come into the SONAS through the management node.
Notice that SONAS is a two-tier architecture: there are multiple clustered interface nodes in the interface tier and multiple clustered storage nodes in the storage tier. This is an important aspect of the design, because it allows the interface nodes (user file serving throughput) to scale independently of the storage pods and storage nodes (storage capacity and performance).
Each SONAS node is an IBM System x commercial enterprise class 2U server, and each node runs a copy of the IBM SONAS Software licensed program product (5639-SN1). SONAS Software manages the global cluster of nodes, provides clustered auto-failover, and provides the following functions:
- IBM SONAS Software manages and coordinates each of these nodes running in a peer-to-peer global cluster, sharing workload equitably, striping data, running the central policy engine, and performing automated tiered storage.
- The cluster of SONAS nodes is an all-active clustered design, based upon proven technology derived from the IBM General Parallel File System (GPFS). All interface nodes are active and serve file requests from the network, passing them to the appropriate storage nodes; any interface node can talk to any storage node. All storage nodes are active and serve file and data requests from any and all interface nodes; any storage node can respond to a request from any interface node.
- SONAS Software stripes data across disks, storage RAID controllers, and storage pods. SONAS Software also coordinates automatic node failover and failback if necessary.
- From a maintenance or failover and failback standpoint, any node can be dynamically deleted from or inserted into the global cluster. Upgrades or maintenance can be performed by taking a node out of the cluster, upgrading it if necessary, and re-inserting it into the cluster. This is a normal mode of operation for SONAS, and it is the manner in which rolling upgrades of software and firmware are performed.
- SONAS Software is designed with the understanding that, over time, various generations and speeds of System x servers will be used in the global SONAS cluster. SONAS Software understands this and is able to distribute workload equitably among interface nodes and storage nodes of various speeds within the cluster.
In order to cluster SONAS interface nodes so that they can serve the same data, the interface nodes must coordinate their locking and recovery. This coordination is done through the SONAS Cluster Manager; it is the SONAS Cluster Manager's role to manage all aspects of the SONAS interface nodes in the cluster. Clusters usually cannot outperform a standalone server to a single client, due to cluster overhead. At the same time, clusters can outperform standalone servers in aggregate throughput to many clients, and clusters can provide superior high availability. SONAS is a hybrid design that provides the best of both of these approaches. From an incoming workload allocation standpoint, SONAS uses the Domain Name Server (DNS) to perform round-robin IP address balancing, to spread workload equitably on an IP address basis across the interface nodes, as shown in Figure 3-5.
(Figure 3-5: DNS round-robin resolution of SONAS.virtual.com across the interface node IP addresses 10.0.0.10 through 10.0.0.15.)
SONAS allocates a single user network client to a single interface node, to minimize cluster overhead. SONAS Software does not rotate a single client's workload across interface nodes: that is not only unsupported by DNS or CIFS, but would also decrease performance, because caching and read-ahead are done in the SONAS interface node. It is for this reason that any one individual client is assigned, for the duration of its session, to one interface node at the time it authenticates and accesses the SONAS. At the same time, workload from multiple users, which can number into the thousands or more, is equitably spread across as many SONAS interface nodes as are available. If more user network capacity is required, you simply add more interface nodes; the SONAS scale out architecture thus provides linear scalability as the number of users grows. Independently of the application or the interface node, SONAS Software always stripes data across disks, storage RAID controllers, and storage pods, thus providing wide data striping performance and parallelism for any file serving request, by any interface node. This is shown in Figure 3-6.
Figure 3-6 SONAS interface node workload allocation - parallelism at storage level
SONAS provides a single high performance NFS, CIFS, FTP, or HTTPS connection for any one individual network client. In aggregate, multiple users are IP-balanced equitably across all the interface nodes, thus providing scale out capability: the more interface nodes, the more user capacity that is available. SONAS was designed to make the connection a standard CIFS, NFS, FTP, or HTTP connection, in order to allow attachability by as wide a range of standard clients as possible, and to avoid requiring the installation of any client-side code.
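The following sketch models the allocation scheme just described: DNS-style round robin hands each new client one interface node IP address, and the client then stays on that node for the duration of its session. The host name and addresses follow Figure 3-5; the code is only a behavioral model, not DNS or SONAS internals.

import itertools

# Interface node IP addresses published under a single DNS name (as in Figure 3-5).
interface_node_ips = ["10.0.0.10", "10.0.0.11", "10.0.0.12",
                      "10.0.0.13", "10.0.0.14", "10.0.0.15"]
round_robin = itertools.cycle(interface_node_ips)

sessions: dict[str, str] = {}   # client -> interface node IP for the session

def connect(client: str) -> str:
    """Assign a client to one interface node for the duration of its session."""
    if client not in sessions:                 # existing sessions stay where they are
        sessions[client] = next(round_robin)   # DNS-style round robin for new clients
    return sessions[client]

for c in ["clientA", "clientB", "clientC", "clientA"]:
    print(c, "->", connect(c))

Note that the repeated connection by clientA returns the same interface node, which is what allows caching and read-ahead in that node to stay effective.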
This state and session information and metadata for each user and connection is stored in memory in each node in a high performance clustered design, along with appropriate shared locking and any byte-range locking requests, as well as other information needed to maintain cross-platform coherency between CIFS, NFS, FTP, and HTTP users. Notification technologies called tickle-ACK are used to tickle the application and cause it to reset the network connection.
(Figure: interface node failover - the public IP addresses 10.0.0.10 through 10.0.0.15 of SONAS.virtual.com are regrouped across the remaining interface nodes serving Client I and Client II.)
At the time of the failover of the node, if the session or application is not actively transferring data over a connection, the failover can usually be transparent to the client. If the client is transferring data, the application service failover might still be transparent, depending on the protocol, the nature of the application, and what is occurring at the time of the failover. In particular, if the client application, in response to the SONAS failover and SONAS notifications, automatically retries the network connection, then it is possible that the user will not see an interruption of service. Examples of software that do this include many NFS-based applications, as well as Windows applications that retry the network connection, such as the Windows XCOPY utility. If the application does not do automatic network connection retries, or the protocol in question is stateful (that is, CIFS), then a client-side reconnection might be necessary to re-establish the session; unfortunately, for most CIFS connections this is the likely case. For more information about interface node cluster failover, see Chapter 6, Backup and recovery, availability, and resiliency functions on page 181.
In SONAS, there is the concept of a storage pod as a modular building block of storage, as illustrated in Figure 3-4 on page 80. Each storage pod contains between 60 and 240 disk drives, arranged in groups of 60 drives, and each storage pod contains two active-active storage nodes. The two storage nodes provide resiliency and backup for each other in the storage pod: if a storage node fails, the remaining healthy storage node in the storage pod takes over the load of the failed storage node. An individual storage node is very high in capacity and throughput, to allow good operation in the event of a failed storage node.
Furthermore, as we saw in Chapter 1, Introduction to IBM Scale Out Network Attached Storage on page 1, SONAS can be configured to perform storage load balancing by striping data across any of the following components:
- Disks
- Storage RAID controllers
- Storage pods
Logical storage pools in SONAS can be defined, and usually are defined, to span disks, storage RAID controllers, and storage pods. Furthermore, the data striping means that files are spread in blocksize chunks across these components in order to achieve parallel performance and balanced utilization of the underlying storage hardware. One of the purposes of this dispersion of SONAS data is to mitigate the effect of a failed storage node. Files in SONAS are spread across multiple storage nodes and storage pods, with the intent that only a small portion of any file is affected by a storage node failure, and only in terms of performance; data availability is maintained. As the SONAS grows larger and scales out to more and more storage nodes, the failure of any one storage node becomes a smaller and smaller percentage of the overall storage node aggregate capacity. The SONAS scale out architecture thus has the effect of reducing the impact of a storage node failure as the SONAS grows.
Just as with interface nodes, storage nodes can be dynamically removed from and re-inserted into the cluster. Similar to the interface node methodology, the method of upgrade or repair of a storage node is to take the storage node out of the cluster; the remaining storage node in the storage pod dynamically assumes the workload of the pod. The offline storage node can then be upgraded or repaired, and then re-inserted into the cluster. When this is done, workload is automatically rebalanced across the storage nodes in the storage pod. During all of these actions, the file system stays online and available, and file access for the users is maintained.
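To make the wide data striping concrete, here is a small sketch that spreads the blocks of a file round-robin across the NSDs (RAID arrays) of all storage pods in a logical storage pool. The block size and the pod/NSD layout are assumed values; real SONAS/GPFS block allocation is considerably more sophisticated.

BLOCK_SIZE = 256 * 1024   # assumed file system block size, in bytes

# Hypothetical logical storage pool spanning two storage pods, each with a few NSDs (RAID arrays).
pool = [("pod1", "nsd1"), ("pod1", "nsd2"), ("pod1", "nsd3"),
        ("pod2", "nsd4"), ("pod2", "nsd5"), ("pod2", "nsd6")]

def stripe_file(file_size: int) -> list[tuple[int, str, str]]:
    """Return (block number, pod, nsd) placements for a file, striped round-robin."""
    blocks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    return [(b, *pool[b % len(pool)]) for b in range(blocks)]

# A 2 MB file lands on every NSD in both pods, so reads and writes proceed in parallel,
# and the loss of one storage node affects only a fraction of the blocks' performance.
for placement in stripe_file(2 * 1024 * 1024):
    print(placement)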
3.3.5 Summary
We have seen that the IBM SONAS provides equitable workload allocation to a global cluster of interface nodes, including high availability through clustered auto-failover. In summary:
- All SONAS nodes operate in a global cluster.
- Workload allocation to the interface nodes is done in conjunction with external Domain Name Servers.
- The global SONAS cluster offers dynamic failover/failback and, if the application supports network connection retries, can provide transparent failover of the interface nodes.
- Normal upgrade and maintenance for SONAS nodes is by dynamic removal and insertion of nodes into and out of the cluster.
We now proceed to discuss in more detail the SONAS Software components that provide the functionality to support these capabilities.
Figure 3-8 All file accesses traverse the SONAS Cluster Manager including concurrent accesses
The SONAS Cluster Manager is logically positioned in the file access path, because it is the SONAS Cluster Manager that provides the mapping of the multiple protocols onto the SONAS parallel file system, simultaneously managing the necessary locking to guarantee data integrity across all the interface nodes. Finally, if necessary, the SONAS Cluster Manager provides the failover and failback capabilities if an interface node experiences an unhealthy or failed state. The SONAS Cluster Manager works together with the SONAS parallel file system to provide concurrent file access from multiple platforms in the following way:
- SONAS Cluster Manager: Provides the mapping and concurrency control across multiple interface nodes and, when multiple protocols access the same file, provides locking across users and across the interface nodes.
- SONAS parallel file system: Provides the file system concurrent access control at the level of the physical file management, provides the ability to manage and perform parallel access, provides the NFSv4 access control list security, and provides the foundational file system data integrity capabilities.
We discuss the parallel file system in more detail a little later. First, let us explain how the IBM SONAS Cluster Manager provides multiple concurrent interface node file serving with data integrity, across the following network protocols at the same time:
- CIFS (typically Windows users)
- NFS (typically UNIX or Linux users)
- FTP
- HTTPS
The SONAS Software Cluster Manager functionality supports multiple exports and shares of the file system, over multiple interface nodes, by providing distributed lock, share, and lease support. The SONAS Cluster Manager is transparent to the NFS, CIFS, FTP, and HTTPS clients; these clients are unaware of, and do not need to know, that the SONAS Cluster Manager is servicing and managing these multiple protocols concurrently. When sharing files and directories, SONAS reflects changes made by one authorized user to all other users that are sharing the same files and directories. As an example, if a SONAS-resident file is renamed, changed, or deleted, this fact is immediately and properly reflected to all SONAS-attached clients on other platforms, including those using other protocols, as shown in Figure 3-9.
Figure 3-9 SONAS concurrent access to a shared directory from multiple users
SONAS Software employs sophisticated distributed cluster management, metadata management, and a scalable token management system to provide data consistency while supporting concurrent file access from thousands of users. All read and write locking types are kept completely coherent between NFS and CIFS clients, globally, across the cluster. SONAS Cluster Manager provides the capability to export data from a collection of nodes using CIFS, NFSv2, NFSv3, FTP, and HTTPS.
Figure 3-10 SONAS provides concurrent access and locking from multiple platforms
SONAS Software has multiple facilities to provide scalability. These include the distributed ability for multiple nodes to act as token managers for a single file system. SONAS Software also provides scalable metadata management through a distributed metadata management architecture, allowing all nodes of the cluster to dynamically share in performing file metadata operations while accessing the file system. This distinguishes SONAS from other cluster NAS filer architectures that might have a centralized metadata server handling fixed regions of the file namespace. A centralized metadata server can often become a performance bottleneck for metadata intensive operations, and can represent a scalability limitation and single point of failure. SONAS solves this problem by managing metadata at the node that is using the file or, in the case of parallel access to the file, at a dynamically selected node that is using the file.
- Clustered CIFS, provided by the Clustered Trivial Data Base (CTDB), which clusters the SONAS CIFS component and monitors interface node services, including start, failover, and failback of public IP addresses
- FTP daemon
- HTTPS daemon
In the SONAS Cluster Manager, the software used includes the following components:
- The SONAS CIFS component, which provides Windows CIFS file serving, including mapping CIFS semantics, user IDs, security identifiers, NTFS access control lists, and other required CIFS mappings to the underlying SONAS parallel file system
- The Clustered Trivial Data Base (CTDB), which in combination with the SONAS CIFS component provides a fully clustered CIFS capability
Working together, the SONAS Cluster Manager and these components provide true multi-protocol, active/active clustering within a single global namespace, spanning multiple interface nodes, all clustered transparently to applications.
CTDB assures data integrity by tracking and ensuring that only the owning interface node has the most recent copy of the record, and that only the proper owning node has the ability to update the record. When required, CTDB is specifically architected to provide the high performance, lightweight messaging framework and trivial databases needed to quickly cross-notify, cross-share, and properly pass ownership among requesting interface nodes, so that records are updated with integrity and high performance.
Here are specific enhancements that IBM has made in the SONAS Cluster Manager for CIFS:
- Clustering enhancements: multiple exports and shares of the same file system over multiple nodes, including distributed lock, share, and lease support
- Failover capabilities on the server
- Integration with the NFS, FTP, and HTTPS daemons with regard to locking, failover, and authorization
- Performance optimization with the SONAS file system (with GPFS)
- NTFS Access Control List (ACL) support in the SONAS CIFS component, using the native GPFS NFSv4 ACL support
- HSM support within the SONAS CIFS component to allow destaging of files to tape and user-transparent recall
- VSS integration of SONAS Snapshots
In the following sections, we examine SONAS authentication and authorization.
Care must be taken with time synchronization: all of the SONAS nodes must have their time set by a network time protocol (NTP) server, and the same server must synchronize the time for the authentication server, such as an Active Directory (AD) server and/or Kerberos KDC server. Note that an Active Directory (AD) domain controller can be used as an NTP time source. To set up a SONAS system, obtain administrative information for the selected authentication server in advance; examples of the information required are the administrative account, password, SSL certificate, and Kerberos keytab file. Refer to the Managing authentication server integration chapter in the IBM Scale Out Network Attached Storage Administrator's Guide, GA32-0713, for the information required for each authentication protocol. Additional information can also be found in Authentication using AD or LDAP on page 266.
The SONAS authentication of users occurs according to the diagram shown in Figure 3-11.
(Figure 3-11: SONAS user authentication - clients without Kerberos authenticate through SONAS to the external authentication server; clients with Kerberos obtain a ticket from the authentication server and present it to SONAS.)
Clients without Kerberos (1) send a user authentication request to SONAS, which (2) sends the authentication request to the external authentication server. The authentication server then (3) sends a response to SONAS, and SONAS then (4) sends the response back to the client. In the case of Kerberos, the client (1) sends a user authentication request directly to the authentication server, which also has a Kerberos Key Distribution Center (KDC). The authentication server then (2) replies with a Kerberos ticket for the client. The client then (3) sends a request to SONAS with the Kerberos ticket that was granted, and SONAS then (4) sends the response back to the client. Kerberos tickets have a lease time before expiring, so a client can access SONAS multiple times without requiring re-authentication with the KDC.
SONAS ID mapping
SONAS Software is designed to support multiple platforms and protocols all accessing the SONAS concurrently. However, Windows CIFS systems use Security Identifiers (SIDs) internally to identify users and groups, whereas UNIX systems use a 32-bit user ID / group ID (UID/GID). To make both worlds work together in SONAS and provide full concurrent and consistent access from both platforms, SONAS performs a user mapping between the Windows SID and the UNIX UID/GID. Because the underlying SONAS data is stored in a POSIX-compliant, UNIX and Linux style file system based on IBM GPFS, all Access Control List (ACL) and access information is ultimately controlled using the SONAS GPFS UID/GID, which is the standard way of controlling user access in UNIX based environments. Therefore, when accessing SONAS data from UNIX or Linux systems using the NFS protocol, there are no issues, because their UID/GID maps directly to the UNIX system UID/GID.
However, when Windows clients access SONAS, the SONAS Software provides the mapping between the Windows user Security identifier (SID) and the internal file system UID to identify users. In SONAS, depending on the type of authentication used, various methods are applied to solve this UID to SID mapping requirement. The SONAS user ID mapping flow is shown in Figure 3-12.
(Figure 3-12: SONAS ID mapping - Windows AD supplies user or group names and Microsoft Security IDs (SIDs) over CIFS, NFS clients supply a UID/GID directly, and the SONAS file system resolves both through a shared ID map database. NFS provides the UID/GID at the client level only, so the mapping must happen on the NFS client, either by creating users with the correct IDs manually or by using Microsoft AD Services for UNIX (SFU). SONAS maps user names and groups to UNIX user and group IDs consistently across all nodes.)
To solve the ID mapping issue, SONAS supports multiple authentication server integrations:
- LDAP, and LDAP with MIT Kerberos
- Samba primary domain controller (PDC) for Microsoft Windows NT version 4 (NT4)
- Active Directory Server (ADS, which itself works as Kerberos), and AD with Microsoft Windows Services for UNIX (SFU)
- Network Information Service (NIS) as an extension to AD/Samba PDC
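As an illustration of the SID-to-UID mapping problem, the sketch below derives a UNIX UID from the relative identifier (RID) at the end of a Windows SID by adding a per-domain base offset, and records the result in a shared map so that every node resolves the same user to the same UID. This is a simplified, hypothetical scheme; the method SONAS actually uses depends on the authentication integration chosen from the list above.

# Hypothetical, simplified SID -> UID mapping (not the actual SONAS algorithm).
DOMAIN_BASE_UID = {"S-1-5-21-111-222-333": 10_000_000}   # per-domain UID offset (example value)

shared_id_map: dict[str, int] = {}   # stands in for the shared ID map database

def sid_to_uid(sid: str) -> int:
    """Map a Windows SID to a UNIX UID consistently across all nodes."""
    if sid in shared_id_map:                      # reuse an existing mapping
        return shared_id_map[sid]
    domain, rid = sid.rsplit("-", 1)              # split off the relative identifier
    uid = DOMAIN_BASE_UID[domain] + int(rid)      # deterministic: same SID -> same UID
    shared_id_map[sid] = uid
    return uid

print(sid_to_uid("S-1-5-21-111-222-333-1104"))    # for example, 10001104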
Figure 3-13 SONAS Software - parallel file system, policy engine, scan engine
We discuss core SONAS file system concepts, including the high-performance file system itself, the manner in which the policy engine and scan engine provide the foundation for SONAS Information Lifecycle Management (ILM) (discussed in detail in SONAS data management services on page 107), and the characteristics of the SONAS file system for configuration, performance, scalability, and storage management. As mentioned, the SONAS file system is based upon the IBM General Parallel File System (GPFS), so if you are familiar with IBM GPFS, you will be quite familiar with the concepts discussed in this section. The SONAS file system offers more than a traditional file system; it is the core foundation for an end-to-end NAS file management infrastructure within SONAS. IBM utilizes IBM GPFS technology to provide a proven high performance parallel grid file system architecture, with high reliability and high scalability. In addition to providing file storage capabilities, the SONAS file system also provides storage management and information lifecycle management tools, centralized administration, and facilities that, in conjunction with the SONAS Cluster Manager, allow for shared high performance access from multiple NAS protocols simultaneously. IBM SONAS was designed to build on the long history of IBM GPFS as a high performance parallel file system, supporting many types of applications ranging from relational databases, to digital media, to high performance analytics, to scalable file serving.
The core GPFS technology is installed today across many industries, including financial, retail, and government applications. GPFS has been tested in very demanding large environments for over 15 years, making GPFS a solid foundation for use within the SONAS as the central parallel file system. For more detailed information about configuring SONAS file systems, see File system management on page 354. We now discuss the SONAS file system in greater detail.
The SONAS cluster can contain up to 256 mounted file systems. There is no limit placed upon the number of simultaneously opened files within a single file system.
Let us see how SONAS scalability is achieved using the expandability of the SONAS building block approach. Figure 3-14 shows a SONAS interface node performing a small single read or write on a disk. Because this read or write is small in size, the resources of one path, one storage node, and one RAID controller and disk are sufficient to handle the I/O operation.
Figure 3-14 A small single read or write in the SONAS file system
The power of the SONAS file system, however, is in its ability to read or write files in parallel chunks of the defined blocksize, across multiple disks, controllers, and storage nodes inside a storage pod, as shown in Figure 3-15.
Figure 3-15 A highly parallel read or write in the SONAS file system; this can be one file
The scalability of the SONAS file system does not stop with the very large parallel read and write capability of a single storage pod, shown in Figure 3-16.
Figure 3-16 SONAS file system parallel read/write capability to one storage pod
If the file is big enough, or if the aggregate workload is big enough, the SONAS file system easily expands to multiple storage pods in parallel as shown in Figure 3-17.
Figure 3-17 SONAS file system parallel read/write capability to multiple storage pods
We can see that the SONAS file system provides the capability for extremely high parallel performance. This is especially applicable to modern-day analytics-intensive data types, with their associated large data objects and unstructured data. The SONAS file system recognizes typical access patterns, such as sequential, reverse sequential, and random, and optimizes I/O access for these patterns.
The SONAS file system implements a sophisticated distributed metadata server function, in which multiple nodes act as, share, acquire, and relinquish the roles of token managers for a single file system. This distributed architecture avoids metadata server bottlenecks, and has been proven to scale to very large file systems. Along with distributed token management, the SONAS file system provides scalable metadata management by allowing all nodes of the cluster accessing the file system to perform file metadata operations. This key and unique feature distinguishes SONAS from other cluster file systems, which have a centralized metadata server handling fixed regions of the file namespace. The SONAS file system design avoids a centralized metadata server, to avoid a performance bottleneck for metadata intensive operations. This also improves availability, because the distributed metadata server function provides additional insulation against a metadata server single point of failure. SONAS implements the GPFS technology that solves this problem by managing metadata at the node that is using the file or, in the case of parallel access to the file, at a dynamically selected node that is using the file.
SONAS filesets
SONAS also utilizes a file system object called a fileset. A fileset is a directory subtree of a file system namespace that in many respects behaves like an independent file system. Filesets provide a means of partitioning the file system to allow administrative operations at a finer granularity than the entire file system. Filesets allow the following operations:
- Defining quotas on both data blocks and inodes
- Being specified in a policy to control initial data placement, migration, and replication of the files' data
SONAS supports a maximum of 1000 filesets per file system.
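As a conceptual example of per-fileset quotas on data blocks and inodes, the following sketch rejects file creations that would exceed a fileset's limits. The class and its limits are illustrative assumptions, not the SONAS quota implementation or its administration interface.

from dataclasses import dataclass

@dataclass
class FilesetQuota:
    block_limit: int          # maximum data bytes allowed in the fileset
    inode_limit: int          # maximum number of files (inodes) allowed
    blocks_used: int = 0
    inodes_used: int = 0

    def create_file(self, size: int) -> None:
        """Charge a new file against both the inode and block quotas."""
        if self.inodes_used + 1 > self.inode_limit:
            raise OSError("fileset inode quota exceeded")
        if self.blocks_used + size > self.block_limit:
            raise OSError("fileset block quota exceeded")
        self.inodes_used += 1
        self.blocks_used += size

payroll = FilesetQuota(block_limit=10 * 1024**3, inode_limit=100_000)  # 10 GiB, 100,000 files
payroll.create_file(512 * 1024**2)   # a 512 MiB file fits
print(payroll.blocks_used, payroll.inodes_used)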
Access control
The SONAS file system uses NFSv4 enhanced access control to allow all SONAS users, regardless of the NAS protocol by which they access the SONAS, to take advantage of this robust level of central security and access control to protect directories and files. The SONAS file system implements NFSv4 access control lists (ACLs) in addition to traditional ACL support; the traditional SONAS ACLs are based on the POSIX model. Access control lists (ACLs) extend the base permissions, or standard file access modes, of read (r), write (w), and execute (x) beyond the three categories of file owner, file group, and other users, to allow the definition of additional users and user groups. In addition, SONAS introduces a fourth access mode, control (c), which can be used to govern who can manage the ACL itself.
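To show how the extended access modes can work together, here is a sketch of an ACL check that supports the read (r), write (w), execute (x), and control (c) modes for the file owner, owning group, other users, and additional named users and groups. The data layout is a simplification of ours, not the NFSv4 ACL format that the SONAS file system actually stores.

from dataclasses import dataclass, field

@dataclass
class Acl:
    owner: str
    group: str
    owner_modes: str = "rwxc"          # the owner may also manage the ACL (control)
    group_modes: str = "rx"
    other_modes: str = "r"
    named: dict[str, str] = field(default_factory=dict)   # extra users/groups -> modes

    def allows(self, user: str, groups: set[str], mode: str) -> bool:
        """Check one access mode: r, w, x, or c (manage the ACL itself)."""
        if user == self.owner:
            return mode in self.owner_modes
        if user in self.named:
            return mode in self.named[user]
        granted = set()
        if self.group in groups:
            granted |= set(self.group_modes)
        for g in groups & self.named.keys():
            granted |= set(self.named[g])
        if not granted:
            granted = set(self.other_modes)
        return mode in granted

acl = Acl(owner="sally", group="finance", named={"auditors": "r", "bob": "rwc"})
print(acl.allows("bob", {"staff"}, "c"))        # True: bob may manage this ACL
print(acl.allows("joe", {"finance"}, "w"))      # False: the finance group has only r and x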
Fileset
The SONAS file system provides the concept of a fileset, which is a subtree of the file system namespace and provides a way to partition the global namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and can be specified in a user-defined policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated are based on a set of SONAS file system rules in a user-defined policy.
Filesets can be used to change the replication (mirroring) status at the file level, allowing fine grained control over the space used for data availability. You can use a policy that says:
replicate all files in /database/payroll which have the extension *.dat and are greater than 1 MB in size to storage pool #2.
In addition, file management policies allow you to prune the file system, deleting files as defined by policy rules. File management policies can use more attributes of a file than placement policies, because after a file exists, more is known about the file. In addition to the file placement attributes, you can now utilize attributes such as last access time, size of the file, or a combination of user and file size. This can result in policies such as: delete all files with a name ending in .temp that have not been accessed in 30 days, move all files that are larger than 2 GB to pool2, or migrate all files owned by Sally that are larger than 4 GB to the Nearline SAS storage pool. Rules can also include attributes related to a pool instead of a single file, using the threshold option. Using thresholds, you can create a rule that moves files out of the high performance pool if it is more than 80% full, for example. The threshold option comes with the ability to set high, low, and pre-migrate thresholds: GPFS begins migrating data at the high threshold and continues until the low threshold is reached, and if a pre-migrate threshold is set, GPFS continues to copy data to Tivoli Storage Manager until the pre-migrate threshold is reached. This allows the data to continue to be accessed in the original pool until it is quickly deleted to free up space the next time the high threshold is reached. The SONAS file system policy rule syntax is based on the SQL 92 syntax standard and supports multiple complex statements in a single rule, enabling powerful policies. Multiple levels of rules can be applied, because the complete policy rule set is evaluated for each file when the policy engine executes. All of these data management functions are described in more detail in SONAS data management services on page 107.
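The following sketch models how a threshold-driven migration rule of the kind described above might behave: when pool occupancy passes the high threshold, files are migrated out until the low threshold is reached, and pre-migration then continues copying data until the pre-migrate threshold is reached. It is a conceptual model only; real SONAS policies are written in the SQL-92-based rule syntax and evaluated by the GPFS policy engine, not by code like this.

def run_threshold_rule(pool_used: float, pool_capacity: float,
                       high: float, low: float, premig: float,
                       candidates: list[tuple[str, int]]) -> tuple[list[str], list[str]]:
    """Conceptual model of THRESHOLD(high, low, premig) processing on one storage pool.

    candidates: (file name, size), for example ordered by last access time.
    Returns (files migrated out of the pool, files only pre-migrated/copied).
    """
    migrated, premigrated = [], []
    occupancy = lambda used: 100.0 * used / pool_capacity
    if occupancy(pool_used) < high:
        return migrated, premigrated              # below the high threshold: no action
    unprotected = pool_used                       # data in the pool with no external copy yet
    for name, size in candidates:
        if occupancy(pool_used) > low:
            migrated.append(name)                 # migrate: frees space in this pool
            pool_used -= size
            unprotected -= size
        elif occupancy(unprotected) > premig:
            premigrated.append(name)              # pre-migrate: copy now, delete quickly later
            unprotected -= size
        else:
            break
    return migrated, premigrated

files = [("a.dat", 10), ("b.dat", 10), ("c.dat", 10), ("d.dat", 10)]
print(run_threshold_rule(pool_used=85, pool_capacity=100,
                         high=80, low=60, premig=50, candidates=files))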
Figure 3-18 SONAS file system two-tier architecture with internal GPFS NSD clients and NSD servers
As shown previously, the fact that the disks are remote to the interface nodes is transparent both to the interface nodes themselves and to the users. The storage nodes (the NSD server nodes) are responsible for serving disk data blocks across the internal InfiniBand network. The SONAS file system is thus composed of storage pod building blocks, in which a balanced number of storage nodes (NSD storage servers) are preconfigured within each SONAS storage pod to provide optimal performance from the disk storage. The SONAS cluster runs on enterprise-class commercial Intel-based servers running a Linux-based kernel. The SONAS file system nodes use a native InfiniBand protocol built on Remote Direct Memory Access (RDMA) technology to transfer data directly between the interface node NSD client memory and the storage node NSD server memory, exploiting the 20 Gbit/sec per port data transfer rate of the current SONAS internal InfiniBand switches, maximizing throughput, and minimizing node CPU utilization.
The SONAS file system is highly available and fault tolerant: data protection mechanisms include journaling, replication, and mirroring; an internal peer-to-peer global cluster heartbeat mechanism allows recovery from multiple disk, node, and connectivity failures; and recovery software mechanisms are implemented in all layers. Let us now examine the SONAS data management services in more detail.
We also discuss the role and usage of Tivoli Storage Manager together with external Tivoli Storage Manager servers to provide accelerated backup and restore, and to provide HSM to external storage pools. Finally, we describe local data resiliency using Snapshots, and remote resiliency using asynchronous replication.
3.6.1 SONAS: Using the central policy engine and automatic tiered storage
SONAS uses policies to control the lifecycle of the files that it manages and, consequently, the costs of storing data, by automatically aligning data to the appropriate storage tier based on the policy rules set up by the SONAS administrator.
Figure 3-20 illustrates a tiered storage environment that contains multiple storage tiers. Each tier has specific performance characteristics and associated costs, for example, poolfast contains fast and expensive disk, whereas pooltape contains relatively inexpensive tapes.
Performance comes at a price and is the main cost differentiator in storage acquisitions. For this reason, setting policies can help control costs by using the appropriate storage tier for specific sets of data and by making room on the more expensive tiers for new data with higher performance requirements. The SONAS policy implementation is based on and uses the GPFS policy implementation. Files reside in SONAS storage pools; policies are assigned to files and control the placement and movement of files between storage pools. A SONAS policy consists of a collection of rules, and the rules control which actions are executed and against which files the actions are performed, so the smallest entity controlled by a rule is a file. SONAS policy rules are single statements that define an operation, such as migrate or replicate a file. SONAS has three types of policies:
- Initial file placement: These rules control the placement of newly created files in a specific storage pool.
- File management: These rules control the movement of existing files between storage pools and the deletion of old files. Migration policies are used to transfer data between the SONAS storage pools and to the external HSM storage pool, and to control replication of SONAS data.
- File restore: These rules control what happens when data gets restored to a SONAS file system.
Policy rules are SQL-like statements that specify conditions that, when true, cause the rule to be applied. Conditions that cause GPFS to apply a rule include these:
- Date and time when the rule is evaluated, that is, the current date and time
- Date and time when the file was last accessed
- Date and time when the file was last modified
- Fileset name
- File name or extension
- File size
- User ID and group ID
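For example, an initial placement rule built from a file-extension condition might look like the following sketch. The rule, pool, and extension names are illustrative only. A rule of this kind is evaluated once, at file creation time, which is why it can use only attributes that are known before the file is written, such as the name, the fileset, and the user and group IDs.
RULE 'videoPlacement' SET POOL 'pool_2'
     WHERE UPPER(NAME) LIKE '%.MPG'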
SONAS evaluates policy rules in order, from first to last, as they appear in the policy. The first rule that matches determines what is to be done with that file. Example 3-1 shows sample rule syntax.
Example 3-1 Sample rule syntax
RULE 'mig1' MIGRATE FROM POOL 'pool_1'
     THRESHOLD(90,80,70)
     WEIGHT(KB_ALLOCATED)
     TO POOL 'pool_2'
RULE 'del1' DELETE FROM POOL 'pool_1'
     WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
       AND lower(NAME) LIKE '%.tmp'
Each SONAS filesystem is mapped to storage pools. The default pool for a filesystem is the system pool, also called pool1. A file system can have one or more additional storage pools in addition to the system pool, and each storage pool is associated with one or more NSDs, or LUNs. SONAS also manages external storage pools. An external storage pool is not mapped to standard NSD devices; it is a mechanism for SONAS to store data in an external manager such as Tivoli Storage Manager. SONAS interfaces with the external manager using a standard protocol called the Data Management API (DMAPI), which is implemented in the SONAS GPFS filesystem. Policies control the location of files among storage pools in the same filesystem. Figure 3-21 shows a conceptual representation of a filesystem, pools, and NSDs.
A filesystem is managed by one active policy, policy1 in the example. The initial file placement policies control the placement of new files; file placement policies are evaluated and applied at file creation time. If placement policies are not defined, all new files are placed in the system storage pool. Migration and deletion rules, or file management rules, control the movement of files between SONAS disk storage pools and external storage pools such as Tivoli Storage Manager HSM, as well as the deletion of old files. Migration and deletion rules can be scheduled using the cron scheduler. File migration between pools can also be controlled by specifying thresholds. Figure 3-22 shows a conceptual representation of these rules.
SONAS introduces the concepts of tiered and peered storage pools:
- Tiered pools: The pools that NSDs are assigned to can be tiered in a hierarchy using GPFS file management policies. These hierarchies are typically used to transfer data between a fast pool and a slower pool (from Pool1 to Pool2) using migration. When coupled with HSM, data flows in a hierarchy from Pool1 to Pool2 to Pool3 (HSM).
- Peered pools: The pools that NSDs are assigned to can be operated as peers using GPFS initial file placement policies. These policies allow files to be placed, according to rules, in either the fast pool (Pool1) or the slower pool (Pool2). When coupled with HSM, data flows to either Pool1 or Pool2 based on initial file placement policies, and then from both Pool1 and Pool2 the data flows to Pool3 (HSM) based on file management policies.
To simplify implementation of HSM and storage pooling, SONAS provides templates for various standard usage cases. Customized cases can be created from the default templates by using the SONAS CLI. The standard usage cases, also called ILM profiles, are shown in the diagram in Figure 3-23.
Figure 3-23 Standard ILM policy profiles: peered pools (placement policies only); tiered pools (files placed in pool1 and then moved to pool2); default pool and HSM (files placed in pool1, then moved to the TSM HSM pool3); peered pools and HSM (placement policies for pool1 and pool2, with migration from pool1 and pool2 to pool3); tiered pools and HSM (files placed in pool1, then migrated to pool2, and then to the TSM HSM pool3)
The standard ILM policy profiles are based on the assumption that pool1 is the fastest pool, using the fastest storage devices such as SAS disks, and that pool2 is based on less expensive disk such as Nearline SAS. SONAS GPFS metadata must always reside in the fastest storage pool, pool1 in our examples, because it is the data with the highest I/O requirements when SONAS GPFS file system scan operations are performed. For additional information about the configuration of SONAS policy rules, see SONAS policies on page 159.
3.6.2 Using and configuring Tivoli Storage Manager HSM with SONAS basics
The use of SONAS HSM provides the following advantages:
- It frees administrators and users from manual file system pruning tasks, and defers the need to purchase additional disk storage.
- It allows Tivoli Storage Manager HSM to extend the SONAS disk space and automates the movement of seldom-used files to and from external nearline storage.
- It allows pre-migration, a method that sends a copy of a file to the Tivoli Storage Manager server prior to migration, allowing threshold migration to quickly provide space by simply stubbing the premigrated files.
To use the Tivoli Storage Manager HSM client, you must provide a Tivoli Storage Manager server external to the SONAS system; the server is accessed through the Ethernet connections on the interface nodes. See SONAS and Tivoli Storage Manager integration on page 119 for more information about the configuration requirements and connection of a SONAS and Tivoli Storage Manager server. The current version of SONAS requires that HSM be configured and managed using the CLI, because at the time of writing there is no GUI support for HSM. HSM migration work can cause
additional overhead on the SONAS interface nodes, especially in environments that regularly create large amounts of data and want to migrate it early, so take care when planning the timing and frequency of migration jobs. When using HSM space management on a filesystem, each file in the filesystem can be in one of three states:
- Resident, when the file resides on disk in the SONAS appliance
- Premigrated, when the file resides both on disk in the SONAS and in Tivoli Storage Manager HSM storage
- Migrated, when the file has been moved to Tivoli Storage Manager HSM storage and only a stub file remains on SONAS disk
You must ensure that sufficient network bandwidth and connectivity exist between the interface nodes selected to run HSM and the external Tivoli Storage Manager server. The Tivoli Storage Manager server has to be prepared for use by the SONAS system: a Tivoli Storage Manager storage pool to store the migrated data must be set up, the server time needs to be synchronized with the SONAS system (both systems must access the same NTP server), and Tivoli Storage Manager server authentication must be set to ON (set auth on). HSM can be added to a filesystem at the time of filesystem creation or at a later time.
Attention: HSM cannot be removed from a file system through CLI commands; support services need to be engaged.
The diagram in Figure 3-24 shows the steps that need to be performed to add HSM to a SONAS filesystem using the SONAS CLI.
The mkfs and chfs commands are used to create a new filesystem or modify a filesystem for HSM usage, because these commands allow you to add multiple NSDs and storage pools to the filesystem. The cfghsmnode command validates the connection to Tivoli Storage Manager and sets up the HSM parameters. The startbackup command can optionally be used to verify the Tivoli Storage Manager connection for a specific filesystem: if startbackup executes correctly, you know you have a valid connection to Tivoli Storage Manager for use by HSM. The cfghsmfs command adds HSM support for a given filesystem; it enables HSM support in the SONAS CIFS component and stores the HSM configuration information in the CTDB registry. You then create a policy with the mkpolicy command and set the policy for a filesystem with the setpolicy command. For more information about creating and managing policies, see SONAS policies on page 159.
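Put together, a minimal command sequence might look like the following sketch. The filesystem name (gpfs0) is hypothetical and the command options are deliberately elided; refer to the SONAS CLI reference for the exact syntax of each command.
cfghsmnode ...        # validate the Tivoli Storage Manager connection and set HSM parameters
cfghsmfs gpfs0 ...    # enable HSM support on filesystem gpfs0
mkpolicy ...          # create a policy containing the HSM migration rules
setpolicy ...         # activate the policy on the filesystem
mkpolicyrule ...      # schedule policy execution with the SONAS scheduler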
After creation of the policy, you can schedule the policy execution with the SONAS scheduler by using the mkpolicyrule command. SONAS HSM also provides the lshsmlog command to view HSM errors, as well as the lshsmstatus command to verify HSM execution status.
Snapshots can be made by administrators with the proper authority through the SONAS Management GUI, or through the SONAS Command Line Interface (CLI). The snapshot appears as a special directory called .snapshots, located in the filesystem root directory, as shown in Figure 3-25.
/fs1/file1
/fs1/file2
/fs1/subdir1/file3
/fs1/subdir1/file4
/fs1/subdir2/file5
/fs1/.snapshots/snap1/file1
/fs1/.snapshots/snap1/file2
/fs1/.snapshots/snap1/subdir1/file3
/fs1/.snapshots/snap1/subdir1/file4
/fs1/.snapshots/snap1/subdir2/file5
Figure 3-25 SONAS Snapshot appears as a special directory in the file system
Snapshots of a SONAS file system are read-only; changes are made only to the active (that is, normal, non-snapshot) files and directories. Snapshots are made only of active file systems; you cannot make a snapshot of an existing snapshot. Individual files, groups of files, or entire directories can be restored or copied back from snapshots. For additional information about configuring snapshots, see Snapshots on page 193.
Figure 3-26 SONAS Snapshots are accessible for Windows CIFS users by Windows Explorer
SONAS Snapshots are intended as a point-in-time copy of an entire SONAS file system, and preserve the contents of the file system at a single point in time. The snapshot function allows a backup or mirror program to run concurrently with user updates and still obtain a consistent copy of the file system as of the time the snapshot was created. SONAS Snapshots also provide an online backup capability that allows easy recovery from common problems, such as accidental deletion of a file, and comparison with older versions of a file.
Next, we discuss the SONAS asynchronous replication capability that is designed to address these requirements. At a high level, SONAS asynchronous replication works as follows: 1. The first step is to execute a central policy engine scan for async replication. The SONAS high performance scan engine is used for this scan. As part of the asynchronous replication, an internal snapshot will be made of both the source file system and the target file system. The first step is shown in Figure 3-27.
Figure 3-27 SONAS async replication step 1 - execute a policy, makes snapshots
2. The next step is to make a mathematical hash of the source and target snapshots, and compare them, as shown in Figure 3-28.
Figure 3-28 SONAS async replication step 2 - compare mathematical hash of snapshots
3. The final step is to exploit the parallel data transfer capabilities of SONAS by having multiple nodes participate in the transfer of the async replication changed blocks to the target remote file systems, as shown in Figure 3-29.
Figure 3-29 SONAS async replication step 3 - transfer data using multiple interface nodes
The internal snapshot at the source side assures that the data being transmitted is consistent and represents a single point in time. The internal snapshot at the target provides a point-in-time backout capability if, for any reason, the drain of the changes from source to target fails before it is complete. Let us review a few more details about SONAS asynchronous replication. SONAS asynchronous replication is designed to cope with connections that provide low bandwidth, high latency, and low reliability. The basic steps of SONAS asynchronous replication are as follows:
1. Take a snapshot of both the local and remote file systems. This ensures that we are replicating a frozen and consistent state of the source file system.
2. Collect a file path list with corresponding stat information, comparing the two sides with a mathematical hash, in order to identify what has changed.
3. Distribute the changed file list to a specified list of source interface nodes.
4. Run a scheduled process that performs rsync operations on the set of interface nodes, for a given file list, to the destination SONAS. Rsync is a well-understood open source utility that picks up the changed blocks on the source SONAS file system, streams those changes in parallel to the remote site, and writes them to the target SONAS file system.
5. The snapshot at the remote SONAS system ensures that a safety fallback point is available if there is a failure in the drain of the new updates.
6. When the drain is complete, the remote file system is ready for use.
7. Both snapshots are automatically deleted after a successful replication run.
The target SONAS system is an independent SONAS cluster that might be thousands of miles away. At the current release level, SONAS R1.1.1, asynchronous replication is available for replicating incremental changes at the file system level to one other site. Asynchronous replication is done using an IBM enhanced and IBM supported version of the open source rsync utility. The enhancements include the ability to have multiple SONAS nodes work in parallel on the rsync transfer of the files.
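Conceptually, the per-node transfer in step 4 resembles an rsync invocation driven by a file list, as in the following sketch. This is only an illustration of the mechanism; the actual SONAS implementation uses the IBM-enhanced, internally driven rsync described above, and the file list path, filesystem path, and target host name shown here are hypothetical.
# transfer only the files named in the change list, preserving attributes and hard links
rsync -aH --files-from=/var/tmp/changed_files.list /fs1/ targetsonas:/fs1/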
The asynchronous replication is unidirectional; changes on the target site are not replicated back. The replication schedule is configured through the SONAS GUI or the CLI. The minimal replication interval varies depending on the amount of data and the number of files to be sent. For additional information about how to configure asynchronous replication, see Local and remote replication on page 198.
Figure 3-30 shows a diagram of the SONAS and Tivoli Storage Manager configuration.
As compared to conventional backup software, SONAS and Tivoli Storage Manager integration is designed to provide significantly accelerated backup elapsed times and high performance HSM to external storage, by exploiting the following technologies:
- The fast SONAS scan engine is used to identify files for Tivoli Storage Manager to back up or migrate. This is much faster than standard Tivoli Storage Manager backups, or other conventional backup software, which must traverse potentially large filesystems and check each file against the Tivoli Storage Manager server. The SONAS scan engine is part of the SONAS file system and knows exactly which files to back up or migrate; it builds a list of files, the filelist, which is then passed to Tivoli Storage Manager for processing. The standard operation of Tivoli Storage Manager, in contrast, requires it to traverse all files in the file system and send the information to the Tivoli Storage Manager server to determine which files need a backup and which files are already present on the server.
- Multiple SONAS interface nodes can be configured to work in parallel so that multiple Tivoli Storage Manager clients can stream data to the Tivoli Storage Manager server at an accelerated rate. The SONAS Software distributes parts of the filelist as backup jobs to multiple Tivoli Storage Manager clients configured on a given set of interface nodes. Each interface node then operates in parallel on its own subset of the files in the filelist, and each Tivoli Storage Manager process can establish multiple sessions to the Tivoli Storage Manager server.
Tivoli Storage Manager customers can make use of their existing Tivoli Storage Manager servers to back up SONAS, if the server has enough capacity to accommodate the new workload. Configuring SONAS to perform Tivoli Storage Manager functions requires only a few commands; these commands need to be issued both on the Tivoli Storage Manager server and on the SONAS system, and they perform both the initial configuration and the scheduling of the backup operations. HSM migration operations are configured separately using the policy engine, as discussed in SONAS data management services on page 107. In SONAS, Tivoli Storage Manager backup is performed over the LAN through one or more interface nodes, and these connect to one or more Tivoli Storage Manager servers. It is not
possible to do LAN-free backup at this time from SONAS directly to storage devices managed by the Tivoli Storage Manager server. For more information about how to configure SONAS with Tivoli Storage Manager, see Backup and restore of file data on page 185.
Consider using a Tivoli Storage Manager disk pool as the primary pool to store data. If you configure the disk pool to be larger than the amount of data that normally gets backed up per backup run, so that all data first gets copied to disk, then no tape mount is required during a backup. Depending on the amount of data in SONAS, it might be necessary to have one dedicated Tivoli Storage Manager server per filesystem, considering that one SONAS filesystem can contain 2 billion files. If you need to back up large files to Tivoli Storage Manager, larger than 1 MB, you might consider sending them directly to tape without storing them in a disk storage pool; in that case, you need to configure as many tape drives as the number of parallel sessions you have configured to Tivoli Storage Manager in SONAS. When using SONAS HSM, which migrates data outside the SONAS environment, consider using tape as the final destination of the data, because using disk defeats the purpose of migration. When using HSM to tape, remember to plan for the application delay in accessing the data, caused by the time required to mount and position the tape and then to recall the data to SONAS disk.
The Tivoli Storage Manager backup does not use the classical process of traversing the filesystem, comparing the client contents with those on the server, and identifying the changes, because this is time-consuming due to the interaction between the filesystem, the Tivoli Storage Manager client, and the remote Tivoli Storage Manager server. Instead, the SONAS Software uses the high performance scan engine and the policy engine to identify changes in the filesystem and to generate the list of files that need to be expired and the list of files that need to be backed up. Various scripts are provided with the SONAS Software to define the interface nodes involved in the backup, the relationship of which filesystem is backed up to which Tivoli Storage Manager server, and to schedule, start, and stop backup and restore operations.
Do not consider the use of SONAS HSM with Tivoli Storage Manager as a replacement for backups; HSM must be viewed as an external storage extension of the local SONAS disk storage. A Tivoli Storage Manager backup implies two concepts: first, the backup is a copy of the original file, regardless of where the original file is, which can be either inside a SONAS filesystem or in Tivoli Storage Manager external storage; second, the backup file can exist in multiple versions inside Tivoli Storage Manager storage, based on the Tivoli Storage Manager backup policies you configure. Tivoli Storage Manager backups allow you to restore a file that has been damaged or lost, either because of deletion or logical corruption of the original file, or because of a media failure in SONAS storage or in Tivoli Storage Manager storage. When a file is migrated to external Tivoli Storage Manager HSM storage, there is still only one copy of the file available, because the original is deleted on the SONAS file system and replaced by the Tivoli Storage Manager HSM stub file. Also, HSM with Tivoli Storage Manager maintains only the current copy of the file, giving no opportunity to store multiple versions. In comparison, Tivoli Storage Manager backup/archive (or typically any backup/archive software) gives you the full ability to store multiple backup versions of a file, and to track and manage these backup copies in an automated way.
It is a Tivoli Storage Manager best practice to back up a file before the file is migrated by Tivoli Storage Manager HSM to external storage. With proper configuration, you can specify in the Tivoli Storage Manager management classes that a file is not eligible for HSM migration unless a backup has been made first with the Tivoli Storage Manager backup-archive capability. Generally, an HSM managed file lifecycle implies file creation, the
backup of the file shortly after creation; the file then stays on disk for a given amount of time and is later migrated to Tivoli Storage Manager HSM storage. If the file becomes a candidate for migration very shortly after creation, one of the following two scenarios occurs:
- If you specify in Tivoli Storage Manager that migration requires backup, the file is not migrated until a backup cycle has successfully completed for the file. The file is copied from SONAS to Tivoli Storage Manager two times: one time for backup and one time for migration.
- If you specify in Tivoli Storage Manager that migration does not require backup, the file is migrated, and a subsequent backup cycle causes the file to be copied inside Tivoli Storage Manager from Tivoli Storage Manager HSM storage to Tivoli Storage Manager backup storage. The file is copied from SONAS to Tivoli Storage Manager only one time, and the second copy is made by the Tivoli Storage Manager server.
Migration: If the ACL data of a premigrated file is modified, these changes are not written to the Tivoli Storage Manager server if the file is migrated after the change. To avoid losing the modified ACL data, use the option migraterequiresbackup yes. This setting does not allow the migration of files whose ACL data has been modified while no current backup version exists on the server.
You can back up and migrate your files to the same Tivoli Storage Manager server or to different Tivoli Storage Manager servers. If you back up and migrate files to the same server, the HSM client can verify that current backup versions of your files exist before you migrate them. For this purpose, the same server stanza must be used for backup and migration. For example, if you are using the defaultserver and migrateserver Tivoli Storage Manager options, they must both point to the same server stanza within the Tivoli Storage Manager dsm.sys file. You cannot point to different server stanzas, even if they point to the same Tivoli Storage Manager server.
To restore stub files rather than backup versions of your files, for example, if one or more of your local file systems is damaged or lost, use the Tivoli Storage Manager backup-archive client restore command with the restoremigstate option. Your migrated and premigrated files remain intact on the Tivoli Storage Manager server, and you need only restore the stub files on your local system. However, you cannot use the backup-archive client to restore stub files for your migrated files if they were backed up before the migration; instead, use the Tivoli Storage Manager HSM dsmmigundelete command to recreate stub files for any migrated or premigrated files that are lost.
If you back up and migrate data to tape volumes in the same library, make sure that there are always a few tape drives available for space management. You can achieve this by limiting the number of tape drives that can be used simultaneously by backup and archive operations: specify a number for the mountlimit that is less than the total number of drives available in the library (see the mountlimit option of the define devclass command in the IBM Tivoli Storage Manager Administrator's Reference for your operating system). Using disk storage as your primary storage pool for space management might, depending on the average size of your files, result in better performance than using tape storage pools.
If you back up files to one Tivoli Storage Manager server and migrate them to another Tivoli Storage Manager server, or if you use different Tivoli Storage Manager server stanzas for backup and migration, the HSM function cannot verify that current backup versions of your files exist before you migrate them. In that case, use the backup-archive client to restore the actual backup versions only.
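The following dsm.sys fragment sketches the recommended arrangement in which backup and migration point to the same server stanza, as described above. The stanza name, server address, and port are hypothetical, and only a minimal subset of options is shown; consult the Tivoli Storage Manager client documentation for the complete syntax.
defaultserver   tsmsrv1
migrateserver   tsmsrv1
servername      tsmsrv1
   commmethod          tcpip
   tcpserveraddress    tsmsrv1.example.com
   tcpport             1500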
7. Set Tivoli Storage Manager server authentication to ON (set auth on).
8. Make sure that the Tivoli Storage Manager server date/time and the SONAS nodes' date/time are in sync.
9. Create a new SONAS backup schedule using the StartBackupTSM template.
Licensing considerations
You are required to pay a license charge for the Tivoli Storage Manager client code only if you are using Tivoli Storage Manager functions, and you pay the license charge only for the interface nodes that are attached to Tivoli Storage Manager servers and actively run Tivoli Storage Manager client code. The Tivoli Storage Manager HSM client requires the Tivoli Storage Manager backup/archive client, so to use HSM functionality, both clients must be licensed for each interface node running the code. Even though Tivoli Storage Manager can be licensed for a subset of interface nodes, it is best to license the function on all interface nodes for the following reasons:
- A SONAS filesystem can be mounted on a subset of nodes or on all nodes. Mounting the file system on all nodes guarantees the maximum level of availability of the resource in case of failover.
- To manage a file system, Tivoli Storage Manager code must run on at least one of the nodes where the file system is mounted. It is best to run Tivoli Storage Manager code on multiple nodes where the filesystem is mounted to guarantee service during failover.
- The Tivoli Storage Manager client can execute parallel backup streams from multiple nodes for the same filesystem, thus increasing backup and restore throughput.
- When using Tivoli Storage Manager HSM, file recalls can occur on any node and need to be serviced by a local Tivoli Storage Manager HSM client.
(Figure: the SONAS management node runs the GUI (ISC Controller), the CLI server, a business layer, a CIM agent that handles SNMP through an adapter, an SSH daemon, and a database; it is accessed from a web browser and communicates with the interface nodes and storage nodes through the SONAS backend.)
SONAS uses multiple specialized gatherer tasks to collect data and update the database, as shown in Figure 3-32. For example, clicking the Refresh button on the File Systems GUI page starts a File System gatherer task, which gets the needed information from the nodes attached to the cluster. The last time the gatherer was run is displayed on the bottom right button of the File Systems GUI window.
(Figure 3-32: gatherer tasks, such as File Systems and Exports, use an SSH client to query the SSH daemons on the cluster nodes through the SONAS backend and update the management database.)
Roles are used to segregate GUI administrator users according to their working scope within the Management GUI. The defined roles are as follows:
- Administrator: This role has access to all features and functions provided by the GUI. It is the only role that can manage GUI users and roles.
- Operator: The operator can check the health of the cluster, view the cluster configuration, verify the system and file system utilization, and manage threshold and notification settings.
- Export administrator: The export administrator is allowed to create and manage shares, plus perform the tasks the operator can execute.
- Storage administrator: The storage administrator is allowed to manage disks and storage pools, plus perform the tasks the operator can execute.
- System administrator: The system administrator is allowed to manage nodes and tasks, plus perform the tasks the operator can execute.
For additional information about administration roles and defining administrators, see User management on page 399. SONAS has a central database that stores configuration information and events; this information is collected to the management node from the other nodes in the cluster, where it is used and displayed. The SONAS Management GUI and Health Center provide panels for most functions; a partial list follows:
- Storage management
- File system management
- Pool management
- Fileset management
- Policy management
- Access control list (ACL) management
- Synchronous replication management
- Hierarchical storage management
- Tivoli Storage Manager backup management
- Async replication management
- Snapshot management
- Quota management
- Cluster management
- Protocol management (CIFS, NFS, HTTPS, FTP)
- Export management
- Event log
- Node availability
- Node utilization (CPU, memory, I/O)
- Performance management (CPU, memory, I/O)
- File system utilization (capacity)
- Pool / disk utilization (capacity)
- Notifications / call-home
- Hardware monitoring
- File access services such as NFS, HTTPS, FTP, and CIFS
- File system services
- Nodes, including CPUs, memory DIMMs, VRMs, disk drives, power supplies, fans, and onboard network interface ports
- I/O adapters, including storage and network access
- Storage utilization
Panels are available for most of the major functions, as shown in Figure 3-33.
Figure 3-33 SONAS Management GUI has panels for most aspects of SONAS
SONAS has a complete Topology Viewer that shows, in graphical format, the internal components of the SONAS system, reports on their activity, and provides a central place to monitor and display alerts. You can click an icon and drill down into the details of a particular component; this is especially useful when drilling down to solve a problem. Figure 3-34 shows an example of the SONAS Management GUI Topology Viewer.
Each of the icons is clickable and expands to show the status of an individual component. The SONAS Management GUI is the focal point for the extended monitoring facilities and the SONAS Health Center.
Figure 3-35 SONAS Health Center historical system utilization graphical reports
The length of time that can be reported is determined by the amount of log space set aside to capture data. For additional information about the Health Center, see Health Center on page 420.
- File sets
- Quotas
- Snapshots
- Replication
- ILM automatic tiered storage
- HSM
- Physical management of disk storage
- Performance and reports
- System utilization
- SONAS console settings
- Scheduled tasks
The SONAS CLI is designed to be familiar to the standard UNIX, Windows, and NAS administrator.
SONAS supports the following kinds of notifications:
- A summary email that collects all messages and sends out the list on a regular basis
- Immediate email and SNMP traps: the contents are the same for both email and SNMP, and log messages are forwarded instantly. A maximum number of messages can be defined; after that number is reached, further messages are collected and a summary is sent.
These messages originate from multiple sources, including syslog, the GUI gatherer (GPFS status, CTDB status), CIM messages from providers in the cluster, and SNMP messages from nodes in the cluster. SONAS also allows you to set utilization thresholds; when a threshold is reached, a notification is sent. Thresholds can be set for various resources, including:
- CPU usage
- File system usage
- GPFS usage
- Memory usage
- Network errors
(Figure 3-37: networks network#A, network#B, and network#C use DNS aliases SONAS#A, SONAS#B, and SONAS#C, which resolve to IP address sets IPA1-IPA3, IPB1-IPB3, and IPC1-IPC6; the addresses are spread across two network groups of interface nodes, Netwkgrp#1 and Netwkgrp#2, which serve filesys#1, filesys#2, and filesys#3.)
In the example in Figure 3-37, filesystem filesys#2 is accessible through the export exportfs#2 over all networks. Filesystem filesys#1, instead, is accessible only through network#A and the DNS alias SONAS#A. Accessing filesys#1 over network#B and the DNS alias SONAS#B can cause problems, because the DNS might return IP address IPB3, which is associated with node#4, and node#4 does not mount filesys#1. When creating network groups, take care to ensure that the filesystems accessed through a given network are mounted on all of that network's network group nodes. In failover situations, the IP address is taken over by another node in that specific network group, which ensures that, in case of failover, that IP address still allows you to access all filesystems associated with, or mounted on, the nodes of that network group. One way to ensure that there are no mismatches between mounted filesystems and network groups is to mount the share on all interface nodes and access it only through a given network group. Network groups can be used for multiple reasons: limiting client access to two or three nodes increases the probability of finding data in cache, compared to spreading the access across many nodes, and so can give a performance benefit; network groups can also be used to segregate workloads, such as production and test, in the same SONAS cluster.
(Figure 3-38: Tivoli Storage Manager server TSMs#1 has proxy target fs1tsm with agent nodes node#1, node#2, and node#3; server TSMs#2 has proxy target fs2tsm with agent nodes node#1 through node#6 and proxy target fs3tsm with agent nodes node#2, node#3, and node#4; the proxy targets correspond to filesystems filesys#1, filesys#2, and filesys#3.)
We have multiple grouping concepts in action here. On the Tivoli Storage Manager server side, we define one proxy target for each filesystem, and this proxy target is associated with multiple proxy agent nodes. You can define a subset of nodes as proxy agents, but this might lead to errors if a backup is run from a node that is not defined as a proxy agent; to avoid such errors, define all interface nodes as proxy agents for Tivoli Storage Manager. The cfgtsmnode command creates the Tivoli Storage Manager server definitions, or stanzas, on the node where the command is run, and running the command on multiple nodes creates a group of definitions for the same server. To avoid missing Tivoli Storage Manager server stanzas on a node, you can define all available Tivoli Storage Manager servers to all nodes. The cfgbackupfs command configures the backup to run on a subset group of nodes. To execute the backup of a filesystem on a node, the following requirements must be met:
- The filesystem must be mounted on that node.
- A Tivoli Storage Manager server stanza must have been defined on the node for the target Tivoli Storage Manager server.
- Tivoli Storage Manager server proxy target and agent node definitions must be in place for that node.
- The interface node must have network connectivity to the Tivoli Storage Manager server.
The (green) arrows in Figure 3-38 show the data path for the backups. Data flows from the filesystem, through the group of interface nodes defined with the cfgtsmnode and cfgbackupfs commands, and over the network to the Tivoli Storage Manager server. Groups of nodes can perform backup operations in parallel; for example, backups for filesys#3 are executed by nodes node#2, node#3, and node#4.
The network must be available to access the Tivoli Storage Manager servers. Because the network is accessed from the interface nodes using network groups, the Tivoli Storage Manager server used to back up a given filesystem must be accessible from the interface nodes where the filesystem is mounted and where the Tivoli Storage Manager server stanza has been defined.
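As a rough sketch, configuring and running a backup with the commands named above might look like the following sequence. The command options are deliberately elided because the exact parameters are not shown in this section; refer to the SONAS CLI reference for the complete syntax.
cfgtsmnode ...    # define the Tivoli Storage Manager server stanza on an interface node
cfgbackupfs ...   # associate a filesystem with a Tivoli Storage Manager server and a group of nodes
startbackup ...   # start the backup of the filesystem using the configured nodes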
- Independent scalability of storage and nodes, which provides simple but flexible configurations tailored to your specific workload characteristics, yet remains flexible and reconfigurable for the future
- Concurrent access to files from all nodes, distributed token management, automatic self-tuning and workload balancing, and high availability through the Cluster Manager, which combine to provide very high performance and reduced administrative costs related to migrating hot spot files
- Storage pool striping, which provides very high performance and fast access to data
- High performance metadata scanning across all available resources and nodes, and integrated Tivoli Storage Manager clients, which provide the ability to perform HSM and automatic tiered storage at high scalability, as well as accelerate backups of files
- Snapshots and asynchronous replication, which provide robust data protection and disaster recovery
SONAS Software provides the ability for central management of storage, providing the functionality for a highly automated, extremely flexible, and highly scalable self-managing system. You can start with a small SONAS of less than 100 TB, and continue to grow seamlessly and scale performance linearly, using the SONAS Software to manage scalability into the petabytes. SONAS Software supports the full scaling capability of the current SONAS: up to 30 interface nodes and 60 storage nodes, the largest current SONAS configuration, capable of supporting up to 14.4 petabytes of raw storage. One copy of SONAS Software runs on each node of a SONAS. A current maximum SONAS configuration is shown in Figure 3-39.
Figure 3-39 SONAS Software manages all aspects of a maximum size SONAS
As storage needs continue to grow over time, the SONAS Software is designed to continue to scale out and support even larger configurations, while still maintaining all the storage management and high performance storage characteristics that we discussed in this chapter.
Chapter 4.
Networking considerations
In this chapter we provide information about networking as related to SONAS implementation and configuration. We begin with a brief review of Network Attached Storage concepts and terminology. Following that, we discuss technical networking implementation details for SONAS.
The receiving NAS system keeps track of the initiating client's details so that the response can be directed back to the correct network address. The route for the returning I/O follows, more or less, the reverse of the path outlined previously.
It is important to note that a database application accessing a remote file located on a NAS device must, by default, be configured to run with file system I/O. As the previous diagram shows, it cannot use raw I/O to achieve improved performance; that is only possible with locally attached storage.
Security
For directory and file level security, NFS uses the UNIX concepts of User, Group (a set of users sharing a common ID), and Other (no associated ID). For every NFS request, these IDs are checked against the UNIX file system's security. However, even if the IDs do not match, a user can still have access to the files. CIFS, in contrast, uses access control lists that are associated with the shares, directories, and files, and authentication is required for access.
Locking
The locking mechanism principles vary. When a file is in use, NFS provides advisory lock information to subsequent access requests. These locks inform subsequent applications that the file is in use by another application, and for what it is being used. The later applications can decide whether or not to abide by the lock request, so UNIX or Linux applications can access any file at any time; the system relies on good-neighbor responsibility, and proper system administration is clearly essential. CIFS, on the other hand, effectively locks the file in use. During a CIFS session, the lock manager has historical information concerning which client has opened the file, for what purpose, and in which sequence. The first access must complete before a second application can access the file.
4.1.5 Authentication
SONAS supports the following authentication methods:
- Microsoft Active Directory
- LDAP (Lightweight Directory Access Protocol)
- NIS (Network Information Service)
- Samba PDC / NT4 mode
At the current release level, a single SONAS system can support only one of these authentication methods at a time. In order to access a SONAS system, the user must be authenticated using the authentication method that is implemented on that particular SONAS machine.
(Figure 4-2: clients resolving the cluster name SONAS.virtual.com are spread across the interface node IP addresses 10.0.0.10 through 10.0.0.15.)
As shown in Figure 4-2, in SONAS each network client is allocated to one and only one interface node, in order to minimize cluster overhead. SONAS Software does not rotate a single client's workload across interface nodes; that is not supported by DNS or CIFS, and it would also decrease performance, because caching and read-ahead are done per SONAS interface node. At the same time, workload from multiple users, numbering into the thousands or more, is equitably spread across as many SONAS interface nodes as are available. If more user network capacity is required, you simply add more interface nodes; the SONAS scale out architecture provides linear scalability as the number of users grows. SONAS requires an external server that runs an instance of the domain name server (DNS). Using the DNS, SONAS round-robins each incoming request for a file to the next available public IP interface on an available interface node: SONAS serves multiple IP addresses, and each client gets one of these IP addresses in a round-robin manner. If one of the interface nodes goes down, another interface node starts serving the same IP address.
Rotating a single client across interface nodes would also degrade performance due to cache misses; moreover, this is not supported by DNS and CIFS. When you expand your SONAS system and add new interface nodes, add the new IP addresses to the DNS server and the load will be distributed across the newly configured nodes.
The external DNS server contains duplicate address records (A records) with different IP addresses. These IP addresses are configured on the interface nodes, and the name server rotates the addresses for the name that has multiple A records. The diagram in Figure 4-3 illustrates these DNS load balancing steps:
Step 1: The first SONAS client asks the external DNS server for the IP address of sonas.pl.ibm.com.
Step 2: The DNS server rotates the addresses for the name and provides the first IP address available at this moment to the client: 192.168.0.11.
Step 3: Client 1 connects to its data through interface node 1.
Step 4: SONAS client 2 asks the external DNS server for the IP address of sonas.pl.ibm.com.
Step 5: The DNS server rotates the addresses for the name and provides the next available IP address to the client: 192.168.0.12.
Step 6: The second client connects to its data through interface node 2.
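The round-robin behavior relies simply on publishing one A record per interface node IP address under the same host name. A hypothetical zone fragment for the example above might look like the following; the host name and addresses follow the example in the text, and TTL and other record details depend on your DNS server configuration.
sonas.pl.ibm.com.    IN  A  192.168.0.11
sonas.pl.ibm.com.    IN  A  192.168.0.12
sonas.pl.ibm.com.    IN  A  192.168.0.13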
At the time of the failover of a node, if the session or application is not actively transferring data over a connection, the failover can usually be transparent to the client. If the client is transferring data, then depending on the protocol, the nature of the application, and what is occurring at the time of the failover, the application service failover might still be transparent to the client. In particular, if the client application, in response to the SONAS failover and SONAS notifications, automatically retries the network connection, it is possible that the user will not see an interruption of service. Examples of software that does this include many NFS-based applications, as well as Windows applications that retry the network connection, such as the Windows XCOPY utility. If the application does not automatically retry the network connection, or the protocol in question is stateful (as CIFS is), then a client-side reconnection might be necessary to re-establish the session; for most CIFS connections, this is the likely case. In case of failure of an interface node, all IP addresses configured on that node are taken over and balanced by the remaining interface nodes. IP balancing is done by a round-robin algorithm, which means that SONAS does not check which node is more loaded in terms of cache or bandwidth. This is illustrated in Figure 4-5: the IP addresses configured on interface node 2 are moved by SONAS to interface nodes 1 and 3. From the SONAS client point of view, the host name and IP address remain the same. The failure of the node is almost transparent to the client, which now accesses its data through interface node 3, as indicated by Step 6.
DNS host names: NFS consists of multiple separate services, protocols, and daemons that need to share metadata among each other. If, after a client crash and reboot, the client is redirected to another interface node, there is a remote possibility that the locks are lost on the client but are still present on the previous interface node, creating problems for the connection. Therefore, the use of DNS host names for mounting NFS shares is not supported; in order to balance the load on SONAS, it is best to mount NFS shares using the various IP addresses directly. This is an NFS limitation; CIFS, for example, uses only a single session, so DNS host names can be used.
4.3 Bonding
Bonding is a method in which multiple network interfaces are combined to function as one logical bonded interface for redundancy or increased throughput. SONAS network ports can be bonded into two configurations using the standard IBM SONAS bonding tools. Before creating a bond interface, make sure that no network is assigned to the slaves and that there is no active IP address on any of the slaves. When network interfaces are bonded, a new logical interface is created, which consists of the slave physical interfaces. The bonded devices can be monitored through the IBM SONAS GUI Topology pages. The MAC address of the bonding device is taken from the first added slave device; the MAC address is then passed to all following slaves and remains persistent until the bonding logical device is brought down or deconfigured. The bonding interface has a hardware address of 00:00:00:00:00:00 until the first slave is added.
In case of failure of an interface node in the network group, IP addresses will be taken over only by the remaining interface nodes in this network group. This is shown in Figure 4-7.
(Figure 4-7: failure of an interface node in network group 1; its IP address, 10.0.0.3, is taken over by the remaining interface node in the same network group, and sonas1.pl.ibm.com continues to resolve to the A records 10.0.0.3 through 10.0.0.6.)
This concept can be useful to separate traffic between a production and a test environment, or between two applications. It is important to understand that you can separate only network traffic; you cannot separate internal data traffic. All interface nodes have access to all exports, and file system data is accessible by the interface nodes through all storage pods. To limit data placement, you can use policies as described in SONAS: Using the central policy engine and automatic tiered storage on page 107, but it might still be impossible to effectively separate traffic between two environments. You can limit or separate only the network traffic to and from the SONAS front end (the interface nodes); all data can be written to and read from all storage pods, according to the logical storage pool configuration and the policy engine rules that are in effect. By default, a single group contains all interface nodes; that group is called the default network group. You can configure and add nodes to custom network groups only when those nodes are detached from the default network group; it is not possible to configure a node in both the default and a custom network group. It is not possible to remove the default network group, but it can be empty.
So the total time taken for an I/O request is given by the sum of t_lat, t1, t2, and t3; we call this sum t_sum. Figure 4-9 shows the time it takes to transfer requests and responses over the network links: for example, a 61440 byte response requires 0.457764 msec over a 1 GigE link, which can transfer 134217728 bytes/second, and 10 times less, or 0.045776 msec, on a 10 GigE link.
Network link   bytes/sec      117 byte request (ms/req)   61440 byte response (ms/req)
1 GigE         134217728      0.000872                     0.457764
10 GigE        1342177280     0.000087                     0.045776
The faster the request transfer time over the link, the more requests (requests/sec or I/O per second) you can get over the link per unit of time, and consequently the greater the amount of data that can be transferred over the link per unit of time (MB/sec). Now introduce network latency into the equation: each I/O is delayed by a given number of latency milliseconds, t_lat, and so each request from the application client has periods of data transfer, t1 and t3, and idle periods measured by t_lat. During the t_lat periods the network bandwidth is not used by the application client and so is effectively wasted; the bandwidth really available to the application client is thus diminished by the sum of the idle periods. The table shown in Figure 4-10 calculates how the reduction of effective bandwidth is correlated with increasing network latency, and how this changes between 1 GigE and 10 GigE links. The last four lines show a 10 GigE link, with latency (t_lat) increasing from 0 to 0.001, 0.01, and 0.1 msec; t1 and t3 are the times spent on the network link, a function of bandwidth in bytes/sec, and t2, the internal latency in the server, is assumed to be zero. The t_sum value is the sum t_lat+t1+t2+t3, representing the request response time. So, for the 10 GigE case with 0.01 msec t_lat, we have a response time t_sum of 0.055864 msec and so we can drive 17901 IO/sec. Each I/O transfers a 117 byte request plus a 61440 byte response, or 61557 bytes in total, and at 17901 IO/sec we can drive a throughput of 61557 x 17901, or 1051 MB/sec (tot). Considering only the effective data transferred back to the client, 61440 bytes per I/O, we have 61440 x 17901, or 1049 MB/sec.
Network link  t_lat ms  t1 ms     t2 ms  t3 ms     t_sum ms  IO/sec  MB/sec (tot)  MB/sec (resp)
1GigE         0         0.000872  0      0.457764  0.458635    2180      128           128
1GigE         0.001     0.000872  0      0.457764  0.459635    2176      128           127
1GigE         0.01      0.000872  0      0.457764  0.468635    2134      125           125
1GigE         0.1       0.000872  0      0.457764  0.558635    1790      105           105
10GigE        0         0.000087  0      0.045776  0.045864   21804     1280          1278
10GigE        0.001     0.000087  0      0.045776  0.046864   21339     1253          1250
10GigE        0.01      0.000087  0      0.045776  0.055864   17901     1051          1049
10GigE        0.1       0.000087  0      0.045776  0.145864    6856      402           402
We can see that with a latency of 0 on a 10 GigE link we get a throughput of 1278 MB/sec, and that adding a network latency of 0.1 msec reduces it to 402 MB/sec, a 69% reduction in effective bandwidth. This reduction might appear surprising given the theoretical bandwidth available.
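The arithmetic behind these numbers is simple enough to reproduce yourself. The following Python sketch is our own illustration, not a SONAS tool; the request size, response size, link speeds, and zero server-internal latency (t2) are the assumptions taken from the example above.

# Illustrative recalculation of the latency/throughput values discussed above.
# Assumptions from the example: 117-byte request, 61440-byte response, t2 = 0,
# and 1 MB = 1048576 bytes.

REQUEST = 117                                          # bytes sent to the server
RESPONSE = 61440                                       # bytes returned to the client
LINKS = {"1GigE": 134217728, "10GigE": 1342177280}     # link speed in bytes/second
LATENCIES_MS = [0, 0.001, 0.01, 0.1]                   # t_lat values in msec

def io_per_sec(bw_bytes_per_sec, t_lat_ms):
    """Return (t_sum in msec, IO/sec) for one request outstanding at a time."""
    t1 = REQUEST / bw_bytes_per_sec * 1000.0           # request transfer time, msec
    t3 = RESPONSE / bw_bytes_per_sec * 1000.0          # response transfer time, msec
    t_sum = t_lat_ms + t1 + 0.0 + t3                   # t2 assumed to be zero
    return t_sum, 1000.0 / t_sum

for link, bw in LINKS.items():
    for lat in LATENCIES_MS:
        t_sum, iops = io_per_sec(bw, lat)
        mb_total = iops * (REQUEST + RESPONSE) / 1048576.0
        mb_resp = iops * RESPONSE / 1048576.0
        print(f"{link:6} t_lat={lat:5} ms  t_sum={t_sum:.6f} ms  "
              f"IO/sec={iops:7.0f}  MB/sec(tot)={mb_total:6.1f}  MB/sec(resp)={mb_resp:6.1f}")

Changing RESPONSE to 30720 reproduces the trend shown in the second chart of Figure 4-11, where the same 0.1 msec latency reduces throughput even further for the smaller request size.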
The charts in Figure 4-11 show how bandwidth decreases for a single client accessing a server as latency increases. The first chart shows that the drop is much greater at higher bandwidth values: the 10 GigE MB/sec line drops much more sharply as latency increases than the 1 GigE MB/sec line, so the adverse effect of latency is more pronounced the greater the link bandwidth. The second chart shows the effect of latency on a workload with a smaller blocksize or request size: 30720 bytes instead of 61440 bytes. At 0.1 msec latency, throughput drops to just over 200 MB/sec with a 30720-byte response size, instead of the roughly 400 MB/sec obtained with the same 0.1 msec latency and a 61440-byte response size.
Figure 4-11 Effect of network latency on throughput: throughput in MB/sec plotted against latency in msec (0 to 0.1) for 1 GigE and 10 GigE links, with 117-byte requests and 61440-byte responses (first chart) and 30720-byte responses (second chart)
To summarize, evaluate your network latency to understand the effect that it can have on the expected throughput of single-client applications. Latency has a greater impact with higher-bandwidth links and smaller request sizes. These adverse effects can be offset by having multiple clients access the server in parallel so that they take advantage of the otherwise unused bandwidth.
Chapter 5.
SONAS policies
In this chapter we provide information about how you can create and use SONAS policies. We discuss the following topics:
- Creating and managing policies
- Policy command line syntax
- Policy rules and best practices
- Sample policy creation walkthrough
External storage pool definition rule: creates a list of files for the Tivoli Storage Manager server.
Rules must adhere to a specific syntax, as documented in the Managing Policies chapter of the IBM Scale Out Network Attached Storage Administrator's Guide, GA32-0713. The syntax is similar to SQL: it contains statements such as WHEN (TimeBooleanExpression) and WHERE SqlExpression. Rules also contain SQL expression clauses that let you reference various file attributes as SQL variables and combine them with SQL functions and operators. Depending on the clause, an SQL expression must evaluate to either true or false, a numeric value, or a character string. Not all file attributes are available to all rules.
A file can be a potential candidate for only one migration or deletion operation during one runpolicy run; only one action will be performed. The SONAS runpolicy command uses the SONAS scan engine to determine the files on which to apply specific actions. The SONAS scan engine is based on the GPFS mmapplypolicy command in the background, and mmapplypolicy runs in three phases.
Figure 5-1 shows the CLI policy commands and their interaction.
Figure 5-1 depicts the following commands and objects: mkpolicy creates a policy, chpolicy changes a policy, and rmpolicy removes a policy from the SONAS database, where each policy (for example, policy1 or policy7) holds its rules; lspolicy lists policies (all policies defined in the database, the details of a specific policy, or the policies applied to all filesystems); setpolicy applies a policy to a filesystem for new files; runpolicy executes a policy on a filesystem for existing files; and a cron schedule determines when policies run against the filesystems of the SONAS cluster (for example, filesys22 and filesys44 with policy1 applied).
Create a policy with the name test_copy as a copy of the existing policy test:
mkpolicy test_copy -CP test
Create a policy with the name default with two rules assigned and mark it as the default policy:
mkpolicy default -R "set pool 'system';DELETE WHERE NAME LIKE '%temp%'" -D
The chkpolicy command checks a policy against a file system, where <device> specifies the filesystem and <policyName> the policy contained in the database to be tested. Without the -T option, the policy is only checked for correctness against the file system. With the -T option, a test run of the policy is performed, outputting the result of applying the policy to the file system and showing which files would be migrated, as shown in Example 5-2.
Example 5-2 Checking policies for correctness
[[email protected] ~]# chkpolicy gpfs0 -P HSM_external -T
...
WEIGHT(inf) MIGRATE /ibm/gpfs0/mike/fset2/sonaspb26/wv_4k/dir1/test184/f937.blt TO POOL hsm SHOW()
...
[I] GPFS Policy Decisions and File Choice Totals:
Chose to migrate 311667184KB: 558034 of 558039 candidates;
Chose to premigrate 0KB: 0 candidates;
Already co-managed 0KB: 5 candidates;
Chose to delete 0KB: 0 of 0 candidates;
Chose to list 0KB: 0 of 0 candidates;
0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
silver  4608      6694109184  0.000069%
system  46334172  6694109184  6.921624%
EFSSG1000I Command successfully completed.
[root@plasma]# lspolicy -P TEMPLATE-HSM
Policy Name   Declaration Name   Default  Declarations
TEMPLATE-HSM  stub_size          N        define(stub_size,0)
TEMPLATE-HSM  is_premigrated     N        define(is_premigrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED > stub_size))
TEMPLATE-HSM  is_migrated        N        define(is_migrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED == stub_size))
TEMPLATE-HSM  access_age         N        define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
TEMPLATE-HSM  mb_allocated       N        define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
TEMPLATE-HSM  exclude_list       N        define(exclude_list,(PATH_NAME LIKE '%/.SpaceMan/%' OR NAME LIKE '%dsmerror.log%' OR PATH_NAME LIKE '%/.ctdb/%'))
TEMPLATE-HSM  weight_expression  N        define(weight_expression,(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age WHEN is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated * access_age END))
TEMPLATE-HSM  hsmexternalpool    N        RULE 'hsmexternalpool' EXTERNAL POOL 'hsm' EXEC 'HSMEXEC'
TEMPLATE-HSM  hsmcandidatesList  N        RULE 'hsmcandidatesList' EXTERNAL LIST 'candidatesList' EXEC 'HSMLIST'
TEMPLATE-HSM  systemtotape       N        RULE 'systemtotape' MIGRATE FROM POOL 'silver' THRESHOLD(80,70) WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated)

[[email protected] ~]# lspolicy -A
Cluster                        Device   Policy Set Name                 Policies                        Applied Time      Who applied it?
plasma.storage.tucson.ibm.com  testsas  gtkpolicyhsm_flushat_4_12_20hr  gtkpolicyhsm_flushat_4_12_20hr  4/26/10 11:17 PM  root
A typical use case for a cron job is to transfer large amounts of data during known periods of low activity, so that the migration thresholds set in the filesystem policy are rarely activated. If the files being transferred are going to external Tivoli Storage Manager storage and will be accessed at a later time, they can be premigrated by the cron job; otherwise they can be migrated by the cron job.

Now assume that the filesystem is a single pool called Pool1 that migrates data to an external Tivoli Storage Manager pool called Pool3. The thresholds for this pool are 80,75, so that if the filesystem is over 80% full, HSM migrates data until the pool is 75% full. Assume for discussion a usage pattern of heavy write activity from 8AM to 12PM, heavy mixed activity (reads and writes) from 12PM to 6PM, after which activity tapers off and the system is essentially idle at 10PM. With normal threshold processing, the 80% threshold is most likely to be hit between 8AM and 12PM, when Pool1 is receiving the most new data. Hitting this threshold causes the filesystem to respond by migrating data to Pool3, and the read activity associated with this migration competes with the current host activity, slowing down the host jobs and lengthening the host processing window.

If the daily write activity is 10-20% of the size of the disk pool, migration will not be required during the host window if the pool starts the day at no more than 80%-20%=60% full. A 5% margin might be reasonable to ensure that the threshold is never hit in normal circumstances. A reasonable cron job for this system is a migration policy set for 10PM with a migration threshold of 60,55: if the filesystem is over 60% full, migrate down to 55%. In addition, a cron job must be registered to trigger the policy at 10PM; the cron job activates the policy that is currently active on the filesystem. (This threshold arithmetic is written out in the small sketch at the end of this discussion.) The policy needs two migration clauses to implement these rules. A standard threshold migration rule using threshold 80,75:
RULE defaultmig MIGRATE FROM POOL 'system' THRESHOLD (80,75) WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated)
And a specific 10PM migration rule using threshold 60,55:
RULE deepmig MIGRATE FROM POOL 'system' THRESHOLD (60,55) WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated) AND HOUR(CURRENT_TIMESTAMP)=22

This scenario has an issue: the SONAS filesystem uses the lowest of the two configured thresholds to trigger its lowDiskSpace event:
RULE defaultmig MIGRATE FROM POOL 'system' THRESHOLD (80,75)
RULE deepmig MIGRATE FROM POOL 'system' THRESHOLD (60,55)
In this case the SONAS filesystem triggers a policy scan at 60% full, this happens every 2 minutes, and it generally is not 10PM. The scan traverses all files in the filesystem and, because it is not 10PM, finds no candidates, but it creates a lot of wasted metadata activity; the policy works, it just burns a lot of CPU and disk IOPs.

How can this behavior be avoided? There are two solutions: either avoid the threshold in the cron job call, or use backup coupled with HSM. To avoid the threshold, consider your storage usage and determine a time period that accomplishes your goal without using a threshold, for example, a rule that migrates all files that have not been accessed in the last 3 days, using a statement like this:
(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 2
This method has the advantage that it avoids threshold spin, but the disadvantage that it cannot premigrate files. To work around the current cron job limitation that only allows you to run the active filesystem policy (the one that was put in place with setpolicy), you can use an external scheduler to execute a SONAS command over ssh and run runpolicy <mySpecificPolicy>, using a command similar to the following example:
ssh <[email protected]> runpolicy <mySpecificPolicy>
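The threshold sizing reasoning used in this discussion can be written out explicitly. The following Python sketch is purely illustrative: the 80% daytime threshold, 20% worst-case daily writes, and 5% safety margin are the example values assumed above, not SONAS defaults.

# Illustrative threshold sizing for a nightly "deep migration" cron policy.
# The percentages are the example values from the discussion above.

day_high_threshold = 80   # daytime HSM threshold (% full) we do not want to hit
daily_write_pct = 20      # worst-case daily writes as a % of the pool capacity
safety_margin = 5         # extra headroom below the start-of-day target

start_of_day_max = day_high_threshold - daily_write_pct   # 60% full at start of day
cron_high = start_of_day_max                               # trigger the night rule above 60%
cron_low = start_of_day_max - safety_margin                # migrate down to 55% overnight

print(f"Nightly rule: THRESHOLD({cron_high},{cron_low})")               # THRESHOLD(60,55)
print(f"Keep the pool at or below {start_of_day_max}% at start of day "
      f"so the daytime {day_high_threshold}% threshold is not reached.")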
Placement rules
All SONAS policies must end with a default placement rule. If you are running with HSM, consider using the system pool as the default. Data will probably be configured to cascade, so put most of the data in the fastest pool and then let it cascade through the tiers, using the following statement:
RULE 'default' set pool 'system'
If you are running with ILM and tiered storage, consider defaulting to pool2, where pool2 is a slower pool. Files then default to the slower pool, and you select files for the faster pool explicitly; that way, if you forget a filter, data goes into the slower, and hopefully larger, pool. Use a statement such as this one:
RULE 'default' set pool 'pool2'
Remember that placement rules only apply to files created after the placement rule is applied, and that placement rules do not affect recalled files, which return to the pool they were migrated from.
Macro defines
Policies can be coded using defines, also called macro defines. These are essentially named variables used to make rules easier to read. For example, the following statement creates a define named mb_allocated and sets it to the size of the file in MB:
define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
Defines offer a convenient way to encapsulate weight expressions so as to provide common definitions across the policy. These are typical common exclusions:
- Special file migration exclusion definition: always use this when migrating.
- Migrated file migration exclusion definition: always use this when migrating.
Summary
A policy is a set of rules; macros can be used to make rules easier to read. Rules determine what the policy does, and the first rule matched applies to a file, so order will matter. There are two major types of rules: placement rules determine what pool a file is placed in when it first appears in the filesystem, and migration rules specify the conditions under which a file that exists in the filesystem is moved to another pool. Migration policies must include the special file exclusion clause and migrated exclusion clause.
Non-threshold migration will need an associated cron job to trigger it, as discussed later for migration filters. The policy is terminated by the default placement rule: RULE 'default' set pool 'system' We used a default of a higher performance pool because subsequent tiering will cascade data from high performance to low performance pools.
When SONAS identifies that a threshold has been reached, it will trigger a new lowspace event every two minutes so long as the fill level of the filesystem is above the threshold. SONAS knows that a migration was already triggered, so it ignores the new trigger and it will not do any additional processing, the migration that started earlier continues execution.
The filesystem high threshold must allow the peak use period to finish without the filesystem filling to 100%. Always use a threshold if you are using Information Lifecycle Management/HSM; even if you do not expect to hit it, it provides a safety net in case your other policies have bugs or your usage profile changes. Be aware that a cron job that exploits a low threshold rule will cause metadata spin, and that migration rules with no threshold do not trigger automatically but need a cron job to do so. Tivoli Storage Manager clones backups if HSM migration is done first: the migration still takes the same amount of time to move data from SONAS to Tivoli Storage Manager, but backups might be faster, depending on server throughput. The migrequiresbackup option can be set at the Tivoli Storage Manager server to prevent the following scenario: if the ACL data of a premigrated file is modified, these changes are not written to the Tivoli Storage Manager server when the file is later migrated. To avoid losing the modified ACL data, use the option migrequiresbackup yes; this setting does not allow you to migrate files whose ACL data has been modified and for which no current backup version exists on the server. When using migrequiresbackup, you must back up files, or you might run out of space because HSM will not move files.
You can also list the available storage pools for a specific filesystem by selecting Storage → Storage Pools, as shown in Figure 5-4. Note that there is only one storage pool, system, for our file system. The name system is the default storage pool name, and this pool cannot be removed.
To assign a disk to a filesystem, proceed to the Files → File Systems panel. Select the redbooks file system to which you want to assign a new NSD disk with another storage pool. After selecting that filesystem, you see the File System Disks window shown in Figure 5-5.
Click the Add a disk to the file system button and a panel like that in Figure 5-6 is shown. Select the disks to add, choose a disk type, specify a storage pool name, and click OK.
After the task completes, you will see that the filesystem now resides on two disks, with the file system and storage pool usage as shown in Figure 5-7.
Modify the storage pool assignment for the NSD called gpfs4nsd using the chdisk and lsdisk commands as shown in Example 5-5. Attributes such as pool name, usage type, and failure group cannot be changed for disks that are active in a filesystem.
Example 5-5 Change storage pool and data type assignment
[sonas02.virtual.com]$ chdisk gpfs4nsd --pool silver --usagetype dataonly
EFSSG0122I The disk(s) are changed successfully!
[sonas02.virtual.com]$ lsdisk
Name      File system  Failure group  Availability  Timestamp
gpfs1nsd  gpfs0        1              up            4/22/10 3:03 AM
gpfs2nsd  gpfs0        1              up            4/22/10 3:03 AM
gpfs3nsd  redbook      1              up            4/22/10 3:03 AM
gpfs5nsd  redbook2     1              up            4/22/10 3:03 AM
gpfs4nsd               1                            4/21/10 10:50 PM
gpfs6nsd               1                            4/22/10 10:46 PM
To add the gpfs4nsd to the redbook2 filesystem use the chfs command as shown in Example 5-6.
Example 5-6 Add a disk to the redbook2 filesystem
[sonas02.virtual.com]$ chfs --add gpfs4nsd redbook2 The following disks of redbook2 will be formatted on node strg002st002.virtual.com: gpfs4nsd: size 1048576 KB Extending Allocation Map Creating Allocation Map for storage pool 'silver' 31 % complete on Thu Apr 22 22:53:20 2010 88 % complete on Thu Apr 22 22:53:25 2010 100 % complete on Thu Apr 22 22:53:26 2010 Flushing Allocation Map for storage pool 'silver' Disks up to size 24 GB can be added to storage pool 'silver'. Checking Allocation Map for storage pool 'silver' 83 % complete on Thu Apr 22 22:53:32 2010 100 % complete on Thu Apr 22 22:53:33 2010 Completed adding disks to file system redbook2. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. EFSSG0020I The filesystem redbook2 has been successfully changed.
You can verify the storage pools and NSD assignments with the lspool command as shown in Example 5-7:
Example 5-7 Listing storage pools
[sonas02.virtual.com]$ lspool
Filesystem  Name    Size     Usage  Available fragments  Available blocks  Disk list
gpfs0       system  2.00 GB  4.2%   350 kB               1.91 GB           gpfs1nsd;gpfs2nsd
redbook     system  1.00 GB  14.7%  696 kB               873.00 MB         gpfs3nsd
redbook2    silver  1.00 GB  0.2%   14 kB                1021.98 MB        gpfs4nsd
redbook2    system  1.00 GB  14.7%  704 kB               873.00 MB         gpfs5nsd
Repeat the lsdisk command to confirm the correct filesystem to disk assignments as shown in Example 5-8:
Example 5-8 Listing NSD disks
[sonas02.virtual.com]$ lsdisk
Name      File system  Failure group  Type             Pool    Status  Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs3nsd  redbook      1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs4nsd  redbook2     1              dataOnly         silver  ready   up            4/21/10 10:50 PM
gpfs5nsd  redbook2     1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs6nsd               1                               system  ready                 4/22/10 10:59 PM
In the policy details section of the window type your policy. Note that you can also load the policy from a file on your computer by pressing the Load policy button. Click the Set policy button and choose apply at the prompt to set the policy. After this click the Apply policy button and choose apply at the prompt to apply the policy. After applying the policy you will see a panel as shown in Figure 5-9 showing a summary of the policy that will be applied. Policies are now active.
Important: The CLI mkpolicy and mkpolicyrule commands do not accept the RULE statement, so the RULE statement must be removed from all policy statements.
We create the policy and its first rule using the mkpolicy command, and then use the mkpolicyrule command to append additional rules to the redpolicy policy, as shown in Example 5-10.
Example 5-10 Create a new policy
[sonas02]# mkpolicy -P "redpolicy" -R " set POOL 'silver' WHERE UPPER(name) like '%.TXT' ;" [sonas02]# mkpolicyrule -P "redpolicy" -R " MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ;" [sonas02]# mkpolicyrule -P "redpolicy" -R " set POOL 'system' "
We list all policies defined using the CLI with the lspolicy -P all command (Example 5-11.)
Example 5-11 List all policies
[sonas02]# lspolicy -P all
Policy Name  Rule Number  Rule                                                                        Is Default
redpolicy    1            set POOL 'silver' WHERE UPPER(name) like '%.TXT'                            N
redpolicy    2            MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF'  N
redpolicy    3            set POOL 'system'                                                           N
Important: You cannot list policies created using the GUI with lspolicy.
Now we validate the policy using the chkpolicy command as shown in Example 5-12.
Example 5-12 Validate the policy
[sonas02]# chkpolicy -P "redpolicy" -T validate -d redbook2 -c sonas02.virtual.com No error found. All the placement rules have been validated.
After successful validation, we set the policy for filesystem redbook2 using the setpolicy command as shown in Example 5-13. We then run the lspolicy -A command to verify what filesystems have policies.
Example 5-13 Set the policy
[sonas02]# setpolicy -P "redpolicy" -d redbook2 -c sonas02.virtual.com
[[email protected] ~]# lspolicy -A
Cluster              Device    Policy Name  Applied Time      Who applied it?
sonas02.virtual.com  redbook2  redpolicy    4/26/10 11:00 PM  root
sonas02.virtual.com  gpfs0     N/A
sonas02.virtual.com  redbook   N/A
Attention: Policies created with the GUI do not appear in the output of the SONAS CLI lspolicy -A command.
The redbook filesystem does have a valid policy that was set using the GUI, as shown in Example 5-14. It was created using the GUI because it contains RULE statements with rule names and comments, which are not accepted by the CLI.
Example 5-14 Policies applied to filesystems
[sonas02]# lspolicy -d redbook
Cluster              Device   Policy                                                                       Last update
sonas02.virtual.com  redbook  RULE 'txtfiles' set POOL 'silver' WHERE UPPER(name) like '%.TXT' ;
                              RULE 'movepdf' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ;
                              RULE 'default' set POOL 'system'                                             4/26/10 10:59 PM
[sonas02]# lspolicy -d redbook2
Cluster              Device    Policy                                                                      Last update
sonas02.virtual.com  redbook2  /* POLICY NAME: redpolicy */ ;
                               RULE '1' set POOL 'silver' WHERE UPPER(name) like '%.TXT' ;
                               RULE '2' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ;
                               RULE '3' set POOL 'system'                                                  4/26/10 11:10 PM
We verify that files ending with the .txt extension are placed in the silver pool, that other files go to the system pool, and that .pdf files are allocated in the system pool and subsequently moved to the silver pool. We have created three files; we list them with ls -la and then run the GPFS mmlsattr command to verify file placement, as shown in Example 5-16. The files are placed as follows:
- test1.mp3 in the system pool
- test2.txt in the silver pool
- test3.pdf in the system pool
Example 5-16 Files allocated and managed by policies
[[email protected] export2]# ls -la
drwxr-xr-x 2 VIRTUAL\administrator root  8192 Apr 23 04:52 .
drwxr-xr-x 4 root                  root 32768 Apr 22 02:32 ..
-rw-r--r-- 1 root                  root     0 Apr 23 04:51 test1.mp3
-rw-r--r-- 1 root                  root     0 Apr 23 04:51 test2.txt
-rw-r--r-- 1 root                  root     0 Apr 23 04:52 test3.pdf
[[email protected] export2]# mmlsattr -L test*
file name:            test1.mp3
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:

file name:            test2.txt
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    silver
fileset name:         root
snapshot name:

file name:            test3.pdf
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:
Now we apply the policy using the GUI by going to Files → Policies, selecting our file system, clicking the Apply policy button, and choosing Apply. Applying the policy causes the migration rule to be executed. After policy execution, we verify the correct placement of the files using the mmlsattr command as shown in Example 5-17. The files are now placed in storage pools as follows:
- test1.mp3 remains in the system pool.
- test2.txt remains in the silver pool.
- test3.pdf has been moved to the silver pool.
Example 5-17 List file status
[[email protected] export2]# mmlsattr -L test*
file name:            test1.mp3
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:

file name:            test2.txt
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    silver
fileset name:         root
snapshot name:

file name:            test3.pdf
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    silver
fileset name:         root
snapshot name:
Important: The mmlsattr command is a GPFS command that must be run on SONAS using root authority. However, SONAS does not support running commands with root authority. SONAS development recognizes the need for an equivalent SONAS command to verify file placement of files in storage pools.
Chapter 6.
int002st002.virtual.com.
5. To re-enable activity on node int001st002.virtual.com, we select it and click the Resume button, Figure 6-2 shows the resulting status. Note that the public IP addresses have been rebalanced across the nodes and that the status for the node is active.
In Figure 6-3 we see that in normal operating conditions each interface node has two public IP addresses. Figure 6-4 shows that after a node failover, all public IP addresses have been moved to interface node int002st002, and node int001st002 is hosting no IP addresses.
[SONAS]$
Node                      MAC                Up/Down
int001st002.virtual.com   02:1c:5b:00:01:01  UP
int001st002.virtual.com   02:1c:5b:00:01:02  UP
int001st002.virtual.com   02:1c:5b:00:01:03  UP
int002st002.virtual.com   02:1c:5b:00:02:01  UP
int002st002.virtual.com   02:1c:5b:00:02:02  UP
int002st002.virtual.com   02:1c:5b:00:02:03  UP
mgmt001st002.virtual.com  02:1c:5b:00:00:01  UP
mgmt001st002.virtual.com  02:1c:5b:00:00:02  UP
mgmt001st002.virtual.com  02:1c:5b:00:00:03  UP
Mount requests: As each node can start up to eight parallel sessions, the Tivoli Storage Manager client maxnummp parameter must be set to eight. This means that a Tivoli Storage Manager client node can initiate up to eight mount requests for Tivoli Storage Manager sequential media on the server.
Set up the SONAS client definitions on the Tivoli Storage Manager servers. You must execute these steps on all the Tivoli Storage Manager servers:
1. Connect to the first Tivoli Storage Manager server to be configured, as a Tivoli Storage Manager administrator, with the Tivoli Storage Manager command line interface (CLI) client, by running the dsmadmc command on a system with the Tivoli Storage Manager administrative interface installed.
2. Register a virtual node name for the SONAS cluster. You can choose any name you like, provided it is not already registered to Tivoli Storage Manager. For example, you can choose the SONAS cluster name sonas1 with password sonas1secret and register the node to a Tivoli Storage Manager domain called sonasdomain, using the Tivoli Storage Manager register node command as follows:
register node sonas1 sonas1secret domain=sonasdomain
3. Register one Tivoli Storage Manager client node for each SONAS interface node that will run the Tivoli Storage Manager client. Assuming the three interface nodes int1st2, int2st2, and int3st2, we register a separate Tivoli Storage Manager node and password for each one using the Tivoli Storage Manager register node command as follows:
register node int1st2node int1st2pswd domain=sonasdomain
register node int2st2node int2st2pswd domain=sonasdomain
register node int3st2node int3st2pswd domain=sonasdomain
4. Grant all the Tivoli Storage Manager client nodes representing the interface nodes proxy access to the Tivoli Storage Manager virtual node representing the SONAS cluster, using the Tivoli Storage Manager grant proxynode administrator command. Assuming the three interface node Tivoli Storage Manager clients int1st2node, int2st2node, and int3st2node, and a cluster called sonas1, we run the following Tivoli Storage Manager administrator command:
grant proxynode target=sonas1 agent=int1st2node,int2st2node,int3st2node
5. Now we create a Tivoli Storage Manager server stanza, an entry in the Tivoli Storage Manager configuration file, on all the SONAS interface nodes. Assume that the Tivoli Storage Manager server is called tsmsrv1, has IP address tsmsrv1.com with port 1500, and that we have the three interface nodes int1st2, int2st2, and int3st2 to configure for backup.
6. Connect to node int1st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int1st2node sonas1 int1st2 int1st2pswd
7. Connect to node int2st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int2st2node sonas1 int2st2 int2st2pswd
8. Connect to node int3st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int3st2node sonas1 int3st2 int3st2pswd
9. Repeat steps 1 to 8 for all the Tivoli Storage Manager servers that you want to configure.
The Tivoli Storage Manager servers are now configured on all interface nodes. You can verify this by issuing the SONAS lstsmnode command without arguments to see the Tivoli Storage Manager stanza information for all interface nodes.
# lsbackupfs
File system  TSM serv  List of nodes    Status
gpfsjt       tsmsrv1   int1st2,int2st2  NOT_STARTED
Now that the backup is fully configured, we can run our first backup operation using the SONAS startbackup CLI command. The command accepts a list of one or more filesystems; specifying no arguments makes the command back up all filesystems with configured backup destinations. For example, to start backing up the file system gpfsjt, issue:
startbackup gpfsjt
The command starts the backup as a background operation and returns control to the caller. You monitor the status and completion of the backup operation for the specific filesystem using the lsbackup SONAS command, as shown in Example 6-2.
Example 6-2 lsbackup command output
# lsbackup gpfsjt
Filesystem  Date                 Message
gpfsjt      20.01.2010 02:00:00  EFSSG0300I The filesys gpfsjt backup started.
gpfsjt      19.01.2010 12:30:52  EFSSG0702I The filesys gpfsjt backup was done successfully.
gpfsjt      18.01.2010 02:00:00  EFSSG0300I The filesys gpfsjt backup started.
You can also list the Tivoli Storage Manager server and backup interface node associations, check the status of the latest backup, and validate the backup configuration by using the lsbackupfs -validate SONAS command, as shown in Example 6-3.
Example 6-3 Listing backup configuration and status
# lsbackupfs -validate
File system  TSM server  List of nodes    Status                  Start time
gpfsjt       tsmsrv1     int1st2,int2st2  COMPLETED_SUCCESSFULLY  1/21/10 04:26
(.. continuation of lines above ..)
.. End time       Message                  Validation             Last update
.. 1/21/10 04:27  INFO: backup ok (rc=0).  Node is OK,Node is OK  1/21/10 04:27
Tivoli Storage Manager backups can be scheduled from the CLI or GUI using the scheduled task called StartBackupTSM. To schedule a backup of all SONAS filesystems at 4:15 AM, use mktask as shown next:
mktask StartBackupTSM --parameter sonas02.virtual.com --minute 15 --hour 4
Files backed up to Tivoli Storage Manager can be restored using the startrestore SONAS CLI command. The startrestore command takes a filename or pattern as an argument, so you need to know the names of the files or directories to restore; you can also specify a restore date and time. Specifying no date and time filters returns the most recent backup data. The files are restored to the original location, or to another location if desired, and you can choose whether to replace the original files. An example of the restore command with the replace option follows:
startrestore "/ibm/gpfsjt/dirjt/*" -R
The lsbackupfs command shows whether a restore is currently running by displaying RESTORE_RUNNING in the message field.
The SONAS HSM client must be configured to run on all the interface nodes in the SONAS cluster as migrated files can be accessed from any node and so the Tivoli Storage Manager HSM client needs to be active on all the nodes. All SONAS HSM configuration commands will be run using the SONAS CLI and not the GUI.
To migrate a file, HSM sends a copy of the file to a Tivoli Storage Manager server and replaces the original file with a stub file on the local file system. A stub file is a small file that contains the information required to locate and recall a migrated file from the Tivoli Storage Manager server; it also makes it appear as though the file still resides on your local file system. As with backups and archives, migrating a file does not change the access time (atime) or permissions of that file.

SONAS storage management policies control and automate the migration of files between storage pools and to external storage. When file system utilization exceeds the defined high threshold, the HSM client detects this condition and begins to automatically migrate eligible files to the Tivoli Storage Manager server. This migration process continues until the file system utilization falls below the defined low threshold value. At that point, the HSM client begins to premigrate files, a feature of automatic migration. To premigrate a file, HSM copies the file to Tivoli Storage Manager storage and leaves the original file intact on the local file system (that is, no stub file is created), so an identical copy of the file resides both on the local file system and in Tivoli Storage Manager storage. The next time migration starts for this file system, HSM can quickly change premigrated files to migrated files without having to spend time copying the files to Tivoli Storage Manager storage: HSM verifies that the files have not changed since they were premigrated and replaces the copies of the files on the local file system with stub files. When automatic migration is performed, premigrated files are processed before resident files, because this frees space in the file system more quickly.

A file managed by HSM can be in one of three states:
- Resident: the file resides on the local file system. For example, a newly created file is a resident file.
- Migrated: the file has been copied from the local file system to Tivoli Storage Manager storage and replaced with a stub file.
- Premigrated: the file has been copied from the local file system to Tivoli Storage Manager storage but has not been replaced with a stub file, so an identical copy resides both on the local file system and in Tivoli Storage Manager storage. A file can be in the premigrated state after premigration, and a file that is recalled but not modified is also in the premigrated state.
To return a migrated file to your workstation, access the file in the same way as you might access a file that resides on your local file system. The HSM recall daemon automatically recalls the migrated file from Tivoli Storage Manager storage. This process is referred to as transparent recall.
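To visualize the states and transitions just described, the following Python sketch is a purely illustrative model; it is not how the HSM client is implemented, and the class and file names are hypothetical.

# Illustrative model of the HSM file states described above (not actual HSM code).
# States: resident, premigrated, migrated.

class HsmFile:
    def __init__(self, name):
        self.name = name
        self.state = "resident"          # newly created files are resident

    def premigrate(self):
        # Copy to Tivoli Storage Manager storage but keep the full file on disk.
        if self.state == "resident":
            self.state = "premigrated"

    def migrate(self):
        # Replace the local copy with a stub; a premigrated file needs no new data copy.
        if self.state in ("resident", "premigrated"):
            self.state = "migrated"

    def recall(self, modified=False):
        # Transparent recall brings the data back from Tivoli Storage Manager storage.
        # An unmodified recall leaves the server copy valid, so the file is premigrated;
        # a modified file no longer matches the server copy, so it is resident again.
        if self.state == "migrated":
            self.state = "resident" if modified else "premigrated"

f = HsmFile("report.dat")
f.premigrate(); f.migrate(); f.recall(modified=False)
print(f.state)   # premigrated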
6.3 Snapshots
SONAS offers filesystem level snapshots that allow you to create a point in time copy of all the user data in a filesystem. System data and currently existing snapshots are not copied with the snapshot operation. The snapshot function allows other programs, such as backups, to run concurrently with user updates and still obtain a consistent copy of the file system at the time the snapshot copy was created. Snapshots also provide an online backup capability that allows easy recovery from common problems such as accidental deletion of a file, and comparison with older versions of a file. One SONAS cluster supports a maximum of 256 snapshots for each filesystem. When you exceed the 256 snapshot limit you will not be able to create new snapshots and will receive an error until you remove one or more existing snapshots. The SONAS snapshots are space efficient because they only keep a copy of data blocks that have subsequently been changed or have been deleted from the filesystem after the snapshot has been taken.
4. Click the Create new snapshot button.
5. You will be prompted for a name for the new snapshot; accept the default name if you want the snapshot to be integrated with Windows VSS previous versions, and click OK to proceed.
6. You will see a task progress indicator window as shown in Figure 6-7. You can monitor task progression using this window.
7. You can close the task progress window by clicking the Close button.
8. You will now be presented with the list of available snapshots as shown in Figure 6-8.
To list all snapshots from all filesystems, you can use the lssnapshot command as shown in Figure 6-10. The command retrieves data regarding the snapshots of a managed cluster from the database and returns a list of snapshots:
[SONAS]$ lssnapshot
Cluster ID  Device name  Path                      Status  Creation                 Used (metadata)  Used (data)  ID
72..77      gpfsjt       @GMT-2010.04.09-00.32.43  Valid   09.04.2010 02:32:43.000  16               0            5
72..77      gpfsjt       @GMT-2010.04.08-23.58.37  Valid   09.04.2010 01:59:06.000  16               0            4
72..77      gpfsjt       @GMT-2010.04.08-20.52.41  Valid   08.04.2010 22:52:56.000  64               1            1
Figure 6-10 List all snapshots for all filesystems
Note that the timestamp shown for the list is the same for all snapshots; it indicates the time of the last SONAS database refresh. The lssnapshot command with the -r option forces a refresh of the snapshot data in the SONAS database by scanning all cluster snapshots before retrieving the list from the database.
Removing snapshots
Snapshots can be removed using the rmsnapshot command or from the GUI. For example, to remove a snapshot for filesystem gpfsjt using the command line, proceed as shown in Figure 6-11 using the following steps:
1. Issue the lssnapshot command for filesystem gpfsjt and choose a snapshot to remove by noting its name, for example, @GMT-2010.04.08-23.58.37.
2. Issue the rmsnapshot command with the name of the filesystem and the name of the snapshot.
3. To verify that the snapshot has been removed, issue the lssnapshot command again and check that the removed snapshot is no longer present.
[SONAS]$ lssnapshot -d gpfsjt ClusID Devname Path Status Creation Used (metadata) 72..77 gpfsjt @GMT-2010.04.09-00.32.43 Valid 09.04.2010 02:32:43.000 16 72..77 gpfsjt @GMT-2010.04.08-23.58.37 Valid 09.04.2010 01:59:06.000 16 72..77 gpfsjt @GMT-2010.04.08-20.52.41 Valid 08.04.2010 22:52:56.000 64 [SONAS]$ rmsnapshot gpfsjt @GMT-2010.04.08-23.58.37
[SONAS]$ lssnapshot -d gpfsjt ClusID DevName Path Status Creation Used (metadata) Used (data) ... 72..77 gpfsjt @GMT-2010.04.09-00.32.43 Valid 09.04.2010 02:32:43.000 16 0 ... 72..77 gpfsjt @GMT-2010.04.08-20.52.41 Valid 08.04.2010 22:52:56.000 64 1 ... Figure 6-11 Removing snapshots
Note that to create scheduled cron tasks, you must issue the mktask command from the CLI; it is not possible to create cron tasks from the GUI. To list the snapshot task that you have created, use the lstask command as shown in Figure 6-13.
[[SONAS]$ lstask -t cron Name Description Status Last run Runs on Schedule MkSnapshotCron This is a cronjob for scheduled snapshots. NONE N/A Mgmt node Runs at every 5th minute. Figure 6-13 List scheduled tasks
And to verify that snapshots are being correctly performed you can use the lssnapshot command as shown in Figure 6-14.
[SONAS]$ lssnapshot
Cluster ID  Device name  Path                      Creation                 Used (metadata)  Used (data)  ID
72..77      gpfsjt       @GMT-2010.04.09-03.15.06  09.04.2010 05:15:08.000  16               0            9
72..77      gpfsjt       @GMT-2010.04.09-03.10.08  09.04.2010 05:10:11.000  16               0            8
72..77      gpfsjt       @GMT-2010.04.09-03.05.03  09.04.2010 05:05:07.000  16               0            7
72..77      gpfsjt       @GMT-2010.04.09-03.00.06  09.04.2010 05:00:07.000  16               0            6
72..77      gpfsjt       @GMT-2010.04.09-00.32.43  09.04.2010 02:32:43.000  16               0            5
72..77      gpfsjt       @GMT-2010.04.08-20.52.41  08.04.2010 22:52:56.000  64               1            1
Figure 6-14 List snapshots
Replication can be synchronous or asynchronous, and asynchronous replication can be performed periodically or continuously.
Asynchronous replication is normally used when the additional latency due to distance becomes problematic because it causes an unacceptable elongation of response times for the primary application.
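To get a feel for when distance becomes the dominant factor, the following Python sketch is our own rough estimate, not a SONAS sizing tool: it assumes the common rule of thumb that light in fiber covers roughly 200 km per millisecond (about 5 microseconds per kilometer one way), and the 1 msec baseline local response time is an assumed example value.

# Rough estimate of how distance-induced latency elongates synchronous write
# response times. The ~5 us/km one-way figure is a common rule of thumb for
# light in fiber; the 1 ms baseline local response time is an assumption.

ONE_WAY_US_PER_KM = 5.0          # roughly 200 km per millisecond in fiber
baseline_response_ms = 1.0       # assumed local (zero-distance) write response time

for distance_km in (10, 100, 500, 1000):
    round_trip_ms = 2 * distance_km * ONE_WAY_US_PER_KM / 1000.0
    total_ms = baseline_response_ms + round_trip_ms
    print(f"{distance_km:5} km: +{round_trip_ms:5.2f} ms round trip "
          f"-> {total_ms:5.2f} ms per synchronous write")

At several hundred kilometers the round-trip delay alone can be several times the local response time, which is exactly the situation in which asynchronous replication is preferred.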
Figure 6-17 shows two SONAS clusters with file1 replicated using intracluster replication and file2 replicated with intercluster replication.
Synchronous replication does not distinguish between the two storage copies; both are peers. SONAS does not have a preferred failure group concept to which it sends all reads: reads are sent to disks in both failure groups. Synchronous replication in the SONAS filesystem offers the following choices:
- No replication at all
- Replication of metadata only
- Replication of data and metadata
It is best that metadata replication always be used for file systems within a SONAS cluster. Synchronous replication can be established at file system creation time, or later when the filesystem already contains data; depending on when replication is applied, different procedures must be followed to enable it. Synchronous replication requires that the disks belong to two distinct failure groups, to ensure that data and metadata are not replicated to the same physical disks. It is best that the failure groups be defined on separate storage enclosures and storage controllers, to guarantee the possibility of failover if a physical disk component becomes unavailable. Synchronous replication has the following prerequisites (a small validation sketch follows this list):
- Two separate failure groups must be present.
- The two failure groups must have the same number of disks.
- The same number of disks from each failure group, with the same disk usage type, must be assigned to the filesystem.
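To make the prerequisites concrete, here is a small Python sketch; it is illustrative only, the disk records are hypothetical examples rather than output of any SONAS command, and it simply encodes the checks listed above.

# Illustrative check of the synchronous replication prerequisites listed above.
# The disk records are hypothetical examples, not real lsdisk output.

from collections import Counter

disks = [
    # (name, failure_group, usage_type)
    ("gpfs3nsd", 1, "dataAndMetadata"),
    ("gpfs5nsd", 2, "dataAndMetadata"),
]

def can_replicate(disks):
    groups = {}
    for name, fg, usage in disks:
        groups.setdefault(fg, Counter())[usage] += 1
    if len(groups) != 2:
        return False, "exactly two failure groups are required"
    a, b = groups.values()
    if a != b:
        return False, "both failure groups need the same number of disks per usage type"
    return True, "prerequisites satisfied"

ok, reason = can_replicate(disks)
print(ok, "-", reason)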
We use lsdisk to see the available disks and lsfs to see the filesystems as shown in Figure 6-18.
[SONAS]$ lsdisk
Name      File system  Failure group  Type             Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  up            4/12/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  up            4/12/10 3:03 AM
gpfs3nsd  gpfsjt       1              dataAndMetadata  up            4/12/10 3:03 AM
gpfs4nsd               1              dataAndMetadata                4/13/10 1:55 AM
gpfs5nsd               1              dataAndMetadata                4/13/10 1:55 AM
gpfs6nsd               2              dataAndMetadata                4/13/10 1:55 AM

[SONAS]$ lsfs
Cluster  Devicen  Mountpoint   ..  Data replicas  Metadata replicas  Replication policy  Dmapi
sonas02  gpfs0    /ibm/gpfs0   ..  1              1                  whenpossible        F
sonas02  gpfsjt   /ibm/gpfsjt  ..  1              1                  whenpossible        T
Figure 6-18 Disks and filesystem before replication
Using the example in Figure 6-18, we verify the number of disks currently assigned to the gpfsjt filesystem in the lsdisk output and see that only one disk, gpfs3nsd, is used. To create the synchronous replica, we need the same number of disks as are currently assigned to the filesystem. From the lsdisk output, we also verify that there are enough free disks that are not assigned to any filesystem; we use the disk called gpfs5nsd to create the data replica. The disk gpfs5nsd is currently in failure group 1, the same failure group as the primary disk, so we must assign it to a separate failure group, 2, using the chdisk command as shown in Figure 6-19, and then verify the disk status with lsdisk. Also verify that the new disk, gpfs5nsd, is in the same pool as the current disk gpfs3nsd:
[SONAS]$ chdisk gpfs5nsd --failuregroup 2
EFSSG0122I The disk(s) are changed successfully!
[SONAS]$ lsdisk
Name      File system  Failure group  Type             Pool      Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system    up            4/12/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system    up            4/12/10 3:03 AM
gpfs3nsd  gpfsjt       1              dataAndMetadata  system    up            4/12/10 3:03 AM
gpfs4nsd               1              dataAndMetadata  userpool                4/13/10 2:15 AM
gpfs5nsd               2              dataAndMetadata  system                  4/13/10 2:15 AM
gpfs6nsd               2              dataAndMetadata  userpool                4/13/10 2:15 AM
At this point we add the new disk to file system gpfsjt using the chfs -add command as illustrated in Figure 6-20 and verify the outcome using the lsdisk command.
[SONAS]$ chfs gpfsjt -add gpfs5nsd
The following disks of gpfsjt will be formatted on node mgmt001st002.virtual.com:
    gpfs5nsd: size 1048576 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
 52 % complete on Tue Apr 13 02:22:03 2010
100 % complete on Tue Apr 13 02:22:05 2010
Completed adding disks to file system gpfsjt.
mmadddisk: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
EFSSG0020I The filesystem gpfsjt has been successfully changed.

[SONAS]$ lsdisk
Name      File system  Failure group  Type             Pool      Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system    up            4/12/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system    up            4/12/10 3:03 AM
gpfs3nsd  gpfsjt       1              dataAndMetadata  system    up            4/12/10 3:03 AM
gpfs5nsd  gpfsjt       2              dataAndMetadata  system    up            4/13/10 2:26 AM
gpfs4nsd               1              dataAndMetadata  userpool                4/13/10 2:26 AM
gpfs6nsd               2              dataAndMetadata  userpool                4/13/10 2:26 AM

[SONAS]$ lsfs
Cluster  Devicen  Mountpoint   ..  Data replicas  Metadata replicas  Replication policy  Dmapi
sonas02  gpfs0    /ibm/gpfs0   ..  1              1                  whenpossible        F
sonas02  gpfsjt   /ibm/gpfsjt  ..  1              1                  whenpossible        T
Figure 6-20 Add a disk to a filesystem
From the lsdisk output, we can see that gpfs5nsd is assigned to filesystem gpfsjt, and from the lsfs output, we notice that we still only have one copy of data and metadata as shown in the Data replicas and Metadata replicas columns. To activate data and metadata replication, we need to execute the chfs -R command as shown in Figure 6-21.
[SONAS]$ chfs gpfsjt -R all EFSSG0020I The filesystem gpfsjt has been successfully changed. [SONAS]$ lsfs Cluster DevicenMountpoint Data replicas Metadata replicas Replication policy Dmapi sonas02 gpfs0 /ibm/gpfs0 .. 1 1 whenpossible F sonas02 gpfsjt /ibm/gpfsjt .. 2 2 whenpossible T Figure 6-21 Activate data replication
The lsfs command now shows that there are two copies of the data in the gpfsjt filesystem. Now we perform the restripefs command with the replication switch to redistribute data and metadata as shown in Figure 6-22.
Scanning file system metadata, phase 1 ... Scan completed successfully. Scanning file system metadata, phase 2 ... 64 % complete on Thu Apr 15 23:11:00 2010 85 % complete on Thu Apr 15 23:11:06 2010 100 % complete on Thu Apr 15 23:11:09 2010 Scan completed successfully. Scanning file system metadata, phase 3 ... Scan completed successfully. Scanning file system metadata, phase 4 ... Scan completed successfully. Scanning user file metadata ... EFSSG0043I Restriping of filesystem gpfsjt completed successfully. [[email protected] dirjt]#
Figure 6-22 Restripefs to activate replication
SONAS does not offer any command to verify that the file data is actually being replicated. To verify the replication status, connect to SONAS as a root user and issue the mmlsattr command with the -L switch as illustrated in Figure 6-23. The report shows the metadata and data replication status; we can see that we have two copies for both metadata and data.
[[email protected] userpool]# mmlsattr -L *
file name:            f1.txt
metadata replication: 2 max 2
data replication:     2 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:

file name:            f21.txt
metadata replication: 2 max 2
data replication:     2 max 2
immutable:            no
flags:
storage pool name:    userpool
fileset name:         root
snapshot name:
Filesystem synchronous replication can also be disabled using the chfs command as shown in the following example: chfs gpfsjt -R all After changing the filesystem attributes, the restripefs command must be issued to remove replicas of the data, as shown in the following example: restripefs gpfsjt --replication
The SONAS interface nodes are defined as the elements for performing the replication functions. When using async replication, the SONAS system detects the modified files from the source system, and only moves the changed contents from each file to the remote destination to create an exact replica. By only moving the changed portions of each modified file, the network bandwidth is used very efficiently. The file based movement allows the source and destination file trees to be of differing sizes and configurations, as long as the destination file system is large enough to hold the contents of the files from the source. Async replication allows all or portions of the data of a SONAS system to be replicated asynchronously to another SONAS system and in the event of an extended outage or loss of the primary system the data kept by the backup system will be accessible in R/W by the customer applications. Async replication also offers a mechanism to replicate the data back to the primary site after the outage or new system is restored.
The backup system also offers concurrent R/O access to the copy of the primary data for testing and validation of the disaster recovery mirror. The data at the backup system can be accessed by all of the protocols in use on the primary system. You can take a R/W snapshot of the replica, which can be used to allow full function disaster recovery testing against the customer's applications; typically, the R/W snapshot is deleted after the disaster recovery test has concluded. File shares defined at the production site are not automatically carried forward to the secondary site and must be manually redefined by the customer for the secondary location. These shares must be defined as R/O until the time comes to do production work against the remote system in full R/W, for example, for business continuance in the face of a disaster; redefinition to R/W shares can be done by using the CLI or GUI. The relationship between the primary and secondary site is 1:1: one primary and one secondary site. The scope of an async replication relationship is a file system. Best practices need to be followed to ensure that the HSM systems are configured and managed to avoid costly performance impacts during the async replication cycles, which can occur when a file has been migrated to offline storage before being replicated and must be recalled from offline storage for replication to take place.
Figure 6-25 illustrates the relationship between the primary and secondary sites for this scenario.
This is done through administration commands; you start on the destination SONAS system:
1. Define the source SONAS system to the destination SONAS:
cfgrepl sourcecluster -target
Where sourcecluster is the hostname or IP address of the source cluster's Management Node.
2. Define the file tree target on the destination SONAS to hold the source SONAS file tree. This creates the directory on the destination SONAS to be used as the target of the data for this replication relationship:
mkrepltarget path sourcecluster
Where path is the file system path on the destination SONAS to be used to hold the contents of the source SONAS file tree, and sourcecluster is the hostname or IP address of the source cluster's management node (matching the one provided to the cfgrepl command).
After the destination SONAS system is defined, the source SONAS needs to be configured through the following administrative actions:
1. Configure the async relationship on the source SONAS cluster:
cfgrepl targetcluster {-n count | --pairs source1:target1 [, source2:target2 ]} --source
Where:
targetcluster is the hostname or IP address of the target cluster's Management Node.
count is the number of node pairs to use for replication.
pairs is the explicit mapping of the source/destination node pairs to use for replication.
2. Define the relationship of the source file tree to the target file tree:
cfgreplfs filesystem targetpath
Where filesystem is the source file tree to be replicated to the destination and targetpath is the full path on the destination where the replica of the source is to be made.
The configuration of the async replication determines how the system performs the mirroring of the data for disaster recovery. The configuration step identifies which SONAS nodes participate in the replication for the source and destination systems. At least one source and target pair must be specified with the cfgrepl CLI command; multiple pairs can be entered, separated by commas. When setting up replication using this command, the following restrictions are in place:
- All source nodes must be in the same cluster.
- The IP addresses of the source nodes must be the internal IP addresses associated with the InfiniBand network within the SONAS.
- All target nodes must be in the same cluster.
- The IP addresses of the target nodes must be the public IP addresses of the interface nodes that CTDB controls.
- Source and target cannot be in the same cluster.
- The first source node specified controls the replication, and is considered the replication manager node.
- Multiple source nodes can replicate to the same destination.
The cfgrepl command creates a configuration file, /etc/asnc_repl/arepl_table.conf, which contains the information provided with the following internal structure: src_addr1 src_addr2 src_addr3 dest_addr1 dest_addr2 dest_addr3
Part of the async configuration needs to ensure that the source cluster can communicate to the destination cluster without being challenged with the SSH/scp password requests. To achieve this, the ssh key from the id_rsa.pub from the destination SONAS system needs to be added to the authorized_keys file of the source nodes participating in the async operation.
The name of the snapshot is based on the path to the async replication directory on the destination system, with the extension _cnreplicate_tmp appended to it. For example, if the destination file tree for async replication is /ibm/gpfsjt/async, the resulting snapshot directory is created in the source file system as:
/ibm/gpfs0/.snapshots/ibm_gpfsjt_async_cnreplicate_tmp
These snapshots sit alongside any other snapshots created by the system as part of user requests; the async replication tool ensures that it only operates on snapshots it created with its own naming convention. These snapshots do count towards the 256 snapshot limit per file system and must therefore be accounted for together with the other snapshots used by the system. After the successful completion of async replication, the snapshot created in the source file system is removed.

After the completion of the async replication, a snapshot of the filesystem containing the replica target is also taken. The name of this snapshot is based on the destination path of the async replication directory, again with the extension _cnreplicate_tmp appended. As with source snapshots, these snapshots sit alongside any other snapshots created as part of user requests, the async replication tool only operates on the snapshots it created with this naming convention, and they count towards the 256 snapshot limit per file system.
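As an illustration of the naming convention, the following Python snippet is our own reconstruction from the example above, not SONAS source code; it derives the snapshot name from a destination path by replacing path separators and appending the suffix.

# Illustrative reconstruction of the async replication snapshot naming rule
# described above: the destination path with '/' replaced by '_' and the
# suffix _cnreplicate_tmp appended.

def async_snapshot_name(destination_path: str) -> str:
    return destination_path.strip("/").replace("/", "_") + "_cnreplicate_tmp"

# Matches the example in the text:
print(async_snapshot_name("/ibm/gpfsjt/async"))
# -> ibm_gpfsjt_async_cnreplicate_tmp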
Business continuance
The steps for enabling the recovery site involve the following major components:
1. Perform a baseline file scan of the file tree replica used as the target for the async replication.
2. Define shares/exports to the file tree replica.
3. Continue production operation against the remote system.
The baseline scan establishes the state of the remote system files as last received from the production site, so that changes made from this point forward can be tracked. For the configuration where the secondary site was strictly a backup for the production site, establishing the defined shares for the replica to enable it for production is the primary consideration. Figure 6-27 illustrates this scenario.
If the second site contained its own production file tree in addition to replicas, then the failure also impacts the replication of its production file systems back to the first site as illustrated in Figure 6-28.
Figure 6-28 shows this configuration: both user groups authenticate against a common Active Directory with SFU (or LDAP/NIS); SONAS cluster #1 hosts file tree A, its snapshot, and a replica of file tree B with its snapshot, while SONAS cluster #2 hosts file tree B, its snapshot, and a replica of file tree A with its snapshot.
The steps to recover at the disaster recovery site are as follows:
1. Run the startrepl command with the -S parameter to run a scan only on the destination system and establish a point in time of the current file tree structure. This allows the system to track changes to the destination file tree, to assist in a delta file update back to the original production system.
2. Define shares to the destination file systems as R/W using the mkexport command, or change existing R/O shares used for validation/testing to R/W using the chexport command.
3. Proceed with R/W access to data at the disaster recovery location against the file tree.
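A hedged sketch of this sequence at the recovery site follows; the remaining arguments of startrepl, the share name drshare, the replica path, and the client list are illustrative placeholders, and the exact option syntax varies by release (consult the CLI reference):

startrepl -S                                                     # scan-only baseline of the replica file tree
mkexport drshare /ibm/gpfsjt/async -nfs "9.11.0.0/16(rw,no_root_squash,async)"   # expose the replica R/W
# or, if a read-only validation share already exists:
chexport drshare -nfs "9.11.0.0/16(rw,no_root_squash,async)"
# Clients now mount the share from the recovery site and production continues there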
In the scenario where the second site was used for both active production usage and as a replication target, the recovery is as illustrated in Figure 6-30.
Figure 6-30 shows the rsync-based replication relationships between file trees A and B, their replicas, and the associated snapshots at the two sites.
The loss of the first site also means the loss of the replica of the second site's file systems, which will need to be replicated back to the first site. The recovery steps for an active-active configuration are outlined as follows:
1. Configure the async replication policies so that, for file tree A, the source is now the secondary site and the destination is the new primary site.
2. Perform an async replication of file tree A, with the full replication parameter, back to the new primary site; the time to transfer the entire contents electronically can be long, depending on the amount of data and the network capabilities.
3. Halt production activity to the secondary site and perform another async replication to ensure that the primary and secondary sites are identical.
4. Perform a baseline scan of file tree A at site 1.
5. Define exports and shares to file tree A at site 1.
6. Begin production activity to file tree A at site 1.
7. Configure the async replication source/destination nodes to direct replication back from the new primary site to the secondary site for file tree A.
8. Resume the original async replication of file tree A from the new primary site to the secondary site.
9. For the first async replication of file tree B from the secondary site to the new primary site, invoke the full replication parameter so that all contents of file tree B are sent from the secondary site to the new primary site.
[root@sonas02 bin]# backupmanagementnode --component auth,ssh,ctdb,derby
EFSSG0200I The management node mgmt001st002.virtual.com(10.0.0.20) has been successfully backuped.
[root@sonas02 bin]# ssh strg001st002.virtual.com ls /var/sonas/managementnodebackup
mgmtbak_20100413041835_e2d9a09ea1365d02ac8e2b27402bcc31.tar.bz2
mgmtbak_20100413041847_33c85e299643bebf70522dd3ff2fb888.tar.bz2
mgmtbak_20100413041931_547f94b096436838a9828b0ab49afc89.tar.bz2
mgmtbak_20100413043236_259c7d6876a438a03981d1be63816bf9.tar.bz2
Figure 6-31 Activate data replication
Attention: Whereas administrator backup of management node configuration information is allowed and documented in the manuals, the procedure to restore the configuration information is not documented and needs to be performed under the guidance of IBM support personnel. The restoration of configuration data is done using the cnmgmtconfbak command, which is used by the GUI when building up a new management node. The cnmgmtconfbak command can also be used to list the available archives; it requires you to specify --targethost <host> and --targetpath <path> for any backup, restore, or list operation. Figure 6-32 shows the command switches and how to get a list of available backups.
[root@sonas02]# cnmgmtconfbak
Usage: /opt/IBM/sofs/scripts/cnmgmtconfbak <command> <mandatory_parameters> [<options>]
commands:
  backup  - Backup configuration files to the bak server
  restore - Restore configuration files from the bak server
  list    - List all available backup data sets on the selected server
mandatory parameters:
  --targethost - Name or IP address of the backup server
  --targetpath - Backup storage path on the server
options: [-x] [-v] [-u N *] [-k N **]
  -x - Debug
  -v - Verbose
  --component - Select data sets for backup or restore (if archive contains data set. (Default:all - without yum!)
    Legal component names are: auth, callhome, cim, cron, ctdb, derby, role, sonas, ssh, user, yum, misc
    (Pls. list them separated with commas without any whitespace)
only for backup
  -k|--keep - Keep N old bak data set (default: keep all)
only for restore
  -p|--fail_on_partial - Fail if archive does not contain all required components
  -u|--use - Use Nth bak data set (default: 1=latest)
[root@sonas02]# cnmgmtconfbak list --targethost strg001st002.virtual.com --targetpath /var/sonas/managementnodebackup
1 # mgmtbak_20100413043236_259c7d6876a438a03981d1be63816bf9.tar.bz2
2 # mgmtbak_20100413041931_547f94b096436838a9828b0ab49afc89.tar.bz2
3 # mgmtbak_20100413041847_33c85e299643bebf70522dd3ff2fb888.tar.bz2
4 # mgmtbak_20100413041835_e2d9a09ea1365d02ac8e2b27402bcc31.tar.bz2
Figure 6-32 Configuration backup restore command (..cont..)
Remote server: You can back up the configuration data to a remote server external to the SONAS cluster by specifying the --targethost switch. The final copy of the archive file is performed by the scp command, so the target remote server can be any server to which passwordless access has been established. Establishing passwordless access to a remote server does require root access to the SONAS cluster.
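A hedged sketch of establishing that passwordless access and taking a backup to an external server; backupsrv.example.com and /backups/sonas are illustrative placeholders:

# As root on the SONAS management node:
ssh-keygen -t rsa                                  # accept the defaults
ssh-copy-id root@backupsrv.example.com             # or append ~/.ssh/id_rsa.pub to the server's authorized_keys manually
cnmgmtconfbak backup --targethost backupsrv.example.com --targetpath /backups/sonas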
Chapter 7.
Nearline SAS or SATA drive configuration: SATA or Nearline SAS drives have a larger capacity than SAS drives, up to 2 TB in the SONAS configuration. These drives also require less power per drive than SAS drives, which is why the current maximum configuration inside a single Storage Expansion rack with Nearline SAS drives is 480 drives. Be sure to have the same type and size of physical drives inside a logical storage pool; it is not desirable to mix drive types or sizes within a SONAS logical storage pool.
When a future 60 amp power option is available for the SONAS storage expansion rack, this restriction will be lifted.
The other option is to add an extra Dual-port 10 Gb Converged Network Adapter (FC 1101). This feature provides a PCIe 2.0 Gen 2 x8 low-profile dual-port 10 Gb Converged Network Adapter (CNA) with two SFP+ optical modules. The CNA supports short reach (SR) 850 nm multimode fiber (MMF). You are responsible for providing the network cables to attach the network connections on this adapter to your IP network. One feature code 1101 adapter can be ordered per interface node. The manufacturer of this card is QLogic, OEM part number FE0210302-13. The last option is to purchase both adapters. Table 7-2 summarizes the available connectivity configurations within a single interface node.
Table 7-2 Number of ports in various configurations of a single Interface Node

Configuration                                                          1    2             3             4
On board 1 GbE connectors                                              2    2             2             2
Feature Code 1100, Quad-port 1 GbE Network Interface Card (NIC)        0    0             1 (4 ports)   1 (4 ports)
Feature Code 1101, Dual-port 10 GbE Converged Network Adapter (CNA)    0    1 (2 ports)   0             1 (2 ports)
Total number of data path connectors                                   2    4             6             8
Cabling considerations: For each interface node in the base rack, no InfiniBand cables need to be ordered; copper InfiniBand cables are automatically provided for all interface nodes in the base rack, and the length of the cables provided is based on the position of the interface node in the rack. You must, however, order InfiniBand cable features for inter-rack cabling after determining the layout of your multi-rack system, for example, if an Interface Expansion Rack is required. Multiple InfiniBand cable features are available; the main difference is whether you are using a 36-port or a 96-port InfiniBand switch configuration. The connectors are not the same in the two models: the 36-port model requires QSFP connectors, while the 96-port model requires X4 connectors, as shown in Figure 7-1.
For the additional Quad-Port adapters, Cat 5e cables or better are required to support 1 Gb network speeds; Cat 6 cables provide better support for 1 Gbps. The 10 GbE data-path connections support short reach (SR) 850 nanometer (nm) multimode fiber (MMF) optic cables, which typically can reliably connect equipment up to a maximum of 300 meters (m) using 2000 MHz*km bandwidth OM3 fiber.
For additional information about the SONAS Rack Base Feature Code 9005, see Rack types: How to choose the correct rack for your solution on page 61.
For additional information about the Feature Code 9003 SONAS Rack Base, see Rack types: How to choose the correct rack for your solution on page 61.
For additional information about the Feature Code 9004 SONAS Base Rack, see Rack types: How to choose the correct rack for your solution on page 61.
For additional information about the SONAS Storage Expansion Rack, see SONAS storage expansion unit on page 53.
For additional information about the SONAS Interface Expansion Rack, see Rack types: How to choose the correct rack for your solution on page 61.
Table 7-3 Maximum storage capacity with 36 port InfiniBand switch configuration

Interface nodes   Storage pods   Storage nodes   Maximum number of hard disk drives   Maximum storage capacity using 2 TB disks (TB)
3                 14             28              3360                                 6720
4                 14             28              3360                                 6720
5                 13             26              3120                                 6240
6                 13             26              3120                                 6240
7                 12             24              2880                                 5760
8                 12             24              2880                                 5760
9                 11             22              2640                                 5280
10                11             22              2640                                 5280
11                10             20              2400                                 4800
12                10             20              2400                                 4800
13                9              18              2160                                 4320
14                9              18              2160                                 4320
15                8              16              1920                                 3840
16                8              16              1920                                 3840
17                7              14              1680                                 3360
18                7              14              1680                                 3360
19                6              12              1440                                 2880
20                6              12              1440                                 2880
21                5              10              1200                                 2400
22                5              10              1200                                 2400
23                4              8               960                                  1920
24                4              8               960                                  1920
25                3              6               720                                  1440
26                3              6               720                                  1440
27                2              4               480                                  960
28                2              4               480                                  960
29                1              2               240                                  480
30                1              2               240                                  480
Table 7-4 shows the maximum storage capacity using the 96 port InfiniBand switch.
Table 7-4 Maximum storage capacity with 96 port InfiniBand switch configuration

Number of Storage pods   Number of Storage nodes   Number of Storage controllers   Number of Disk Storage Expansion units   Maximum number of hard disk drives   Maximum storage capacity using 2 TB disks (TB)
1                        2                         2                               2                                        240                                  480
2                        4                         4                               4                                        480                                  960
3                        6                         6                               6                                        720                                  1440
4                        8                         8                               8                                        960                                  1920
5                        10                        10                              10                                       1200                                 2400
6                        12                        12                              12                                       1440                                 2880
7                        14                        14                              14                                       1680                                 3360
8                        16                        16                              16                                       1920                                 3840
Table 7-4 Maximum storage capacity with 96 port InfiniBand switch configuration (continued)

Number of Storage pods   Number of Storage nodes   Number of Storage controllers   Number of Disk Storage Expansion units   Maximum number of hard disk drives   Maximum storage capacity using 2 TB disks (TB)
9                        18                        18                              18                                       2160                                 4320
10                       20                        20                              20                                       2400                                 4800
11                       22                        22                              22                                       2640                                 5280
12                       24                        24                              24                                       2880                                 5760
13                       26                        26                              26                                       3120                                 6240
14                       28                        28                              28                                       3360                                 6720
15                       30                        30                              30                                       3600                                 7200
16                       32                        32                              32                                       3840                                 7680
17                       34                        34                              34                                       4080                                 8160
18                       36                        36                              36                                       4320                                 8640
19                       38                        38                              38                                       4560                                 9120
20                       40                        40                              40                                       4800                                 9600
21                       42                        42                              42                                       5040                                 10080
22                       44                        44                              44                                       5280                                 10560
23                       46                        46                              46                                       5520                                 11040
24                       48                        48                              48                                       5760                                 11520
25                       50                        50                              50                                       6000                                 12000
26                       52                        52                              52                                       6240                                 12480
27                       54                        54                              54                                       6480                                 12960
28                       56                        56                              56                                       6720                                 13440
29                       58                        58                              58                                       6960                                 13920
30                       60                        60                              60                                       7200                                 14400
This means that, depending on your application and also on your entire software stack, you might require more performance or capacity in one area or another in order to meet your requirements. As with all storage solutions, the better you know how your application works, the easier it is to size the storage that will host your data. In this section we describe in detail various business application characteristics from a storage point of view and how they can impact your sizing choices. First of all, keep in mind that network file based solutions are not always the most appropriate option for your workload. SONAS is only one product from the wide IBM Storage product portfolio. For instance, if your daily business application uses Direct Attached Storage (DAS), by design this locally attached storage solution will have a lower latency than a network attached solution like SONAS. If your business application is latency bound, SONAS might not be the best option. Still dealing with network design, if your application performs very small accesses in a random way, a network attached solution will not provide tremendous performance. For more details regarding good candidates for a SONAS solution, see the chapter titled SONAS usage cases in IBM Scale Out Network Attached Storage, SG24-7874.
You will probably not use only one application. You can have many applications and many users using the shares for various purposes through NFS or CIFS, and you can also have ISVs running and accessing data. For all these workloads you will have to accumulate the bandwidth and capacity requirements.
Access pattern
The access pattern is more difficult to identify. What we mean here by access pattern is the workload access type (random or sequential), the file size, and the read/write ratio. When your application performs I/O on the storage pool, the I/O access can be considered sequential if successive requests access data at consecutive physical addresses. On the contrary, it is considered random if, in order to retrieve the required data, you have to access non-consecutive locations on the drive. In both cases, random or sequential, these accesses are writing or reading files. The file size is basically the size required on the storage solution to store these files; we do not take into account snapshot, backup, or replication concepts, which can increase the size required to store a single file. Finally, your business application does not perform reads or writes exclusively. The read/write ratio is the ratio between the average number of reads and the average number of writes during execution. Again, you will probably not use only one application. Because SONAS allows all users and applications to use a single global name space, running multiple applications leads to a mix of access types. If one application performs sequential access while a second one accesses data in a random way, the global access pattern on the SONAS file system might be neither 100% sequential nor 100% random. The same holds for file size and read/write ratio.
Access pattern
As discussed, the access type can be random or sequential. On top of the access type you have the file size and the read/write ratio. From benchmark considerations, small random accesses are often associated with IO/s performance, while large sequential ones are often associated with MB/s performance. These two values are basic disk and storage controller characteristics, but they can also be appropriate metrics for your business application. Study and consider the workload access types, patterns, and file sizes of your installation to obtain a better idea of your application or environment needs in terms of IO/s or MB/s. At the disk level, the IO/s are determined by the disk drive technology. The IO/s metric can be determined from disk characteristics such as Average Seek Time and Rotational Latency.
The Average Seek Time is the time required by the disk drive to position the drive head over the correct track, while the Rotational Latency is the time required for the target sector to rotate under the disk head before it can be read or written. Average latency is estimated as the time required for one half of a full rotation. You can find Average Seek Time and Rotational Latency in the manufacturer specifications. In current disk drive technology there are two major types. High performance 15K RPM Serial Attached SCSI (SAS) disk drives have a lower Average Seek Time and Rotational Latency (due to their higher rotational speed) than high capacity 7.2K RPM Nearline SAS or Serial Advanced Technology Attachment (SATA) disk drives. Currently, 15K RPM SAS disk drives generally have seek times in the 3-4 millisecond range and are capable of sustaining between 160 and 200 IOPS per disk drive. 7.2K RPM Nearline SAS and SATA disk drives generally have longer seek times in the 8-9 millisecond range and are capable of sustaining 70-80 IOPS per disk drive. These are general rules of thumb for planning your SONAS system. You have 60 disks (SAS, Nearline SAS, or SATA) within a single Storage Controller drawer. Note that this does *not* mean that the Storage Controller performance is 60 times the performance of a single disk. The same applies if you add an additional Storage Expansion to increase the number of disks to 120 per Storage Controller: the overall performance will not be 120 times the performance of a single disk. The first reason is the software RAID technology used in the Storage Controller. Read and write performance are not the same: even though both a read and a write are by definition a single I/O, a read operation and a write operation do not deliver the same performance, and even two write operations can differ from each other. Because of the RAID 5 and RAID 6 definitions, as described in Figure 7-7, you have to deal with parity, and depending on the RAID algorithm the parity is not always on the same disk.
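As a hedged worked example using these rule-of-thumb figures: a 15K RPM drive completes a full rotation in 60/15,000 minutes = 4 ms, so its average rotational latency is about 2 ms; with a 3.5 ms average seek time, one random I/O takes roughly 5.5 ms, which is about 1000 / 5.5 ≈ 180 IOPS, inside the 160-200 range quoted above. A 7.2K RPM drive rotates in about 8.3 ms (4.2 ms average latency); with an 8.5 ms seek time, one random I/O takes roughly 12.7 ms, or about 79 IOPS.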
The biggest performance penalty occurs when you update a single piece of data inside your RAID array, as shown in Figure 7-8: four I/O operations are required for a single data update.
For a full-stripe write across all disks there is no longer a penalty; see Figure 7-9.
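As a hedged rule-of-thumb calculation of this small-write penalty: a RAID 5 update costs four disk I/Os (read old data, read old parity, write new data, write new parity), and a RAID 6 update typically costs six, because two parity strips must be read and rewritten. If a Nearline SAS drive sustains roughly 75 IOPS, a 10-drive RAID 6 array can deliver on the order of 10 x 75 = 750 random read IOPS, but only about 750 / 6 = 125 small random write IOPS, before any benefit from controller write caching is taken into account.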
There is also a bottleneck due to the Storage Controller itself: like every storage controller, it cannot scale perfectly with the number of disks behind it. Regarding the bandwidth (MB/s) characteristic, you likewise start from the bandwidth of a single disk, whether SAS, Nearline SAS, or SATA, which differs between read and write access; and the overall bandwidth of the Storage Controller will not be the sum of the bandwidth of each disk, both because of the RAID overhead and because of the Storage Controller bottleneck. Basically, MB/s performance, which deals with sequential access, depends on the storage controller technology and algorithms. Refer to Chapter 2, Hardware architecture on page 41 to review the performance differences between a configuration with two storage controllers and a configuration with one storage controller and one storage expansion unit: in both configurations you have the same number of disks, but with two controllers performance is better. Because read and write performance are not identical from a Storage Controller point of view, the storage controller specifications present both read and write performance, and the read/write ratio can help you better size your SONAS environment. Even if the IO/s and MB/s characteristics of a single disk are supposed to be the same for both read and write requests, the RAID layer implies additional overhead for write access, even though the algorithms used in storage controllers are designed to perform as well as possible for both reads and writes. This means that for the exact same capacity and storage configuration, you will have better performance with a high read/write ratio, simply because your application performs many more read than write requests.
Data is then stored in cache on the interface node. SONAS keeps a specific user allocated to the same interface node for the duration of the session (regardless of whether it is NFS, CIFS, or another protocol), specifically in order to provide users these caching capabilities. The amount of cache can be increased with the appropriate feature codes: SONAS interface nodes can be configured with a total of 32 GB, 64 GB, or 128 GB of cache per interface node. If your application reuses data heavily, or has a high cache hit ratio, this caching capability can increase performance, especially with the additional cache memory. But if the caching ratio is low, then the caching effect might not help you much. Keep in mind that even if your application has a high cache hit ratio, if you have many applications running, many users accessing data, or a significant software stack, you must have enough cache memory to take advantage of this caching. If you do not know your workload characteristics precisely, the next sections describe methods to retrieve them. This will be useful for your SONAS sizing, and also helpful to understand your daily business application I/O behavior more precisely.
Access pattern
Tivoli Storage Productivity Center is again an appropriate option for retrieving IO utilization and access information of any kind. If Tivoli Storage Productivity Center is not set up in your current environment, you can retrieve the read and write information, and possibly the access sizes, from your storage subsystem if it includes monitoring in its graphical view or CLI. From your servers, you can find this information with tools such as iostat, dstat, netstat, vmstat, and nmon. You can also ask your application developers for information regarding IO access.
Tools that are helpful include nmon and nmon_analyser tools from UNIX systems (NFS), or perfmon tools with appropriate counters from Windows (CIFS) for graphical reports. From NFS access, you can also use iostat/dstat/netstat/vmstat tools.
bin     bin/dec difference
2^10    2%
2^20    5%
2^30    7%
2^40    10%
2^50    13%
2^60    15%
2^70    18%
Note that at the terabyte scale we are off by around 10%, and that grows to 13% at the petabyte scale. That is also the reason why you only get around 55 GB of space on your laptop's 60 GB drive. From a SONAS perspective, the disk storage space is presented and discussed in decimal notation, so 60 TB of disk is 60 x 10^12 bytes of storage. On the other hand, when you format the disk drives the space is presented using binary notation, so 1 TB is 2^40 bytes. Note that when we discuss network capacities and bandwidth using the Gbit and 10 Gbit Ethernet adapters, we are using binary notation, so a 1 Gbit Ethernet link corresponds to 2^30 bits per second, or 134,217,728 bytes per second.
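As a short check of these figures: 1 TB in decimal notation is 10^12 bytes, while 2^40 = 1,099,511,627,776 bytes, a ratio of about 1.0995, which is the roughly 10% difference shown in the table; at the petabyte scale, 2^50 / 10^15 ≈ 1.126, or about 13%. Similarly, 60 GB in decimal notation is 60 x 10^9 bytes, which is only about 55.9 x 2^30 bytes, hence the roughly 55 GB reported for a 60 GB laptop drive.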
This table provides the amount of raw storage per 60 disk drive drawer. A SONAS storage pod contains up to four drawers of 60 drives, so the minimum configuration can grow up to four times the capacity shown. A full storage pod has a total of 240 disk drives. If all 2 TB drives are used, this equates to a raw capacity of 480 TB or, after the RAID overhead is taken away, a usable capacity of 372 TB. Capacity in SONAS is added by adding additional Storage Expansion Racks.
Simultaneously in November 2010, IBM withdrew from SONAS marketing the following disk drive options:
Feature Number 1300: 1 TB SATA hard disk drives
Feature Number 1301: 2 TB SATA hard disk drives
In January 2011, IBM withdrew from SONAS marketing the following disk drive option:
Feature Number 1310: 10-pack of 450 GB 15K RPM SAS hard disk drives
The first thing to do is to determine whether your bandwidth requirement is an overall bandwidth or a peak bandwidth. If you plan to have many users accessing the SONAS through multiple shares, accessing data independently but in parallel, then you are more interested in overall bandwidth. If you plan to access the data hosted by your SONAS storage solution through a few servers running a daily business application that requires a huge bandwidth, then you are more focused on peak bandwidth. As previously described, the interface node default configuration is two GigE connections to the public network in a failover/backup configuration. This means a single 1 Gb/s connection for each interface node. Moreover, you will access data through the NFS or CIFS protocol, which adds extra overhead; the maximum packet size for NFS is 32 KB, for example. A first option is to double this bandwidth by changing the configuration to aggregate mode, giving a 2 Gb/s bandwidth (still with NFS or CIFS on top). To increase the overall bandwidth, the simplest way is to add extra interface nodes to your SONAS configuration; you can even add an Interface Expansion Rack in order to increase the number of interface nodes to the maximum allowed in a SONAS configuration. If you are more focused on peak bandwidth, your first option is to add the extra quad-port GigE connectivity feature. This means a total of six GigE connections, which can be configured as a single failover/backup configuration (this is the default, but it does not increase your bandwidth at all), as three failover/backup configurations resulting in a 3 Gb/s bandwidth, or as an aggregate configuration leading to a 6 Gb/s bandwidth, still with the NFS and CIFS protocols on top. Another option is to add the dual-port 10 GigE adapter, which can also be configured in failover/backup or aggregate mode, leading respectively to 10 Gb/s and 20 Gb/s of bandwidth, again with the NFS and CIFS protocols on top. The last option is to use both additional adapters, which means six GigE connections and two 10 GigE connections. Obviously, if you add these extra features to each interface node, you also increase the overall bandwidth. The bandwidth considerations we discussed lead to a draft interface node configuration. As with storage pods, the cache hit ratio parameters may make you do a second iteration of your interface node configuration process. The cache hit ratio reflects how much your application can reuse data and take advantage of the SONAS caching ability. To increase this caching potential you have two options: increase the number of interface nodes, or increase the amount of memory inside each interface node.
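As a hedged sizing illustration (the node count is a hypothetical example, and all figures are nominal line rates before NFS/CIFS and back-end overhead): ten interface nodes, each with the default two onboard 1 GbE ports in failover/backup mode, provide an aggregate of 10 x 1 Gb/s = 10 Gb/s, or about 1.25 GB/s. The same ten nodes each fitted with a dual-port 10 GbE CNA in aggregate mode would offer a nominal 10 x 20 Gb/s = 200 Gb/s, although protocol overhead and the storage pod configuration determine how much of that is actually achievable.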
Depending on the total number of storage pods and interface nodes determined in the previous sections, you are able to determine the total number of InfiniBand ports required for your SONAS configuration. Keep in mind that a single storage pod requires two InfiniBand ports because it contains two storage nodes. More than 36 nodes in total implies the second base rack model with the 96-port InfiniBand switch. Here again, the aim of SONAS is to be a scale out solution, which means extra storage added if needed and extra interface nodes added if needed, so you do not have to be extremely precise and exhaustive when configuring your SONAS. The only requirement is to choose the base rack model, and therefore the InfiniBand switch, carefully, because there is no inexpensive way to swap the base rack model configuration later. You might still order the 96-port InfiniBand switch, partially fill it with a single 24-port InfiniBand line board, and scale out later if needed.
7.6 Tools
There are tools available that can be used to help you analyze your workload and, using workload characteristics, size your SONAS system.
nmon tool
nmon is a free tool to analyze AIX and Linux performance that gives you a huge amount of information, all on one screen. Instead of using five or six separate tools, nmon can gather information such as CPU utilization, memory use, disk I/O rates, transfers and read/write ratios, free space on file systems, disk adapters, network I/O rates, transfers and read/write ratios, Network File System (NFS) statistics, and much more, on one screen, and it updates the display dynamically. The nmon tool can also capture the same data to a text file for later analysis and graphing for reports; the output is in a spreadsheet format (.csv). You can therefore use nmon to monitor your environment dynamically, but you can also capture data into a .csv file and use other tools such as nmon_analyser or nmon_consolidator to analyze the data and generate graphs or tables. The aim of nmon_analyser is to take the nmon .csv output files generated during your run as input and generate an Excel spreadsheet where each tab gathers information regarding CPU consumption, memory utilization, or disk usage, and describes the results with charts and tables. In a big infrastructure you might need to monitor every node, server, and client. If you need a big picture instead of one screen capture per node, you might want to gather all nmon information for a typical application, a typical run, or a typical subset of nodes. Instead of nmon_analyser, you then need nmon_consolidator, which is basically the same tool but consolidates many .csv files into a single Excel spreadsheet document. This can also be useful in a virtualized environment, where you might need to monitor resources from a host point of view (Red Hat 5.4 host, VMware ESX, or AIX IBM PowerVM) instead of a virtual machine point of view. In Figure 7-11 and Figure 7-12, you can see a CPU utilization summary both from a single LPAR (Power virtualization with AIX) and from the entire system (Power AIX).
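A minimal capture sketch, assuming the commonly used nmon options for file output (-f), a sampling interval in seconds (-s), and a sample count (-c); adjust the interval and count so that the capture covers a representative period of your workload:

nmon -f -s 60 -c 1440    # one sample per minute for 24 hours, written to <hostname>_<date>_<time>.nmon in the current directory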
Links
For more detailed information regarding these tools, see the corresponding website: nmon tool: http://www.ibm.com/developerworks/aix/library/au-analyze_aix/ nmon_analyser http://www.ibm.com/developerworks/wikis/display/Wikiptype/nmonanalyser nmon_consolidator http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmonconsolidator
perfmon tool
Like the nmon tool suite for Linux and AIX systems, you can use the Windows perfmon tool to gather and analyze your application workload. Windows operating systems provide the perfmon (perfmon.exe) utility to collect data. Perfmon allows for real-time performance counter visualization or historical reporting. There are various performance indicators or counters that are gathered into objects, such as these:
processor
memory
physical disks
network interfaces
Then each object provides individual counters, such as these:
% processor time for the processor object
Pages read/s for memory
% disk write time for physical disks
current bandwidth for network interfaces
After you have selected the appropriate counters, you can visualize the results dynamically, or record them for later analysis and reporting; unlike nmon, you do not need additional tools to analyze the data. First launch the perfmon tool, then generate a data collection during the application execution, for example, or during the whole day. After the data collection is generated, you can open the generated log file, visualize it, and even generate a .csv file. Finally, open the generated .csv file with Excel and create charts and tables as described in Figure 7-13 and Figure 7-14.
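If you prefer the command line over the graphical perfmon console, the Windows typeperf utility can log similar counters directly to a .csv file; the counter paths, interval, and sample count below are illustrative and can be adapted to the objects listed above:

typeperf "\Processor(_Total)\% Processor Time" "\Memory\Page Reads/sec" "\PhysicalDisk(_Total)\% Disk Write Time" -si 60 -sc 1440 -f CSV -o workload.csv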
Chapter 8.
Installation planning
In this chapter we provide information about the basic installation planning of the SONAS appliance. We do not include considerations for Tivoli Storage Manager, replication, or ILM.
Assume that the sizing you have done led to a configuration with one Base Rack and two Storage Expansion Racks, which will be set up in the same row of your data center. You must first ensure that all SONAS racks have at least 762 mm (30 in.) of free space in front of and behind them. According to the weight distribution areas, they also require 155+313=468 mm or 313+313=626 mm between them, as described in Figure 8-3. In Figure 8-2, you see the detailed SONAS rack dimensions.
As just described, the minimum space between a Base Rack RXA and a Storage Expansion Rack RXB is 468 mm (18.4 in.), whereas it is 626 mm (24.6 in.) between two Storage Expansion Racks.
Newer Nearline SAS drive technology replaced older SATA drive technology in SONAS in November 2010.
Figure 8-4 Maximum number of Storage according to SAS drives power consumption
Tip: In the future, when the 60 amp power option becomes available for the SONAS Storage Expansion rack, this restriction due to the higher power consumption of SAS drives will be lifted. In Figure 8-5, you can find additional information regarding the power consumption measurements done for heavy usage scenario with fully populated SONAS Racks and SAS drives exclusively.
8.1.3 Noise
Based on acoustics tests performed on a SONAS system, these values apply:
90 dB registered for a fully populated Base Rack (2851-RXA) system
Up to 93 dB in the worst case with a fully populated Storage Expansion Rack (2851-RXB)
The system operating acoustic noise specifications are as follows: the declared sound power level, LwAd, is less than 94 dBA at 1 m at 23°C. However, you can reduce the audible sound level of the components installed in each rack by up to 6 dB with the acoustic doors feature (feature code 6249 for each SONAS rack).
See Figure 8-6 for information regarding the temperature and humidity while the system is in use or shut down.
Question #4 - Management console IP address: This IP address will be associated with the management node. It has to be on the public network and accessible by the storage administrator.
Question #5 - Management console gateway: This is the numeric gateway of your management console.
Question #6 - Management console subnet mask: This is the numeric subnet mask of your management console.
Question #7 - Host name (mgmt001st001): This is your preassigned Management Node host name.
Question #8 - Root password: You can specify here the password you want to be set on the management node for root access. By default it is Passw0rd (where P is capitalized and 0 is zero).
Question #9 - NTP server(s): SONAS needs to synchronize all nodes inside the cluster and your authentication method, so you have to provide at least one NTP server; a second NTP server is best for redundancy. Note: The Network Time Protocol (NTP) server(s) can be either local or on the internet. Note: Only the Management Node requires a connection to your NTP server; it then becomes the NTP server for the whole cluster.
Question #10 - Time zone: Referring to the time zone list, specify the number corresponding to your location.
Question #11 - Quantity of rack frames: Specify the total quantity of rack frames in this cluster.
In Table 8-2, you provide some information regarding the remote configuration of your SONAS in order to enable the Call Home feature.
Table 8-2 Remote configuration
Question #12 - Company Name.
Question #13 - Address: This is the address where your SONAS is located. Example: Bldg. 123, Room 456, 789 N DataCenter Rd, City, State.
Question #14 - Customer Contact Phone Number: In case of a severe issue, this is the primary contact that IBM service will call.
Question #15 - Off Shift Customer Contact Phone Number: This is the alternate phone number.
Question #16 - IP Address of Proxy Server (for Call Home): Optional. You have to provide the IP address of the proxy server if it is needed to access the internet for the Call Home feature.
Question #17 - Port of Proxy Server (for Call Home): Optional. You have to provide the port of the proxy server if it is needed to access the internet for the Call Home feature.
Question #18 - Userid for Proxy Server (for Call Home): Optional. You have to provide the userid of the proxy server if it is needed to access the internet for the Call Home feature.
Question #19 - Password for Proxy Server (for Call Home): Optional. You have to provide the password of the proxy server if it is needed to access the internet for the Call Home feature.
In Table 8-3 you provide the quorum topology of your SONAS system.
Table 8-3 Quorum topology
Questions #20 and #21 - Quorum nodes (Interface Nodes and Storage Nodes):
1. Your first action will be to select an odd number of quorum nodes; you can use both Interface and Storage Nodes.
2. Valid choices are 3, 5, or 7.
3. If your cluster is composed of more than a single frame, you must spread your quorum nodes across several frames.
4. After you have built the appropriate topology, write the Interface and Storage Node numbers in the table.
In Table 8-4 you provide CLI credentials. Your SONAS administrator will use these credentials to connect to the CLI or GUI in order to manage your entire SONAS storage solution.
Table 8-4 CLI credentials
Question #22 - CLI User ID: Your SONAS administrator will use this ID for GUI or CLI connection, for instance: myuserid.
Question #23 - CLI Password: This is the password corresponding to the User ID. Example: mypassword.
In Table 8-5, enter the locations of the SONAS nodes in your data center.
Table 8-5 Nodes location
For each node, record the rack number/position, the node serial number, and the InfiniBand port number:
Question #24 - Management Node
Question #25 - Interface Node #1 ...
Question #26 - Storage Node ...
The Rack number is actually the number of the rack containing this node whereas the position indicates position (U) where this node is installed in the rack. The Node Serial Number is the serial number of the node. The InfiniBand Port Number is the InfiniBand Switch port number where the node is connected. You do not have to give this information for preinstalled nodes.
In Table 8-6 and Table 8-7 you provide information regarding your existing DNS and NAT configuration.
Table 8-6 DNS configuration
Question #27 - IP Address of Domain Name Services (DNS) Server(s): You need to provide here the numeric IP address of one or more Domain Name Services (DNS) servers you are using inside your network. In order to avoid a bottleneck because of a single DNS server, and to improve performance, you can set up multiple DNS servers in a round-robin configuration.
Question #28 - Domain: This is the domain name of your cluster (such as mycompany.com). Note: This field is not required and can be left blank. If it is left blank, no domain name will be set for the cluster.
Question #29 - Search String(s): This is a list of one or more domain names to be used when trying to resolve a short name (example: mycompany.com, storage.mycompany.com, servers.mycompany.com). Note: This field is not required and can be left blank. If it is left blank, no search string will be set for the cluster.
Table 8-7 NAT configuration
Question #30 - IP Address: The numeric IP address requested here is the IP address needed to access the Management and Interface Nodes through the internal private network connections using NAT overloading, meaning that a combination of this IP address and a unique port number will correspond to each node (Management Node and Interface Nodes only). This IP address must not be the same as the Management Node IP address or the Interface Node IP addresses.
Question #31 - Subnet Mask: This is the subnet mask associated with the IP address.
Question #32 - CIDR equivalent: This is the CIDR (/XX) equivalent of the subnet mask specified.
Question #33 - Default Gateway: This is the default gateway associated with the IP address.
The next step is to provide details of your authentication method in Table 8-8. You will have to integrate your SONAS system into your existing authentication environment, which can be Active Directory (AD) or Lightweight Directory Access Protocol (LDAP).
Table 8-8 Authentication methods
Question #34 - Authentication Method [ ] Microsoft Active Directory or [ ] LDAP: What is the authentication method you are using in your environment?
Question #35 - AD Server IP address: In case of an Active Directory configuration, you need to provide the numeric IP address of the Active Directory server.
Question #36 - AD User ID: This User ID and the Password next will be used to authenticate to the Active Directory server.
Question #37 - AD Password: This is the password associated with the userid.
Question #38-0 - LDAP Server IP address: In case of an LDAP configuration, you need to provide the numeric IP address of the remote LDAP server.
Question #38 - LDAP security method [ ] Off [ ] SSL (Secure Sockets Layer) [ ] TLS (Transport Layer Security): In case of an LDAP configuration, you can choose to use an open (unencrypted) or a secure (encrypted) communication between your SONAS cluster and the LDAP server. In case of secured communication, two methods can be used: SSL or TLS. When SSL or TLS is used, a security certificate file must be copied from your LDAP server to the IBM SONAS Management Node.
Question #39 - LDAP Cluster Name: This is the Cluster Name specified in Table 8-1 (example: sonascluster).
Question #40 - LDAP Domain Name: This is the Domain Name specified in Table 8-1 (example: mydomain.com).
Questions #41, #42, #43 - LDAP Suffix, LDAP rootdn, LDAP rootpw: These are the suffix, rootdn, and rootpw values, which can be found in the /etc/openldap/slapd.conf file on your LDAP server.
Question #44 - Certificate file path: If you choose the SSL or TLS method, you need to provide the path on the IBM SONAS Management Node where you will copy the certificate file.
After your SONAS appliance has been integrated into your existing environment, and the authentication method set up accordingly, you will be able to create exports in order to grant access to SONAS users. But before you create these exports, the protocol information in Table 8-9 is required.
Table 8-9 Protocols access
Question #45 - Protocols [ ] CIFS (Common Internet File System) [ ] FTP (File Transfer Protocol) [ ] NFS (Network File System): These are all the supported protocols that can be used to access data. You need to check one or more according to your needs.
Question #46 - Owner: This is the owner of the shared disk space. It can be a username, or a combination of Domain\username. Example: admin1 or Domain1\admin1.
Question #47 - CIFS Options: If you need a CIFS share, you need to detail some options. The options are a comma-separated key-value pair list. Valid CIFS options are: browseable=yes and comment="Place comment here". Example: -cifs browseable=yes,comment="IBM SONAS"
Question #48 - NFS Options (IP Address, Subnet Mask, CIDR Equivalent, Access Options: [ ] ro or [ ] rw, [ ] root_squash or [ ] no_root_squash, [ ] async or [ ] sync): If you need an NFS share, you need to provide some NFS options. If NFS options are not specified, the NFS shared disk will not be accessible by SONAS clients. NFS options include a list of client machines allowed to access the NFS shared drive, and the type of access to be granted to each client machine. Example: -nfs 9.11.0.0/16(rw,no_root_squash,async)
In Table 8-10 you will need to provide details of Interface subnet and network information.
Table 8-10 Interface subnet
Question #49 - Subnet: Basically this is the public network. This network will be used for communication between the SONAS Interface Nodes and your application servers. As an example, if you have three Interface Nodes on a single network, with IP addresses from 9.11.136.101 through 9.11.136.103, then your subnet will be 9.11.136.0 and the subnet mask 255.255.255.0 (/24 in CIDR format).
Question #50 - Subnet Mask: This is the subnet mask associated with the subnet listed.
Question #51 - CIDR format: This is the subnet mask listed, converted to CIDR format.
Question #52 - VLAN ID: Optional. This is a list of one or more Virtual LAN identifiers. A VLAN ID must be in the range from 1 to 4095. If you do not use VLANs, leave this field blank.
Question #53 - Group Name: Optional. This is a name assigned to a network group. This allows you to reference a set of Interface Nodes using a meaningful name instead of a list of IP addresses or host names. If you do not use network groups, leave this field blank.
You must complete these tables prior to any SONAS setup. Some of the information in the tables is critical, for example, the authentication method; your SONAS storage solution will not work if it is not properly configured. In this section and the previous one, our main concern was the pre-installation process. In the following sections we assume that your SONAS storage solution has been properly preconfigured and set up with all required information. However, your SONAS is not yet ready to use: you will have to complete some additional planning steps regarding storage and network configuration, and last but not least the authentication method and IP address load balancing configuration.
8.3.1 Storage
SONAS storage consists of disks grouped in sets of 60 SAS, Nearline SAS, or SATA hard drives. Enclosures with Nearline SAS or SATA drives are always configured as RAID 6 arrays; enclosures with SAS drives are always configured as RAID 5 arrays. Because of power consumption, it is possible to use a maximum of 360 SAS drives or 480 Nearline SAS or SATA drives per rack. This means that the maximum capacity for SATA drives is 14.4 PB and for SAS drives is 2.43 PB.² SONAS supports up to 2,147,483,647 files with a 1 MB block size, and approximately 60 million files are supported with async replication. The maximum number of files in a SONAS is constrained by the formula: maximum number of files = (total file system space / 2) / (inode size + subblock size). For file systems that will be doing parallel file creates, if the total number of free inodes is not greater than 5% of the total number of inodes, there is the potential for a slowdown in file system access. Take this into consideration when changing your file system.
² In the future, when a 60 amp power option is available for the SONAS Storage Expansion rack, this restriction will be lifted.
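As a hedged worked example of the file-count formula, using assumed but typical values: for a 100 TB file system with a 256 KB block size (8 KB sub-block) and 512-byte inodes, (100 TB / 2) / (512 + 8192) bytes = 50 x 10^12 / 8704 ≈ 5.7 billion files, which is then capped by the 2,147,483,647 architectural limit quoted above.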
Snapshot handling
After the successful completion of async replication, the snapshot created in the source file system is removed. However, you have to ensure sufficient storage at the replication source and destination for holding the replica of the source file tree and the associated snapshots. A snapshot is a space efficient copy of a file system at the point when the snapshot is initiated. The space occupied by the snapshot at the time of creation, before any files are written to the file system, is a few KB for control structures. No additional space is required for data in a snapshot prior to the first write to the file system after the creation of the snapshot. As files are updated, the space consumed increases to reflect both the main branch copy and a copy for the snapshot. The cost of this is the actual size of the write, rounded up to the size of a file system block for larger files or to the size of a sub-block for small files. In addition, there is a cost for additional inode space and indirect block space to keep data pointers to both the main branch and snapshot copies of the data. This cost grows as more files in the snapshot differ from the main branch, but the growth is not linear, because the unit of allocation for inodes is chunks in the inode file, which are the size of the file system sub-block. After the completion of the async replication, a snapshot of the file system containing the replica target is also taken. The impact of snapshots on the SONAS capacity depends on the purpose for which the snapshots are used. If the snapshots are used temporarily for the purpose of creating an external backup, and removed afterwards, the impact is most likely not significant for configuration planning. In cases where snapshots are taken frequently for replication, or kept as a backup to enable users to do an easy restore, the impact cannot be disregarded. The concrete impact depends on how frequently a snapshot is taken, how long each snapshot exists, how many files in the file system are changed by the users, and the size of the writes/changes.
HSM at destination: If the destination system uses HSM with the SONAS storage, consider having enough primary storage at the destination to ensure that the change delta can be replicated into its primary storage as part of the disaster recovery process. If the movement of the data from the destination location's primary to secondary storage is not fast enough, the replication process can outpace this movement, causing a performance bottleneck in completing the disaster recovery cycle. Therefore, the capacity of the destination system to move data to the secondary storage must be sufficiently configured to ensure that enough data has been pre-migrated to the secondary storage to account for the next async replication cycle, so that the amount of data to be replicated can be received without waiting for movement to secondary storage. For example, enough Tivoli Storage Manager managed tape drives will need to be allocated and operational, along with enough media.
Figure 8-7 How SONAS writes data to disks: a file system block (whose size can be changed at file system creation) is divided into 32 KB chunks, each made up of 8 KB sub-blocks, and the chunks are striped across the disks of the RAID array together with parity and parity/spare strips.
In file systems with a wide variance in file sizes, using a small block size has a large adverse impact on performance when accessing large files. In this kind of system it is suggested that you use a block size of 256 KB (8 KB sub-block). Even if only 1% of the files are large, the amount of space taken by the large files usually dominates the amount of space used on disk, and the waste in the sub-blocks used for small files is usually insignificant. Larger block sizes, up to 1 MB, are often a good choice when the performance of large files accessed sequentially is the dominant workload for the file system. The effect of block size on file system performance largely depends on the application I/O pattern. A larger block size is often beneficial for large sequential read and write workloads. A smaller block size is likely to offer better performance for small file, small random read and write, and metadata-intensive workloads. The efficiency of many algorithms that rely on caching file data in a page pool depends more on the number of blocks cached than on the absolute amount of data. For a page pool of a given size, a larger file system block size means fewer blocks cached. Therefore, when you create file systems with a block size larger than the default of 256 KB, it is best that you increase the page pool size in proportion to the block size. Data is cached in interface node memory, so it is important to plan the RAM size of the interface nodes correctly.
Both classes of metadata are replicated in a SONAS system for fault tolerance. The system overhead depends on the number and size of the LUNs assigned to a file system, but is typically on the order of a few hundred MB or less per file system. The metadata in support of usage can be far higher, but is largely a function of usage. The cost of directories is entirely a function of usage and file naming structures. A directory costs at least the minimum file size, and more if the number of entries is large. For a 256 KB block size file system, the minimum directory size is 8 KB. The number of directory entries per directory block varies with customer usage. For example, if the average directory contains 10 entries, the directory cost per file is about 800 bytes. This number is doubled by metadata replication. The cost of inodes is a function of how the file system is configured. By default, SONAS is configured with 50 M inodes preallocated and a maximum allowed inodes value of 100 M. By default, an inode requires 512 bytes of storage. The defaults therefore require 50 GB of storage for inodes (512 bytes * 50 M * 2 for replication). If the user actually had 50 M files with an average directory holding 10 files, the cost for directories is about 80 GB. Higher density directories require less space for the same number of files. There might also be a requirement for space for indirect blocks for larger files. These two categories dominate the overhead for a file system, with other minor usages such as recovery logs or message logs.
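Restating that arithmetic as a short worked example, under the default assumptions given above: 50 M preallocated inodes x 512 bytes x 2 for metadata replication ≈ 50 GB. With an average of 10 files per directory, 50 M files imply about 5 M directories; at the 8 KB minimum directory size that is roughly 40 GB, or about 80 GB with replication, which is the equivalent of about 800 bytes of directory space per file before replication.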
With the failure of a single disk, if you have not specified multiple failure groups and replication of metadata, SONAS will not be able to continue because it cannot write logs or other critical metadata. If you have specified multiple failure groups and replication of metadata, the failure of multiple disks in the same failure group puts you in the same position. In either of these situations, GPFS forcibly unmounts the file system. It is best to replicate at least the metadata between two storage pods, so you should create two failure groups, one for each storage pod.
MS Active Directory
One method for user authentication is to communicate with a remote authentication and authorization server running Microsoft Active Directory software. The Active Directory software provides authentication and authorization services. For the cfgad command, you need to provide information such as the Active Directory server IP address and the cluster name. Basically, this information was requested in Table 8-8 on page 258; here we need the answers to questions #35 to #37. Through the Command Line Interface, run the cfgad command as shown in Example 8-1.
Example 8-1 cfgad command example
cfgad -as <ActiveDirectoryServerIP> -c <clustername>.<domainname> -u <username> -p <password>
Where:
<ActiveDirectoryServerIP>: IP address of the remote Active Directory server, as specified in Table 8-8 on page 258, question #35.
<clustername>: Cluster Name, as specified in Table 8-1 on page 254, question #1.
<domainname>: Domain Name, as specified in Table 8-1 on page 254, question #2.
<username>: Active Directory User ID, as specified in Table 8-8 on page 258, question #36.
<password>: Active Directory Password, as specified in Table 8-8 on page 258, question #37.
Example:
cli cfgad -as 9.11.136.116 -c sonascluster.mydomain.com -u aduser -p adpassword
To check whether this cluster is now part of the Active Directory domain, use the chkauth command as shown in Example 8-2.
Example 8-2 chkauth command example
cli chkauth -c <clustername>.<domainname> -t
Where:
<clustername>: Cluster Name, as specified in Table 8-1 on page 254, question #1.
<domainname>: Domain Name, as specified in Table 8-1 on page 254, question #2.
Example:
cli chkauth -c sonascluster.mydomain.com -t
If the cfgad command was successful, in the output from the chkauth command you will see CHECK SECRETS OF SERVER SUCCEED or a similar message.
LDAP
Another method for user authentication is to communicate with a remote authentication and authorization server running Lightweight Directory Access Protocol (LDAP) software. The LDAP software provides authentication and authorization services. For the cfgldap command, you need to provide information such as the LDAP server IP address and the cluster name. Basically, this information was requested in Table 8-8 on page 258; here we need the answers to questions #38 to #44. Through the Command Line Interface, run the cfgldap command as shown in Example 8-3.
Example 8-3 cfgldap command example
cfgldap -c <cluster name> -d <domain name> -lb <suffix> -ldn <rootdn> -lpw <rootpw> -ls <ldap server> -ssl <ssl method> -v
Where:
<cluster name>: Cluster name as specified in Table 8-8 on page 258, question #39.
<domain name>: Domain name as specified in Table 8-8 on page 258, question #40.
<suffix>: The suffix as specified in Table 8-8 on page 258, question #41.
<rootdn>: The rootdn as specified in Table 8-8 on page 258, question #42.
<rootpw>: The password for access to the remote LDAP server as specified in Table 8-8 on page 258, question #43.
<ldap server>: IP address of the remote LDAP server as specified in Table 8-8 on page 258, question #38-0.
<ssl method>: SSL method as specified in Table 8-8 on page 258, question #38.
Example:
cli cfgldap -c sonascluster -d mydomain.com -lb "dc=sonasldap,dc=com" -ldn "cn=Manager,dc=sonasldap,dc=com" -lpw secret -ls 9.10.11.12 -ssl tls -v
To check whether this cluster is now correctly configured with the LDAP server, run the chkauth command described in Example 8-2 on page 267.
267
In Table 8-1 on page 254, question #3, you were prompted for an available IP address range. As described in Chapter 2, Hardware architecture on page 41, SONAS is composed of three different networks. One of these is the public network, which SONAS users and administrators use to access the interface nodes and the management node respectively. The other two are the private network, or management network, which the management node uses to manage the whole cluster, and the data network, or InfiniBand network, on top of which the SONAS file system is built. These last two networks, private and data, are not used by SONAS users or administrators, but because they coexist with the public network on all nodes, make sure that you do not reuse their range, to avoid IP conflicts. There are only three choices for the private network range. The default is the 172.31.*.* range, but if you already use this particular range in your existing environment, the 192.168.*.* range might be more appropriate. Similarly, if you are already using both the 172.31.*.* and 192.168.*.* ranges, then the 10.254.*.* range must be used for the private network instead. To determine which IP address ranges are currently in use in your data center, ask your network administrators.
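For example, a quick (and not exhaustive) check from a Linux administration host is to look for the three candidate ranges in its routing table; the command below is only a convenience sketch, and your network administrators remain the authoritative source:
ip route | grep -E '172\.31\.|192\.168\.|10\.254\.'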
268
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
The first SONAS user, running the Linux operating system, wants to mount an NFS share on their workstation and runs a mount command with the sonascluster.mydomain.com DNS host name, as described in the top left corner of Figure 8-8. This request is caught by the DNS server (step 1), which looks in its list of IP addresses and, in a round-robin way, resolves the name to the next interface node address (step 2), returning the answer to the Linux SONAS user (step 3). The connection between the first SONAS user and one interface node is then established, as shown by the dashed arrow in Figure 8-8.
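As an illustration of the client side of this flow, the Linux user's mount command might look like the following sketch (the export path /ibm/gpfs0/shared matches the export created in Chapter 9; the local mount point /mnt/sonas is arbitrary):
mkdir -p /mnt/sonas
mount -t nfs sonascluster.mydomain.com:/ibm/gpfs0/shared /mnt/sonas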
269
Now assume a second SONAS user, who also needs to access data hosted on the SONAS storage solution, this time using the CIFS protocol from a Windows laptop. That user runs a net use command (or uses the Map Network Drive tool) with the same sonascluster.mydomain.com DNS host name, as you can see in Figure 8-9. This second request is again caught by the DNS server which, in a round-robin way, assigns the next IP address to this second user. Steps 1 to 3 are then repeated as described in Figure 8-9. The connection between the second SONAS user and an interface node is then established; see the new dashed arrow on the right.
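The corresponding command on the Windows laptop might look like the following sketch (the share name shared matches the export created in Chapter 9; the drive letter Z: is arbitrary):
net use Z: \\sonascluster.mydomain.com\shared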
Connections between SONAS users and interface nodes remain active until the shares are unmounted by the SONAS users, or until an interface node fails. In case of interface node failure, IP address balancing is handled by the CTDB layer. To handle interface node failures, the CTDB layer works with a table that is re-created whenever a new event occurs; an event can be an interface node failure or recovery. The table entries are interface node identifiers and public IP addresses.
270
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
In Figure 8-10, the SONAS system has been configured so that CTDB has a table with three interface node identifiers and three public IP addresses for SONAS users.
Figure 8-10 CTDB table with three interface node identifiers and three IPs
In our environment we have three interface nodes (#1, #2, and #3) and three IP addresses. The CTDB table has been created with the entries #1, #2, #3 and 10.10.10.1, 10.10.10.2, 10.10.10.3. From the CTDB point of view:
#1 is responsible for 10.10.10.1.
#2 is responsible for 10.10.10.2.
#3 is responsible for 10.10.10.3.
With your two SONAS users connected as shown in Figure 8-10, only the first two interface nodes are used. The first interface node is using the 10.10.10.1 IP address and the second one is using 10.10.10.2, according to the CTDB table.
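For readers who know stock CTDB, this pool of public IP addresses corresponds conceptually to CTDB's public addresses list, which in a plain Samba/CTDB cluster is typically kept in a file such as /etc/ctdb/public_addresses with one address and interface per line. On SONAS this mapping is maintained by the SONAS Software, so the lines below are only an illustration of the concept (addresses from the example above, interface name assumed), not a file you edit on the appliance:
10.10.10.1/24 ethX0
10.10.10.2/24 ethX0
10.10.10.3/24 ethX0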
271
In case of failure of the first interface node, which was in charge of the 10.10.10.1 IP address, that IP address is taken over by the last interface node, as shown in Figure 8-11. From the CTDB point of view, after the failure you now have:
#2 is responsible for 10.10.10.2.
#3 is responsible for 10.10.10.3 and 10.10.10.1.
Figure 8-11 CTDB table with interface node identifiers and IP mappings after failure
As you can see in Figure 8-11, the first NFS SONAS user now has an active connection to the last interface node. This is how CTDB handles IP address balancing: the DNS server handles the round-robin method, while CTDB is in charge of the IP failover.
272
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
However, in the previous example there is a potential load balancing bottleneck if one interface node fails. Assume a third user accesses the SONAS system through the FTP protocol, as described in Figure 8-12; the connection is established with the last dashed arrow on the third interface node. The first NFS user is still connected through the first interface node, the second CIFS user is connected through the second interface node, and the last FTP user accesses the SONAS system through the third interface node (the DNS server again handed out the next IP address).
273
Notice that from here on, all incoming users are distributed across interface nodes #1, #2, and #3 in the same way because of the DNS round-robin configuration. For example, you might end up with four users connected to each interface node, as described in Figure 8-13.
Figure 8-13 Interface node relationships showing CTDB round robin assignment
274
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
The bottleneck that we mentioned earlier appears if one interface node fails. The IP address handled by the failing interface node migrates, together with all of its users and their workload, to another interface node according to the CTDB table. You then have one interface node (the second) handling a single IP address and four user workloads, and the third interface node handling two IP addresses and eight user workloads, as described in Figure 8-14.
Figure 8-14 Interface node assignment and workload distribution according to the CTDB table
The original overall SONAS user workload was equally balanced across the three interface nodes, at 33% of the workload each. After the interface node crash, and with the previous CTDB configuration, the workload is now 33% on the second interface node and 66% on the third interface node. To avoid this situation, a simple approach is to define more IP addresses than there are interface nodes. In our example, six IP addresses, two per interface node, might be more appropriate, as shown in Figure 8-15.
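With six data path IP addresses, the DNS round-robin entry for the cluster name simply lists all six addresses. In a BIND-style zone file for mydomain.com this might look like the following sketch (addresses taken from the example above; your DNS product and syntax can differ):
sonascluster  IN  A  10.10.10.1
sonascluster  IN  A  10.10.10.2
sonascluster  IN  A  10.10.10.3
sonascluster  IN  A  10.10.10.4
sonascluster  IN  A  10.10.10.5
sonascluster  IN  A  10.10.10.6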
275
Figure 8-15 CTDB with more IP addresses than interface nodes assigned
In that case, the original CTDB table is:
#1 is responsible for 10.10.10.1 and 10.10.10.4
#2 is responsible for 10.10.10.2 and 10.10.10.5
#3 is responsible for 10.10.10.3 and 10.10.10.6
In case of failure, the failing interface node, previously in charge of two IP addresses, offloads its first IP address to the second interface node and its second IP address to the third interface node. The new CTDB table is:
#2 is responsible for 10.10.10.1, 10.10.10.2, and 10.10.10.5
#3 is responsible for 10.10.10.3, 10.10.10.4, and 10.10.10.6
276
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
The result is a 50-50% workload spread across the two remaining interface nodes after the crash, as described in Figure 8-16.
When the first interface node comes back, that is a new event and the CTDB table becomes:
#1 is responsible for 10.10.10.1 and 10.10.10.4
#2 is responsible for 10.10.10.2 and 10.10.10.5
#3 is responsible for 10.10.10.3 and 10.10.10.6
This means the traffic is load balanced across the three interface nodes again.
277
8.5.1 Redundancy
SONAS, as explained in this book, has been designed as a highly available storage solution. This high availability relies on hardware redundancy and on software high availability with GPFS and CTDB. But as you plan to integrate SONAS into your existing infrastructure, you must ensure that all external services and equipment are also highly available. Your SONAS system needs an Active Directory (or LDAP) server for authentication, but is that authentication server redundant? Ask the same question about your NTP and DNS servers. From a hardware point of view, do you have redundant power? Are the network switches for the public network redundant?
8.5.3 Caveats
If you plan to migrate your existing environment and business applications to a SONAS storage solution, be aware that NAS storage is not always the most appropriate option. If your business application currently writes and reads data on locally attached storage (DAS), moving to a network-based storage solution will, by design, significantly increase latency. Similarly, if your application performs a huge number of writes, even small ones, it can quickly overload your network switches. Workarounds for these situations are to use caching on the client side to reduce the impact of the added latency and bandwidth demand, and to combine I/O requests on the client side to reduce the number of I/O operations. You can also modify your application to be more tolerant of the packet loss or time-out expiration inherent in the IP protocol, and to retry failed operations.
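As a concrete, purely illustrative example of such client-side mitigation for an NFS client, larger transfer sizes and longer attribute caching reduce the number of small round trips to the interface nodes. The option values below are assumptions to be validated against your workload, not SONAS recommendations:
mount -t nfs -o rsize=65536,wsize=65536,actimeo=60 sonascluster.mydomain.com:/ibm/gpfs0/shared /mnt/sonas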
278
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Chapter 9.
279
9.1 Pre-Installation
At this point, you have completed your IBM SONAS purchase and it has been delivered. You are now ready to start the installation:
1. Review the floor plan and pre-installation planning sheet to determine whether all information has been provided.
2. If the pre-installation planning sheet is not complete, contact the storage administrator. This information is required throughout the rest of the installation, and the installation cannot start until the pre-installation planning sheet is complete.
3. The IBM authorized service provider performs all the necessary preliminary planning work, including verifying the information in the planning worksheets, to make sure you are aware of the specific requirements for the SONAS system, such as the physical environment and networking environment.
9.2 Installation
Installation of a SONAS appliance requires both hardware installation as well as software installation.
280
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
281
Review the results of the checks and verify that each check has a status of OK. For any problems reported by this command, refer to the Problem Determination and Troubleshooting Guide, GA32-0717.
282
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Consider the following setup:
Hardware considerations: The rack contains 1 management node, 6 interface nodes, 2 storage nodes, switches, and InfiniBand connections.
Software considerations: AD/LDAP is already configured on an external server, and the file system and export information are already available.
The cluster name in this example is Furby.storage.tucson.ibm.com.
As the script proceeds, it asks for the configuration parameters needed to configure the cluster. These details include the management node IP, internal IP range, root password, subnet IP address, NTP server IP, and more. The interface nodes and the storage nodes are then powered on. The script checks for these nodes and identifies them according to the configuration provided during the installation procedure.
283
Figure 9-2 shows the detection of interface nodes and storage nodes when powered on.
Figure 9-2 The script detecting the interface node and storage node
284
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Figure 9-3 and Figure 9-4 show the assignment of IDs to the interface nodes and storage nodes in the cluster. This step also determines whether each node is a quorum node.
Figure 9-3 Identifying the sequence of the interface nodes and assigning quorum nodes
285
Figure 9-4 Identifying sequence of Storage nodes and assigning quorum nodes
The next panel in Figure 9-5 shows the configuration of the cluster where each of the interface nodes and storage nodes are added as a part of the cluster and the cluster nodes are prepared to communicate with each other.
286
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
The panel in Figure 9-6 shows the end of the script after which the cluster has been successfully configured.
Figure 9-6 Cluster now being created and first_time_install script completes
287
The health of the system is then checked. The IBM authorized service provider logs in to the management node and runs the health check commands. The verify_hardware_wellness script checks the connectivity between the management node, the interface nodes, and the storage nodes. The cnrssccheck command is then run to check the health of the Ethernet switches, InfiniBand switches, and storage drawers, and to verify that the nodes have their roles assigned and are able to communicate with each other. Example 9-1 shows the command output for our example cluster setup.
Example 9-1 Running verify_hardware_wellness and cnrssccheck to check the overall health of the created cluster
# verify_hardware_wellness
[NFO] [2010-04-21 15:59:06] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
[NFO] [2010-04-21 15:59:06] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness() ... 3 minutes.
[NFO] [2010-04-21 16:00:54] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness() ... minutes.
[NFO] [2010-04-21 16:04:10] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness() ... 1 minutes.
[NFO] [2010-04-21 16:04:18] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
[NFO] [2010-04-21 16:04:28] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
Discovery results:
There are 6 interface nodes.
There are 2 storage nodes.
There is 1 management node.
[[email protected] ~]# cnrssccheck --nodes=all --checks=all
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv mgmt001st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
================================================================================
Run checks on mgmt001st001
It might take a few minutes.
EthSwCheck ... OK
IbSwCheck ... OK
NodeCheck ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:07:57+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: mgmt001st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0 Degrades: 0 Failures: 0 Offlines: 0
================================================================================
Ethernet Switch status:
288
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Verify Ethernet Switch Configuration (Frame:1, Slot:41) OK
Verify Ethernet Switch Hardware (Frame:1, Slot:41) OK
Verify Ethernet Switch Firmware (Frame:1, Slot:41) OK
Verify Ethernet Switch Link (Frame:1, Slot:41) OK
Verify Ethernet Switch Configuration (Frame:1, Slot:42) OK
Verify Ethernet Switch Hardware (Frame:1, Slot:42) OK
Verify Ethernet Switch Firmware (Frame:1, Slot:42) OK
Verify Ethernet Switch Link (Frame:1, Slot:42) OK
================================================================================
InfiniBand Switch status:
Verify InfiniBand Switch Configuration (Frame:1, Slot:35) OK
Verify InfiniBand Switch Hardware (Frame:1, Slot:35) OK
Verify InfiniBand Switch Firmware (Frame:1, Slot:35) OK
Verify InfiniBand Switch Link (Frame:1, Slot:35) OK
Verify InfiniBand Switch Configuration (Frame:1, Slot:36) OK
Verify InfiniBand Switch Hardware (Frame:1, Slot:36) OK
Verify InfiniBand Switch Firmware (Frame:1, Slot:36) OK
Verify InfiniBand Switch Link (Frame:1, Slot:36) OK
================================================================================
Node status:
Verify Node General OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mgmt001st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv strg001st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
================================================================================
Run checks on strg001st001
It might take a few minutes.
FcHbaCheck ... OK
DdnCheck ... OK
DdnLogCollector ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:10:46+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: strg001st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0 Degrades: 0 Failures: 0 Offlines: 0
================================================================================
DDN Disk Enclosure status:
Verify Disk Enclosure Configuration (Frame:1, Slot:1) OK
Verify Disk in Disk Enclosure (Frame:1, Slot:1) OK
Verify Disk Enclosure Hardware (Frame:1, Slot:1) OK
Verify Disk Enclosure Firmware (Frame:1, Slot:1) OK
Verify Array in Disk Enclosure (Frame:1, Slot:1) OK
289
================================================================================
FibreChannel HBA status:
Verify Fibre Channel HBA Configuration (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Configuration (Frame:1, Slot:17, Instance:1) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:17, Instance:1) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:17, Instance:1) OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ strg001st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv strg002st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
================================================================================
Run checks on strg002st001
It might take a few minutes.
FcHbaCheck ... OK
DdnCheck ... OK
DdnLogCollector ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:13:26+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: strg002st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0 Degrades: 0 Failures: 0 Offlines: 0
================================================================================
DDN Disk Enclosure status:
Verify Disk Enclosure Configuration (Frame:1, Slot:1) OK
Verify Disk in Disk Enclosure (Frame:1, Slot:1) OK
Verify Disk Enclosure Hardware (Frame:1, Slot:1) OK
Verify Disk Enclosure Firmware (Frame:1, Slot:1) OK
Verify Array in Disk Enclosure (Frame:1, Slot:1) OK
================================================================================
FibreChannel HBA status:
Verify Fibre Channel HBA Configuration (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Configuration (Frame:1, Slot:19, Instance:1) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:19, Instance:1) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:19, Instance:1) OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ strg002st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
290
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Commands: All the commands for the configuration of SONAS are run as root. You can either export the PATH variable to include the CLI path or run the commands from the CLI directory. In our example, we change to the CLI directory by running:
# cd /opt/IBM/sofs/cli
At the end of the hardware installation, the cluster is created. The IBM authorized service provider then creates a CLI user and adds the cluster to the GUI. See Example 9-2.
Example 9-2 Creating a new CLI user using CLI command mkuser
[[email protected] cli]# mkuser -p Passw0rd cliuser
EFSSG0019I The user cliuser has been successfully created.
[[email protected] cli]# addcluster -h int001st001 -p Passw0rd
EFSSG0024I The cluster Furby.storage.tucson.ibm.com has been successfully added
You need to enable the license, as in Example 9-3, after which the cluster is ready for the rest of the software configuration.
Example 9-3 Enabling License.
[[email protected] cli]# lsnode -v
Hostname     IP           Description Role       Product Version Connection stat
int001st001  172.31.132.1             interface  1.1.0.2-7       OK
int002st001  172.31.132.2             interface  1.1.0.2-7       OK
int003st001  172.31.132.3             interface  1.1.0.2-7       OK
int004st001  172.31.132.4             interface  1.1.0.2-7       OK
int005st001  172.31.132.5             interface  1.1.0.2-7       OK
int006st001  172.31.132.6             interface  1.1.0.2-7       OK
mgmt001st001 172.31.136.2             management 1.1.0.2-7       OK
strg001st001 172.31.134.1             storage    1.1.0.2-7       OK
strg002st001 172.31.134.2             storage    1.1.0.2-7       OK
291
Attention: The actual command output displayed on the panel has many more fields than are shown in this example. This example has been simplified to ensure that the important information is clear.
[[email protected] cli]# lscurrnode
Node ID      Node type  Node state Management IP Address InfiniBand IP address
int001st001  Interface  ready      172.31.4.1            172.31.132.1
int002st001  Interface  ready      172.31.4.2            172.31.132.2
int003st001  Interface  ready      172.31.4.3            172.31.132.3
int004st001  Interface  ready      172.31.4.4            172.31.132.4
int005st001  Interface  ready      172.31.4.5            172.31.132.5
int006st001  Interface  ready      172.31.4.6            172.31.132.6
mgmt001st001 Management ready      172.31.8.1            172.31.136.1
strg001st001 Storage    ready      172.31.6.1            172.31.134.1
strg002st001 Storage    ready      172.31.6.2            172.31.134.2
Attention: The actual command output displayed on the panel has many more fields than are shown in this example. This example has been simplified to ensure that the important information is clear.
Column 3 in Example 9-5 displays the state of the node. Verify that the state of each node is Ready.
Management IP range
This is the network the management node uses to send management data to the interface nodes and storage nodes. It is a private network that is not reachable by outside clients. No user data is transferred on this network, only management-related communication, such as commands or management information passed from the management node to the interface nodes and storage nodes. You can read more in Chapter 4, Networking considerations on page 141. From the previous Example 9-5 on page 292, you can see that the management IP takes the range 172.31.4.* for interface nodes, 172.31.8.* for the management node, and 172.31.6.* for the storage nodes. This is done by the install script while creating the SONAS cluster. As you can see, the first two octets of the IP address are constant, and the management IP address is then assigned depending on whether the node is an interface node, management node, or storage node, as follows:
292
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Interface node: 172.31.4.*
Management node: 172.31.8.*
Storage node: 172.31.6.*
The last octet of the IP address is incremented sequentially according to the number of interface nodes and storage nodes. At the time of writing, only a single management node is supported.
InfiniBand IP range
This is the network range used for data transfer between the interface nodes and the storage nodes. Like the management network, it is a private network that is not reachable by outside clients. Refer to Chapter 9, Installation and configuration on page 279. From Example 9-5 on page 292, you can see that the InfiniBand IP takes the range 172.31.132.* for interface nodes, 172.31.136.* for the management node, and 172.31.134.* for the storage nodes. This is done by the install script while creating the SONAS cluster. Again, the first two octets of the IP address are constant, and the InfiniBand IP address is then assigned depending on whether the node is an interface node, management node, or storage node:
Interface node: 172.31.132.*
Management node: 172.31.136.*
Storage node: 172.31.134.*
[[email protected] cli]# cfgcluster Furby
Are you sure to initialize the cluster configuration ?
Do you really want to perform the operation (yes/no - default no): yes
(1/6) - Prepare CIFS configuration
(2/6) - Write CIFS configuration on public nodes
(3/6) - Write cluster manager configuration on public nodes
(4/6) - Import CIFS configuration into registry
(5/6) - Write initial configuration for NFS,FTP,HTTP and SCP
(6/6) - Restart cluster manager to activate new configuration
EFSSG0114I Initialized cluster configuration successfully
The command prompts, "Do you really want to perform the operation?" Type yes and press Enter to continue.
293
Verify that the cluster has been configured by running the lscluster command. This command must display the CTDB cluster name you used to configure the cluster manager. The output of the command is shown in Example 9-7. The public cluster name is Furby.storage.tucson.ibm.com.
Example 9-7 Verifying the cluster details using CLI command lscluster
PrimaryServer  SecondaryServ
strg001st001   strg002st001
[[email protected] cli]# lsdisk
Name                                File system Failure group Type Pool   Status
array0_sata_60001ff0732f8548c000000             1                  system ready
array0_sata_60001ff0732f8568c020002             1                  system ready
array0_sata_60001ff0732f8588c040004             1                  system ready
array0_sata_60001ff0732f85a8c060006             1                  system ready
array0_sata_60001ff0732f85c8c080008             1                  system ready
array0_sata_60001ff0732f85e8c0a000a             1                  system ready
array1_sata_60001ff0732f8558c010001             1                  system ready
array1_sata_60001ff0732f8578c030003             1                  system ready
array1_sata_60001ff0732f8598c050005             1                  system ready
array1_sata_60001ff0732f85d8c090009             1                  system ready
array1_sata_60001ff0732f85f8c0b000b             1                  system ready
array1_sata_60001ff0732f8608c0f000c             1                  system ready
[[email protected] cli]# chdisk array1_sata_60001ff0732f8558c010001,array1_sata_60001ff0732f8578c030003,array1_sata_60001ff0732f8598c050005,array1_sata_60001ff0732f85d8c090009,array1_sata_60001ff0732f85f8c0b000b,array1_sata_60001ff0732f8608c0f000c --failuregroup 2
You can verify the changed failure groups using the lsdisk command, as seen in the previous section 9.5.5, Listing all available disks on page 294.
294
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Example 9-10 displays the output after changing the failure groups.
Example 9-10 Verifying the changed Failure Groups of disks using CLI command lsdisk
[[email protected] cli]# lsdisk
Name                                File system Failure group Type            Pool   Status
array0_sata_60001ff0732f8548c000000             1                             system ready
array0_sata_60001ff0732f8568c020002             1                             system ready
array0_sata_60001ff0732f8588c040004             1                             system ready
array0_sata_60001ff0732f85a8c060006             1                             system ready
array0_sata_60001ff0732f85c8c080008             1                             system ready
array0_sata_60001ff0732f85e8c0a000a             1                             system ready
array1_sata_60001ff0732f8558c010001             2             dataAndMetadata system ready
array1_sata_60001ff0732f8578c030003             2             dataAndMetadata system ready
array1_sata_60001ff0732f8598c050005             2             dataAndMetadata system ready
array1_sata_60001ff0732f85d8c090009             2             dataAndMetadata system ready
array1_sata_60001ff0732f85f8c0b000b             2             dataAndMetadata system ready
array1_sata_60001ff0732f8608c0f000c             2             dataAndMetadata system ready
[[email protected] cli]# mkfs gpfs0 /ibm/gpfs0 -F array0_sata_60001ff0732f8548c000000,array0_sata_60001ff0732f8568c020002,array0_sata_60001ff0732f 8588c040004,array0_sata_60001ff0732f85a8c060006,array1_sata_60001ff0732f8558c010001,array1_sata_ 60001ff0732f8578c030003,array1_sata_60001ff0732f8598c050005,array1_sata_60001ff0732f85d8c090009 --master -R meta --nodmapi The following disks of gpfs0 will be formatted on node strg001st001: array0_sata_60001ff0732f8548c000000: size 15292432384 KB array0_sata_60001ff0732f8568c020002: size 15292432384 KB array0_sata_60001ff0732f8588c040004: size 15292432384 KB array0_sata_60001ff0732f85a8c060006: size 15292432384 KB array1_sata_60001ff0732f8558c010001: size 15292432384 KB array1_sata_60001ff0732f8578c030003: size 15292432384 KB array1_sata_60001ff0732f8598c050005: size 15292432384 KB array1_sata_60001ff0732f85d8c090009: size 15292432384 KB Formatting file system ... Disks up to size 141 TB can be added to storage pool 'system'. Creating Inode File 0 % complete on Wed Apr 21 16:36:30 2010 1 % complete on Wed Apr 21 16:37:08 2010 2 % complete on Wed Apr 21 16:37:19 2010 3 % complete on Wed Apr 21 16:37:30 2010 5 % complete on Wed Apr 21 16:37:35 2010 9 % complete on Wed Apr 21 16:37:40 2010 13 % complete on Wed Apr 21 16:37:45 2010 18 % complete on Wed Apr 21 16:37:50 2010 23 % complete on Wed Apr 21 16:37:55 2010 27 % complete on Wed Apr 21 16:38:00 2010 295
32 % complete on Wed Apr 21 16:38:05 2010 37 % complete on Wed Apr 21 16:38:10 2010 42 % complete on Wed Apr 21 16:38:15 2010 46 % complete on Wed Apr 21 16:38:20 2010 51 % complete on Wed Apr 21 16:38:25 2010 56 % complete on Wed Apr 21 16:38:30 2010 61 % complete on Wed Apr 21 16:38:35 2010 66 % complete on Wed Apr 21 16:38:40 2010 70 % complete on Wed Apr 21 16:38:45 2010 75 % complete on Wed Apr 21 16:38:50 2010 80 % complete on Wed Apr 21 16:38:55 2010 84 % complete on Wed Apr 21 16:39:00 2010 89 % complete on Wed Apr 21 16:39:05 2010 94 % complete on Wed Apr 21 16:39:10 2010 99 % complete on Wed Apr 21 16:39:15 2010 100 % complete on Wed Apr 21 16:39:16 2010 Creating Allocation Maps Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool 'system' 20 % complete on Wed Apr 21 16:39:33 2010 38 % complete on Wed Apr 21 16:39:38 2010 57 % complete on Wed Apr 21 16:39:43 2010 74 % complete on Wed Apr 21 16:39:48 2010 92 % complete on Wed Apr 21 16:39:53 2010 100 % complete on Wed Apr 21 16:39:55 2010 Completed creation of file system /dev/gpfs0. EFSSG0019I The filesystem gpfs0 has been successfully created. EFSSG0038I The filesystem gpfs0 has been successfully mounted. EFSSG0140I Applied master role to file system gpfs0 EFSSG0015I Refreshing data ... In Example 9-11 here, the filesystem gpfs0 is created with replication of the MetaData set and hence uses disks from 2 failure groups. The second failure group was created in the previous Example 9-9 on page 294. The filesystem is also marked as the Master filesystem. Master filesystem is a unique filesystem in the SONAS appliance. This filesystem holds the shared information that is used by the Cluster Manager, CTDB. You can verify the creation of filesystem using the lsfs command. Example 9-12 displays the output for the newly created filesystem.
Example 9-12 Verifying the creation of file system using CLI command lsfs
[[email protected] cli]# lsfs
Cluster                      Devicename Mountpoint
Furby.storage.tucson.ibm.com gpfs0      /ibm/gpfs0
Attention: The actual information displayed on the panel has many more fields than are shown in Example 9-12, and it is too large to show in this example. This example has been simplified to ensure the important information is clear.
296
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
The command lsdisk shows the list of disks used for the gpfs0 filesystem (Example 9-13).
Example 9-13 Verifying the disks used for the file system created using CLI command lsdisk
lsdisk
Name                                File system Failure group Type            Pool   Status
array0_sata_60001ff0732f8548c000000 gpfs0       1             dataAndMetadata system ready  up
array0_sata_60001ff0732f8568c020002 gpfs0       1             dataAndMetadata system ready  up
array0_sata_60001ff0732f8588c040004 gpfs0       1             dataAndMetadata system ready  up
array0_sata_60001ff0732f85a8c060006 gpfs0       1             dataAndMetadata system ready  up
array1_sata_60001ff0732f8558c010001 gpfs0       2             dataAndMetadata system ready  up
array1_sata_60001ff0732f8578c030003 gpfs0       2             dataAndMetadata system ready  up
array1_sata_60001ff0732f8598c050005 gpfs0       2             dataAndMetadata system ready  up
array1_sata_60001ff0732f85d8c090009 gpfs0       2             dataAndMetadata system ready  up
array0_sata_60001ff0732f85c8c080008             1                             system ready
array0_sata_60001ff0732f85e8c0a000a             1                             system ready
array1_sata_60001ff0732f85f8c0b000b             2             dataAndMetadata system ready
array1_sata_60001ff0732f8608c0f000c             2             dataAndMetadata system ready
As you can see, the disks specified when the file system was created are now part of the file system (gpfs0 in this example), and they include disks from both failure groups.
[SONAS]$ setnwdns 9.11.136.116
In the second example, the setnwdns command is run with a single DNS server with IP address 9.11.136.116, a domain name of storage.ibm.com, and a single search string, servers.storage.ibm.com (see Example 9-15).
Example 9-15 Configuring DNS with DNS server IP, domain name and Search string
[SONAS]$ setnwdns 9.11.136.116 --domain storage.ibm.com --search servers.storage.ibm.com
In the third example, the setnwdns command is run with multiple DNS servers, with IPs 9.11.136.116 and 9.11.137.101, a domain name of storage.ibm.com, and multiple search strings, servers.storage.ibm.com and storage.storage.ibm.com (see Example 9-16).
Example 9-16 Configuring DNS with DNS server IP, domain name and multiple search strings
297
For our example cluster setup, we use the setnwdns with three search string options, storage3.tucson.ibm.com, storage.tucson.ibm.com, and sonasdm.storage.tucson.ibm.com as shown in Example 9-17. Here our DNS Server IPs are 9.11.136.132 and 9.11.136.116.
Example 9-17 Configuring DNS with DNS server IPs and multiple search strings using CLI command setnwdns
[[email protected] cli]# setnwdns 9.11.136.132,9.11.136.116 --search storage3.tucson.ibm.com,storage.tucson.ibm.com,sonasdm.storage.tucson.ibm.com
To verify that the DNS server IP addresses and search strings have been successfully configured, check the content of the resolv.conf file on each management and interface node. Keep in mind that the management node and interface nodes are the only nodes accessible from your network, and hence only these nodes are used to set up DNS. Steps to verify the DNS configuration are shown in Example 9-18.
Example 9-18 Verifying that the DNS has been successfully configured
[[email protected]]$ onnode all cat /etc/resolv.conf >> NODE: 172.31.132.1 << search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com nameserver 9.11.136.132 nameserver 9.11.136.116 >> NODE: 172.31.132.2 << search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com nameserver 9.11.136.132 nameserver 9.11.136.116 >> NODE: 172.31.132.3 << search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com nameserver 9.11.136.132 nameserver 9.11.136.116 >> NODE: 172.31.132.4 << search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com nameserver 9.11.136.132 nameserver 9.11.136.116 >> NODE: 172.31.132.5 << search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com nameserver 9.11.136.132 nameserver 9.11.136.116 >> NODE: 172.31.132.6 << search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com nameserver 9.11.136.132 nameserver 9.11.136.116 >> NODE: 172.31.136.2 << search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com nameserver 9.11.136.132 nameserver 9.11.136.116
298
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
In Example 9-18 on page 298, the SONAS setup has one management node and six interface nodes. The DNS server IPs used are 9.11.136.132 and 9.11.136.116, and three search strings, storage3.tucson.ibm.com, storage.tucson.ibm.com, and sonasdm.storage.tucson.ibm.com, are used. The management node IP is 172.31.136.2 and the interface node IPs are 172.31.132.*, as described in 9.5.3, Understanding the IP addresses for internal networking on page 292.
[[email protected]]$ mknwnatgateway 9.11.137.246/23 ethX0 9.11.136.1 172.31.128.0/17 mgmt001st001,int001st001,int002st001,int003st001,int004st001,int005st001,int006st001
EFSSG0086I NAT gateway successfully configured.
As you can see in Example 9-19, the public NAT gateway IP is 9.11.137.246/23, the interface is ethX0, the default gateway is 9.11.136.1, the private network is 172.31.128.0/17, and the nodes specified are the management node and the six interface nodes. This means that all the management and interface nodes talk to the outside world on their public IP through the NAT gateway. Confirm that NAT has been configured by using the CLI command lsnwnatgateway, as shown in Example 9-20.
Example 9-20 Verifying that the NAT Gateway has been successfully configured using CLI command lsnwnatgateway
[[email protected]]$ lsnwnatgateway
Public IP       Public interface Default gateway Private network Nodes
9.11.137.246/23 ethX0            9.11.136.1      172.31.128.0/17 172.31.136.2,172.31.132.1,172.31.132.2,172.31.132.3,172.31.132.4,172.31.132.5,172.31.132.6
Another way to check that the NAT gateway has been successfully configured is to check whether the management node and interface nodes can ping the gateway specified (see Example 9-21).
299
Example 9-21 Verifying that all the Nodes of the cluster can ping the NAT Gateway
onnode all ping -c 2 9.11.137.246/23 >> NODE: 172.31.132.1 << PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data. 64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.034 ms 64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.022 ms --- 9.11.137.246 ping statistics --2 packets transmitted, 2 received, 0% packet loss, time 1000ms rtt min/avg/max/mdev = 0.022/0.028/0.034/0.006 ms >> NODE: 172.31.132.2 << PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data. 64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.035 ms 64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.034 ms --- 9.11.137.246 ping statistics --2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.034/0.034/0.035/0.005 ms >> NODE: 172.31.132.3 << PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data. 64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.029 ms 64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.023 ms --- 9.11.137.246 ping statistics --2 packets transmitted, 2 received, 0% packet loss, time 1000ms rtt min/avg/max/mdev = 0.023/0.026/0.029/0.003 ms >> NODE: 172.31.132.4 << PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data. 64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.028 ms 64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.024 ms --- 9.11.137.246 ping statistics --2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.024/0.026/0.028/0.002 ms >> NODE: 172.31.132.5 << PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data. 64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.035 ms 64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.022 ms --- 9.11.137.246 ping statistics --2 packets transmitted, 2 received, 0% packet loss, time 1000ms rtt min/avg/max/mdev = 0.022/0.028/0.035/0.008 ms >> NODE: 172.31.132.6 << PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data. 64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.036 ms 64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.016 ms --- 9.11.137.246 ping statistics --2 packets transmitted, 2 received, 0% packet loss, time 1000ms 300
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
rtt min/avg/max/mdev = 0.016/0.026/0.036/0.010 ms >> NODE: 172.31.136.2 << PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data. 64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.027 ms 64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.020 ms --- 9.11.137.246 ping statistics --2 packets transmitted, 2 received, 0% packet loss, time 999ms rtt min/avg/max/mdev = 0.020/0.023/0.027/0.006 ms A successful ping shows that the NAT gateway has been successfully configured.
[[email protected]]$ cfgad -as 9.11.136.132 -c Furby.storage.tucson.ibm.com -u Administrator -p Ads0nasdm
(1/11) Parsing protocol
(2/11) Checking node accessibility and CTDB status
(3/11) Confirming cluster configuration
(4/11) Detection of AD server and fetching domain information from AD server
(5/11) Checking reachability of each node of the cluster to AD server
301
(6/11) Cleaning previous authentication configuration
(7/11) Configuration of CIFS for AD
(8/11) Joining with AD server
(9/11) Configuration of protocols
(10/11) Executing the script configADForSofs.sh
(11/11) Write auth info into database
EFSSG0142I AD server configured successfully
Now verify that the cluster is part of the Active Directory (AD) domain by using the chkauth command, as shown in Example 9-23.
Example 9-23 Verifying that the Windows AD server has been successfully configured.
[[email protected]]$ chkauth -c Furby.storage.tucson.ibm.com -t Command_Output_Data UID GID Home_Directory Template_Shell CHECK SECRETS OF SERVER SUCCEED
[[email protected]]$ cfgldap -c Furby.storage.tucson.ibm.com -d storage.tucson.ibm.com -lb dc=sonasldap,dc=com -ldn cn=Manager,dc=sonasldap,dc=com -lpw secret -ls sonaspb29 -ssl tls -v
Now verify that the cluster is correctly configured with the LDAP server by using the chkauth command, as shown in Example 9-25.
Example 9-25 Verifying that the LDAP server has been successfully configured
[[email protected]]$chkauth -c Furby.storage.tucson.ibm.com -t Command_Output_Data UID GID Home_Directory Template_Shell CHECK SECRETS OF SERVER SUCCEED
302
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Example 9-26 Configuring the Data Path IP using the CLI command mknw
[[email protected]]$ mknw 9.11.136.0/23 0.0.0.0/0:9.11.136.1 add 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15 Verify that the Data Path IP Address has been successfully configured using the CLI command lsnw as shown in Example 9-27.
Example 9-27 Verifying that the Network is successfully configured using CLI command lsnw
The previous command is used with no VLAN. You can also run it with the VLAN option, as shown in Example 9-28. In the example, 101 is the identification number of the VLAN.
Example 9-28 Configuring the Data Path IP with VLAN using the CLI command mknw
[[email protected]]$ mknw 9.11.136.0/23 0.0.0.0/0:9.11.136.1 --vlan 101 add 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15 Verify that the command is successful by running CLI lsnw command. Example 9-29 shows sample output.
Example 9-29 Verifying that the network is successfully configured using CLI command lsnw
[[email protected]]$ mknwgroup int int001st001,int002st001,int003st001,int004st001,int005st001,int006st001 Verify that the command is successful by running CLI command lsnwgroup as seen in Example 9-31.
Example 9-31 Verifying that the Data Path IP Group has been successfully configured using CLI command lsnwgroup
303
[[email protected]]$ attachnw 9.11.136.0/23 ethX0 -g int Verify that the command is successful by running the CLI command lsnw as shown in Example 9-33.
Example 9-33 Verify that the Data Path IP has been successfully attached using CLI command lsnw
[[email protected]]$ lsnw -r
Network VLAN ID Network Groups IP-Addresses Routes 9.11.136.0/23 int 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15
304
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Example 9-34 Creating a data export using CLI command mkexport
[[email protected]]$ mkexport shared /ibm/gpfs0/shared --nfs "*(rw,no_root_squash,async)" --ftp --cifs browseable=yes,comment="IBM SONAS" --owner "STORAGE3\eebbenall"
EFSSG0019I The export shared has been successfully created.
Verify that the exports are created correctly using the lsexport command as shown in Example 9-35.
Example 9-35 Verifying that the export has been successfully created using CLI command lsexport
[[email protected]]$ lsexport -v
Name   Path              Protocol Active Timestamp       Options
shared /ibm/gpfs0/shared FTP      true   4/14/10 6:13 PM
shared /ibm/gpfs0/shared NFS      true   4/14/10 6:13 PM *=(rw,no_root_squash,async,fsid=693494140)
shared /ibm/gpfs0/shared CIFS     true   4/14/10 6:13 PM browseable,comment=IBM SONAS
$ chgrp "STORAGE3\domain users" /ibm/gpfs0/shared
2. Use a Windows workstation on your network to modify the ACLs in order to provide the appropriate authorization. The following sub-steps can be used as a guide:
a. Access the shared folder using Windows Explorer.
Owner: This procedure must be used by the owner.
b. Right-click the folder, and select Sharing and Security...
c. Use the functions on the Sharing tab and/or the Security tab to set the appropriate authorization.
3. If a Windows workstation is not available for modifying the ACLs, use the following sub-steps to manually edit the ACLs:
VI editor: This requires manual editing of the ACL file using the VI editor, so it must only be used by those who are familiar with the VI editor. You need to be root in order to execute this command.
305
4. Run the command shown in Example 9-37 using the VI editor to modify ACLs. Specify that you want to use VI as the editor in the following way: $ export EDITOR=/bin/vi
Example 9-37 Viewing GPFS ACLs to give access to users using GPFS command mmeditacl
$ export EDITOR=/bin/vi
Type mmeditacl /ibm/gpfs0/shared and press Enter. The following screen is displayed:
#NFSv4 ACL
#owner: STORAGE3\eebenall
#group: STORAGE3\domain users
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:group@:rwx-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
5. Now change the ACLs by adding the text shown in Example 9-38 in bold.
Example 9-38 Adding new group to export using the GPFS command mmeditacl #NFSv4 ACL #owner: STORAGE3\eebenall #group: STORAGE3\domain users group:STORAGE3\domain admins:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED special:owner@:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED special:group@:rwx-:allow (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED special:everyone@:r-x-:allow (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
306
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
6. Verify that the new user/group has been added by running the mmgetacl command for the directory whose ACLs were changed. The output (see Example 9-39) must include the newly added user/group shown in Example 9-38 on page 306.
Example 9-39 Verifying that the new group was successfully added to export using GPFS command mmgetacl $ mmgetacl /ibm/gpfs0/shared #NFSv4 ACL #owner: STORAGE3\administrator #group: STORAGE3\domain users group:domain1\domain admins:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED special:owner@:rwxc:allow (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED special:group@:rwx-:allow (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED special:everyone@:r-x-:allow (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
b. If an Alert is displayed, warning you about an invalid security certificate, click OK.
307
308
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
A new window appears as shown in Figure 9-10. Click Get Certificate, and click Confirm Security Exception.
d. At the Integrated Solutions Console login panel, log in with User ID root and the root password as shown in Figure 9-11.
Figure 9-11 Login into Management GUI Interface with root user ID and Password
e. If you are asked if you want Firefox to remember this password, click Never for This Site. f. The first time you log into the GUI, you will be asked to accept the software license agreement. Follow the instructions on the panel to accept the software license agreement.
Chapter 9. Installation and configuration
309
g. Click Health Summary. h. Click Alert Log. The Alert Log will be displayed. i. Review the Alert Log entries. Figure 9-12 shows an example of Alert Log.
Attention: It is normal for one or more informational entries (entries with a severity of info) to be in the log following the installation. These entries can be ignored.
j. If any problems are logged, click the Event ID for more information. The Information Center will be displayed with information about the Event ID.
k. If a "Firefox prevented this site from opening a pop-up window" message is displayed, click Preferences, and click Allow pop-ups for localhost.
l. Resolve any problems by referring to the Problem Determination guide in the Information Center. If you are unable to resolve a problem, contact your next level of support.
m. When any problems have been resolved, clear the System Log by clicking System Log and then clicking Clear System Log.
n. When you are finished using the SONAS GUI, click Logout. The Scale Out File Services login panel will be displayed.
o. Close the browser by clicking X. The Linux desktop will be displayed.
p. Log out by selecting System > Log Out root.
310
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
3. Connect the Ethernet network cables.
Cables: Connecting the customer Ethernet cables is a customer responsibility.
4. Connect each cable to an available Ethernet port on the interface node.
5. If the rack contains another interface node, repeat the steps in this section until all interface nodes in the rack have been cabled.
6. If you are installing more than one rack, repeat the steps in this section until all interface nodes in all of the racks you are installing have been cabled.
The IBM SONAS system is now ready for use.
311
312
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
10
Chapter 10.
SONAS administration
In this chapter we provide information about how you use the GUI and CLI to administer your SONAS. Daily administrator tasks are discussed and examples provided.
313
314
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
When logged in, you will be able to see the panel as shown in Figure 10-2.
The left frame of the GUI in Figure 10-2 shows the collapsed view of all the categories that exist in the GUI. Figure 10-3 illustrates the various areas of the GUI navigation panel. On the left is the main navigation pane, which allows us to select the component we want to view or the task we want to perform. At the top we see the currently logged-in administrative user name, and just below that are the navigation tabs that allow you to switch between multiple open tasks. Underneath is a panel that contains context-sensitive help, minimize, and maximize buttons at the top right. We then have action buttons and table selection, sorting, and filtering controls. Below that is a table list of objects. At the bottom right is a refresh button that shows the time the data was last collected and refreshes the displayed data when pressed. Clicking an individual object brings up a detailed display of information for that object.
315
Figure 10-3 GUI navigation areas: main navigation, logged-in user, tab navigation, help and minimize/maximize controls, action buttons, and table select, sort, and filter controls
The expanded view of all the tasks in the left frame is shown in Figure 10-4. As seen in the URL bar, you provide the management GUI IP address or management node host name, along with the right path, to access the GUI, as mentioned previously. When logged in, on the main page, near the top center, you can see the CLI user name that is currently logged in to the GUI. In the right corner, you will also see a link to log out from the GUI. The left frame is a list of categories that provide links to perform any task on the cluster. Click the links on the left to open the corresponding panel on the right. Next, we describe the categories at a high level.
316
IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics
Figure 10-4 Expanded view of the left panel with all tasks available in SONAS GUI
The GUI categories are divided into several tasks seen as links in the left frame. Here is a brief description of these tasks:
1. Health Center: This panel shows the health of the cluster and its nodes. It gives a topological view of the cluster and its components. It also provides the system logs and alert logs of the system, as well as additional features such as Call Home.
2. Clusters: This panel allows you to manage the cluster, the interface nodes, and the storage nodes.
3. Files: All file system related tasks can be performed in this panel.
4. Storage: The back-end storage can be managed using the tasks available here.
5. Performance and Reports: The SONAS GUI provides you with elegant reports and graphs of various parameters that you can measure, such as file system utilization, disk utilization, and others.
6. SONAS Console Settings: In this section, you can enable threshold limits for utilization monitoring, view the tasks scheduled on the SONAS system, and configure email notifications to be sent when a threshold value is crossed.
7. Settings: In this panel, you can manage users and also enable tracing.
In the next section, we discuss each of the categories and underlying tasks.
Health Summary
This category allows you to check the health of the cluster, including the Interface Nodes, Storage Nodes, and Management Nodes. It consists of three panels:
1. Topology: This panel displays a graphical representation of the SONAS Software system topology. It provides information about the Management Nodes, Interface Nodes, and Storage Nodes, and includes the state of the data networks and storage blocks. It also shows information related to the file systems, such as the number of file systems that exist, the number mounted, the number of exports, and more (Figure 10-5). You can click each component to see further details about it.
2. Alert Log: The Alert Log panel displays the alert events that are generated by the SONAS Software. Each page displays around 50 log entries. The severity of an event can be Info, Warning, or Critical, displayed in blue, yellow, and red respectively. You can filter the logs in the table by severity, time period, and source; the source is the host on which the event occurred. See Figure 10-6.
3. System Log: This panel displays system log events that are generated by the SONAS Software, which include management console messages, system utilization incidents, status changes, and syslog events. Each page displays around 50 log entries. System logs have three levels: Information (INFO), Warning (WARNING), and Severe (SEVERE). You can filter the logs by log level, component, host, and more. Figure 10-7 shows the System Log panel in the GUI.
Figure 10-7 System Logs in the GUI for the SONAS cluster
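The log entries can also be listed from the CLI with the lslog command shown later in Example 10-1 ("Lists all log entries for a cluster"). This is a minimal sketch without options; use --help or the man page for the full syntax:
$ lslog    # lists the log entries recorded for the cluster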
Clusters
This panel allows you to administer the cluster, including the Interface Nodes and Storage Nodes, and to modify the cluster configuration parameters. Each panel and its tasks are discussed in the following section:
1. Clusters:
a. Add/Delete cluster to Management Node: The GUI allows you to manage not just the cluster it is a part of, but also other clusters. You can also delete a cluster from the GUI in order to stop managing it. You can add the cluster you want to manage using the Add cluster option in the Select Action drop-down box. This opens a new panel (in the same window) in which you need to enter the IP address of one of the nodes of the cluster and its password. The cluster is identified and added to the GUI. Figure 10-8 on page 321 shows how you can add the cluster.
Figure 10-8 Cluster Page on the GUI. Select Action to add cluster in the GUI
You can also delete a cluster previously added to the GUI by selecting it using the check box before the name of the cluster and clicking the Delete cluster option in the Select Action drop-down box. You are asked for confirmation before the cluster is deleted. Figure 10-9 shows how you can delete a cluster added to the GUI.
Figure 10-9 Cluster Page on the GUI. Select Action to delete cluster in the GUI
b. View Cluster Status: This panel displays the clusters that have been added to the GUI. See Figure 10-10.
c. Nodes: This panel is one of the tabs on the lower side of the Clusters panel. Here, you can view the connection status for all the nodes, such as the Management Node, Interface Nodes, and Storage Nodes. Figure 10-11 shows the view of the Nodes panel.
Clicking the node links, shown in blue in Figure 10-11, takes you to the respective Interface Node or Storage Node panel, explained in points 2 and 3 of the section Clusters on page 320.
d. File Systems: This panel displays all the file systems on the SONAS appliance. It shows other information about each file system, such as the mount point, the size of the file system, free space, used space, and more. See Figure 10-12. This is a read-only panel for viewing.
e. Storage Pools: The Storage Pools panel displays information about the various storage pools that exist. It displays which file system belongs to each storage pool and the capacity used. See Figure 10-13. This is a read-only panel for viewing; you cannot modify any storage pool parameters.
f. Services: The Services panel shows the various services that are configured on the SONAS appliance and their status, whether Active or Inactive. The supported services are FTP, CIFS, NFS, HTTP, and SCP. These services are required to configure the data exports on SONAS; end users access data stored in SONAS through these exports, so a service must be configured before data can be shared over the corresponding protocol. You cannot modify the status of the services from this panel. See Figure 10-14.
g. General Options: This option allows you to view and modify the cluster configuration. It allows you to modify some of its global options as well as node-specific parameters. You can also view cluster details such as the cluster name, cluster ID, primary and secondary servers, and more. See Figure 10-15.
h. Interface Details: This panel displays the Cluster Manager (CTDB) configuration details. The panel in Figure 10-16 is where you can see the NetBIOS name, the workgroup name, whether the Cluster Manager manages winbind, and more. The panel itself is read-only; changes are made through the Advanced Options button described next.
The Advanced Options button, on the panel shown in Figure 10-17, allows you to view and modify the CTDB configuration parameters. You can modify the reclock path and other advanced options of the CTDB. CTDB manages many services and has a configurable parameter for each; you can set each one so that CTDB either manages or does not manage the corresponding service. A few of the parameters are as follows: CTDB_MANAGES_VSFTPD, CTDB_MANAGES_NFS, CTDB_MANAGES_WINBIND, CTDB_MANAGES_HTTPD, and CTDB_MANAGES_SCP.
By default, these values are set to yes and CTDB manages the services. You can set a value to no if you do not want CTDB to manage that service. When CTDB is not managing a service, CTDB does not go unhealthy if the service goes down; it remains in the OK state. To monitor a service, set the corresponding value to yes. A sketch of how these settings look is shown after Figure 10-17.
Figure 10-17 Advanced Options under Interface details seen as CTDB information
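As an illustration only, the service-management parameters are simple yes/no settings of the following form. On SONAS you change them through the Advanced Options panel rather than by editing files; the file path in the comment is the typical location on a standard CTDB installation and is an assumption here, not a documented SONAS path.
# Typical CTDB service-management settings (illustrative sketch;
# on a plain CTDB installation these usually live in /etc/sysconfig/ctdb)
CTDB_MANAGES_VSFTPD=yes    # CTDB monitors the FTP service and goes unhealthy if it fails
CTDB_MANAGES_NFS=yes       # CTDB monitors the NFS service
CTDB_MANAGES_WINBIND=yes   # CTDB monitors winbind
CTDB_MANAGES_HTTPD=yes     # CTDB monitors the HTTP service
CTDB_MANAGES_SCP=no        # set to no if CTDB should ignore failures of this service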
2. Interface Nodes: The Interface Nodes panel allows you to view the node status. It displays the public IP address for each node, the active IP address it is servicing, the CTDB status, and more. You can also carry out operations on a node, such as Suspend, Resume, Restart the node, or Recover CTDB. To do so, select the node on which you want to perform the action and then click the corresponding button. You must also select the cluster whose Interface Nodes you want to check by selecting it from the Active cluster drop-down menu. Figure 10-18 shows the Interface Nodes panel.
Figure 10-18 Interface Node details and operations that can be performed on them
3. Storage Nodes: This panel displays information about the Storage Nodes. It displays the IP address of each Storage Node, its connection status, GPFS status, and more. It also allows you to start and stop Storage Nodes: select the Storage Nodes that you want to start or stop and click the Start or Stop button respectively. You must also select the cluster whose Storage Nodes you want to check by selecting it from the Active cluster drop-down menu. The first Storage Node is highlighted by default, and below, in the same pane, you can see details of that Storage Node such as the hostname, operating system, host IP address, controller information, and more. Figure 10-19 shows the Storage Nodes GUI panel.
Figure 10-19 Storage Node details and operations that can be performed on them
Files
This panel allows you to carry out file system related tasks. You can create file systems, file sets, exports, snapshots, and more. You must select the cluster on which you want to perform the tasks by selecting it from the Active cluster drop-down menu. Each of the tasks that can be performed is described in the following section:
1. File Systems: This panel has four sections, as described here:
a. File System: This section allows you to create file systems. The underlying file system that a SONAS appliance creates is a GPFS clustered file system. If a file system already exists, this section displays basic information about it, such as the name, mount point, size, usage, and more. You can also perform operations such as Mount, Unmount, and Remove a file system; the buttons on the panel help perform these tasks. See Figure 10-20. If the file system list extends to the next page, you can click the arrow buttons to move to the next page and back. The table also has a refresh button, which is the button in the lower right corner; it refreshes the list of file systems in the table. You can also select file systems individually, select all, or invert the selection in this table.
Figure 10-20 File system list and operations that can be performed on them
b. File System Configuration: This section displays the configuration details of the highlighted file system. It shows information about the device number, ACL type, number of inodes, replication details, quota details, mount information, and more. It also allows you to modify the ACL type, locking type, and number of inodes for the file system. Click the Apply button to apply the new configuration parameters. See Figure 10-21.
c. File System Disks: This section displays the disks used for the filesystem. It also displays the disk usage type. You can add disks to the file system by clicking the Add a disk to the file system button, or remove disks from the File System by selecting the disk and clicking the Remove button. See Figure 10-22.
d. File System Usage: This section displays the File System Usage information such as the number of Free Inodes, Used Inodes, the Storage pool usage, and details. See Figure 10-23.
2. Exports: This panel displays the exports created on the SONAS appliance for clients to access the data. It also allows you to create new exports, delete exports, modify exports, and more. You can also modify the configuration parameters for protocols such as CIFS and NFS. Next we describe each of the sections:
a. Exports: This section displays all the exports that have been created, along with details such as the share name, the directory path, and the protocols configured for the export. You can add a new export by using the Add button. For existing exports, you can carry out operations such as modifying the export, removing protocols, activating or deactivating an export, and removing the export, by selecting the export you want to operate on and clicking the respective button. Figure 10-24 shows the panel with some existing exports as examples. As you can see, there are four pages of exports; you can click the arrow buttons to move to the next page and back. The table also has a Refresh button, which is the button in the lower right corner; it refreshes the list of exports in the table. You can also select individual exports, select all, or invert the selection in this table. By default, the first export is highlighted and the protocol details of that export are displayed in the lower section, explained in detail next.
Figure 10-24 Exports existing in the cluster and operations you can perform on them
b. CIFS Export Configuration: This section displays the CIFS export configuration details of the highlighted export. As seen in Figure 10-24, the first export is highlighted by default; you can select other exports from the table. This panel displays the configured parameters, such as the comment, the browsable option, and the read-only option for the CIFS export, and also allows you to modify them. You can also use the Add, Modify, and Remove buttons to add, modify, and remove the advanced options, if any. Click Apply to apply new configuration parameters. Figure 10-25 shows the panel.
c. NFS Export Configuration: This section displays the list of NFS clients configured to access the NFS exports and their options. You can modify existing client details using the edit link in the table, remove a client using the remove link, and add a new client using the Add Client button. Click the Apply button to apply the changes. See Figure 10-26.
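As an illustration only, an NFS client entry pairs a client specification with standard NFS export options; the network, host name, and options below are assumed example values and are not taken from the test system.
# Client                      Options
192.168.10.0/24               rw,async,root_squash
nfsclient01.example.com       ro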
3. Policies: This panel displays, and also allows you to set, policies for the existing file systems. A policy is a set of rules that you can apply to your file system. It is discussed in detail in Call Home test on page 434. The Policies panel has two sections:
a. Policies List: This section allows you to view the policies set for the available file systems. By default, the first file system is highlighted and its policy details are shown in the lower section of the panel. You can set a default policy for a file system by clicking the Set Default Policy button. Figure 10-27 shows the Policies panel.
Figure 10-27 Policies listed for the file systems in the cluster
In the previous example, there is currently no policy set for the file system.
b. Policy Details: This section shows the policy details for the file system. A policy editor, which is a text box where you can write new policies for the file system, is provided. You can apply the policy using the Apply Policy button or set the policy by clicking the Set Policy button to the right of the editor. You can also load policies using the Load Policy button. See Figure 10-28.
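The policies entered in the editor use the GPFS policy rule language. The following is a minimal sketch of two placement rules; the pool names and file name pattern are assumptions for illustration only and must correspond to pools that actually exist in your file system.
/* Place new MPEG files in an assumed 'silver' storage pool */
RULE 'mpg_placement' SET POOL 'silver' WHERE LOWER(NAME) LIKE '%.mpg'
/* All other newly created files go to the default system pool */
RULE 'default' SET POOL 'system'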
4. File Sets: This panel displays the file sets existing in the file system. You choose the file system whose file sets you want to view from the Active file system drop-down list, along with the active cluster from the Active Cluster drop-down menu. The table then displays all the file sets that exist for that file system. The root file set is created by the system; it is the default one and is created when you create the first file system. In the lower section below the table, you can view details of a file set such as its name, the path it is linked to, its status, and more. You need to highlight or select the file set whose details you want to see; by default the first file set is highlighted, and in the lower section of the panel you can view and modify its other details. You can view other file set details by clicking and highlighting the one you want to view. Figure 10-29 shows the list of all the file sets and information about the file set that is highlighted. In our example, we have just the root file set listed.
Figure 10-29 Listing Filesets in the cluster and displaying information of the fileset
You can use the Create a File Set button to create a new file set. You can also delete or unlink existing file sets by selecting the file sets that you want to operate on and clicking the Delete or Unlink button respectively.
5. Quota: The clustered file system allows enabling quota and assigning quotas to users and groups on file sets and file systems. There are soft limits and hard limits for disk space and for the number of i-nodes, and a grace time that applies when quotas are exceeded. These concepts are described here:
Soft Limit Disk: The soft limit defines a level of disk space and files below which the user, group of users, or file set can safely operate. Specify soft limits for disk space in units of kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the value is assumed to be in bytes.
Hard Limit Disk: The hard limit defines the maximum disk space and files the user, group of users, or file set can accumulate. Specify hard limits for disk space in units of kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the value is assumed to be in bytes.
Soft Limit I-nodes: The i-node soft limit defines the number of i-nodes below which a user, group of users, or file set can safely operate. Specify soft limits for i-nodes as a number of i-nodes, optionally with a k or m suffix.
Hard Limit I-nodes: The i-node hard limit defines the maximum number of i-nodes that a user, group of users, or file set can accumulate. Specify hard limits for i-nodes as a number of i-nodes, optionally with a k or m suffix.
Grace Time: Grace time allows the user, group of users, or file set to exceed the soft limit for a specified period of time (the default is one week). If usage is not reduced to a level below the soft limit during that time, the quota system interprets the soft limit as the hard limit and no further allocation is allowed. The user, group of users, or file set can reset this condition by reducing usage enough to fall below the soft limit.
Figure 10-30 shows a screen capture of how Quota looks in the GUI. On a SONAS appliance, the GUI currently allows only read-only access to quota, which means that you can view quotas but not enable or set them. In our example, this is the default quota displayed for the file systems for the user root.
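As a short worked example with purely illustrative values, consider a user quota defined as follows:
Soft limit (disk):  8g      <- usage above this starts the grace period
Hard limit (disk):  10g     <- allocations beyond this are always refused
Grace time:         1 week  (default)
The user can grow past 8 GB for up to one week, but never past 10 GB. If usage is still above 8 GB when the grace time expires, the 8 GB soft limit is enforced as if it were the hard limit until usage drops back below it. Also note that a limit entered as 10 with no suffix means 10 bytes, not 10 GB.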
6. Snapshots: This panel displays the snapshots existing in the file system. You choose the file system whose snapshots you want to view from the Active file system drop-down list, along with the active cluster from the Active Cluster drop-down menu. In the table that lists the snapshots, you can also see other details such as the name, status, creation time stamp, and more. You can remove an existing snapshot from the cluster: select the snapshot you want to remove and click the Remove button. You can also create snapshots using the Create a new Snapshot of the active cluster and filesystem button. By default, the first snapshot is selected and highlighted, and in the lower section of the panel you can see its details. You can choose another snapshot from the list to see its corresponding details (Figure 10-31).
Figure 10-31 Snapshot lists that exist in cluster and its details
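From the CLI, snapshots are handled by the mksnapshot, lssnapshot, and rmsnapshot commands listed in Example 10-1. The following is a sketch only; the file system name is illustrative and the exact arguments should be confirmed with --help or the man pages.
$ mksnapshot gpfs0     # create a snapshot of file system gpfs0 (assumed invocation)
$ lssnapshot           # list all snapshots
$ rmsnapshot --help    # check the required arguments before removing a snapshot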
Storage
This panel allows you to view the storage disk and pool details. You can perform certain operations on disks, such as Remove Disk, Suspend, or Resume Disk. You can also view the available storage pools and their usage details. You must select the cluster on which you want to perform the tasks by selecting it from the Active cluster drop-down menu. We describe each of the tasks that can be performed in the following section:
1. Disks: This panel displays the disks that are available in the SONAS appliance and information about them, such as usage, the file system each disk is attached to, status, failure group, the storage pool it belongs to, and more. The table also has a refresh button, which is the button in the lower right corner; it refreshes the list of disks in the table. You can also select disks individually, select all, or invert the selection in this table, and you can filter the table using filter parameters. By default, the first disk is highlighted, and at the lower end of the pane other details of that disk are displayed, including the volume ID, sector size, the list of disk servers it resides on, and more. See Figure 10-32.
Figure 10-32 List of Storage disks in the cluster and their details
2. Storage Pools: This panel displays the storage pool list for a file system. The main table displays the file systems existing in the cluster, along with the pool usage and i-node usage for each file system. By default, the first file system in the list is highlighted, and for this file system the lower section of the panel shows storage pool related details such as the number of free i-nodes, maximum i-nodes, and allocated i-nodes. It also displays the size of the pool, and the size, free blocks, and fragment details of the NSDs or disks in the system. See Figure 10-33.
Figure 10-33 Storage pools existing in the cluster and their details
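The same disk and pool information can be listed from the CLI with the lsdisk and lspool commands from Example 10-1, shown here as a sketch without options (use --help for details):
$ lsdisk    # lists all disks with their file system, failure group, storage pool, and status
$ lspool    # lists all storage pools and their usage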
Figure 10-34 System Utilization details for the Nodes in the cluster.
2. File System Utilization: This panel generates charts that illustrate the utilization of the specified file system. The table shows the file systems that exist in the cluster, along with other details such as the cluster name, disk usage for the file system, and more. Initially no chart is displayed; select the file system and duration, and click the Generate Charts button. See Figure 10-35.
Figure 10-35 File system Utilization for the file systems in the cluster
Each of the tasks that can be performed is described in the following section:
1. Utilization Thresholds: This panel lists all thresholds for the various utilization monitors per cluster. A corresponding log message is generated for a monitor if the warning or error level value has been exceeded by the values measured over the last number of recurrences. The table displays the details of all the thresholds added, such as their warning level, error level, and more. You can remove a previously added threshold by selecting it and clicking the Remove button, and you can add new thresholds using the Add Threshold button. See Figure 10-36. Generation of charts is also explained in detail for system utilization in 10.9.1, System utilization on page 411 and for file system utilization in 10.9.2, File System utilization on page 413.
2. Scheduled Tasks: This panel allows you to view and manage tasks. SONAS has a list of predefined tasks for the Management Node. A predefined task can be a GUI task or a cron task. GUI tasks can be scheduled only once and run only on the Management Node, whereas cron tasks can be scheduled multiple times and for the different clusters managed by the Management Node. Cron tasks are predefined to run either on all nodes of the selected cluster or on the recovery master node only. You can add new tasks, and remove or execute existing tasks. This panel has two sections: the first is a table that lists all the tasks that are already scheduled, and the lower section shows the details of each task. The two sections are explained next:
a. Tasks List: This is the upper section of the pane. It lists the tasks that are already scheduled, in the form of a table that includes the task name, schedule, execution node, status of the last run, and more. You can execute or remove any task in the list by selecting the task and clicking the Execute or Remove button respectively. You can also add a new task using the Add Task button. See Figure 10-37. You can select any other task to see its details, as displayed in the next section. You can also select single or multiple rows and filter them using filter parameters, and you can use the arrow buttons to view tasks on the next page, if any.
b. Task Details: By default, the first task is highlighted in the table. You can change the selection by clicking any other task in the table. Upon selecting a task, its details are shown in the lower section of the pane, including the task name, description, task parameters, schedule time, and more. See Figure 10-38.
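The scheduled tasks can also be listed from the CLI with the lstask command from Example 10-1 ("Lists all (background) tasks for the management node"); this is a sketch only, and any options should be checked with --help:
$ lstask    # lists the scheduled tasks for the management node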
3. Notification Settings: This panel allows you to define notification settings for the selected cluster. Choose the Default option in the drop-down menu to apply the settings as default values for all clusters. As you can see in Figure 10-39, the panel has many options for which you can set notifications, such as utilization monitoring, GUI events, syslog events, quota checking, and more. You must also fill out the General E-mail Settings section of the panel with email addresses and details so that, when an event is generated because a threshold has been reached, the respective users receive a notification email. Add a header or footer to the email if required. To finish, complete the SNMP Settings section with the required details and make sure that you click the Apply button to save your settings.
4. Notification Recipients: This panel lists all the recipients who are configured to receive notification emails when a threshold that you are monitoring has been crossed. Select the cluster from the Active Cluster drop-down menu. The table lists the name, email ID, status, and more. You can remove an existing recipient. See Figure 10-40.
5. Contact Information: The internal contact information is used as reference data only. You can enter the data for the internal contact who has been chosen to address any SONAS questions or issues. The details you must add are the customer name, main phone contact, site phone contact, e-mail contact, location, and a comment. You can do so using the Edit button. See Figure 10-41.
Settings
This panel allows you to manage the console settings and tracing. It also allows you to manage the users who are allowed to access SONAS using the Management Interface (GUI). The two sections are described briefly next:
1. Console Logging and Tracing: This panel allows you to view and modify the configuration properties of the console server diagnostic trace services. Changes to the configuration take effect after clicking OK. See Figure 10-42.
2. Console User Authority: Configuration for adding, updating, and removing Console users. See Figure 10-43.
Figure 10-44 CLI user logging in to the Management Node from a Linux client
Example 10-1 contains a list of commands that are available for a CLI user.
Example 10-1 Command list for a CLI user
[Furby.storage.tucson.ibm.com]$ help
Known commands:
addcluster           Adds an existing cluster to the management.
attachnw             Attach a given network to a given interface of a network group.
backupmanagementnode Backup the managament node
cfgad                configures AD server into the already installed CTDB/SMABA cluster.Previously configured authentication server settings will be erased
cfgbackupfs          Configure file system to TSM server association
cfgcluster           Creates the initial cluster configuration
cfghsm               Configure HSM on each client facing node
cfgldap              configure LDAP server against an existing preconfigured cluster.
cfgnt4               configure NT4 server against an existing preconfigured cluster.
cfgsfu               Configures user mapping service for already configured AD
cfgtsmnode           Configure tsm node.
chdisk               Change a disk.
chexport             Modifies the protocols and their settings of an existing export.
chfs                 Changes a new filesystem.
chfset               Change a fileset.
chkauth              Check authentication settings of a cluster.
chkpolicy            validates placement rules or get details of management rules of a policy on a specified cluster for specified device
chnw                 Change a Network Configuration for a sub-net and assign multiple IP addresses and routes
chnwgroup            Adds or removes nodes to/from a given network group.
chuser               Modifies settings of an existing user.
confrepl             Configure asynchronous replication.
dblservice           stop services for an existing preconfigured server.
detachnw             Detach a given network from a given interface of a network group.
eblservice           start services for an existing preconfigured server.
enablelicense        Enable the license agreement flag
initnode             Shutdown or reboot a node
linkfset             Links a fileset
lsauth               List authentication settings of a cluster.
lsbackup             List information about backup runs
lsbackupfs           List file system to tsm server and backup node associations
lscfg                Displays the current configuration data for a GPFS cluster.
lscluster            Lists the information of all managed clusters.
lsdisk               Lists all discs.
lsexport             Lists all exports.
lsfs                 Lists all filesystems on a given device in a cluster.
lsfset               Lists all filesets for a given device in a cluster.
lshist               Lists system utilization values
lshsm                Lists configured hsm file systems cluster
lslog                Lists all log entries for a cluster.
lsnode               Lists all Nodes.
lsnw                 List all public network configurations for the current cluster
lsnwdns              List all DNS configurations for the current cluster
lsnwgroup            List all network group configurations for the current cluster
lsnwinterface        List all network interfaces
lsnwnatgateway       List all NAT gateway configurations for the current cluster
lsnwntp              List all NTP configurations for the current cluster
lspolicy             Lists all policies
lspool               Lists all pools.
lsquota              Lists all quotas.
lsrepl               List result of the asynchronous replications.
lsservice            Lists services
lssnapshot           Lists all snapshots.
lstask               Lists all (background) tasks for the management node.
lstsmnode            Lists defined tsm nodes in the cluster
lsuser               Lists all users of this mangement node.
mkexport             Creates a new export using one or more protocols.
mkfs                 Creates a new filesystem.
mkfset               Creates a fileset
mknw                 Create a new Network Configuration for a sub-net and assign multiple IP addresses and routes
mknwbond             Makes a network bond from slave interfaces
mknwgroup            Create a group of nodes to which a network configuration can be attached. See also the commands mknw and attachnw.
mknwnatgateway       Makes a CTDB NAT gateway
mkpolicy             Makes a new policy into database
mkpolicyrule         Appends a rule to already existing policy
mksnapshot           creates a snapshot from a filesystem
mktask               Schedule a prefedined task for
mkuser               Creates a new user for this management node.
mountfs              Mount a filesystem.
querybackup          Query backup summary
restripefs           Rebalances or restores the replication of all files in a file system.
resumenode           Resumes an interface node.
rmbackupfs           Remove file system to TSM server association
rmcluster            Removes the cluster from the management (will not delete cluster).
rmexport             Removes the given export.
rmfs                 Removes the given filesystem.
rmfset               Removes a fileset
rmlog                Removes all log entries from database
rmnw                 Remove an existing public network configuration
rmnwbond             Deletes a regular bond interface.
rmnwgroup            Remove an existing group of nodes. maybe attached public network configuration must be detached in advance
rmnwnatgateway       Unconfigures a CTDB NAT gateway.
rmpolicy             Removes a policy and all the rules belonging to it
rmpolicyrule         Removes one or more rules from given policy
rmsnapshot           Removes a filesystem snapshot
rmtask               Removes the given scheduled task.
rmtsmnode            Remove TSM server stanza for node
rmuser               Removes the user from the management node.
rpldisk              Replaces current NSD of a filesystem with a free NSD
runpolicy            Migrates/deletes already existing files on the GPFS file system based on the rules in policy provided
setnwdns             Sets nameservers
setnwntp             Sets NTP servers
setpolicy            sets placement policy rules of a given policy on cluster passed by user.
setquota             Sets the quota settings.
showbackuperrors     Shows errors of a backup session
showbackuplog        Shows the log of the recent backup session.
showrestoreerrors    Shows errors of a restore session
showrestorelog       Shows the log of the recent restore session.
startbackup          Start backup process
Start reconcile process
Start asynchronous replication.
Start restore process
Stops a running TSM backup session
Stop asynchronous replication.
Stops a running TSM restore session
Suspends an interface node.
Unlink a fileset.
Unmount a filesystem.
Plus the UNIX commands: grep, initnode, man, more, sed, startmgtsrv, stopmgtsrv, sort, cut, head, less, tail, uniq
For additional help on a specific command use 'man command'.
[Furby.storage.tucson.ibm.com]$
In SONAS, some tasks can be performed exclusively by a CLI user, while other tasks can be performed using CLI commands as well as from the GUI. The commands shown are a combination of both. Each command has help regarding its usage, which can be viewed by using one of the following commands:
# manpage <command_name>
# <command_name> --help
For example, let us look up help for the mkfs command using --help and the man pages. See Example 10-2 for the complete help output from the command, and Figure 10-45, which shows a snapshot of the man page help.
Example 10-2 Help or usage for CLI command mkfs taken as example
[Furby.storage.tucson.ibm.com]$ mkfs --help
usage: mkfs filesystem [mountpoint] [-b <blocksize>] [-c <cluster name or id>] [--dmapi | --nodmapi] [-F <disks>] [-i <maxinodes>] [-j <blockallocationtype>] [--master] [-N <numnodes>] [--noverify] [--pool <arg>] [-R <replica>]
filesystem                 The device name of the file system to be created. File system names need not be fully-qualified.
mountpoint                 Specifies the mount point directory of the GPFS file system.
-b,--blocksize <blocksize>                      blocksize
-c,--cluster <cluster name or id>               define cluster
--dmapi                    enable the DMAPI support for HSM
-F,--disks <disks>         disks
-i,--numinodes <maxinodes> Set the maximal number of inodes in the file system.
-j,--blockallocationtype <blockallocationtype>  blockallocationtype
--master                   master
-N,--numnodes <numnodes>   numnodes
--nodmapi                  disable the DMAPI support for HSM
--noverify                 noverify
--pool <arg>               pool
-R,--replica <replica>     Sets the level of replication used in this file system. Either none, meta or all
Similarly you can run help for each of the commands available in the CLI and also run the manpage command for each.
10.2.3 Tasks that can be performed by the SONAS GUI and SONAS CLI
The following tasks can be performed using either the SONAS GUI or the SONAS CLI:
1. Configure protocols and their settings.
2. Add or remove a cluster to/from the Management Node.
3. Add or remove Network Shared Disks (NSDs).
4. Create or delete a file system.
5. Create or delete exports.
6. Create or delete tasks.
7. Create or delete snapshots.
8. Start or stop Storage Nodes.
9. Change file system parameters.
10. Change cluster parameters.
11. Change disk or NSD status.
12. Change policies.
13. Link or unlink file sets.
14. Mount or unmount a file system.
15. Select the GPFS cluster.
16. Show node status.
17. Show cluster status.
18. Show system utilization (CPU, RAM, and so on).
19. Show snapshots.
20. Show file system utilization.
21. Show NSD status.
22. Show file system status.
23. Show or filter quotas.
24. Show storage pools.
25. Show policies.
26. Show file sets.
27. Show the event log.
28. Show tasks.
[Furby.storage.tucson.ibm.com]$ addcluster --help
usage: addcluster -h <host> -p <password>
-h,--host <host>          host
-p,--password <password>  password
[Furby.storage.tucson.ibm.com]$ addcluster -h int001st001 -p Passw0rd
EFSSG0024I The cluster Furby.storage.tucson.ibm.com has been successfully added
[Furby.storage.tucson.ibm.com]$ lscluster
ClusterId            Name                         PrimaryServer SecondaryServer
12402779238924957906 Furby.storage.tucson.ibm.com strg001st001  strg002st001
Product Version Connection status GPFS status CTDB status 1.1.0.2-7 1.1.0.2-7 1.1.0.2-7 1.1.0.2-7 1.1.0.2-7 OK OK OK OK OK active active active active active active active active
Interface Nodes
This section describes the Interface Node commands:
1. Suspend Node: This command suspends the Interface Node, bans the CTDB on it, and disables the node. A banned node does not participate in the cluster and does not host any records for the CTDB; its IP address is taken over by another node and no services are hosted.
Using the GUI: Refer to point 2 of the Clusters section on page 320 to view the operations that you can perform on the Interface Nodes and Storage Nodes.
Using the CLI: Use the CLI suspendnode command. Example 10-6 shows the usage and command output.
Example 10-6 Command usage and output for CLI command suspendnode
[Furby.storage.tucson.ibm.com]$ suspendnode --help
usage: suspendnode nodeName [-c <cluster name or id>]
nodeName    Specifies the name or ip of the node for identification.
-c,--cluster <cluster name or id>    define cluster
[Furby.storage.tucson.ibm.com]$ suspendnode int002st001 -c Furby.storage.tucson.ibm.com
EFSSG0204I The node(s) are suspended successfully!
2. Resume Node: The resumenode command resumes a suspended Interface Node. It unbans the CTDB on that node and enables the node. The resumed node participates in the cluster and hosts records for the clustered trivial database (CTDB). It takes back its IP address and starts hosting services.
Using the GUI: Refer to point 2 of the Clusters section on page 320 to view the operations that you can perform on the Interface Nodes and Storage Nodes.
Using the CLI: Use the CLI resumenode command. Example 10-7 shows the syntax and command output.
Example 10-7 Command usage and output for CLI command resumenode
[Furby.storage.tucson.ibm.com]$ resumenode --help
usage: resumenode Node [-c <cluster name or id>]
Node    Specifies the name of the node for identification.
-c,--cluster <cluster name or id>    define cluster
[Furby.storage.tucson.ibm.com]$ resumenode int002st001
EFSSG0203I The node(s) are resumed successfully!
GUI: Recover Node and Restart Node cannot be done using the CLI. They should be done only using the GUI.
Storage nodes
This section describes the Storage Node commands:
1. Stop Node: This command unmounts the file system on the node and shuts down the GPFS daemon.
Using the GUI: Refer to point 3 of the Clusters section on page 320 to view the operations that you can perform on the Interface Nodes and Storage Nodes.
Using the CLI: This task cannot be run using the CLI; there is no command to perform this operation.
2. Start Node: This command starts the GPFS daemon on the selected Storage Node and mounts the file system on that node.
Using the GUI: Refer to point 3 of the Clusters section on page 320 to view the operations that you can perform on the Interface Nodes and Storage Nodes.
Using the CLI: This task cannot be run using the CLI; there is no command to perform this operation.
To create a file system, click the Create a File System button. A new panel opens, which asks you to enter details such as these:
1. Select NSD: Here, you select the NSDs you want to add to the file system. At least one NSD must be defined. If you want replication, at least two NSDs should be selected, and they should belong to different failure groups, so that if one NSD fails the replica is still available for access. Select the NSD by clicking the check box on the left of the table (see Figure 10-47). Click the Next tab in the panel.
Figure 10-47 Select NSDs from list available in SONAS GUI to create filesystem
2. Basic Information: This is the next tab. Enter the mount point for the file system and the device name or file system name you want to create. Choose the block size from the list available in the Block Size drop-down menu. You can use the Force option if you do not want GPFS to check whether the chosen NSD has already been used by another file system; it is advisable to use this option only if you are sure that the NSD is not currently being used by any file system and is free (see Figure 10-48 on page 355). Click the next tab in the panel.
3. Locking and access control lists (ACL): This tab is for the ACLs and the locking type. Currently only the NFSv4 locking type and NFSv4 ACL type are supported, and they are already chosen by default in the GUI. The drop-down menus for both Locking Type and ACL Type are therefore deactivated or disabled. See Figure 10-49. Click the next tab.
4. Replication: This tab allows you to choose whether you want replication enabled. If you enable replication, you need to select at least two NSDs, as mentioned before, and the two NSDs should belong to two different failure groups. The Enable Replication Support option enables replication support for all files and metadata in the file system; this setting cannot be changed after the file system has been created. The value for both the maximum data and metadata replicas is set to 2. To set replication to true, select the Enable Replication check box. See Figure 10-50. Click the next tab.
5. Automount: This tab allows you to set Automount to True, which means that after every node restart the file system is automatically mounted on the nodes. If it is set to False, or not selected, the file system needs to be manually mounted on the nodes. See Figure 10-51. Click the next tab.
6. Limits: In this tab you enter the number of nodes you want the file system to be mounted on and the maximum number of files that the file system can hold. Enter the number of nodes in the text box available. This is the estimated number of nodes that will mount the file system, and it is used as a best guess for the initial size of some file system data structures. The default is 32. This value cannot be changed after the file system has been created. When you create a GPFS file system, consider overestimating the number of nodes that will mount the file system. GPFS uses this information for creating data structures that are essential for achieving maximum parallelism in file system operations. Although a large estimate consumes additional memory, underestimating the data structure allocation can reduce the efficiency of a node when it processes some parallel requests, such as the allotment of disk space to a file. If you cannot predict the number of nodes that will mount the file system, use the default value. If you are planning to add nodes to your system, specify a number larger than the default. However, do not make unrealistic estimates, because specifying an excessive number of nodes might have an adverse effect on buffer operations. Enter the maximum number of files in the text box available; this is the maximum number of files that can be created on this file system. See Figure 10-52. Click the next tab.
Figure 10-52 Setting inode limits and maximum number of files for the new filesystem
7. Miscellaneous: Using this tab you can enable other options such as Quota, DMAPI, atime, and mtime. Select the check box for each option that you want to enable, and clear it if you do not. See Figure 10-53.
8. Final Step: Go through each tab again to verify that all the necessary parameters are selected. After you have confirmed all the parameters for the file system, click the OK button, which is located at the lower end of the Create File System panel. When it is clicked, the task begins and a Task Progress window appears that displays the task being performed and its details. At the end of each task there should be a green check mark; if any error occurs, a red cross (x) and an error message appear. Check the error, correct it, and retry. When the task is completed, click the Close button to close the window. See Figure 10-54.
Using the CLI: You can create a file system using the mkfs CLI command. The NSD name is mandatory and you need to enter at least one NSD. Set -R (replication) to none if you do not want to enable replication. If you enable replication, you need to enter at least two NSDs, and these NSDs must belong to different failure groups. The block size and replication factors chosen affect file system performance. Example 10-8 shows the help and usage of the command. For the example, the block size was left at the default 256 KB and replication was not enabled.
Example 10-8 mkfs command example
[Furby.storage.tucson.ibm.com]$ mkfs --help usage: mkfs filesystem [mountpoint] [-b <blocksize>] [-c <cluster name or id>] [--dmapi | --nodmapi] [-F <disks>] [-i <maxinodes>] [-j <blockallocationtype>] [--master] [-N <numnodes>][--noverify] [--pool <arg>] [-R <replica>] filesystem The device name of the file system to be created. File system names need not be fully-qualified. mountpoint Specifies the mount point directory of the GPFS file system. -b,--blocksize <blocksize> blocksize -c,--cluster <cluster name or id> define cluster --dmapi enable the DMAPI support for HSM -F,--disks <disks> disks -i,--numinodes <maxinodes> Set the maximal number of inodes in the file system. -j,--blockallocationtype <blockallocationtype> blockallocationtype --master master -N,--numnodes <numnodes> numnodes --nodmapi disable the DMAPI support for HSM --noverify noverify --pool <arg> pool -R,--replica <replica> Sets the level of replication used in this file system. Either none, meta or all [Furby.storage.tucson.ibm.com]# mkfs gpfs1 --nodmapi -F array0_sata_60001ff0732f85c8c080008 -R none --noverify The following disks of gpfs1 will be formatted on node strg001st001: array0_sata_60001ff0732f85c8c080008: size 15292432384 KB Formatting file system ... Disks up to size 125 TB can be added to storage pool 'system'. Creating Inode File 3 % complete on Fri Apr 23 09:54:04 2010 5 % complete on Fri Apr 23 09:54:09 2010 7 % complete on Fri Apr 23 09:54:14 2010 9 % complete on Fri Apr 23 09:54:19 2010 11 % complete on Fri Apr 23 09:54:24 2010 13 % complete on Fri Apr 23 09:54:29 2010 15 % complete on Fri Apr 23 09:54:34 2010 17 % complete on Fri Apr 23 09:54:39 2010 19 % complete on Fri Apr 23 09:54:44 2010 21 % complete on Fri Apr 23 09:54:49 2010 23 % complete on Fri Apr 23 09:54:54 2010 25 % complete on Fri Apr 23 09:54:59 2010 27 % complete on Fri Apr 23 09:55:04 2010 29 % complete on Fri Apr 23 09:55:09 2010 31 % complete on Fri Apr 23 09:55:15 2010
33 % complete on Fri Apr 23 09:55:20 2010 35 % complete on Fri Apr 23 09:55:25 2010 37 % complete on Fri Apr 23 09:55:30 2010 39 % complete on Fri Apr 23 09:55:35 2010 41 % complete on Fri Apr 23 09:55:40 2010 43 % complete on Fri Apr 23 09:55:45 2010 45 % complete on Fri Apr 23 09:55:50 2010 47 % complete on Fri Apr 23 09:55:55 2010 48 % complete on Fri Apr 23 09:56:00 2010 50 % complete on Fri Apr 23 09:56:05 2010 52 % complete on Fri Apr 23 09:56:10 2010 54 % complete on Fri Apr 23 09:56:15 2010 56 % complete on Fri Apr 23 09:56:20 2010 58 % complete on Fri Apr 23 09:56:25 2010 60 % complete on Fri Apr 23 09:56:30 2010 62 % complete on Fri Apr 23 09:56:35 2010 64 % complete on Fri Apr 23 09:56:40 2010 66 % complete on Fri Apr 23 09:56:45 2010 67 % complete on Fri Apr 23 09:56:50 2010 69 % complete on Fri Apr 23 09:56:55 2010 71 % complete on Fri Apr 23 09:57:00 2010 73 % complete on Fri Apr 23 09:57:05 2010 75 % complete on Fri Apr 23 09:57:10 2010 77 % complete on Fri Apr 23 09:57:15 2010 79 % complete on Fri Apr 23 09:57:20 2010 81 % complete on Fri Apr 23 09:57:25 2010 82 % complete on Fri Apr 23 09:57:30 2010 84 % complete on Fri Apr 23 09:57:35 2010 86 % complete on Fri Apr 23 09:57:40 2010 88 % complete on Fri Apr 23 09:57:45 2010 90 % complete on Fri Apr 23 09:57:50 2010 92 % complete on Fri Apr 23 09:57:55 2010 94 % complete on Fri Apr 23 09:58:00 2010 96 % complete on Fri Apr 23 09:58:05 2010 97 % complete on Fri Apr 23 09:58:10 2010 99 % complete on Fri Apr 23 09:58:15 2010 100 % complete on Fri Apr 23 09:58:16 2010 Creating Allocation Maps Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool 'system' 60 % complete on Fri Apr 23 09:58:31 2010 100 % complete on Fri Apr 23 09:58:34 2010 Completed creation of file system /dev/gpfs1. EFSSG0019I The filesystem gpfs1 has been successfully created. EFSSG0038I The filesystem gpfs1 has been successfully mounted. EFSSG0015I Refreshing data ... [Furby.storage.tucson.ibm.com]#
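The Limits tab described earlier corresponds to the -N (estimated number of nodes) and -i (maximum number of inodes) options in the mkfs help output above. The following is a sketch only, with purely illustrative values and an assumed disk placeholder; it was not run on the example system:
$ mkfs gpfs3 /ibm/gpfs3 -F <disk_list> -N 32 -i 1000000 -R none --nodmapi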
[Furby.storage.tucson.ibm.com]$ lsfs --help
usage: lsfs [-c <cluster name or id>] [-d <arg>] [-r] [-Y]
-c,--cluster <cluster name or id>    define cluster
-d,--device <arg>                    define device
-r,--refresh                         refresh list
-Y                                   format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsfs Cluster Devicename Mountpoint Type Remote device Quota Def. quota Blocksize Locking type ACL type Inodes Data replicas Metadata replicas Replication policy Dmapi Block allocation type Version Last update Master Humboldt.storage.tucson.ibm.com gpfs0 /ibm/gpfs0 local local user;group;fileset 256K nfs4 nfs4 100.000M 1 2 whenpossible F scatter 11.05 4/23/10 5:15 PM YES Humboldt.storage.tucson.ibm.com gpfs2 /ibm/gpfs2 local local user;group;fileset 64K nfs4 nfs4 14.934M 1 1 whenpossible F scatter 11.05 4/23/10 5:15 PM NO
Figure 10-55 Select to mount the file system on all or selective nodes
Choosing to mount on selected nodes requires you to select the nodes on which you want to mount the file system. The window is similar to the one shown in Figure 10-56.
When done, click OK in the same window. The file system is then mounted on the specified nodes. The task progress window displays the progress and, when successful, shows green check marks. If there is an error, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-57.
If successful, close the window by clicking the Close button. The window disappears and you are brought back to the first page of the File Systems page. The table on the main File Systems page should now list the file system as mounted on the number of nodes selected.
Using the CLI: You can mount the file system using the CLI mountfs command. The command allows you to mount the file system on all nodes or on specific Interface Nodes. The usage and command output are displayed in Example 10-10. In the example, the file system gpfs2 is mounted on all nodes, so the -n option is omitted.
Example 10-10 Command usage and output for the CLI command mountfs
[Furby.storage.tucson.ibm.com]$ mountfs --help
usage: mountfs filesystem [-c <cluster name or id>] [-n <nodes>]
filesystem    Identifies the file system name of the file system. File system names need not be fully-qualified.
-c,--cluster <cluster name or id>    define cluster
-n,--nodes <nodes>    nodes
[Furby.storage.tucson.ibm.com]$ mountfs gpfs2
EFSSG0038I The filesystem gpfs2 has been successfully mounted.
Choosing to unmount on selected nodes requires you to select the nodes on which you want to unmount the file system. The window is shown in Figure 10-59.
When done, click OK in the same window. The file system is then unmounted on the specified nodes. The task progress window displays the progress and, when successful, shows green check marks. If there is an error, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-60.
After the operation has completed, close the window by clicking the Close button. The window disappears and you are brought back to the first page of the File Systems page. The table on the main File Systems page should now list the file system as unmounted on the number of nodes selected.
Using the CLI: You can unmount the file system using the CLI unmountfs command. The command allows you to unmount the file system on all nodes or on specific Interface Nodes. The usage and command output are displayed in Example 10-11. The file system gpfs2 is unmounted on all nodes, so the -n option is omitted.
Example 10-11 Command usage and output for the CLI command unmountfs.
[Furby.storage.tucson.ibm.com]$ unmountfs --help
usage: unmountfs filesystem [-c <cluster name or id>] [-n <nodes>]
filesystem    Specifies the name of the filesystem for identification.
-c,--cluster <cluster name or id>    define cluster
-n,--nodes <nodes>    nodes
[Furby.storage.tucson.ibm.com]$ unmountfs gpfs2
EFSSG0039I The filesystem gpfs2 has been successfully unmounted.
The three check boxes, Enable Quota, Suppress atime, and Exact mtime, are the ones that require the file system to be unmounted. In Figure 10-61 these check boxes are shown with a red asterisk (*). After modifying the parameters, click the OK button for the task to proceed. The task bar shows the progress of the operation and, when successful, shows green check marks. If there is an error, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-62 on page 365. Click the Close button to close the window.
2. Modifying the Disks for the File System: You can add or remove disks assigned to the file system; the file system must have at least one disk.
a. Adding New Disks: You can add more disks by clicking the Add a disk to the file system button. A new window appears listing the free disks, and you can choose which disk to add. Choose the disk type. You can also specify the failure group and storage pool of the disk when adding it. When done, click OK. See Figure 10-63.
The task progress bar appears, showing the progress of the operation, and when successful it shows green check marks. If there is an error, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-64. The new disk is then successfully added. Click the Close button to close the window.
b. Remove Disks: You can also remove a disk by selecting the disk you want to remove and clicking the Remove button on the panel in the lower section of the File Systems page, as shown in Figure 10-65.
Figure 10-65 Select the disk to be removed from the list of disks for the file system selected
When you click the Remove button, a new window appears asking for confirmation to remove the disk, as shown in Figure 10-66.
To confirm, click the OK button. The task progress bar shows the progress of the operation and, when successful, shows green check marks. If there is an error, the error message is shown and the window shows a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-68. The disk is then successfully removed. Click the Close button to close the window.
Using the CLI: You can change the file system parameters using the chfs command. Example 10-12 describes the usage and shows the command output of chfs used to add a new disk to the file system.
Example 10-12 Command usage and output for the chfs command, adding a disk to the file system
[Furby.storage.tucson.ibm.com]$ chfs --help usage: chfs filesystem [--add <disks> | --noverify | --pool <arg>] [--atime <{exact|suppress}>] [-c <cluster name or id>] [--force | --remove <disks>] [-i <maxinodes>] [--master | --nomaster] [--mtime <{exact|rough}>][-q <{enable|disable}>] [-R <replica>] filesystem The device name of the file system to be changed. File system names need not be fully-qualified. --add <disks> Adds disks to the file system. --atime <{exact|suppress}> If set to exact the file system will stamp access times on every access to a file or directory. Otherwise access times will not be recorded. -c,--cluster <cluster name or id> define cluster --force enforce disk removal without calling back the user -i,--numinodes <maxinodes> Set the maximal number of inodes in the file system. --master master --mtime <{exact|rough}> If set to exact the file or directory modification times will be updated immediately. Otherwise modification times will be updated after a several second delay. --nomaster nomaster --noverify noverify --pool <arg> pool -q,--quota <{enable|disable}> Enables or disables quotas for this file system. -R,--replica <replica> Sets the level of replication used in this file system. Either none, meta or all --remove <disks> Removes disks from the file system. [Furby.storage.tucson.ibm.com]$ chfs gpfs2 --add array0_sata_60001ff0732f85c8c080008 The following disks of gpfs2 will be formatted on node strg001st001: array0_sata_60001ff0732f85c8c080008: size 15292432384 KB Extending Allocation Map Checking Allocation Map for storage pool 'system' 9 % complete on Mon Apr 26 12:14:19 2010 10 % complete on Mon Apr 26 12:14:24 2010 18 % complete on Mon Apr 26 12:14:29 2010 26 % complete on Mon Apr 26 12:14:34 2010 27 % complete on Mon Apr 26 12:14:39 2010 35 % complete on Mon Apr 26 12:14:44 2010 43 % complete on Mon Apr 26 12:14:49 2010 44 % complete on Mon Apr 26 12:14:55 2010 52 % complete on Mon Apr 26 12:15:00 2010 53 % complete on Mon Apr 26 12:15:05 2010 61 % complete on Mon Apr 26 12:15:10 2010 62 % complete on Mon Apr 26 12:15:15 2010 70 % complete on Mon Apr 26 12:15:20 2010 71 % complete on Mon Apr 26 12:15:25 2010 77 % complete on Mon Apr 26 12:15:30 2010 83 % complete on Mon Apr 26 12:15:35 2010 90 % complete on Mon Apr 26 12:15:40 2010 95 % complete on Mon Apr 26 12:15:45 2010 100 % complete on Mon Apr 26 12:15:49 2010 Completed adding disks to file system gpfs2. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. EFSSG0020I The filesystem gpfs2 has been successfully changed. EFSSG0015I Refreshing data ...
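Removing a disk from a file system uses the --remove option shown in the chfs help output above. The following is a minimal sketch, reusing the disk name from the example for illustration only; per the help output, --force skips the confirmation callback and should be used with care:
$ chfs gpfs2 --remove array0_sata_60001ff0732f85c8c080008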
After you have confirmed, the operation is carried out. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); in that case, check the logs, correct the problem, and retry. See Figure 10-69. The disk will be successfully removed. Click the Close button to close the window.
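Using the CLI: The chfs command shown in Example 10-12 also accepts the --remove option listed in its help output to take a disk out of a file system. The following is a minimal, hypothetical sketch (not taken from the lab output; the disk name is reused from the earlier add example):
[Furby.storage.tucson.ibm.com]$ chfs gpfs2 --remove array0_sata_60001ff0732f85c8c080008
Data residing on the disk is migrated to the remaining disks of the file system, so the operation can take time on a populated file system.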
Using the CLI: You can delete an existing file system from the cluster using the rmfs CLI command. The command usage and output are shown in Example 10-13.
Example 10-13 Command usage and output for removing the file system.
[Furby.storage.tucson.ibm.com]$ rmfs --help usage: rmfs filesystem [-c <cluster name or id>] [--force] filesystem The device name of the file system to contain the new fileset. File system names need not be fully-qualified. -c,--cluster <cluster name or id> define cluster --force enforce operation without calling back the user [Furby.storage.tucson.ibm.com]$ rmfs gpfs2 Do you really want to perform the operation (yes/no - default no): yes All data on following disks of gpfs2 will be destroyed: array1_sata_60001ff0732f85f8c0b000b Completed deletion of file system /dev/gpfs2.
Example 10-14 shows the usage and the command output for the lsquota command.
Example 10-14 Command usage and output for CLI command lsquota.
[Furby.storage.tucson.ibm.com]$ lsquota --help usage: lsquota [-c <cluster name or id>] [-r] [-Y] -c,--cluster <cluster name or id> define cluster -r,--refresh refresh list -Y format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsquota
Cluster                      Device SL(usage) HL(usage) Used(usage) SL(inode) HL(inode) Used(inode)
Furby.storage.tucson.ibm.com gpfs0  -         -         16 kB       -         -         1
Furby.storage.tucson.ibm.com tms0   -         -         13.27 MB    -         -         135
Furby.storage.tucson.ibm.com tms0   -         -         13.27 MB    -         -         135
Furby.storage.tucson.ibm.com tms0   -         -         13.27 MB    -         -         135
Furby.storage.tucson.ibm.com gpfs0  -         -         832 kB      -         -         23
Furby.storage.tucson.ibm.com gpfs0  -         -         832 kB      -         -         23
Furby.storage.tucson.ibm.com gpfs0  -         -         816 kB      -         -         22
Tip: The actual command output displayed on the panel has many more fields than shown in this example, which has been simplified to keep the important information clear.
2. Set Quota: Using the GUI: You cannot set quotas from the GUI; the GUI provides only a read-only view for quota management. Using the CLI: You can set the quota for a file system using the setquota CLI command. This command sets the quota for a user, a group, or a fileset. Soft limits are subject to reporting only; hard limits are enforced by the file system. Disk area sizes accept the suffixes k (kilobytes), m (megabytes), g (gigabytes), t (terabytes), and p (petabytes); these values are not case sensitive. The effective quotas are passed in kilobytes and matched to block sizes. Inode limits accept only the k and m suffixes, and the maximum value for an inode limit is 2 GB.
Warning: Setting a quota does not update the database, because the refresh takes too much time. If you want to see the result immediately with the lsquota command, invoke it using the -r option (lsquota -r).
Example 10-15 shows the command usage and output for the setquota CLI command. In the example, we set hard and soft limits for disk usage for the user eebenall from the domain STORAGE3 on the file system gpfs0.
Example 10-15 Command usage and output of CLI command setquota
[Furby.storage.tucson.ibm.com]$ setquota --help
usage: setquota device [-c <cluster name or id>] [-g <arg>] [-h <arg>] [-H <arg>] [-j <arg>] [-S <arg>] [-s <arg>] [-u <arg>]
 device                              The mount point or device of the filesystem.
 -c,--cluster <cluster name or id>   define cluster
 -g,--group <arg>                    name of the group
 -h,--hard <arg>                     hardlimit of the disk usage in bytes, KB, MB, GB, TB or PB
 -H,--hardinode <arg>                hardlimit of the inodes in bytes, KB or MB
 -j,--fileset <arg>                  name of the fileset
 -S <arg>                            softlimit of the inodes in bytes, KB or MB
 -s <arg>                            softlimit of the disk usage in bytes, KB, MB, GB, TB or PB
 -u <arg>                            name of the user
accepted postfixes: 'k' : kiloByte, 'm' : MegaByte, 'g' : GigaByte, 't' : TeraByte, 'p' : PetaByte

[Furby.storage.tucson.ibm.com]$ setquota gpfs0 -u STORAGE3\\eebenall -h 400g -s 200g
EFSSG0040I The quota has been successfully set.
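The same command accepts the -g and -j options listed in the help output to set quotas for a group or a fileset. The following is a minimal, hypothetical sketch (the group STORAGE3\sales is an assumed example; newfileset is the fileset used later in this chapter):
[Furby.storage.tucson.ibm.com]$ setquota gpfs0 -g STORAGE3\\sales -h 1t -s 800g
[Furby.storage.tucson.ibm.com]$ setquota gpfs0 -j newfileset -h 500g -s 400g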
Using the CLI: You can list the existing filesets for a file system using the lsfset CLI command, as shown in the following usage and output:
[Furby.storage.tucson.ibm.com]$ lsfset --help
usage: lsfset device [-c <cluster name or id>] [-r] [-Y]
 device                              The device name of the file system to contain the fileset. File system names need not be fully-qualified.
 -c,--cluster <cluster name or id>   define cluster
 -r,--refresh                        refresh list
 -Y                                  format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsfset gpfs0
ID Name       Status   Path       CreationTime     Comment                Timestamp
0  root       Linked   /ibm/gpfs0 4/21/10 4:39 PM  root fileset           4/26/10 5:10 PM
1  newfileset Unlinked -          4/26/10 10:33 AM this is a test fileset 4/26/10 5:10 PM
2. Create Filesets: Using the GUI: You can create a fileset by clicking the Create a Fileset button on the main Filesets page. This opens a new window asking for the fileset details, such as a Name and an optional Comment. Click OK when done, and the task creates the fileset. The newly created fileset is displayed in the table of all filesets; you can click it to see its details. At this point, the fileset is not linked to any directory and cannot be used to store data. You need to link the fileset, similar to mounting a file system, before using it to store data. Figure 10-70 shows the dialog box for creating a fileset.
The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); in that case, check the logs, correct the problem, and retry. See Figure 10-71. The new fileset will be successfully created. Click the Close button to close the window.
Using the CLI: You can create a fileset using the mkfset CLI command. This command constructs a new fileset with the specified name. The new fileset is empty except for a root directory, and does not appear in the directory namespace until the linkfset command is issued to link it. The command usage and output are shown in Example 10-17. In the example, we create a new fileset called newfileset in the gpfs0 file system. This fileset is not yet linked, so the Path column shows no value. We can verify that the fileset was created successfully by running the lsfset command; the example also shows its output.
Example 10-17 Command usage and output for CLI command mkfset and lsfset
[Furby.storage.tucson.ibm.com]$ mkfset --help usage: mkfset device filesetName [-c <cluster name or id>] [-t <comment>] device The device name of the file system to contain the new fileset. File system names need not be fully-qualified.
filesetName                         Specifies the name of the newly created fileset.
 -c,--cluster <cluster name or id>   define cluster
 -t <comment>                        comment
[Furby.storage.tucson.ibm.com]$ mkfset gpfs0 newfileset -t This is a new Fileset
EFSSG0070I Fileset newfileset created successfully!
[Furby.storage.tucson.ibm.com]$ lsfset gpfs0
ID Name       Status   Path       CreationTime    Comment               Timestamp
0  root       Linked   /ibm/gpfs0 3/18/10 5:54 PM root fileset          5/5/10 2:06 AM
1  newfileset Unlinked -          5/5/10 2:06 AM  this is a new fileset 5/5/10 2:06 AM

3. Link file sets: When file sets are linked, a junction is created. The junction is a special directory entry, much like a POSIX hard link, that connects a name in a directory of one file set (the parent) to the root directory of a child file set. From the user's viewpoint, a junction always appears as if it were a directory, but the user is not allowed to issue the unlink or rmdir commands on a junction. Instead, the unlinkfset command must be used to remove a junction. As a prerequisite, the file system must be mounted and the junction path must be under the mount point of the file system. Using the GUI: When you create a fileset, it is not linked by default. You need to manually link it to a directory that does not yet exist. In the GUI, when you click the newly created fileset in the table, the lower section of the Filesets panel displays information about that fileset. In our example, we have created a new fileset called newfileset, which is not yet linked. The lower section displays details such as Name, Status, and more. If the fileset is not yet linked, the Link button is enabled and you can click it. A new window opens asking for the path. Click OK when done, and the fileset will be linked to this directory. See Figure 10-72.
Figure 10-72 Details of the fileset created is seen. The fileset is currently not linked
In case the fileset is already linked, the Unlink button will be enabled and the text box for the path and the Link button will be disabled.
In our example, we now link the file set to a path /ibm/gpfs0/redbook. The file set newfileset is now linked to this path. See Figure 10-73, which shows the dialog box that opens to enter the path to link the file set.
The task bar for the progress of the task appears. Click Close when the task is completed successfully. The details for the file set are shown in Figure 10-74.
Using the CLI: You can link the file set using the CLI linkfset command. The command will link the file set to the directory specified. This directory is the junctionPath in the command. In the example, we also run lsfset to confirm the file set is linked. See Example 10-18. The fileset used is newfileset created on filesystem gpfs0.
Example 10-18 Linking fileset using CLI command linkfset. lsfset verifies the link
[Furby.storage.tucson.ibm.com]$ linkfset --help usage: linkfset device filesetName [junctionPath] [-c <cluster name or id>] device The device name of the file system to contain the new fileset. File system names need not be fully-qualified. filesetName Specifies the name of the fileset for identification. junctionPath Specifies the name of the junction. The name must not refer to an existing file system object. -c,--cluster <cluster name or id> define cluster [Furby.storage.tucson.ibm.com]$ linkfset gpfs0 newfileset /ibm/gpfs0/redbook EFSSG0078I Fileset newfileset successfully linked! [Furby.storage.tucson.ibm.com]$ [[email protected] ~]# lsfset gpfs0
ID Name       Status Path               CreationTime    Comment               Timestamp
0  root       Linked /ibm/gpfs0         3/18/10 5:54 PM root fileset          5/5/10 3:10 AM
1  newfileset Linked /ibm/gpfs0/redbook 5/5/10 2:06 AM  this is a new fileset 5/5/10 3:10 AM

4. Unlink file sets: Using the GUI: You can unlink a file set by clicking the Unlink button. From the table that lists all the file sets, click the file set you want to unlink. The file set details are then displayed below the table, including the Unlink button. See Figure 10-74, which displays the details of the file set and the Unlink button. When you click this button, a new window opens asking for confirmation. See Figure 10-75.
Click OK to confirm. The task progress bar appears; click Close when the task completes successfully. The fileset will be successfully unlinked. Using the CLI: You can unlink the fileset using the unlinkfset CLI command. The command unlinks a linked fileset; the specified fileset must exist in the specified file system. See the command usage and output in Example 10-19. The example also shows the output of the lsfset command confirming that the fileset was unlinked. In the example, the fileset used is newfileset, created on file system gpfs0.
Example 10-19 Command usage and output for unlinking file set using CLI command unlinkfset and lsfset to verify
[Furby.storage.tucson.ibm.com]$ unlinkfset --help
usage: unlinkfset device filesetName [-c <cluster name or id>] [-f]
 device                              The device name of the file system to contain the new fileset. File system names need not be fully-qualified.
 filesetName                         Specifies the name of the fileset for identification.
 -c,--cluster <cluster name or id>   define cluster
 -f                                  force
[Furby.storage.tucson.ibm.com]$ unlinkfset gpfs0 newfileset
EFSSG0075I Fileset newfileset unlinked successfully!
[Furby.storage.tucson.ibm.com]$ lsfset gpfs0
ID Name       Status   Path       CreationTime    Comment               Timestamp
0  root       Linked   /ibm/gpfs0 3/18/10 5:54 PM root fileset          5/5/10 3:26 AM
1  newfileset Unlinked -          5/5/10 2:06 AM  this is a new fileset 5/5/10 3:26 AM
5. Remove Filesets: Using the GUI: You can remove a fileset by selecting the fileset you want to delete and clicking the Delete button. The task opens a new window asking for confirmation before deleting (see Figure 10-76).
Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); in that case, check the logs, correct the problem, and retry. See Figure 10-77. The fileset will be successfully removed. Click the Close button to close the window.
Using the CLI: You can delete a fileset using the CLI command rmfset. The command asks for confirmation and then on confirmation, deletes the file set specified. The rmfset command fails if the file set is currently linked into the namespace. By default, the rmfset command fails if the file set contains any contents except for an empty root directory. The root file set cannot be deleted. Example 10-20 shows the command usage and output for deleting a fileset. In this example the fileset used is newfileset created on filesystem gpfs0.
Example 10-20 rmfset command example
[Furby.storage.tucson.ibm.com]$ rmfset --help rmfset usage: rmfset device filesetName [-c <cluster name or id>] [-f] [--force] device The device name of the file system to contain the new fileset. File system names need not be fully-qualified. filesetName Specifies the name of the fileset for identification. -c,--cluster <cluster name or id> define cluster -f Forces the deletion of the file set. All file set contents are deleted. Any child file sets are first unlinked. --force enforce operation without calling back the user
Do you really want to perform the operation (yes/no - default no): yes EFSSG0073I Fileset newfileset removed successfully!
[Furby.storage.tucson.ibm.com]$ lsfset gpfs0
ID Name Status Path       CreationTime    Comment
0  root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset
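If the fileset still contains files, rmfset fails by default; the -f option documented in the help output deletes the contents as well. The following is a minimal, hypothetical sketch (the fileset must already be unlinked):
[Furby.storage.tucson.ibm.com]$ rmfset gpfs0 newfileset -f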
Figure 10-78 Panel to create a new export. Provide the sharename, pathname, owner and services
A new page opens that asks for protocol-related information. Each protocol is described here: 1. FTP: FTP does not take any parameters during its configuration. Proceed by clicking Next as shown in Figure 10-79.
Figure 10-79 Panel that shows that directory in path given is created but has default ACLs
Attention: The warning message here appears because the folder in the specified directory path does not exist. However, the directory is created by this operation. The warning informs you that the directory has been created with the default ACLs, which you need to modify if required.
2. NFS Export: NFS exports are accessed per client or host, not per user. Hence, you need to specify which hosts or clients can access the NFS export. On the new page that opens, add the client details in the Client settings section as follows:
Client Name: The name of the host that can access the export. You can specify individual host names, or * for all clients/hosts.
Read Only: Check this box if you want the clients to have read-only access; unchecking it gives the clients both read and write access.
Sync: Check this box if you want replies to requests only after the changes are committed to stable storage.
Root Squash: This option maps requests from uid/gid 0 to the anonymous uid/gid.
Click the Add Client button to add the client; it is then added to the table that displays all clients for the NFS export. Now click the Next button. See Figure 10-80.
Figure 10-80 NFS configuration panel. Add client and other properties
3. CIFS Export: The CIFS configuration parameters follow: Comment: This can be any user defined comment. Browsable: This check box, if checked, allows the export to be visible in the net view command and in the browse list. ACL / Access Rights: If checked, the export has only read-only access. See Figure 10-81.
Click the Next button to proceed. The next page is the Final page, which asks for confirmation before configuring the exports. See Figure 10-82.
Click the Finish button to confirm, click Back to go back and make changes, or click Cancel to cancel the creation of the exports and return to the main Exports page. After you have confirmed and clicked Finish, the task is carried out. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); in that case, check the logs, correct the problem, and retry. See Figure 10-83. The export will be successfully created. Click the Close button to close the window.
The newly created exports are added to the table on the main Exports page. Using the CLI: You can create an export using the mkexport CLI command. This command takes the share name and the directory path of the share you want to create. You can create FTP, CIFS, and NFS shares with this command; an FTP share does not need any parameters, while CIFS and NFS take additional parameters. Using the command, you can also create an inactive share: the share is fully created, but it cannot be used by end users until it is activated. By default, the share is active. You can also specify an owner, which gives that user the required ACLs to access the share. The command usage and output are shown in Example 10-21. In this example, FTP, CIFS, and NFS shares are created.
Example 10-21 Command usage and output for creating export using CLI command mkexport
[Furby.storage.tucson.ibm.com]$ mkexport --help usage: mkexport sharename path [-c <cluster name or id>] --cifs <CIFS options> | --ftp | --http | --nfs <NFS client definition> | --scp [--inactive][--owner <owner>] sharename Specifies the name of the newly created export. path Specifies the name of the path which will be share. -c,--cluster <cluster name or id> define cluster --cifs <CIFS options> enable CIFS protocol [using CIFS options] --ftp enable FTP protocol --http enable HTTP protocol --inactive share is inactive --nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)] --owner <owner> directory owner --scp
[Furby.storage.tucson.ibm.com]$ mkexport shared /ibm/gpfs0/shared --ftp --nfs "*(rw,no_root_squash,async)" --cifs browseable=yes,comment="IBM SONAS" --owner "SONASDM\eebanell"
You can also create an inactive share using the --inactive option in the mkexport command. You cannot do this from the GUI.
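A minimal, hypothetical sketch of creating such an inactive share, assuming an example directory /ibm/gpfs0/archive:
[Furby.storage.tucson.ibm.com]$ mkexport archive /ibm/gpfs0/archive --cifs browseable=no --inactive
The share can later be activated with the chexport command and its documented --active option.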
[Furby.storage.tucson.ibm.com]$ lsexport --help usage: lsexport [-c <cluster name or id>] [-r] [-v] [-Y] -c,--cluster <cluster name or id> define cluster -r,--refresh refresh list -v,--verbose extended list -Y format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsexport
Name      Path                 Protocol Active Timestamp
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 FTP      true   4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 HTTP     true   4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 NFS      true   4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 CIFS     true   4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 SCP      true   4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 FTP      true   4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 HTTP     true   4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 NFS      true   4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 CIFS     true   4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 SCP      true   4/28/10 11:03 AM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 FTP      true   4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 HTTP     true   4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 NFS      true   4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 CIFS     true   4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 SCP      true   4/28/10 18.38 PM
Figure 10-84 Panel to add new protocols to the export already existing
As you can see, the protocols that are already added are disabled. The share name and path are also disabled so that you cannot change them. Click the protocols that you want to add and click Next. The same procedure as creating the export is followed from this step on: provide the details for each protocol that you add (for example, the client definitions for NFS; FTP takes none). Click the Next button to continue until you finish. For detailed steps, see Creating exports on page 378. 2. Change Protocol Parameters: You can change parameters for both NFS and CIFS. On the main Exports page under the Files category, you can see the table that displays all the existing exports. If you click any export, the lower section of the same page shows the protocol information for that export; details are shown only for the CIFS and NFS protocols. a. NFS details: You can change the NFS details by adding more clients or removing existing ones. You can also edit an existing client and add more options, as seen in Figure 10-85.
Figure 10-85 Modifying NFS configuration by editing clients or adding new clients
You can click the edit link to change the options of a client, and you can remove a client using the remove link. You can also add a new client using the Add Client button. When you edit or add a client, a new window opens asking for the details of the client, as shown in Figure 10-86.
For a new client, you need to add the client name and check or uncheck read-only, root-squash, sync, and other options as required. For an existing client, the name field is disabled because the client already exists. To remove the client, click the Remove link. b. CIFS details: You can modify the CIFS export parameters by editing details such as the comment, the Browsable option, and the ACLs. You can also add, modify, or remove advanced options for a CIFS share using the Advanced Option Add, Modify, and Delete buttons. See Figure 10-87.
Using the CLI: You can modify an existing share or export using the chexport CLI command. Unlike the GUI, the same command can both add and remove protocols, each with its own options. In this section, adding new protocols is discussed. You can add protocols by specifying the --cifs, --ftp, and --nfs options with the corresponding protocol definitions. The command usage and output are shown in Example 10-23. In this example, the existing export was a CIFS export, and FTP and NFS are added using chexport.
Example 10-23 Command usage and output for adding new protocols to existing share
[Furby.storage.tucson.ibm.com]$ chexport --help usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp <arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>] [--nfsremove <NFS clients>] [--scp <arg>] sharename Specifies the name of the export for identification. --active share is active -c,--cluster <cluster name or id> define cluster --cifs <CIFS options> enable CIFS protocol [using CIFS options]
--ftp <arg>                         FTP
 --http <arg>                        HTTP
 --inactive                          share is inactive
 --nfs <NFS client definition>       enable NFS protocol [using clients(NFSoption)]
 --nfsadd <NFS clients>              add NFS clients
 --nfsremove <NFS clients>           remove NFS clients
 --scp <arg>                         SCP
[Furby.storage.tucson.ibm.com]$ chexport shared --ftp --nfs "*(rw,no_root_squash,async)"
EFSSG0022I Protocol FTP is configured for share shared.
EFSSG0034I NFS Export shared is configured, added client(s): *, removed client(s): None.

You can add or remove NFS clients, or modify CIFS options, using the [--nfs <NFS client definition>], [--nfsadd <NFS clients>], [--nfsremove <NFS clients>], and [--cifs <CIFS options>] options. In Example 10-24, a new client is added to the NFS export.
Example 10-24 Command output to add new NFS clients to existing NFS share
[Furby.storage.tucson.ibm.com]$ chexport shared --nfsadd "9.1.2.3(rw,no_root_squash,async)" EFSSG0034I NFS Export shared is configured, added client(s): 9.1.2.3, removed client(s): None.
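The --nfsremove option documented in the help output works the same way in reverse; a minimal, hypothetical sketch that removes the client just added:
[Furby.storage.tucson.ibm.com]$ chexport shared --nfsremove "9.1.2.3"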
When you click the OK button on that window, the task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); in that case, check the logs, correct the problem, and retry. See Figure 10-89. Click the Close button to close the window. The export is successfully modified.
Using the CLI: You can remove a protocol from an existing share using the chexport command with the value off for that protocol. The command usage and output are shown in Example 10-25. In this example, the existing export is configured for CIFS, FTP, and NFS; the command removes FTP and NFS.
Example 10-25 Command usage and output to remove protocols from existing share
[Furby.storage.tucson.ibm.com]$ chexport --help usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp <arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>] [--nfsremove <NFS clients>] [--scp <arg>] sharename Specifies the name of the export for identification. --active share is active -c,--cluster <cluster name or id> define cluster --cifs <CIFS options> enable CIFS protocol [using CIFS options] --ftp <arg> FTP --http <arg> HTTP --inactive share is inactive --nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)] --nfsadd <NFS clients> add NFS clients --nfsremove <NFS clients> remove NFS clients --scp <arg> SCP [Furby.storage.tucson.ibm.com]$ chexport shared --ftp off --nfs off EFSSG0023I Protocol FTP is removed from share shared. EFSSG0023I Protocol NFS is removed from share shared.
Click the OK button to remove the export or share. After it has been removed, it no longer appears in the table of existing exports. Using the CLI: You can remove an existing export using the rmexport CLI command. The command asks for your confirmation; after you confirm, it removes all the configuration details of the export from all nodes. See the command usage and output in Example 10-28.
Example 10-28 Command usage and output to remove an existing export. [Furby.storage.tucson.ibm.com]$ rmexport --help usage: rmexport sharename [-c <cluster name or id>] [--force] sharename Specifies the name of the export for identification. -c,--cluster <cluster name or id> define cluster --force enforce operation without calling back the user
[Furby.storage.tucson.ibm.com]$ rmexport shared Do you really want to perform the operation (yes/no - default no): yes EFSSG0021I The export shared has been successfully removed.
CIFS
A CIFS export needs to be mounted before it can be accessed. A CIFS share can be accessed from both Windows and UNIX machines. 1. Accessing CIFS using Windows: To mount a CIFS share from Windows, right-click My Computer and click Map a Network Drive as shown in Figure 10-91.
A new window opens that asks you to enter the Drive and path details. Choose a drive letter from the drop-down list. Enter the path for the share you want to access in the following format: \\cluster_name\sharename where cluster_name is the name of the cluster you want to access and sharename is the name of the share that you want to mount.
In our example, as seen in Figure 10-92, we specify the cluster_name as the IP address 9.11.137.219, and the sharename is shared. We mount the share on the X drive.
Click the different user name link on the previous window and enter the Windows user name and password. This user must have access or ACLs set to access this share. In our example, the user is: STORAGE3\\eebenall belonging to the domain STORAGE3. See Figure 10-93.
Figure 10-93 Adding user name and Password to access the share
Click Finish. The share should be mounted successfully. You can then access the share by accessing My Computer and the X drive which you just mounted.
Double-click the Drive and you will be able to see the contents of the share as shown in Figure 10-94.
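As an alternative to the Map a Network Drive wizard, the same mapping can be done from a Windows command prompt with the net use command. The following is a minimal sketch, reusing the cluster IP address, share, and user from our example (illustrative only, not taken from the lab screens):
C:\> net use X: \\9.11.137.219\shared /user:STORAGE3\eebenall
Windows then prompts for the password of the specified user.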
2. Accessing CIFS using UNIX: Mount the CIFS share using the mount.cifs command as shown in Figure 10-95. In our example, we use the client sonaspb44, which is a Linux client. We create a directory, cifs_export, in the /mnt directory, where we mount the share. The cluster is Furby.storage.tucson.ibm.com and the share is shared. The user used for access is STORAGE3\\eebenall, belonging to the domain STORAGE3. See Figure 10-95.
Figure 10-95 Command to mount and access the data from UNIX
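Because the figure is not reproduced here, the following is a minimal sketch of such a mount from a Linux client, assuming the directory /mnt/cifs_export and the cluster, share, and user from our example:
# mkdir -p /mnt/cifs_export
# mount -t cifs //Furby.storage.tucson.ibm.com/shared /mnt/cifs_export -o user=eebenall,domain=STORAGE3
The mount command prompts for the password of the CIFS user.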
NFS
NFS shares also need to be mounted before the data can be accessed. Next, we show how to mount an NFS export on a UNIX client. In our example, we use the Linux client sonaspb44 and have created a directory, nfs_export, in the /mnt directory, where we mount the NFS export. The cluster is Furby.storage.tucson.ibm.com and the share is shared. See Figure 10-96.
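Because the figure is not reproduced here, the following is a minimal sketch of such an NFS mount, assuming the export path /ibm/gpfs0/shared from the earlier mkexport example and the directory /mnt/nfs_export:
# mkdir -p /mnt/nfs_export
# mount -t nfs Furby.storage.tucson.ibm.com:/ibm/gpfs0/shared /mnt/nfs_export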
FTP
FTP shares can be accessed from both Windows and UNIX. Use the ftp command to access the export; you can also use external FTP client applications on Windows to access the share. Next, we explain access from both Windows and UNIX. 1. Accessing FTP from Windows: You can use any FTP client to access data from the FTP export; here we use the Windows command prompt. In our example, the cluster is Furby.storage.tucson.ibm.com and the share is shared. See Figure 10-97. When you run ftp, you are prompted to enter the user and password. In this example, the user is STORAGE3\\eebenall, belonging to the domain STORAGE3. See Figure 10-97. You then need to run cd at the FTP prompt to the share name that you want to access. As shown next, we run ftp> cd shared to access the FTP export, shared.
2. Accessing FTP from UNIX: You can access the FTP data by running the ftp command from the UNIX client. In our example, the cluster is Furby.storage.tucson.ibm.com and the share is shared. We use the Linux client sonaspb44. When you run ftp, you are prompted to enter the user and password. In this example, the user is STORAGE3\\eebenall, belonging to the domain STORAGE3. See Figure 10-98. You then need to run cd at the FTP prompt to the share name that you want to access. As shown next, we run ftp> cd shared to access the FTP export, shared.
Figure 10-98 Accessing the FTP share from the Linux Client
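A minimal sketch of such an FTP session, reusing the cluster, user, and share from our example (illustrative only; prompts vary by client):
$ ftp Furby.storage.tucson.ibm.com
Name: STORAGE3\eebenall
Password: ********
ftp> cd shared
ftp> ls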
Using the CLI: You can list the disks in the cluster using the lsdisk CLI command. This command lists the existing disks along with information such as the file system each is attached to, the failure group, the storage pool, the type of disk, and more. The command usage and output are shown in Example 10-29.
Example 10-29 Command usage and help to list the disks in the cluster
[Furby.storage.tucson.ibm.com]$ lsdisk --help
usage: lsdisk [-c <cluster name or id>] [-d <arg>] [-r] [-v] [-Y]
 -c,--cluster <cluster name or id>   define cluster
 -d,--device <arg>                   define device
 -r,--refresh                        refresh list
 -v,--verbose                        extra columns
 -Y                                  format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsdisk
Name     File system Failure group Type            Availability Timestamp
gpfs1nsd gpfs0       4004          dataAndMetadata up           4/28/10 3:03 AM
gpfs2nsd gpfs0       4004          dataAndMetadata up           4/28/10 3:03 AM
gpfs3nsd gpfs0       4004          dataAndMetadata up           4/28/10 3:03 AM
gpfs4nsd gpfs1       4004          dataAndMetadata up           4/28/10 3:03 AM
gpfs5nsd             4004                                       4/28/10 4:42 AM
Suspending disks
Using the GUI: You can suspend disks using the Suspend button. Select the disk you want to suspend and click the Suspend button. The operation opens a new window asking for your confirmation before suspending the disk. See Figure 10-99.
Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); in that case, check the logs, correct the problem, and retry. See Figure 10-100. The disk will be successfully suspended. Click the Close button to close the window.
When suspended, the disk appears in the table with status Suspended as shown in Figure 10-101.
Resuming disks
Using the GUI: You can Resume disks using the Resume button. Select the suspended disk you want to resume and click the Resume button. The operation opens a new window which asks for confirmation. See Figure 10-102.
Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); in that case, check the logs, correct the problem, and retry. See Figure 10-103. The disk will be successfully resumed. Click the Close button to close the window.
The disk that was previously suspended will now have the status ready, as shown in Figure 10-104 on page 397.
Figure 10-104 Panel shows that the disk has been successfully resumed
Using the CLI: You can change the properties of a disk using the chdisk CLI command:
[Furby.storage.tucson.ibm.com]$ chdisk --help
usage: chdisk disks [-c <cluster name or id>] [--failuregroup <failuregroup>] [--pool <pool>] [--usagetype <usagetype>]
 disks                               The name of the device
 -c,--cluster <cluster name or id>   define cluster
 --failuregroup <failuregroup>       failure group
 --pool <pool>                       pool name
 --usagetype <usagetype>             usage type
Each of the parameters that can be changed is explained in detail:
1. Failure Group: You can change the failure group of a disk by using the --failuregroup option of the chdisk command.
2. Storage Pool: You can change the storage pool of a disk by using the --pool option of the chdisk command.
3. Usage Type: You can change the usage type of a disk by using the --usagetype option of the chdisk command.
In Example 10-31 we change each of these parameters for the disk array1_sata_60001ff0732f85f8c0b000b. The example also shows the state of the disk before the change; the disk whose information is changed is shown in bold.
Example 10-31 Command output for CLI command lsdisk and using chdisk to change failure group of disk

[Furby.storage.tucson.ibm.com]$ lsdisk
Name                                File system Failure group Type            Pool   Status Availability Timestamp
array0_sata_60001ff0732f8548c000000 gpfs0       1             dataAndMetadata system ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8568c020002 gpfs0       1             dataAndMetadata system ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8588c040004 gpfs0       1             dataAndMetadata system ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85a8c060006 gpfs0       1             dataAndMetadata system ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8558c010001 gpfs0       2             dataAndMetadata system ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8578c030003 gpfs0       2             dataAndMetadata system ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8598c050005 gpfs0       2             dataAndMetadata system ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f85d8c090009 gpfs0       2             dataAndMetadata system ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85e8c0a000a tms0        1             dataAndMetadata system ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8608c0f000c tms0        2             dataAndMetadata system ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85c8c080008             1             dataAndMetadata system ready               4/23/10 10:00 AM
array1_sata_60001ff0732f85f8c0b000b             2             dataAndMetadata system ready               4/24/10 3:05 AM

[Furby.storage.tucson.ibm.com]$ chdisk array1_sata_60001ff0732f85f8c0b000b --failuregroup 200 --pool newpool --usagetype descOnly
EFSSG0122I The disk(s) are changed successfully!

[Furby.storage.tucson.ibm.com]$ lsdisk
Name                                File system Failure group Type            Pool    Status Availability Timestamp
array0_sata_60001ff0732f8548c000000 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8568c020002 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8588c040004 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85a8c060006 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8558c010001 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8578c030003 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8598c050005 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f85d8c090009 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85e8c0a000a tms0        1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8608c0f000c tms0        2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85c8c080008             1             dataAndMetadata system  ready               4/23/10 10:00 AM
array1_sata_60001ff0732f85f8c0b000b             200           descOnly        newpool ready               4/28/10 10:14 PM
[Furby.storage.tucson.ibm.com]$ cli help
Known commands:
addcluster           Adds an existing cluster to the management.
addnode              Adds a new cluster node.
attachnw             Attach a given network to a given interface of a network group.
backupmanagementnode Backup the managament node
cfgad                configures AD server into the already installed CTDB/SMABA cluster. Previously configured authentication server settings will be erased
cfgbackupfs          Configure file system to TSM server association
cfgcluster           Creates the initial cluster configuration
cfghsm               Configure HSM on each client facing node
cfgldap              configure LDAP server against an existing preconfigured cluster.
cfgnt4               configure NT4 server against an existing preconfigured cluster.
cfgsfu               Configures user mapping service for already configured AD
cfgtsmnode           Configure tsm node.
chavailnode Change an available node. chcurrnode Changes current node chdisk Change a disk. chexport Modifies the protocols and their settings of an existing export. chfs Changes a new filesystem. chfset Change a fileset. chkauth Check authentication settings of a cluster. chkpolicy validates placement rules or get details of management rules of a policy on a specified cluster for specified device chnw Change a Network Configuration for a sub-net and assign multiple IP addresses and routes chnwgroup Adds or removes nodes to/from a given network group. chservice Change the configuration of a protocol service chuser Modifies settings of an existing user. confrepl Configure asynchronous replication. dblservice stop services for an existing preconfigured server. detachnw Detach a given network from a given interface of a network group. eblservice start services for an existing preconfigured server. enablelicense Enable the license agreement flag initnode Shutdown or reboot a node linkfset Links a fileset lsauth List authentication settings of a cluster. lsavailnode List available nodes. lsbackup List information about backup runs lsbackupfs List file system to tsm server and backup node associations lscfg Displays the current configuration data for a GPFS cluster. lscluster Lists the information of all managed clusters. lscurrnode List current nodes. lsdisk Lists all discs. lsexport Lists all exports. lsfs Lists all filesystems on a given device in a cluster. lsfset Lists all filesets for a given device in a cluster. lshist Lists system utilization values lshsm Lists configured hsm file systems cluster lslog Lists all log entries for a cluster. lsnode Lists all Nodes. lsnw List all public network configurations for the current cluster lsnwdns List all DNS configurations for the current cluster lsnwgroup List all network group configurations for the current cluster lsnwinterface List all network interfaces lsnwnatgateway List all NAT gateway configurations for the current cluster lsnwntp List all NTP configurations for the current cluster lspolicy Lists all policies lspool Lists all pools. lsquota Lists all quotas. lsrepl List result of the asynchronous replications. lsservice Lists services lssnapshot Lists all snapshots. lstask Lists all (background) tasks for the management node. lstsmnode Lists defined tsm nodes in the cluster lsuser Lists all users of this mangement node. mkavailnode Add an available node to the database. mkcurrnode Makes current node mkexport Creates a new export using one or more protocols. mkfs Creates a new filesystem.
mkfset Creates a fileset mknw Create a new Network Configuration for a sub-net and assign multiple IP addresses and routes mknwbond Makes a network bond from slave interfaces mknwgroup Create a group of nodes to which a network configuration can be attached. See also the commands mknw and attachnw. mknwnatgateway Makes a CTDB NAT gateway mkpolicy Makes a new policy into database mkpolicyrule Appends a rule to already existing policy mkservice Configure services mksnapshot creates a snapshot from a filesystem mktask Schedule a prefedined task for mkuser Creates a new user for this management node. mountfs Mount a filesystem. querybackup Query backup summary restripefs Rebalances or restores the replication of all files in a file system. resumenode Resumes an interface node. rmbackupfs Remove file system to TSM server association rmcluster Removes the cluster from the management (will not delete cluster). rmexport Removes the given export. rmfs Removes the given filesystem. rmfset Removes a fileset rmlog Removes all log entries from database rmnode Removes a node from the cluster. rmnw Remove an existing public network configuration rmnwbond Deletes a regular bond interface. rmnwgroup Remove an existing group of nodes. A maybe attached public network configuration must be detached in advance rmnwnatgateway Unconfigures a CTDB NAT gateway. rmpolicy Removes a policy and all the rules belonging to it rmpolicyrule Removes one or more rules from given policy rmsnapshot Removes a filesystem snapshot rmtask Removes the given scheduled task. rmtsmnode Remove TSM server stanza for node rmuser Removes the user from the management node. rpldisk Replaces current NSD of a filesystem with a free NSD runpolicy Migrates/deletes already existing files on the GPFS file system based on the rules in policy provided setnwdns Sets nameservers setnwntp Sets NTP servers setpolicy sets placement policy rules of a given policy on cluster passed by user. setquota Sets the quota settings. showbackuperrors Shows errors of a backup session showbackuplog Shows the log of the recent backup session. showrestoreerrors Shows errors of a restore session showrestorelog Shows the log of the recent restore session. startbackup Start backup process startreconcile Start reconcile process startrepl Start asynchronous replication. startrestore Start restore process stopbackup Stops a running TSM backup session stoprepl Stop asynchronous replication. stoprestore Stops a running TSM restore session suspendnode Suspends an interface node. unlinkfset Unlink a fileset.
unmountfs            Unmount a filesystem.
Plus the UNIX commands: grep, initnode, man, more, sed, startmgtsrv, stopmgtsrv, sort, cut, head, less, tail, uniq
For additional help on a specific command, use 'man command'. To get more detail on any of the commands, the administrator can check the man page by running man <command_name> or <command_name> --help as shown; in the example, the mkuser command is used.
As mentioned previously, the CLI user currently has no roles defined: a CLI user can run all the administrative commands to manage the cluster, storage, file systems, and exports. The administrator can also look at the logs and utilization charts for information about the health of the cluster and its components.
2. SONAS GUI user: The SONAS GUI user must be added into the GUI by the root user. After an installation, the root user is automatically added into the GUI. Log in with the root user and password, then click the Console User Authority link under the Settings category of the GUI. This opens a page with a table that lists all the GUI users who can access the GUI and their roles. See point 2 under Settings on page 344. In that panel you can add a new GUI user or remove one, using the Add and Remove buttons respectively, as explained next.
Add user: Add a user by clicking the Add button. A new page asking for the user details opens. Type in the user name; this user must be an existing CLI user, already created with the mkuser command. You also need to specify a role for the user. The available roles are:
Administrator: This user has all administrator rights and can perform all operations, like the CLI user.
Storage Administrator: This user has rights to manage the storage; all storage tasks can be performed by this user.
Operator: The operator has only read access. This user can view the logs, health status, and overall topology of the cluster.
System administrator: This user can administer the system as a whole.
Click OK when done. Figure 10-105 shows the panel to add a new user.
Figure 10-105 Panel to add user and user roles to the Users
After a user is added, the table will display the newly added user as shown in Figure 10-106.
Remove User: Select the user to delete and click the Remove button. The user will be successfully deleted from the GUI. CLI: Deleting a user from the GUI does not delete a user from the CLI. The CLI user still exists. Logout: A user selected can be logged out using the Logout button. Depending on the Role given to the user, the GUI user will have different access permissions and can perform different operations.
Authentication is the process of verifying the identity of the user: users confirm that they are indeed who they claim to be. This is typically accomplished by verifying the user ID and password against the authentication server. Authorization is the process of determining what the users are allowed to access. A user might have permissions to access certain files but not others. This is typically done with ACLs.
The file system ACLs supported in the current SONAS release are GPFS ACLs, which are NFSv4 ACLs. The directories and exports need to be given the right ACLs for users to be able to access them. Currently you can give the owner the rights or permissions to an export by specifying the owner option while creating it, from either the GUI or the CLI. If you want to give other users access, you need to modify the ACL in GPFS for the directory or export using the GPFS mmeditacl command. You can view ACLs by using the GPFS mmgetacl command.
ACLs: Right now, you need to use the GPFS command to view or edit ACLs. This command requires root access. Example 10-33 shows how you can provide ACLs to a directory or export.
Example 10-33 Viewing current ACLs for an export using GPFS command mmgetacl
export EDITOR=/bin/vi
$ mmgetacl /ibm/gpfs0/Sales
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rwxc:allow
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:group@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Example 10-34 adds an ACL entry for another user. In the example, we give read-write access to the Windows AD user David for an already existing export named Sales in the /ibm/gpfs0 file system.
Example 10-34 Adding ACL for giving user DAVID access to the export
$ mmeditacl /ibm/gpfs0/Sales
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rwxc:allow
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
user:STORAGE3\david:rwxc:allow
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:group@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
Save the file; when you quit, answer yes when asked to confirm the ACLs. The new ACLs are then written for the user and the export.
Depending on which users you want to give access to, you can add them to the ACL. You can also give group access in a similar way and then add the users to the group, as sketched next.
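A group entry added with mmeditacl might look like the following sketch, assuming a hypothetical AD group STORAGE3\sales that should receive read and write access:
group:STORAGE3\sales:rwx-:allow
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED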
Using the CLI: You can stop the Management Service using the stopmgtsrv CLI command:
[Furby.storage.tucson.ibm.com]$ stopmgtsrv --help
usage: stopmgtsrv
stop the management service
[Furby.storage.tucson.ibm.com]$ stopmgtsrv
EFSSG0008I Stop of management service initiated by root
2. Start Management Service: Using the GUI: You cannot start the Management Service using the GUI. Using the CLI: You can start the Management Service using the startmgtsrv CLI command. The command usage and output are shown in Example 10-36.
Example 10-36 Command usage and output for starting Management or CLI service
[Furby.storage.tucson.ibm.com]$ startmgtsrv --help
usage: startmgtsrv [-f | --force]
start the management service
 -f, --force   restart gui if already running
[Furby.storage.tucson.ibm.com]$ startmgtsrv
EFSSG0007I Start of management service initiated by root
After the service has started, you can verify it by running the cli help command on the CLI or by accessing the GUI. The cli help command should display all the commands that are available to the CLI user, and the GUI should prompt for a user ID and password.
If you are unable to access the GUI or the CLI commands, restart the Management Service using the startmgtsrv command with the --force option. The command output is shown in Example 10-37.
Example 10-37 CLI command startmgtsrv used to forcefully restart the CLI and Management Service
[Furby.storage.tucson.ibm.com]$ startmgtsrv --force EFSSG0008I Stop of management service initiated by root EFSSG0007I Start of management service initiated by root
Using the CLI: You can list the status of the configured services using the lsservice CLI command:
[Furby.storage.tucson.ibm.com]$ lsservice --help
usage: lsservice [-c <cluster name or id>] [-r] [-Y]
 -c,--cluster <cluster name or id>   define cluster
 -r,--refresh                        refresh list
 -Y                                  format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsservice
Name Description   Is active Is configured
FTP  FTP protocol  yes       yes
HTTP HTTP protocol yes       yes
NFS  NFS protocol  yes       yes
CIFS CIFS protocol yes       yes
SCP  SCP protocol  yes       yes
In the example, you can see that all the services are configured. This means that the configuration files for the services are up to date on each node of the cluster. Under the Is Active column, you can see whether a service is active or inactive. Active means that the service is up and running: exports can be accessed using that service, and users or clients can access the data exported over that protocol. Inactive means that the service is not running, so all data connections over that protocol will break.
2. Enable Service: Using the GUI: You cannot enable services using the GUI. Using the CLI: You can enable services using the eblservice CLI command. The command usage and output are shown in Example 10-39. To enable services, you must pass the cluster name or cluster ID as a mandatory parameter, along with a comma-separated list of the services you want to enable; to enable all services, pass all. The command asks for confirmation; you can use the --force option to skip the confirmation. In our example, the FTP and NFS services are disabled, and we enable them using the eblservice command.
Example 10-39 Example showing usage and command output for CLI command eblservice
[Furby.storage.tucson.ibm.com]$ eblservice --help usage: eblservice -c <cluster name or id> [--force] [-s <services>] -c,--cluster <cluster name or id> define cluster --force enforce operation without prompting for confirmation -s,--services <services> services
[Furby.storage.tucson.ibm.com]$ lsservice
Name Description   Is active Is configured
FTP  FTP protocol  no        yes
HTTP HTTP protocol yes       yes
NFS  NFS protocol  no        yes
CIFS CIFS protocol yes       yes
SCP  SCP protocol  yes       yes
[Furby.storage.tucson.ibm.com]$ eblservice -c st002.vsofs1.com -s ftp,nfs --force
[Furby.storage.tucson.ibm.com]$ lsservice
Name Description   Is active Is configured
FTP  FTP protocol  yes       yes
HTTP HTTP protocol yes       yes
NFS  NFS protocol  yes       yes
CIFS CIFS protocol yes       yes
SCP  SCP protocol  yes       yes
3. Disable service: Using the GUI: You cannot disable services using the GUI. Using the CLI: You can disable services using the dblservice CLI command. The command usage and output are shown in Example 10-40. To disable services, you pass a comma-separated list of the services you want to disable; to disable all services, pass all. The command asks for your confirmation, which you can skip with the --force option. You can verify the result using the lsservice command, as shown in Example 10-40. CIFS and SCP must always be running: CIFS is required for CTDB to be healthy, and SCP is the SSH service, which cannot be stopped because all internal communication between the nodes is done over SSH. If you pass CIFS or SCP, they are not stopped and a warning message is issued; the other services are stopped.
Example 10-40 Usage for CLI command dblservice and output when disabling FTP only
[Furby.storage.tucson.ibm.com]$ dblservice --help
usage: dblservice [-c <cluster name or id>] [--force] -s <services>
 -c,--cluster <cluster name or id>   define cluster
 --force                             enforce operation without prompting for confirmation
 -s,--services <services>            services
[Furby.storage.tucson.ibm.com]$ dblservice -s ftp
Warning: Proceeding with this operation results in a temporary interruption of file services
Do you really want to perform the operation (yes/no - default no): yes
EFSSG0192I The FTP service is stopping!
EFSSG0194I The FTP service is stopped!

[Furby.storage.tucson.ibm.com]$ lsservice
Name  Description    Is active  Is configured
FTP   FTP protocol   no         yes
HTTP  HTTP protocol  yes        yes
NFS   NFS protocol   yes        yes
CIFS  CIFS protocol  yes        yes
SCP   SCP protocol   yes        yes

Example 10-41 shows disabling all the services with the --force option. You can also see the warning messages for CIFS and SCP in this case.
Example 10-41 Example where all services are disabled - CIFS and SCP show warning message
[Furby.storage.tucson.ibm.com]$ dblservice -s all --force
EFSSG0192I The NFS service is stopping!
EFSSG0192I The HTTP service is stopping!
EFSSG0193C Disable SCP services failed. Cause: Never stop scp/sshd service. We didn't stop scp/sshd service but other passed services were stopped.
EFSSG0192I The FTP service is stopping!
EFSSG0193C Disable CIFS services failed. Cause: Never stop cifs service. We didn't stop cifs service but other passed services were stopped.
EFSSG0109C Disable services failed on cluster st002.vsofs1.com. Cause: SCP : Never stop scp/sshd service. We didn't stop scp/sshd service but other passed services were stopped. CIFS : Never stop cifs service. We didn't stop cifs service but other passed services were stopped.
[Furby.storage.tucson.ibm.com]$ lsservice
Name  Description    Is active  Is configured
FTP   FTP protocol   no         yes
HTTP  HTTP protocol  no         yes
NFS   NFS protocol   no         yes
CIFS  CIFS protocol  yes        yes
SCP   SCP protocol   yes        yes
4. Change service configuration: Using the GUI: You can change the configuration of each configured service using the GUI. As seen in point 1.f under Clusters on page 320, a table lists the services. Each service name is a link; clicking it opens a new window that allows you to change the configuration parameters for that service.
FTP: A new window as in Figure 10-107 shows the various parameters that you can change for the FTP configuration. Click Apply when done. The new configuration data is written into the CTDB registry and also the FTP configuration files on each node.
HTTP: HTTP requires you to install an HTTP certificate. When you click the HTTP link, a window opens as shown in Figure 10-108. You can install an existing certificate or generate a new one.
Upload an Existing Certificate: You can upload an existing certificate, which is a .crt or .key file. Click the Upload Certificate button; a new window opens as shown in Figure 10-109 and asks for the path to the certificate. Click Browse, locate the certificate file, and click the Upload button to upload it. The window then closes. Finally, click the Install Certificate button shown in Figure 10-108.
Generate a New Certificate: To generate a new certificate, fill out all the text boxes shown in Figure 10-108 and click the Generate and Install certificate button. A new certificate is generated and installed.
NFS: NFS as of now does not have any configuration parameters to modify. CIFS: As shown in Figure 10-110, you can see the different parameters that you can change for CIFS. As you can see in the figure, you can change some common parameters and also some Advanced Options. You can Add, Modify, or Remove advanced parameters using the respective buttons in the panel. Click the Apply button when done. The configuration will be successfully written on all nodes.
SCP: When you select the SCP protocol by clicking its link, a new window opens. Figure 10-111 shows the different parameters you can modify for SCP service or protocol. SCP protocol also provides SFTP method for data access. You can allow or disallow SFTP by using the check box for it. Click Apply to apply the changes.
Using the CLI: At this time, you cannot change the configuration parameters from the SONAS CLI.
Select the Measurement duration from the drop-down menu. This list allows you to select the period of time for which you want to measure the utilization of the system; you can choose durations such as Daily, Weekly, Monthly, 6 monthly, and 18 monthly, as shown in Figure 10-113.
After you select the node whose utilization you want to check and the measurement duration, click the Generate Charts button and the chart is generated. The figures show two examples: Daily Memory Usage for the Management Node (Figure 10-114) and Weekly Disk I/O for Interface Node 2 (Figure 10-115).
Figure 10-115 Weekly Disk I/O Utilization charts for Interface Node 2
The previous examples only show some of the available options. You can also generate charts for all nodes or select nodes.
After you select the node and duration, click the Generate Charts button and the chart is generated. In our figures we have just a single file system; we generate charts for Daily File System Usage (Figure 10-117) and Weekly File System Usage (Figure 10-118).
Choose the warning level, error level, and recurrences you want to track, as shown in Figure 10-120.
When done, click OK. A new threshold will be added to the list. You need to configure the recipients in order to receive email notifications. Click the link Notification Settings under the SONAS Console settings in the left pane in the Management GUI.
Using the CLI: Tasks can be scheduled using the CLI command mktask. The command takes input values such as the cluster name and the second, minute, hour, and other time values for the task to run. There is also an optional --parameter option that is valid only for a CRON task; the GUI tasks currently do not have any parameters, and an error is returned to the caller if this option is specified for a GUI task. The parameter value is a space-separated list. The command usage and output for adding both GUI and CRON tasks are shown in Example 10-42. The following CRON tasks are available in SONAS:
MkSnapshotCron: The cron job expects two parameters in the following order:
  clusterName: The name of the cluster the file system belongs to
  filesystem: The file system description (for example, /gpfs/office)
StartReplCron: The cron job expects two parameters in the following order:
  source_path: The directory that shall be replicated
  target_path: The directory to which the data shall be copied
StartBackupTSM: The cron job expects one parameter:
  clusterName: The cluster of the file systems that must be backed up
StartReconcileHSM: The cron job expects three parameters in the following order:
  clusterName: The cluster of the file systems that must be backed up
  filesystem: The file system to be reconciled
  node: The node on which the file system is to be reconciled
BackupTDB: The cron job expects one parameter:
  target_path: The directory to which the backup shall be copied
For more information about how to add the parameters for these CRON tasks, refer to the man page for the mktask command. Example 10-42 shows adding the MkSnapshotCron task, which is a CRON task that takes two parameters, the cluster name and the file system name. In our example, the cluster name is Furby.storage.tucson.ibm.com and the file system is gpfs0. The second command in Example 10-42 adds a GUI task.
Example 10-42 Command usage and output in adding CRON and GUI tasks using CLI command mktask
[Furby.storage.tucson.ibm.com]$ mktask --help
usage: mktask name [-c <cluster name or id>] [--dayOfMonth <dayOfMonthdef>]
       [--dayOfWeek <dayOfWeekdef>] [--hour <hourdef>] [--minute <minutedef>]
       [--month <monthdef>] [-p <parameter>] [--second <seconddef>]
 name                                Specifies the name of the newly created task.
 -c,--cluster <cluster name or id>   define cluster
 --dayOfMonth <dayOfMonthdef>        define the scheduler option for the dayOfMonth
 --dayOfWeek <dayOfWeekdef>          define the scheduler option for the dayOfWeek
 --hour <hourdef>                    define the scheduler option for the hour
 --minute <minutedef>                define the scheduler option for the minute
 --month <monthdef>                  define the scheduler option for the month
 -p,--parameter <parameter>          define the parameter passed to the scheduled cron task
 --second <seconddef>                define the scheduler option for the second
[Furby.storage.tucson.ibm.com]$ mktask MkSnapshotCron --parameter "Furby.storage.tucson.ibm.com gpfs0" --minute 10 --hour 2 --dayOfMonth */3 EFSSG0019I The task MkSnapshotCron has been successfully created.
[Furby.storage.tucson.ibm.com]$ mktask FTP_REFRESH --minute 2 --hour 5 --second 40 EFSSG0019I The task FTP_REFRESH has been successfully created.
Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross (x); check the logs, correct the problem, and retry. See Figure 10-122. The task is removed successfully. Click the Close button to close the window.
Using the CLI: You can remove a task using the CLI command rmtask. This command deletes the task from the list of tasks scheduled by the system. An error is returned to the caller if a task that does not exist is specified. The command usage and output are shown in Example 10-43. In the first example we delete the CRON task MkSnapshotCron, and in the second example we delete the GUI task FTP_REFRESH.
[Furby.storage.tucson.ibm.com]$ rmtask --help usage: rmtask name [-c <cluster name or id>] name Specifies the name of the task for identification. -c,--cluster <cluster name or id> define cluster [Furby.storage.tucson.ibm.com]$ rmtask MkSnapshotCron EFSSG0021I The task MkSnapshotCron has been successfully removed. [Furby.storage.tucson.ibm.com]$ rmtask FTP_REFRESH EFSSG0021I The task FTP_REFRESH has been successfully removed.
Figure 10-123 Panel to modify a CRON task- the Schedule and Task parameter can be modified
Figure 10-124 shows the GUI task. As you can see, the Schedule for the task can be modified. Click Apply when done to apply the changes. In this example, we have considered modifying the GUI task FTP_REFRESH.
10.11.1 Topology
From the GUI, in the Health Summary section on the left panel, you can reach the Topology feature, which displays a graphical representation of the various SONAS architectural components. There you will find information about the Management and Interface Node status, as well as the public and data networks, Storage Pods, file systems, and exports.
Overview
The first view, shown in Figure 10-125, gives you a big picture of your system; for more details on a specific area, expand that area.
In the Topology view, all components of your SONAS system are described. When moving your cursor over the selected component, you will see a tooltip as shown in Figure 10-125. Then, for more information about one of these components, click the appropriate link.
Layer 1 shown in Figure 10-126 gives a short status overview of the main system components, and the view is updated every 5 seconds. You can click the Layer 1 icon to drill down to level 2 displays.
Figure 10-126 callouts: Component Status Summary (click to open the Level 3 Status view); Component Title; Component Icon; click to display the Level 2 or Level 3 Details view; click to display the Level 2 view.
Component Details
Layer 2, shown in Figure 10-127, displays details about the Interface Nodes and Storage building blocks and is updated every 5 seconds. Clicking a Layer 2 icon brings up the Layer 3 view.
Interface Nodes are displayed by their logical internal name (for example, 001 represents int001st001).
Layer 3, an example of which is shown in Figure 10-128, gives the deepest level of detail. Modal dialog windows are opened to display the details, which are updated every 30 seconds or can be refreshed manually by clicking the refresh icon.
All Interface, Management, and Storage Node details have the same tabs: Hardware, Operating System, Network, NAS Services, and Status.
Interface Node
For instance, if you need more information regarding the interface nodes because of the warning message in the previous figure, click the Interface Nodes (6) link in that figure and you will see information as shown in Figure 10-129.
The new window shows an overview of the Interface Node configuration of your SONAS storage solution. We can see here that the warning propagated to the global overview is not a warning for one particular node, but for all of them. To get more details for a given interface node, click the chosen node and you will see all related information, as described in Figure 10-130.
In the Operating System section, you will find details about the computer system, the operating system, and the local file system, as shown in Figure 10-131.
If you need information about the network status of that particular Interface Node, choose the Network section, where you will find information about all the network bonding interfaces configured on the selected Interface Node, as shown in Figure 10-132.
Similarly if you need input regarding the NAS services or the Interface Node status, then choose the appropriate tabs as described in Figure 10-133 and Figure 10-134.
The NAS Services section shows the status of all export protocols, such as CIFS, NFS, HTTP, FTP, and SCP, and of services such as CTDB and GPFS. The Status section gathers all the previous information with more details. The first three sections are static, containing only configuration information, whereas the last two are dynamic. The warning icon seen at a higher level, whether at the interface nodes level or the topology level, refers only to the Status section (NAS services issues are also included in the Status section), and more precisely to its first line showing the degraded level. After the issue is fixed, the warning icon disappears.
Management Node
Back in the Topology overview, if you are interested in Management Node information, click the Management Node link and you will see the same windows and hierarchy as described earlier for the Interface Nodes. The sections and tabs are the same, except that the NAS Services section is replaced by a Management section, as you can see in Figure 10-135.
Interface Network
From the Topology overview, you can also get information about the Interface Network by clicking the Interface Network link. There you will find information about the public IP addresses in use and the authentication method, as described in Figure 10-136 and Figure 10-137.
Data Network
Again from the Topology overview, if you need information about the data network, that is the InfiniBand network, click the Data Network link and you will see something similar to Figure 10-138. In the first tab, Network, you will find information about the state, IP address, and throughput of each InfiniBand connection, filtered by Interface, Management, and Storage Nodes in the left tabs. The second tab, Status, gathers information similar to the Status tab of each individual interface node in the Interface Node topology.
If you click the Storage Pod Icon, you will see another familiar window that enumerates all components of this Storage Pod, as described in Figure 10-140.
The first tab describes the storage components of the Storage Pod. In our case we have a single Storage Controller, but you can have up to two Storage Controllers and two Storage Expansion units. This tab shows storage details, more precisely controller details in our example. The Status tab shows the same kind of details you might see in the Status tab of the Interface Node in Figure 10-134 on page 426. We have shown information related to the storage part of the Storage Pod; if you are looking for information related to the Storage Nodes, there is a dedicated tab for each of the two Storage Nodes inside the Storage Pod. If you click a Storage Node name tab, you will find more detailed information, as shown in Figure 10-141 and Figure 10-142.
For these two Storage Nodes, you can find information similar to what we presented previously for the Interface Nodes. The only difference is the Storage tab, where you can find information about the SONAS file system, as shown in Figure 10-143.
Figure 10-143 SONAS File System information for each Storage Node
This Storage Building Block view, where you can find information about the Storage Pods used by your SONAS file systems, covers the last hardware component of the SONAS storage solution. From the overview window you can also find information related to the file systems and the exported shares.
File System
Indeed, from the overview window, you might request File System information by clicking the File system component. You will then see a window as shown in Figure 10-144.
This window shows typical file system information such as the device name, the mount point, the size, and the available space left. Each SONAS file system created results in one entry in this table.
Shares
As for the SONAS file systems, you can request information about the shares you created from those file systems. To get this information from the Topology overview, click the appropriate component and you will see details as shown in Figure 10-145.
In this window you can see the status, the name, and the directory associated with your share, and, more importantly, the protocols through which SONAS users can access the share. In our example the share is accessible by FTP, NFS, and CIFS. These last two components complete the Topology view of the Health Center. The following sections describe the System logs, Call Home, and SNMP features.
The Alert Log panel extracts informational, warning, and critical events from the syslog and displays them in a summarized view. As SONAS administrator, look at this log first when investigating problems. Each page displays around 50 log entries, one per event; events can be Info, Warning, or Critical messages, displayed in blue, yellow, and red respectively. You can filter the logs in the table by severity, time period, and source. The source of a log is the host on which the event occurred, as shown in Figure 10-147.
The System Log panel displays system log events that are generated by the SONAS Software, which includes management console messages, system utilization incidents, status changes and syslog events. Figure 10-146 on page 432 shows how the System log panel in the GUI looks.
The Denali method uses CIM providers, which are also used by the System Checkout method, and SNMP traps are also converted into CIM providers. All these methods provide input to the GUI Health Center event log described in the previous section. Depending on the severity of an issue, it can raise an Electronic Customer Care (ECC) Call Home. The Call Home feature is designed to start first with hardware events based on unique error codes. It is configured as part of the first time installation and is used to send hardware events to IBM support. Call Homes are based only on Denali and System Checkout errors; SNMP traps do not initiate a Call Home. The following machine models call home:
2851-SI1 Interface Nodes
2851-SM1 Management Nodes
2851-SS1 Storage Nodes
2851-DR1 Storage Controller
2851-I36 36 Port InfiniBand Switch
2851-I96 96 Port InfiniBand Switch
Note that there are no Call Homes against a 2851-DE1 Storage Expansion unit, because any errors from it call home against its parent 2851-DR1 Storage Controller unit. Similarly, any errors against the Ethernet switches call home against the 2851-SM1 Management Node. Figure 10-148 shows an example of Call Home, which initiates an Error ID-based Call Home using an 8-character hex value as defined in the RAS Error Code Mapping File.
Chapter 11. Migration overview
In this chapter we discuss how to migrate your existing file server or NAS filer to the SONAS system. Migration of data on file systems is more complex than migration of data on block devices, and there is no universal tool or method for file migration. We cover the following topics:
Migration of user authentication and ACLs
Migration of files and directories
Migration of CIFS shares and NFS exports
An NFS V4 ACL consists of a list of ACL entries. The GPFS representation of an NFS V4 ACL entry is three lines, due to the increased number of available permissions beyond the traditional rwxc.
The first line has several parts separated by colons (:). The first two parts identify the type (user or group) and the name of the user or group. The third part displays an rwxc translation of the permissions that appear on the subsequent two lines. The fourth part is the ACL type; NFS V4 provides both an allow and a deny type:
allow  Means to allow (or permit) those permissions that have been selected with an X.
deny   Means to not allow (or deny) those permissions that have been selected with an X.
The fifth, optional, and final part is a list of flags indicating inheritance. Valid flag values are:
FileInherit  Indicates that the ACL entry should be included in the initial ACL for files created in this directory.
DirInherit   Indicates that the ACL entry should be included in the initial ACL for subdirectories created in this directory (as well as the current directory).
InheritOnly  Indicates that the current ACL entry should NOT apply to the directory, but SHOULD be included in the initial ACL for objects created in this directory.
As in traditional ACLs, users and groups are identified by specifying the type and name. For example, group:staff or user:bin. NFS V4 provides for a set of special names that are not associated with a specific local UID or GID. These special names are identified with the keyword special followed by the NFS V4 name. These names are recognized by the fact that they end with the character @. For example, special:owner@ refers to the owner of the file, special:group@ the owning group, and special:everyone@ applies to all users.
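To make the three-line format concrete, the following is an illustrative entry shaped as just described: the special owner identity, the rwxc summary, the allow type, and two inheritance flags on the first line, followed by two lines of individual permissions marked with an X. The permission labels are taken from general GPFS documentation and may vary by release:

special:owner@:rwxc:allow:FileInherit:DirInherit
 (X)READ/LIST (X)WRITE/CREATE (X)APPEND/MKDIR (X)SYNCHRONIZE (X)READ_ACL  (X)READ_ATTR  (X)READ_NAMED
 (X)DELETE    (X)DELETE_CHILD (X)CHOWN        (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED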
NFS protocol
The Network File System (NFS) protocol specifies how computers can access files over the network in a manner similar to how files are accessed locally. NFS is an open standard and is implemented in most major operating systems. There are multiple versions of NFS: NFSv4 is the most current and emerging version, while NFSv3 is the most widespread in use. SONAS supports NFSv3 as a file sharing protocol for data access, and the SONAS file system implements NFSv4 ACLs. NFS is a client/server protocol in which the NFS client accesses data from an NFS server. The NFS server, and SONAS acts as an NFS server, exports directories; NFS allows parameters such as read only, read write, and root squash to be specified for a specific export. The NFS client mounts exported directories using the mount command. Security in NFS is managed as follows: Authentication, the process of verifying whether the NFS client machine is allowed to access the NFS server, is performed on the IP address of the NFS client; NFS client IP addresses are defined on the NFS server when configuring the export.
Authorization, or verifying whether the user can access a specific file, is based on the user and group of the originating NFS client, matched against the file ACLs. Because the user on the NFS client is passed as-is to the NFS server, an NFS client root user has root access on the NFS server; to avoid an NFS client gaining root access to the NFS server, you can specify the root_squash option.
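As an illustration only (the host name, export path, and mount point are placeholders, not taken from a real SONAS configuration), a Linux NFS client mounts a SONAS export with the standard mount command; whether root is squashed is decided by the export options defined on the server side:

# On the NFS client: mount the exported directory (placeholder names)
mount -t nfs sonas.example.com:/ibm/gpfs0/export1 /mnt/export1

On a generic NFS server, an /etc/exports entry such as /data 192.0.2.0/24(rw,root_squash) maps the client root user to an unprivileged user; without root_squash, client root keeps root access. On SONAS the equivalent options (read only, read write, root squash) are set when the export is defined rather than in /etc/exports.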
FTP protocol
File Transfer Protocol (FTP) is a protocol for copying files from one computer to another over a TCP/IP connection. FTP uses a client/server architecture in which the FTP client accesses files on the FTP server. Most current operating systems support the FTP protocol natively, as do most web browsers. FTP supports user authentication and anonymous users. SONAS supports FTP authentication through the SONAS AD/LDAP servers. File access authorization is done with ACL support: SONAS enforces ACLs and allows retrieval of POSIX attributes, but ACLs cannot be modified using FTP.
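A minimal illustrative FTP session follows (the host, domain, and file names are placeholders): the user authenticates with the AD or LDAP account and can then retrieve files from the exported directory, but cannot change ACLs through FTP.

$ ftp sonas.example.com          # placeholder host name
Name: MYDOMAIN\user1             # authenticate with the AD/LDAP account
ftp> cd export1                  # change to the exported directory
ftp> get report.txt              # download a file
ftp> quit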
CIFS protocol
The protocol used in Windows environments to share files is the Server Message Block (SMB) protocol, sometimes called the Common Internet File System (CIFS) protocol. The SMB protocol originated at IBM, was later enhanced by Microsoft, and was renamed CIFS. Among the services that Windows file and print servers provide are browse lists, authentication, file serving, and print serving. Print serving is outside the scope of our discussion. Browse lists offer a service to clients that need to find a share using the Windows net use command or Windows Network Neighborhood. The file serving function in CIFS comprises the following functions:
Basic server function
Basic client function
Distributed File System (Dfs)
Offline files/Client side caching
Encrypted File System (EFS)
Backup and restore
Anti-virus software
Quotas
The protocol also includes authentication and authorization and related functions such as:
NT Domains
NT Domain trusts
Active Directory
Permissions and Access Control Lists
Group policies
User profile and logon scripts
Folder redirection
Logon hours
Software distribution, RIS and Intellimirror
Desktop configuration control
Simple file serving in SONAS is relatively straightforward; however, duplicating some of the more advanced functions available on Windows servers can be more difficult to set up. SONAS uses the CIFS component to serve files. Authentication is provided through LDAP or AD, with or without the Microsoft SFU component. Authorization is supported using ACLs, which are enforced on files and directories for users with up to 1020 group memberships. Windows tools can be used to modify ACLs. ACL inheritance is similar, but not identical, to Microsoft Windows, and SACLs are not supported.
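As a simple illustration (the server, share, domain, and directory names are placeholders), a Windows client maps a SONAS CIFS share and inspects ACLs with standard Windows tools:

C:\> net use S: \\sonas.example.com\my_share /user:MYDOMAIN\user1
C:\> icacls S:\projects

The net use command maps the share to drive S: using domain credentials, and icacls displays (and can modify) the ACL of a directory on the mapped share. As noted above, SACLs are not supported and ACL inheritance behaves slightly differently than on a native Windows server.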
In the diagram above, the data flows twice through the network: it is read from the old file server and written to the new SONAS. Depending on the type of file server, it might be possible to run the migration software tool on the old file server system and eliminate one network hop. The amount of time needed to copy the data is affected by multiple factors such as these:
The amount of data to migrate: the more data, the longer it takes to migrate.
The network bandwidth available for the migration: the greater the bandwidth, the shorter the time. One approach is to dedicate network links to the migration process. The data mover system has greater bandwidth requirements because it has to read the data from the source system and write it out again to the destination system, so it needs twice the network bandwidth of the SONAS appliance. One way to reduce contention is to use two different adapters, one to the source filer and a separate one to the SONAS system.
The utilization of the file server: contention for file server resources might slow down the migration. The file server might still be in production use, so evaluate file server disk and server utilization before the migration.
Average file size: smaller files have more metadata overhead to manage for a given amount of data and so take longer to migrate.
Disk fragmentation on the source file server, which might slow down the reading of large sequential files.
Applications and users typically need access to a whole export or share, or to a whole subdirectory tree in an export or share; in general an application cannot work with access to only a subset of its directories or shares. Consequently, the migration of files requires downtime: during the migration process some files are already migrated while others are not, and there is no mechanism to synchronize between the migration and user or application access to the files, so applications and users cannot access data while files are being migrated. The data migration process can be executed in different ways:
Migration of a file server in a single step
  Needs a long downtime for larger file servers
  Requires downtime for all applications/users
  The IP address of the old file server can be replaced by the IP address of SONAS in the DNS server
  The file access path does not change from an application/user point of view
Migration of a file server one share/export after the other
  Shorter downtime than above
  Requires downtime for some applications/users
  A DNS update does not work
  The old file server and SONAS run in parallel
  Applications and users must use the new access path once files are migrated, which requires client-side changes
Migration of a file server one subdirectory after the other
  Requires a shorter downtime than the case above
  Same considerations as for migration by share/export
The use of tools that allow incremental resynchronization of changes from source to target opens up additional possibilities, and we show the two options:
  Stopping the client applications, copying the files from the source system to the destination, and then redirecting clients to the SONAS target. This approach potentially requires large downtimes to copy all the data.
  Copying the data to the SONAS target while clients access the data. After most of the data has been copied, the client applications are stopped and only the modified data is copied. This approach reduces the downtime to a synchronization of the data updated since the last copy was performed. It requires that the file copy tool you use supports incremental file resync.
Installations using CIFS client access can use standard Windows tools such as xcopy or robocopy for file migration. SONAS ACLs are not fully interoperable with Windows ACLs. If you have complex ACL structures, for example structures that contain large numbers of users and groups or nested groups, an expert assessment of the ACL structure is strongly preferable, and a proof of concept might be needed to verify differences and develop a migration strategy. If you have a mixture of NFSv3 and CIFS access to your file systems, you must decide whether to use Windows or UNIX copy tools, because only one tool can be used for the migration. As Windows metadata tends to be more complex than UNIX metadata, we suggest that you use the Windows migration tools for the migration and then verify that the UNIX metadata is copied correctly. Additional challenges might be present when you must migrate entities such as sparse files, hard and soft links, and shortcuts. For example, using a program that does not support sparse files to read a sparse file that occupies 10 MB of disk space but represents 1 GB of data causes 1 GB of data to be transferred over the network. You need to evaluate these cases individually to decide how to proceed.
ACLs: The migration of ACLs makes sense only when the destination system will operate within the same security context as the source system, meaning that they will use the same AD or LDAP server.
robocopy
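As a hedged sketch only (the server, share, and log path names are placeholders and the options must be reviewed for your environment), a typical robocopy invocation to copy a CIFS share including security information looks like this:

C:\> robocopy \\oldfiler\projects \\sonas.example.com\projects /MIR /COPYALL /R:1 /W:1 /LOG:C:\mig\projects.log

The /MIR option mirrors the directory tree, /COPYALL copies data, attributes, timestamps, ACLs, owner, and auditing information, /R and /W limit the retries and wait time on locked files, and /LOG records the result for later verification. Because SONAS ACLs are not fully interoperable with Windows ACLs, verify the copied permissions on a test share first.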
richcopy
secure copy
rsync
The rsync tool is a UNIX application to synchronize files and directories between different servers. It minimizes data transfer between the sites because it can find and transfer only the file differences, which is useful when performing data migration in incremental steps. The rsync tool supports compression and encrypted transmission of data, and it offers bandwidth throttling to limit both bandwidth usage and the load on the source system. It supports the copying of links, devices, owners, groups, permissions, and ACLs; it can exclude files from the copy and can copy links and sparse files. Use the rsync tool to transport data between NFS shares.
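A hedged example of an rsync invocation for an initial bulk copy follows. The paths are placeholders (both the source export and the SONAS export are assumed to be NFS-mounted on the data mover), and the -A and -X options assume an rsync version with ACL and extended attribute support:

# Initial bulk copy: preserve permissions, ACLs, hard links, and sparse files; throttle to about 20 MB/s
rsync -aAHSXv --delete --bwlimit=20000 /mnt/oldfiler/export1/ /mnt/sonas/export1/

The -a option preserves owners, groups, permissions, timestamps, and symbolic links, --delete removes files on the target that no longer exist on the source, and --bwlimit (in KB per second) limits the load on the source file server.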
net rpc share
Samba offers a utility called net that, when used as net rpc share migrate files, can copy files and directories with full preservation of ACLs and DOS file attributes. To use this utility to migrate files from an existing Windows file server to a SONAS system, you need a separate data mover system running Linux with Samba. This migration approach can be used to transport data between CIFS shares. For more information about the commands, see:
http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/NetCommand.html
There are other tools for file and directory copy, but they are outside the scope of our discussion. Whatever tool is chosen, test the migration process and the resulting migrated files and directories before performing the real migration and switchover. Take special care to verify the permissions and ACL migration. Tools such as Brocade VMF/StorageX have been discontinued. For information about the F5 file virtualization solution, see:
http://www.f5.com/solutions/technology-alliances/infrastructure/ibm.html
There are various products on the market that can perform transparent file migration from a source file server to a destination file server such as SONAS. These products act as a virtualization layer that sits between the client application and the file servers and migrates data in the background while redirecting user access to the data. The F5 intelligent file virtualization solutions enable you to perform seamless migrations between file servers and NAS devices such as SONAS; no client reconfiguration is required, and the migration process running in the background does not impact user access to data. For more information, see:
http://www.f5.com/solutions/storage/data-migration/
The AutoVirt file virtualization software offers a policy-based file migration function that can help you schedule and automate file migration tasks and then perform the file migration activities transparently in the background while applications continue to access the data. For more information, see:
http://www.autovirt.com/
The Samba suite offers a set of tools that can assist in migration to a Linux Samba implementation. The net rpc vampire utility can be used to migrate one or more NT4 or later domain controllers to a Samba domain controller running on Linux; the vampire utility acts as a backup domain controller and replicates all definitions from the primary domain controller. Samba also offers the net rpc share migrate utility, which can be used in multiple ways:
net rpc share migrate all       migrates shares from a remote server to a destination server.
net rpc share migrate files     migrates files and directories from a remote server to a destination server.
net rpc share migrate security  migrates share ACLs from a remote server to a destination server.
net rpc share migrate shares    migrates share definitions from a remote server to a destination server.
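A hedged sketch of such a migration run on the Linux data mover follows (the server, share, and account names are placeholders; check the net man page for the exact options available in your Samba version):

# Copy the files of share "projects" from the old filer to the SONAS cluster,
# preserving ACLs and DOS attributes (run on the Linux/Samba data mover)
net rpc share migrate files projects -S oldfiler --destination=sonas.example.com --acls --attrs -U 'MYDOMAIN\administrator'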
Network bandwidth
With this information you can start to estimate aspects such as the amount of time it will physically take to migrate the data, in what timeframe this can be done and what impact it will have on users.
and might require multiple service interruptions to introduce and remove the migration appliance. This approach is not applicable to SONAS, because SONAS is an appliance and the data to migrate comes from external systems. A file system migration uses tools to copy files and file permissions from a source file server to a target file server. The advantages are that there are multiple free software tools, such as xcopy and rsync, that are relatively easy to set up and require little or no new hardware. The disadvantages are that the migration of ACLs needs administrative account rights for the duration of the migration, it is generally slower than block-level migration because the throughput is gated by the network and the migration server, and you must plan for mapping CIFS to NFS shares. File-level backup and restore is also a viable migration option. It has to be a file-level backup, so NDMP backups are not an option because they are full backups written in an appliance-specific format. Also, the file-level backups have to come from the same operating system type as the target system, so in the case of SONAS the source system should be a UNIX or Linux system. The advantages of this approach are that it is fast, the backup environment is most likely already in place, and there are minimal issues due to ACLs and file attributes. The possible disadvantages are that restores from these backups need to be tested before the migration date, tapes might get clogged up by the migration so scheduled backups might be at risk, and there can be network congestion if there is no dedicated backup network. The diagram in Figure 11-2 shows the components and flows for a file-system copy migration.
Figure 11-2 shows the following components: the source file server (1), the target SONAS (2), the authentication server with AD or LDAP (3), a Windows server for CIFS with robocopy (4), a Linux server for NFS with rsync (5), and the Windows CIFS and UNIX/Linux clients (6).
First, note that all components share and access one common authentication service (3) that runs one of the protocols supported in a SONAS environment. The UNIX, Linux, and Windows clients (6) are connected to the source file server (1). We have a server to migrate each file sharing protocol: for UNIX file systems we use the Linux server with rsync (5), and for Windows file systems we use the Windows server with robocopy (4). The Linux server (5) and Windows server (4) connect to both the source file server (1) and the SONAS target server (2) over the customer LAN network. The robocopy or rsync utilities running on these servers read file data and metadata from the source file server (1) and copy them file by file to the target SONAS (2). The migration steps in this scenario are as follows:
1. Copy one group of shares or exports at a time from the source filer to SONAS.
2. Shut down all clients using those shares or exports.
3. Copy any files that have been changed since the last copy.
4. Remap the clients to access the SONAS and restart the clients.
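The following sketch shows these four steps for a single NFS export using rsync (the paths are placeholders; with robocopy on the Windows server the sequence is the same):

# Step 1: bulk copy while clients are still using the old filer
rsync -aAH /mnt/oldfiler/export1/ /mnt/sonas/export1/
# Step 2: stop the applications and users that access this export
# Step 3: copy only the files changed since the first pass
rsync -aAH --delete /mnt/oldfiler/export1/ /mnt/sonas/export1/
# Step 4: remap the clients to the SONAS export and restart them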
Assume that after test runs in our environment we measured the following throughputs:
63 MB/s, or about 504 Mb/s, on a 10 Gb link
48 MB/s, or about 380 Mb/s, on a 1 Gb link
In this case the migration of the new data would last about 1.10 hours on the 10 Gb link and 1.44 hours on the 1 Gb link. In our migration test this translates to about 1 to 1.5 hours of migration time for 244 GB of data, and a maximum amount of data change per day of about 1.3 TB, assuming a 6 hour migration window and a 63 MB/s data rate on the 10 Gb link. Continuing with the example above, in addition to the 244 GB/day change to the mailboxes, users also archive the changes. Assuming that the complete archive file is also migrated, this results in the following durations:
10 Gb link: (500 MB + 1500 MB) * 2000 / 63 MB/s = ~17 h
1 Gb link: (500 MB + 1500 MB) * 2000 / 48 MB/s = ~23 h
In this case the migration would run longer than the allocated window. You now have two options: split the migration load across two separate migration servers, or run the migration tool more frequently, because most tools only migrate the delta between the source and the target file. As mentioned before, the right measure will probably only be determined by test runs.
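The durations above can be verified with a quick calculation, reusing exactly the figures from the formulas and 3600 seconds per hour:

# (500 MB + 1500 MB) x 2000, divided by the measured throughput, converted to hours
echo "scale=1; 2000*(500+1500)/63/3600" | bc     # 10 Gb link: ~17.6 hours
echo "scale=1; 2000*(500+1500)/48/3600" | bc     # 1 Gb link:  ~23.1 hours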
Chapter 12.
Enter the administrator user and password and click the Log in button and you will be connected to the SONAS admin GUI as shown in Figure 12-2. Note that the default userid and password for a newly installed SONAS appliance is root and Passw0rd.
Help: You can access the SONAS help information from the GUI at the following URL: https://9.11.102.6:1081/help
# ssh [email protected]
[email protected]'s password:
Last login: Mon Aug 3 13:37:00 2009 from 9.145.111.26
[[email protected] ~]# mkuser my_admin -p segreta
EFSSG0019I The user my_admin has been successfully created.

We can list the users with the lsuser command as shown in Example 12-3.
Example 12-3 List users via CLI
[[email protected] ~]# lsuser
Name      ID   GECOS  Directory       Shell
cluster   901         /home/cluster   /usr/local/bin/rbash
cliuser   902         /home/cliuser   /usr/local/bin/rbash
my_admin  903         /home/my_admin  /usr/local/bin/rbash
To add a user created via the CLI to the GUI, for example, my_admin, you use the Add button in the Console User Authority window as shown in Figure 12-3.
You will see a panel as shown in Figure 12-4. Enter the administrator name and select the administrator role to grant this administrator maximum privileges and click OK.
SONAS offers multiple GUI administrative roles to limit an administrator's working scope within the GUI. The following roles are available:
administrator          Has access to all features and functions provided by the GUI. It is the only role that can manage GUI users and roles, and is the default when adding a user with the CLI.
operator               Can check cluster health, view the cluster configuration, verify system and file system utilization, and manage thresholds and notification settings.
export administrator   Can create and manage shares, plus perform the tasks the operator can execute.
storage administrator  Can manage disks and storage pools, plus perform the tasks the operator can execute.
system administrator   Can manage nodes and tasks, plus perform the tasks the operator can execute.
Roles: These user roles only limit the working scope of the user within the GUI. This limitation does not apply to the CLI, which means the user has full access to all CLI commands.
The Topology view offers a high-level overview of the SONAS appliance; it highlights errors and problems and allows you to quickly drill down to get more detail on individual components. The Topology view covers the following components:
Networks: interface and data networks
Nodes: interface, management, and storage
File systems and exports
In Figure 12-6 we see that the interface node is in critical status, because it is flagged with a red circle with an x inside. To expand the interface nodes, click the blue Interface Nodes link or the plus (+) sign at the bottom right of the interface nodes display; you will then see the interface node list and the current status, as shown in Figure 12-6.
To see the reason for the critical error status for a specific node, click the node entry in the list and you get a status display of all events as shown in Figure 12-7.
The first line shows that the problem originated from a critical SNMP error. After correcting the error situation, you can mark it as resolved by right-clicking the error line and clicking the Mark Selected Errors as Resolved box, as shown in Figure 12-8.
From the Topology view you can display and easily drill down to SONAS appliance information, for example, to view the filesystem information, click the Filesystems link in the Topology view as shown in Figure 12-9.
You will see a display similar to the one shown in Figure 12-10.
If you click the new window sign as shown in Figure 12-11 you will see the SONAS filesystem configuration window.
The system log contains operating system messages and events. It is accessed from Health Summary → System Log, and a sample is shown in Figure 12-14.
You can report on CPU, memory, network, and disk variables and generate reports from a daily basis up to 3 years. To generate a disk I/O report for strg001st001, select the storage node, select Disk I/O as the Measurement Variable, select Monthly Chart for the Measurement Duration, and click the Generate Charts button; you will get a chart as illustrated in Figure 12-16.
Filesystem utilization
Select Performance and Reports → Filesystem Utilization and you will see a list of SONAS filesystems, as illustrated in Figure 12-17.
You can generate space usage charts by selecting a filesystem and a duration such as Monthly chart, click Generate Charts and you will get a chart as shown in Figure 12-18.
The next step is to configure notification recipients. Select SONAS Console Settings → Notification Recipients → Add Recipient and you are presented with the Add recipients panel shown in Figure 12-20.
The notification recipients screen is now updated with the email recipient as shown in Figure 12-21.
You can monitor specific utilization thresholds by going to SONAS Console Settings → Utilization Thresholds, which shows a panel as illustrated in Figure 12-22, and clicking the Add Thresholds button.
You are prompted for a threshold to monitor from the following list:
File system usage
GPFS usage
CPU usage
Memory usage
Network errors
Specify warning and error levels and also recurrences of the event, as shown in Figure 12-23, and click OK.
Click the Create a File System button and you will be presented with the panel shown in Figure 12-25.
On this panel you will see multiple tabs:
Select NSDs - to choose which Network Shared Disks (NSDs) to use
Basic - to select the mount point, block size, and device name
Locking and ACLs
Replication settings
Automount settings
Limits - maximum nodes
Miscellaneous - for quota management settings
Choose one or more NSDs, then select the Basic tab and specify the mount point and device name as shown in Figure 12-26. Accept the defaults for all remaining options.
Now click the OK button at the bottom of the screen (not shown in our example). A progress indicator is displayed as shown in Figure 12-27. Click Close to close the progress indicator.
After completion you will see the filesystems list screen with the new redbook filesystem as shown in Figure 12-28.
Tip: To display additional information about a given filesystem, click the filesystem name in the list. The name will be highlighted and the detailed filesystem information for the selected filesystem will be shown.
Note that the redbook filesystem is not mounted, because 0 nodes appears in the Mounted on Host column shown in Figure 12-28. Select the redbook filesystem entry and click the Mount button; you are presented with a box asking where to mount the filesystem. Select Mount on all nodes and click OK, as shown in Figure 12-29.
The file system will now be mounted on all interface nodes and on the management node.
[sonas02.virtual.com]$ lsdisk
Name      File system  Failure group  Availability  Timestamp
gpfs1nsd  gpfs0        1              up            4/21/10 11:22 PM
gpfs2nsd  gpfs0        1              up            4/21/10 11:22 PM
gpfs3nsd  redbook      1              up            4/21/10 11:22 PM
gpfs4nsd               1                            4/21/10 10:50 PM
gpfs5nsd               1                            4/21/10 10:50 PM
gpfs6nsd               1                            4/21/10 11:58 PM
To create a new filesystem using the gpfs5nsd disk, use the mkfs command as shown in Example 12-5.
Example 12-5 Create the file redbook2 filesystem
mkfs redbook2 /ibm/redbook2 -F "gpfs5nsd" --noverify -R none

To list the new filesystems you can use the lsfs command as shown in Example 12-6.
Example 12-6 List all filesystems
[sonas02.virtual.com]$ lsfs
Cluster              Devicename  Mountpoint     Type   Remote device  Quota               Def. quota  Blocksize  Locking type  ACL type  Data replicas  Metadata replicas  Replication policy  Dmapi  Block allocation type  Version  Last update      Inodes
sonas02.virtual.com  gpfs0       /ibm/gpfs0     local  local          user;group;fileset              64K        nfs4          nfs4      1              1                  whenpossible        F      cluster                11.05    4/22/10 1:34 AM  33.536K
sonas02.virtual.com  redbook     /ibm/redbook   local  local          user;group;fileset              256K       nfs4          nfs4      1              1                  whenpossible        T      scatter                11.05    4/22/10 1:34 AM  33.792K
sonas02.virtual.com  redbook2    /ibm/redbook2  local  local          user;group;fileset              256K       nfs4          nfs4      1              1                  whenpossible        T      scatter                11.05    4/22/10 1:34 AM  33.792K
lsfs command: The lsfs command returns a subset of the information available in the SONAS GUI. Information not available in the lsfs command includes if and where mounted, and space utilization. To get this information from the command line, you need to run GPFS commands as root.
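As an illustration of what that means in practice (a sketch only; these are standard GPFS administration commands run as root outside the SONAS CLI, so use them with care on an appliance), the device name redbook2 from the examples above is used here:

# Run as root on the management node
mmlsmount redbook2 -L     # lists the nodes on which the file system is currently mounted
mmdf redbook2             # reports disk and free-space utilization for the file system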
To make the filesystem available you mount it on all interface nodes using the mountfs command as shown in Example 12-7.
Example 12-7 Mount filesystem redbook2
[sonas02.virtual.com]$ mountfs redbook2
EFSSG0038I The filesystem redbook2 has been successfully mounted.

The file system can also be unmounted as shown in Example 12-8.
Example 12-8 Unmount filesystem redbook2
[sonas02.virtual.com]$ unmountfs redbook2
EFSSG0039I The filesystem redbook2 has been successfully unmounted.
To create a new export for the redbook filesystem click the Add button and you will see the first screen of the export configuration wizard as shown in Figure 12-31. Select an export name and directory path and select the protocols you want to configure and click the Next> button.
You are presented with the NFS configuration screen shown in Figure 12-32. Add a client called * that represents all hostnames or IP addresses used by the clients. Deselect the read only and root squash attributes and click the Add Client button. When all clients have been added, click the Next button.
You are now presented with the CIFS configuration screen shown in Figure 12-33. Accept the defaults and click the Next button.
On the last screen click the Finish button to finalize the configuration. Close the task progress window that will appear and you will see the exports list screen shown in Figure 12-34.
Tip: To display additional information about a given export, click the export name in the list. The name will be highlighted and the detailed export information for the selected export will be shown below.
To list the newly created export, use the lsexport command as shown in Example 12-10.
Example 12-10 List all defined exports

[sonas02.virtual.com]$ lsexport -v
Name         Path                   Protocol  Active  Timestamp        Options
my_redbook   /ibm/redbook/export1   NFS       true    4/22/10 3:05 AM  *=(rw,no_root_squash,fsid=1490980542)
my_redbook   /ibm/redbook/export1   CIFS      true    4/22/10 3:05 AM  browseable
my_redbook2  /ibm/redbook2/export1  CIFS      true    4/22/10 3:05 AM  browseable
Tip: The SONAS CLI does not show all export attributes, for example, the owner value is not shown. To determine the owner, use the GUI or the root account.
Open My Computer and verify that you can see the mapped network drive called my_redbook, as shown in Figure 12-36.
Verify that the share is mounted again using the net use command as shown in Example 12-12.
Example 12-12 Listing an export using CLI

C:\Documents and Settings\administrator.ADS>net use
New connections will not be remembered.

Status   Local  Remote                             Network
-------------------------------------------------------------------------------
OK       Z:     \\sonas02.virtual.com\my_redbook   Microsoft Windows Network
The command completed successfully.
To verify what is exported for your client and can be mounted you can use the smbclient -L command as shown in Example 12-14.
Example 12-14 Listing available exports

[root@tsm001st010 ~]# smbclient -L sonas02.virtual.com -U "virtual\administrator"
Enter virtual\administrator's password:
Domain=[VIRTUAL] OS=[Unix] Server=[CIFS 3.4.2-ctdb-20]

        Sharename       Type      Comment
        ---------       ----      -------
        IPC$            IPC       IPC Service ("IBM SONAS Cluster")
        my_redbook      Disk
        my_redbook2     Disk
Domain=[VIRTUAL] OS=[Unix] Server=[CIFS 3.4.2-ctdb-20]

        Server               Comment
        ---------            -------

        Workgroup            Master
        ---------            -------
To create a snapshot, select the name of an active (that is, mounted) file system from the list. We select the filesystem called redbook and then click the Create a new... button. Accept the default snapshot name in the panel shown in Figure 12-38 and click the OK button. By accepting the default snapshot name, the snapshots will be visible in the Windows Previous Versions tab on Windows client systems.
Figure 12-39 on page 473 shows the current status and list of snapshots for a specific filesystem, redbook in our case.
To list all available snapshots for the redbook filesystem you use the lssnapshot command as shown in Example 12-16.
Example 12-16 List all snapshots with the CLI
[sonas02.virtual.com]$ lssnapshot -d redbook
Cluster ID          Device name  Path                      Status  Creation                 Used (metadata)  Used (data)  ID  Timestamp
720576040429430977  redbook      @GMT-2010.04.22-03.14.07  Valid   22.04.2010 05:14:09.000  256              0            3   20100422051411
720576040429430977  redbook      @GMT-2010.04.22-03.06.14  Valid   22.04.2010 05:10:41.000  256              0            2   20100422051411
720576040429430977  redbook      @GMT-2010.04.22-02.55.37  Valid   22.04.2010 05:05:56.000  256              0            1   20100422051411
Tip: To access and view snapshots in a NFS share you must export the root directory for the filesystem as snapshots are stored in a hidden directory called .snapshots in the root directory. To view snapshots from a Linux client, connect to the Linux client, mount the file system from a root export and list the directories.
Example 12-17 Mount the filesystem and list the snapshots
[root@tsm001st010 sonas02]# mount -t nfs 10.0.1.121:/ibm/redbook /sonas02/my_redbook
[root@tsm001st010 sonas02]# df
Filesystem               1K-blocks  Used     Available  Use%  Mounted on
/dev/sda1                14877060   3479016  10630140   25%   /
tmpfs                    540324     0        540324     0%    /dev/shm
10.0.1.121:/ibm/redbook  1048576    155904   892672     15%   /sonas02/my_redbook
[root@tsm001st010 export1]# ls -la /sonas02/my_redbook/.snapshots/
total 129
dr-xr-xr-x 5 root root  8192 Apr 22 05:14 .
drwxr-xr-x 4 root root 32768 Apr 22 02:32 ..
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-02.55.37
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-03.06.14
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-03.14.07
We now associate the three Tivoli Storage Manager agent nodes to the redhum target node as shown in Example 12-19.
Example 12-19 Grant Tivoli Storage Manager proxy node

tsm: SLTTSM2>grant proxy target=redhum agent=redhum1,redhum2,redhum3
ANR0140I GRANT PROXYNODE: success. Node REDHUM1 is granted proxy authority to node REDHUM.
ANR0140I GRANT PROXYNODE: success. Node REDHUM2 is granted proxy authority to node REDHUM.
ANR0140I GRANT PROXYNODE: success. Node REDHUM3 is granted proxy authority to node REDHUM.
Now connect to the SONAS CLI and define the Tivoli Storage Manager server configuration information to the Tivoli Storage Manager client by using the cfgtsmnode command as shown in Example 12-20.
Example 12-20 Configure Tivoli Storage Manager server to SONAS
[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum1 redhum int001st001 redhum1
EFSSG0150I The tsm node was configured successfully.
[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum2 redhum int002st001 redhum2
EFSSG0150I The tsm node was configured successfully.
[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum3 redhum int003st001 redhum3
EFSSG0150I The tsm node was configured successfully.
You can list the Tivoli Storage Manager server configuration with the lstsmnode command as shown in Example 12-21.
Example 12-21 List Tivoli Storage Manager client configuration
[Humboldt.storage.tucson.ibm.com]$ lstsmnode
Node name   Virtual node name TSM server alias TSM server name         TSM node name
int001st001 redhum            slttsm2          9.11.136.30             redhum1
int002st001 redhum            slttsm2          9.11.136.30             redhum2
int003st001 redhum            slttsm2          9.11.136.30             redhum3
int004st001                   server_a         node.domain.company.COM
int005st001                   server_a         node.domain.company.COM
int006st001                   server_a         node.domain.company.COM
We are now ready to perform Tivoli Storage Manager backup and restore operations using the cfgbackupfs command as shown in Example 12-22. After configuring the filesystem backup we list configured filesystem backups with the lsbackupfs command.
Example 12-22 Configure and list filesystem backup to Tivoli Storage Manager
[Humboldt.storage.tucson.ibm.com]$ cfgbackupfs tms0 slttsm2 int002st001,int003st001
EFSSG0143I TSM server-file system association successfully added
EFSSG0019I The task StartBackupTSM has been successfully created.
[Humboldt.storage.tucson.ibm.com]$ lsbackupfs -validate
File system TSM server List of nodes           Status      Start time End time Message Validation              Last update
tms0        slttsm2    int002st001,int003st001 NOT_STARTED N/A        N/A              Node is OK.,Node is OK. 4/23/10 4:51 PM
To start a backup you use the startbackup command and specify a filesystem as shown in Example 12-23. You can then list the backup status with the lsbackupfs command and verify the status.
Example 12-23 Start a TSM backup
[Humboldt.storage.tucson.ibm.com]$ startbackup tms0
EFSSG0300I The filesystem tms0 backup started.
[Humboldt.storage.tucson.ibm.com]$ lsbackupfs
File system TSM server List of nodes           Status         Start time      End time Message                                                                             Last update
tms0        slttsm2    int002st001,int003st001 BACKUP_RUNNING 4/23/10 4:55 PM N/A      log:/var/log/cnlog/cnbackup/cnbackup_tms0_20100423165524.log, on host: int002st001 4/23/10 4:55 PM
Chapter 13.
You can proceed to restart the management service with the startmgtsrv command, as shown in Figure 13-2:

[SONAS]$ startmgtsrv
EFSSG0007I Start of management service initiated by cliuser1
Figure 13-2 Starting the management service
CTDB can be unhealthy for many reasons. Because it monitors the protocol services and GPFS, it goes into an unhealthy state if any of these services is down or has problems. If CTDB is unhealthy, check the logs. The Management GUI system logs give a certain idea of what is wrong. You can also collect the latest logs from all the nodes by running the following command, which requires root access:

#cndump

This command collects all the logs from the nodes and creates a compressed archive file. It takes some time to complete and, when done, shows the path where the archive is stored. When you uncompress the file, you see a directory for each node (the Management Node, each Interface Node, and each Storage Node). Inside each directory you find subdirectories with log information and more from that node.
If you see that a file system that has exports created on it is unmounted, mount the file system so that CTDB becomes healthy. You can check the exports information by running the command:

#lsexports

For additional information about these CLI commands, see Chapter 10, SONAS administration on page 313. If you created some exports only for testing and no longer want the file system mounted, you might prefer that CTDB not monitor the GPFS file system. However, remember that if you change this setting, CTDB will no longer notify you by changing its status whenever a file system is unmounted, whether intended or not. Change this value at your own risk.
#mmlsfs gpfs1 -z
flag value description
---- ---------------- -----------------------------------------------------
 -z  yes              Is DMAPI enabled?

In the above example, consider that the file system gpfs1 is not mounting. Here, the value of the -z option is set to yes. In this case, the GPFS file system is waiting for a DMAPI application and will only mount when one becomes available. If you do not have any DMAPI applications running and do not want GPFS to wait for one, you need to set the -z option to no. This value is set to yes by default, so DMAPI is enabled. If you do not want DMAPI enabled, remember to create the file system with the --nodmapi option of the CLI command mkfs. If the option is already set to yes, you can use the mmchfs command to change the value of the -z option.
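A minimal sketch of that change, assuming the device name gpfs1 from the example above and root access (verify the behavior of these options against your GPFS release before running them):

# mmchfs gpfs1 -z no
# mmlsfs gpfs1 -z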
Appendix A.
CTDB
In this section we discuss the various features of Clustered Trivial Data Base (CTDB).
Introduction to Samba
Samba is software that can be run on a platform other than Microsoft Windows, for example, UNIX, Linux, IBM System 390, OpenVMS, and other operating systems. Samba uses the TCP/IP protocol that is installed on the host server. When correctly configured, it allows that host to interact with a Microsoft Windows client or server as if it were a Windows file and print server. Thus, on a single Linux NAS server, Samba provides the mechanism to access data from a Windows client.

Samba stores its information in small databases called TDBs. Each TDB holds metadata that maps POSIX semantics to CIFS semantics and vice versa. The local TDB files also contain the messaging and locking details for files, and information about open files that are accessed by many clients. All of this works for a single NAS server. For a clustered file system like SONAS, which provides a clustered NAS environment that allows many clients to access data from multiple nodes in the cluster, this becomes a bit tricky. The Samba process running on one node does not know about the locking information held by the Samba processes running locally on the other nodes.

Take, as an example, a file, file1, that has been accessed through two different nodes by two clients. These two nodes do not know about each other's locks and hence do not know that they have both accessed the same file. In this case, if both nodes write to the file, the file content might be corrupted, or the last save might simply overwrite the earlier one. There was no way to coordinate the Samba processes (smbd) that run on different nodes. To have consistency in data access and writes, there must be a way for the Samba processes (smbd) running on each node to communicate with each other and share this information, to avoid shared data corruption.
CTDB provides this coordination layer; the CTDB prototypes include extensive modifications to Samba internal data representations to make the information stored in the various TDBs node-independent. CTDB also provides a failover mechanism to ensure that data is not lost if any node goes down while serving data. It does this with the use of virtual IP addresses, or public IP addresses; more on this is explained in detail later. Figure A-1 shows the CTDB implementation.
As you can see, there is a virtual server that encloses all the nodes: the Samba processes on each node talk to each other and update each other about the locking and other information that each of them holds.
CTDB architecture
The design is particularly aimed at the temporary databases in Samba, which are the databases that get wiped and re-created each time Samba is started. The most important of those databases are the 'brlock.tdb' for byte range locking database and the 'locking.tdb' for open file database. There are a number of other databases that fall into this class, such as 'connections.tdb' and 'sessionid.tdb', but they are of less concern as they are accessed much less frequently. Samba also uses a number of persistent databases, such as the password database, which must be handled in a different manner from the temporary databases.
Here is a list of databases that CTDB uses:
account_policy.tdb: NT account policy settings, such as password expiration
brlock.tdb: Byte range locks
connections.tdb: Share connections; used to enforce max connections, and so on
gencache.tdb: Generic caching database
group_mapping.tdb: Stores group mapping information; not used with an LDAP back-end
locking.tdb: Stores share mode and oplock information
registry.tdb: Windows registry skeleton (connect via regedit.exe)
sessionid.tdb: Session information to support utmp = yes capabilities

We mentioned above that the clustered TDB is a shared TDB and that all nodes access the same TDB files. This means that these databases are shared by all nodes so that each of them can access and update the records, and that the databases must be stored in the shared file system, in this case GPFS. Hence, each time a record is updated, the smbd daemon on a node would have to update the shared database and write to the shared disks. Because the shared disks can be reached over the network, this could be very slow and become a major bottleneck.

To make it simpler, each node of the cluster runs the CTDB daemon, ctdbd, and has a local, old-style TDB stored in a fast local filesystem. The daemons negotiate only the metadata for the TDBs over the network; the actual data reads and writes always happen on the local copy. Ideally this filesystem is in memory, such as on a small ramdisk, but a fast local disk also suffices if that is more administratively convenient. This makes the read/write path really fast. The contents of this database on each node are a subset of the records in the CTDB (clustered TDB).

Persistent databases are handled differently: when a node wants to write to a persistent CTDB, it locks the whole database across the cluster with a transaction, performs its reads and writes, commits, and finally distributes the changes to all the nodes and writes them locally as well. This way the persistent database stays consistent.

A CTDB record typically looks as shown in Example A-1.
Example A-1 CTDB records
typedef struct {
    char *dptr;
    size_t dsize;
} TDB_DATA;

TDB_DATA key, data;

All CTDB operations are ultimately converted into operations on these TDB records. Each of these records is augmented with an additional header. The header contains the information shown in Example A-2 and Figure A-2.
Example A-2 TDB header records
(record sequence number)
(VNN of data master)
(VNN of last accessor)
(last accessor count)
LACCESSOR
The LACCESSOR field holds the VNN of the last node to request a copy of the record. It is mainly used to determine whether the current data master should hand over ownership of this record to another node.
LACOUNT
The LACOUNT field holds a count of the number of consecutive requests made by that last accessor node.
RECOVERY MASTER
When a node fails, CTDB performs a process called recovery to re-establish a proper state. The recovery is carried out by the node that holds the role of the RECOVERY MASTER. It collects the most recent copy of all records from the other nodes. Only one node can become the RECOVERY MASTER, and this is determined by an election process. This process involves a lock file, called the recovery lock or reclock, that is placed in the MASTER file system of the clustered file system. At the end of the election, the newly nominated recovery master holds a lock on the recovery lock file. The RECOVERY MASTER node is also responsible for monitoring the consistency of the cluster and for performing the actual recovery process when required. You can check for the reclock path by using the command shown in Example 13-2. In this example, /ibm/gpfs0 is the MASTER filesystem.
Example 13-2 Checking for the reclock path
The dispatcher daemon listens for CTDB protocol requests from other nodes, and from the local smbd via a UNIX domain datagram socket. The dispatcher daemon follows an event-driven approach, executing operations asynchronously. Figure A-3 illustrates this processing for the case in which the DMASTER is the same node as the LMASTER.
Figure A-3 Fetching sequence for CTDB and contacting DMASTER as directed by LMASTER
Figure A-4 shows the case in which the DMASTER has changed and another request is made to get the VNN of the new DMASTER.
Figure A-5 also shows the working of the dispatcher daemon. When a node wants to write or read data, it gets the VNN of the current DMASTER for the record. It then contacts the dispatcher on the node corresponding to that VNN, which is listening for CTDB requests from other nodes, gets the updated copy of the record onto its own node, and updates it locally.
At the time of a node failure, the LMASTER gives the VNN of the node that last updated the record. If the node that has the latest information is the node that fails, it is acceptable to lose this information, because it is only connection information for files. For persistent databases, the information is always available on all nodes and is an up-to-date copy.
Active-Active systems: In these systems, all the nodes in the cluster are active. When a node fails, the other nodes take over. The service requests of the failed node are transferred to the other nodes, which immediately start servicing those requests. The application might see a slight pause in data transfer, but as long as the application can handle failed TCP connections and reconnect, the data transfer does not fail and continues uninterrupted. Figure A-7 shows an Active-Active failover system, in which it can be seen that when Node1 fails, all its requests are passed on to Node2, which is always active. This is transparent to the users, and data transfer does not stop as long as the applications can fail over a TCP connection.
With the help of CTDB, SONAS provides this node failover capability.
CTDB features
CTDB uniquely identifies each of the nodes in the cluster by a Virtual Node Number (VNN), and maps the physical addresses to the VNNs. CTDB works with two IP networks. The first is the internal InfiniBand network used for CTDB communication between the nodes; this is the same as the cluster's internal network for communication between the nodes. The second is the set of public addresses through which the clients access the nodes for data. You can check the public IPs set for the nodes by running the command in Example A-3 on each node.
# ctdb ip
Example output:
Number of addresses:4
12.1.1.1    0
12.1.1.2    1
12.1.1.3    2
12.1.1.4    3

The configuration of CTDB is stored in /etc/sysconfig/ctdb on all nodes. The node details, that is, the list of all the IP addresses of the nodes in the CTDB cluster, are stored in the /etc/ctdb/nodes file. These are the private IP addresses of the nodes. The public addresses of the clustered system are stored in the /etc/ctdb/public_addresses file. These addresses are not physically attached to a specific node; they are managed by CTDB and are attached to or detached from a physical node at runtime.

Each node specifies the public addresses that it can service in its /etc/ctdb/public_addresses file. For example, if a cluster has six nodes and six IP addresses, each node should specify all six IP addresses in order to be able to service any one of them at any point in time in case of a failure. If a certain IP address is not listed, that IP will not be serviced by that node. Hence, it is a good practice to specify all the public IP addresses on each node, so that each node can take over any IP if required.

Even though a node has all the public IPs specified, CTDB assigns each node a unique set of addresses to service. This means, for example, that if we have a six node cluster and six public IP addresses, each node can hold any of the six IP addresses, but CTDB assigns just one unique IP address to each node, so that at any point in time a single IP address is serviced by only one node. As another example, consider a six node cluster with twelve IP addresses. In this case, each node can take any of the twelve IP addresses, but CTDB assigns each node two unique addresses to service.

CTDB uses round robin to assign IP addresses to the nodes. CTDB builds a table of all the VNNs and maps each VNN to IP addresses in a round robin way. When a node fails, CTDB rebuilds this mapping table. It considers all the nodes, whether or not they are down, and assigns IP addresses to each node again in a round robin way. Once this is done, CTDB takes the IP addresses that were assigned to the failed node, counts the number of IP addresses each remaining node is servicing, and redistributes the failed node's addresses to the nodes with the fewest. If all are equal, it uses the round robin mechanism.
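As an illustration of how these files fit together, the following sketch shows hypothetical contents for a three node cluster with four public addresses (the IP addresses, netmask, and interface name are examples only, not values taken from a real SONAS configuration):

# cat /etc/ctdb/nodes
10.254.1.1
10.254.1.2
10.254.1.3
# cat /etc/ctdb/public_addresses
12.1.1.1/24 eth0
12.1.1.2/24 eth0
12.1.1.3/24 eth0
12.1.1.4/24 eth0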
IP failover mechanism
When a node leaves the cluster, CTDB moves its public IP addresses to other nodes that have those addresses listed in their public addresses pool. But now the clients connected to that node have to reconnect to the cluster. To reduce the delays that come with these IP moves to a minimum, CTDB makes use of a clever trick called tickle-ACK. It works as follows: the client does not know that the IP address it is connected to has moved, while the new CTDB node knows only that the TCP connection has become invalid, but does not know the TCP sequence number. So the new CTDB node sends an invalid TCP packet with the sequence and ACK numbers set to zero. This tickles the client into sending a valid ACK packet back to the new node. Now CTDB can validly close the connection by sending a RST packet and force the client to re-establish the connection.
This node does not participate in the CTDB cluster but can still be communicated with; that is, ctdb commands can be sent to it. You can check the status using the command:

# ctdb status
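For reference, the output of ctdb status generally resembles the following sketch (the node count, addresses, and states are illustrative only, and the exact fields vary with the CTDB version):

# ctdb status
Number of nodes:3
pnn:0 10.254.1.1       OK (THIS NODE)
pnn:1 10.254.1.2       OK
pnn:2 10.254.1.3       BANNED
Recovery mode:NORMAL (0)
Recovery master:0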
CTDB tunables
CTDB has a lot of tunables that can be modified. However, this is rarely necessary. You can check the variables by running the command shown in Example A-4.
Example A-4 Listing CTDB tunable variables
# ctdb listvars
Example output:
MaxRedirectCount        = 3
SeqnumInterval          = 1000
ControlTimeout          = 60
TraverseTimeout         = 20
KeepaliveInterval       = 5
KeepaliveLimit          = 5
MaxLACount              = 7
RecoverTimeout          = 20
RecoverInterval         = 1
ElectionTimeout         = 3
TakeoverTimeout         = 5
MonitorInterval         = 15
TickleUpdateInterval    = 20
EventScriptTimeout      = 30
EventScriptBanCount     = 10
EventScriptUnhealthyOnTimeout = 0
RecoveryGracePeriod     = 120
RecoveryBanPeriod       = 300
DatabaseHashSize        = 10000
DatabaseMaxDead         = 5
RerecoveryTimeout       = 10
EnableBans              = 1
DeterministicIPs        = 1
DisableWhenUnhealthy    = 0
ReclockPingPeriod       = 60
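If you do need to inspect or change a single variable on a running node, ctdb provides the getvar and setvar subcommands; a minimal sketch follows (the variable and value are only an illustration, and the change applies to the running daemon on that node):

# ctdb getvar MonitorInterval
MonitorInterval = 15
# ctdb setvar MonitorInterval 20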
CTDB databases
You can list all clustered TDB databases that the CTDB daemon has attached to. Some databases are flagged as PERSISTENT; this means that the database stores data persistently and the data remains across reboots. One example of such a database is secrets.tdb, where information about how the cluster was joined to the domain is stored.
You can check the databases available by running the command in Example A-5.
Example A-5 Checking CTDB databases
# ctdb getdbmap
Example output:
Number of databases:10
dbid:0x435d3410 name:notify.tdb path:/var/ctdb/notify.tdb.0
dbid:0x42fe72c5 name:locking.tdb path:/var/ctdb/locking.tdb.0
dbid:0x1421fb78 name:brlock.tdb path:/var/ctdb/brlock.tdb.0
dbid:0x17055d90 name:connections.tdb path:/var/ctdb/connections.tdb.0
dbid:0xc0bdde6a name:sessionid.tdb path:/var/ctdb/sessionid.tdb.0
dbid:0x122224da name:test.tdb path:/var/ctdb/test.tdb.0
dbid:0x2672a57f name:idmap2.tdb path:/var/ctdb/persistent/idmap2.tdb.0 PERSISTENT
dbid:0xb775fff6 name:secrets.tdb path:/var/ctdb/persistent/secrets.tdb.0 PERSISTENT
dbid:0xe98e08b6 name:group_mapping.tdb path:/var/ctdb/persistent/group_mapping.tdb.0 PERSISTENT
dbid:0x7bbbd26c name:passdb.tdb path:/var/ctdb/persistent/passdb.tdb.0 PERSISTENT

You can also check the details of a database by running the command in Example A-6.
Example A-6 CTDB database status
# ctdb getdbstatus <dbname>

Example: ctdb getdbstatus test.tdb.0
Example output:
dbid: 0x122224da
name: test.tdb
path: /var/ctdb/test.tdb.0
PERSISTENT: no
HEALTH: OK

You can get more information about CTDB by reading its manual page:
#man ctdb
write
execute
If a permission is not set, the access it would allow is denied. Permissions are not inherited from the upper level directory.
DOS attributes
There are four DOS attributes that can be assigned to files and folders:
Read Only    File cannot be written to
Archive      File has been touched since the last backup
System       File is used by the operating system
Hidden       File is relatively invisible to the user
             File is gone
NTFS security
There are 13 basic permissions, which are rolled up into six permission groups. These apply only to the NTFS file system, not FAT nor FAT32. The six permission groups are:
Full Control          Allow all 13 basic permissions
Modify                Allow all permissions except Delete subfolders and files, Change permission, and Take ownership
Read                  Allow List folder/Read data, Read attributes, Read extended attributes, and Read permissions
Write                 Allow Create files/Append data, Write attributes, Write extended attributes, Delete subfolders and files, and Read permissions
Read and execute      Allow all that the Read permission group allows, plus Traverse folder/Execute file
List folder contents  This is for folders only, not files. It is the same as Read and execute for files
The 13 basic permissions are the following; some of them differ depending on whether they apply to folders or files:
Traverse folders (for folders only)/Execute file (for files only)
List folder/Read data
Read attributes
Read extended attributes
Create files/Append data
Write attributes
Write extended attributes
Delete subfolders and files
Delete
Read permissions
Change permissions
Take ownership

To view the permission groups, right-click any file or folder in Windows Explorer, choose the Properties menu item, and then choose the Security tab. More information about Windows file and folder permissions is available on the Microsoft TechNet site at:
http://technet.microsoft.com/en-us/library/bb727008.aspx
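You can also inspect these permissions from a Windows command prompt with the built-in icacls tool; a minimal sketch follows (the mapped drive and file name are hypothetical, and the output is abbreviated):

C:\> icacls Z:\projects\report.doc
Z:\projects\report.doc VIRTUAL\administrator:(F)
                       BUILTIN\Users:(RX)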
GPFS overview
Smart storage management with IBM General Parallel File System: enterprise file data is often touched by multiple processes, applications, and users throughout the lifecycle of the data. Managing the data workflow is often the highest-cost part of storage processing and management, in terms of processing and people time. In the past, companies have addressed this challenge using different approaches, including clustered servers and network attached storage. Clustered servers are typically limited in scalability and often require redundant copies of data. Traditional network attached storage solutions are restricted in performance, security, and scalability.
To address these issues effectively, you need to look at a new, more effective data management approach. Figure A-8 describes a typical infrastructure with unstructured data. This is a data storage approach, but not data management.
In Figure A-9, GPFS provides a real data management solution with the following capabilities:
File management
Performance
Enhanced availability
Better automation
Scale-out growth

Because GPFS allows you to bring together islands of information, redundant data, and under-utilized segments of storage, it provides a strong file management solution. GPFS is also a solution that is able to scale and to integrate emerging technologies, providing both performance and security for your storage investment. By design, GPFS is an enhanced-availability solution, ensuring data consistency through various mechanisms. These mechanisms can also be easily automated thanks to the powerful ILM tools integrated inside GPFS.
To fulfill these capabilities, GPFS provides a single global namespace with centralized management. This allows better storage utilization and performance for varied workloads, as described in Figure A-10. Database, archive, and application workloads can all use the single global namespace provided by GPFS. GPFS automatically handles all your storage subsystems, ensuring homogeneous storage utilization.
GPFS architecture
Figure A-11 describes the GPFS architecture. A typical GPFS deployment runs your daily business applications on NSD clients (or GPFS clients). These clients access the same global namespace through a LAN. Data accessed by the clients is transferred to and from NSD servers (or GPFS servers) through the LAN. NSD clients and NSD servers are gathered in a GPFS cluster. The latest GPFS version (3.3) supports AIX, Linux, and Windows as NSD clients or NSD servers, and these operating systems can run on many IBM and even non-IBM hardware platforms. Regarding the LAN, GPFS can use GigE networks as well as 10GigE or InfiniBand networks. The servers then commit I/O operations to the storage subsystems where the LUNs are physically located. From a GPFS point of view, a LUN is actually an NSD. GPFS supports various storage subsystems, IBM and non-IBM. Because the IBM SAN Volume Controller solution is also supported by GPFS, several storage subsystem solutions are de facto compatible with GPFS. To find more details regarding the supported software and hardware versions, refer to the following link:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html
Before describing GPFS features and mechanisms in more detail, it is important to note that, like any file system, GPFS handles data and metadata; however, even though there are two kinds of nodes (clients and servers) inside a single GPFS cluster, there are no nodes dedicated to metadata management. As with all mechanisms inside GPFS, every node inside the cluster can be used to execute this work. Some GPFS mechanisms, such as file system scanning, are run in parallel from all nodes inside the cluster (see Figure A-11).
Another option is to create storage pools. Still assuming that you have several storage subsystems in your SAN, accessed by your NSD servers, you can decide to create multiple storage pools: one SATA storage pool, one SAS storage pool, or a tape storage pool. In the same way, you can also decide to create an IBM DS8300 storage pool and an IBM DS5300 storage pool, or even an IBM storage pool and a vendor X storage pool. Here again, you can decide this during NSD creation or change it later. You can then use these storage pools for different workloads: for instance, use the SATA pool for multimedia files, the SAS pool for financial workloads, and the tape pool for archive. Or you can use the SAS pool for daily business, move files to the SATA pool at the end of the week, and move them later to the tape pool.

Whereas failure groups are handled automatically by GPFS, the storage pool mechanism needs rules in order to be automated by GPFS. With the Information Lifecycle Management (ILM) tools provided by GPFS, you can create rules that then become part of a policy. Basic rules are placement rules (place multimedia files on the SATA pool and financial workload files on the SAS pool) and migration rules (move data from the SAS pool to the SATA pool at the end of the week, and move data from SATA to the tape pool at the end of the month). These rules can be gathered inside GPFS policies, and the policies can then be automated and scheduled. You can also use more complex rules and policies to run a command at any given time on the entire GPFS file system or on a subset of files, for example: delete all files older than two years, or move all files from the projectA directory to tape. To compare with classical UNIX commands, a migration rule is like an mv command, whereas the last example is more like a find command combined with an exec. The ILM tools can be used for each GPFS file system, but you can also create GPFS filesets inside a single GPFS file system for finer granularity, and then apply policy or quota rules to these filesets, which are basically directories or GPFS subtrees. A small sketch of such rules follows.
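A minimal sketch of what such rules can look like in the GPFS policy language (the pool names, file pattern, and threshold values are hypothetical; in SONAS these rules are normally managed through the SONAS policy commands rather than edited by hand):

RULE 'multimedia' SET POOL 'sata' WHERE LOWER(NAME) LIKE '%.avi'
RULE 'default'    SET POOL 'sas'
RULE 'weekly'     MIGRATE FROM POOL 'sas' THRESHOLD(80,60) TO POOL 'sata'
RULE 'cleanup'    DELETE WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) > 730

The first two rules are placement rules evaluated when a file is created, the third is a migration rule driven by pool occupancy, and the last one corresponds to the "delete all files older than two years" example mentioned in the text.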
GPFS performance
GPFS is not only a centralized management solution providing a global namespace; it has also been designed to scale according to your capacity needs and to provide aggregate bandwidth if set up appropriately. As explained above, a typical use of GPFS is to run daily business applications on NSD clients, which access data through the network. Note that, depending on your requirements, you might have only NSD servers and no NSD clients; with such a configuration, you run your applications directly on the NSD servers, which also have access to the global namespace.

Assume you have NSD clients running your application on a GPFS file system that has been created with a key parameter: the GPFS block size. In a few words, the equivalent of the GPFS block size on an NSD server is the chunk size, or segment size, of a RAID controller. The block size can be set from 16 KB to 4 MB. Assume a GPFS cluster with some NSD clients, four NSD servers, and one storage subsystem with four RAID arrays. The GPFS file system has been created with a 1 MB block size. From the storage subsystem point of view, all four arrays have been configured as RAID 5 with a 256 KB segment size. Your application runs on the NSD clients and generates a 4 MB I/O. This 4 MB is sent through the network in 1 MB pieces to the NSD servers. The NSD servers then forward the 1 MB packets to the storage subsystem controller, which splits them into 256 KB pieces (the segment size). This leads to a single 4 MB I/O being written in a single I/O operation at the disk level, as described in Figure A-12 on page 505, Figure A-13 on page 505, Figure A-14 on page 506, and Figure A-15 on page 506. In these figures, each NSD is a RAID 5 array built with four data disks and an extra parity disk; performing an I/O operation on an NSD is equivalent to performing I/O operations on the physical disks inside the RAID array.
Figure A-15 Step 4: The GPFS block is chopped into segment-size pieces by the controller
The foregoing figures describe the GPFS function with a few NSDs and a single storage subsystem, but the behavior is exactly the same for a larger configuration. All NSD clients run applications on the GPFS file system in parallel. Because GPFS has been designed to scale with your storage infrastructure, if you add more storage subsystems and NSD servers, you increase your overall bandwidth.
Features
Among these interesting features, you have the following possibilities:
The GUI, which is included in the GPFS packages, if you are more familiar with a GUI than with the CLI.
The Clustered NFS (CNFS) feature, which allows you to use some nodes inside the GPFS cluster as NFS servers, so that systems that are not in the GPFS cluster can access the GPFS file system using the NFS protocol. You can even load balance access across many NFS servers with an appropriate DNS configuration. Similarly, GPFS also supports NFSv4 ACLs and Samba, and so allows Windows and UNIX users to share data.
HSM compatibility; you can use GPFS in combination with HSM for better tiering inside your storage infrastructure.
The cross-cluster feature, which, in the case of a multi-site data center, allows you to grant NSD clients in a remote GPFS cluster access to the local GPFS file system (and in the opposite direction).
Documentation
For any more detailed documentation on GPFS, refer to the IBM website:
http://www-03.ibm.com/systems/software/gpfs/index.html
the online GPFS documentation:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfsbooks.html
or the GPFS wiki:
http://www.ibm.com/developerworks/wikis/display/hpccentral/General+Parallel+File+System+%28GPFS%29
We review and discuss the main components and functions of a Tivoli Storage Manager environment, emphasizing the components that are most relevant to an ILM-optimized environment. These components are:
Tivoli Storage Manager server
Administrative interfaces
The server database
Storage media management
Data management policies
Security concepts
Backup Archive client interface
Client application programming interface (API)
Automation
The client to server data path

Tip: For a detailed overview of Tivoli Storage Manager and its complementary products, see the IBM Tivoli software information center at the following location:
http://publib.boulder.ibm.com/infocenter/tivihelp
Administrative interfaces
For the central administration of one or more Tivoli Storage Manager server instances, as well as the whole data management environment, Tivoli Storage Manager provides command line or Java-based graphical administrative interfaces, otherwise known as administration clients. The administrative interface enables administrators to control and monitor server activities, define management policies for clients, and set up schedules to provide services to clients at regular intervals.
Server database
The Tivoli Storage Manager server database is based on a standard DB2 database that is integrated into and installed with the Tivoli Storage Manager server itself. The Tivoli Storage Manager server DB2 database stores all information relative to the Tivoli Storage Manager environment, such as the client nodes that access the server, storage devices, and policies. The Tivoli Storage Manager database contains one entry for each object stored in the Tivoli Storage Manager server, and the entry contains information such as:
Name of the object
Tivoli Storage Manager client that sent the object
Policy information or Tivoli Storage Manager management class associated with the object
Location where the object is stored in the storage hierarchy

The Tivoli Storage Manager database retains information called metadata, which means data that describes data. The flexibility of the Tivoli Storage Manager database enables you to define storage management policies around business needs for individual clients or groups of clients. You can assign client data attributes, such as the storage destination, number of versions, and retention period at the individual file level and store them in the database. The Tivoli Storage Manager database also ensures reliable storage management processes. To maintain data integrity, the database uses a recovery log to roll back any changes made if a storage transaction is interrupted before it completes. This is known as a two-phase commit.
A Tivoli Storage Manager server can write data to more than 400 types of devices, including hard disk drives, disk arrays and subsystems, standalone tape drives, tape libraries, and other forms of random and sequential-access storage. The server uses media grouped into storage pools. You can connect the storage devices directly to the server through SCSI, through directly attached Fibre Channel, or over a Storage Area Network (SAN). Tivoli Storage Manager provides sophisticated media management capabilities that enable IT managers to perform the following tasks:
Track multiple versions of files (including the most recent version)
Respond to online file queries and recovery requests
Move files automatically to the most cost-effective storage media
Expire backup files that are no longer necessary
Recycle partially filled volumes

Tivoli Storage Manager provides these capabilities for all backup volumes, including on-site volumes inside tape libraries, volumes that have been checked out of tape libraries, and on-site and off-site copies of the backups. Tivoli Storage Manager provides a powerful media management facility to create multiple copies of all client data stored on the Tivoli Storage Manager server. Enterprises can use this facility to back up primary client data to two copy pools: one stored in an off-site location, and the other kept on-site for possible recovery from media failures. If a file in a primary pool is damaged or resides on a damaged volume, Tivoli Storage Manager automatically accesses the file from an on-site copy if it is available, or indicates which volume needs to be returned from an off-site copy. Tivoli Storage Manager also provides a unique capability for reclaiming expired space on off-site volumes without requiring the off-site volumes to be brought back on-site. Tivoli Storage Manager tracks the utilization of off-site volumes just as it does for on-site volumes.
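Creating those copies is typically driven from the administrative command line; a minimal sketch follows (the server prompt and pool names are hypothetical):

tsm: SERVER1> backup stgpool poolfast copypool
tsm: SERVER1> backup stgpool pooltape copypool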
When the free space of off-site volumes reaches a determined reclamation threshold, Tivoli Storage Manager uses the on-site volumes to consolidate the valid files onto new volumes, then directs the new volumes to be taken off-site. When the new tapes arrive off-site, Tivoli Storage Manager requests the return of the original off-site volumes, which can be reused as scratch volumes.
Security concepts
Because the storage repository of Tivoli Storage Manager is the place where an enterprise stores and manages all of its data, security is a vital aspect of Tivoli Storage Manager. To ensure that only the owning client or an authorized party can access the data, Tivoli Storage Manager implements, for authentication purposes, a mutual suspicion algorithm, which is similar to the methods used by Kerberos authentication. Whenever a client (backup/archive or administrative) wants to communicate with the server, an authentication has to take place. This authentication involves two-way verification: the client has to authenticate itself to the server, and the server has to authenticate itself to the client. To do this, all clients have a password, which is stored at the server side as well as at the client side. In the authentication dialog, these passwords are used to encrypt the communication. The passwords are not sent over the network, to prevent hackers from intercepting them. A communication session is established only if both sides are able to decrypt the dialog. If the communication has ended, or if a time-out period has expired with no activity, the session automatically terminates and a new authentication is necessary. Tivoli Storage Manager also offers encryption of the data sent by the client to the server, with both 128-bit AES and 56-bit DES encryption.
The archive feature allows users to keep a copy of their data for long-term storage and to retrieve the data if necessary. Examples are meeting legal requirements, returning to a previous working copy if the software development of a program is unsuccessful, or archiving files that are not currently necessary on a workstation. Backup and archive are the central procedures around which Tivoli Storage Manager is built; they are the supporting functions that make it possible to retrieve lost data later on. You can interact with the Tivoli Storage Manager server to run a backup/restore or archive/retrieve operation through three different interfaces:
Graphical User Interface (GUI)
Command Line Interface (CLI)
Web Client Interface (Web Client)

The command line interface has a richer set of functions than the GUI. The CLI has the benefit of being a character mode interface and, therefore, is well suited for users who prefer to type the commands. You might also consider using it when you cannot access the GUI or when you want to automate a backup or archive by using a batch processing file.
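As a brief illustration of the command line interface, the following sketch shows an incremental backup, an archive, and a retrieve from a client (the paths and description text are hypothetical):

dsmc incremental /gpfs/fs1
dsmc archive "/gpfs/fs1/projectA/*" -subdir=yes -description="projectA end of release"
dsmc retrieve "/gpfs/fs1/projectA/*" /tmp/projectA/ -subdir=yes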
Automation
Tivoli Storage Manager includes a central scheduler that runs on the Tivoli Storage Manager server and provides services for use by the server and clients. You can schedule administrative commands to tune server operations and to start functions that require significant server or system resources during times of low usage. You can also schedule client action, but that would be unusual for a data retention-enabled client. Each scheduled command (administrative or client) action is called an event. The server tracks and records each scheduled event and its completion status in the Tivoli Storage Manager server database.
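A minimal sketch of defining a client schedule and associating it with a node (the domain, schedule, and node names and the times are hypothetical):

tsm: SERVER1> define schedule standard nightly_incr action=incremental starttime=22:00 duration=2 durunits=hours
tsm: SERVER1> define association standard nightly_incr client_node_a
tsm: SERVER1> query event standard nightly_incr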
Figure A-18 shows the data flow or pipeline and potential bottlenecks in a Tivoli Storage Manager environment. It illustrates the route the data takes through the many components of the client-server storage environment. For each step in this route, we list causes of potential performance bottlenecks. Data is read by the backup/archive client from client disk or transferred in memory to the API client from a content manager application. The Tivoli Storage Manager client might compress the data before sending it to the Tivoli Storage Manager server in order to reduce network utilization. The client can choose whether or not to use the LAN or the SAN, also called LAN-free, for data transport. The SAN is optimized for bulk transfers of data and allows writing directly to the storage media, bypassing the Tivoli Storage Manager server and the network. LAN-free support requires an additional Tivoli Storage Manager license called Tivoli Storage Manager for SAN. Archiving data is normally a low volume operation, handling relatively small amounts of data to be retained for long periods of time. In this case, the LAN is more than adequate for data transport. The Tivoli Storage Manager server receives metadata, and data when using LAN transport, over the LAN network. Tivoli Storage Manager then updates its database. Many small files potentially can cause a high level of database activity. When the data is received over the LAN, it generally is stored in a disk storage pool for later migration to tape as an overflow location.
The maximum performance of data storage or retrieval operations depends on the slowest link in the chain; another way of putting it is that performance is constrained by the smallest pipe in the pipeline, as shown in Figure A-18 on page 515. In the figure, the LAN is the constraint on performance.
(Figure: data objects flow from the backup client over the LAN, WAN, or SAN into a disk primary storage pool (disk device class), can migrate to tape storage pool volumes, and are copied to a copy storage pool.)
Each object is bound to an associated management policy. The policy defines how long to keep that object and where the object enters the storage hierarchy. The physical location of an object within the storage pool hierarchy has no effect on its retention policies. You can migrate or move an object to another storage pool within a Tivoli Storage Manager storage hierarchy. This can be useful when freeing up storage space on higher performance devices, such as disk, or when migrating to new technology. You can, and should, also copy objects to copy storage pools. To store these data objects on storage devices and to implement storage management functions, Tivoli Storage Manager uses logical definitions to classify the
available physical storage resources. Most important is the logical entity called a storage pool, which describes a storage resource for a single type of media, such as disk volumes, which are files on a file system, or tape volumes, which are cartridges in a library.
Device classes
A storage pool is built up from one or more Tivoli Storage Manager storage pool volumes. For example, a disk storage pool can consist of several AIX raw logical volumes or multiple AIX files on a file system. Each AIX raw logical volume or AIX file corresponds to one Tivoli Storage Manager storage pool volume. A logical entity called a device class is used to describe how Tivoli Storage Manager can access those physical volumes to place the data objects on them. Each storage pool is bound to a single device class. The storage devices used with Tivoli Storage Manager can vary in their technology and total cost. To reflect this fact, you can imagine the storage as a pyramid (or triangle), with
high-performance storage at the top (typically disk), normal performance storage in the middle (typically optical disk or cheaper disk), and low-performance, but high-capacity, storage at the bottom (typically tape). Figure 4-4 illustrates this tiered storage environment that Tivoli Storage Manager uses. Disk storage devices are random access media, making them better candidates for storing frequently accessed data. Disk storage media with Tivoli Storage Manager can accept multiple parallel data write streams. Tape, however, is an economical high-capacity sequential access medium, which you can easily transport off-site for disaster recovery purposes. Access time is much slower for tape due to the time necessary to load a tape into a tape drive and locate the data. However, for many applications, that access time is still acceptable.

Tape: Today many people in the industry say that tape is dead and that customers should use disk instead. However, the performance of high-end tape devices is often unmatched by disk storage subsystems. Current tape has a native performance in the range of, or over, 100 MB/sec, which with compression can easily pass 200 MB/sec. Also consider the cost: the overall power consumption of tape is usually less than that of disk.

Disk storage is referred to as online storage, while tape storage has often been referred to as off-line, and also as near-line with regard to HSM. With Tivoli Storage Manager HSM, tape volumes located in a tape library are accessed transparently by the application that is retrieving data from them (near-line). Tapes no longer in the library are off-line, requiring manual intervention. The introduction of lower cost mass storage devices, such as Serial Advanced Technology Attachment (SATA) disk systems, offers an alternative to tape for near-line storage. Figure A-20 illustrates the use of a SATA disk as near-line storage.
Device types
Each device defined to Tivoli Storage Manager is associated with one device class. Each device class specifies a device type. A device type identifies a device as a member of a group of devices that share similar media characteristics. For example, the LTO device type applies to LTO tape drives. The device type also specifies management information, such as how the server gains access to the physical volumes, the recording format, the estimated capacity, and labeling prefixes. Device types include DISK, FILE, and a variety of removable media types for tape and optical devices. Note that a device class for a tape or optical drive must also specify a library. The library defines how Tivoli Storage Manager can mount a storage volume onto a storage device such as a tape drive.
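A minimal sketch of defining a library, a tape device class that references it, and a file device class (all names, paths, and sizes are hypothetical):

tsm: SERVER1> define library lib3584 libtype=scsi
tsm: SERVER1> define devclass ltoclass devtype=lto library=lib3584 format=drive
tsm: SERVER1> define devclass fileclass devtype=file directory=/tsmfile maxcapacity=4G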
Tape devices
Tivoli Storage Manager supports a wide variety of enterprise class tape drives and libraries. Use tape devices for backing up your primary storage pools to copy storage pools and for backing up the database. Tape devices are well suited for this, because the media can be transported off-site for disaster recovery purposes.
Policy management
A data storage management environment consists of three basic types of resources: client systems, policy, and data. The client systems contain the data to manage, for example, file systems with multiple files. The policies are the rules that specify how to manage the objects. For example, for archives they define how long to retain an object in Tivoli Storage Manager storage and in which storage pool to place it; in the case of backup, they define how many versions to keep, where to store them, and what Tivoli Storage Manager does with the stored object once the data is no longer on the client file system.

Client systems, or nodes in Tivoli Storage Manager terminology, are grouped together with other nodes with common storage management requirements into a policy domain. The policy domain links the nodes to a policy set, a collection of storage management rules for different storage management activities.

Client node: The term client node refers to the application sending data to the Tivoli Storage Manager server.

A policy set consists of one or more management classes. A management class contains the rule descriptions, called copy groups, and links these to the data objects to manage. A copy group is the place where you define all the storage management parameters, such as the number of stored copies, the retention period, and the storage media. When data is linked to particular rules, it is said to be bound to the management class that contains those rules. Another way to look at the components that make up a policy is to consider them in the hierarchical fashion in which they are defined; that is, consider the policy domain containing the policy set, the policy set containing the management classes, and the management classes containing the copy groups and the storage management parameters, as illustrated in Figure A-21.
(Figure A-21: client nodes belong to a policy domain; the policy domain contains policy sets, each policy set contains management classes, and each management class contains a copy group whose rules are bound to the data.)
We explain the relationship between the items in Figure A-21 in the following topics.
Management class
The management class associates copy groups with client files. A management class is a Tivoli Storage Manager policy. Each individual object stored in Tivoli Storage Manager is associated with one and only one management class. A management class is a container for copy groups; it can contain either a backup or an archive copy group, both a backup and an archive copy group, or no copy groups at all. Users can bind (that is, associate) their files to a management class through the include-exclude list, a set of statements or rules that associate files with a management class based on file filtering rules. Alternatively, a user can explicitly request an archive management class.
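A hedged sketch of include-exclude statements in a client options file (the paths and management class names are hypothetical):

include /gpfs/fs1/.../*          STANDARD
include /gpfs/fs1/finance/.../*  FIN_MC
exclude /gpfs/fs1/tmp/.../*

The client evaluates this list from the bottom up and uses the first statement that matches, so the more specific rules are placed nearer the bottom.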
Policy set
The policy set specifies the management classes that are available to groups of users. Policy sets contain one or more management classes. You must identify one management class as the default management class. Only one policy set, the ACTIVE policy set, controls policies in a policy domain.
Policy domain
The concept of policy domains enables an administrator to group client nodes by the policies that govern their files and by the administrators who manage their policies. A policy domain contains one or more policy sets, but only one policy set (named ACTIVE) can be active at a time. The server uses only the ACTIVE policy set to manage files for client nodes assigned to a policy domain. You can use policy domains to:
Group client nodes with similar file management requirements
Provide different default policies for different groups of clients
Direct files from different groups of clients to different storage hierarchies based on need
Restrict the number of management classes to which clients have access

Figure A-22 summarizes the relationships among the physical device environment, Tivoli Storage Manager storage and policy objects, and clients. The numbers in the following list correspond to the numbers in the figure.
Figure A-22 shows an outline of the policy structure. These are the steps to create a valid policy (see the sketch after this list for the corresponding commands):
1. When clients are registered, they are associated with a policy domain. Within the policy domain are the policy set, management class, and copy groups.
2. When a client (application) backs up an object, the object is bound to a management class. A management class and the backup copy group within it specify where files are stored first (the destination) and how they are managed.
3. Storage pools are the destinations for all stored data. A backup copy group specifies a destination storage pool for backed-up files. Storage pools are mapped to device classes, which represent devices. The storage pool contains volumes of the type indicated by the associated device class. Data stored in disk storage pools can be migrated to tape or optical disk storage pools and can be backed up to copy storage pools.
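A hedged sketch of the administrative commands that create such a chain (the domain, policy set, class, node, and pool names, the password, and the retention values are hypothetical):

tsm: SERVER1> define domain sonasdom
tsm: SERVER1> define policyset sonasdom sonasset
tsm: SERVER1> define mgmtclass sonasdom sonasset standard
tsm: SERVER1> define copygroup sonasdom sonasset standard type=backup destination=poolfast verexists=3 retextra=60
tsm: SERVER1> assign defmgmtclass sonasdom sonasset standard
tsm: SERVER1> activate policyset sonasdom sonasset
tsm: SERVER1> register node client_node_a secretpw domain=sonasdom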
Figure A-23 illustrates a Tivoli Storage Manager server hierarchy with three storage pools. Storage pools are managed by thresholds; each pool has a high threshold and a low threshold. When the amount of data in the storage pool exceeds the high threshold, Tivoli Storage Manager initiates a migration process to move the data. The data is moved to a destination called the next storage pool, which is defined as a storage pool parameter of the original storage pool. So, in the example, we see that poolfast has a next storage pool called poolslow. The migration process moves data from poolfast to poolslow; the process starts when the amount of data stored in poolfast exceeds the high migration threshold and stops when it reaches the low threshold.
Tivoli Storage Manager offers additional parameters to control the migration of data from one storage pool to the next. One of these is migdelay, which specifies the minimum number of days that a file must remain in a storage pool before the file becomes eligible for migration to the next storage pool.
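A hedged sketch of how these thresholds appear on storage pool definitions (the pool names mirror the example; the device class, numbers, and volume path are hypothetical):

tsm: SERVER1> define stgpool poolslow ltoclass maxscratch=100
tsm: SERVER1> define stgpool poolfast disk highmig=70 lowmig=30 nextstgpool=poolslow migdelay=2
tsm: SERVER1> define volume poolfast /tsmdisk/vol01.dsm formatsize=2048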
Figure A-24 illustrates a sample HSM storage hierarchy built to minimize storage costs. (Figure: data migrates from Pool A to Pool B and from Pool B to Pool C, with recall in the opposite direction. Pool B is cheap SATA disk; data migrates to Pool C if capacity utilization exceeds 80%.)
For planned processes, such as storing a large group of files in storage and returning them to your local file system for processing, use the archive and retrieve processes. You can use the backup-archive client to archive and retrieve copies of migrated files in the same manner as you would archive and retrieve copies of files that reside on your local file system. HSM supports various file systems. Currently, the following integrations exist:
File system proprietary integration: Data can be directly accessed and read from any tier in the storage hierarchy. This is supported on JFS on AIX.
DMAPI standard-based integration: The Data Management Application Programming Interface (DMAPI) standard has been adopted by several storage management software vendors. File system vendors focus on the application data management part of the protocol; storage management vendors focus on the HSM part of the protocol. The platforms currently supported by the Tivoli Storage Manager HSM client are GPFS on AIX, VxFS on Solaris, GPFS on xLinux, and VxFS on HP.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.
Other publications
These publications are also relevant as further information sources:
IBM Scale Out Network Attached Storage - Software Configuration Guide, GA32-0718
IBM Scale Out Network Attached Storage Installation Guide, GA32-0715
IBM Scale Out Network Attached Storage Introduction and Planning Guide, GA32-0716
IBM Scale Out Network Attached Storage Troubleshooting Guide, GA32-0717
IBM Scale Out Network Attached Storage User's Guide, GA32-0714
GPFS Advanced Administration Guide - Version 3 Release 3, SC23-5182
Online resources
These websites are also relevant as further information sources:
SONAS Support Site:
http://www.ibm.com/storage/support/
Select Product family: Network Attached Storage (NAS), then Product: Scale Out Network Attached Storage, and click Go.
Support for IBM System Storage, TotalStorage, and Tivoli Storage products:
http://www.ibm.com/storage/support/
Additional GPFS documentation sources:
http://www.ibm.com/systems/gpfs
http://www-03.ibm.com/systems/software/gpfs/resources.html
NFS V4 ACL information:
http://www.nfsv4.org/
Index
Numerics
36 port InfiniBand switch storage capacity 231 authentication services 436 Automated triggers 169 Automount filesystem 356 Average Seek Time 237
A
access control element 498 access control entries 436 access control list 144, 498 access control lists 355 access pattern 235236 access SONAS 307 accessing the CLI 345 ACL modify 305 ACL management 305 acoustics tests 253 activate export share 387 Active Energy Manager 252 Active Energy Manager component 61 active-active configuration recovery 216 active-active peer nodes 80 active-passive config recovery 215 Active-Passive systems 492 Add/Delete Cluster 351 addcluster command 282, 351 adding new disks 366 admin create 452 Administrator role 128 administrator role 453 administrator roles 453 aggregate mode 244 Alert logs 432 Apache daemon 78 appliance connection 450 application characteristics 234 architecture 8 assign disk to filesystem 172 async configuration 211 async replication 205, 261 process 211 two directions 209 async replication code 209 async replication tool 212 asynchronous replication 116, 199 attachnw command 133, 282, 304 authentication 91, 301 CIFS 92 authentication environment, 257 authentication method 266 authentication methods 91, 145 authentication server 91 authentication servers 94 Copyright IBM Corp. 2010. All rights reserved.
B
backup 187 backupmanagementnode command 186, 217 bandwidth 243 bandwidth consideration 236 bandwidth requirement 234 banned node 183, 352 Base Rack Feature Code 9004 228 feature code 9005 226 base rack 62 feature code 9003 62 Feature Code 9004 63 Feature Code 9005 63 base rackKFeature Code 9003 227 Baseboard Management Controller 56 better performance of whole SONAS file system and failure protection. 265 Block I/O 4 block level replication 199 block size 262 block-level migration 445 bonded IP address 5960 bonded ports monitoring 151 bonding 150 bonding interface hardware address 150 bonding mode mode 1 150 bonding modes 150 mode 6 150 business continuance 214 Byte-range locking 78, 86
C
cabling consideration 225 cache hit 235 cache hit ratio 235, 239240 cache miss 235, 239 Call Home feature 255, 434 capacity Storage Subsystem disk type 243 capacity and bandwidth 234 capacity requirements 242 cfgad command 266, 301 cfgbackupfs command 135136, 476 cfgcluster command 293 cfghsmfs command 137
cfgldap command 267, 302 cfgrepl command 210 cfgtsmnode command 135136, 190 chdisk command 294 chexport command 215, 385 chfs command 174, 204 chgrp command 305 chkauth command 301 chkauth command,commands chkauth 267 chkpolicy command 177 chpolicy command 164 CIFS access control list 144 authentication 92 authorization 144 byte-range locks 78 export configuration 330 file lock 144 file serving functions 438 file shares 144 session-oriented 144 timestamps 77 CIFS details 385 CIFS export 381, 390 CIFS locks 78 CIFS parameters 410 CIFS protocol 438 CIFS share Windows access 390, 470 CIFS shares migration 444 CIM messages 133 CLI credentials 256 CLI policy commands 162 CLI ssh access 345 CLI tasks 350 CLI user 402 client interface node assignment 82 Cloud Storage 268 cluster thresholds 340 cluster backup Tivoli Storage Manager 186 cluster configuration 286 cluster configuration information 186 cluster details via CLI 351 Cluster management 351 cluster management 351 Cluster Manager 79 CIFS function 91 components 88 function 76, 83, 90 interface node management 83 responsibilities 79 Cluster manager concurrent file access 86 cluster replication 199
cluster utilization 411 clustered file server 486 Clustered Trivial Data Base 89 clustered trivial database 264 cnmgmtconfbak command 218 cnreplicate command 207 cnrsscheck command 281, 288 Command Line Interface 131 commands addcluster 282, 351 attachnw 133, 282 backupmanagementnode 217 cfgad 266, 301 cfgbackupfs 135136, 476 cfgcluster 293 cfghsmfs 113 cfgldap 267, 302 cfgtsmnode 135136, 190 chdisk 294 chexport 385 chfs 367 chgrp 305 chkauth 301 chpolicy 164 cnrsscheck 281, 288 dsmmigundelete 124 lsbackupfs 190 lscluster 294 lscurrnode 292 lsdisk 294, 297 lsexport 383, 469 lsfs 296 lsfset 372 lshsmlog 114 lshsmstatus 114 lsnode 291 lsnwgroup 303 lspolicy 164 lsquota 370 lstsmnode 188, 476 lsuser 452 mkexport 133, 304, 382 mkfs 295 mkfset 373 mknw 302 mknwnatgateway 299 mkpolicytask 165 mkuser 282, 452 mmgetacl 307, 436 mmlsattr 204 restripefs 203 rmexport 389 rmfs 369 rmsnapshot 196 runpolicy 165 setnwdns 297298 setpolicy 113, 165 setquota 370 startbackup 113, 189, 477 startrestore 190
unlinkfset 374, 376 unmountfs 364 commandsLmkpolicy 163 computing capacities 241 concurrent file access 86 configuration changes 46 configuration data restore 218 configuration information backup 217 configuration sizing 233 configurations controller 223 rack 222 Console Logging and Tracing 344 Console users 344 contact information 343 Create File system panel 358 create filesystem 359 Cron jobs 165 CRON task MkSnapshotCron 419 cron tasks 340 cron triggers 169 CTDB 182, 293 cluster management 495 configuration 494 databases 488, 496 DMASTER 489 function 8990 GPFS management 482 High Availability 492 ip failover 495 LACCESSOR 490 LACCOUNT 490 LMASTER 490 Node recovery 494 Node status 495 overview 486 Record Sequence Number 489 RECOVER MASTER 490 services manages 482 CTDB architecture 487 CTDB Health Check 480 CTDB layer 268 CTDB logs 481 CTDB tickle-acks 182 CTDB unhealthy 482 customer-supplied racks 61
Data Network, 268 Data Path IP 302 data replication internal 103 data striping 82 database application file access 4 dblservice command 407 default network group 152 default placement rule 167 default userid 451 defined roles 128 delete file system 369 Denali code 433 direct attached storage 6 Director API module 433 disable service 407 disaster recovery 116, 217 disaster recovery purposes 261 disk characteristics average seek time 236 rotational latency 236 disk management 394 disk properties change 397 disk scrubbing 67 distributed metadata 101 distributed token management 101 DMAPI 193 DNS round robin config 278 DNS configuration 297 DNS function 145 Domain Name Servers 145 drive configurations 67 drive options 66 dual-inline-memory modules 224
E
eblservice command 407 EFSSG0026I error message 480 enable service 407 end users 403 Ethernet connections six additional 54 Ethernet network external ports 67 Ethernet switch internal private 48 Ethernet switches 47 Event logs 432 expansion unit 52 expansion units 58 export access 470 deactivate 387 modify 383 remove protocols 386 Export administrator 128 export administrator role 453 Index
D
data access failover 182 data access layer 75 data blocks 262 data growth contributors 3 data management 107 Data Management API 193 data migration process 441 Data Network Topology 427
export configuration 378 export configuration wizard 467 Exports 387 exports create 466 details 330 exports panel 330 external connections 49 external Ethernet connections 49 external network 154 external notifications 132 external storage pool 109 External storage pools 102 external storage pools 265
F
F5 file virtualization solution 443 failback interface node 84 failover 149 NFS 185 protocol behaviors 185 failover considerations 184 failover failback 148 failure group 508 failure groups 264 FAT file system 142 Feature code 1000 44 Feature code 1001 44 Feature code 1100 44 Feature code 1101 44 file access protocols 75 File I/O 4 File level backup/restore 446 file level replication 199 file level security 144 file management rules 110 file placement policies 110 file restore 190 file set Hard Limit Disk 334 Grace Time 334 Hard Limit I-nodes 334 Quota 334 Snapshots 335 soft limit disk 334 Soft Limit I-nodes 334 unlink 376 File Sets 333 file shares 144 File System configuration panel 328 File system concepts 497 file system concept 142 file sets 333 GUI create 462 mount 361
overhead 263 permissions 498 related tasks 327 file system concept 142 File System Disks GUI info 329 file system migration 446 file system snapshot 101 file system status 361 File System Usage 329 File System utilization 413 file system utilization 339 fileset 102, 104 filesets 372 Filesystem management 354 Filesystem utilization 459 first_time_install script 283 floor load 250 FTP 77 FTP protocol 438 FTP shares 393
G
General Parallel File System overview 499 General Parallel File System 8 global namespace 10, 146, 501 global policy engine 265 GPFS 81 architecture 502 cluster manager roles 507 failure group 508 global namespace 501 high availability 507 metanode 507 performance 504 GPFS and CTDB 482 GPFS Filesystem 295 GPFS logs 481 GPFS metadata 111 GPFS technology 97 Grid view 424 grouping concepts Tivoli Storage Manager 134 GUI File Sets 334 GUI tasks 314, 340, 349 GUI user 402
H
hardware architecture 41 hardware installation 280 hardware overview 42 Health Center 131 Health Check 480 Health Summary 310, 420 help information 451 high availability 182 high availability design 12
HSM 262 concepts 191 node grouping 137 rules 169 space management 112 HSM stub file 122 HSM usage advantages 111 CLI commands 113 HTTPS supported features 78 HTTPS protocol 78
internal private management network 57 internal storage pools 265 intracluster replication 199 IP address balancing 82, 268 IP Address configuration 297 IP address ranges 49, 153
J
junctionPath 375
L
LAN free backup 187 LDAP config file 302 LDAP server 302 LDAP software 267 Lightweight Directory Access Protocol 257 Limits tab GUI 357 link file set 374 linkfset 375 list storage pools 171 locking 144 locking capability 88 logical storage pools 104 lsbackupfs command 190 lscluster command 294, 351 lscurrnode command 292 lsdisk command 174, 203, 294, 297, 465 lservice command 406 lsexport command 305, 383, 469 lsfs command 296, 465 lsfset command 372373 lsnode command 291 LSNW command VLAN option 303 lsnwgroup command 303 lsnwinterface command 184 lspolicy command 164 lsquota command 370 lssnapshot command 197 lstask command 197 lstsmnode command 188, 476 lsuser command 452
I
InfiniBand connections 59 InfiniBand Network 268 InfiniBand switch 36-port 47 96-port 47 InfiniBand switches 47 configuration 223 initialsoftware configuration 291 Integrated Baseboard Management Controller 49 Integrated Management Module 49, 224 integration 265 intelligent PDU 252 intercluster replication 199 Interface Expansion Rack 230 interface expansion racks 62 Interface Network Topology 427 Interface node memory capacity 224 interface node 10, 81, 148 bandwidth 236 cache memory 10 components 43 configuration 224 connections 53 failover 84 failover failback 148 failure 270 Intel NIC 44 locking capability 88 network group 151 panel 326 Qlogic network adapter 44 rear view 55 redundancy 83 SAS HDD 43 single point of failure 55 suspend 353 Tivoli Storage Manager 187 Tivoli Storage Manager client code 125 TSM proxy agent 136 workload allocation 81 interface nodes optional features 44 Internal IP addresses ranges 49
M
macro defines 168 manage users 344 Management Ethernet network 60 Management GUI 127 defined roles 128 management node 45, 81 NTP server 154 management node connections 57 management policies 160 management service stopped 480 manpage command 348349 Manual trigger 169 marketplace requirements 2
applications 3 Master file system 370 master file system 264 Master filesystem 296 maximum files 260 maximum size 97 maximum transmission unit 155 maxsess parameter 186 metadata migration 441 metadata servers 265 Microsoft Active Directory 266 migrate files 440 tools 442 migration 435 CIFS shares 444 metadata 441 methods 445 networth bandwidth 440 NFS exports 444 migration data 445 migration example 447 migration filters 170 migrequiresbackup option 171 Miscellaneous tab 358 mkexport command 133, 215, 304, 382 mkfs command 295, 359, 465 mkfset command 373 mknw command 302 mknwgroup command 303 mknwnatgateway command 299 mkpolicy command example 163 mkpolicy command 163 mkpolicytask 160 MkSnapshotCron template 197 mkuser command 282, 291, 452 mmapplypolicy command 162 mmeditacl command 403 mmgetacl command 307, 403, 436 mmlsattr command 179, 204 modify export 383 monitoring 453 nodes report 458 mount file system 361 mount.cifs command 392
N
NAS access 67 limitations 7 overview 6 NAS device compared to SAN 143 NAS Services 426 NAS storage 278 NAT Gateway 154, 299 Nearline SAS 53 net rpc vampire utility 443 net that utility 443 Network Address Translation 154
Network Address Translation gateway 299 network attached storage 142 network bonding 150 network group 151 network group concept 133 network interface name 153 network interface names 153 network latency 155 network object 133 network router 154 Network Shared Disk protocol 105 network traffic separate 152 networking 141 NTP server 154 new disks 366 NFS access control 144 authentication 437 locking 144 stateless service 144 NFS client 77 NFS clients 331 NFS details 384 NFS Export Configuration 331 NFS exports 77, 380 migration 444 NFS protocol 437 NFS share Linus host 471 mount rules 77 NFS shares DNS host names 150 NFS verses CIFS 144 NIC 142 nmon tool 241, 245 nmon_analyser tool 241 node failover 149 node health 281 node health script 281 nodes report 458 nodes verification 291 Notification technology tickle ack 84 NSD 354 NTFS file system security 499 NTP server 154
O
oldpolicy option 163 onboard Ethernet ports 57 Operator role 128 operator role 453 overview 1
P
parallel grid architecture 49
passwordless access 219 Peered policies 168 Peered pools 168 peered pools 110 perfmon tool 246 performance cache hit ratio 239 GPFS 504 RAID technology 237 read/write 239 permissions UNIX classes 498 permon tool 241 placement policies 160 Plugable Authentication Module 92 policies 160 best practices 165 CLI apply 176 default templates 110 file placement 110 list CLI 177 three types 108 Policies List 331 Policies panel 331 policy apply 179 CLI commands 162 GUI create 175 rules 161 scan engine 161 threshold implementation 162 validate 177 Policy details 332 policy implementation 108 policy rule syntax 109 policy rules 163 syntax 167 Policy triggers 169 port configuration switch migration 231 POSIX locks 78 power consumption 66, 253 power distribution units 61 power of two 241 predefined task 340 premigration files 192 Private Network range 268 protocol HTTPS 78 protocol behaviors 185 protocols CIFS 76 FTP 77 protocols mapping 86
R
rack cabling 47 rack configurations power distribution units 61 racks customer supplied 61 raid storage controller 51 Raw Usable Capacity 242 recover asynchronous replica 219 recovery master node 264 recovery steps active-passive config 215 Redbooks publications website 528 Contact us xvii redundancy 278 interface node 83 remote configuration 255 Remote Memory Direct Access 106 remove disk 367 remove task 418 REPLICATE clause 162 replication 116 local and remote 198 replication cycle 212 replication enabled 356 replication schedule 119 reporting. 411 resolv.conf file 298 restore remote replica 219 traditional backup 219 restripefs command 203204 Resume disk 396 Resume Node command 353 resumenode command 353 Richcopy tool 442 rmexport command 389 rmfs command 369 rmpolicy command 164 rmpolicytask command 165 rmsnapshot command 196 robocopy tool 440 Robocpy tool 442 rolling upgrades 81 root fileset 333 Rotational Latency 237 rsync tool 443 rsync transfer 118 rsynch tool 440 rule types 160 rules syntax 161 runpolicy command 160161, 165
Q
Quad ports GigE connectivity 244 quorum configuration 254 Quorum nodes 285 quorum topology 256
S
Samba 486
Samba logs 481 SAS drive configuration 223 SATA drive configuration 224 SATA drives maximum capacity 260 scale out capability 10 SCAN engine 161 scan engine 103 Schedule task 420 schedule tasks 416 SCP 407 scp command 219, 442 SCP protocol 410 script verify_hardware_wellness 288 secure copy protocol 442 service configuration change 408 service maintenance port 5657 setnwdns command 297298 setpolicy command 160, 165 setquota command 370 share access load balance 278 shares information 431 size maximum physical 97 sizing 233 smbd daemon 488 snapshot 261 CLI create 473 create 472 Snapshots 114 software 11 upgrades 81 software architecture 73 Software Cluster Manager 148 software configuration 282 software installation 281 SONAS 74, 193 ACLs access control list 103 addressing 146 administrator roles 453 architecture 8 asynchronous replication 116 authentication 91 authentication methods 145 authorization 91 backup without Tivoli Storage Manager 126 base rack 62 CLI tasks 350 cluster backup 186 cluster configuration 286 cluster configuration information 186 Command Line Interface 131 component connections 53 configuration changes 46
create admin 452 data access 239 data access layer 75 data management 107 disaster recovery 116 DNS 145 drive options 66 Ethernet ports 43 external ports 67 file archive 124 file system administrator 101 fileset 102, 104 GUI access 314 GUI admin create 452 GUI tasks 349 hardware architecture 41 hardware installation 280 hardware overview 42 Health Center 131, 420 Hierarchical Storage Management processing 113 InfiniBand connections 59 interface expansion rack 64 interface node 45 IP address ranges 153 license 291 logs 457 Management GUI 127 management node connections 57 maximum files 260 monitoring 453 Network Address Translation 154 network interface names 153 node health 281 nodes verification 291 notification monitoring 132 online help 451 operating system 45 overview 1, 8 Private Network 292 raw storage 9 redundancy 278 scale out capability 10 scan engine 162 Snapshots 114 snapshots 101 software 11, 43 software configuration 282 software installation 281 storage controller 51 storage expansion uni 53 storage management 11, 126 storage node connections 56 switches 47 Tivoli Storage Manager 186 Tivoli Storage Manager licensing 125 Tivoli Storage Manager setup 124 SONAS administrator 399 SONAS appliance access 307 SONAS Base Rack
XIV storage 70 SONAS CLI user 399 SONAS end users 403 SONAS GUI and CLI tasks 350 SONAS GUI user 402 SONAS rack acoustic doors acoustic doors 253 SONAS snapshots 193 SONAS Software 74 function 81 Tivoli Storage Manager scripts 122 space efficient 193 space requirements 250 ssh client session 451 startbackup command 189, 477 startmgtsrv command 314, 405406, 480 startrepl command 213, 215 startrestore command 190 stateful 144 stateless service 144 stopmgtsrv command 405 storage 260 Storage administrator 128 storage administrator role 453 Storage Building Block 428 storage controller 32 KB chunk size 262 power consumption 66 RAID 5 66 rebuilds 52 Storage Disk details 336 Storage Expansion Rack 229 storage expansion unit 53 storage expansion units 58 storage node 45, 81 contents 45 GUI information 326 HA pairs 45 HDDs 45 maximum 45 storage node commands 353 storage node connections 56 Storage pod expansion 50 storage pod 50 configuration 223 connectivity 58 expansion 50 storage 429 storage pool 265 change 173 CLI create 173 external 109 GUI create 171 user 265 Storage Pool list 337 Storage Pools 504 Storage pools 102 suspend disk 395
suspendnode command 183, 352 switch configurations 223 switch health 288 switches 47 symbolic host names. 145 synchronous replication 199201 System administrator 128 system administrator role 453 system log 457 System logs 432 system management 126 system overhead 263 system utilization 338
T
task details 341 temperature specifications 254 threshold 166 utilization 133 threshold implementation 162 threshold monitoring 459, 461 threshold notification 460 threshold settings 339 tickle ack 84 tickle application 148 tickle-ack technology 79 Tiered policies 168 tiered pools 110 time synchronization nodes 92 Tivoli Storage Manager 186 access strategy 519 administration clients 511 architectural overview 510 archive copy group 520 archive feature 514 Backup Archive client 513 backup copy group 520 backup/archive client 125 bound policy 516 central scheduler 514 client API 514 client node 519 client software 125 clients 119 concepts 509 Copy group rule 520 Data archiving 509 data deduplication 517 database size 121 DB2 database 511 device classes 517 device type 518 dsmmigundelete command 123124 file based 119 grouping concepts 134 Hierarchical Storage Management for UNIX clients 524 Hierarchical Storage Management for Windows 525 HSM stub file 122 Index
interface node setup 187 LAN-free 515 LAN-free backup 121 management class 519, 521 metadata 511 overview 509 policy domain 521 policy set 521 progressive backup 509 reclamation threshold 513 restore command 123 scripts 122 security concepts 513 server 511 software licensing 125 SONAS interaction 119 storage management services 509 tape drives 519 Tivoli Storage Manager client code 125 Tivoli Storage Manager client definitions 188 Tivoli Storage Manager client software 182 Tivoli Storage Manager configuration 475 Tivoli Storage Manager database 187 Tivoli Storage Manager database sizing 187 Tivoli Storage Manager Hierarchical Storage Management client 190 Tivoli Storage Manager HSM client 111 Tivoli Storage Manager requirements 112 Tivoli Storage Manager server maxsess parameter 186 Tivoli Storage Manager server stanza 188 Tivoli Storage Manager setup 124 Tivoli Storage Manager stanzas 135 Tivoli Storage Manager storage pool 475 Topology Data Network 427 Interface Network 427 Interface Nodes 423 Management Node 426 topology 420 Topology Viewer 130 transparent recall 192 TSM data management 513 TSM server 120 two-tier architecture 81 two-tiered architecture 105
V
vampire utility 443 verify_hardware_wellness 281 VI editor 305 VLAN 153 VLAN option 303 VLAN tagging 153 VLAN trunking 153 Volume Shadow copy Services 116 Volume Shadow Services 77
W
WebDAV 78 weight distribution 250 weight expression 170 weight expressions 168 Winbind logs 481 Windows Active Directory 144 workload tools 245 workload allocation 81 workload analyzer tools 245 workload characteristics 240 performance impact 235
X
XCOPY utility 84, 149 xcopy utility 442 XIV Metadata replication 71 SONAS configuration 68 XIV configuration component considerations 70 XIV storage 68 attachment to SONAS 69
U
UID to SID mapping 94 UNIX permissions 498 unlink file set 376 unlinkfset command 374, 376 unmount file system 363 unmountfs command 364 upgrades 81 user storage pool 265 userid default 451 utilization monitoring 415