PostgreSQL 9 High Availability Cookbook
Ebook · 1,101 pages · 6 hours


About this ebook

A comprehensive series of dependable recipes to design, build, and implement a PostgreSQL server architecture that is free of common pitfalls and can operate for years to come. Each chapter is packed with instructions and examples to simplify even highly complex database operations. If you are a PostgreSQL DBA working on Linux systems who wants a database that never gives up, this book is for you. If you've ever experienced a database outage, restored from a backup, spent hours trying to repair a malfunctioning cluster, or simply want to guarantee system stability, this book is definitely for you.
Language: English
Release date: Jul 17, 2014
ISBN: 9781849516976

Reviews for PostgreSQL 9 High Availability Cookbook

Rating: 5 out of 5 stars (2 ratings, 1 review)


  • Rating: 5 out of 5 stars

    PostgreSQL 9 High Availability Cookbook is a very well written book whose primary audience is experienced DBAs and system engineers who want to take their PostgreSQL skills to the next level by diving into the details of building highly available PostgreSQL-based systems. Reading this book is like drinking from a fire hose: the signal-to-noise ratio is very high; in other words, every single page is packed with important, critical, and very practical information. As a consequence, the book is not for newbies: not only do you have to know the fundamental aspects of PostgreSQL from a database administrator's point of view, but you also need a solid GNU/Linux system administration background.

    One of the strongest aspects of the book is the author's principled and well-structured engineering approach to building a highly available PostgreSQL system. Instead of jumping straight to recipes to be memorized, the book teaches you basic but very important principles of capacity planning. More importantly, this planning of servers and networking is not merely given as a good template; the author also explains the logic behind it, drawing attention to the reasoning behind the heuristics he uses and why certain magic numbers serve as good estimates in the absence of more case-specific information. This style is applied very consistently throughout the book: each recipe is explained so that you know why you do something in addition to how you do it.

    After the first chapter on basic planning, the author turns to a set of miscellaneous topics in Chapter 2 and details some important tricks such as defusing cache poisoning, concurrent indexes, and Linux kernel tweaks. This chapter starts to reveal another valuable aspect of the book: the information regarding an open source RDBMS such as PostgreSQL is freely available on the Internet, but depending on your needs, a particular set of information can very well be scattered over many mailing list messages, forum posts, wiki pages, and so on, and it takes a disciplined mind with a lot of field experience to put all of that scattered information into a single, consistent, logical, and easy-to-follow form.

    Starting from Chapter 3, each chapter explores a single topic in a lot of practical detail, beginning with connection pooling. This chapter, like almost all of the remaining ones, has a nice feature: the author always tries to explain alternative solutions, describes their advantages and disadvantages, and, where possible, shows how to combine alternatives to get the best of each.

    Chapters 4 and 5, namely Troubleshooting and Monitoring, can be thought of as a single chapter, because it is difficult to consider these fundamental concepts separately. These chapters are valuable not only for PostgreSQL DBAs but for any DBA or GNU/Linux system administrator in general. Troubleshooting and monitoring a highly available database would require a book by itself, but since this book's scope is clearly defined, the author provides enough background and practical starting points in about 70 pages.

    I can easily say that Chapter 6, Replication, together with Chapter 7, Replication Management Tools, forms the 'meat' of the book; without successfully implementing and practically managing the replication of your critical database servers, it is impossible to think about building a highly available system. In other words, you need at least one replica of your database system, so that if your primary system goes down, you can easily switch to your replica (or offload some of your less critical applications to it and relieve the stress on your primary system). These two chapters present the solid, practical information needed to achieve that goal. As in the previous chapters, the author shows and explains many useful and practical tools; he also does not refrain from presenting an open source tool, walctl, that he developed as a "PostgreSQL WAL management system that pushes or pulls WAL files from a remote central storage server". I consider this another positive point for the book because it clearly indicates the author's serious time investment in PostgreSQL and its high availability configuration.

    Chapter 8, Advanced Stack, is aptly named, because this chapter, together with Chapter 9, Cluster Control, forms the most advanced and complex part of the book. The author's warnings regarding the information density and the real-life complexity of the topics explained in these two chapters should not be taken lightly. Indeed, there are many combinations of events that can lead to subtle and hard-to-debug errors in clusters set up to take over from failing nodes. Creating such a highly available system with Linux-based tools such as LVM, XFS, DRBD, Pacemaker, and Corosync requires careful planning, probably experimentation in a safe virtual environment, and then disciplined execution, as well as monitoring. Again, these chapters alone cover topics that could fill a volume and a detailed training course by themselves, and I think the author kept a good balance between depth and breadth.

    The final chapter, Data Distribution, can be considered a bonus chapter that briefly shows setting up a foreign PostgreSQL server, dealing with foreign tables, managing shards, creating a scalable nextval replacement, and relevant tips and tricks.

    There are not many negative sides to this very dense PostgreSQL book. A few minor points that deserve mention are its focus on the most popular Linux distributions such as Red Hat, Debian, and their derivatives (FreeBSD and other BSD admins will require slightly more effort); some obsolete networking command usage, such as ifconfig instead of ip (though this might actually help FreeBSD admins); and inconsistent presentation of command outputs (sometimes no output is shown, whereas other commands get screenshots or textual output, used inconsistently). One might also argue for a slight reordering of chapters for pedagogical reasons, but this is highly open to debate and depends on one's particular preferences when it comes to system and database administration.

    I can recommend PostgreSQL 9 High Availability Cookbook without hesitation to PostgreSQL DBAs who want to push their skills to the next level and learn the fundamentals of building highly available PostgreSQL-based database clusters. It certainly will not be as easy as just reading a book, but it is good to know that such a good guide exists.
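One of the Chapter 2 techniques the review singles out, concurrent index creation, is a standard PostgreSQL feature; as a minimal illustration (the table and column names here are hypothetical, not taken from the book):

```sql
-- CREATE INDEX CONCURRENTLY avoids the long write-blocking lock a plain
-- CREATE INDEX would hold, at the cost of a slower build. Note that it
-- cannot run inside a transaction block.
CREATE INDEX CONCURRENTLY idx_orders_customer_id
    ON orders (customer_id);
```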

Book preview

PostgreSQL 9 High Availability Cookbook - Shaun M. Thomas

Table of Contents

PostgreSQL 9 High Availability Cookbook

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why Subscribe?

Free Access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Hardware Planning

Introduction

Planning for redundancy

Getting ready

How to do it...

How it works...

There's more...

See also

Having enough IOPS

Getting ready

How to do it...

How it works...

There's more...

A working example

Making concessions

Sizing storage

Getting ready

How to do it...

How it works...

There's more...

Real-world example

Adjusting the numbers

Incorporate the spreadsheet

Investing in a RAID

Getting ready

How to do it...

How it works...

There's more...

See also

Picking a processor

Getting ready

How to do it...

How it works...

There's more...

Hyperthreading

Turbo Boost

Power usage

See also

Making the most of memory

Getting ready

How to do it...

How it works...

There's more...

Exploring nimble networking

Getting ready

How to do it...

How it works...

There's more...

A networking example

Remember redundancy

Save the research

See also

Managing motherboards

Getting ready

How to do it...

How it works...

There's more...

See also

Selecting a chassis

Getting ready

How to do it...

How it works...

There's more...

Saddling up to a SAN

Getting ready

How to do it...

How it works...

There's more...

See also

Tallying up

Getting ready

How to do it...

How it works...

There's more...

Protecting your eggs

Getting ready

How to do it...

How it works...

There's more...

2. Handling and Avoiding Downtime

Introduction

Determining acceptable losses

Getting ready

How to do it...

How it works...

Configuration – getting it right the first time

Getting ready

How to do it...

How it works...

There's more...

See also

Configuration – managing scary settings

Getting ready

How to do it...

How it works...

There's more...

Distinct settings

More information

See also

Identifying important tables

Getting ready

How to do it...

How it works...

There's more...

Reset stats

Use pgstattuple

See also

Defusing cache poisoning

Getting ready

How to do it...

How it works...

See also

Exploring the magic of virtual IPs

Getting ready

How to do it...

How it works...

There's more...

Terminating rogue connections

Getting ready

How to do it...

How it works...

There's more...

Reducing contention with concurrent indexes

Getting ready

How to do it...

How it works...

There's more...

No transactions

One at a time

Danger with OLTP use

See also

Managing system migrations

Getting ready

How to do it...

How it works...

There's more...

See also

Managing software upgrades

Getting ready

How to do it...

How it works...

There's more...

See also

Mitigating the impact of hardware failure

Getting ready

How to do it...

How it works...

There's more...

Copy WAL files more easily

Add compression

Secondary delay

See also

Applying bonus kernel tweaks

Getting ready

How to do it...

How it works...

There's more...

3. Pooling Resources

Introduction

Determining connection costs and limits

Getting ready

How to do it...

How it works...

There's more...

Installing PgBouncer

Getting ready

How to do it...

How it works...

There's more...

See also

Configuring PgBouncer safely

Getting ready

How to do it...

How it works...

There's more...

What about pool_mode?

Problems with prepared statements

See also

Connecting to PgBouncer

Getting ready

How to do it...

How it works...

See also

Listing PgBouncer server connections

Getting ready

How to do it...

How it works...

There's more...

See also

Listing PgBouncer client connections

Getting ready

How to do it...

How it works...

There's more...

See also

Evaluating PgBouncer pool health

Getting ready

How to do it...

How it works...

There's more...

See also

Installing pgpool

Getting ready

How to do it...

How it works...

There's more...

See also

Configuring pgpool for master/slave mode

Getting ready

How to do it...

How it works...

There's more...

See also

Testing a write query on pgpool

Getting ready

How to do it...

How it works...

There's more...

Swapping active nodes with pgpool

Getting ready

How to do it...

How it works...

There's more...

See also

Combining the power of PgBouncer and pgpool

Getting ready

How to do it...

How it works...

There's more...

4. Troubleshooting

Introduction

Performing triage

Getting ready

How to do it...

How it works...

There's more...

Installing common statistics packages

Getting ready

How to do it...

How it works...

See also

Evaluating the current disk performance with iostat

Getting ready

How to do it...

How it works...

There's more...

See also

Tracking I/O-heavy processes with iotop

Getting ready

How to do it...

How it works...

There's more...

See also

Viewing past performance with sar

Getting ready

How to do it...

How it works...

There's more...

See also

Correlating performance with dstat

Getting ready

How to do it...

How it works...

See also

Interpreting /proc/meminfo

Getting ready

How to do it...

How it works...

There's more...

See also

Examining /proc/net/bonding/bond0

Getting ready

How to do it...

How it works...

See also

Checking the pg_stat_activity view

Getting ready

How to do it...

How it works...

There's more...

See also

Checking the pg_stat_statements view

Getting ready

How to do it...

How it works...

There's more...

Reset the stats

Catch more queries

See also

Debugging with strace

Getting ready

How to do it...

How it works...

There's more...

See also

Logging checkpoints properly

Getting ready

How to do it...

How it works...

There's more...

See also

5. Monitoring

Introduction

Figuring out what to monitor

Getting ready

How to do it...

How it works...

There's more...

Installing and configuring Nagios

Getting ready

How to do it...

How it works...

There's more...

See also

Configuring Nagios to monitor a database host

Getting ready

How to do it...

How it works...

There's more...

See also

Enhancing Nagios with check_mk

Getting ready

How to do it...

How it works...

There's more...

See also

Getting to know check_postgres

Getting ready

How to do it...

How it works...

There's more...

See also

Installing and configuring collectd

Getting ready

How to do it...

How it works...

See also

Adding a custom PostgreSQL monitor to collectd

Getting ready

How to do it...

How it works...

There's more...

See also

Installing and configuring Graphite

Getting ready

How to do it...

How it works...

See also

Adding collectd data to Graphite

Getting ready

How to do it...

How it works...

See also

Building a graph in Graphite

Getting ready

How to do it...

How it works...

There's more...

Customizing a Graphite graph

Getting ready

How to do it...

How it works...

Creating a Graphite dashboard

Getting ready

How to do it...

How it works...

There's more...

6. Replication

Introduction

Deciding what to copy

Getting ready

How to do it...

How it works...

Securing the WAL stream

Getting ready

How to do it...

How it works...

There's more...

See also

Setting up a hot standby

Getting ready

How to do it...

How it works...

See also

Upgrading to asynchronous replication

Getting ready

How to do it...

How it works...

There's more...

See also

Bulletproofing with synchronous replication

Getting ready

How to do it...

How it works...

There's more...

See also

Faking replication with pg_receivexlog

Getting ready

How to do it...

How it works...

See also

Setting up Slony

Getting ready

How to do it...

How it works...

See also

Copying a few tables with Slony

Getting ready

How to do it...

How it works...

There's more...

See also

Setting up Bucardo

Getting ready

How to do it...

How it works...

See also

Copying a few tables with Bucardo

Getting ready

How to do it...

How it works...

See also

Setting up Londiste

Getting ready

How to do it...

How it works...

See also

Copying a few tables with Londiste

Getting ready

How to do it...

How it works...

See also

7. Replication Management Tools

Introduction

Deciding when to use third-party tools

Getting ready

How to do it...

How it works...

There's more...

Installing and configuring Barman

Getting ready

How to do it...

How it works...

See also

Backing up a database with Barman

Getting ready

How to do it...

How it works...

There's more...

See also

Restoring a database with Barman

Getting ready

How to do it...

How it works...

There's more...

See also

Installing and configuring OmniPITR

Getting ready

How to do it...

How it works...

See also

Managing WAL files with OmniPITR

Getting ready

How to do it...

How it works...

There's more...

See also

Installing and configuring repmgr

Getting ready

How to do it...

How it works...

See also

Cloning a database with repmgr

Getting ready

How to do it...

How it works...

There's more...

See also

Swapping active nodes with repmgr

Getting ready

How to do it...

How it works...

There's more...

See also

Installing and configuring walctl

Getting ready

How to do it...

How it works...

See also

Cloning a database with walctl

Getting ready

How to do it...

How it works...

Managing WAL files with walctl

Getting ready

How to do it...

How it works...

There's more...

8. Advanced Stack

Introduction

Why DRBD?

Why LVM?

Why XFS?

The stack

Preparing systems for the stack

Getting ready

How to do it...

How it works...

There's more...

See also

Getting started with the Linux Volume Manager

Getting ready

How to do it...

How it works...

There's more...

See also

Adding block-level replication

Getting ready

How to do it...

How it works...

See also

Incorporating the second LVM layer

Getting ready

How to do it...

How it works...

There's more...

See also

Verifying a DRBD filesystem

Getting ready

How to do it...

How it works...

See also

Correcting a DRBD split brain

Getting ready

How to do it...

How it works...

See also

Formatting an XFS filesystem

Getting ready

How to do it...

How it works...

See also

Tweaking XFS performance

Getting ready

How to do it...

How it works...

There's more...

See also

Maintaining an XFS filesystem

Getting ready

How to do it...

How it works...

There's more...

See also

Using LVM snapshots

Getting ready

How to do it...

How it works...

See also

Switching live stack systems

Getting ready

How to do it...

How it works...

There's more...

Detaching a problematic node

Getting ready

How to do it...

How it works...

There's more...

See also

9. Cluster Control

Introduction

Before we begin...

Installing the components

Getting ready

How to do it...

How it works...

There's more...

See also

Configuring Corosync

Getting ready

How to do it...

How it works...

See also

Preparing startup services

Getting ready

How to do it...

How it works...

There's more...

See also

Starting with base options

Getting ready

How to do it...

How it works...

There's more...

See also

Adding DRBD to cluster management

Getting ready

How to do it...

How it works...

There's more...

See also

Adding LVM to cluster management

Getting ready

How to do it...

How it works...

There's more...

Adding XFS to cluster management

Getting ready

How to do it...

How it works...

Adding PostgreSQL to cluster management

Getting ready

How to do it...

How it works...

There's more...

Adding a virtual IP to hide the cluster

Getting ready

How to do it...

How it works...

Adding an e-mail alert

Getting ready

How to do it...

How it works...

Grouping associated resources

Getting ready

How to do it...

How it works...

Combining and ordering related actions

Getting ready

How to do it...

How it works...

Performing a managed resource migration

Getting ready

How to do it...

How it works...

There's more...

Using an outage to test migration

Getting ready

How to do it...

How it works...

There's more...

10. Data Distribution

Introduction

Identifying horizontal candidates

Getting ready

How to do it...

How it works...

There's more...

See also

Setting up a foreign PostgreSQL server

Getting ready

How to do it...

How it works...

There's more...

Altering foreign servers

Dropping foreign servers

See also

Mapping a remote user

Getting ready

How to do it...

How it works...

There's more...

See also

Creating a foreign table

Getting ready

How to do it...

How it works...

There's more...

See also

Using a foreign table in a query

Getting ready

How to do it...

How it works...

There's more...

Optimizing foreign table access

Getting ready

How to do it...

How it works...

There's more...

Transforming foreign tables into local tables

Getting ready

How to do it...

How it works...

There's more...

See also

Creating a scalable nextval replacement

Getting ready

How to do it...

How it works...

There's more...

Building a sharding API

Getting ready

How to do it...

How it works...

There's more...

See also

Talking to the right shard

Getting ready

How to do it...

How it works...

There's more...

Create a cache

Choose an application data to logical shard mapping

Moving a shard to another server

Getting ready

How to do it...

How it works...

There's more...

Index

PostgreSQL 9 High Availability Cookbook

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2014

Production reference: 1100714

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-84951-696-9

www.packtpub.com

Cover image by Pratyush (<[email protected]>)

Credits

Author

Shaun M. Thomas

Reviewers

Hans-Jürgen Schönig

Sheldon E. Strauch

Vasilis Ventirozos

Tomas Vondra

Acquisition Editors

Anthony Albuquerque

Harsha Bharwani

Content Development Editor

Sriram Neelakantan

Technical Editor

Tanvi Bhatt

Copy Editors

Janbal Dharmaraj

Sayanee Mukherjee

Karuna Narayanan

Project Coordinator

Kartik Vedam

Proofreaders

Maria Gould

Ameesha Green

Paul Hindle

Indexers

Hemangini Bari

Tejal Soni

Priya Subramani

Graphics

Sheetal Aute

Disha Haria

Abhinash Sahu

Production Coordinator

Melwyn D'sa

Cover Work

Melwyn D'sa

About the Author

Shaun M. Thomas has been working with PostgreSQL since late 2000. He is a frequent contributor to the PostgreSQL Performance and General mailing lists, assisting other DBAs with the knowledge he's gained over the years. In 2011 and 2012, he gave presentations at the Postgres Open conference on topics such as handling extreme throughput, high availability, server redundancy, and failover techniques. Most recently, he has contributed the Shard Manager extension and the walctl WAL management suite.

Currently, he serves as the database architect at OptionsHouse, an online options brokerage with a PostgreSQL cluster that handles almost 2 billion queries per day. Many of the techniques used in this book were developed specifically for this extreme environment.

He believes that PostgreSQL has a stupendous future ahead, and he can't wait to see the advancements subsequent versions will bring.

I'd like to thank my wife, Jennifer, for putting up with the weeks of long nights and for providing the encouragement I needed to get it all done. This book is my thank you letter to the PostgreSQL community, which helped me out of jams more times than I can count. I'd also like to thank OptionsHouse for putting me in charge of the busiest database I'd ever seen, forcing me to learn enough to keep it all running smoothly.

About the Reviewers

Hans-Jürgen Schönig is the founder and CEO of Cybertec Schönig & Schönig GmbH (www.postgresql-support.de), a company that focuses on PostgreSQL support, training, and consulting, as well as on scalable PostgreSQL solutions.

He has 15 years of experience in the field of PostgreSQL and has written several books that deal with PostgreSQL in the past couple of years.

Sheldon E. Strauch is a 20-year veteran of software consulting at companies such as IBM, Sears, Ernst & Young, and Kraft Foods. He has a Bachelor's degree in Business Administration and leverages his technical skills to improve business self-awareness. His interests include data gathering, management, and mining; maps and mapping; business intelligence; and the application of data analysis for continuous improvement. He is currently focused on the development of end-to-end data management and mining at Enova International, a financial services company located in Chicago. In his spare time, he enjoys the performing arts, particularly music, and traveling with his wife Marilyn.

Vasilis Ventirozos has been working with databases for more than a decade on mission critical applications for companies in both the telecom and lottery industries. While he has worked with a number of database technologies, he considers PostgreSQL his database of choice. He currently works at OmniTI, a full-stack IT services company focused on highly scalable web infrastructure, providing PostgreSQL-related consulting and management.

Tomas Vondra has been working with PostgreSQL since 2003; although he's been working with various other databases since then (both open source and commercial), he instantly fell in love with PostgreSQL and the wonderful community built around it.

He is currently working at GoodData, a company that operates a BI cloud platform built on PostgreSQL, as a performance specialist and is mainly responsible for tracking and improving performance. In his free time, he's usually writing PostgreSQL extensions and patches or hacking something else related to PostgreSQL.

www.PacktPub.com

Support files, eBooks, discount offers, and more

You might want to visit www.PacktPub.com for support files and downloads related to your book.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

http://PacktLib.PacktPub.com

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print and bookmark content

On demand and accessible via web browser

Free Access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

Preface

Welcome to PostgreSQL 9 High Availability Cookbook! As a database, PostgreSQL is beginning to take its place in the world of high transaction rates and very large data installations. With this comes an increasing demand for PostgreSQL to act as a critical piece of infrastructure. System outages in these environments can be spectacularly costly and demand a higher caliber of management and tooling.

It is the job of a DBA to ensure that the database is always available for application demands and clients' needs. Yet, this is extremely difficult to accomplish without the necessary skills and experience with a common operating system and PostgreSQL tools. Installing, configuring, and optimizing a PostgreSQL cluster is a tiny fraction of the process. We also need to know how to find and recognize problems, manage a swarm of logical and physical replicas, and scale to increasing demands, all while preventing or mitigating system outages.

This book is something the author wishes existed 10 years ago. Back then, there were no recipes to follow to build a fault-tolerant PostgreSQL cluster; we had to improvise. It is our aim to spare other DBAs the kind of frustration born of reinventing the wheel. We've done all the hard work, taken notes, outlined everything we've ever learned about keeping PostgreSQL available, and written it all down here.

We hope you find this book useful and relevant; it is the product of years of trial, error, testing, and a large amount of input from the PostgreSQL community.

What this book covers

Chapter 1, Hardware Planning, sets the tone by covering the part that the appropriate hardware selection plays in a successful PostgreSQL cluster of any size.

Chapter 2, Handling and Avoiding Downtime, provides safe settings and defaults for a stable cluster and explains the basic techniques for responding to mishaps.
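To give a flavor of the kind of settings such a chapter audits, here is an illustrative postgresql.conf fragment; the values are placeholders for discussion, not recommendations drawn from the book:

```ini
# postgresql.conf -- illustrative availability-related settings
listen_addresses = '*'               # which interfaces accept connections
max_connections = 200                # hard cap; beyond it, new clients are refused
checkpoint_completion_target = 0.9   # spread checkpoint I/O to avoid stalls
log_checkpoints = on                 # record checkpoint activity for diagnosis
```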

Chapter 3, Pooling Resources, presents PgBouncer and pgpool, two tools geared toward controlling PostgreSQL connections. Together, these can provide an abstraction layer to reduce the effect of outages and increase system performance.
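By way of orientation, PgBouncer's behavior is driven by an ini file. The fragment below is a generic sketch with placeholder paths and pool sizes, not the book's configuration:

```ini
; pgbouncer.ini -- illustrative fragment
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; see the pool_mode recipe for trade-offs
default_pool_size = 20
```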

Chapter 4, Troubleshooting, introduces a battery of common Unix and Linux tools and resources that can collect valuable diagnostic information. It also includes a couple of PostgreSQL views that can assist in finding database problems.
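One of the views this chapter covers, pg_stat_activity, can be queried along these lines; this is a generic sketch using the column names introduced in PostgreSQL 9.2, not a recipe from the book:

```sql
-- Show non-idle sessions, longest-running first (PostgreSQL 9.2+ columns)
SELECT pid,
       now() - query_start AS runtime,
       state,
       query
  FROM pg_stat_activity
 WHERE state <> 'idle'
 ORDER BY runtime DESC;
```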

Chapter 5, Monitoring, further increases availability by adding Nagios, check_mk, collectd, and Graphite to watch active PostgreSQL clusters. This chapter helps us stay informed and find potential problems before they happen.

Chapter 6, Replication, discusses several PostgreSQL replication scenarios and techniques for more durable data. This includes logical replication tools such as Slony, Bucardo, and Londiste.
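For orientation, a streaming hot standby in the PostgreSQL 9.x era is driven by a recovery.conf file on the replica; the host name, role, and trigger path below are placeholders:

```ini
# recovery.conf on the standby server (PostgreSQL 9.x)
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
# Optional: promote the standby when this file appears
trigger_file = '/tmp/promote_standby'
```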

Chapter 7, Replication Management Tools, brings WAL management to the forefront. It talks about integrating Barman, OmniPITR, repmgr, or walctl into PostgreSQL to further prevent data loss and control complicated multiserver clusters.

Chapter 8, Advanced Stack, explains how to use LVM, DRBD, and XFS to build a solid foundation and keep data on two servers simultaneously to prevent costly outages.

Chapter 9, Cluster Control, incorporates Pacemaker into the advanced stack to fully automate PostgreSQL server migration in the face of impending maintenance or hardware failure.

Chapter 10, Data Distribution, shows how PostgreSQL features such as foreign data wrappers and materialized views can produce a scalable cluster. Included with this chapter is a simple data-sharding API technique to reduce dependency on a single PostgreSQL server.
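The foreign data wrapper feature mentioned here is provided by the postgres_fdw contrib extension (available from PostgreSQL 9.3). A minimal sketch follows; the server name, credentials, and table layout are hypothetical:

```sql
-- Requires the postgres_fdw contrib extension (PostgreSQL 9.3+)
CREATE EXTENSION postgres_fdw;

CREATE SERVER shard1
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'sales');

CREATE USER MAPPING FOR app_user
    SERVER shard1
    OPTIONS (user 'app_user', password 'secret');

-- Local definition that proxies a table living on shard1
CREATE FOREIGN TABLE orders_remote (
    order_id  bigint,
    total     numeric
) SERVER shard1 OPTIONS (table_name 'orders');
```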

What you need for this book

This book is written for Unix systems with a focus on Linux in particular. Such servers have become increasingly popular to host databases for companies both large and small. As such, we highly recommend that you have a virtual machine or development system running a recent copy of Debian, Ubuntu, Red Hat Enterprise Linux, or a variant such as CentOS or Scientific Linux.

You will also need a copy of PostgreSQL. If your chosen Linux distribution isn't keeping the included PostgreSQL packages sufficiently up to date, the PostgreSQL website maintains binaries for most popular distributions. You can find these at http://www.postgresql.org/download/.

Users of Red Hat Enterprise Linux and its variants should refer to the following URL to add the official PostgreSQL YUM repository to important database systems: http://yum.postgresql.org/repopackages.php.

Users of Debian, Ubuntu, Mint, and other related Linux systems should refer to the PostgreSQL APT wiki page at the following URL instead: https://wiki.postgresql.org/wiki/Apt.

Be sure to include any contrib packages in your installation. They include helpful utilities and database extensions we will use in some recipes.

Users of BSD should still be able to follow along with these recipes. Some commands may require slight alterations to run properly on BSD, so be sure to understand the intent before executing them. Otherwise, all commands have been confirmed to work on BASH and recent GNU tools.

Who this book is for

This book is written for PostgreSQL DBAs who want an extremely fault-tolerant database cluster. While PostgreSQL is suitable for enterprise environments, there are a lot of tertiary details even a skilled DBA might not know. We're here to fill in those gaps.

There is a lot of material here for all levels of DBA. The primary assumption is that you are comfortable with a Unix command line and maintain at least some regular exposure to PostgreSQL as a DBA or system administrator.

If you've ever experienced a database outage, restored from a backup, or spent hours trying to repair a malfunctioning cluster, we have material that covers all these scenarios. This book holds the key to managing a robust PostgreSQL cluster environment and should be of use to anyone in charge of a critical piece of database infrastructure.

Sections

This book contains the following sections:

Getting ready

This section tells us what to expect in the recipe, and describes how to set up any software or any preliminary settings needed for the recipe.

How to do it…

This section lists the steps required to complete the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe that can make the reader more knowledgeable about it.

See also

This section provides references to other recipes or external resources relevant to the recipe.

Conventions

In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The final query is a bit more complicated since it uses a CASE statement.

A block of code is set as follows:

SELECT name, setting
FROM pg_settings
WHERE context = 'postmaster';

Any command-line input or output is written as follows:

sudo apt-get install postgresql-9.3-pgfincore

New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Click on the Dashboard link on the top menu bar.

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.

Piracy

Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors, and our ability to bring you valuable content.

Questions

You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.

Chapter 1. Hardware Planning

In this chapter, we will learn about selection and provisioning of hardware necessary to build a highly available PostgreSQL database. We will cover the following recipes in this chapter:

Planning for redundancy

Having enough IOPS

Sizing storage

Investing in a RAID

Picking a processor

Making the most of memory

Exploring nimble networking

Managing motherboards

Selecting a chassis

Saddling up to a SAN

Tallying up

Protecting your eggs

Introduction

What does high availability mean? In the context of what we're trying to build, it means we want our database to start and remain online for as long as possible. A critical component of this is the hardware that hosts the database itself. No matter how perfect a machine and its parts may be, failure or unexpected behavior along any element can result in an outage.

So how do we avoid these unwanted outages? Expect them. We must start by assuming hardware can and will fail, and at the worst possible moment. If we start with that in mind, it becomes much easier to make decisions regarding the composition of each server we are building.

Make no mistake! Much of this planning will rely on worksheets, caveats, and compromise. Some of our choices will have several expensive options, and we will have to weigh the benefits offered against our total cost outlay. We want to build something stable, which is not always easy. Depending on the size of our company, our purchasing power, and available hosting choices, we may be in for a rather complicated path to that goal.

This chapter will attempt to paint a complete picture of a highly available environment in such a way that you can pick and choose the best solution without making too many detrimental compromises. Of course, we'll offer advice to what we believe is the best overall solution, but you don't always have to take our word for it.

Note

For the purposes of this chapter, we will not cover cloud computing or other elastic allocation options. Many of the concepts we introduce can be adapted to those solutions, yet many are implementation-specific. If you want to use a cloud vendor such as Amazon or Rackspace, you will need to obtain manuals and appropriate materials for applying what you learn here.

Planning for redundancy

Redundancy means having a spare. A spare for what? Everything. Every single part, from motherboard to chassis, power supply to network cable, disk space to throughput, should have at least one piece of excess equipment or capacity available for immediate use. Let's go through as many of these as we can imagine, before we do anything that might depend on something we bought.

Getting ready

Fire up your favorite spreadsheet program; we'll be using it to keep track of all the parts that go into the server, and any capacity concerns. If you don't have one, Open Office and Libre Office are good free alternatives for building these spreadsheets. Subsequent sections will help determine most of the row contents.

How to do it...

We simply need to produce a hardware spreadsheet to track our purchase needs. We can do that with the following steps:

Create a new spreadsheet for parts and details.

Create a heading row with the following columns:

Type

Capacity

Supplier

Price

Count

Total cost

Create a new row for each type of the following components:

Chassis

CPU

Hard Drive (3.5")

Hard Drive (2.5")

Hard Drive (SSD)

Motherboard

Network Card

Power Supply

RAID Controller

RAM

SAN

In the Chassis row, under the Total cost column, enter the following formula: =D2*E2

Copy and paste the formula into the Total Cost column for all the rows we created. The end result should look something like the following screenshot:

How it works...

What we've done is prepare a spreadsheet that we can fill in with information collected from the rest of this chapter. We will have very long discussions regarding each part of the server we want to build, so we need a place to collect each decision we make along the way.

The heading row can include any other details you wish to retain about each part, but for the sake of simplicity, we stuck to the bare minimum. The same goes for the components we listed in each row. Depending on the vendor you select to supply your server, many of these decisions will already be made. It's still a good idea to include each component in case you need an emergency replacement.

The Total Cost column exists for one purpose: to itemize the cost of each part, multiplied by how many we will need to complete the server.

Tip

To make sure we account for the redundancy element of the spreadsheet, we strongly suggest inflating the number you use for the Count column, which will also increase the price automatically. This ensures we automatically include extra capacity in case something fails. If you would rather track this separately, add a Spare Count column to the spreadsheet instead.

We'll have discussions later as to failure rates of different types of hardware, which will influence how many excess components to allocate. Don't worry about that for now.

There's more...

It's also a very good idea to include a summary of our Total Cost column, so we get an aggregate cost estimate for the whole server. To do that with our spreadsheet example, keep in mind that the Total Cost column is listed as column F.

To add a Sum Total cell to your spreadsheet at row 15, column F, enter the formula =SUM(F2:F12). If you've added more columns, substitute whichever column now represents Total Cost for column F. Likewise, if you have more than 13 rows of different parts, place your summary on a row below your last part instead of row 15.
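The same arithmetic can be sketched in a few lines of Python, with hypothetical prices and counts standing in for your own figures:

```python
# Rough sketch of the spreadsheet's cost arithmetic. All prices and
# counts below are hypothetical placeholders, not recommendations.
parts = {
    "Chassis":          {"price": 700.00, "count": 2},
    "CPU":              {"price": 350.00, "count": 4},
    "RAM":              {"price": 120.00, "count": 16},
    "Hard Drive (SSD)": {"price": 200.00, "count": 10},
}

# Total cost per part mirrors the =D2*E2 formula in each row.
totals = {name: p["price"] * p["count"] for name, p in parts.items()}

# The grand total mirrors the =SUM(F2:F12) summary cell.
grand_total = sum(totals.values())
print(f"Estimated server cost: ${grand_total:,.2f}")
```

Inflating any Count value here, just as in the spreadsheet, immediately shows the price of building in redundancy.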

See also

There are a lot of spreadsheet options available. Many corporations supply a copy of Microsoft Excel. However, if this is not the case, there are many alternatives:

Google Docs: http://docs.google.com/

Open Office: http://www.openoffice.org/

Libre Office: http://www.libreoffice.org/

All of these options are free to use and popular enough that support and documentation are readily available.

Having enough IOPS

IOPS stands for Input/Output Operations Per Second. Essentially, this describes how many operations a device can perform per second before it becomes saturated. If a device is saturated, further requests must wait until the device has spare bandwidth. A server overwhelmed with requests can mean seconds, minutes, or even hours of delayed results.

Depending on application timeout settings and user patience, a device with low IOPS appears as a bottleneck that reduces both system responsiveness and the perception of quality. A database with insufficient IOPS to service queries in a timely manner is unavailable for all intents and purposes. It doesn't matter if PostgreSQL is still available and serving results in this scenario, as its availability has already suffered. We are trying to build a highly available database, and to do so, we need to build a server with enough performance to survive daily operation. In addition, we must overprovision for unexpected surges in popularity, and account for future storage and throughput needs based on monthly increases in storage utilization.
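As a rough illustration of this kind of capacity planning, the following Python sketch estimates an IOPS target from a handful of assumed inputs. Every number here is a placeholder to be replaced with figures gathered from your own application:

```python
# Back-of-the-envelope IOPS estimate. All inputs are assumptions for
# illustration only; substitute measurements from your own workload.
expected_queries_per_second = 2000
avg_io_ops_per_query = 4        # e.g., index probe plus heap fetch
surge_multiplier = 3            # headroom for unexpected popularity
monthly_growth_rate = 0.05      # 5% traffic growth per month
planning_horizon_months = 36

# Baseline demand from steady-state traffic.
base_iops = expected_queries_per_second * avg_io_ops_per_query

# Compound the monthly growth over the planning horizon, then apply
# the surge multiplier to overprovision for spikes.
future_iops = base_iops * (1 + monthly_growth_rate) ** planning_horizon_months
required_iops = future_iops * surge_multiplier

print(f"Provision for roughly {required_iops:,.0f} IOPS")
```

Even with generous guesses, an estimate like this quickly shows whether a handful of spinning disks will suffice or whether SSDs or a SAN belong on the spreadsheet.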

Getting ready

This process is more of a thought experiment. We will present some very rough estimates of IO performance for many different disk types. For each, we should increment entries in our hardware spreadsheet based on perceived need.

The main things we will need for this process are numbers. During development, applications commonly have a goal, expected client count, table count, estimated growth rates, and so on. Even if we have to guess for many of these, they will all contribute to our IOPS requirements. Have these numbers ready, even if they're simply guesses.

Tip

If the application already exists on a development or stage environment, try to get the development or QA team to run operational tests. This is a great opportunity to gather statistics before choosing potential production hardware.
