PostgreSQL 9 High Availability Cookbook
Reviews for PostgreSQL 9 High Availability Cookbook
- Rating: 5/5

PostgreSQL 9 High Availability Cookbook is a very well written book whose primary audience is experienced DBAs and system engineers who want to take their PostgreSQL skills to the next level by diving into the details of building highly available PostgreSQL-based systems. Reading this book is like drinking from a fire hose; the signal-to-noise ratio is very high. In other words, every single page is packed with important, critical, and very practical information. As a consequence, the book is not for newbies: not only do you have to know the fundamental aspects of PostgreSQL from a database administrator's point of view, but you also need a solid GNU/Linux system administration background.

One of the strongest aspects of the book is the author's principled and well-structured engineering approach to building a highly available PostgreSQL system. Instead of jumping straight to recipes to be memorized, the book teaches you basic but very important principles of capacity planning. More importantly, this planning of servers and networking is not only given as a good template; the author also explains the logic behind it, drawing attention to the reasoning behind the heuristics he uses and why certain magic numbers serve as good estimates in the absence of more case-specific information. This style is applied very consistently throughout the book: each recipe is explained so that you know why you do something in addition to how you do it.

After the first chapter on basic planning, the author moves to a set of miscellaneous topics in Chapter 2 and details some important tricks such as defusing cache poisoning, concurrent indexes, and Linux kernel tweaks. This chapter starts to reveal another valuable aspect of the book: information about an open source RDBMS such as PostgreSQL is freely available on the Internet, but depending on your needs, a particular set of information can very well be scattered over many mailing list messages, forum posts, wiki pages, and so on, and it takes a disciplined mind with a lot of field experience to put all of that scattered information into a single, consistent, logical, and easy-to-follow form.

Starting from Chapter 3, each chapter explores a single topic in a lot of practical detail, beginning with connection pooling. This chapter, like almost all of the remaining ones, has a nice feature: the author always tries to explain alternative solutions, describes their advantages and disadvantages, and where possible shows how to combine alternatives to get the best of each.

Chapters 4 and 5, namely Troubleshooting and Monitoring, can be thought of as a single chapter, because it is difficult to consider these fundamental concepts separately. These chapters are valuable not only for PostgreSQL DBAs but for any DBA or GNU/Linux system administrator in general. Troubleshooting and monitoring a highly available database could fill a book by itself, but since this book's scope is clearly defined, the author provides enough background and practical starting points in about 70 pages.

I can easily say that Chapter 6, Replication, together with Chapter 7, Replication Management Tools, forms the 'meat' of the book. Without successfully implementing and practically managing the replication of your critical database servers, it is impossible to think about building a highly available system; in other words, you need at least one replica of your database system, so that if your primary system goes down, you can very easily switch to your replica (or offload some of your less critical applications to it and relieve the stress on your primary system). These two chapters present the solid, practical information needed to achieve that goal. As in previous chapters, the author shows and explains many useful and practical tools; he also does not refrain from presenting an open source tool, walctl, that he developed as a "PostgreSQL WAL management system that pushes or pulls WAL files from a remote central storage server". I consider this another positive point for the book because it clearly indicates the author's serious time investment in PostgreSQL and its high availability configuration.

Chapter 8, Advanced Stack, is aptly named, because this chapter, together with Chapter 9, Cluster Control, forms the most advanced and complex part of the book. The author's warnings regarding the information density, and the related real-life complexity of the topics explained in these two chapters, should not be taken lightly. Indeed, many combinations of events can lead to subtle and hard-to-debug errors in clusters set up to take over from failing nodes. Creating such a highly available system with Linux-based tools such as LVM, XFS, DRBD, Pacemaker, and Corosync requires careful planning, probably experimentation in a safe virtual environment, and then disciplined execution and monitoring. Again, these chapters alone cover topics that could fill a volume and a detailed training course by themselves, and I think the author kept a good balance between depth and breadth.

The final chapter, Data Distribution, can be considered a bonus chapter that briefly shows setting up a foreign PostgreSQL server, dealing with foreign tables, managing shards, creating a scalable nextval replacement, and relevant tips and tricks.

There are not many negative sides to this very dense PostgreSQL book. A few minor points that deserve mention are its focus on the most popular Linux distributions such as Red Hat, Debian, and their derivatives (FreeBSD and other BSD admins will require slightly more effort), some obsolete networking command usage such as ifconfig instead of ip (though then again, this might be helpful for FreeBSD admins), and inconsistent presentation of command outputs (sometimes no output is shown, whereas for other commands screenshots or textual outputs are used inconsistently). One might also argue for a slight reordering of chapters for pedagogical reasons, but then again this is highly open to debate and to one's particular preferences when it comes to system and database administration.

I can recommend PostgreSQL 9 High Availability Cookbook without hesitation to PostgreSQL DBAs who want to push their skills to the next level and learn the fundamentals of building highly available PostgreSQL-based database clusters. It certainly will not be as easy as just reading a book, but it is good to know that such a good guide exists.
Book preview
PostgreSQL 9 High Availability Cookbook - Shaun M. Thomas
Table of Contents
PostgreSQL 9 High Availability Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Hardware Planning
Introduction
Planning for redundancy
Getting ready
How to do it...
How it works...
There's more...
See also
Having enough IOPS
Getting ready
How to do it...
How it works...
There's more...
A working example
Making concessions
Sizing storage
Getting ready
How to do it...
How it works...
There's more...
Real-world example
Adjusting the numbers
Incorporate the spreadsheet
Investing in a RAID
Getting ready
How to do it...
How it works...
There's more...
See also
Picking a processor
Getting ready
How to do it...
How it works...
There's more...
Hyperthreading
Turbo Boost
Power usage
See also
Making the most of memory
Getting ready
How to do it...
How it works...
There's more...
Exploring nimble networking
Getting ready
How to do it...
How it works...
There's more...
A networking example
Remember redundancy
Save the research
See also
Managing motherboards
Getting ready
How to do it...
How it works...
There's more...
See also
Selecting a chassis
Getting ready
How to do it...
How it works...
There's more...
Saddling up to a SAN
Getting ready
How to do it...
How it works...
There's more...
See also
Tallying up
Getting ready
How to do it...
How it works...
There's more...
Protecting your eggs
Getting ready
How to do it...
How it works...
There's more...
2. Handling and Avoiding Downtime
Introduction
Determining acceptable losses
Getting ready
How to do it...
How it works...
Configuration – getting it right the first time
Getting ready
How to do it...
How it works...
There's more...
See also
Configuration – managing scary settings
Getting ready
How to do it...
How it works...
There's more...
Distinct settings
More information
See also
Identifying important tables
Getting ready
How to do it...
How it works...
There's more...
Reset stats
Use pgstattuple
See also
Defusing cache poisoning
Getting ready
How to do it...
How it works...
See also
Exploring the magic of virtual IPs
Getting ready
How to do it...
How it works...
There's more...
Terminating rogue connections
Getting ready
How to do it...
How it works...
There's more...
Reducing contention with concurrent indexes
Getting ready
How to do it...
How it works...
There's more...
No transactions
One at a time
Danger with OLTP use
See also
Managing system migrations
Getting ready
How to do it...
How it works...
There's more...
See also
Managing software upgrades
Getting ready
How to do it...
How it works...
There's more...
See also
Mitigating the impact of hardware failure
Getting ready
How to do it...
How it works...
There's more...
Copy WAL files more easily
Add compression
Secondary delay
See also
Applying bonus kernel tweaks
Getting ready
How to do it...
How it works...
There's more...
3. Pooling Resources
Introduction
Determining connection costs and limits
Getting ready
How to do it...
How it works...
There's more...
Installing PgBouncer
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring PgBouncer safely
Getting ready
How to do it...
How it works...
There's more...
What about pool_mode?
Problems with prepared statements
See also
Connecting to PgBouncer
Getting ready
How to do it...
How it works...
See also
Listing PgBouncer server connections
Getting ready
How to do it...
How it works...
There's more...
See also
Listing PgBouncer client connections
Getting ready
How to do it...
How it works...
There's more...
See also
Evaluating PgBouncer pool health
Getting ready
How to do it...
How it works...
There's more...
See also
Installing pgpool
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring pgpool for master/slave mode
Getting ready
How to do it...
How it works...
There's more...
See also
Testing a write query on pgpool
Getting ready
How to do it...
How it works...
There's more...
Swapping active nodes with pgpool
Getting ready
How to do it...
How it works...
There's more...
See also
Combining the power of PgBouncer and pgpool
Getting ready
How to do it...
How it works...
There's more...
4. Troubleshooting
Introduction
Performing triage
Getting ready
How to do it...
How it works...
There's more...
Installing common statistics packages
Getting ready
How to do it...
How it works...
See also
Evaluating the current disk performance with iostat
Getting ready
How to do it...
How it works...
There's more...
See also
Tracking I/O-heavy processes with iotop
Getting ready
How to do it...
How it works...
There's more...
See also
Viewing past performance with sar
Getting ready
How to do it...
How it works...
There's more...
See also
Correlating performance with dstat
Getting ready
How to do it...
How it works...
See also
Interpreting /proc/meminfo
Getting ready
How to do it...
How it works...
There's more...
See also
Examining /proc/net/bonding/bond0
Getting ready
How to do it...
How it works...
See also
Checking the pg_stat_activity view
Getting ready
How to do it...
How it works...
There's more...
See also
Checking the pg_stat_statements view
Getting ready
How to do it...
How it works...
There's more...
Reset the stats
Catch more queries
See also
Debugging with strace
Getting ready
How to do it...
How it works...
There's more...
See also
Logging checkpoints properly
Getting ready
How to do it...
How it works...
There's more...
See also
5. Monitoring
Introduction
Figuring out what to monitor
Getting ready
How to do it...
How it works...
There's more...
Installing and configuring Nagios
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring Nagios to monitor a database host
Getting ready
How to do it...
How it works...
There's more...
See also
Enhancing Nagios with check_mk
Getting ready
How to do it...
How it works...
There's more...
See also
Getting to know check_postgres
Getting ready
How to do it...
How it works...
There's more...
See also
Installing and configuring collectd
Getting ready
How to do it...
How it works...
See also
Adding a custom PostgreSQL monitor to collectd
Getting ready
How to do it...
How it works...
There's more...
See also
Installing and configuring Graphite
Getting ready
How to do it...
How it works...
See also
Adding collectd data to Graphite
Getting ready
How to do it...
How it works...
See also
Building a graph in Graphite
Getting ready
How to do it...
How it works...
There's more...
Customizing a Graphite graph
Getting ready
How to do it...
How it works...
Creating a Graphite dashboard
Getting ready
How to do it...
How it works...
There's more...
6. Replication
Introduction
Deciding what to copy
Getting ready
How to do it...
How it works...
Securing the WAL stream
Getting ready
How to do it...
How it works...
There's more...
See also
Setting up a hot standby
Getting ready
How to do it...
How it works...
See also
Upgrading to asynchronous replication
Getting ready
How to do it...
How it works...
There's more...
See also
Bulletproofing with synchronous replication
Getting ready
How to do it...
How it works...
There's more...
See also
Faking replication with pg_receivexlog
Getting ready
How to do it...
How it works...
See also
Setting up Slony
Getting ready
How to do it...
How it works...
See also
Copying a few tables with Slony
Getting ready
How to do it...
How it works...
There's more...
See also
Setting up Bucardo
Getting ready
How to do it...
How it works...
See also
Copying a few tables with Bucardo
Getting ready
How to do it...
How it works...
See also
Setting up Londiste
Getting ready
How to do it...
How it works...
See also
Copying a few tables with Londiste
Getting ready
How to do it...
How it works...
See also
7. Replication Management Tools
Introduction
Deciding when to use third-party tools
Getting ready
How to do it...
How it works...
There's more...
Installing and configuring Barman
Getting ready
How to do it...
How it works...
See also
Backing up a database with Barman
Getting ready
How to do it...
How it works...
There's more...
See also
Restoring a database with Barman
Getting ready
How to do it...
How it works...
There's more...
See also
Installing and configuring OmniPITR
Getting ready
How to do it...
How it works...
See also
Managing WAL files with OmniPITR
Getting ready
How to do it...
How it works...
There's more...
See also
Installing and configuring repmgr
Getting ready
How to do it...
How it works...
See also
Cloning a database with repmgr
Getting ready
How to do it...
How it works...
There's more...
See also
Swapping active nodes with repmgr
Getting ready
How to do it...
How it works...
There's more...
See also
Installing and configuring walctl
Getting ready
How to do it...
How it works...
See also
Cloning a database with walctl
Getting ready
How to do it...
How it works...
Managing WAL files with walctl
Getting ready
How to do it...
How it works...
There's more...
8. Advanced Stack
Introduction
Why DRBD?
Why LVM?
Why XFS?
The stack
Preparing systems for the stack
Getting ready
How to do it...
How it works...
There's more...
See also
Getting started with the Linux Volume Manager
Getting ready
How to do it...
How it works...
There's more...
See also
Adding block-level replication
Getting ready
How to do it...
How it works...
See also
Incorporating the second LVM layer
Getting ready
How to do it...
How it works...
There's more...
See also
Verifying a DRBD filesystem
Getting ready
How to do it...
How it works...
See also
Correcting a DRBD split brain
Getting ready
How to do it...
How it works...
See also
Formatting an XFS filesystem
Getting ready
How to do it...
How it works...
See also
Tweaking XFS performance
Getting ready
How to do it...
How it works...
There's more...
See also
Maintaining an XFS filesystem
Getting ready
How to do it...
How it works...
There's more...
See also
Using LVM snapshots
Getting ready
How to do it...
How it works...
See also
Switching live stack systems
Getting ready
How to do it...
How it works...
There's more...
Detaching a problematic node
Getting ready
How to do it...
How it works...
There's more...
See also
9. Cluster Control
Introduction
Before we begin...
Installing the components
Getting ready
How to do it...
How it works...
There's more...
See also
Configuring Corosync
Getting ready
How to do it...
How it works...
See also
Preparing startup services
Getting ready
How to do it...
How it works...
There's more...
See also
Starting with base options
Getting ready
How to do it...
How it works...
There's more...
See also
Adding DRBD to cluster management
Getting ready
How to do it...
How it works...
There's more...
See also
Adding LVM to cluster management
Getting ready
How to do it...
How it works...
There's more...
Adding XFS to cluster management
Getting ready
How to do it...
How it works...
Adding PostgreSQL to cluster management
Getting ready
How to do it...
How it works...
There's more...
Adding a virtual IP to hide the cluster
Getting ready
How to do it...
How it works...
Adding an e-mail alert
Getting ready
How to do it...
How it works...
Grouping associated resources
Getting ready
How to do it...
How it works...
Combining and ordering related actions
Getting ready
How to do it...
How it works...
Performing a managed resource migration
Getting ready
How to do it...
How it works...
There's more...
Using an outage to test migration
Getting ready
How to do it...
How it works...
There's more...
10. Data Distribution
Introduction
Identifying horizontal candidates
Getting ready
How to do it...
How it works...
There's more...
See also
Setting up a foreign PostgreSQL server
Getting ready
How to do it...
How it works...
There's more...
Altering foreign servers
Dropping foreign servers
See also
Mapping a remote user
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a foreign table
Getting ready
How to do it...
How it works...
There's more...
See also
Using a foreign table in a query
Getting ready
How to do it...
How it works...
There's more...
Optimizing foreign table access
Getting ready
How to do it...
How it works...
There's more...
Transforming foreign tables into local tables
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a scalable nextval replacement
Getting ready
How to do it...
How it works...
There's more...
Building a sharding API
Getting ready
How to do it...
How it works...
There's more...
See also
Talking to the right shard
Getting ready
How to do it...
How it works...
There's more...
Create a cache
Choose an application data to logical shard mapping
Moving a shard to another server
Getting ready
How to do it...
How it works...
There's more...
Index
PostgreSQL 9 High Availability Cookbook
Copyright © 2014 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2014
Production reference: 1100714
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-84951-696-9
www.packtpub.com
Cover image by Pratyush (<[email protected]>)
Credits
Author
Shaun M. Thomas
Reviewers
Hans-Jürgen Schönig
Sheldon E. Strauch
Vasilis Ventirozos
Tomas Vondra
Acquisition Editors
Anthony Albuquerque
Harsha Bharwani
Content Development Editor
Sriram Neelakantan
Technical Editor
Tanvi Bhatt
Copy Editors
Janbal Dharmaraj
Sayanee Mukherjee
Karuna Narayanan
Project Coordinator
Kartik Vedam
Proofreaders
Maria Gould
Ameesha Green
Paul Hindle
Indexers
Hemangini Bari
Tejal Soni
Priya Subramani
Graphics
Sheetal Aute
Disha Haria
Abhinash Sahu
Production Coordinator
Melwyn D'sa
Cover Work
Melwyn D'sa
About the Author
Shaun M. Thomas has been working with PostgreSQL since late 2000. He is a frequent contributor to the PostgreSQL Performance and General mailing lists, assisting other DBAs with the knowledge he's gained over the years. In 2011 and 2012, he gave presentations at the Postgres Open conference on topics such as handling extreme throughput, high availability, server redundancy, and failover techniques. Most recently, he has contributed the Shard Manager extension and the walctl WAL management suite.
Currently, he serves as the database architect at OptionsHouse, an online options brokerage with a PostgreSQL cluster that handles almost 2 billion queries per day. Many of the techniques used in this book were developed specifically for this extreme environment.
He believes that PostgreSQL has a stupendous future ahead, and he can't wait to see the advancements subsequent versions will bring.
I'd like to thank my wife, Jennifer, for putting up with the weeks of long nights and for providing the encouragement I needed to get it all done. This book is my thank you letter to the PostgreSQL community, which helped me out of jams more times than I can count. I'd also like to thank OptionsHouse for putting me in charge of the busiest database I'd ever seen, forcing me to learn enough to keep it all running smoothly.
About the Reviewers
Hans-Jürgen Schönig is the founder and CEO of Cybertec Schönig & Schönig GmbH (www.postgresql-support.de), a company that focuses on PostgreSQL support, training, and consulting, as well as on scalable PostgreSQL solutions.
He has 15 years of experience in the field of PostgreSQL and, in the past couple of years, has written several books that deal with PostgreSQL.
Sheldon E. Strauch is a 20-year veteran of software consulting at companies such as IBM, Sears, Ernst & Young, and Kraft Foods. He has a Bachelor's degree in Business Administration and leverages his technical skills to improve business self-awareness. His interests include data gathering, management, and mining; maps and mapping; business intelligence; and the application of data analysis for continuous improvement. He is currently focused on the development of end-to-end data management and mining at Enova International, a financial services company located in Chicago. In his spare time, he enjoys the performing arts, particularly music, and traveling with his wife Marilyn.
Vasilis Ventirozos has been working with databases for more than a decade on mission critical applications for companies in both the telecom and lottery industries. While he has worked with a number of database technologies, he considers PostgreSQL his database of choice. He currently works at OmniTI, a full-stack IT services company focused on highly scalable web infrastructure, providing PostgreSQL-related consulting and management.
Tomas Vondra has been working with PostgreSQL since 2003; although he's been working with various other databases since then (both open source and commercial), he instantly fell in love with PostgreSQL and the wonderful community built around it.
He is currently working at GoodData, a company that operates a BI cloud platform built on PostgreSQL, as a performance specialist, and is mainly responsible for tracking and improving performance. In his free time, he's usually writing PostgreSQL extensions and patches or hacking something else related to PostgreSQL.
www.PacktPub.com
Support files, eBooks, discount offers, and more
You might want to visit www.PacktPub.com for support files and downloads related to your book.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
http://PacktLib.PacktPub.com
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can access, read and search across Packt's entire library of books.
Why Subscribe?
Fully searchable across every book published by Packt
Copy and paste, print and bookmark content
On demand and accessible via web browser
Free Access for Packt account holders
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.
Preface
Welcome to PostgreSQL 9 High Availability Cookbook! As a database, PostgreSQL is beginning to take its place in the world of high transaction rates and very large data installations. With this comes an increasing demand for PostgreSQL to act as a critical piece of infrastructure. System outages in these environments can be spectacularly costly and demand a higher caliber of management and tooling.
It is the job of a DBA to ensure that the database is always available for application demands and clients' needs. Yet, this is extremely difficult to accomplish without the necessary skills and experience with a common operating system and PostgreSQL tools. Installing, configuring, and optimizing a PostgreSQL cluster is a tiny fraction of the process. We also need to know how to find and recognize problems, manage a swarm of logical and physical replicas, and scale to increasing demands, all while preventing or mitigating system outages.
This book is something the author wishes existed 10 years ago. Back then, there were no recipes to follow to build a fault-tolerant PostgreSQL cluster; we had to improvise. It is our aim to prevent other DBAs from experiencing the kind of frustration born of reinventing the wheel. We've done all the hard work, taken notes, outlined everything we've ever learned about keeping PostgreSQL available, and written it all down here.
We hope you find this book useful and relevant; it is the product of years of trial, error, testing, and a large amount of input from the PostgreSQL community.
What this book covers
Chapter 1, Hardware Planning, sets the tone by covering the part that the appropriate hardware selection plays in a successful PostgreSQL cluster of any size.
Chapter 2, Handling and Avoiding Downtime, provides safe settings and defaults for a stable cluster and explains the basic techniques for responding to mishaps.
Chapter 3, Pooling Resources, presents PgBouncer and pgpool, two tools geared toward controlling PostgreSQL connections. Together, these can provide an abstraction layer to reduce the effect of outages and increase system performance.
Chapter 4, Troubleshooting, introduces a battery of common Unix and Linux tools and resources that can collect valuable diagnostic information. It also includes a couple of PostgreSQL views that can assist in finding database problems.
Chapter 5, Monitoring, further increases availability by adding Nagios, check_mk, collectd, and Graphite to watch active PostgreSQL clusters. This chapter helps us stay informed, and find potential problems before they happen.
Chapter 6, Replication, discusses several PostgreSQL replication scenarios and techniques for more durable data. This includes logical replication tools such as Slony, Bucardo, and Londiste.
Chapter 7, Replication Management Tools, brings WAL management to the forefront. It talks about integrating Barman, OmniPITR, repmgr, or walctl into PostgreSQL to further prevent data loss and control complicated multiserver clusters.
Chapter 8, Advanced Stack, explains how to use LVM, DRBD, and XFS to build a solid foundation and keep data on two servers simultaneously to prevent costly outages.
Chapter 9, Cluster Control, incorporates Pacemaker into the advanced stack. Fully automate PostgreSQL server migrations in case of impending maintenance or hardware failure.
Chapter 10, Data Distribution, shows how PostgreSQL features such as foreign data wrappers and materialized views can produce a scalable cluster. Included with this chapter is a simple data-sharding API technique to reduce dependency on a single PostgreSQL server.
What you need for this book
This book is written for Unix systems with a focus on Linux in particular. Such servers have become increasingly popular to host databases for companies both large and small. As such, we highly recommend that you have a virtual machine or development system running a recent copy of Debian, Ubuntu, Red Hat Enterprise Linux, or a variant such as CentOS or Scientific Linux.
You will also need a copy of PostgreSQL. If your chosen Linux distribution isn't keeping the included PostgreSQL packages sufficiently up to date, the PostgreSQL website maintains binaries for most popular distributions. You can find these at http://www.postgresql.org/download/.
Users of Red Hat Enterprise Linux and its variants should refer to the following URL to add the official PostgreSQL YUM repository to important database systems: http://yum.postgresql.org/repopackages.php.
Users of Debian, Ubuntu, Mint, and other related Linux systems should refer to the PostgreSQL APT wiki page at the following URL instead: https://wiki.postgresql.org/wiki/Apt.
Be sure to include any contrib packages in your installation. They include helpful utilities and database extensions we will use in some recipes.
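For example, the contrib packages can usually be installed alongside the server itself. The exact package names below are assumptions that vary by distribution and PostgreSQL version, so adjust them to match your installed release:

```shell
# Debian, Ubuntu, and Mint (assuming PostgreSQL 9.3 from the APT repository):
sudo apt-get install postgresql-contrib-9.3

# Red Hat Enterprise Linux, CentOS, and Scientific Linux,
# after adding the PostgreSQL YUM repository:
sudo yum install postgresql93-contrib
```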
Users of BSD should still be able to follow along with these recipes. Some commands may require slight alterations to run properly on BSD, so be sure to understand the intent before executing them. Otherwise, all commands have been confirmed to work with Bash and recent GNU tools.
Who this book is for
This book is written for PostgreSQL DBAs who want an extremely fault-tolerant database cluster. While PostgreSQL is suitable for enterprise environments, there are a lot of tertiary details even a skilled DBA might not know. We're here to fill in those gaps.
There is a lot of material here for all levels of DBA. The primary assumption is that you are comfortable with a Unix command line and maintain at least some regular exposure to PostgreSQL as a DBA or system administrator.
If you've ever experienced a database outage, restored from a backup, or spent hours trying to repair a malfunctioning cluster, we have material that covers all these scenarios. This book holds the key to managing a robust PostgreSQL cluster environment and should be of use to anyone in charge of a critical piece of database infrastructure.
Sections
This book contains the following sections:
Getting ready
This section tells us what to expect in the recipe, and describes how to set up any software or any preliminary settings needed for the recipe.
How to do it…
This section contains the steps required to follow the recipe.
How it works…
This section usually consists of a detailed explanation of what happened in the previous section.
There's more…
It consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.
See also
This section may contain references to other recipes or external resources related to the recipe.
Conventions
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: The final query is a bit more complicated since it uses a CASE statement.
A block of code is set as follows:
SELECT name, setting
FROM pg_settings
WHERE context = 'postmaster';
Any command-line input or output is written as follows:
sudo apt-get install postgresql-9.3-pgfincore
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: Click on the Dashboard link on the top menu bar.
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title via the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
Questions
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
Chapter 1. Hardware Planning
In this chapter, we will learn about selection and provisioning of hardware necessary to build a highly available PostgreSQL database. We will cover the following recipes in this chapter:
Planning for redundancy
Having enough IOPS
Sizing storage
Investing in a RAID
Picking a processor
Making the most of memory
Exploring nimble networking
Managing motherboards
Selecting a chassis
Saddling up to a SAN
Tallying up
Protecting your eggs
Introduction
What does high availability mean? In the context of what we're trying to build, it means we want our database to start and remain online for as long as possible. A critical component of this is the hardware that hosts the database itself. No matter how perfect a machine and its parts may be, failure or unexpected behavior along any element can result in an outage.
So how do we avoid these unwanted outages? Expect them. We must start by assuming hardware can and will fail, and at the worst possible moment. If we start with that in mind, it becomes much easier to make decisions regarding the composition of each server we are building.
Make no mistake! Much of this planning will rely on worksheets, caveats, and compromise. Some of our choices will have several expensive options, and we will have to weigh the benefits offered against our total cost outlay. We want to build something stable, which is not always easy. Depending on the size of our company, our purchasing power, and available hosting choices, we may be in for a rather complicated path to that goal.
This chapter will attempt to paint a complete picture of a highly available environment in such a way that you can pick and choose the best solution without making too many detrimental compromises. Of course, we'll offer advice on what we believe is the best overall solution, but you don't always have to take our word for it.
Note
For the purposes of this chapter, we will not cover cloud computing or other elastic allocation options. Many of the concepts we introduce can be adapted to those solutions, yet many are implementation-specific. If you want to use a cloud vendor such as Amazon or Rackspace, you will need to obtain manuals and appropriate materials for applying what you learn here.
Planning for redundancy
Redundancy means having a spare. A spare for what? Everything. Every single part, from motherboard to chassis, power supply to network cable, disk space to throughput, should have at least one piece of excess equipment or capacity available for immediate use. Let's go through as many of these as we can imagine, before we do anything that might depend on something we bought.
Getting ready
Fire up your favorite spreadsheet program; we'll be using it to keep track of all the parts that go into the server, and any capacity concerns. If you don't have one, Open Office and Libre Office are good free alternatives for building these spreadsheets. Subsequent sections will help determine most of the row contents.
How to do it...
We simply need to produce a hardware spreadsheet to track our purchase needs. We can do that with the following steps:
Create a new spreadsheet for parts and details.
Create a heading row with the following columns:
Type
Capacity
Supplier
Price
Count
Total cost
Create a new row for each type of the following components:
Chassis
CPU
Hard Drive (3.5")
Hard Drive (2.5")
Hard Drive (SSD)
Motherboard
Network Card
Power Supply
RAID Controller
RAM
SAN
In the Chassis row, under the Total cost column, enter the following formula: =D2*E2
Copy and paste the formula into the Total Cost column for all the rows we created. The end result should look something like the following screenshot:
How it works...
What we've done is prepare a spreadsheet that we can fill in with information collected from the rest of this chapter. We will have very long discussions regarding each part of the server we want to build, so we need a place to collect each decision we make along the way.
The heading row can include any other details you wish to retain about each part, but for the sake of simplicity, we stuck to the bare minimum. This also goes for the parts we chose for each row. Depending on the vendor you select to supply your server, many of these decisions will already be made. It's still a good idea to include each component in case you need an emergency replacement.
The Total Cost column exists for one purpose: to itemize the cost of each part, multiplied by how many we will need to complete the server.
Tip
To make sure we account for the redundancy element of the spreadsheet, we strongly suggest inflating the number you use for the Count column, which will also increase the price automatically. This ensures we automatically include extra capacity in case something fails. If you would rather track this separately, add a Spare Count column to the spreadsheet instead.
We'll have discussions later as to failure rates of different types of hardware, which will influence how many excess components to allocate. Don't worry about that for now.
There's more...
It's also a very good idea to include a summary for all of our Total Cost columns, so we get an aggregate cost estimate for the whole server. To do that with our spreadsheet example, keep in mind that the Total Cost column is listed as column F.
To add a sum total to your spreadsheet at row 15, column F, enter the formula =SUM(F2:F12). If you've added more columns, substitute whichever column now represents the Total Cost for column F. Likewise, if you have more than 13 rows of different parts, place your summary price on a row below row 15.
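The arithmetic the spreadsheet performs can be sketched in a few lines of Python. The parts, prices, and counts below are purely illustrative placeholders, not recommendations:

```python
# Hypothetical parts list: (type, price, count). All values are examples only.
parts = [
    ("Chassis", 800.00, 2),
    ("CPU", 350.00, 4),
    ("RAM", 120.00, 16),
]

# Total cost per part mirrors the =D2*E2 spreadsheet formula (Price * Count).
totals = {name: price * count for name, price, count in parts}

# The aggregate mirrors the =SUM(F2:F12) summary formula.
grand_total = sum(totals.values())

for name, cost in totals.items():
    print(f"{name}: {cost:.2f}")
print(f"Sum total: {grand_total:.2f}")
```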
See also
There are a lot of spreadsheet options available. Many corporations supply a copy of Microsoft Excel. However, if this is not the case, there are many alternatives as follows:
Google Docs: http://docs.google.com/
Open Office: http://www.openoffice.org/
Libre Office: http://www.libreoffice.org/
All of these options are free to use and popular enough that support and documentation are readily available.
Having enough IOPS
IOPS stands for Input/Output Operations Per Second. Essentially, this describes how many operations a device can perform per second before it should be considered saturated. If a device is saturated, further requests must wait until the device has spare bandwidth. On a server overwhelmed with requests, this waiting can amount to seconds, minutes, or even hours of delayed results.
Depending on application timeout settings and user patience, a device with low IOPS appears as a bottleneck that reduces both system responsiveness and the perception of quality. A database with insufficient IOPS to service queries in a timely manner is unavailable for all intents and purposes. It doesn't matter if PostgreSQL is still available and serving results in this scenario, as its availability has already suffered. We are trying to build a highly available database, and to do so, we need to build a server with enough performance to survive daily operation. In addition, we must overprovision for unexpected surges in popularity, and account for future storage and throughput needs based on monthly increases in storage utilization.
Getting ready
This process is more of a thought experiment. We will present some very rough estimates of IO performance for many different disk types. For each, we should increment entries in our hardware spreadsheet based on perceived need.
The main things we will need for this process are numbers. During development, applications commonly have a goal, expected client count, table count, estimated growth rates, and so on. Even if we have to guess for many of these, they will all contribute to our IOPS requirements. Have these numbers ready, even if they're simply guesses.
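To illustrate how such guesses can combine into a rough requirement, here is a back-of-the-envelope sketch in Python. Every input value below is a hypothetical placeholder; substitute your own estimates:

```python
# Hypothetical inputs; replace these guesses with your own numbers.
expected_tps = 500       # peak transactions per second
ios_per_tx = 4           # average I/O operations per transaction
surge_multiplier = 2     # headroom for unexpected surges in popularity
annual_growth = 0.25     # estimated yearly growth in activity
years = 3                # planning horizon

# Baseline demand, then overprovision for surges and future growth.
base_iops = expected_tps * ios_per_tx
required_iops = base_iops * surge_multiplier * (1 + annual_growth) ** years
print(f"Provision for roughly {required_iops:.0f} IOPS")
```

Even crude estimates like this give the hardware spreadsheet a concrete target to compare against the per-device figures discussed next.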
Tip
If the application already exists on a development or stage environment, try to get the development or QA team to run operational tests. This is a great opportunity to gather statistics before choosing potential production