Discover millions of ebooks, audiobooks, and so much more with a free trial

Only $9.99/month after trial. Cancel anytime.

Mastering PostgreSQL 9.6
Mastering PostgreSQL 9.6
Mastering PostgreSQL 9.6
Ebook600 pages5 hours

Mastering PostgreSQL 9.6

Rating: 0 out of 5 stars

()

Read preview

About this ebook

About This Book
  • Your one-stop guide to mastering the advanced concepts in PostgreSQL with ease
  • Master query optimization, replication, and high availability with PostgreSQL
  • Extend the functionalities of PostgreSQL to suit your organizational needs with minimum effort
  • Who This Book Is For

    If you are a PostgreSQL data architect or an administrator who wants to understand how to implement advanced functionalities and master complex administrative tasks with PostgreSQL, then this book is perfect for you. Prior experience of administrating a PostgreSQL database and a working knowledge of SQL is required to make the best use of this book.

LanguageEnglish
Release dateMay 30, 2017
ISBN9781783555369
Mastering PostgreSQL 9.6
Author

Hans-Jürgen Schönig

Hans-Jurgen Schonig has 15 years of experience with PostgreSQL.He is the CEO of a PostgreSQL consulting and support company called "Cybertec Schonig & Schonig GmbH" (www.postgresql-support.de),which has successfully served countless customers around the globe. Before founding Cybertec Schonig & Schonig GmbH in the year 2000,he worked as database developer at a private research company focusing on the Austrian labor market where he was primarily focusing on data mining and forecast models. He has written several books dealing with PostgreSQL already.

Read more from Hans Jürgen Schönig

Related to Mastering PostgreSQL 9.6

Related ebooks

Databases For You

View More

Related articles

Reviews for Mastering PostgreSQL 9.6

Rating: 0 out of 5 stars
0 ratings

0 ratings0 reviews

What did you think?

Tap to rate

Review must be at least 10 words

    Book preview

    Mastering PostgreSQL 9.6 - Hans-Jürgen Schönig

    Title Page

    Mastering PostgreSQL 9.6

    A comprehensive guide for PostgreSQL 9.6 developers and administrators

    Hans-Jürgen Schönig

    BIRMINGHAM - MUMBAI

    Copyright

    Mastering PostgreSQL 9.6

    Copyright © 2017 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: May 2017

    Production reference: 1250517

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham 

    B3 2PB, UK.

    ISBN 978-1-78355-535-2

    www.packtpub.com

    Credits

    About the Author

    Hans-Jürgen Schönig has 18 years of experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called Cybertec Schönig & Schönig GmbH (www.postgresql-support.de). It has successfully served countless customers around the globe.

    Before founding Cybertec Schönig & Schönig GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.

    About the Reviewer

    Shaun Thomas has been working with PostgreSQL since late 2000. From 2011 and beyond, he's been a frequent presenter at the PostgresOpen conference on topics such as handling extreme throughput, high availability, monitoring, architecture, and automation. He contributed a few PostgreSQL extensions, as well as a tool for administering massive database clusters. On occasion, he's even been known to guest lecture at the local university. His goal is to help the community make PostgreSQL a bigger, better database for everyone to enjoy.

    www.PacktPub.com

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.comand as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www.packtpub.com/mapt

    Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

    Why subscribe?

    Fully searchable across every book published by Packt

    Copy and paste, print, and bookmark content

    On demand and accessible via a web browser

    Customer Feedback

    Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1783555351.

    If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

    Table of Contents

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Errata

    Piracy

    Questions

    PostgreSQL Overview

    What is new in PostgreSQL 9.6?

    Understanding new database administration functions

    Killing idle sessions

    Finding more detailed information in pg_stat_activity

    Tracking vaccum progress

    Improving vacuum speed

    Digging into new SQL and developer-related functions

    Using new backup and replication functionality

    Streamlining wal_level and monitoring

    Using multiple synchronous standby servers

    Understanding performance-related features

    Improving relation extensions

    Checkpoint sorting and kernel interaction

    Using more advanced foreign data wrappers

    Introducing parallel queries

    Adding snapshot too old

    Summary

    Understanding Transactions and Locking

    Working with PostgreSQL transactions

    Handling errors inside a transaction

    Making use of savepoints

    Transactional DDLs

    Understanding basic locking

    Avoiding typical mistakes and explicit locking

    Considering alternative solutions

    Making use of FOR SHARE and FOR UPDATE

    Understanding transaction isolation levels

    Considering SSI transactions

    Observing deadlocks and similar issues

    Utilizing advisory locks

    Optimizing storage and managing cleanup

    Configuring VACUUM and autovacuum

    Digging into transaction wraparound-related issues

    A word on VACUUM FULL

    Watching VACUUM at work

    Making use of snapshot too old

    Summary

    Making Use of Indexes

    Understanding simple queries and the cost model

    Making use of EXPLAIN

    Digging into the PostgreSQL cost model

    Deploying simple indexes

    Making use of sorted output

    Using more than one index at a time

    Using bitmap scans effectively

    Using indexes in an intelligent way

    Improving speed using clustered tables

    Clustering tables

    Making use of index only scans

    Understanding additional B-tree features

    Combined indexes

    Adding functional indexes

    Reducing space consumption

    Adding data while indexing

    Introducing operator classes

    Hacking up an operator class for a B-tree

    Creating new operators

    Creating operator classes

    Testing custom operator classes

    Understanding PostgreSQL index types

    Hash indexes

    GiST indexes

    Understanding how GiST works

    Extending GiST

    GIN indexes

    Extending GIN

    SP-GiST indexes

    BRIN indexes

    Extending BRIN indexes

    Adding additional indexes

    Achieving better answers with fuzzy searching

    Taking advantage of pg_trgm

    Speed up LIKE queries

    Handling regular expressions

    Understanding full-text search - FTS

    Comparing strings

    Defining GIN indexes

    Debugging your search

    Gathering word statistics

    Taking advantage of exclusion operators

    Summary

    Handling Advanced SQL

    Introducing grouping sets

    Loading some sample data

    Applying grouping sets

    Investigating performance

    Combining grouping sets with the FILTER clause

    Making use of ordered sets

    Understanding hypothetical aggregates

    Utilizing windowing functions and analytics

    Partitioning data

    Ordering data inside a window

    Using sliding windows

    Abstracting window clauses

    Making use of onboard windowing functions

    rank and dense_rank functions

    ntile() function

    lead() and lag() functions

    first_value(), nth_value(), and last_value() functions

    row_number() function

    Writing your own aggregates

    Creating simple aggregates

    Adding support for parallel queries

    Improving efficiency

    Writing hypothetical aggregates

    Summary

    Log Files and System Statistics

    Gathering runtime statistics

    Working with PostgreSQL system views

    Checking live traffic

    Inspecting databases

    Inspecting tables

    Making sense of pg_stat_user_tables

    Digging into indexes

    Tracking the background worker

    Tracking, archiving, and streaming

    Checking SSL connections

    Inspecting transactions in real time

    Tracking vacuum progress

    Using pg_stat_statements

    Creating log files

    Configuring postgresql.conf file

    Defining log destination and rotation

    Configuring syslog

    Logging slow queries

    Defining what and how to log

    Summary

    Optimizing Queries for Good Performance

    Learning what the optimizer does

    Optimizations by example

    Evaluating join options

    Nested loops

    Hash joins

    Merge joins

    Applying transformations

    Inlining the view

    Flattening subselects

    Applying equality constraints

    Exhaustive searching

    Trying it all out

    Making the process fail

    Constant folding

    Understanding function inlining

    Join pruning

    Speedup set operations

    Understanding execution plans

    Approaching plans systematically

    Making EXPLAIN more verbose

    Spotting problems

    Spotting changes in runtime

    Inspecting estimates

    Inspecting buffer usage

    Fixing high buffer usage

    Understanding and fixing joins

    Getting joins right

    Processing outer joins

    Understanding the join_collapse_limit variable

    Enabling and disabling optimizer settings

    Understanding genetic query optimization

    Partitioning data

    Creating partitions

    Applying table constraints

    Modifying inherited structures

    Moving tables in and out of partitioned structures

    Cleaning up data

    Adjusting parameters for good query performance

    Speeding up sorting

    Speedup administrative tasks

    Summary

    Writing Stored Procedures

    Understanding stored procedure languages

    The anatomy of a stored procedure

    Introducing dollar quoting

    Making use of anonymous code blocks

    Using functions and transactions

    Understanding various stored procedure languages

    Introducing PL/pgSQL

    Handling quoting

    Managing scopes

    Understanding advanced error handling

    Making use of GET DIAGNOSTICS

    Using cursors to fetch data in chunks

    Utilizing composite types

    Writing triggers in PL/pgSQL

    Introducing PL/Perl

    Using PL/Perl for datatype abstraction

    Deciding between PL/Perl and PL/PerlU

    Making use of the SPI interface

    Using SPI for set returning functions

    Escaping in PL/Perl and support functions

    Sharing data across function calls

    Writing triggers in Perl

    Introducing PL/Python

    Writing simple PL/Python code

    Using the SPI interface

    Handling errors

    Improving stored procedure performance

    Reducing the number of function calls

    Using cached plans

    Assigning costs to functions

    Using stored procedures

    Summary

    Managing PostgreSQL Security

    Managing network security

    Understanding bind addresses and connections

    Inspecting connections and performance

    Living in a world without TCP

    Managing pg_hba.conf

    Handling SSL

    Handling instance-level security

    Creating and modifying users

    Defining database-level security

    Adjusting schema-level permissions

    Working with tables

    Handling column-level security

    Configuring default privileges

    Digging into row-level security - RLS

    Inspecting permissions

    Reassigning objects and dropping users

    Summary

    Handling Backup and Recovery

    Performing simple dumps

    Running pg_dump

    Passing passwords and connection information

    Using environment variables

    Making use of .pgpass

    Using service files

    Extracting subsets of data

    Handling various data formats

    Replaying backups

    Handling global data

    Summary

    Making Sense of Backups and Replication

    Understanding the transaction log

    Looking at the transaction log

    Understanding checkpoints

    Optimizing the transaction log

    Transaction log archiving and recovery

    Configuring for archiving

    Confguring the pg_hba.conf file

    Creating base backups

    Reducing the bandwidth of a backup

    Mapping tablespaces

    Using different formats

    Testing transaction log archiving

    Replaying the transaction log

    Finding the right timestamp

    Cleaning up the transaction log archive

    Setting up asynchronous replication

    Performing a basic setup

    Improving security

    Halting and resuming replication

    Checking replication to ensure availability

    Performing failovers and understanding timelines

    Managing conflicts

    Making replication more reliable

    Upgrading to synchronous replication

    Adjusting durability

    Making use of replication slots

    Handling physical replication slots

    Handling logical replication slots

    Use cases of logical slots

    Summary

    Deciding on Useful Extensions

    Understanding how extensions work

    Checking for available extensions

    Making use of contrib modules

    Using the adminpack

    Applying bloom filters

    Deploying btree_gist and btree_gin

    Dblink - consider phasing out

    Fetching files with file_fdw

    Inspecting storage using pageinspect

    Investigating caching with pg_buffercache

    Encrypting data with pgcrypto

    Prewarming caches with pg_prewarm

    Inspecting performance with pg_stat_statements

    Inspecting storage with pgstattuple

    Fuzzy searches with pg_trgm

    Connecting to remote servers using postgres_fdw

    Handling mistakes and typos

    Other useful extensions

    Summary

    Troubleshooting PostgreSQL

    Approaching an unknown database

    Inspecting pg_stat_activity

    Querying pg_stat_activity

    Treating Hibernate statements

    Figuring out where queries come from

    Checking for slow queries

    Inspecting individual queries

    Digging deeper with perf

    Inspecting the log

    Checking for missing indexes

    Checking for memory and I/O

    Understanding noteworthy error scenarios

    Facing clog corruption

    Understanding checkpoint messages

    Managing corrupted data pages

    Careless connection management

    Fighting table bloat

    Summary

    Migrating to PostgreSQL

    Migrating SQL statements to PostgreSQL

    Using lateral joins

    Supporting lateral

    Using grouping sets

    Supporting grouping sets

    Using WITH clause - common table expressions

    Supporting WITH clause

    Using WITH RECURSIVE clause

    Supporting WITH RECURSIVE clause

    Using FILTER clause

    Supporting FILTER clause

    Using windowing functions

    Supporting windowing and analytics

    Using ordered sets - WITHIN GROUP clause

    Supporting WITHIN GROUP clause

    Using TABLESAMPLE clause

    Supporting TABLESAMPLE clause

    Using limit/offset

    Supporting FETCH FIRST clause

    Using OFFSET

    Supporting OFFSET clause

    Using temporal tables

    Supporting temporal tables

    Matching patterns in time series

    Moving from Oracle to PostgreSQL

    Using the oracle_fdw extension to move data

    Using ora2pg to migrate from Oracle

    Common pitfalls

    Moving from MySQL or MariaDB to PostgreSQL

    Handling data in MySQL and MariaDB

    Changing column definitions

    Handling null values

    Expecting problems

    Migrating data and schema

    Using pg_chameleon

    Using foreign data wrappers

    Summary

    Preface

    PostgreSQL is an open source database management tool used for handling large datasets (big data) and as a JSON document database. It also has applications in the software and web domains. This book will enable you to build better PostgreSQL applications and administer databases more efficiently. 

    What this book covers

    Chapter 1, PostgreSQL Overview, will give you an overview of PostgreSQL and its features. You will learn about new stuff and new functionality available in PostgreSQL.

    Chapter 2, Understanding Transactions and Locking, will cover one of the most important aspects of any database system. Proper database work is usually not possible without the existence of transactions, and understanding transactions and locking is vital to performance as well as professional work.

    Chapter 3, Making Use of Indexes, covers everything you need to know about indexes. Indexes are key to performance and are therefore an important cornerstone if you want good user experience and high throughput. All important aspects of indexing will be covered.

    Chapter 4, Handling Advanced SQL, will introduce some of the most important concepts of modern SQL. You will learn about windowing functions as well as other important, more modern, elements of SQL.

    Chapter 5, Log Files and System Statistics, will guide you through more administrative tasks, such as log file management and monitoring. You will learn how to inspect your servers and extract runtime information from PostgreSQL.

    Chapter 6, Optimizing for Good Query Performance, will tell you everything you need to know about good PostgreSQL performance. The chapter will cover SQL tuning as well as information about memory management.

    Chapter 7, Writing Stored Procedures, teaches you some more advanced topics related to server-side code. The most important server-side programming languages are covered and important aspects are pointed out.

    Chapter 8, Managing PostgreSQL Security, has been designed to help you improve the security of your server. The chapter features everything from user management to row-level security. Information about encryption is also included.

    Chapter 9, Handling Backup and Recovery, is all about backups and data recovery. You will learn to backup your data and it will enable you to restore things in case of disaster.

    Chapter 10, Making Sense of Backups and Replication, is all about redundancy. You will learn to asynchronously and synchronously replicate PostgreSQL database systems. All modern features are covered as extensively as possible.

    Chapter 11, Deciding on Useful Extensions, describes widely used modules that add additional functionality to PostgreSQL. You will learn about the most common extensions.

    Chapter 12, Troubleshooting PostgreSQL, offers a systematic approach to fixing problems in PostgreSQL. It will enable you to spot common problems and approach them in an organized way.

    Chapter 13, Migrating to PostgreSQL, is the final chapter of this book and shows you the way from commercial databases to PostgreSQL. The most important databases migrated these days will be covered.

    What you need for this book

    This book has been written for a broad audience. In order to follow the examples presented in this book, it makes sense to have at least some experience with SQL and maybe even PostgreSQL in general (although this is not a hard requirement). In general, it is a good idea to be familiar with the Unix command line.

    Who this book is for

    This book has explicitly been written for people who want to know more about PostgreSQL and who are not satisfied with basic information. The aim is to write a book that goes a bit deeper and explains the most important stuff in a clear and easy-to-understand way.

    Conventions

    In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

    Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: In this case, the \timing command will tell psql to show the runtime of a query.

    Any command-line input or output is written as follows:

    New terms and important words are shown in bold.

    Warnings or important notes appear in a box like this.

    Tips and tricks appear like this.

    Reader feedback

    Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

    To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.

    If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

    Customer support

    Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

    Errata

    Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books-maybe a mistake in the text or the code-we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

    To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

    Piracy

    Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

    Please contact us at [email protected] with a link to the suspected pirated material.

    We appreciate your help in protecting our authors and our ability to bring you valuable content.

    Questions

    If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

    PostgreSQL Overview

    PostgreSQL is one of the world's most advanced open source database systems and it has many features widely used by developers and system administrators alike. In this book, many of those cool features will be covered and discussed in great detail.

    In this chapter, you will be introduced to PostgreSQL and the cool new features available in PostgreSQL 9.6 and beyond. All relevant new functionality will be covered in detail. Given the sheer number of changes made to the code and given the size of the PostgreSQL project, this list of features is of course not complete, so I tried to focus on the most important aspects relevant to most people.

    The features outlined in this chapter will be split into the following categories:

    Database administration

    SQL and developer-related

    Backup, recovery, and replication

    Performance-related topics

    What is new in PostgreSQL 9.6?

    PostgreSQL 9.6 was released in late 2016 and is the last version that will still be following the old numbering scheme PostgreSQL has been using for more than a decade now. From PostgreSQL 10.0 onward, a new version numbering system will be in place. From 10.0 on, major releases will happen way more frequently.

    Understanding new database administration functions

    PostgreSQL 9.6 has many new features that can help the administrator to reduce work and make systems more robust.

    One of those features is the idle_in_transaction_session_timeout function.

    Killing idle sessions

    In PostgreSQL, a session or a transaction can basically live almost forever. In some cases, this has been a problem because transactions were kept open for too long. Usually, this was due to a bug. The trouble is this: insanely long transactions can cause cleanup problems and table bloat can occur. The uncontrolled growth of a table (table bloat) naturally leads to performance problems and unhappy end users.

    Starting with PostgreSQL 9.6, it is possible to limit the duration a database connection is allowed to spend inside a transaction without performing real work. Here is how it works:

    Administrators and developers can set a timeout, which is 2.5 seconds in my example. As soon as a transaction is idle for too long, the connection will be terminated automatically by the server. Nasty side effects of long idle transactions can be prevented easily by adjusting this parameter.

    Finding more detailed information in pg_stat_activity

    The pg_stat_activity function is a system view that has been around for many years. It basically contains a list of active connections. In older versions of PostgreSQL, administrators could see that a query is waiting for somebody else—however, it was not possible to figure out why and for whom. This has changed in 9.6. Two columns have been added:

    In addition to this extension, a new procedure has been added, which shows who caused whom to wait:

    When the function is called, it will return a list of blocking PIDs.

    Tracking vaccum progress

    For many years, people have asked for a progress tracker for vacuum. Finally, PostgreSQL 9.6 makes this wish come true by introducing a new system view. Here is how it works:

    PostgreSQL will provide detailed information about ongoing vacuum processes so that people can track the progress of this vital operation.

    Improving vacuum speed

    PostgreSQL 9.6 not only provides you with deeper insights into what vacuum does at the moment, it will also speed up the process in general. From PostgreSQL 9.6 onward, PostgreSQL will keep track of all frozen pages and avoid vacuuming those pages.

    Tables that are mostly read-only will massively benefit from this change, as vacuum load is drastically reduced.

    Digging into new SQL and developer-related functions

    One of the most promising new features of PostgreSQL is the ability to perform phrase searching. Up to 9.5 it was only possible to search for words—phrase searching was very hard to do. 9.6 nicely removes this limitation. Here is an example of how it works:

    The first query returns false because the words we are looking for do not occur in the desired order. In the second example, true is returned because there really is a proper match.

    However, there is more: in 9.6 it is possible to check whether words show up in a certain order. In the following example, we want a word to be between united and nations:

    The second example returns false as there is no word between united and nations.

    Using new backup and replication functionality

    PostgreSQL 9.6 has also seen improvements in the area of backup and recovery.

    Streamlining wal_level and monitoring

    The wal_level setting has always been a bit hard to understand for many people. Many were struggling with the difference between the archive and hot_standby settings. To remove this confusion altogether, both settings have been replaced with the easier-to-understand replica setting, which does the same as hot_standby.

    In addition to that, the monitoring of replicated setups has been simplified. Prior to 9.6, there was only the pg_stat_replication view, which could be queried on the master to supervise the flow of data to the slave. Now it is also possible to monitor the flow of data on the slaves, by consulting the pg_stat_wal_receiver function. It is basically the slave-side mirror of the pg_stat_replication function and helps to determine the state of replication.

    Using multiple synchronous standby servers

    PostgreSQL has been able to perform synchronous replication for quite a while already. In PostgreSQL, it is possible to have more than just one synchronous server from 9.6 onward. Earlier, only one server had to acknowledge a commit. Now it is possible to have an entire group of servers that has to confirm a commit. This is especially important if you want to improve reliability in case of multi-node error.

    The syntax to use this new feature is simple:

    However, there is more to synchronous replication in PostgreSQL 9.6. Previously, PostgreSQL ensured (synchronous_commit = on) that the transaction log has reached the slave. However, this did not mean that data was actually visible. Consider an example: somebody adds a user to the master, instantly connects to the slave, and checks for the user. While the transaction log was guaranteed to be on the slave, it was not necessarily guaranteed that the data inside the log was already visible to the end user (due to replication conflicts and so on). By setting synchronous_commit = 'remote_apply', it is now possible to query the slave directly after a commit on the master, without having to worry that data might not be visible yet. The remote_apply value is slower than the on value but it allows to write more advanced applications.

    Understanding performance-related features

    Just like every release of PostgreSQL, there are numerous performance improvements, which can help to speedup applications. In this section, I want to focus on the most important and most powerful ones. Of course, there are many more small improvements than listed here.

    Improving relation extensions

    For many years PostgreSQL has extended a table (or an index) block by block. In the case of a single writer process, this was usually fine. However, in cases of high-concurrency writing, writing a block at a time was a source of contention and suboptimal performance. From 9.6 onward, PostgreSQL started to extend tables by multiple blocks at a time. The number of blocks added at a time is 20 times the number of waiting processes.

    Checkpoint sorting and kernel interaction

    When PostgreSQL writes changes to disk during a checkpoint, it now does so in a more orderly way to ensure that writes are more sequential than earlier. This is done by sorting blocks before sending them too. Random writes will be dramatically reduced this way, which in turn leads to higher throughput on most hardware.

    Sorted checkpoints are not the only scalability thing to make it into 9.6. There are also new kernel write-back configuration options: what does this mean? In case of large caches, it could take quite a long time to write all changes out. This used to be especially nasty on systems with hundreds of gigabytes of memory because fairly intense I/O storms could happen. Of course, the operating system, level behavior of Linux could be changed using the /proc/sys/vm/dirty_background_ratio command. However, only a handful of consultants and system administrators actually knew how to do that and why. The checkpoint_flush_after, bgwriter_flush_after, and backend_flush_after functions can be used now to control the flush behavior. In general, the rule is to flush earlier. Still, as the feature is new, people are still gathering experience on how to use those settings in the most efficient way possible.

    Using more advanced foreign data wrappers

    Foreign data wrappers have been around for many years. Starting with PostgreSQL 9.6, the optimizer can use foreign tables way more efficiently. This includes join push down (joins can now already be performed remotely) and order push down (sorting can now happen remotely). Distributing data inside a cluster is now way more efficient due to faster remote operations.

    Introducing parallel queries

    Traditionally, a query had to run on a single CPU. While this was just fine in the OLTP world, it started to be a problem for analytical applications, which were bound to the speed provided by a single core. With PostgreSQL 9.6, parallel queries were introduced. Of course, implementing parallel queries was hard and so a lot of infrastructure has already been implemented over the years. All this infrastructure is now available to provide the end user with parallel sequential scans. The idea is to make many CPUs work on complicated WHERE conditions during a sequential scan. Version 9.6 also allowed for parallel aggregates and parallel joins. Of course, there is a lot of work left, but we are already looking at a major leap forward.

    To control parallelism, there are two essential settings:

    The first one limits the overall number of worker processes available. The second one controls the number of workers allowed per gather node.

    A gather node is a new thing you will see in an execution plan. It is in charge of unifying results coming from parallel subprocesses.

    In addition to those fundamental settings, there are a couple of new optimizer parameters to adjust the cost of parallel queries.

    Adding snapshot too old

    Those of you using Oracle would be aware of the following error message: snapshot too old. In Oracle, this message indicates that a transaction has been too long, so it has to be aborted. In PostgreSQL, transactions can run almost infinitely. However, long transactions can still be a problem, so the snapshot too old error has been added as a feature to 9.6, which allows transactions to be aborted after a certain amount of time.

    The idea behind that is to prevent table bloat and to make sure that end users are aware of the fact that they might be about to do something stupid.

    Summary

    In PostgreSQL 9.6 and 10.0, a lot of functionality has been added, which allows

    Enjoying the preview?
    Page 1 of 1