PostgreSQL Proficiency For Python People
PostgreSQL Proficiency For Python People
PostgreSQL Proficiency For Python People
Christophe Pettus
PostgreSQL Experts, Inc.
thebuild.com
pgexperts.com
Welcome!
Christophe Pettus
Consultant with PostgreSQL Experts, Inc.
Based in sunny San Francisco, California.
Technical blog: thebuild.com
Twitter: @xof
[email protected]
My background.
What is this?
This means
Cross-Platform.
Plus Windows.
Installation
use them!
Provides platform-specific scripting, etc.
RedHat-flavor and Debian-flavor have their
own repositories.
Other OSes.
initdb
Note on Debian
Just Do This.
pg_ctl
psql
PostgreSQL directories
Configuration files.
Configuration
Configuration files.
postgresql.conf
Important parameters.
Logging.
Memory.
Checkpoints.
Planner.
Youre done.
No, really, youre done!
Logging.
Where to log?
What to log?
log_destination = 'csvlog'
log_directory = 'pg_log'
logging_collector = on
log_filename = 'postgres-%Y-%m-%d_%H%M%S'
log_rotation_age = 1d
log_rotation_size = 1GB
log_min_duration_statement = 250ms
log_checkpoints = on
log_connections = on
log_disconnections = on
log_lock_waits = on
log_temp_files = 0
Memory configuration
shared_buffers
work_mem
maintenance_work_mem
shared_buffers
work_mem
maintenance_work_mem
effective_cache_size
Checkpoints.
A timeout occurs.
Checkpoint settings.
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 10m-30m # Depends on restart time
checkpoint_segments = 32 # To start.
Checkpoint settings, 2.
Planner settings.
Do not touch.
fsync = on
Never change this.
synchronous_commit = on
Change this, but only if you understand
the data loss potential.
Changing settings.
pg_hba.conf
User security.
The WAL.
The Basics.
pg_xlog
WAL archiving.
archive_command
Runs a command each time a WAL
segment is complete.
on a different system.
On a crash
sychronous_commit
Backup and
Recovery
pg_dump
pg_restore
pg_restore
SELECT pg_start_backup(...);
Copy the disk image and any WAL files that
are created.
SELECT pg_stop_backup();
Make sure you have all the WAL segments.
The disk image + WAL segments are your
backup.
WAL-E
http://github.com/wal-e/wal-e
Provides a full set of appropriate scripting.
Automates create PITR backups into AWS
S3.
Highly recommended!
PITR Restore
PITR?
Point-in-time recovery.
You dont have to replay the entire WAL
stream.
Replication.
WAL Archiving.
Great idea!
Such a great idea, PostgreSQL implements
it!
recovery.conf
Disaster recovery.
pg_basebackup
Lets see!
Replication!
Advice?
Trigger-based replication
Highly configurable.
Can push part or all of the tables; dont
have to replicate everything.
Transactions,
MVCC and
VACUUM
Transaction
BEGIN;
INSERT INTO transactions(account_id, value, offset_id)
VALUES (11, 120.00, 14);
INSERT INTO transactions(account_id, value, offset_id)
VALUES (14, -120.00, 11);
COMMIT;
Transaction Properties.
In PostgreSQL
A brief warning
The Problem.
Bad Things.
Some Approaches.
The Implications.
Snapshots.
Nothings Perfect.
Isolation Modes.
PostgreSQL supports:
READ COMMITTED The default.
REPEATABLE READ
SERIALIZABLE
It does not support:
READ UNCOMMITTED (dirty read)
What is a snapshot?
Wait, what?
SERIALIZABLE
MVCC consequences.
VACUUM
ANALYZE
It probably is.
The database generally stabilize at 20% to
50% bloat. Thats acceptable.
Schema Design.
Whats Normal?
Normalization is important.
But dont obsess.
It flows naturally from proper separation of
data.
Pick Entities.
Copied.
Calculated.
No Polymorphic Fields.
Constraints.
Key Selection.
It looks normalized
but its really a pain in the neck.
Fast / Slow
Arrays.
hstore
JSON
NULL
Many-to-Many Tables
Character Encoding.
Use UTF-8.
Just. Do. It.
There is no compelling reason to use any
other character encoding.
Time Representation.
Indexing
Trick Question!
It doesnt.
B-Tree.
Hash.
GiST.
SP-GiST.
GIN.
B-Tree Indexes.
Single column.
Multiple column (composite).
Expression (functional) indexes.
When to create?
Mandatory indexes.
Ascending vs Descending?
Composite Indexes.
Expression Indexes.
Indexes on an expression.
PostgreSQL can recognize when you are
querying on that expression and use the
index.
Partial Indexes.
GiST Indexes.
GIN
GIN implementation
A B-tree of B-trees.
Tokens organized into B-trees.
Row pointers also organized into B-trees.
On-disk footprint can be quite large.
Bad Selectivity.
Index Prohibitorum
(cost=0.00..8.27 rows=1
Index Creation.
CREATE INDEX
REINDEX
Index Bloat.
http://pgsql.tapoueh.org/site/html/news/
20080131.bloat.html
Index Usage.
pg_stat_user_indexes
Reports the number of times an index is
used.
Debugging
http://explain.depesz.com/
---------------------------------------------------------------------Aggregate (cost=2550.42..2550.43 rows=1 width=4)
-> Nested Loop (cost=0.00..2550.41 rows=3 width=4)
-> Index Scan using ecommerce_order_subscriber_id
on ecommerce_order (cost=0.00..132.88 rows=16 width=4)
Index Cond: (subscriber_id = 396760)
Filter: (status = ANY ('{3,9,12,16,14}'::integer[]))
-> Index Scan using ecommerce_solditem_order_id
on ecommerce_solditem (cost=0.00..150.86
rows=19 width=4)
Index Cond: (ecommerce_solditem.order_id =
ecommerce_order.id)
Filter: (((ecommerce_solditem.user_access_denied
IS NULL) OR
(NOT ecommerce_solditem.user_access_denied))
AND (ecommerce_solditem.status = 1))
Query Analysis.
ANALYZE
---------------------------------------------------------------------Aggregate (cost=2550.42..2550.43 rows=1 width=4)
-> Nested Loop (cost=0.00..2550.41 rows=3 width=4)
-> Index Scan using ecommerce_order_subscriber_id
on ecommerce_order (cost=0.00..132.88 rows=16 width=4)
Index Cond: (subscriber_id = 396760)
Filter: (status = ANY ('{3,9,12,16,14}'::integer[]))
-> Index Scan using ecommerce_solditem_order_id
on ecommerce_solditem (cost=0.00..150.86
rows=19 width=4)
Index Cond: (ecommerce_solditem.order_id =
ecommerce_order.id)
Filter: (((ecommerce_solditem.user_access_denied
IS NULL) OR
(NOT ecommerce_solditem.user_access_denied))
AND (ecommerce_solditem.status = 1))
Planner Statistics
Cost.
Actual Time.
In milliseconds.
Wall-clock time, not only query execution
time.
Rows.
Loops.
CROSS JOINs
These can be created by accident!
Sequential scans on large tables.
SELECT COUNT(*)
So dont do that.
OFFSET / LIMIT
Python Particulars
Python 2? psycopg2
psycopg2 notes.
py-postgresql
Django Notes.
Django + Replication
Go South.
Special Situations.
Do this promptly!
Only requires installing new binaries.
If using packages, often as easy as just an
apt-get / yum upgrade.
Do a VACUUM afterwards.
AWS
PostgreSQL RDS
Sharding.
Then what?
Community PostgreSQL doesnt have an
integrated multi-master solution.
Postgres-XC
Bucardo
Custom Sharding.
http://instagram-engineering.tumblr.com/
post/10853187575/sharding-ids-atinstagram
Pooling, etc.
Why pooling?
pgbouncer
Developed by Skype.
Easy to install.
Very fast, can handle 1000s of connections.
Does not to failover, load-balancing.
Use HAProxy or similar.
pgPool II
Tools
Graphical clients
pgAdmin III
Comprehensive, open-source.
Navicat
Commercial product, not PostgreSQLspecific.
Log Analysis
pgbadger
The only choice now for monitoring text
logs.
pg_stat_statements
Maintains a buffer of data on statements
executed, within PostgreSQL.
Questions?
thebuild.com / @xof / pgexperts.com
Thank you!
http://tinyurl.com/pycon2014survey