Galera Schema

Download as pdf or txt
Download as pdf or txt
You are on page 1of 78

How to Avoid Pitfalls in Schema Upgrade

with Galera

February 2, 2020
Sveta Smirnova
Sveta Smirnova

• MySQL Support engineer


• Author of
• MySQL Troubleshooting
• JSON UDF functions
• FILTER clause for MySQL
• Speaker
• Percona Live, OOW, Fosdem,
DevConf, HighLoad...
2
Table of Contents

•TOI
•RSU
•pt-online-schema-change (pt-osc)

3
Introduction

• Galera Replication Library



Provides synchronous replication for MySQL

4
Introduction

• Galera Replication Library



Provides synchronous replication for MySQL

Galera Clusters
MariaDB Galera Cluster
Percona XtraDB Cluster
Galera Cluster for MySQL

4
How Galera works

• Data modification happens on a node


• Optimistic locking control

5
How Galera works

• Data modification happens on a node


• Optimistic locking control

• At the COMMIT time



Broadcasts write set for the cluster
• Waits confirmation of the successful update
From all other nodes

Yes Commits transaction locally


No Rollbacks transaction

5
Data Updates

• Committed on all nodes or nowhere


• Safe

6
Challenges of DDL

• Replicated independently from storage engine

7
Challenges of DDL

• Replicated independently from storage engine


• Changes may affect query results


Adding/removal of UNIQUE keys

Adding/removal columns
• Changing column definition

7
Challenges of DDL

• Replicated independently from storage engine


• Changes may affect query results

• Modification can happen on any node

• The schema must be upgraded before DML



There is no way to rollback schema upgrade
• MDLs are set only on one node
Not across the cluster
Not possible to rely on them for all nodes
Additional control required

7
TOI
Total Order Isolation (TOI)

• DDL changes are replicated in the same order


regarding other transactions

All nodes are in the absolutely same state at
any point of time

9
TOI: Illustration

• 3-nodes cluster

Node A
• Node B
• Node C

10
TOI: Illustration

• Initial state
Node A Node B Node C
INSERT(103) UPDATE(101) SELECT(100)
UPDATE(104) INSERT(102) INSERT(112)
ALTER(105) DELETE(108) SELECT(113)
UPDATE(109) UPDATE(114)

10
TOI: Illustration

• Queries status
Node A Node B Node C
 INSERT(103)  UPDATE(101)  SELECT(100)
 UPDATE(104)  INSERT(102)  INSERT(112)
 ALTER(105)  DELETE(108) ! SELECT(113)
 UPDATE(109)  UPDATE(114)

10
TOI: Illustration

• ALTER in progress
Node A Node B Node C
 ALTER(105)  DELETE(108)  INSERT(112)
 UPDATE(109) ! SELECT(113)
 UPDATE(114)

10
TOI: Illustration

• ALTER finished
Node A Node B Node C
 DELETE(108)  INSERT(112)
 UPDATE(109)  SELECT(113)
 UPDATE(114)

10
PROCESSLIST: DML before ALTER

DML node> select DB, COMMAND, TIME, STATE, INFO from information_schema.processlist WHERE DB=’sbtest’;
+--------+---------+------+---------------------------------------------------+-----------------------+
| DB | COMMAND | TIME | STATE | INFO |
+--------+---------+------+---------------------------------------------------+-----------------------+
| sbtest | Query | 1 | wsrep: initiating pre-commit for write set (2886) | COMMIT |
| sbtest | Query | 1 | wsrep: initiating pre-commit for write set (2888) | COMMIT |
| sbtest | Query | 1 | wsrep: initiating pre-commit for write set (2884) | COMMIT |
| sbtest | Query | 1 | updating | DELETE FROM sbtest1.. |
| sbtest | Query | 1 | wsrep: initiating pre-commit for write set (2887) | COMMIT |
| sbtest | Query | 0 | wsrep: initiating pre-commit for write set (2889) | COMMIT |
| sbtest | Query | 1 | wsrep: initiating pre-commit for write set (2885) | COMMIT |
| sbtest | Query | 1 | wsrep: pre-commit/certification passed (2883) | COMMIT |
+--------+---------+------+---------------------------------------------------+-----------------------+
8 rows in set (0.00 sec)

11
PROCESSLIST: SELECT before ALTER

SELECT node> select DB, COMMAND, TIME, STATE, INFO from information_schema.processlist
-> WHERE DB=’sbtest’;
+--------+---------+------+-------------------+---------------------------------------+
| DB | COMMAND | TIME | STATE | INFO |
+--------+---------+------+-------------------+---------------------------------------+
| sbtest | Query | 0 | statistics | SELECT pad FROM sbtest2 WHERE id=5009 |
| sbtest | Query | 0 | starting | SELECT pad FROM sbtest3 WHERE id=4951 |
| sbtest | Query | 0 | statistics | SELECT pad FROM sbtest4 WHERE id=4954 |
| sbtest | Query | 0 | System lock | SELECT pad FROM sbtest2 WHERE id=5351 |
| sbtest | Query | 0 | cleaning up | SELECT pad FROM sbtest2 WHERE id=4954 |
| sbtest | Sleep | 0 | | NULL |
| sbtest | Query | 0 | Sending to client | SELECT pad FROM sbtest1 WHERE id=4272 |
| sbtest | Query | 0 | closing tables | SELECT pad FROM sbtest4 WHERE id=4722 |
+--------+---------+------+-------------------+---------------------------------------+
8 rows in set (0.00 sec)

12
ALTER

DDL node> use ddltest;


Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

DDL node> alter table sbtest1 add key k1(c, k, pad);


Query OK, 0 rows affected (3 min 53.73 sec)
Records: 0 Duplicates: 0 Warnings: 0

13
PROCESSLIST: DML during ALTER

DML node> select DB, COMMAND, TIME, STATE, INFO from information_schema.processlist
-> WHERE DB in (’sbtest’,’ddltest’);
+---------+---------+------+---------------------------------------------------+----------------------+
| DB | COMMAND | TIME | STATE | |
+---------+---------+------+---------------------------------------------------+----------------------+
| sbtest | Query | 36 | wsrep: initiating pre-commit for write set (7886) | COMMIT |
| sbtest | Query | 37 | wsrep: initiating pre-commit for write set (7882) | COMMIT |
| sbtest | Query | 27 | wsrep: initiating pre-commit for write set (7887) | COMMIT |
| sbtest | Query | 27 | wsrep: initiating pre-commit for write set (7888) | COMMIT |
| sbtest | Query | 36 | wsrep: initiating pre-commit for write set (7885) | COMMIT |
| sbtest | Query | 37 | wsrep: initiating pre-commit for write set (7883) | COMMIT |
| sbtest | Query | 37 | wsrep: initiating pre-commit for write set (7884) | COMMIT |
| sbtest | Query | 10 | wsrep: initiating pre-commit for write set (7889) | COMMIT |
| ddltest | Sleep | 38 | altering table | alter table sbtest1. |
+---------+---------+------+---------------------------------------------------+----------------------+
9 rows in set (0.00 sec)

14
PROCESSLIST: SELECT during ALTER

SELECT node> select DB, COMMAND, TIME, STATE, INFO from information_schema.processlist
-> WHERE DB in (’sbtest’,’ddltest’);
+---------+---------+------+-------------------+-------------------------------------------+
| DB | COMMAND | TIME | STATE | |
+---------+---------+------+-------------------+-------------------------------------------+
| sbtest | Sleep | 0 | | NULL |
| sbtest | Sleep | 0 | | NULL |
| sbtest | Query | 0 | Sending to client | SELECT pad FROM sbtest4 WHERE id=4989 |
| sbtest | Sleep | 0 | | NULL |
| sbtest | Query | 0 | query end | SELECT pad FROM sbtest2 WHERE id=4961 |
| sbtest | Sleep | 0 | | NULL |
| sbtest | Sleep | 0 | | NULL |
| sbtest | Sleep | 0 | | NULL |
| ddltest | Sleep | 39 | altering table | alter table sbtest1 add key k1(c, k, pad) |
+---------+---------+------+-------------------+-------------------------------------------+
9 rows in set (0.14 sec)

15
TOI Advantages

• Data always consistent


• DDL applied to all nodes at the same time

• No failure due to schema inconsistency

16
TOI Disadvantages

• The whole cluster blocked



For the duration of the entire DDL operation

Schema upgrades replicated as a statement
• There is no guarantee that the ALTER succeed!

17
How to Perform Upgrade with TOI

• Schedule maintenance window


• Run DDL

• Cluster won’t be accessible until DDL finishes


• SELECTs can continue
• wsrep sync wait != 1

18
When to Use TOI

• Quick DDL operations

19
When to Use TOI

• Quick DDL operations


• Creating new database objects

• CREATE DATABASE

CREATE TABLE

19
When to Use TOI

• Quick DDL operations


• Creating new database objects

• Online operations which modify metadata only



RENAME INDEX

RENAME TABLE
• DROP INDEX
• ALGORITHM=INSTANT
Full list
19
RSU
Rolling Schema Upgrade (RSU)

• Variable wsrep OSU method



Puts node into de-sync state
• For the duration of DDL
• Pauses Galera provider

Schema can get out of sync!

21
User Responsibility

• Run DDL on the each node of the cluster



Block read-write access that depend on DDL
• Until all nodes are in sync
• Make sure no write is performed to the table

Until upgrade finishes on all nodes
• Failure makes cluster unrecoverable!

22
RSU Workflow

• User Action • Node Operation


• SET SESSION wsrep OSU method = ’RSU’; • Nothing

• DDL •Is wsrep OSU method set to RSU?


Yes Performs DDL

Any other statement •
Nothing

23
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


Yes Wait for 5 milliseconds

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


Yes Wait for 5 milliseconds
L Still transactions in the COMMIT mode exist?

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


Yes Wait for 5 milliseconds
L Still transactions in the COMMIT mode exist?
Yes Abort DDL

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


No Put node into de-sync state

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


No Put node into de-sync state
L Pause write-set application

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


No Put node into de-sync state
L Pause write-set application
L Execute DDL

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


No Put node into de-sync state
L Pause write-set application
L Execute DDL
L Bring the node back to the cluster

24
How Node Internally Executes DDL in RSU Mode?

L Does node have transactions in COMMIT mode?


No Put node into de-sync state
L Pause write-set application
L Execute DDL
L Bring the node back to the cluster
Synchronize

24
RSU: Locking

• Not avoidable
• Updates to all objects on the node in RSU

mode must finish before the operation



Failure aborts DDL

25
RSU Advantages

• Cluster remains functional



Schedule long-running ALTER
• In the best time possible

26
RSU Disadvantages

• No checks for data and schema consistency



This is your responsibility!

27
RSU Disadvantages

• No checks for data and schema consistency


• All writes must be stopped on the affected node


Otherwise DDL fails with an error

27
RSU Disadvantages

• No checks for data and schema consistency


• All writes must be stopped on the affected node

• gcache should be big enough to hold changes

• Made while DDL was running



Failure will cause SST when node re-joins cluster
• All schema changes will be lost

27
RSU Disadvantages

• No checks for data and schema consistency


• All writes must be stopped on the affected node

• gcache should be big enough to hold changes

• Any error can make cluster dysfunctional

27
RSU Disadvantages

• No checks for data and schema consistency


• All writes must be stopped on the affected node

• gcache should be big enough to hold changes

• Any error can make cluster dysfunctional


Affected table must be offline
• Until the schema upgrade is done on all nodes

Unless this is schema-compatible change

27
How to Use RSU

• Make sure gcache is big enough



Must hold all updates while DDL is in progress

28
How to Use RSU

• Make sure gcache is big enough



Must hold all updates while DDL is in progress
• Block all writes to the table/schema

28
How to Use RSU

! Choose an ”upgrading node”

28
How to Use RSU

! Choose an ”upgrading node”


! Block all write requests to this node

28
How to Use RSU

! Choose an ”upgrading node”


! Block all write requests to this node
! SET SESSION wsrep OSU method = ’RSU’;

28
How to Use RSU

! Choose an ”upgrading node”


! Block all write requests to this node
! SET SESSION wsrep OSU method = ’RSU’;
! Perform DDL in the same session

28
How to Use RSU

! Choose an ”upgrading node”


! Block all write requests to this node
! SET SESSION wsrep OSU method = ’RSU’;
! Perform DDL in the same session
! SET SESSION wsrep OSU method = ’TOI’;

28
How to Use RSU

! Choose an ”upgrading node”


! Block all write requests to this node
! SET SESSION wsrep OSU method = ’RSU’;
! Perform DDL in the same session
! SET SESSION wsrep OSU method = ’TOI’;
! Re-enable writes

28
How to Use RSU

! Choose an ”upgrading node”


! Block all write requests to this node
! SET SESSION wsrep OSU method = ’RSU’;
! Perform DDL in the same session
! SET SESSION wsrep OSU method = ’TOI’;
! Re-enable writes
! Repeat for other nodes
28
pt-online-schema-change (pt-osc)
pt-online-schema-change (pt-osc)

• A tool, performing non-blocking upgrades



With TOI

30
pt-online-schema-change (pt-osc)

• A tool, performing non-blocking upgrades


• Creates a copy of table with altered definition

30
pt-online-schema-change (pt-osc)

• A tool, performing non-blocking upgrades


• Creates a copy of table with altered definition

• Creates triggers which will copy modified rows

30
pt-online-schema-change (pt-osc)

• A tool, performing non-blocking upgrades


• Creates a copy of table with altered definition

• Creates triggers which will copy modified rows

• Starts copying data in chunks

• Absolutely under control



Can be paused or stopped
–max-flow-ctl

30
pt-online-schema-change (pt-osc)

• A tool, performing non-blocking upgrades


• Creates a copy of table with altered definition

• Creates triggers which will copy modified rows

• Starts copying data in chunks

• All rows already in the table are copied in chunks



Newly modified rows are copied using triggers

30
pt-online-schema-change (pt-osc)

• A tool, performing non-blocking upgrades


• Creates a copy of table with altered definition

• Creates triggers which will copy modified rows

• Starts copying data in chunks

• Once copy is complete, drops the table

30
pt-online-schema-change (pt-osc)

• A tool, performing non-blocking upgrades


• Creates a copy of table with altered definition

• Creates triggers which will copy modified rows

• Starts copying data in chunks

• Once copy is complete, drops the table


Renames the copy into the original table name

30
pt-osc Advantages

• DDL is safe and non-blocking

31
pt-osc Disadvantages

• Works only with InnoDB tables


• Increases IO load even for inplace operations

• Conflicts with already existing triggers



Unless you use MariaDB >= 10.2.3
• Foreign keys updates are not effectively safe

32
How to Use pt-osc

• Study pt-osc options



--max-flow-ctl
• Set appropriate limits
• Make sure wsrep OSU method is TOI

• Run pt-osc

33
Which Method to Use?

L Will DDL be fast?



CREATE DATABASE
• CREATE TABLE
• DROP INDEX
• Any ALTER on small tables

Other

34
Which Method to Use?

L Will DDL be fast?


Yes Use TOI

34
Which Method to Use?

L Will DDL be fast?


Yes Use TOI
No Evaluate if you can use pt-osc
• Operation on the InnoDB table

Table has no triggers or MariaDB >= 10.2.3
• Table is not referenced by a foreign key
• You can tolerate increased IO

34
Which Method to Use?

L Will DDL be fast?


Yes Use TOI
No Evaluate if you can use pt-osc
Yes Use pt-osc

34
Which Method to Use?

L Will DDL be fast?


Yes Use TOI
No Evaluate if you can use pt-osc
Yes Use pt-osc
No Use RSU
• Stop all write traffic on the node

Stop all write traffic to the modified table
• Make sure to upgrade on all nodes
34
Conclusion

• Use TOI whenever possible


• Then use pt-osc

• RSU is a last resort

35
More information

Galera Cluster
MariaDB Galera Cluster
pt-online-schema-change

36
Thank you!

www.slideshare.net/SvetaSmirnova
twitter.com/svetsmirnova
github.com/svetasmirnova

37
DATABASE PERFORMANCE
MATTERS

You might also like