Feedback requested: gh-ost on Galera / XtraDB cluster? #224

shlomi-noach opened this issue Sep 2, 2016 · 20 comments

@shlomi-noach
Contributor

Dear community,
At this time we are not using Galera/XtraDB Cluster. We haven't tested gh-ost on Galera. We're curious to hear:

  • If anyone has tried it
  • If there are specific blockers or problems
  • If gh-ost on Galera makes sense, given you can use Rolling Schema Upgrades.

Thank you!

@Roguelazer
Contributor

That's actually on my list to test next week. It seems like the easiest way to do it would be to hang an asynchronous replica off of the cluster (which is a supported operation) and run gh-ost there to minimize load on the synchronous cluster.

gh-ost definitely makes sense; RSU is a huge pain in the butt since you are de-syncing the node (and it needs to stay within gcache to avoid undoing the migration and forcing an SST).
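
For comparison, the RSU flow being referred to here is roughly the following, run one node at a time (a sketch only; wsrep_OSU_method is the standard Galera setting, and the database/table/column names are placeholders):

SET SESSION wsrep_OSU_method = 'RSU';   -- this node de-syncs from the cluster for the duration of the ALTER
ALTER TABLE mydb.mytable ADD COLUMN example_col INT NOT NULL DEFAULT 0;   -- applied locally, not replicated
SET SESSION wsrep_OSU_method = 'TOI';   -- back to normal replicated DDL
-- Repeat on every node. Each node then has to catch up via IST while its missed
-- writesets are still in the donor's gcache, otherwise it falls back to a full SST.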

I'll let you know how the testing in our dev cluster works.

@SuperQ
Contributor

SuperQ commented Sep 5, 2016

We use Galera/XtraDB and would definitely like to use gh-ost. We also have async slaves replicating from Galera "master" clusters to provide additional read capacity and analytics access. Therefore it does make sense to continue to use an online schema tool.

@shlomi-noach
Contributor Author

Thank you both! Do you foresee a particular issue with the final lock/rename steps?

@SuperQ
Contributor

SuperQ commented Sep 5, 2016

This applies to all writes that gh-ost does, not just the final step.

One thing that happens with Galera/XtraDB is the potential to deadlock "frequently". In theory this can happen between threads on a normal MySQL master, but it's very rare.

This is caused by the rollup of multiple transactions into a single locking operation between the multiple Galera masters. The default for this is 16 transactions, but it is tunable. I personally don't recommend changing this, but some people do.

gh-ost needs to be able to receive these deadlock-rejected queries and retry them. This is safe, but slightly tedious. When dealing with distributed multi-master systems, this kind of problem will become more common. :)

@shlomi-noach
Contributor Author

@SuperQ gh-ost already has retries, which are also configurable via -default-retries

@SuperQ
Contributor

SuperQ commented Sep 5, 2016

👍 💯 🎱

@Frank-Leemkuil

Frank-Leemkuil commented Oct 18, 2016

@shlomi-noach We are testing gh-ost on a Galera staging cluster and get a lot of deadlocks; it looks like it never stops:

./gh-ost \
--max-load=Threads_running=25 \
--critical-load=Threads_running=1000 \
--chunk-size=1000 \
--alter="MODIFY COLUMN EventId bigint SIGNED NOT NULL" \
--max-lag-millis=1500 \
--user="username" \
--password="password" \
--database="databasename" \
--table="alarm" \
--host="ip-number" \
--assume-master-host="ip-number" \
--discard-foreign-keys \
--allow-on-master \
--allow-master-master \
--switch-to-rbr \
--cut-over="default" \
--exact-rowcount \
--concurrent-rowcount \
--default-retries=120 \
--panic-flag-file="/tmp/ghost.panic.flag" \
--postpone-cut-over-flag-file="/tmp/ghost.postpone.flag" \

Migrating db.alarm; Ghost table is db._alarm_gho
Migrating hostname:3306; inspecting hostname:3306; executing on hostname
Migration started at Tue Oct 18 13:40:26 +0200 2016
chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
throttle-additional-flag-file: /tmp/gh-ost.throttle
postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
panic-flag-file: /tmp/ghost.panic.flag
Serving on unix socket: /tmp/gh-ost.db.alarm.sock
Copy: 102911/102911 100.0%; Applied: 0; Backlog: 0/100; Time: 2m14s(total), 13s(copy); streamer: mysql-bin.000001:47131128; State: migrating; ETA: due
2016-10-18 13:42:41 ERROR Error 1050: Table '_alarm_del' already exists
2016-10-18 13:42:41 ERROR Error 1213: Deadlock found when trying to get lock; try restarting transaction
Copy: 102911/102911 100.0%; Applied: 0; Backlog: 0/100; Time: 2m15s(total), 13s(copy); streamer: mysql-bin.000001:47131834; State: migrating; ETA: due
2016-10-18 13:42:42 ERROR Session lock gh-ost.506.lock expected to be found but wasn't
Migrating db.alarm; Ghost table is db._alarm_gho
Migrating hostname:3306; inspecting hostname:3306; executing on hostname
Migration started at Tue Oct 18 13:40:26 +0200 2016
chunk-size: 1000; max-lag-millis: 1500ms; max-load: Threads_running=25; critical-load: Threads_running=1000; nice-ratio: 0.000000
throttle-additional-flag-file: /tmp/gh-ost.throttle
postpone-cut-over-flag-file: /tmp/ghost.postpone.flag
panic-flag-file: /tmp/ghost.panic.flag
Serving on unix socket: /tmp/gh-ost.db.alarm.sock
Copy: 102911/102911 100.0%; Applied: 0; Backlog: 0/100; Time: 2m17s(total), 13s(copy); streamer: mysql-bin.000001:47134853; State: migrating; ETA: due
2016-10-18 13:42:44 ERROR Error 1213: Deadlock found when trying to get lock; try restarting transaction
2016-10-18 13:42:44 ERROR Error 1050: Table '_alarm_del' already exists
2016-10-18 13:42:44 ERROR Error 1050: Table '_alarm_del' already exists

The staging environment is not a slave of production and does not have to deal with live data.

@shlomi-noach
Contributor Author

@Frank-Leemkuil a few questions:

  • as per the Table '_alarm_del' already exists errors -- did that table pre-exist before the migration?
  • Session lock gh-ost.506.lock expected to be found but wasn't baffles me at this time.
  • What do you mean by "it looks like it never stops"? Please clarify.
  • What happens after given log output?
  • What happens before Copy: 102911/102911 100.0% ?

@Frank-Leemkuil

  • The tables did not pre-exist before the migration
  • The deadlock is retried every time
  • I dropped the session because there was no progress.
  • When I test the command without --execute it works fine.

@shlomi-noach
Contributor Author

  • "The deadlock is retried every time" -- so this repeats again and again, up to --default-retries (default 60), and then it panics and exits?
  • "I dropped the session because there was no progress." -- I'm sorry, I do not follow. Can you please tell the full story and explain this? I don't have your screen in front of me.
  • "When I test the command without --execute it works fine." -- makes sense, since your problem is in the cut-over phase.

@Roguelazer
Contributor

Since named locks and FLUSH TABLE x WITH READ LOCK don't work in Galera, it seems reasonable that you got errors?

@shlomi-noach
Contributor Author

shlomi-noach commented Oct 18, 2016

Neither works in Galera? Does LOCK TABLES sometable WRITE work in Galera?

@Roguelazer
Contributor

Roguelazer commented Oct 18, 2016

Let me clarify (and apologize for not posting this last week):

  • GET_LOCK() works, but is not replicated. It's possible to get the same named lock from multiple hosts.
  • FLUSH TABLE ... WITH READ LOCK and LOCK TABLES ... WRITE both also work, but are not replicated.

When you issue FLUSH TABLE ... WITH READ LOCK it actually entirely pauses the wsrep provider (which implements replication in Galera). This can lead to some bizarre behaviors. For example, given a database containing two empty tables (bar and baz) with only auto-incrementing primary key columns:

  • (on node 1): FLUSH TABLE bar WITH READ LOCK
  • (on node 1): INSERT INTO bar VALUES() -> returns ERROR 1099 (HY000): Table 'bar' was locked with a READ lock and can't be updated
  • (on node 1): INSERT INTO baz VALUES() -> returns ERROR 1100 (HY000): Table 'baz' was not locked with LOCK TABLES
  • (on node 1): RENAME TABLE baz TO bing -> returns ERROR 1223 (HY000): Can't execute the query because you have a conflicting read lock

Write locks are similarly confusing:

(all on node 1)

  • LOCK TABLE bar WRITE
  • RENAME TABLE baz TO bing -> session hangs forever and must be killed
  • RENAME TABLE bar TO bing -> session hangs forever and must be killed

After killing a session that had a lock, running UNLOCK TABLES seems to leave the server in a weird state where the table isn't locked, but it can't be written to (directly or by replication) and it can't be locked again (either by FLUSH TABLE ... or by LOCK TABLES ...). The only way to recover from this state (that I've found so far) is to kill -9 mysqld.

There may be some way to safely do this with Galera, but I have not yet found it.

All tests were executed on Percona XtraDB Cluster 5.6.30.
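
For anyone who wants to reproduce the behavior above, here is a condensed SQL version of the same test, assuming two freshly created empty tables bar and baz on node 1 (the CREATE statements are illustrative, not part of the original test):

-- Setup on node 1: two empty tables with only an auto-increment primary key
CREATE TABLE bar (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);
CREATE TABLE baz (id INT NOT NULL AUTO_INCREMENT PRIMARY KEY);

-- Read-lock test, all on node 1
FLUSH TABLE bar WITH READ LOCK;
INSERT INTO bar VALUES ();   -- ERROR 1099: Table 'bar' was locked with a READ lock and can't be updated
INSERT INTO baz VALUES ();   -- ERROR 1100: Table 'baz' was not locked with LOCK TABLES
RENAME TABLE baz TO bing;    -- ERROR 1223: Can't execute the query because you have a conflicting read lock
UNLOCK TABLES;

-- Write-lock test, all on node 1
LOCK TABLE bar WRITE;
RENAME TABLE baz TO bing;    -- hangs forever; the session must be killed
UNLOCK TABLES;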

@shlomi-noach
Contributor Author

@Roguelazer thank you so much for elaborating.

On gh-ost's part, it does not expect advisory locks or table locks to be replicated; it expects to set and see the lock on one particular server. Having said that, a multi-write Galera setup would be a problem, as other writable masters will not honor the lock.
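
To make the multi-writer concern concrete, here is a minimal sketch (the lock name is hypothetical) of how a named lock behaves per node rather than cluster-wide, per @Roguelazer's observation that GET_LOCK() is not replicated:

-- On Galera node 1
SELECT GET_LOCK('gh-ost.example.lock', 0);    -- returns 1: lock acquired on node 1
-- On Galera node 2 (another writable node)
SELECT GET_LOCK('gh-ost.example.lock', 0);    -- also returns 1: the lock is not replicated,
                                              -- so node 2 "acquires" the very same lock
-- Each node only sees its own lock holder
SELECT IS_USED_LOCK('gh-ost.example.lock');   -- local connection id, or NULL where nobody holds it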

Your comment also presents a bigger problem: a lock on the migrated table will lock other tables as well. That'll easily cause a complete stall.

Last, @Frank-Leemkuil 's output shows "Session lock gh-ost.506.lock expected to be found but wasn't", meaning gh-ost was not even able to acquire the session lock (used internally to prove that the session is alive), which is confusing.

I do not intend to work on any of these in the immediate future; Galera is not on our priority list, and personally I do not have production experience with Galera.

If anyone reading this wants to give this a try please let me know.

@taylorchu

I used docker-compose to test a minimal 3-node Percona XtraDB cluster. The main blocker is the default cut-over algorithm, as described before, but --cut-over two-step works.

@timglabisch

timglabisch commented Sep 15, 2017

Doesn't work for me.

When I have:

mysqld --version
mysqld  Ver 5.7.18-15-57 for debian-linux-gnu on x86_64 (Percona XtraDB Cluster (GPL), Release rel15, Revision 7693d6e, WSREP version 29.20, wsrep_29.20)

and I run:

./gh-ost --cut-over two-step --host=192.168.10.99 --user=pxcuser --password=pxcpass --database=foo --table=documents --alter="ADD COLUMN x INT NOT NULL DEFAULT '0'" --chunk-size=2000 --max-load=Threads_connected=20 --allow-on-master --initially-drop-ghost-table --execute

I'll get:

Copy: 0/1 0.0%; Applied: 0; Backlog: 0/1000; Time: 0s(total), 0s(copy); streamer: mysql-bin.000001:291700; State: migrating; ETA: N/A
Copy: 0/1 0.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.000001:294236; State: migrating; ETA: N/A
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 1s(total), 1s(copy); streamer: mysql-bin.000001:295917; State: migrating; ETA: due
2017-09-15 23:14:19 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 2s(total), 1s(copy); streamer: mysql-bin.000001:298453; State: migrating; ETA: due
2017-09-15 23:14:20 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 3s(total), 1s(copy); streamer: mysql-bin.000001:301402; State: migrating; ETA: due
2017-09-15 23:14:21 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 4s(total), 1s(copy); streamer: mysql-bin.000001:304353; State: migrating; ETA: due
2017-09-15 23:14:22 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 5s(total), 1s(copy); streamer: mysql-bin.000001:307304; State: migrating; ETA: due
2017-09-15 23:14:23 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 6s(total), 1s(copy); streamer: mysql-bin.000001:310255; State: migrating; ETA: due
2017-09-15 23:14:24 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 7s(total), 1s(copy); streamer: mysql-bin.000001:313617; State: migrating; ETA: due
2017-09-15 23:14:25 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
Copy: 2/2 100.0%; Applied: 0; Backlog: 0/1000; Time: 8s(total), 1s(copy); streamer: mysql-bin.000001:316979; State: migrating; ETA: due
2017-09-15 23:14:26 ERROR Error 1105: Percona-XtraDB-Cluster prohibits use of LOCK TABLE/FLUSH TABLE <table> WITH READ LOCK with pxc_strict_mode = ENFORCING
....

@shlomi-noach
Contributor Author

@timglabisch for the sake of clarity, can you specify what doesn't work for you? Did you mean using --cut-over two-step doesn't work?

@timglabisch

@shlomi-noach sorry for the confusion, I updated my comment.

@Roguelazer
Contributor

So, this works okay-ish on 5.6.

On 5.7, the explicit LOCK TABLE is forbidden (and it is used even during the two-step process). It's not immediately clear to me why we need a LOCK TABLE at all in the two-step process, since the DDL change that renames the old table out of the way should itself acquire a write lock on the table. What would you think about adding a --skip-table-lock flag or some such, which would only be valid in two-step cut-over mode and would just skip the LOCK TABLE?

@beanbee

beanbee commented Aug 5, 2018

gh-ost works on our three-node PXC (Percona XtraDB Cluster 5.6.40 with several asynchronous slaves). Prerequisites:

  • PXC version 5.6; theoretically 5.7 with pxc_strict_mode in [PERMISSIVE, DISABLED] should work the same
  • although it is a multi-master setup, we have only one PXC node handling all the WRITE requests
  • two-step instead of the default cut-over

LOCK/UNLOCK TABLES can only be executed on one PXC node (5.6, or 5.7 with pxc_strict_mode != 'ENFORCING'), so if a single node handles all write requests, data consistency is guaranteed by LOCK TABLE.
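
For 5.7, a minimal sketch of relaxing the strict mode around a migration (pxc_strict_mode and its values come from the error messages and notes above; verify against Percona's documentation for your version and weigh the safety implications before changing it):

-- Check the current mode on the write node (PXC 5.7)
SHOW GLOBAL VARIABLES LIKE 'pxc_strict_mode';

-- Allow LOCK TABLES / FLUSH TABLES ... WITH READ LOCK for the duration of the migration
SET GLOBAL pxc_strict_mode = 'PERMISSIVE';

-- ... run gh-ost with --cut-over two-step against this node ...

-- Restore enforcement once the migration is done
SET GLOBAL pxc_strict_mode = 'ENFORCING';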

As described before, the way PXC manages DDL statements (Total Order Isolation) can unintentionally cause gh-ost to hang at the DROP TABLE tbl_old step. This is a 'deadlock' that happens at the Galera level and cannot resolve itself short of a mysqld restart, but two-step is the safe option if the potential loss during cut-over is tolerable.
