
Draft changes for sharding cacheops #68

Closed · wants to merge 6 commits

Conversation

@ttyS15 (Contributor) commented Dec 30, 2013

Hi, Alexander!

It seems I have a working version of sharded cacheops. I changed the lock behavior for compatibility with twemproxy, then assembled a setup with twemproxy and two Redis servers behind it, and it works perfectly. Of course, I have not tested it under high concurrency and heavy load yet; I plan to check my changes on staging and in production after vacation.

For now, I would be happy if you could review my changes and discuss them with me.

I see some weak spots in the new behavior, like the maximum lock time, but there are improvements too: for instance, locking became more granular.

Also, the minimum versions of the server and client must be raised: Redis server >= 2.6.12, redis-py client >= 2.7.4.
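
These minimums line up with SET gaining the NX and PX options in Redis 2.6.12, which redis-py 2.7.4 exposes as keyword arguments; together they let a lock be taken atomically with a single command that twemproxy can route. A minimal sketch of such a spin lock, with hypothetical key names and timeouts rather than the actual patch:

    import time
    import uuid

    import redis

    client = redis.StrictRedis()

    def acquire_lock(name, timeout_ms=5000, spin_ms=10):
        # SET key value NX PX <ms> takes the lock atomically in one command,
        # so it passes through twemproxy. Redis >= 2.6.12 introduced NX/PX
        # on SET; redis-py >= 2.7.4 exposes them as keyword arguments.
        token = uuid.uuid4().hex.encode()
        while True:
            if client.set(name, token, nx=True, px=timeout_ms):
                return token
            time.sleep(spin_ms / 1000.0)  # spin until release or expiry

    def release_lock(name, token):
        # Delete the lock only if we still own it. This GET/DEL pair is not
        # atomic, which is one of the trade-offs of dropping MULTI.
        if client.get(name) == token:
            client.delete(name)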

You can see my twemproxy config (YAML):

redis1:
  listen: 192.168.144.170:6379
  redis: true
  hash: fnv1a_64
  distribution: ketama
  auto_eject_hosts: true
  timeout: 400
  server_retry_timeout: 2000
  server_failure_limit: 1
  servers:
   - 192.168.144.151:6379:1
   - 192.168.144.61:6379:1

Thanks in advance!

# # which reference cache keys we delete, they will be hanging out for a while.
# pipe.delete(*conjs_keys)

for ck in conjs_keys:
@Suor (Owner) commented:

There could be lots of conjs_keys. This will just hang.

@ttyS15 (Contributor, Author) replied:

I am not sure that is so. I have seen one system under load where about 20k conj_keys were invalidated at one moment. At that moment, 1500 uwsgi workers (readers) idled for several seconds waiting for Redis, so the whole world stopped and service was interrupted. It happened because the current version of invalidation holds one long transaction (effectively a lock) over readers and writers. I will explain my thinking in more detail in the comments below.

In any case, thank you for flagging this line. I will measure it and compare with the current version.

@Suor (Owner) commented Dec 30, 2013

Sorry, that won't really work. I already commented on a line that will do too many network requests and too much work on the Redis side.

Also, why are you getting rid of transactions in favor of locks? It is significantly slower and won't guarantee that invalidators are written within the cache_thing() call.

@ttyS15 (Contributor, Author) commented Dec 31, 2013

http://redis.io/commands/SUNION and http://redis.io/commands/SMEMBERS have equal documented complexity, but SUNION is actually more expensive because it implicitly merges all the sets.

Maybe merging the results of all the SMEMBERS calls on the client side would be a good idea. I'll check it. Thank you!
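
A minimal sketch of that client-side merge, assuming redis-py and hypothetical key names: one SMEMBERS per conjunction key goes down a non-transactional pipeline and the union happens in Python, so every command stays twemproxy-compatible:

    import redis

    client = redis.StrictRedis()

    def union_on_client(conj_keys):
        # One SMEMBERS per conjunction set, pipelined without MULTI so the
        # batch is twemproxy-friendly; the union itself happens in Python.
        pipe = client.pipeline(transaction=False)
        for key in conj_keys:
            pipe.smembers(key)
        cache_keys = set()
        for members in pipe.execute():  # one reply per SMEMBERS call
            cache_keys.update(members)
        return cache_keys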

Now I'll try to answer your question about transactions vs. locks. The brief answer is that the CAP theorem forced me: I need (A)vailability rather than (C)onsistency, and I think availability matters more than consistency for a cache system -- just my opinion.

And the full answer:

  1. I need a cacheops version that can be distributed across several servers, because:
  • our Redis node has reached a load of about 80k rps and consumes 60-80% CPU;
  • several thousand workers spread across 3 datacenters depend on a single process (Redis), which can be blocked for an unbounded time (see my comment above);
  • a single-threaded Redis process cannot make effective use of a multicore CPU.

  2. I want the system to stay live and interactive overall (at least for most readers) even during cache invalidation.

  3. I want to improve failover for the Redis node used by cacheops. At present we use a hot-cold scheme with Sentinel and a hardware balancer. Ugly, sorry!

To solve points 1 and 3 I took twemproxy. By design, twemproxy does not provide full compatibility with the Redis protocol: as you can see (https://github.com/twitter/twemproxy/blob/master/notes/redis.md), it does not support MULTI and has only partial support for SUNION (effectively no support in my case). So I had no choice. As a side effect of replacing the transaction with a spin lock for invalidators, I got (I believe):

  • the invalidation process has gaps during which other workers can do their work;
  • readers are not affected by the lock (especially with several Redis nodes);
  • different invalidation processes can run concurrently.

Yes, you are right: the invalidation algorithm became slower, but the system as a whole seems more interactive. Also, we should remember that invalidation is a rare event compared with cache reads and writes. So in my opinion, the right questions are "how much slower" and "where is the trade-off".

Thank you for your notes! I will measure the mentioned parts and get back with results.

@ttyS15 (Contributor, Author) commented Dec 31, 2013

About robustness: as you know, the current version of the invalidation algorithm has a spot where the death of a worker leads to hanging cache_keys. But it works fine in 99.99...% of cases, so... :)

@Suor (Owner) commented Dec 31, 2013

In your version, the death of a Redis instance (or a network split) leaves lots of cache entries hanging without invalidators. That's one reason why data keys and their invalidation information should be stored on a single instance. The other reason is that this way you can write the data and its invalidation in a single network round trip on a cache miss.
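
A minimal sketch of that single round trip, assuming redis-py and hypothetical names rather than cacheops' actual internals:

    import redis

    client = redis.StrictRedis()

    def cache_with_invalidators(cache_key, data, conj_keys, timeout):
        # A transactional pipeline (MULTI ... EXEC) batches the cache write
        # and its invalidator bookkeeping into one network round trip.
        pipe = client.pipeline()
        pipe.setex(cache_key, timeout, data)
        for conj_key in conj_keys:
            pipe.sadd(conj_key, cache_key)      # make the entry findable for invalidation
            pipe.expire(conj_key, timeout * 2)  # invalidators should outlive the entry
        pipe.execute()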

Various sharding strategies were discussed in #35. The right approach for your case depends on how data requests in your application are distributed across datacenters, how much data you have, whether it is sharded, and so on. If you don't mind, I'd like to hear what service you're building and how everything works or is supposed to work there.

Regarding twemproxy not being able to support MULTI: client-side sharding would work here and shouldn't be too hard to implement.
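
A minimal sketch of that client-side sharding idea, with a hypothetical crc32-based router and the addresses from the twemproxy config reused purely for illustration: each key maps to exactly one node, so MULTI keeps working because a transaction never spans shards:

    import zlib

    import redis

    # One connection per shard; the addresses are illustrative.
    shards = [
        redis.StrictRedis(host='192.168.144.151', port=6379),
        redis.StrictRedis(host='192.168.144.61', port=6379),
    ]

    def shard_for(key):
        # Route a key to a shard by hash. If a cache entry and its
        # invalidators are routed by the same part of the key, they land
        # on one instance and MULTI keeps working there.
        return shards[zlib.crc32(key) % len(shards)]

    # The whole transaction targets a single node, so MULTI/EXEC is fine:
    pipe = shard_for(b'conj:some_model').pipeline()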

@Suor (Owner) commented Dec 31, 2013

There is another way to avoid MULTI: rewrite all the cache miss/invalidation logic in Lua; that way you can use twemproxy. You will, however, need to send the invalidation request to all cache instances.
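
A minimal sketch of that Lua route, assuming redis-py's register_script and hypothetical key names; the script is an illustration, not cacheops code:

    import redis

    client = redis.StrictRedis()

    # The script runs atomically server-side, so no MULTI/EXEC is needed.
    # Under twemproxy, all KEYS passed to one EVAL must hash to the same
    # shard (e.g. via a shared hash tag), hence invalidation still has to
    # be sent to every instance separately.
    cache_thing = client.register_script("""
    local timeout = tonumber(ARGV[1])
    redis.call('SETEX', KEYS[1], timeout, ARGV[2])
    for i = 2, #KEYS do
        redis.call('SADD', KEYS[i], KEYS[1])       -- register the invalidator
        redis.call('EXPIRE', KEYS[i], timeout * 2)
    end
    """)

    cache_thing(keys=['cache:q1', 'conj:a', 'conj:b'], args=[600, 'payload'])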

@ttyS15 (Contributor, Author) commented Dec 31, 2013

In the current version, the death of Redis can kill the whole service. With the sharded version I can see several scenarios:

  1. It's just a cache! Let it be slightly corrupted; in any case it will be fully restored after the expiration time. We lose only k/n of the keys, where k is the number of failed nodes and n the total number of nodes (e.g. one dead node out of two costs half the keys). Moreover, some of those keys would never have been read anyway. Not a problem.
  2. It's just a cache, but it must be consistent. Let's flush it if some nodes have died.
  3. We love our cache and are not sure we can survive a cold start. OK, let's set up failover for each node and sleep well.

In all cases the system will survive a failure without much trouble.

I am really sorry, the New Year holidays are calling me. I believe I will continue soon...

@Suor (Owner) commented Dec 31, 2013

Happy New Year to you, too )

I see that your approach can work. I also see downsides to it, and one of them is code that differs significantly from what the one-node scenario dictates. So think a bit about the client-side sharding I described above.

@ttyS15 (Contributor, Author) commented Jan 17, 2014

Hi, Alexander

I have done some measurements; please check my test code and the results. As you said, SMEMBERS is quite bad, but I managed to get a working variant of a slightly modified original algorithm. Please review bench_invalidation.py and see the results below.

Proxy config

[sq@sq-twemproxy ~]$ cat nutcracker.conf 
redis1:
  listen: 192.168.144.170:6379
  redis: true
  hash_tag: "::"
  hash: md5
  distribution: ketama
  auto_eject_hosts: true
  timeout: 40000
  server_retry_timeout: 2000
  server_failure_limit: 1
  servers:
   - 192.168.144.151:6379:1
   - 192.168.144.61:6379:1
[sq@sq-twemproxy ~]$ 

Results

/home/axeman/sandbox/django-cacheops/virtualenv/bin/python /home/axeman/sandbox/django-cacheops/bench_invalidation.py

Conj keys: 10, cache keys: 10

Backend: Redis - native
time: 0.00198602676392
Backend: TWEM - native
error: unsupported
Backend: Redis - twem_proxy_compat
time: 0.00952076911926
Backend: TWEM - twem_proxy_compat
time: 0.0115759372711
Backend: Redis - twem_proxy_compat_opt_1
time: 0.00291299819946
Backend: TWEM - twem_proxy_compat_opt_1
time: 0.0034761428833
Backend: Redis - twem_proxy_compat_opt_2
time: 0.00259709358215
Backend: TWEM - twem_proxy_compat_opt_2
time: 0.00280117988586
Backend: Redis - twem_proxy_compat_opt_3
time: 0.00199699401855
Backend: TWEM - twem_proxy_compat_opt_3
time: 0.00243401527405

Conj keys: 100, cache keys: 100

Backend: Redis - native
time: 0.0036518573761
Backend: TWEM - native
error: unsupported
Backend: Redis - twem_proxy_compat
time: 0.749498128891
Backend: TWEM - twem_proxy_compat
time: 0.17934012413
Backend: Redis - twem_proxy_compat_opt_1
time: 0.0529139041901
Backend: TWEM - twem_proxy_compat_opt_1
time: 0.133848905563
Backend: Redis - twem_proxy_compat_opt_2
time: 0.0429902076721
Backend: TWEM - twem_proxy_compat_opt_2
time: 0.0433089733124
Backend: Redis - twem_proxy_compat_opt_3
time: 0.00362396240234
Backend: TWEM - twem_proxy_compat_opt_3
time: 0.00596594810486

Conj keys: 100, cache keys: 1000

Backend: Redis - native
time: 0.0252318382263
Backend: TWEM - native
error: unsupported
Backend: Redis - twem_proxy_compat
time: 1.64295601845
Backend: TWEM - twem_proxy_compat
time: 2.11577010155
Backend: Redis - twem_proxy_compat_opt_1
time: 0.523754119873
Backend: TWEM - twem_proxy_compat_opt_1
time: 24.5513391495
Backend: Redis - twem_proxy_compat_opt_2
time: 0.4054210186
Backend: TWEM - twem_proxy_compat_opt_2
time: 0.411654949188
Backend: Redis - twem_proxy_compat_opt_3
time: 0.0190849304199
Backend: TWEM - twem_proxy_compat_opt_3
time: 0.107830047607

Conj keys: 1000, cache keys: 100

Backend: Redis - native
time: 0.00504016876221
Backend: TWEM - native
error: unsupported
Backend: Redis - twem_proxy_compat
time: 0.161504030228
Backend: TWEM - twem_proxy_compat
time: 0.373972892761
Backend: Redis - twem_proxy_compat_opt_1
time: 0.0837070941925
Backend: TWEM - twem_proxy_compat_opt_1
time: 0.322618961334
Backend: Redis - twem_proxy_compat_opt_2
time: 0.0658929347992
Backend: TWEM - twem_proxy_compat_opt_2
time: 0.068078994751
Backend: Redis - twem_proxy_compat_opt_3
time: 0.00423884391785
Backend: TWEM - twem_proxy_compat_opt_3
time: 0.00618290901184

Process finished with exit code 0

@ttyS15 (Contributor, Author) commented Jan 20, 2014

Hi, Alexander

What do you think about the last version? I hope it is not so bad; it looks like a good trade-off, IMHO. Would you accept it? :)

@Suor (Owner) commented Jan 26, 2014

First, sorry for the slow response. Second, sorry, but I won't merge this because it goes against my own vision of how multi-Redis support should work in cacheops.

So I suggest you maintain your branch and use it as you like. I can link to it from the README, at least until I or somebody else comes up with something better.

Anyway, good job! ) It will be interesting to see how fast or slow a cache miss is in your version.

@ttyS15 (Contributor, Author) commented Feb 7, 2014

Hi!
It's a pity. :( But in any case, thank you for your attention! I will create a branch in my fork as you suggested, and I hope to keep it in sync as long as possible.

I will get back to you if I have interesting results.

Bye!

@ttyS15 closed this Feb 7, 2014