Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead lock if cancellation is requested while there is a PING request on the fly #29

Closed
nejati-deriv opened this issue Sep 22, 2022 · 9 comments

Comments

@nejati-deriv
Copy link

I'm not sure but problem seems to be here.
If I am not mistaken you used close_on_run_completion flag exactly to prevent this situation but I don't know why it leads to dead lock any way.

You can reproduce it with a loop like:

for (;;)
{
    auto timer = asio::steady_timer{ executor, std::chrono::seconds{ 1 } };
    co_await (connection.async_run(endpoint, asio::use_awaitable) || timer.async_wait(asio::use_awaitable));
}

if you change timer to something shorter than ping time like 800ms it will not lead to dead lock.

@mzimbres
Copy link
Collaborator

Can you please try what I said here #28 (comment) and let me know if the problem persists.

@mzimbres
Copy link
Collaborator

I can't reproduce this behaviour. I have used this code #28 (comment) with a timeout of 1s and 2s and it keeps printing retrying. A deadlock may occur for example when Redis for any reason sends unsolicited messages and you don't consume them, can you please add a coroutine to consume them, for example

net::awaitable<void> push_receiver(std::shared_ptr<connection> db)
{
   for (std::vector<node<std::string>> resp;;) {
      co_await db->async_receive_push(adapt(resp));
      print_push(resp);
      resp.clear();
   }
}

@nejati-deriv
Copy link
Author

Adding more connections increases the chance:

#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>

#include <aedis.hpp>
#include <chrono>
#include <iostream>

namespace net = boost::asio;
using namespace net::experimental::awaitable_operators;
using aedis::endpoint;
using connection = aedis::connection<>;

net::awaitable< void >
reconnect()
{
    auto db1   = connection{ co_await net::this_coro::executor };
    auto db2   = connection{ co_await net::this_coro::executor };
    auto db3   = connection{ co_await net::this_coro::executor };
    auto db4   = connection{ co_await net::this_coro::executor };
    auto timer = net::steady_timer{ co_await net::this_coro::executor };
    for (int i = 0;; i++)
    {
        timer.expires_after(std::chrono::milliseconds{ 1005 }); // Note extra 5ms
        endpoint ep{ "127.0.0.1", "6379" };
        co_await (db1.async_run(ep, net::use_awaitable) || db2.async_run(ep, net::use_awaitable) ||
                  db3.async_run(ep, net::use_awaitable) || db4.async_run(ep, net::use_awaitable) ||
                  timer.async_wait(net::use_awaitable));
        std::cout << i << std::endl;
    }
}

int
main()
{
    net::io_context ioc;
    net::co_spawn(ioc, reconnect(), net::detached);
    ioc.run();
}

@mzimbres
Copy link
Collaborator

Does this trigger a sanitizer check on your machine?

@nejati-deriv
Copy link
Author

Yes alloc-dealloc-mismatch and use-of-uninitialized-value with a gigantic error messages.
alloc-dealloc-mismatch.txt
use-of-uninitialized-value.txt

@mzimbres
Copy link
Collaborator

I have made many improvements in cancellation in master. I can't reproduce this problem anymore, so it might have been fixed.

@nejati-deriv
Copy link
Author

nejati-deriv commented Oct 24, 2022

Thanks for you work,
I can still reproduce it with the following code: (delay 1 ms)

#include <boost/asio.hpp>
#include <boost/asio/experimental/awaitable_operators.hpp>

#include <aedis.hpp>
#include <chrono>
#include <iostream>

namespace net = boost::asio;
using namespace net::experimental::awaitable_operators;
using aedis::endpoint;
using connection = aedis::connection<>;

net::awaitable< void >
reconnect()
{
    auto db1   = connection{ co_await net::this_coro::executor };
    auto db2   = connection{ co_await net::this_coro::executor };
    auto db3   = connection{ co_await net::this_coro::executor };
    auto db4   = connection{ co_await net::this_coro::executor };
    auto timer = net::steady_timer{ co_await net::this_coro::executor };
    for (int i = 0;; i++)
    {
        timer.expires_after(std::chrono::milliseconds{ 1 });   // Note 1 ms
        endpoint ep{ "172.17.0.1", "6379" };
        co_await (db1.async_run(ep, {}, net::use_awaitable) || db2.async_run(ep, {}, net::use_awaitable) ||
                  db3.async_run(ep, {}, net::use_awaitable) || db4.async_run(ep, {}, net::use_awaitable) ||
                  timer.async_wait(net::use_awaitable));
        std::cout << i << std::endl;
    }
}

int
main()
{
    net::io_context ioc;
    net::co_spawn(ioc, reconnect(), net::detached);
    ioc.run();
}

@mzimbres
Copy link
Collaborator

Thanks for insisting :) I learned many sutilities in the Asio cancellation mechanism. This commit should fix the problem bac27c1. Let me know if it works for you.

@nejati-deriv
Copy link
Author

Seems that fixed the problem, thanks 🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants