This repository has been archived by the owner on Nov 20, 2022. It is now read-only.

LIBUV: Crashes while closing handles #79

Closed

paulra1 opened this issue Dec 20, 2020 · 8 comments

Comments

paulra1 commented Dec 20, 2020

Hi Folks:

My Libuv based Server performs all its functions correctly except for TCP connection termination.

Each TCP connection has a uv_tcp_t connection handle and a uv_poll_t handle, whose allocation
and operation are explained below. When the Protocol_Task() thread needs to terminate
a connection, it must stop polling, terminate the TCP socket connection, and deallocate
both handles.

NOTE: I am using the GitHub distribution from the following link on Ubuntu Linux 15.04.

https://github.com/nikhilm/uvbook

The Libuv software package looks like version 1.3.0.

I have had to take extraordinary measures to make connection release reliable.
The relevant code is included near the end of this message and the extraordinary
measures are in the CLOSE_KLUDGE sections. The difficulty arises because the
Libuv loops are not run in the Protocol_Task(), yet it must effect operations
on those loops to release handles. It would be nice if Libuv included an API
for releasing handles reliably which could be called from any task.
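
The pattern I am effectively trying to reproduce is roughly the following (a sketch only:
one long-lived uv_async_t per loop plus a mutex-protected slot for the handle to close;
close_request_async, pending_close and request_close() are illustrative names, not Libuv API).

#include <stdlib.h>
#include <uv.h>

extern uv_loop_t Poll_Loop;

static uv_async_t  close_request_async;   // lives as long as the Poll_Loop
static uv_mutex_t  close_mutex;
static uv_handle_t *pending_close;        // handle another thread wants closed

static void on_handle_closed(uv_handle_t *handle)
{
    free(handle);                         // handle was malloc'ed per connection
}

// Runs on the Poll_Loop thread, so calling uv_close() here is safe.
// A real version would use a small queue rather than a single slot.
static void close_request_cb(uv_async_t *async)
{
    uv_handle_t *h;

    uv_mutex_lock(&close_mutex);
    h = pending_close;
    pending_close = NULL;
    uv_mutex_unlock(&close_mutex);

    if(h != NULL && !uv_is_closing(h))
        uv_close(h, on_handle_closed);
}

// Called once, on the IO_Task() thread, before uv_run(&Poll_Loop, ...).
void init_close_channel(void)
{
    uv_mutex_init(&close_mutex);
    uv_async_init(&Poll_Loop, &close_request_async, close_request_cb);
}

// Callable from any other thread, e.g. the Protocol_Task().
void request_close(uv_handle_t *h)
{
    uv_mutex_lock(&close_mutex);
    pending_close = h;
    uv_mutex_unlock(&close_mutex);
    uv_async_send(&close_request_async);  // thread-safe by design
}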

Connection release still fails about 15% of the time, in which case a crash occurs
and the following diagnostic is displayed.

pexd: src/unix/core.c:210: uv__finish_close: Assertion `!(handle->flags & UV_CLOSED)' failed.

More diagnostic information follows. Do you know what causes this crash ?

I strongly suspect using Linux recv() to read data is not optimal when epoll() is
being used. My understanding is that there is a way to pass buffers to epoll() such that
data is automatically placed in them when a UV_READABLE event occurs. Do you have
any advice about this?
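
For reference, this is roughly what the stream-level alternative would look like (a sketch
only, assuming the uv_tcp_t handle is serviced by the loop that owns it; on_alloc(),
on_read() and feed_protocol_parser() are illustrative names for my own code).

#include <stdlib.h>
#include <uv.h>

// Hypothetical hook into my own protocol parsing.
extern void feed_protocol_parser(void *cdesc, const char *data, size_t len);

static void on_alloc(uv_handle_t *handle, size_t suggested_size, uv_buf_t *buf)
{
    buf->base = malloc(suggested_size);   // Libuv fills this buffer itself
    buf->len  = suggested_size;
}

static void on_read(uv_stream_t *stream, ssize_t nread, const uv_buf_t *buf)
{
    if(nread > 0)
      {
        // nread bytes are already in buf->base -- no recv() call needed.
        feed_protocol_parser(stream->data, buf->base, (size_t) nread);
      }
    else if(nread < 0)
      {
        uv_close((uv_handle_t *) stream, NULL);   // EOF or read error
      }

    free(buf->base);
}

static void start_receiving(uv_tcp_t *conn_handle)
{
    uv_read_start((uv_stream_t *) conn_handle, on_alloc, on_read);
}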

An overview of my Server follows.

Best Regards,

Paul Romero

Multi-Connection TCP Server Functional Architecture Overview

There is a connection descriptor for each incoming TCP connection which contains all data
needed to manage the connection and perform the relevant functions.

When the main() process detects an incoming TCP connection, it sends a notification message to the
IO_Trigger_Task(). The IO_Trigger_Task() then sets up epoll() monitoring of incoming TCP data
for that connection.

Subsequently, the IO_Task() invokes poll_callback() when incoming data is available, reads a chunk
of data, and sends a protocol message to the Protocol_Task() when a complete protocol message is
recognized.

The Timer_Task() sends an expiration notification message to the Protocol_Task() when a protocol
timer expires.

The Protocol_Task() sends messages to the Send_Op_Task() for transmission across the network.
It spawns a DB Operation Task to perform slow database operations; the DB Operation Task
notifies the Protocol_Task() when the operation is complete and then terminates.

Loops of type uv_loop_t

  • Connect_Loop
  • Poll_Loop
  • Timer_Loop

Tasks: All Libuv thread tasks run concurrently and are launched by main() at startup time.

  • main(): A Linux process that runs the Connect_Loop to detect incoming TCP connections.
    The make_incoming_connection() callback routine accepts incoming connections and
    allocates a uv_tcp_t handle on a per connection basis. (See main.c)

  • IO_Trigger_Task(): A Libuv thread that sets up epoll() plumbing for the IO_Task()
    when an incoming TCP connection occurs. It allocates a uv_poll_t handle, on a per
    connection basis, and calls uv_poll_start() to initiate epoll() operation with the
    Poll_Loop in the IO_Task(). It configures the handle to detect UV_READABLE events and
    handles them with the poll_callback() routine. However, it does not run the Poll_Loop.
    (Basically, this task just sets up plumbing.) (See network_io.c)

  • IO_Task(): A Libuv thread that runs the Poll_Loop to handle incoming TCP data, on a per
    connection basis. The poll_callback() routine executes and uses normal Linux recv() to read
    chunks of data, in non-blocking mode, when a UV_READABLE event occurs.
    (See network_io.c)

  • Timer_Task(): A Libuv thread that runs the Timer_Loop to handle ticks, and whose main
    function is to detect protocol timer expiration. The tick duration is configured with
    uv_timer_init() and uv_timer_start(), and ticks are handled by the
    timer_callback() routine. (See timer.c)

  • Protocol_Task(): A Libuv thread that handles protocol messages sent to it, on a per
    connection basis, by the following tasks: IO_Task(), Timer_Task(), and DB Operation Tasks.
    DB Operation Libuv thread tasks are spawned by the Protocol_Task() to perform slow database
    operations and send a notification message to the Protocol_Task() upon completion of the
    operation. (See protocol.c and database.c)

  • Send_Op_Task(): A Libuv thread that transmits all network bound messages with normal
    Linux send() on a per connection basis. (See transmit.c)
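
The startup wiring implied above looks roughly like the following (a simplified sketch with
error handling trimmed; start_tasks() is an illustrative name, and each loop-owning task
initializes and runs its own uv_loop_t, as IO_Task() does with the Poll_Loop).

#include <uv.h>

extern void IO_Trigger_Task(void *arg);
extern void IO_Task(void *arg);        // uv_loop_init(&Poll_Loop);  uv_run(...)
extern void Timer_Task(void *arg);     // uv_loop_init(&Timer_Loop); uv_run(...)
extern void Protocol_Task(void *arg);
extern void Send_Op_Task(void *arg);

static uv_thread_t trigger_tid, io_tid, timer_tid, proto_tid, send_tid;

static void start_tasks(void)
{
    uv_thread_create(&trigger_tid, IO_Trigger_Task, NULL);
    uv_thread_create(&io_tid,      IO_Task,         NULL);
    uv_thread_create(&timer_tid,   Timer_Task,      NULL);
    uv_thread_create(&proto_tid,   Protocol_Task,   NULL);
    uv_thread_create(&send_tid,    Send_Op_Task,    NULL);

    // main() itself then initializes and runs the Connect_Loop to accept connections.
}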

Crash Diagnostics

The crash occurs when uv_run() is executing in the IO_Task() in network_io.c, according to the following
GDB stack trace.

#0 0x00007f281754c267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007f281754deca in __GI_abort () at abort.c:89
#2 0x00007f281754503d in __assert_fail_base (fmt=0x7f28176a7028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x41e093 "!(handle->flags & UV_CLOSED)", file=file@entry=0x41e068 "src/unix/core.c",
line=line@entry=210, function=function@entry=0x41e2b0 <PRETTY_FUNCTION.9522> "uv__finish_close") at assert.c:92
#3 0x00007f28175450f2 in __GI___assert_fail (assertion=assertion@entry=0x41e093 "!(handle->flags & UV_CLOSED)",
file=file@entry=0x41e068 "src/unix/core.c", line=line@entry=210,
function=function@entry=0x41e2b0 <PRETTY_FUNCTION.9522> "uv__finish_close") at assert.c:101
#4 0x000000000040c967 in uv__finish_close (handle=) at src/unix/core.c:210
#5 uv__run_closing_handles (loop=0x638080 <Poll_Loop>) at src/unix/core.c:259
#6 uv_run (loop=0x638080 <Poll_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:326
#7 0x0000000000404962 in IO_Task (arg=0x0) at network_io.c:226
#8 0x0000000000412ad7 in uv__thread_start (arg=) at src/unix/thread.c:49
#9 0x00007f2817bf06aa in start_thread (arg=0x7f2816d15700) at pthread_create.c:333
#10 0x00007f281761deed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

However, the GDB thread information indicates that RELEASE_CONNECTION(), in protocol.c, is executing
in the Protocol_Task() when the crash occurs.

Id Target Id Frame
6 Thread 0x7f2817516700 (LWP 3424) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
5 Thread 0x7f2816514700 (LWP 3426) pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
4 Thread 0x7f2818003700 (LWP 3423) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
3 Thread 0x7f2815512700 (LWP 3428) pthread_cond_wait@@GLIBC_2.3.2 ()
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
2 Thread 0x7f2815d13700 (LWP 3427) 0x0000000000404500 in RELEASE_CONNECTION (cdesc=0x6384c0 <Conn_Desc_Table>)
at protocol.c:357

* 1 Thread 0x7f2816d15700 (LWP 3425) 0x00007f281754c267 in __GI_raise (sig=sig@entry=6)
  at ../sysdeps/unix/sysv/linux/raise.c:55

Line 357 of protocol.c is as follows.

	while(WaitClose[cdesc->index]);

WaitClose[] is only modified in two cases and only in the Protocol_Task().

  1. It is initialized to a handle address in RELEASE_CONNECTION() in the Protocol_Task().
  2. It is cleared in the uv_close() callback routine close_callback().

Code

#define CLOSE_KLUDGE

extern uv_loop_t Poll_Loop;
extern uv_loop_t Connect_Loop;

#ifdef CLOSE_KLUDGE
uv_handle_t *WaitClose[MAX_CONN_DESC] = { NULL };
#endif // CLOSE_KLUDGE

ROUTINE void close_callback(uv_handle_t *handle)
{
    int k;

    free(handle);

#ifdef CLOSE_KLUDGE
    //
    // Determine if this is a handle we are waiting on.
    //
    for(k = 0; k < MAX_CONN_DESC; k++)
      {
        if(WaitClose[k] == handle)
          {
            //
            // Closure is complete.
            //
            WaitClose[k] = NULL;
            break;
          }
      }
#endif // CLOSE_KLUDGE

    return;
}

ROUTINE void RELEASE_CONNECTION(CONN_DESC *cdesc)
{
    uv_async_t as_handle;
    struct linger spec;

    if(N_Sockets > 0)
        N_Sockets--;
    //
    // This causes immediate socket disconnection when it is closed.
    //
    spec.l_onoff = TRUE;
    spec.l_linger = 0;
    setsockopt(cdesc->fd, SOL_SOCKET, SO_LINGER, &spec, sizeof(spec));

    if(cdesc->poll_handle)
      {
#ifdef CLOSE_KLUDGE
        WaitClose[cdesc->index] = (uv_handle_t *) cdesc->poll_handle;
#endif // CLOSE_KLUDGE
        //
        // Deactivate and release the poll handle.
        // You have to stop the Poll_Loop to deactivate and deallocate the poll handle.
        //
        uv_stop(&Poll_Loop);

        uv_poll_stop(cdesc->poll_handle);
        uv_close((uv_handle_t *) cdesc->poll_handle, close_callback);
        //
        // Wake up the Poll_Loop in the IO_Task().
        //
        uv_async_init(&Poll_Loop, &as_handle, NULL);
        uv_async_send(&as_handle);
        uv_close((uv_handle_t *) &as_handle, NULL);

#ifdef CLOSE_KLUDGE
        //
        // Wait for the handle to be closed and deallocated.
        //
        while(WaitClose[cdesc->index]);
#endif // CLOSE_KLUDGE
      }

    if(cdesc->conn_handle)
      {
#ifdef CLOSE_KLUDGE
        WaitClose[cdesc->index] = (uv_handle_t *) cdesc->conn_handle;
#endif // CLOSE_KLUDGE
        //
        // Close and deallocate the connect handle in order to close the socket connection.
        // You have to wake up the Connect_Loop for the close_callback()
        // routine to execute.
        //
        uv_close((uv_handle_t *) cdesc->conn_handle, close_callback);
        //
        // Wake up the Connect_Loop in the main() process.
        //
        uv_async_init(&Connect_Loop, &as_handle, NULL);
        uv_async_send(&as_handle);
        uv_close((uv_handle_t *) &as_handle, NULL);
#ifdef CLOSE_KLUDGE
        //
        // Wait for the handle and socket connection to be released and closed.
        //
        while(WaitClose[cdesc->index]);
#endif // CLOSE_KLUDGE
      }

    ENTER_MUTEX(&Service_Q_Mutex);
    DELETE_CONN(cdesc);
    cdesc->fd = -1;
    flush_msg(&cdesc->task_input_q);
    EXIT_MUTEX(&Service_Q_Mutex);

    return;
}

paulra1 (Author) commented Dec 20, 2020

Also, I am new to GitHub and not sure how to attach files. Please advise.

paulra1 (Author) commented Dec 20, 2020

Hi Folks:

With limited testing, the problem ceases to happen if uv_run() in the IO_Task() is forced
to finish its pending work. As an interim measure I do this by making the
Protocol_Task() yield the CPU after calling uv_stop() and uv_poll_stop(), as follows, in
the RELEASE_CONNECTION() routine. This appears to cause the IO_Task() to be scheduled and run,
but I am not at all convinced this is a reliable technique.

	//
	// Deactivate and release the poll handle.
	// You have to stop the Poll_Loop to deactivate and deallocate the poll handle.
	//
	uv_stop(&Poll_Loop);

	uv_poll_stop(cdesc->poll_handle);

#ifdef CLOSE_KLUDGE2
	//
	// Try to let uv_run() in the IO_Task() finish pending work by yielding the CPU.
	//
	for(k = 0; k < 10; k++) pthread_yield();
#endif // CLOSE_KLUDGE2

	uv_close((uv_handle_t *) cdesc->poll_handle, close_callback);

Best Regards,

Paul R.

paulra1 (Author) commented Dec 22, 2020

Hi Folks:

I think I see the heart of my problem. Everything appears to work correctly
when you establish an incoming TCP connection and release it and the related
Libuv handles--the uv_tcp_t connection handle and the uv_poll_t poll handle.
(I revised the release code to do things the right way.)

The comments about coalescing of uv_async_send() calls in the documentation are somewhat misleading.
They should indicate that calls with the same handle are synchronous. Also, I suspect that
uv_async_send() is not reentrant.
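
A minimal standalone illustration of the coalescing behaviour I mean (a sketch assuming
libuv 1.x; the "pending" counter is only for demonstration): several uv_async_send() calls
on the same handle before the loop wakes up may be folded into a single callback run.

#include <stdio.h>
#include <uv.h>

static uv_async_t wakeup;
static int pending;

static void wakeup_cb(uv_async_t *handle)
{
    // One callback run may account for several uv_async_send() calls.
    printf("draining %d pending notifications in one callback\n", pending);
    pending = 0;
    uv_close((uv_handle_t *) handle, NULL);
}

int main(void)
{
    uv_loop_t loop;

    uv_loop_init(&loop);
    uv_async_init(&loop, &wakeup, wakeup_cb);

    pending = 3;
    uv_async_send(&wakeup);   // these three sends may coalesce ...
    uv_async_send(&wakeup);
    uv_async_send(&wakeup);   // ... into a single wakeup_cb() run

    uv_run(&loop, UV_RUN_DEFAULT);
    uv_loop_close(&loop);
    return 0;
}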

When you attempt another incoming connection the following things occur.

Notice in 2.3, below, that uv_poll_start() executes without being called. This
doesn't make sense to me--at least on the surface. Can you think of a possible
reason this occurs?

  1. The connection is successfully established--with uv_accept(), and the
    socket descriptor fd is the same as was used in the previous connection,
    in the main() process. (Occurs with uv_loop_t Session_Loop.)

    conn_handle = (uv_tcp_t *) malloc(sizeof(uv_tcp_t));
    if(conn_handle == NULL)
      {
        fprintf(stderr, "MAIN: No Connect Handle Memory\n");
        abort();
      }

    uv_tcp_init(&Connect_Loop, conn_handle);
    if(uv_accept(listen_handle, (uv_stream_t *) conn_handle) == 0)
      {
        uv_os_fd_t fd;

        uv_fileno((const uv_handle_t *) conn_handle, &fd);
      }
    

2.1) A poll handle is successfully allocated in the IO_Trigger_Task() thread.
(No loop involved.)

	poll_handle = (uv_poll_t *) malloc(sizeof(uv_poll_t));
	if(poll_handle == NULL)
	  {
		fprintf(stderr, "IO_TRIGGER_TASK: No Poll Handle Memory\n");
		abort();
	  }

	uv_poll_init(&Poll_Loop, poll_handle, pm->info);
	if((r = uv_poll_start(poll_handle, UV_READABLE, poll_callback)) < 0)
	  {
		fprintf(stderr, "IO_TRIGGER_TASK: Polling Initiation Error %d: %s\n", r, uv_err_name(r));
		abort();
	  }

2.2) uv_poll_start() is invoked via a call.

	uv_poll_init(&Poll_Loop, poll_handle, pm->info);
	if((r = uv_poll_start(poll_handle, UV_READABLE, poll_callback)) < 0)
	  {
		fprintf(stderr, "IO_TRIGGER_TASK: Polling Initiation Error %d: %s\n", r, uv_err_name(r));
		abort();
	  }

2.3) uv_poll_start() executes again without being called !

This is what you see in GDB, which is very strange since I know there is only
one instance of the IO_Trigger_Task() running, and it was not called a second time
because the line before line 212 didn't execute a second time.

Breakpoint 1, IO_Trigger_Task (arg=0x0) at network_io.c:212
212 if((r = uv_poll_start(poll_handle, UV_READABLE, poll_callback)) < 0)
(gdb) bt
#0 IO_Trigger_Task (arg=0x0) at network_io.c:212
#1 0x0000000000413017 in uv__thread_start (arg=) at src/unix/thread.c:49
#2 0x00007ffff7bc26aa in start_thread (arg=0x7ffff64e6700) at pthread_create.c:333
#3 0x00007ffff75efeed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb) s
uv_poll_start (handle=0x7fffec0008c0, pevents=1, poll_cb=0x404599 <poll_callback>) at src/unix/poll.c:89
89 assert((pevents & ~(UV_READABLE | UV_WRITABLE)) == 0);
(gdb) s
86 int uv_poll_start(uv_poll_t* handle, int pevents, uv_poll_cb poll_cb) {
(gdb) bt
#0 uv_poll_start (handle=0x7fffec0008c0, pevents=1, poll_cb=0x404599 <poll_callback>) at src/unix/poll.c:86
#1 0x00000000004048bb in IO_Trigger_Task (arg=0x0) at network_io.c:212
#2 0x0000000000413017 in uv__thread_start (arg=) at src/unix/thread.c:49
#3 0x00007ffff7bc26aa in start_thread (arg=0x7ffff64e6700) at pthread_create.c:333
#4 0x00007ffff75efeed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
(gdb)

This is the relevant code of the IO_Trigger_Task() thread.

for(;;)
  {
	//
	// Wait for a message from the main() process.
	//
	pm  = WAIT_IO_Trigger_MSG();

	poll_handle = (uv_poll_t *) malloc(sizeof(uv_poll_t));
	if(poll_handle == NULL)
	  {
		fprintf(stderr, "IO_TRIGGER_TASK: No Poll Handle Memory\n");
		abort();
	  }

	uv_poll_init(&Poll_Loop, poll_handle, pm->info);
	//
	// Start epoll() monitoring of the connection.
	//
	if((r = uv_poll_start(poll_handle, UV_READABLE, poll_callback)) < 0)
	  {
		fprintf(stderr, "IO_TRIGGER_TASK: Polling Initiation Error %d: %s\n", r, uv_err_name(r));
		abort();
	  }

	MSG_FREE(pm);
  }

2.4) The polling callback function never executes.

NOTE: The polling loop, Poll_Loop, of type uv_loop_t is already running and was started,
in the IO_Task() thread, at startup time as follows.

uv_loop_init(&Poll_Loop);
for(;;)
  {
	r = uv_run(&Poll_Loop, UV_RUN_DEFAULT);
	if(r)
		fprintf(stderr, "IO_TASK: Run Error %d\n", r);
  }

This is the sequence of operations used to free the first connection.

  1. Release the uv_poll_t poll handle in the IO_Task() from the Protocol_Task()

    //
    // This causes immediate socket disconnection when it is closed.
    //
    spec.l_onoff = TRUE;
    spec.l_linger = 0;
    setsockopt(cdesc->fd, SOL_SOCKET, SO_LINGER, &spec, sizeof(spec) );
    //
    // poll_release_proxy() executes in the IO_Task() and releases poll handle.
    //
    uv_async_init(&Poll_Loop, &cdesc->async_handle, poll_release_proxy);
    cdesc->async_handle.data = (void *) cdesc;
    uv_async_send(&cdesc->async_handle);

1.1) Wait for poll handle to be freed and then release the async. handle.

uv_close((uv_handle_t *) &cdesc->async_handle, NULL);
  2. Release the uv_tcp_t connect handle in the main() process from the Protocol_Task()

    uv_async_init(&Connect_Loop, &cdesc->async_handle, conn_release_proxy);
    cdesc->async_handle.data = (void *) cdesc;
    uv_async_send(&cdesc->async_handle);

2.1) Wait for the connect handle to be freed and then release the async. handle.

uv_close((uv_handle_t *) &cdesc->async_handle, NULL);
  3. Do protocol bookkeeping.

This is the code of the proxy callback routines and the close callback routine.

//
// This routine executes asynchronously and frees a handle.
// It is invoked in the following two cases.
//
// * When the main process invokes conn_release_proxy().
//
// * When the IO_Task() invokes poll_release_proxy().
//
ROUTINE void close_callback(uv_handle_t *handle)
{
    SDU *msg;

    CONN_DESC *cdesc = (CONN_DESC *) handle->data;

    free(handle);

    ENTER_MUTEX(&Service_Q_Mutex);
    //
    // Set the state correctly and validate the state.
    //
    switch(cdesc->release_state)
      {
      case RS_POLL_HANDLE_PEND:
        cdesc->release_state = RS_POLL_HANDLE_FREE;
        break;
      case RS_CONN_HANDLE_PEND:
        cdesc->release_state = RS_CONN_HANDLE_FREE;
        break;
      default:
        fprintf(stderr, "CLOSE_PROXY - BUG: Invalid Release State = %d\n", cdesc->release_state);
        abort();
      }
    EXIT_MUTEX(&Service_Q_Mutex);

    //
    // Send a notification message to the Protocol_Task().
    //
    msg = MSG_ALLOC(0, FALSE);
    msg->class = C_NOTIFY;
    msg->type = T_HANDLE_FREE;
    msg->info = 0;

    SEND_SDU(cdesc, msg);

    return;
}

//
// This routine is invoked by the IO_Task() in response to an async. wakeup by the Protocol_Task()
// during TCP connection termination. It releases the resources used by the Poll_Loop.
//
ROUTINE void poll_release_proxy(uv_async_t *async_handle)
{
    CONN_DESC *cdesc = (CONN_DESC *) async_handle->data;

    //
    // Stop polling operations before closing the handle.
    //
    uv_poll_stop(cdesc->poll_handle);
    cdesc->poll_handle->data = (void *) cdesc;
    uv_close((uv_handle_t *) cdesc->poll_handle, close_callback);

    return;
}

//
// This routine is invoked by the main process in response to an async. wakeup by the Protocol_Task()
// during TCP connection termination. It releases the resources used by the Connect_Loop.
//
ROUTINE void conn_release_proxy(uv_async_t *async_handle)
{
    CONN_DESC *cdesc = (CONN_DESC *) async_handle->data;

    cdesc->conn_handle->data = (void *) cdesc;
    uv_close((uv_handle_t *) cdesc->conn_handle, close_callback);

    return;
}

Best Regards,

Paul R.

paulra1 (Author) commented Dec 22, 2020

Hi Folks:

This is what happens at a lower level. Do you have insight about the cause ?
The relevant code segments and the GDB stack traces follow below.

The main() process and IO_Task() are executing concurrently.

NOTE: The IO_Trigger_Task() has been eliminated and the uv_poll_start() call is invoked in the IO_Task()
by a proxy routine and an async. wakeup from the main() process. The wakeup is done in the make_incoming_connection()
callback routine by the following code segment.

poll_handle->data = (void *) cdesc;
//
// Start epoll() monitoring of the connection. The poll_start_proxy()
// routine executes in the IO_Task().
//
uv_async_init(&Poll_Loop, &cdesc->async_handle, poll_start_proxy);
cdesc->async_handle.data = (void *) cdesc;
uv_async_send(&cdesc->async_handle);
  • The main() process is executing uv__finish_close(), which is invoked via uv_run() and
    uv__run_closing_handles(). The crash occurs inside uv__finish_close().

    This is the code from unix/core.c; line 210 is the first assert() statement.


static void uv__finish_close(uv_handle_t* handle) {
  /* Note: while the handle is in the UV_CLOSING state now, it's still possible
   * for it to be active in the sense that uv__is_active() returns true.
   * A good example is when the user calls uv_shutdown(), immediately followed
   * by uv_close(). The handle is considered active at this point because the
   * completion of the shutdown req is still pending.
   */
#ifndef SUPRESS
  assert(handle->flags & UV_CLOSING);
  assert(!(handle->flags & UV_CLOSED));
#endif /* SUPRESS */
  handle->flags |= UV_CLOSED;

  • The IO_Task() thread is executing uv_poll_start(), which is called from poll_start_proxy() in
    response to an async. request from the main() process. (uv__async_event() makes the call to
    poll_start_proxy().)

This is the code that executes in the IO_Task() from network_io.c

//
// Executes in IO_Task
//
ROUTINE void poll_start_proxy(uv_async_t *async_handle)
{
int r;

CONN_DESC *cdesc = (CONN_DESC *) async_handle->data;

uv_poll_init(&Poll_Loop, cdesc->poll_handle, cdesc->fd);
if((r = uv_poll_start(cdesc->poll_handle, UV_READABLE, poll_callback)) < 0)
  {
	fprintf(stderr, "PROXY: IO_TASK - Polling Initiation Error %d: %s\n", r, uv_err_name(r));
	abort();
  }
uv_close( (uv_handle_t *) async_handle, NULL);

return;

}

This is the code from unix/poll.c: Line 92 is the call to uv__poll_stop()

int uv_poll_start(uv_poll_t* handle, int pevents, uv_poll_cb poll_cb) {
  int events;

  assert((pevents & ~(UV_READABLE | UV_WRITABLE)) == 0);
  assert(!(handle->flags & (UV_CLOSING | UV_CLOSED)));

  uv__poll_stop(handle);

  if (pevents == 0)
    return 0;

  events = 0;
  if (pevents & UV_READABLE)
    events |= UV__POLLIN;
  if (pevents & UV_WRITABLE)
    events |= UV__POLLOUT;

  uv__io_start(handle->loop, &handle->io_watcher, events);
  uv__handle_start(handle);
  handle->poll_cb = poll_cb;

  return 0;
}

main() Stack Trace

#0 0x00007ffff751e267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007ffff751feca in __GI_abort () at abort.c:89
#2 0x00007ffff751703d in __assert_fail_base (fmt=0x7ffff7679028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x41e678 "handle->flags & UV_CLOSING", file=file@entry=0x41e668 "src/unix/core.c",
line=line@entry=210, function=function@entry=0x41e8b0 <PRETTY_FUNCTION.9522> "uv__finish_close")
at assert.c:92
#3 0x00007ffff75170f2 in __GI___assert_fail (assertion=assertion@entry=0x41e678 "handle->flags & UV_CLOSING",
file=file@entry=0x41e668 "src/unix/core.c", line=line@entry=210,
function=function@entry=0x41e8b0 <PRETTY_FUNCTION.9522> "uv__finish_close") at assert.c:101
#4 0x000000000040cd60 in uv__finish_close (handle=) at src/unix/core.c:210
#5 uv__run_closing_handles (loop=0x830680 <Connect_Loop>) at src/unix/core.c:261
#6 uv_run (loop=0x830680 <Connect_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:328
#7 0x0000000000403e0f in main () at main.c:184

IO_Task() Stack Trace

#0 uv__poll_stop (handle=0x831bc0) at src/unix/poll.c:73
#1 0x000000000040e93f in uv_poll_start (handle=0x831bc0, pevents=1, poll_cb=0x404576 <poll_callback>)
at src/unix/poll.c:92
#2 0x00000000004047d4 in poll_start_proxy (async_handle=0x639468 <Conn_Desc_Table+40>) at network_io.c:146
#3 0x000000000040c1dd in uv__async_event (loop=0x639000 <Poll_Loop>, w=, nevents=)
at src/unix/async.c:89
#4 0x000000000040c29e in uv__async_io (loop=0x639000 <Poll_Loop>, w=0x6391c8 <Poll_Loop+456>,
events=) at src/unix/async.c:160
#5 0x000000000041590d in uv__io_poll (loop=loop@entry=0x639000 <Poll_Loop>, timeout=-1) at src/unix/linux-core.c:319
#6 0x000000000040cac7 in uv_run (loop=0x639000 <Poll_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:326
#7 0x000000000040484e in IO_Task (arg=0x0) at network_io.c:171
#8 0x0000000000412eb7 in uv__thread_start (arg=) at src/unix/thread.c:49
#9 0x00007ffff7bc26aa in start_thread (arg=0x7ffff6ce7700) at pthread_create.c:333
#10 0x00007ffff75efeed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Best Regards,

Paul R.

paulra1 (Author) commented Dec 24, 2020

Hi Folks:

After taking several wrong paths I isolated the problem.

It occurs when uv_close() is called as follows to close the async. handle
used to initiate an async. operation with uv_async_send().

 uv_close((uv_handle_t *) &cdesc->async_handle, NULL);

Now if another async. operation is initiated with the same handle,
after calling uv_async_init() again, it triggers uv__finish_close() to be
invoked in both the main() process and the IO_Task() thread concurrently.
Then the following familiar crash occurs at the assert().

#0 0x00007f256fdf0267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1 0x00007f256fdf1eca in __GI_abort () at abort.c:89
#2 0x00007f256fde903d in __assert_fail_base (fmt=0x7f256ff4b028 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x4261e0 "handle->flags & UV_HANDLE_CLOSING",
file=file@entry=0x426115 "src/unix/core.c", line=line@entry=254,
function=function@entry=0x426480 <PRETTY_FUNCTION.10077> "uv__finish_close") at assert.c:92
#3 0x00007f256fde90f2 in __GI___assert_fail (assertion=assertion@entry=0x4261e0 "handle->flags & UV_HANDLE_CLOSING",
file=file@entry=0x426115 "src/unix/core.c", line=line@entry=254,
function=function@entry=0x426480 <PRETTY_FUNCTION.10077> "uv__finish_close") at assert.c:101
#4 0x0000000000410149 in uv__finish_close (handle=) at src/unix/core.c:254
#5 uv__run_closing_handles (loop=0x839880 <Connect_Loop>) at src/unix/core.c:317
#6 uv_run (loop=0x839880 <Connect_Loop>, mode=UV_RUN_DEFAULT) at src/unix/core.c:395
#7 0x0000000000404b5f in main () at main.c:188

There are two obvious possible causes.

  1. The async. operation is not really complete.
  2. A bug in Libuv.

This problem stops occurring if you don't call uv_close(). This is not a problem in my
application, since cdesc->async_handle is not dynamically allocated, but I am not sure
if the same is true for the Libuv software. (My application is a Server that handles large numbers
of concurrent TCP connections and runs as a Linux daemon.)

The Libuv documentation states that a uv_async_t handle remains active until uv_close()
is called. Are there negative consequences for the Libuv software if an async. handle is left
active? My guess is no.
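
The pattern I am moving toward is to treat cdesc->async_handle as a long-lived object:
initialize it once per connection, reuse it for every wakeup, and close it exactly once,
on the loop thread, at teardown. A sketch follows (setup_connection(), wake_loop() and
teardown_connection() are illustrative names; the single callback would dispatch on
cdesc->release_state).

// Once per connection, on the thread that owns the loop.
static void setup_connection(CONN_DESC *cdesc, uv_loop_t *loop, uv_async_cb cb)
{
    uv_async_init(loop, &cdesc->async_handle, cb);
    cdesc->async_handle.data = (void *) cdesc;
}

// Any thread, any number of times while the connection is alive.
static void wake_loop(CONN_DESC *cdesc)
{
    uv_async_send(&cdesc->async_handle);
}

// Loop thread only, exactly once, when the connection is released.
static void teardown_connection(CONN_DESC *cdesc)
{
    if(!uv_is_closing((uv_handle_t *) &cdesc->async_handle))
        uv_close((uv_handle_t *) &cdesc->async_handle, NULL);
}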

If this is a Libuv problem, perhaps it would be better to deal with it at the
documentation level rather than the code level.

It appears that coalescing of uv_async_send() calls is still an issue.

Best Regards,

Paul R.

PS: The reason I am considering using Libuv is that it provides a convenient epoll() wrapper
and epoll() allows very quick data reception without user space file descriptor polling or
the like.

paulra1 (Author) commented Dec 28, 2020

Hi Folks:

A .tar file containing my prototype Server code is attached to this note. Under moderate testing,
with 5 concurrent TCP connections, the Server behaves correctly. Look at the following items
in the code to understand the details of how it uses async. operations.

  • BOOL Async_Busy: A global variable protected by a Mutex.

  • uv_mutex_t Async_Mutex: A Mutex used in async. operations.

  • cdesc->async_handle: A per connection, statically allocated uv_async_t handle.

  • cdesc->release_state: A per connection state variable used to manage async. operations performed
    for the purpose of releasing Libuv resources during TCP socket connection release.

Note that LIVUV_RELEASE is defined and identifies the relevant code sections, and the following
routines are essential to async. operations.

  • ASYNC_WAIT(): Waits for an async. operation to complete.

  • ASYNC_BUSY(): Indicates whether an async. operation is in progress or not.

  • poll_start_proxy(): Starts polling for a specific uv_poll_t handle with uv_poll_start().

  • start_conn_release(): Stops polling on a specific uv_poll_t handle and releases it, and
    starts termination of the TCP connection on a specific uv_tcp_t handle and release of that handle.

  • continue_conn_release(): Finishes release of uv_tcp_t handle and the underlying TCP connection.

  • close_callback(): Informs another task that a handle has been released.

What I need to do is have one task, A, prompt another task, B, to perform an operation
on an object owned by task B. The description below explains how my Server does so.
You indicated that uv_async_send() is not a queuing mechanism but suggested there is
a safe way to queue data for a loop oriented task. I don't know how Libuv queues work
for such a task, or how another task can convey "messages" to a loop oriented task.
Perhaps there is a section of the documentation that describes such a mechanism,
but I have not found it yet.

Could you point me in the right direction? I think the procedure described below
is unsafe. Even though uv_async_send() is reentrant, the fact that you need to use
uv_async_init(), which is not reentrant, to start an async. operation effectively
makes it non-reentrant. Also, I think that if you can prevent uv_async_init() from being
called while an async. operation is in progress, you can circumvent the problem.

This is a logical description of what occurs in my Server. For simplicity,
assume we have only the following two tasks, each of which could be a thread task or the
Linux main procedure.

Task A: A task which can receive messages via a condition variable and input queue with uv_cond_wait().

Task B: A loop oriented task that is executing uv_run()

Then we perform the following operations in order to make task B perform the desired operation.

  1. Task A sends a message to task B.

Currently this is done with uv_async_send(), after the proxy routine to
execute in task B has been specified with uv_async_init().

  2. Task B performs an operation on an object it owns.

The proxy routine executes in task B and performs the operation.

  3. Task B informs Task A that the operation is complete.

Upon completion of the operation, Task B sends a notification message to task A
with a conventional queue insertion and uv_cond_send().
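
For what it is worth, the shape I imagine for a safe queueing mechanism is a mutex-protected
list owned by task B plus one long-lived uv_async_t registered with task B's loop. A sketch
follows (msg_t, the queue helpers and process_message() are hypothetical names, not Libuv API).

#include <stdlib.h>
#include <uv.h>

typedef struct msg_s
{
    struct msg_s *next;
    int type;
    void *payload;
} msg_t;

static uv_mutex_t q_mutex;
static msg_t *q_head, *q_tail;
static uv_async_t q_async;               // initialized once against task B's loop

extern void process_message(msg_t *m);   // hypothetical: task B's own handler

// Task B (loop thread): drain everything, since sends may coalesce.
static void q_async_cb(uv_async_t *handle)
{
    msg_t *m;

    for(;;)
      {
        uv_mutex_lock(&q_mutex);
        m = q_head;
        if(m)
          {
            q_head = m->next;
            if(q_head == NULL)
                q_tail = NULL;
          }
        uv_mutex_unlock(&q_mutex);

        if(m == NULL)
            break;

        process_message(m);
        free(m);
      }
}

// Task B, at startup, on its own loop thread.
void init_task_b_queue(uv_loop_t *loop_b)
{
    uv_mutex_init(&q_mutex);
    uv_async_init(loop_b, &q_async, q_async_cb);
}

// Task A (any thread): enqueue first, then wake task B.
void send_to_task_b(msg_t *m)
{
    uv_mutex_lock(&q_mutex);
    m->next = NULL;
    if(q_tail)
        q_tail->next = m;
    else
        q_head = m;
    q_tail = m;
    uv_mutex_unlock(&q_mutex);

    uv_async_send(&q_async);             // safe to call from any thread
}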

Best Regards,

Paul R.

/tmp/code.tar

paulra1 (Author) commented Dec 29, 2020

Hi Folks:

A version of ASYNC_WAIT() with the right logic follows. As I'm sure you guessed,
after moderately heavy testing with the old incorrect version, the predictable
consequences ensued.

My basic strategy for managing the uv_async_send() "channel" is to treat it
as a global resource. The Async_Busy variable is used to guarantee that only
one task at a time can use the single instance of the channel.

//
// Wait for the async. channel to become available.
//
ROUTINE void ASYNC_WAIT()
{
    struct timespec tsec;
    BOOL done, wait;

    //
    // 20 Milliseconds.
    //
    tsec.tv_sec = 0;
    tsec.tv_nsec = 20000000;		// 20 MS

    done = FALSE;
    while(done != TRUE)
      {
        wait = FALSE;

        ENTER_MUTEX(&Async_Mutex);
        if(Async_Busy)
            wait = TRUE;
        else
          {
            Async_Busy = TRUE;
            done = TRUE;
          }
        EXIT_MUTEX(&Async_Mutex);

        if(wait)
            nanosleep(&tsec, NULL);
      }

    return;
}
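
An alternative I am considering, sketched below, keeps the same Async_Busy/Async_Mutex idea
but pairs the mutex with a condition variable so the waiter blocks instead of polling.
Async_Cond and ASYNC_RELEASE() are added names; uv_cond_init(&Async_Cond) would be done once
at startup.

extern uv_mutex_t Async_Mutex;
extern BOOL Async_Busy;

static uv_cond_t Async_Cond;

ROUTINE void ASYNC_WAIT()
{
    uv_mutex_lock(&Async_Mutex);
    while(Async_Busy)
        uv_cond_wait(&Async_Cond, &Async_Mutex);   // releases the mutex while waiting
    Async_Busy = TRUE;
    uv_mutex_unlock(&Async_Mutex);

    return;
}

ROUTINE void ASYNC_RELEASE()
{
    uv_mutex_lock(&Async_Mutex);
    Async_Busy = FALSE;
    uv_cond_signal(&Async_Cond);                   // wake one waiter
    uv_mutex_unlock(&Async_Mutex);

    return;
}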

Best Regards,

Paul R.

nikhilm (Owner) commented Dec 30, 2020

This is not a repository for libuv. If you are reporting a bug in libuv, please do so at https://github.com/libuv/libuv. It is quite possible that the version of libuv included in the book has had several bugs fixed since then, as the book is really old.
Much of the book's content has been folded into the official libuv documentation.

@nikhilm nikhilm closed this as completed Jan 2, 2021
@nikhilm nikhilm mentioned this issue Jan 2, 2021