Skip to content

Commit

Permalink
More translation of docs/en/io.md
Browse files Browse the repository at this point in the history
  • Loading branch information
zhujiashun committed Sep 26, 2017
1 parent 014f7fc commit 576c6b2
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions docs/en/io.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,21 @@
Generally there are three ways of IO operations:

- blocking IO: after the IO operation is issued, the current thread is blocked until the process of IO ends, which is a kind of synchronous IO, such as the default action of posix [read](http://linux.die.net/man/2/read) and [write](http://linux.die.net/man/2/write).
- non-blocking IO: If there is nothing to read or overcrowded to write, the API will return immediately with an error code. Non-blocking IO is often used with [poll](http://linux.die.net/man/2/poll), [select](http://linux.die.net/man/2/select), [epoll](http://linux.die.net/man/4/epoll) in Linux or [kqueue](https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2) in BSD.
- non-blocking IO: If there is nothing to read or overcrowded to write, the API will return immediately with an error code. Non-blocking IO is often used with IO multiplexing([poll](http://linux.die.net/man/2/poll), [select](http://linux.die.net/man/2/select), [epoll](http://linux.die.net/man/4/epoll) in Linux or [kqueue](https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2) in BSD).
- asynchronous IO: you call an API to start a read/write operation, and the framework calls you back when it is done, such as [OVERLAPPED](https://msdn.microsoft.com/en-us/library/windows/desktop/ms684342(v=vs.85).aspx) + [IOCP](https://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx) in Windows. Native AIO in Linux is only supported for files.

Non-blocking IO is usually used to increabse IO concurrency in Linux. When the IO concurrency is low, non-blocking IO is not necessarily more efficient than blocking IO, since blocking IO is handled completely by the kernel and system calls like read/write are highly optimized which are apparently more effective.
IO multiplexing is usually used to increabse IO concurrency in Linux. When the IO concurrency is low, IO multiplexing is not necessarily more efficient than blocking IO, since blocking IO is handled completely by the kernel and system calls like read/write are highly optimized which are apparently more effective. But with the increasement of IO concurrency, the drawbacks of blocking one thread in blocking IO is revealed: the kernel kept switching between threads to do effective works, and a cpu core may only do a little bit of works, immediately replaced by another thread, causing cpu cache not fully utilized. In addition a large number of threads will make performance of code dependent on thread-local variables significantly decreased, such as tcmalloc. Once malloc slows down, the overall performance of the program will often decrease. While IO multiplexing is typically composed of a small number of event dispatching threads and some worker threads that run user code, event dispatching and worker can run simultaneously at the same time and kernel can do the job without frequent switching. There is no need to have many threads, so the use of thread-local variables is also more adequate, in which time IO multiplexing is faster than blocking IO. But IO multiplexing also has its own problems, it needs to call more system calls, such as[epoll_ctl](http://man7.org/linux/man-pages/man2/epoll_ctl.2.html). Since a red-black tree is used inside epoll, epoll_ctl is not a very fast operation, especially in multi-threaded environment. Implementations dependent on epoll_ctl is often confronted with tricky scalability problem. IO multiplexing has to solve a lot of multi-thresaded problems, the code is much more complex than that using blocking IO.

# Receiving messages

A message is a fix-length binary data read from a connection, which may come from the request from upstream client or the reponse from downstream server. Brpc uses one or serveral [EventDispatcher](https://github.com/brpc/brpc/blob/master/src/brpc/event_dispatcher.cpp)(referred to as EDISP) waiting for events from any fd. Unlike the common IO threads, EDISP is not responsible for reading or writing. The problem of IO threads is that one thread can only read one fd at a given time, so some read requests may starve when many budy fds are assigned to one IO thread. Features like multi-tenant, flow scheduling and [Streaming RPC](streaming_rpc.md) will aggravate the problem. The occasional long delayed read at high load also slows down the reading of all fds in an IO thread, which has a great impact on usability.
A message is a fix-length binary data read from a connection, which may be a request from upstream clients or a response from downstream servers. Brpc uses one or serveral [EventDispatcher](https://github.com/brpc/brpc/blob/master/src/brpc/event_dispatcher.cpp)(referred to as EDISP) waiting for events from any fd. Unlike the common IO threads, EDISP is not responsible for reading or writing. The problem of IO threads is that one thread can only read one fd at a given time, so some read requests may starve when many budy fds are assigned to one IO thread. Features like multi-tenant, flow scheduling and [Streaming RPC](streaming_rpc.md) will aggravate the problem. The occasional long delayed read at high load also slows down the reading of all fds in an IO thread, which has a great impact on usability.

Because of a [bug](https://patchwork.kernel.org/patch/1970231/) of epoll and greate overhead of epoll_ctl, Edge triggered mode is used in EDISP. When receiving an event, an atomic variable related to the current fd is added by one. Only when the variable is zero before addition, a bthread is started to handle the data from the fd. The pthread in which EDISP runs is used to run this new created bthread, making it have better cache locality and read the data as fast as possible. While the bthread in which EDISP runs will be stolen to another pthread and keep running, this process is called work stealing scheduing in bthread. To understand exactly how that atomic variable works, you can read first[atomic instructions](atomic_instructions.md), then [Socket::StartInputEvent](https://github.com/brpc/brpc/blob/master/src/brpc/socket.cpp). These methods make contentions happened when reading the same fd [wait-free](http://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom).

[InputMessenger](https://github.com/brpc/brpc/blob/master/src/brpc/input_messenger.h) is responsible for cutting and handling messages and uses callbacks from user to handle different format of data. Parse is used to cut messages from binary data with nearly fixed running time; Process is used to parse messages further(such as deserialization using protobuf) and call users' callbacks with unfixed running time. InputMessenger will try to users' callbacks one by one. When a Parse successfully cut the next message, call the corresponding Process. Since there are often only one message format in one connection, InputMessenger will record the last choice to avoid try every time. If n(n > 1) messages are read from the fd, InputMessenger will launch n-1 bthreads to handle first n-1 messages respectively, and the last message will be processed in the current bthread.

It can be seen that the fd and messages from fd are processed concurrently in brpc, which makes brpc very good at handling large messages and can handle different sources of messages at high loads to reduce long tails.

# Sending Messages

A message is a fix-length binary data write to a connection, which may be a response to upstream clients or a request to downstream servers.

0 comments on commit 576c6b2

Please sign in to comment.