Skip to content

Commit

Permalink
replace baidu-rpc with brpc
Browse files Browse the repository at this point in the history
  • Loading branch information
gejun committed Sep 8, 2017
1 parent 0d5be86 commit 478b434
Show file tree
Hide file tree
Showing 85 changed files with 331 additions and 331 deletions.
36 changes: 18 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@ Let's see how the issues are solved.

* RPC needs serialization which is done by [protobuf](https://github.com/google/protobuf) pretty well. Users fill requests in format of protobuf::Message, do RPC, and fetch results from responses in protobuf::Message. protobuf has good forward and backward compatibility for users to change fields and build services incrementally. For http services, [json](http://www.json.org/) is used for serialization extensively.
* Establishment and re-using of connections are transparent to users, but users can make choices, say [different connection types](docs/cn/client.md#连接方式): short, pooled, single.
* Machines are discovered by Naming Service, which can be implemented by [DNS](https://en.wikipedia.org/wiki/Domain_Name_System), [ZooKeeper](https://zookeeper.apache.org/) or [etcd](https://github.com/coreos/etcd). Inside Baidu, we use BNS (Baidu Naming Service). baidu-rpc provides ["list://" and "file://" as well](docs/cn/client.md#名字服务). Users specify load balancing algorithms to choose one machine for each request from all machines, including: round-robin, randomized, [consistent-hashing](docs/cn/consistent_hashing.md)(murmurhash3 or md5) and [locality-aware](docs/cn/lalb.md).
* Machines are discovered by Naming Service, which can be implemented by [DNS](https://en.wikipedia.org/wiki/Domain_Name_System), [ZooKeeper](https://zookeeper.apache.org/) or [etcd](https://github.com/coreos/etcd). Inside Baidu, we use BNS (Baidu Naming Service). brpc provides ["list://" and "file://" as well](docs/cn/client.md#名字服务). Users specify load balancing algorithms to choose one machine for each request from all machines, including: round-robin, randomized, [consistent-hashing](docs/cn/consistent_hashing.md)(murmurhash3 or md5) and [locality-aware](docs/cn/lalb.md).
* RPC retries when the connection is broken. When server does not respond within given time, client fails with timeout error.

# Where can I use RPC?
Expand All @@ -30,15 +30,15 @@ Common doubts on RPC:
- I'm sending streaming data, which can't be processed by RPC. Actually many protocols in RPC can handle streaming data, including [ProgressiveReader in http](docs/cn/http_client.md#持续下载), streams in h2, [streaming rpc](docs/cn/streaming_rpc.md), and RTMP which is a specialized streaming protocol.
- I don't need replies. With some inductions, we know that in your scene, requests can be dropped at any stage, because the client is always unaware of the situation. Are you really sure this is acceptable? Even if you don't need the reply, we recommend sending back small-size replies, which are unlikely performance bottlenecks and probably valuable clues when debugging complex bugs.

# What is ![baidu-rpc](docs/images/logo.png)?
# What is ![brpc](docs/images/logo.png)?

A RPC framework used throughout [Baidu](http://ir.baidu.com/phoenix.zhtml?c=188488&p=irol-irhome), with more than 600,000 instances. Only C++ implementation is opensourced right now.

You can use it for:
* Build a server that can talk in multiple protocols (**on same port**), or access all sorts of services
* restful http/https, h2/h2c (compatible with [grpc](https://github.com/grpc/grpc), will be opensourced soon). using http in baidu-rpc is much more friendly than [libcurl](https://curl.haxx.se/libcurl/).
* restful http/https, h2/h2c (compatible with [grpc](https://github.com/grpc/grpc), will be opensourced soon). using http in brpc is much more friendly than [libcurl](https://curl.haxx.se/libcurl/).
* [redis](docs/cn/redis_client.md) and [memcached](docs/cn/memcache_client.md), thread-safe, more friendly and performant than the official clients
* [rtmp](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/src/brpc/rtmp.h)/[flv](https://en.wikipedia.org/wiki/Flash_Video)/[hls](https://en.wikipedia.org/wiki/HTTP_Live_Streaming), for building live-streaming services.
* [rtmp](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/src/brpc/rtmp.h)/[flv](https://en.wikipedia.org/wiki/Flash_Video)/[hls](https://en.wikipedia.org/wiki/HTTP_Live_Streaming), for building live-streaming services.
* hadoop_rpc(not opensourced yet)
* [rdma](https://en.wikipedia.org/wiki/Remote_direct_memory_access) support via [openucx](https://github.com/openucx/ucx) (will be opensourced soon)
* all sorts of protocols used in Baidu: baidu_std, [streaming_rpc](docs/cn/streaming_rpc.md), hulu_pbrpc, [sofa_pbrpc](https://github.com/baidu/sofa-pbrpc), nova_pbrpc, public_pbrpc, ubrpc, and nshead-based ones.
Expand All @@ -49,39 +49,39 @@ You can use it for:
* Use [combo channels](docs/cn/combo_channel.md) to simplify complicated client patterns declaratively, including sharded and parallel accesses.
* Debug services [via http](docs/cn/builtin_service.md), and run online profilers.
* Get [better latency and throughput](#better-latency-and-throughput).
* [Extend baidu-rpc](docs/cn/new_protocol.md) with the protocols used in your organization quickly, or customize components, including [naming services](docs/cn/load_balancing.md#名字服务) (dns, zk, etcd), [load balancers](docs/cn/load_balancing.md#负载均衡) (rr, random, consistent hashing)
* [Extend brpc](docs/cn/new_protocol.md) with the protocols used in your organization quickly, or customize components, including [naming services](docs/cn/load_balancing.md#名字服务) (dns, zk, etcd), [load balancers](docs/cn/load_balancing.md#负载均衡) (rr, random, consistent hashing)

# Advantages of baidu-rpc
# Advantages of brpc

### More friendly API

Only 3 (major) user headers: [Server](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/src/brpc/server.h), [Channel](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/src/brpc/channel.h), [Controller](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/src/brpc/controller.h), corresponding to server-side, client-side and parameter-set respectively. You don't have to worry about "How to initialize XXXManager", "How to layer all these components together", "What's the relationship between XXXController and XXXContext". All you to do is simple:
Only 3 (major) user headers: [Server](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/src/brpc/server.h), [Channel](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/src/brpc/channel.h), [Controller](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/src/brpc/controller.h), corresponding to server-side, client-side and parameter-set respectively. You don't have to worry about "How to initialize XXXManager", "How to layer all these components together", "What's the relationship between XXXController and XXXContext". All you to do is simple:

* Build service? include [brpc/server.h](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/src/brpc/server.h) and follow the comments or [examples](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/example/echo_c++/server.cpp).
* Build service? include [brpc/server.h](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/src/brpc/server.h) and follow the comments or [examples](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/example/echo_c++/server.cpp).

* Access service? include [brpc/channel.h](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/src/brpc/channel.h) and follow the comments or [examples](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/example/echo_c++/client.cpp).
* Access service? include [brpc/channel.h](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/src/brpc/channel.h) and follow the comments or [examples](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/example/echo_c++/client.cpp).

* Tweak parameters? Checkout [brpc/controller.h](http://icode.baidu.com/repo/baidu/opensource/baidu-rpc/files/master/blob/src/brpc/controller.h). Note that the class is shared by server and channel. Methods are separated into 3 parts: client-side, server-side and both-side.
* Tweak parameters? Checkout [brpc/controller.h](http://icode.baidu.com/repo/baidu/opensource/brpc/files/master/blob/src/brpc/controller.h). Note that the class is shared by server and channel. Methods are separated into 3 parts: client-side, server-side and both-side.

We tried to make simple things simple. Take naming service as an example, in older RPC implementations, you may need to copy a pile of obscure code to make it work, however in baidu-rpc accessing BNS is expressed as `Init("bns://node-name"...`, DNS is "http://domain-name" and local machine list is "file:///home/work/server.list". Without any explanation, you know what it means.
We tried to make simple things simple. Take naming service as an example, in older RPC implementations, you may need to copy a pile of obscure code to make it work, however in brpc accessing BNS is expressed as `Init("bns://node-name"...`, DNS is "http://domain-name" and local machine list is "file:///home/work/server.list". Without any explanation, you know what it means.

### Make services more reliable

baidu-rpc is extensively used in baidu-rpc, with more than 600,000 instances and 500 kinds of services, from map-reduce, table storages, high-performance computing, machine learning, indexing servers, ranking servers…. It's been proven.
brpc is extensively used in brpc, with more than 600,000 instances and 500 kinds of services, from map-reduce, table storages, high-performance computing, machine learning, indexing servers, ranking servers…. It's been proven.

baidu-rpc pays special attentions to development and maintenance efficency, you can [view internal status of servers](docs/cn/builtin_service.md) in web brower or with curl, you can analyze [cpu usages](docs/cn/cpu_profiler.md), [heap allocations](docs/cn/heap_profiler.md) and [lock contentions](docs/cn/contention_profiler.md) of services online, you can measure stats by [bvar](docs/cn/bvar.md), which is viewable in [/vars](docs/cn/vars.md).
brpc pays special attentions to development and maintenance efficency, you can [view internal status of servers](docs/cn/builtin_service.md) in web brower or with curl, you can analyze [cpu usages](docs/cn/cpu_profiler.md), [heap allocations](docs/cn/heap_profiler.md) and [lock contentions](docs/cn/contention_profiler.md) of services online, you can measure stats by [bvar](docs/cn/bvar.md), which is viewable in [/vars](docs/cn/vars.md).

### Better latency and throughput

Although almost all RPC implementations claim that they're "high-performant", the number are probably just numbers. Being really high-performant in different scenarios is difficult. To make users easier, baidu-rpc goes much deeper at performance than other implementations.
Although almost all RPC implementations claim that they're "high-performant", the number are probably just numbers. Being really high-performant in different scenarios is difficult. To make users easier, brpc goes much deeper at performance than other implementations.

* Reading and parsing requests from different clients is fully parallelized, and users don't need to distinguish between "IO-threads" and "Processing-threads". Other implementations probably have "IO-threads" and "Processing-threads" and hash file descriptors(fd) into IO-threads. When a IO-thread handles one of its fds, other fds in the thread can't be handled. If a message is large, other fds are significantly delayed. Although different IO-threads run in parallel, you won't have many IO-threads since they don't have too much to do generally except reading/parsing from fds. If you have 10 IO-threads, one fd may affect 10% of all fds, which is unacceptable to industrial online services (requiring 99.99% availability). The problem will be worse, when fds are distributed unevenly accross IO-threads (unfortunately common), or the service is multi-tenancy (common in cloud services). In baidu-rpc, reading from different fds is parallelized and even processing different messages from one fd is parallelized as well. Parsing a large message does not block other messages from the same fd, not to mention other fds. More details can be found [here](docs/cn/io.md#收消息).
* Reading and parsing requests from different clients is fully parallelized, and users don't need to distinguish between "IO-threads" and "Processing-threads". Other implementations probably have "IO-threads" and "Processing-threads" and hash file descriptors(fd) into IO-threads. When a IO-thread handles one of its fds, other fds in the thread can't be handled. If a message is large, other fds are significantly delayed. Although different IO-threads run in parallel, you won't have many IO-threads since they don't have too much to do generally except reading/parsing from fds. If you have 10 IO-threads, one fd may affect 10% of all fds, which is unacceptable to industrial online services (requiring 99.99% availability). The problem will be worse, when fds are distributed unevenly accross IO-threads (unfortunately common), or the service is multi-tenancy (common in cloud services). In brpc, reading from different fds is parallelized and even processing different messages from one fd is parallelized as well. Parsing a large message does not block other messages from the same fd, not to mention other fds. More details can be found [here](docs/cn/io.md#收消息).
* Writing into one fd and multiple fds are highly concurrent. When multiple threads write into the same fd (common for multiplexed connections), the first thread directly writes in-place and other threads submit their write requests in [wait-free](http://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom) manner. One fd can be written into 5,000,000 16-byte messages per second by a couple of highly-contended threads. More details can be found [here](docs/cn/io.md#发消息).
* Minimal locks. High-QPS services can utilize all CPU power on the machine. For example, [creating bthreads](docs/cn/memory_management.md) for processing requests, [setting up timeout](docs/cn/timer_keeping.md), [finding RPC contexts](docs/cn/bthread_id.md) according to response, [recording performance counters](docs/cn/bvar.md) are all highly concurrent. Users see very few contentions (via [contention profiler](docs/cn/contention_profiler.md)) caused by RPC framework even if the service runs at 500,000+ QPS.
* Server adjusts thread number according to load. Traditional implementations set number of threads according to latency to avoid limiting the throughput. baidu-rpc creates a new [bthread](docs/cn/bthread.md) for each request and ends the bthread when the request is done, which automatically adjusts thread number according to load.
* Server adjusts thread number according to load. Traditional implementations set number of threads according to latency to avoid limiting the throughput. brpc creates a new [bthread](docs/cn/bthread.md) for each request and ends the bthread when the request is done, which automatically adjusts thread number according to load.

Check out [benchmark](docs/cn/benchmark.md) for a comparison between baidu-rpc and other implementations.
Check out [benchmark](docs/cn/benchmark.md) for a comparison between brpc and other implementations.

# Try it!

Check out [Getting Started](docs/cn/getting_started.md) to start.
Check out [Getting Started](docs/cn/getting_started.md) to start.
4 changes: 2 additions & 2 deletions docs/cn/atomic_instructions.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@
- 一个依赖全局多生产者多消费者队列(MPMC)的程序难有很好的多核扩展性,因为这个队列的极限吞吐取决于同步cache的延时,而不是核心的个数。最好是用多个SPMC或多个MPSC队列,甚至多个SPSC队列代替,在源头就规避掉竞争。
- 另一个例子是全局计数器,如果所有线程都频繁修改一个全局变量,性能就会很差,原因同样在于不同的核心在不停地同步同一个cacheline。如果这个计数器只是用作打打日志之类的,那我们完全可以让每个线程修改thread-local变量,在需要时再合并所有线程中的值,性能可能有几十倍的差别。

做不到完全不共享,那就尽量少共享。在一些读很多的场景下,也许可以降低写的频率以减少同步cacheline的次数,以加快读的平均性能。一个相关的编程陷阱是避免false sharing:这指的是那些不怎么被修改的变量,由于同一个cacheline中的另一个变量被频繁修改,而不得不经常等待cacheline同步而显著变慢了。多线程中的变量尽量按访问规律排列,频繁被其他线程的修改要放在独立的cacheline中。要让一个变量或结构体按cacheline对齐,可以include <base/macros.h>然后使用BAIDU_CACHELINE_ALIGNMENT宏,用法请自行grep一下baidu-rpc的代码了解
做不到完全不共享,那就尽量少共享。在一些读很多的场景下,也许可以降低写的频率以减少同步cacheline的次数,以加快读的平均性能。一个相关的编程陷阱是避免false sharing:这指的是那些不怎么被修改的变量,由于同一个cacheline中的另一个变量被频繁修改,而不得不经常等待cacheline同步而显著变慢了。多线程中的变量尽量按访问规律排列,频繁被其他线程的修改要放在独立的cacheline中。要让一个变量或结构体按cacheline对齐,可以include <base/macros.h>然后使用BAIDU_CACHELINE_ALIGNMENT宏,用法请自行grep一下brpc的代码了解

# Memory fence

Expand Down Expand Up @@ -91,7 +91,7 @@ if (ready.load(std::memory_order_acquire)) {

# wait-free & lock-free

原子指令能为我们的服务赋予两个重要属性:[wait-free](http://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom)和[lock-free](http://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom)。前者指不管OS如何调度线程,每个线程都始终在做有用的事;后者比前者弱一些,指不管OS如何调度线程,至少有一个线程在做有用的事。如果我们的服务中使用了锁,那么OS可能把一个刚获得锁的线程切换出去,这时候所有依赖这个锁的线程都在等待,而没有做有用的事,所以用了锁就不是lock-free,更不会是wait-free。为了确保一件事情总在确定时间内完成,实时系统的关键代码至少是lock-free的。在我们广泛又多样的在线服务中,对时效性也有着严苛的要求,如果RPC中最关键的部分满足wait-free或lock-free,就可以提供更稳定的服务质量。比如,由于[fd](https://en.wikipedia.org/wiki/File_descriptor)只适合被单个线程操作,baidu-rpc中使用原子指令最大化了fd的读写的并发度,具体见[IO](io.md)。
原子指令能为我们的服务赋予两个重要属性:[wait-free](http://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom)和[lock-free](http://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom)。前者指不管OS如何调度线程,每个线程都始终在做有用的事;后者比前者弱一些,指不管OS如何调度线程,至少有一个线程在做有用的事。如果我们的服务中使用了锁,那么OS可能把一个刚获得锁的线程切换出去,这时候所有依赖这个锁的线程都在等待,而没有做有用的事,所以用了锁就不是lock-free,更不会是wait-free。为了确保一件事情总在确定时间内完成,实时系统的关键代码至少是lock-free的。在我们广泛又多样的在线服务中,对时效性也有着严苛的要求,如果RPC中最关键的部分满足wait-free或lock-free,就可以提供更稳定的服务质量。比如,由于[fd](https://en.wikipedia.org/wiki/File_descriptor)只适合被单个线程操作,brpc中使用原子指令最大化了fd的读写的并发度,具体见[IO](io.md)。

值得提醒的是,常见想法是lock-free或wait-free的算法会更快,但事实可能相反,因为:

Expand Down
Loading

0 comments on commit 478b434

Please sign in to comment.