[RPC docs] Remove mention of TensorPipe's SHM and CMA backends as they're not built (#41200)

Summary:
Pull Request resolved: pytorch/pytorch#41200

In short, we messed up. The SHM and CMA backends of TensorPipe are Linux-specific and are therefore guarded by an #ifdef in the agent's code. Due to a mishap with CMake (TensorPipe has two CMake files, one for PyTorch and a "standalone" one), some flags were not propagated correctly, so these #ifdefs always evaluated to false. This means that the two backends have always been disabled and have never been covered by our OSS CI. It would be irresponsible to enable them now in v1.6, so instead we remove any mention of them from the docs.

Note that this is perhaps not as bad as it sounds. These two backends provided lower latency when the two endpoints were on the same machine. However, I suspect that most RPC users only transfer data across machines, where SHM and CMA wouldn't have played any role.
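To make the consequence concrete, here is a minimal Python sketch of the reasoning above. It assumes a simplified model where each transport is either compiled in or not, and where SHM and CMA are usable only when both endpoints share a host. The `Transport` class, `pick_transport`, `make_transports` and the host-string comparison are hypothetical illustrations, not TensorPipe's actual API or selection algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Transport:
    # Hypothetical model of a TensorPipe-style transport (not the real API).
    name: str
    compiled_in: bool      # stands in for the outcome of the #ifdef guard
    same_host_only: bool   # SHM and CMA only connect processes on one machine


def pick_transport(transports: List[Transport],
                   local_host: str, remote_host: str) -> Optional[Transport]:
    """Return the first usable transport, in order of preference."""
    same_host = local_host == remote_host
    for t in transports:
        if not t.compiled_in:
            continue  # compiled out: can never be chosen, whatever the peer
        if t.same_host_only and not same_host:
            continue  # a local-only transport cannot reach another machine
        return t
    return None


def make_transports(linux_flags_propagated: bool) -> List[Transport]:
    # Preference order is an assumption: lowest expected latency first.
    return [
        Transport("shm", linux_flags_propagated, same_host_only=True),
        Transport("cma", linux_flags_propagated, same_host_only=True),
        Transport("tcp", True, same_host_only=False),
    ]
```

With the flags not propagated, `pick_transport(make_transports(False), "a", "a")` falls back to `tcp` even for a same-host pair. For a cross-machine pair (`"a"` and `"b"`) the result is `tcp` in both configurations, which is why the missing backends mostly matter only for same-machine RPC.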
ghstack-source-id: 107458630

Test Plan: Docs only

Differential Revision: D22462158

fbshipit-source-id: 0d72fea11bcaab6d662184bbe7270529772a5e9b
lw authored and facebook-github-bot committed Jul 9, 2020
1 parent a88099b commit dde3d5f
Showing 1 changed file with 2 additions and 6 deletions.
docs/source/rpc.rst
@@ -203,12 +203,8 @@ The TensorPipe backend has been introduced in PyTorch v1.6 and is being actively
 developed. At the moment, it only supports CPU tensors, with GPU support coming
 soon. It comes with a TCP-based transport, just like Gloo. It is also able to
 automatically chunk and multiplex large tensors over multiple sockets and
-threads in order to achieve very high bandwidths. In addition to that, it packs
-two Linux-specific transports for communication between processes on a same
-machine (one based on ringbuffers stored in shared memory, the other on the
-cross-memory attach syscalls) which can achieve lower latencies than TCP.
-The agent will be able to pick the best transport on its own, with no
-intervention required.
+threads in order to achieve very high bandwidths. The agent will be able to pick
+the best transport on its own, with no intervention required.
 
 Example::
 
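The chunking and multiplexing that the TensorPipe paragraph describes can be illustrated with a toy Python sketch. Here in-process `queue.Queue` objects stand in for sockets, and one sender/receiver thread pair per channel stands in for TensorPipe's I/O threads. The function names, the round-robin assignment of chunks and the fixed chunk size are assumptions made for illustration. This is not how TensorPipe implements it internally.

```python
import queue
import threading
from typing import Dict, List, Tuple

Chunk = Tuple[int, bytes]  # (sequence number, payload slice)


def split(payload: bytes, chunk_size: int) -> List[Chunk]:
    """Cut the payload into numbered, fixed-size chunks (the last may be short)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(i, payload[off:off + chunk_size])
            for i, off in enumerate(range(0, len(payload), chunk_size))]


def send_multiplexed(payload: bytes, num_channels: int, chunk_size: int) -> bytes:
    """Send chunks round-robin over several channels, then reassemble them."""
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    channels = [queue.Queue() for _ in range(num_channels)]  # toy "sockets"
    chunks = split(payload, chunk_size)
    received: Dict[int, bytes] = {}
    lock = threading.Lock()

    def sender(ch: int) -> None:
        # Channel ch carries chunks ch, ch + num_channels, ch + 2 * num_channels, ...
        for seq, data in chunks[ch::num_channels]:
            channels[ch].put((seq, data))
        channels[ch].put(None)  # end-of-stream marker for this channel

    def receiver(ch: int) -> None:
        while True:
            item = channels[ch].get()
            if item is None:
                return
            seq, data = item
            with lock:
                received[seq] = data

    threads = [threading.Thread(target=f, args=(ch,))
               for ch in range(num_channels) for f in (sender, receiver)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Chunks can complete in any order across channels; sequence numbers restore it.
    return b"".join(received[seq] for seq in range(len(chunks)))
```

The sequence number attached to each chunk is what makes the design work. Channels progress independently, so the receiver cannot rely on arrival order and instead reassembles by index.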
