Do I still need to set kv_store when using MXNet? Why? #80

ZHAIXINGZHAIYUE · 2019-08-08T07:22:44Z

Do I still need to set kv_store when using MXNet? Why?

ymjiang · 2019-08-08T07:36:05Z

You don't need to. MXNet-BytePS's implementation bypasses kvstore.
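For illustration, here is a minimal sketch of what "no kvstore" can look like in a Gluon training script. It assumes the `byteps.mxnet` module with `bps.init()` and `bps.DistributedTrainer`, as used in the BytePS MXNet examples. The helper name `build_byteps_trainer` and the default learning rate are hypothetical and only serve the example.

```python
def build_byteps_trainer(net, learning_rate=0.01):
    """Create a Gluon trainer that synchronizes gradients through BytePS.

    `net` is assumed to be an initialized mxnet.gluon.Block.
    """
    # Imported lazily so this file can be loaded on machines without BytePS.
    import byteps.mxnet as bps

    # Initialize BytePS before creating the trainer.
    bps.init()

    params = net.collect_params()
    optimizer_params = {"learning_rate": learning_rate}

    # No kvstore argument is passed here: the BytePS trainer handles
    # gradient push/pull itself instead of going through an MXNet kvstore.
    return bps.DistributedTrainer(params, "sgd", optimizer_params)
```

The returned object is used like a regular Gluon trainer, for example by calling `trainer.step(batch_size)` after the backward pass.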

ZHAIXINGZHAIYUE · 2019-08-08T08:49:34Z

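Hello. Suppose I run distributed training with native MXNet and also set up dedicated scheduler, server, and worker nodes, each running on its own machine. In that case, will there be an efficiency difference between training with BytePS MXNet and training with native MXNet? If so, where does the difference mainly come from? Thanks.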

ymjiang · 2019-08-08T09:07:52Z

There will be a performance difference even with the same setup you describe. We made many performance optimizations in BytePS. For example, compared to native MXNet, BytePS-MXNet eliminates some extra copies. BytePS also supports RDMA, which is faster than the TCP transport used by native MXNet. We plan to publish a technical report on these optimizations in the future.

ZHAIXINGZHAIYUE · 2019-08-08T09:12:37Z

Thank you.

bobzhuyb · 2019-08-08T18:16:29Z

Below are some numbers. The following experiments were performed on a public cloud with 20 Gbps networking. Each machine has 8 Tesla V100 16GB GPUs (NVLink-enabled). The batch size is 32 per GPU, and we use fp32 training. The metric is "total images per second".
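As a rough sketch of how such a metric is typically computed, the snippet below multiplies out the global batch size and divides by the time per training step. The function name and the step time in the usage example are hypothetical. They are not measurements from these experiments.

```python
def total_images_per_second(num_machines, gpus_per_machine,
                            batch_per_gpu, seconds_per_step):
    """Aggregate training throughput across all GPUs."""
    if seconds_per_step <= 0:
        raise ValueError("seconds_per_step must be positive")
    # Every GPU processes one local batch per synchronized step.
    global_batch = num_machines * gpus_per_machine * batch_per_gpu
    return global_batch / seconds_per_step


# One machine with 8 GPUs and batch size 32 per GPU gives a global
# batch of 256 images. The 0.5 s step time is a made-up value.
throughput = total_images_per_second(1, 8, 32, 0.5)
print(throughput)
```

With the configuration described above, the global batch grows by 256 images for each machine added, so throughput scales linearly only if the step time stays constant.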

ZHAIXINGZHAIYUE · 2019-08-09T02:32:43Z

@bobzhuyb Is the number of servers the same for mxnet-native and mxnet-byteps?

bobzhuyb · 2019-08-09T04:09:43Z

Yes. You can try them yourself. The original ps-lite implementation is pretty poor -- it is slower than Horovod, let alone BytePS.

ZHAIXINGZHAIYUE · 2019-12-06T06:33:18Z

Hello, I have another question. When running distributed training with the original MXNet, I occasionally hit "Check failed: (my_node_.port) != (-1) bind failed". Does this problem still exist in BytePS?

ymjiang · 2019-12-07T02:54:36Z

@ZHAIXINGZHAIYUE I believe you won't have that problem if you configure BytePS correctly. We have never encountered it when using BytePS.

DeruiLiu mentioned this issue Aug 6, 2020

some question about to start server. Check failed: mr ibv_reg_mr failed: Cannot allocate memory #282

Closed
