
[DistEnv] Strategy for ray on multiple nodes, with sharding as option #1622

Open
Tracked by #2086
blythed opened this issue Dec 30, 2023 · 4 comments

@blythed
Collaborator

blythed commented Dec 30, 2023

@kartik4949 to add information, discussion points, diagrams, links.

@kartik4949
Contributor

There are multiple ways to achieve model parallelism for general torch models.

  1. Deepspeed
  2. FSDP

The above are the two most popular libraries that enable model parallelism.
These libraries are essentially model-parallelism algorithms with inter-GPU communication support.

i.e. DeepSpeed will take your model, partition it into parts, and then manage the communication between the partitions on multiple GPUs.
Source:
https://www.deepspeed.ai/inference/

So if a user has a model, they can use DeepSpeed to shard it across multiple GPUs.
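
A minimal sketch of what that looks like with DeepSpeed's inference API, assuming the script is started with the `deepspeed` launcher so there is one process per participating GPU (model name, sharding degree, and exact keyword arguments are illustrative and vary between DeepSpeed versions):

```python
# Minimal DeepSpeed inference-sharding sketch (illustrative values; run via
# the deepspeed launcher so each participating GPU gets its own process).
import deepspeed
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Partition the model across the participating GPUs and wire up the
# inter-GPU communication needed at inference time.
engine = deepspeed.init_inference(
    model,
    mp_size=2,            # number of GPUs to shard the model across
    dtype=torch.float16,
)

inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```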

Let's take two scenarios:

  1. 1 machine with 4 GPUs
  2. 2 machines with 2 GPUs each (4 total)

From now on, we will be referring to the above scenario list.

If the user falls under scenario 1, directly using DeepSpeed will suffice to achieve model parallelism, but I'm not sure if we will have a dashboard to view the process.
Here DeepSpeed will create one process per GPU and partition the model for inference.
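
As a rough sketch, the single-machine launch pattern looks like this (the script name is a placeholder; the launcher spawns one process per GPU and each process runs this code):

```python
# Scenario 1 (one machine, 4 GPUs), launched e.g. with
# `deepspeed --num_gpus 4 run_inference.py` (illustrative script name).
# The launcher spawns 4 processes, one per GPU, and each runs this code.
import os
import torch

local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by the deepspeed launcher
torch.cuda.set_device(local_rank)

# ...then load the model and call deepspeed.init_inference(model, mp_size=4, ...)
# as in the snippet above; each rank holds a partition of the weights.
```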

Now comes scenario 2: if the user has 2 machines, sharding the model across multiple machines might not be the best option, as the inter-node communication can become a bottleneck!

Moreover, inter-node model parallelism with DeepSpeed requires some manual tasks like hostfile creation, etc.

But Ray can handle the inter-node/machine communication very elegantly,
so the idea is: what if we create a local intra-machine GPU worker group with DeepSpeed and create this group on each node/machine?

[Diagram: Ray distributes the input batch across the two machines; on each machine DeepSpeed holds a copy of the model sharded across that machine's 2 GPUs.]

Let's take a look at the above diagram.

The blue box is Ray, which takes a batch of input data and distributes it across the two machines. Each partition of the batch given to a machine is handled by DeepSpeed, which has a copy of the model sharded/distributed across the 2 GPUs in that machine.
So, for example:

If the batch size is 16,

a partition of batch size 8 (partition 1) will be given to machine 1, and the model will be distributed/partitioned across the 2 GPUs in that machine with DeepSpeed.

The same happens on machine 2.

The results are calculated, gathered back by Ray, and returned on the client node.
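
A rough sketch of that flow with Ray actors, assuming one actor per machine that reserves both of the machine's GPUs. `ShardedModelWorker` and `load_sharded_engine` are hypothetical names, not an existing API; in practice DeepSpeed's tensor parallelism needs one process per GPU, which the helper would have to coordinate inside each machine:

```python
# Sketch of the Ray side of the design: Ray scatters the batch across the
# machines and gathers the results; DeepSpeed shards the model copy inside
# each machine. All names below are illustrative.
import ray
import torch

ray.init(address="auto")  # connect to the running Ray cluster


@ray.remote(num_gpus=2)  # one actor per machine, reserving its 2 GPUs
class ShardedModelWorker:
    def __init__(self, model_name: str):
        # Hypothetical helper: loads the model and shards it across the
        # machine's 2 GPUs with DeepSpeed, returning a callable engine.
        self.engine = load_sharded_engine(model_name, num_gpus=2)

    def predict(self, batch: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return self.engine(batch)


workers = [ShardedModelWorker.remote("gpt2") for _ in range(2)]

batch = torch.randn(16, 128)                     # batch size 16
parts = torch.chunk(batch, len(workers), dim=0)  # 8 + 8, one part per machine

# Scatter the partitions to the machines, then gather results on the client node.
futures = [worker.predict.remote(part) for worker, part in zip(workers, parts)]
result = torch.cat(ray.get(futures), dim=0)
```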

@blythed
Collaborator Author

blythed commented Dec 30, 2023

@kartik4949 great explanation!

Questions:

  1. How much support do we have for this scenario using vLLM?
  2. Is there any difference between FSDP and DeepSpeed?

@jieguangzhou
Collaborator

jieguangzhou commented Jan 2, 2024

> @kartik4949 great explanation!
>
> Questions:
>
>   1. How much support do we have for this scenario using vLLM?
>   2. Is there any difference between FSDP and DeepSpeed?

vLLM uses a method like this to support tensor model parallelism,

so it won't use what's described here.

But our support for basic transformers or torch models can use this approach.
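
For reference, vLLM exposes that tensor parallelism directly as a constructor argument, roughly like this (model name is a placeholder):

```python
# Sketch of vLLM's built-in tensor parallelism: vLLM itself shards the model
# across 2 GPUs, so the Ray + DeepSpeed layering above isn't needed for it.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-13b", tensor_parallel_size=2)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```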

@jieguangzhou
Collaborator

Great!
If we implement this, we'll be able to completely offload the model computation layer to a Ray cluster. Also, I suggest completing this in conjunction with #1604.

Otherwise, the model will still load onto the local machine, which would make this feature somewhat underwhelming.

@fnikolai fnikolai changed the title Strategy for ray on multiple nodes, with sharding as option [DistEnv] Strategy for ray on multiple nodes, with sharding as option May 21, 2024