Flyt

Flyt is an elastic GPU provisioning framework for virtual machines. It is built on top of Cricket and uses the same virtualization layer. Flyt's distributed framework provisions GPU resources across a cluster of GPUs.

For Flyt to be able to insert the virtualization layer, the CUDA application must link dynamically against the CUDA runtime library. To do so, pass -cudart shared to nvcc when linking.
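
One way to confirm that a binary picked up the shared runtime is to look at its dynamic dependencies. The sketch below is not part of Flyt. It assumes a Linux system with ldd available; the helper names and the sample ldd output are made up for illustration.

```python
import subprocess


def links_shared_cudart(ldd_output: str) -> bool:
    """Return True if ldd output lists libcudart as a dynamic dependency."""
    # Each ldd line starts with the library name,
    # e.g. "libcudart.so.12 => /path/to/libcudart.so.12 (0x...)".
    return any(
        line.strip().startswith("libcudart.so")
        for line in ldd_output.splitlines()
    )


def binary_links_shared_cudart(path: str) -> bool:
    """Run ldd on a binary (Linux only) and check for libcudart."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return links_shared_cudart(result.stdout)


# Illustrative ldd output for a binary linked with -cudart shared.
SAMPLE_LDD_OUTPUT = """\
    linux-vdso.so.1 (0x00007ffd0b5f2000)
    libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007f3a1c000000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f3a1bc00000)
"""

print(links_shared_cudart(SAMPLE_LDD_OUTPUT))  # prints True
```

If libcudart does not show up for your binary, it was most likely linked against the static runtime, which is the nvcc default.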

Dependencies

Flyt requires

CUDA Toolkit (e.g. CUDA 12.1)
rpcbind
libcrypto
libtirpc
patchelf

libtirpc and patchelf are built as part of the main Makefile.

On the system where the Cricket server should be executed, the appropriate NVIDIA drivers should be installed.

Building

Before building Flyt, edit the location of the configuration files in control-managers/src/common/config.rs.

git clone https://github.com/RWTH-ACS/cricket.git
cd cricket && git submodule update --init
LOG=INFO make

Environment variables for Makefile:

LOG: Log level. Can be one of DEBUG, INFO, WARNING, ERROR.
WITH_DEBUG: Use gcc debug flags for compilation

You can also build the binaries for the client and server separately by running make install-cpu-client or make install-cpu-server.

On the client, you need to substitute the original CUDA runtime library with the Flyt runtime library. Use make install-client-lib to install the client library and make restore-client-lib to restore the original CUDA runtime library.

Running a CUDA Application

You need to run the following modules:

flyt-cluster-manager: This can be run on any machine in the cluster. It is responsible for managing the cluster of GPUs.
flyt-node-manager: This should be run as a daemon on each machine that provides a GPU. Before starting the node manager, make sure that MPS is enabled on the GPU and set the environment variable CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING=1.
flyt-client-manager: This should be run on the Virtual Machine as a daemon.
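
As a rough illustration of the node manager prerequisite, the sketch below prepares an environment with the MPS partitioning variable set and starts the node manager with it. It assumes the MPS control daemon is already running (it is typically started with nvidia-cuda-mps-control -d) and that flyt-node-manager is on PATH and needs no extra arguments. Both helper functions are hypothetical and not part of Flyt.

```python
import os
import subprocess

MPS_PARTITION_VAR = "CUDA_MPS_ENABLE_PER_CTX_DEVICE_MULTIPROCESSOR_PARTITIONING"


def node_manager_env(base_env=None):
    """Return a copy of the environment with MPS SM partitioning enabled."""
    # Copy so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env[MPS_PARTITION_VAR] = "1"
    return env


def start_node_manager(binary="flyt-node-manager"):
    """Start the node manager with the variable set.

    Assumption: the binary is on PATH and takes no required arguments.
    """
    return subprocess.Popen([binary], env=node_manager_env())
```

Calling start_node_manager() returns the Popen handle of the started process. In a real deployment a service manager such as systemd would usually set the variable and supervise the daemon instead.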

Update the configuration files with the correct IP addresses, ports, and other settings for your deployment.

Furthermore, a MongoDB database must be available to store the initial resource requirements of the Virtual Machines. Update the cluster manager's configuration file to point to this database, and create a collection named vm_required_resources in it. Each document in the collection should have the following format:

{
    vm_ip: <The IP address of the Virtual Machine>,
    host_ip: <The IP address of the Host Machine of the VM>,
    compute_units: <The number of SM cores the VM should be allocated>,
    memory: <The amount of memory in GB the VM should be allocated>
}
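
A document in this format could be prepared as in the sketch below. The field names come from the format above; the value types (IP addresses as strings, numeric compute_units and memory), the validation rules, the example addresses, and the helper name are assumptions made for illustration.

```python
import ipaddress


def make_vm_requirement(vm_ip, host_ip, compute_units, memory_gb):
    """Build one vm_required_resources document with basic sanity checks."""
    # ipaddress.ip_address raises ValueError for malformed addresses.
    ipaddress.ip_address(vm_ip)
    ipaddress.ip_address(host_ip)
    if compute_units <= 0 or memory_gb <= 0:
        raise ValueError("compute_units and memory must be positive")
    return {
        "vm_ip": vm_ip,
        "host_ip": host_ip,
        "compute_units": compute_units,  # number of SM cores
        "memory": memory_gb,             # memory in GB
    }


# Example values only; replace with the addresses of your VM and host.
doc = make_vm_requirement("192.0.2.10", "192.0.2.1", 20, 4)
print(doc)
```

The resulting dictionary can be stored with any MongoDB client. With pymongo, for example, it could be passed to insert_one on the vm_required_resources collection.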

Make sure that applications are linked with the shared cudart library. You can do this by passing -cudart shared to nvcc during linking.

You can then launch the application like any other CUDA application. The framework takes care of the rest.

Provisioning Control

flytctl is a command line tool to interact with the Flyt framework. It should be run on the cluster manager machine. Use flytctl --help to get more information about the commands.

Contributing

File structure

control-managers: The control managers for the Flyt framework
cpu: The virtualization layer
- libtirpc: Transport Independent Remote Procedure Calls, required by the virtualization layer
- patchelf: A tool to modify the dynamic linker and RPATH of an executable
tests: Various CUDA applications to test Cricket

Acknowledgments

This work was done at the SynerG Lab of the Department of Computer Science and Engineering, IIT Bombay, with support from IBM Research Lab, India.
