I have a server with 8 × A100 40 GB GPUs, and I try to run AlphaFold3 with Docker or bash, but no matter how I set --gpu_device, only the first GPU is used.
Here are my tests and their outputs.
nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB On | 00000000:3D:00.0 Off | Off |
| N/A 31C P0 33W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-PCIE-40GB On | 00000000:3E:00.0 Off | Off |
| N/A 33C P0 36W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-PCIE-40GB On | 00000000:40:00.0 Off | Off |
| N/A 32C P0 37W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-PCIE-40GB On | 00000000:41:00.0 Off | Off |
| N/A 32C P0 35W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100-PCIE-40GB On | 00000000:B1:00.0 Off | Off |
| N/A 32C P0 39W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100-PCIE-40GB On | 00000000:B2:00.0 Off | Off |
| N/A 32C P0 36W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100-PCIE-40GB On | 00000000:B4:00.0 Off | Off |
| N/A 33C P0 35W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100-PCIE-40GB On | 00000000:B5:00.0 Off | Off |
| N/A 33C P0 33W / 250W | 1MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
use the model parameters.
Found local devices: [CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3), CudaDevice(id=4), CudaDevice(id=5), CudaDevice(id=6), CudaDevice(id=7)], using device 7: cuda:7
Building model from scratch...
Processing fold inputs.
Processing fold input #1
Processing fold input test_pre_load_GPU8
Checking we can load the model parameters...
Although it looks like it is running well, only the first GPU is actually used!
nvidia-smi
Sun Feb 2 16:14:26 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA A100-PCIE-40GB On | 00000000:3D:00.0 Off | Off |
| N/A 32C P0 35W / 250W | 38855MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA A100-PCIE-40GB On | 00000000:3E:00.0 Off | Off |
| N/A 33C P0 38W / 250W | 425MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 2 NVIDIA A100-PCIE-40GB On | 00000000:40:00.0 Off | Off |
| N/A 33C P0 39W / 250W | 425MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 3 NVIDIA A100-PCIE-40GB On | 00000000:41:00.0 Off | Off |
| N/A 32C P0 38W / 250W | 425MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 4 NVIDIA A100-PCIE-40GB On | 00000000:B1:00.0 Off | Off |
| N/A 33C P0 41W / 250W | 425MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 5 NVIDIA A100-PCIE-40GB On | 00000000:B2:00.0 Off | Off |
| N/A 33C P0 39W / 250W | 425MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 6 NVIDIA A100-PCIE-40GB On | 00000000:B4:00.0 Off | Off |
| N/A 33C P0 37W / 250W | 425MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
| 7 NVIDIA A100-PCIE-40GB On | 00000000:B5:00.0 Off | Off |
| N/A 33C P0 35W / 250W | 425MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 3802743 C python 38846MiB |
| 1 N/A N/A 3802743 C python 416MiB |
| 2 N/A N/A 3802743 C python 416MiB |
| 3 N/A N/A 3802743 C python 416MiB |
| 4 N/A N/A 3802743 C python 416MiB |
| 5 N/A N/A 3802743 C python 416MiB |
| 6 N/A N/A 3802743 C python 416MiB |
| 7 N/A N/A 3802743 C python 416MiB |
+-----------------------------------------------------------------------------------------+
Can anyone help me?
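For reference, this is the kind of device pinning I expected --gpu_device to achieve. The sketch below is an assumption on my side, not AlphaFold3's documented behavior: it hides all but one physical GPU from the process via CUDA_VISIBLE_DEVICES before CUDA/JAX initializes, and the Docker flags shown are illustrative only.

```shell
# Sketch (assumption): hide all but physical GPU 3 from the process
# *before* JAX starts. CUDA renumbers the visible devices, so inside
# the process this GPU appears as cuda:0.
export CUDA_VISIBLE_DEVICES=3

# The process now sees only the one device listed in the variable:
python3 - <<'EOF'
import os
print(os.environ.get("CUDA_VISIBLE_DEVICES"))   # -> 3
EOF

# Docker equivalent (illustrative flags; adapt to the actual run command):
# docker run --gpus '"device=3"' alphafold3 ...
```

With this approach the small ~425 MiB allocations on the other seven GPUs should disappear, since the process can no longer see them.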