use lightning or pytorch-lightning #438
Comments
Hi. Thanks for trying out aligner. Can you try the dockerfile + the code on the dev branch? This has been resolved there.
My access env is inside Docker and supports docker-in-docker, but I will try the steps in the Dockerfile. Seems that the Dockerfile uses
BTW, can you check #436?
Yea, exactly. If you're running
Seems the TensorRT-LLM in the dev branch requires CUDA 12.5. Have you tried
@terrykong Still got OOM with a single A800 80G card, or got stuck in the command
Describe the bug
Installed pytorch-lightning=2.4.0 and nemo_toolkit=2.1.0rc0 under NeMo-Aligner=0.5.0; got an error with
It seems that resolve_and_create_trainer inits a lightning Trainer, but init_using_ptl checks it against the pytorch_lightning Trainer class. Has anyone else run into this problem?
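For context, here is a minimal sketch of what such a mismatch looks like, assuming both the lightning and pytorch_lightning packages are importable in the container; this is illustrative, not NeMo-Aligner's actual code:

```python
# Illustrative sketch only: the "lightning" and "pytorch_lightning" packages
# ship distinct Trainer classes, so an isinstance check across them fails
# even when the two trainers are configured identically.
from lightning.pytorch import Trainer as LightningTrainer
from pytorch_lightning import Trainer as PTLTrainer

# If resolve_and_create_trainer builds a Trainer from the "lightning" package...
trainer = LightningTrainer(max_steps=1)

# ...a check written against the "pytorch_lightning" package rejects it.
print(isinstance(trainer, PTLTrainer))        # False
print(isinstance(trainer, LightningTrainer))  # True
```

Importing Trainer from the same package on both sides (either one, used consistently) makes the check pass.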
I updated the code inside nemo_aligner/utils/train_script_utils.py and it now runs forward.

Steps/Code to reproduce bug
Expected behavior
A clear and concise description of what you expected to happen.
Environment overview (please complete the following information)
- Environment location: Docker
- Method of install: pip
- docker pull & docker run commands used

Environment details
If an NVIDIA Docker image is used, you don't need to specify these.
Otherwise, please provide:
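If it helps, a small snippet like the following can print the usual details (OS, Python, PyTorch, and pytorch-lightning versions); the exact set of fields is an assumption here, since the template's list was left blank:

```python
# Sketch of an environment report for this template; the chosen fields are
# an assumption and can be extended (e.g. with nemo_toolkit's version).
import platform
import sys

import torch
import pytorch_lightning

print("OS:", platform.platform())
print("Python:", sys.version.split()[0])
print("PyTorch:", torch.__version__)
print("pytorch-lightning:", pytorch_lightning.__version__)
print("CUDA available:", torch.cuda.is_available())
```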
Additional context
Add any other context about the problem here.
Example: GPU model