Insights: kubeflow/training-operator
Overview
1 Release published by 1 person
- v1.9.0 (v1.9.0 release)
  published Jan 28, 2025
3 Pull requests merged by 3 people
- Fix Kustomize patchesStrategicMerge deprecation warning
  #2405 merged Jan 28, 2025
- [release-1.9] Rename paddlepaddle_defaults.go file name
  #2400 merged Jan 27, 2025
- KEP-2170: Deploy JobSet in `kubeflow-system` namespace
  #2388 merged Jan 27, 2025
2 Pull requests opened by 2 people
- make explicit arg for pip args
  #2403 opened Jan 24, 2025
- Add e2e tests for runtimes v2
  #2406 opened Jan 29, 2025
1 Issue closed by 1 person
- PET_NNODES env var for PyTorchJobs is incorrect when elasticPolicy is set
  #2277 closed Jan 28, 2025
1 Issue opened by 1 person
- Training Operator V2 Installation - Certificate error
  #2404 opened Jan 25, 2025
18 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- KEP-2170: Add PyTorch DDP MNIST training example
  #2387 commented on Jan 30, 2025 • 41 new comments
- Use env variable for the pytorch init-container image in case of usin…
  #2366 commented on Jan 30, 2025 • 4 new comments
- Remove the Training Operator V1 Source Code
  #2389 commented on Jan 30, 2025 • 0 new comments
- Update Dockerfile with python debian image in cmd/initializer_v2/dataset/Dockerfile
  #2312 commented on Jan 26, 2025 • 0 new comments
- WIP: Use SSA in TrainJob Controller
  #2309 commented on Jan 31, 2025 • 0 new comments
- Migrate to controller-runtime logger in mpi job controller
  #2177 commented on Jan 30, 2025 • 0 new comments
- Consider container image rename of `kubeflow/storage-initializer`
  #2183 commented on Jan 29, 2025 • 0 new comments
- Create GitHub Repository for Kubeflow Trainer
  #2402 commented on Jan 29, 2025 • 0 new comments
- ValueError: Please specify target_modules in peft_config
  #2374 commented on Jan 27, 2025 • 0 new comments
- Cannot fine-tune LLM without GPU - CUDA error and DDP initialization
  #2371 commented on Jan 27, 2025 • 0 new comments
- KEP-2170: Kubeflow Trainer V2 API
  #2170 commented on Jan 27, 2025 • 0 new comments
- KEP-2170: Add E2E tests for Kubeflow Training V2
  #2213 commented on Jan 27, 2025 • 0 new comments
- KEP-2170: Implement validations for TrainingRuntime and ClusterTrainingRuntime
  #2219 commented on Jan 27, 2025 • 0 new comments
- Support MLX on Kubernetes with Kubeflow
  #2047 commented on Jan 26, 2025 • 0 new comments
- KEP-2170: Implement Job Pipeline Framework plugins
  #2290 commented on Jan 26, 2025 • 0 new comments
- Use Debian images for Python components in the Training Operator V2
  #2311 commented on Jan 26, 2025 • 0 new comments
- [SDK] add option to specify pip flags
  #2398 commented on Jan 24, 2025 • 0 new comments
- Support Local Execution of Training Jobs
  #2231 commented on Jan 24, 2025 • 0 new comments