plugin-nvidia: collect nvml process metrics #89
Please rebase on the master branch of upstream. If the upstream remote (alumet-dev/alumet) is named
Why did you change plugin-cgroupv2 in a PR about the NVML plugin?
52114e5 to d1d4991
The tests need to be removed for now; you can keep them in git stash.
Also, it would be better to split this into two separate commits:
- add the new measurements
- refactor into separate files
The goal is to make the review easier, because we can then see what has changed in the probe. In the current situation I just see "old file deleted" and "new files added".
But it's not a big deal this time, think about it next time ^^
Here are some changes about the documentation. Unlike some other languages, the Rust convention is to write simple sentences that explain what the function does, not to list the parameters or returned result.
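As a small illustration of that convention, here is a sketch of a documented function. The `Sample` type and the `max_sm_utilization` function are hypothetical and exist only for this example; they are not taken from the plugin.

```rust
/// A utilization sample for one process (hypothetical type, for illustration only).
pub struct Sample {
    pub pid: u32,
    pub sm_util: u32,
}

/// Returns the highest SM utilization among the samples, or `None` if there are none.
pub fn max_sm_utilization(samples: &[Sample]) -> Option<u32> {
    // The doc comment above is one plain sentence about what the function does.
    // It has no "Parameters" or "Returns" sections.
    samples.iter().map(|s| s.sm_util).max()
}
```

The signature already gives the parameter and return types, so the comment only has to say what the function does.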
81a7c10 to d9fe388
you need to run
506d7ef to 90a89b3
Please separate the work on jetson in another PR
90a89b3 to c4ba967
d9a596a to 2caee29
2caee29 to cea07db
[ DONE ]
Add the process_utilization_metrics function, which uses the NVML "process_utilization_stats" method.
It allows pooling running_compute_processes and running_graphics_processes by process PID and by LocalMachine GPU consumer in the final data.
The major_utilization metric is now reported globally and per PID for graphics and compute processes.
It is used to evaluate frame buffer memory utilization.
The encoder_utilization metric is now reported globally and per PID for graphics and compute processes.
The decoder_utilization metric is now reported globally and per PID for graphics and compute processes.
The newly added sm_utilization metric is reported globally and per PID for graphics and compute processes.
It refers to the percentage of time during which the Streaming Multiprocessors (SMs) of a GPU were busy (SM 3D/compute utilization).
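The per-PID pooling described above could look roughly like the following sketch. It is not the plugin's code: `ProcessSample`, `Consumer` and `pool_by_pid` are hypothetical names, and the fields only mirror the utilization values named in this description (SM, frame buffer memory, encoder, decoder), in percent.

```rust
use std::collections::BTreeMap;

/// One utilization sample for a process (hypothetical type, values in percent).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ProcessSample {
    pub pid: u32,
    pub sm_util: u32,
    pub mem_util: u32,
    pub enc_util: u32,
    pub dec_util: u32,
}

/// Tells whether a PID was listed as a compute and/or a graphics process.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct Consumer {
    pub compute: bool,
    pub graphics: bool,
}

/// Groups the samples by PID and tags each PID with the lists it appears in.
pub fn pool_by_pid(
    samples: &[ProcessSample],
    compute_pids: &[u32],
    graphics_pids: &[u32],
) -> BTreeMap<u32, (Consumer, ProcessSample)> {
    let mut pooled = BTreeMap::new();
    for sample in samples {
        let consumer = Consumer {
            compute: compute_pids.contains(&sample.pid),
            graphics: graphics_pids.contains(&sample.pid),
        };
        // If a PID appears several times, the last sample replaces the earlier ones.
        pooled.insert(sample.pid, (consumer, *sample));
    }
    pooled
}
```

Keying the map by PID means a process that is both a compute and a graphics consumer appears once, with both flags set, rather than twice.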
Add GPU temperature monitoring.
Refactor the "nvml.rs" file into several files in an nvml directory:
Add unit tests in the nvml directory and in "lib.rs".