Understanding GPU utilization #870
GPU utilization is measured by Perf Analyzer and returned to Model Analyzer (MA) as one of many metrics we capture and report to the user. The default objective to maximize is throughput, and a multitude of factors can cause GPU utilization to be less than 100%. If you are interested in maximizing GPU utilization, you can specify it as the objective when profiling your model (see config.md for documentation on how to do this). Have you tried looking at the detailed report generated for the optimal configuration? It might point you in the right direction. You might also need to change the maximum instance count, batch size, or concurrency that MA searches. I hope this helps.
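As a rough sketch of what "specify this as the objective" might look like (the model name is a placeholder, and option names should be checked against config.md for your Model Analyzer version):

```yaml
# Hypothetical model-analyzer config sketch -- consult config.md for the
# authoritative schema; key names here are assumptions, not verified.
profile_models:
  my_model:              # placeholder model name
    objectives:
      - gpu_utilization  # optimize for GPU utilization instead of throughput
    parameters:
      concurrency:
        start: 1
        stop: 64
        step: 8
```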
Thank you for your reply!
@matthewkotila can you provide more details?
@siretru you can find information about the GPU utilization metric that Perf Analyzer offers here:
Hi. However, this does not provide any information on how the average GPU utilization is calculated. Is it utilization over time, the fraction of SMs occupied, or something else?
I'm having trouble interpreting some of the results...
After an Automatic Brute Search analysis, when I analyse the result summary, I look at the Average GPU Utilization.
How is this value determined? Is it relative to the number of SMs (Streaming Multiprocessors) used? Is it measured with DCGM or nvidia-smi? We know that it's quite complex to get a reliable measure of GPU usage (in particular when using tools like NVIDIA Nsight), so I'd like to check how meaningful this metric is.
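For intuition on what a per-sample "GPU utilization" reading usually means: NVML (the library behind nvidia-smi and DCGM) defines `utilization.gpu` as the percent of time over the past sample window during which one or more kernels was executing; it says nothing about how many SMs were busy. A minimal sketch (not Model Analyzer's actual code) of turning polled readings into an "average GPU utilization" figure:

```python
# Hypothetical sketch: a profiler polls a per-window utilization percentage
# (0-100, "fraction of time any kernel was running") and reports the mean
# over the profiling run. A GPU can read 100% here while most SMs sit idle.

def average_gpu_utilization(samples: list[float]) -> float:
    """Mean of per-window utilization percentages (0-100)."""
    if not samples:
        return 0.0
    return sum(samples) / len(samples)

# Example: kernels ran during most of some windows, rarely during others.
readings = [85.0, 90.0, 10.0, 5.0, 60.0]
print(average_gpu_utilization(readings))  # 50.0
```

This is why a time-based utilization number can differ sharply from what an SM-occupancy tool such as Nsight Compute reports.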
What objective is maximised in the Automatic Brute Search? Is it throughput?
My main question is:
I'm trying to understand why, for a given model, my GPU is only being used at around 30% even once the ideal model configuration is reached. What is the limiting factor (i.e. why can't we use more of the GPU to increase throughput)?
Thanks all!