[BENCHMARK_APP] Add warning if performance mode is not set to max on Windows (openvinotoolkit#14616)

* [C++] Display a warning if Power Mode is not 'Max Performance' - rebase from master

* [C++] Prevent std::max from colliding with windows.h max macro (see the sketch after this list)

* [PYTHON/C++] Update README docs to reflect possible inaccuracies from optimization settings

* [C++] Cleanup windows optimizations
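
The first two bullets describe a Windows-specific check: query the active power scheme, warn when it is not the high-performance plan, and keep `std::max` usable despite the `max` macro that `windows.h` defines. The following is a minimal hypothetical sketch of such a check, not the code from this commit; the function name and warning text are illustrative, and it must be built on Windows with `PowrProf.lib` linked.

```cpp
// Hypothetical sketch, not the code from openvinotoolkit#14616.
#include <windows.h>
#include <powrprof.h>  // PowerGetActiveScheme
#include <algorithm>
#include <iostream>

#pragma comment(lib, "PowrProf.lib")

void warn_if_not_max_performance() {
    GUID* active_scheme = nullptr;
    if (PowerGetActiveScheme(NULL, &active_scheme) == ERROR_SUCCESS && active_scheme) {
        // Despite its name, GUID_MIN_POWER_SAVINGS identifies the
        // "High performance" power plan.
        if (!IsEqualGUID(*active_scheme, GUID_MIN_POWER_SAVINGS)) {
            std::cout << "[ WARNING ] The Windows power plan is not set to "
                         "'High performance'; benchmark results may be suboptimal.\n";
        }
        LocalFree(active_scheme);
    }
}

int main() {
    warn_if_not_max_performance();
    // windows.h defines a max() macro; parenthesizing the call keeps
    // the preprocessor from expanding it over std::max.
    int result = (std::max)(1, 2);
    return result == 2 ? 0 : 1;
}
```

An alternative to the parenthesized `(std::max)` call is defining `NOMINMAX` before including `windows.h`, which suppresses the `min`/`max` macros entirely.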
PiotrKrzem authored Dec 23, 2022
1 parent 250e075 commit f9796ee
Showing 2 changed files with 8 additions and 0 deletions.
4 changes: 4 additions & 0 deletions samples/cpp/benchmark_app/README.md
@@ -37,6 +37,10 @@ If not specified, throughput is used as the default. To set the hint explicitly,
./benchmark_app -m model.xml -hint throughput
```

> **NOTE**
It is up to the user to ensure that the environment in which the benchmark runs is optimized for maximum performance.
Otherwise, results may differ between runs under different environment settings (such as power optimization settings, processor overclocking, or thermal throttling).
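On Windows, for instance, the active power plan can be inspected with `powercfg /getactivescheme`; the warning added by this commit fires when the performance mode is not set to max.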

#### Latency
Latency is the amount of time it takes to process a single inference request. In applications where data needs to be inferenced and acted on as quickly as possible (such as autonomous driving), low latency is desirable. For conventional devices, lower latency is achieved by reducing the amount of parallel processing streams so the system can utilize as many resources as possible to quickly calculate each inference request. However, advanced devices like multi-socket CPUs and modern GPUs are capable of running multiple inference requests while delivering the same latency.

4 changes: 4 additions & 0 deletions tools/benchmark_tool/README.md
@@ -35,6 +35,10 @@ benchmark_app -m model.xml -hint latency
benchmark_app -m model.xml -hint throughput
```

> **NOTE**
It is up to the user to ensure that the environment in which the benchmark runs is optimized for maximum performance.
Otherwise, results may differ between runs under different environment settings (such as power optimization settings, processor overclocking, or thermal throttling).

#### Latency
Latency is the amount of time it takes to process a single inference request. In applications where data needs to be inferenced and acted on as quickly as possible (such as autonomous driving), low latency is desirable. For conventional devices, lower latency is achieved by reducing the amount of parallel processing streams so the system can utilize as many resources as possible to quickly calculate each inference request. However, advanced devices like multi-socket CPUs and modern GPUs are capable of running multiple inference requests while delivering the same latency.

