This client-server application is intended to measure the power consumed during the execution of the specified workload.
The server is intended to be run on the director (the machine on which PTDaemon runs), and the client is intended to be run on the SUT (system under test).
The client accepts a shell command to run: the workload. While the command runs on the client, the server measures the power consumption.
The command is run twice for each setting: the first time in ranging mode, and the second time in testing mode.
![Client-server sequence diagram](sequence.png)
- Python 3.7 or newer
- Supported OS: Windows or Linux
- PTDaemon (on the server)
- On Linux: `ntpdate`, optional (see below).
- On Windows: install the `pywin32` Python dependency (see below).
- Assuming you are able to run the required inference submission. In this README we use ssd-mobilenet as an example.
To make sure the Loadgen logs and PTDaemon logs match, the system time should be synchronized on the client and the server. Both the client and the server have an option to configure the NTP server address to sync with before running a workload.
There are two options:
- Sync the system time yourself. You still need to specify the NTP server for both the client and the server so the synchronization can be verified.
- Let the script sync the system time. This runs automatically if the verification fails.
For the second option, you need to have the following prerequisites.
On Linux:

- Install the `ntpdate` binary. Ubuntu package: `ntpdate`.
- Disable pre-existing NTP daemons if they are running. On Ubuntu:
  `systemctl disable systemd-timesyncd; systemctl stop systemd-timesyncd; systemctl disable ntp; systemctl stop ntp`.
- Root privileges are required. Either run the script as root or set up passwordless `sudo`.
On Windows:

- Install `pywin32`: `python -m pip install pywin32`.
- Disable the default Windows Time service (`w32tm`).
- Run the script as an administrator.
git clone https://github.com/mlcommons/power
The server requires a configuration file, passed via the `-c` command line argument.
A template of this file is provided below and in the `server.template.conf` file.
# Server Configuration Template
# To use it, change the options that you'd like to configure.
[server]
# NTP server to sync with before each measurement.
# See "NTP" section in the README.md.
#ntpServer: ntp.example.com
# A directory to store output data. A relative or absolute path could be used.
# A new subdirectory will be created per each run.
# The name of this sub-directory consists of date, time, label, and mode (ranging/testing).
# The loadgen log is fetched from the client if the `--send-logs` option is enabled for the client.
# The name of the directory is determined by the workload script running on the SUT, e.g. `ssdmobilenet`.
# The power log, named `spl.txt`, is extracted from the full PTDaemon log (`ptdLogfile`)
outDir: D:\ptd-logs\
# (Optional) IP address and port that the server listens on.
# Defaults to "0.0.0.0 4950" if not set.
#listen: 192.168.1.2 4950
# PTDaemon configuration.
# The following options are mapped to PTDaemon command line arguments.
# Please refer to SPEC PTDaemon Programmers Guide or `ptd -h` for the details.
[ptd]
# A path to PTDaemon executable binary.
ptd: D:\PTD\ptd-windows-x86.exe
# A path to a logfile that PTDaemon produces (`-l` option).
# Note that in the current implementation this file is considered temporary
# and may be overwritten.
logFile: logs_ptdaemon.txt
# (Optional) A port that PTDaemon listens on (`-p` option). Default is 8888.
#networkPort: 8888
# Power Analyzer numerical device type. Refer to `ptd -h` for the full list.
# 49 corresponds to Yokogawa WT310.
deviceType: 49
# interfaceFlag and devicePort describe the physical connection to the analyzer.
# interfaceFlag is either one of -n, -g, -y, -U, or empty.
# Refer to SPEC PTDaemon Programmers Guide or `ptd -h` for the details.
# Below are some examples of interfaceFlag and devicePort pairs.
# Use RS232 interface.
# Empty interfaceFlag corresponds to RS232.
interfaceFlag:
devicePort: COM1
# Use TCPIPv4 ethernet interface.
#interfaceFlag: -n
#devicePort: 192.168.1.123
# Use Yokogawa TMCTL for USB or ethernet interface.
# devicePort should be either the IP address or device serial number.
#interfaceFlag: -y
#devicePort: C2PH13047V
# (Optional) Channel number for multichannel analyzers operating in single channel mode. (`-c` option)
#channel: 1
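The template above is an INI-style format: `key: value` pairs grouped into `[server]` and `[ptd]` sections. As an illustrative sketch (not necessarily how `server.py` itself reads the file), Python's built-in `configparser` can parse it; the paths below are placeholders:

```python
import configparser

# A made-up minimal config following the template above (placeholder paths).
conf_text = """
[server]
ntpServer: ntp.example.com
outDir: /tmp/ptd-logs

[ptd]
ptd: /usr/local/bin/ptd
logFile: logs_ptdaemon.txt
deviceType: 49
interfaceFlag:
devicePort: COM1
"""

config = configparser.ConfigParser()
config.read_string(conf_text)
print(config["ptd"]["deviceType"])                     # 49
print(config["server"].get("listen", "0.0.0.0 4950"))  # fall back to the default
```

Note how optional keys such as `listen` are looked up with a fallback that matches the documented default.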
Client command line arguments:
usage: client.py [-h] -a ADDR -w CMD -L INDIR -o OUTDIR -n ADDR [-p PORT] [-l LABEL] [-s] [-f] [-S]
PTD client
required arguments:
-a ADDR, --addr ADDR server address
-w CMD, --run-workload CMD a shell command to run under power measurement
-L INDIR, --loadgen-logs INDIR collect loadgen logs from INDIR
-o OUTDIR, --output OUTDIR put logs into OUTDIR (copied from INDIR)
-n ADDR, --ntp ADDR NTP server address
optional arguments:
-h, --help show this help message and exit
-p PORT, --port PORT server port, defaults to 4950
-l LABEL, --label LABEL a label to include into the resulting directory name
-s, --send-logs send loadgen logs to the server
-f, --force force remove loadgen logs directory (INDIR)
-S, --stop-server stop the server after processing this client
- `INDIR` is the directory to get loadgen logs from. The workload command should place its logs inside this directory.
- `LABEL` is a human-readable label. The label is used later both on the client and the server to distinguish between log directories.
- If `-s`/`--send-logs` is enabled, the loadgen logs will be sent to the server and stored alongside the power log.
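For reference, the argument set above corresponds to an `argparse` declaration along these lines (a sketch for illustration, not the actual `client.py` source):

```python
import argparse

# A sketch of the client's command line interface described above.
parser = argparse.ArgumentParser(description="PTD client")
required = parser.add_argument_group("required arguments")
required.add_argument("-a", "--addr", metavar="ADDR", required=True,
                      help="server address")
required.add_argument("-w", "--run-workload", metavar="CMD", required=True,
                      help="a shell command to run under power measurement")
required.add_argument("-L", "--loadgen-logs", metavar="INDIR", required=True,
                      help="collect loadgen logs from INDIR")
required.add_argument("-o", "--output", metavar="OUTDIR", required=True,
                      help="put logs into OUTDIR (copied from INDIR)")
required.add_argument("-n", "--ntp", metavar="ADDR", required=True,
                      help="NTP server address")
parser.add_argument("-p", "--port", type=int, default=4950, help="server port")
parser.add_argument("-l", "--label", default="", help="directory name label")
parser.add_argument("-s", "--send-logs", action="store_true",
                    help="send loadgen logs to the server")
parser.add_argument("-f", "--force", action="store_true",
                    help="force remove loadgen logs directory (INDIR)")
parser.add_argument("-S", "--stop-server", action="store_true",
                    help="stop the server after processing this client")

# Parse a sample command line to show the defaults in action.
args = parser.parse_args(["-a", "192.168.1.2", "-w", "./dummy.sh",
                          "-L", "dummy-loadgen-logs", "-o", "out",
                          "-n", "ntp.example.com"])
print(args.port)  # 4950 (the default)
```

Note that `-s`/`-S` and `-l`/`-L` are distinct flags: argparse short options are case-sensitive.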
In these examples we have the following assumptions:
- The director IP address is 192.168.1.2.
- The current repository is cloned to `/path/to/mlcommons/power`.
- `ntp.example.com` is used as an NTP server.
Start a server (on a director):
./server.py -c server-config.conf
Then, on the SUT, provide a workload script for your particular workload and run it using `client.py`.
Choose one of the options below for an example workload script.
Example option 1: a dummy workload
Create a dummy workload script named `dummy.sh`.
It does nothing but mimic the real loadgen by creating empty loadgen log files in the `dummy-loadgen-logs` directory.
#!/usr/bin/env bash
sleep 5
mkdir -p dummy-loadgen-logs
# Create empty files with the same names as loadgen does
touch dummy-loadgen-logs/mlperf_log_accuracy.json
touch dummy-loadgen-logs/mlperf_log_detail.txt
touch dummy-loadgen-logs/mlperf_log_summary.txt
touch dummy-loadgen-logs/mlperf_log_trace.json
Don't forget to `chmod +x dummy.sh`.
Then start a client using `./dummy.sh` as the workload being measured,
and pass `dummy-loadgen-logs` as the location of the loadgen logs.
/path/to/mlcommons/power/ptd_client_server/client.py \
--addr 192.168.1.2 \
--output "client-output-directory" \
    --run-workload "./dummy.sh" \
--loadgen-logs "dummy-loadgen-logs" \
--label "mylabel" \
--send-logs \
--ntp ntp.example.com
Example option 2: loadgen benchmark
Source: https://github.com/mlcommons/inference/tree/master/loadgen/benchmark
Use the following script to build the loadgen benchmark:
#!/usr/bin/env bash
echo "Building loadgen..."
if [ ! -e loadgen_build ]; then mkdir loadgen_build; fi;
cd loadgen_build && cmake ../.. && make -j && cd ..
echo "Building test program..."
if [ ! -e build ]; then mkdir build; fi;
g++ --std=c++11 -O3 -I.. -o repro.exe repro.cpp -Lloadgen_build -lmlperf_loadgen -lpthread
Create `run_workload.sh`:
#!/usr/bin/env bash
if [ ! -e build ]; then mkdir build; fi;
./repro.exe 800000 0 4 2048
Don't forget to `chmod +x run_workload.sh`.
Then start a client using `./run_workload.sh` as the workload being measured.
The benchmark is hardcoded to put its logs into the `build` directory, so we specify it as the loadgen log location.
Run the client from the same directory (`loadgen/benchmark`).
/path/to/mlcommons/power/ptd_client_server/client.py \
--addr 192.168.1.2 \
--output "client-output-directory" \
--run-workload "./run_workload.sh" \
--loadgen-logs "build" \
--label "mylabel" \
--send-logs \
--ntp ntp.example.com
Example option 3: ssd-mobilenet
Source: https://github.com/mlcommons/inference/tree/master/vision/classification_and_detection
First, follow the instructions in the link above to build and run the `ssd-mobilenet` inference benchmark.
You'll also need to download the corresponding model and datasets.
Then, use the following script to run the benchmark under power measurement.
It uses `./run_local.sh` as the workload script.
The workload script stores its output in the `./output/tf-cpu/ssd-mobilenet` directory.
#!/usr/bin/env bash
# Don't forget to update the following paths
export MODEL_DIR=/path/to/model/dir
export DATA_DIR=/path/to/data/dir
cd /path/to/mlcommons/inference/vision/classification_and_detection
/path/to/mlcommons/power/ptd_client_server/client.py \
--addr 192.168.1.2 \
--output "client-output-directory" \
--run-workload "./run_local.sh tf ssd-mobilenet cpu --scenario Offline" \
--loadgen-logs "./output/tf-cpu/ssd-mobilenet" \
--label "mylabel" \
--send-logs \
--ntp ntp.example.com
All the options above store their output in the `client-output-directory` directory, but you can specify any other directory.
After a successful run, you'll see these new files and directories on the server:
D:\ptd-logs
├── … (old entries skipped)
└── 2020-12-28_15-20-52_mylabel
├── client.json ← client summary
├── client.log ← client stdout log
├── ptd_logs.txt ← ptdaemon stdout log
├── ranging
│ ├── mlperf_log_accuracy.json ┐ ← loadgen log, if --send-logs is used.
│ ├── mlperf_log_detail.txt │ Produced by the workload script on
│ ├── mlperf_log_summary.txt │ the client.
│ ├── mlperf_log_trace.json ┘
│ └── spl.txt ← power log
├── server.json ← server summary
├── server.log ← server stdout log
└── testing
├── mlperf_log_accuracy.json ┐
├── mlperf_log_detail.txt │ ← loadgen log (same as above)
├── mlperf_log_summary.txt │
├── mlperf_log_trace.json ┘
└── spl.txt ← power log
And these on the SUT:
./client-output-directory
├── … (old entries skipped)
└── 2020-12-28_15-20-52_mylabel_ranging
├── client.json
├── client.log
├── ranging
│ ├── mlperf_log_accuracy.json ┐
│ ├── mlperf_log_detail.txt │ ← loadgen log
│ ├── mlperf_log_summary.txt │
│ └── mlperf_log_trace.json ┘
└── testing
├── mlperf_log_accuracy.json ┐
├── mlperf_log_detail.txt │ ← loadgen log
├── mlperf_log_summary.txt │
└── mlperf_log_trace.json ┘
`spl.txt` consists of lines like the following:
Time,28-12-2020 15:21:14.682,Watts,22.950000,Volts,228.570000,Amps,0.206430,PF,0.486400,Mark,2020-12-28_15-20-52_mylabel_testing
Time,28-12-2020 15:21:15.686,Watts,23.080000,Volts,228.440000,Amps,0.207320,PF,0.487400,Mark,2020-12-28_15-20-52_mylabel_testing
Time,28-12-2020 15:21:16.691,Watts,22.990000,Volts,228.520000,Amps,0.206740,PF,0.486500,Mark,2020-12-28_15-20-52_mylabel_testing
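Each line is a flat list of comma-separated name/value pairs following the timestamp. For post-processing, e.g. computing the average power over a run, such lines can be parsed with a short script; this is an illustrative sketch, not part of the tooling (the sample lines are copied from above):

```python
import statistics

# Sample spl.txt content in the format shown above.
sample = """\
Time,28-12-2020 15:21:14.682,Watts,22.950000,Volts,228.570000,Amps,0.206430,PF,0.486400,Mark,2020-12-28_15-20-52_mylabel_testing
Time,28-12-2020 15:21:15.686,Watts,23.080000,Volts,228.440000,Amps,0.207320,PF,0.487400,Mark,2020-12-28_15-20-52_mylabel_testing
Time,28-12-2020 15:21:16.691,Watts,22.990000,Volts,228.520000,Amps,0.206740,PF,0.486500,Mark,2020-12-28_15-20-52_mylabel_testing
"""

def parse_spl(text):
    """Parse spl.txt lines into dicts mapping field name to value."""
    records = []
    for line in text.splitlines():
        fields = line.split(",")
        # After the leading "Time",timestamp pair, fields come in
        # name/value pairs: Watts, Volts, Amps, PF, Mark.
        record = {"Time": fields[1]}
        for name, value in zip(fields[2::2], fields[3::2]):
            record[name] = value
        records.append(record)
    return records

records = parse_spl(sample)
avg_watts = statistics.mean(float(r["Watts"]) for r in records)
print(round(avg_watts, 3))  # 23.007
```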
During the test, the client and the server maintain a persistent TCP connection.
In the case of an unexpected client disconnection, the server terminates the power measurement and considers the test failed. The client intentionally makes no attempt to reconnect, to keep the test strict.
Additionally, TCP keepalive is used to detect a stale connection, so that the server doesn't wait indefinitely if the client is powered off during the test or the network cable is cut. Keepalive packets are sent every 2 seconds, and the connection is considered broken after 10 missed keepalive responses.
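On Linux, such keepalive behavior maps onto standard TCP socket options. The sketch below shows how these timings could be configured with Python's `socket` module; it mirrors the timings described above, not necessarily the exact implementation, and the per-probe constants are platform-specific (hence the guards):

```python
import socket

def make_keepalive_socket():
    """Create a TCP socket tuned as described above: a keepalive probe
    every 2 seconds, connection declared dead after 10 missed responses."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants exist on Linux but not on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Idle time (s) on the connection before the first probe is sent.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 2)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Interval (s) between subsequent probes.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 2)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is dropped.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 10)
    return s

sock = make_keepalive_socket()
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # 1 when enabled
sock.close()
```

With these settings, a dead peer is detected after roughly 2 + 2 × 10 seconds of silence.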