Arm NN ports for MLPerf Inference benchmarks

Getting started
Image classification
- Download and preprocess the ImageNet 2012 validation dataset
- MobileNet
  - Model
  - TFLite data (reference)
  - ArmNN Neon data
  - ArmNN OpenCL data
  - ArmNN Reference data
  - Validate data
- ResNet
  - Model
  - TFLite data (reference)
  - ArmNN Neon data
  - ArmNN OpenCL data
  - ArmNN Reference data
  - Validate data
Object detection
- Caveats
- Download and preprocess the COCO 2017 validation dataset
- SSD-MobileNet
  - Model
  - TFLite data (reference)
  - ArmNN Neon data
  - ArmNN OpenCL data
  - Validate data

Getting started

Install CK

Please follow the CK installation instructions.

Pull CK repositories

$ ck pull repo --url=https://github.com/arm-software/armnn-mlperf
$ ck list repo:*armnn*
ck-armnn
armnn-mlperf
$ ck list repo:*mlperf*
ck-mlperf
armnn-mlperf

NB: Remember to refresh all the repositories after any updates (e.g. bug fixes):

$ ck pull all

Install TFLite

$ ck install package --tags=lib,tflite,v1.13

Install ArmNN

To install ArmNN with full support (frontends: TF, TFLite, ONNX; backends: Reference, OpenCL, Neon):

$ ck install package --tags=lib,armnn,tf,tflite,onnx,neon,opencl,rel.19.02

NB: On a platform with only a couple of GB of RAM, you may wish to restrict the number of CPU build threads, e.g. as follows:

$ ck install package --tags=lib,armnn,tflite,neon,opencl,rel.19.02 \
--env.CK_HOST_CPU_NUMBER_OF_PROCESSORS=4

If you would like to save time, you can build with TFLite frontend support only using one of the backend options below. For more details, please refer to the CK-ArmNN repository.

Option 1: Install ArmNN with TFLite, Neon and OpenCL support (recommended)

$ ck install package --tags=lib,armnn,tflite,neon,opencl,rel.19.02

Option 2: Install ArmNN with TFLite and Neon support

$ ck install package --tags=lib,armnn,tflite,neon,rel.19.02

Option 3: Install ArmNN with TFLite and OpenCL support

$ ck install package --tags=lib,armnn,tflite,opencl,rel.19.02

Option 4: Install ArmNN with TFLite and Reference support

$ ck install package --tags=lib,armnn,tflite,rel.19.02
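
The four options differ only in which backend tags are passed to `ck install package`. As a rough sketch, the tag strings can be composed as below. The helper `armnn_tags` is hypothetical (it is not part of CK), and the observation that the Reference backend needs no tag of its own is inferred from Option 4 rather than confirmed elsewhere.

```python
# Hypothetical helper (not part of CK): compose the --tags value
# for `ck install package` in the order used in this README.
def armnn_tags(frontends, backends, release="rel.19.02"):
    """Return a comma-separated tag string for an ArmNN build."""
    # Assumption inferred from Option 4: the Reference backend has no tag,
    # so an empty `backends` list gives a Reference-only build.
    return ",".join(["lib", "armnn", *frontends, *backends, release])


options = {
    "Option 1": armnn_tags(["tflite"], ["neon", "opencl"]),
    "Option 2": armnn_tags(["tflite"], ["neon"]),
    "Option 3": armnn_tags(["tflite"], ["opencl"]),
    "Option 4": armnn_tags(["tflite"], []),
}

for name, tags in options.items():
    print(f"{name}: ck install package --tags={tags}")
```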

Image classification

Please follow the MLPerf image classification instructions to install dependencies such as Python packages:

- first, the common instructions;
- then, the TFLite instructions.

Download and preprocess the ImageNet 2012 validation dataset

Full (50,000 images)

$ ck install package --tags=dataset,imagenet,val,original,full
$ ck install package --tags=dataset,imagenet,val,preprocessed,full

Minimal (500 images)

$ ck install package --tags=dataset,imagenet,val,original,min --no_tags=resized
$ ck install package --tags=dataset,imagenet,val,preprocessed

MobileNet

Model

Install the MobileNet model:

$ ck install package --tags=model,tflite,mlperf,mobilenet,non-quantized

TFLite data (reference)

Run on 500 images

$ ck benchmark program:image-classification-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-tflite-accuracy-500 \
--tags=image-classification,mlperf,mobilenet,tflite,accuracy,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

NB: You can also run the same command on the full ImageNet validation dataset of 50,000 images (see below).

Run on 50,000 images

$ ck benchmark program:image-classification-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-tflite-accuracy-50000 \
--tags=image-classification,mlperf,mobilenet,tflite,accuracy,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN Neon data

Run on 500 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_NEON \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-armnn-tflite-accuracy-neon-500 \
--tags=image-classification,mlperf,mobilenet,armnn-tflite,accuracy,neon,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Run on 50,000 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_NEON \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-armnn-tflite-accuracy-neon-50000 \
--tags=image-classification,mlperf,mobilenet,armnn-tflite,accuracy,neon,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN OpenCL data

Run on 500 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_OPENCL \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-armnn-tflite-accuracy-opencl-500 \
--tags=image-classification,mlperf,mobilenet,armnn-tflite,accuracy,opencl,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Run on 50,000 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_OPENCL \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-armnn-tflite-accuracy-opencl-50000 \
--tags=image-classification,mlperf,mobilenet,armnn-tflite,accuracy,opencl,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN Reference data (NOT RECOMMENDED)

NB: This validation can run on x86 or Arm. However, it is completely unoptimized and hence extremely slow (e.g. 6.5 seconds per image on a Linaro HiKey960 board or 2.9 seconds per image on an Intel Xeon server).

Run on 500 images

$ ck benchmark program:image-classification-armnn-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-armnn-tflite-accuracy-500 \
--tags=image-classification,mlperf,mobilenet,armnn-tflite,accuracy,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Run on 50,000 images

$ ck benchmark program:image-classification-armnn-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-mobilenet-armnn-tflite-accuracy-50000 \
--tags=image-classification,mlperf,mobilenet,armnn-tflite,accuracy,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys
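
The accuracy commands above follow one naming pattern: the experiment name (`--record_uoa`) and the `--tags` value are both built from the model, the implementation, the optional backend and the image count. The sketch below reproduces that pattern so the commands can be generated rather than copied by hand. The function `benchmark_command` is a hypothetical helper, not part of CK, and it only composes a string; it does not run anything.

```python
# Hypothetical helper (not part of CK): compose an image-classification
# accuracy command following the naming pattern used in this README.
def benchmark_command(model, impl, count, backend=None):
    """model: 'mobilenet' or 'resnet'; impl: 'tflite' or 'armnn-tflite';
    backend: 'neon', 'opencl', or None (TFLite, or the ArmNN Reference backend)."""
    parts = [model, impl, "accuracy"] + ([backend] if backend else []) + [str(count)]
    record_uoa = "mlperf-" + "-".join(parts)
    tags = ",".join(["image-classification", "mlperf"] + parts)
    env_flag = f" --env.USE_{backend.upper()}" if backend else ""
    return (
        f"ck benchmark program:image-classification-{impl}{env_flag} "
        f"--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT={count} "
        f"--record --record_repo=local --record_uoa={record_uoa} "
        f"--tags={tags} "
        "--skip_print_timers --skip_stat_analysis --process_multi_keys"
    )


for backend in (None, "neon", "opencl"):
    print(benchmark_command("mobilenet", "armnn-tflite", 500, backend))
```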

Validate experimental data

To validate the equivalence of the optimized ArmNN implementation versus the reference TFLite one, we collected experimental data as above on two platforms:

- A Linaro HiKey960 board (hikey): TFLite vs. ArmNN Neon vs. ArmNN OpenCL (500 and 50,000 images).
- An Intel Xeon server (velociti): TFLite vs. ArmNN Reference (500 and 50,000 images).

The resulting experimental entries were archived e.g. as follows:

hikey$ ck list local:experiment:mlperf-mobilenet*accuracy*500
mlperf-mobilenet-tflite-accuracy-500
mlperf-mobilenet-armnn-tflite-accuracy-neon-500
mlperf-mobilenet-armnn-tflite-accuracy-opencl-500
hikey$ ck zip local:experiment:mlperf-mobilenet*accuracy*500 \
                --archive_name=mlperf-mobilenet-accuracy-500-hikey.zip

The archives were then uploaded to Dropbox. Follow the instructions below to download the archives and validate the accuracy.

`hikey`

500 images

$ wget https://www.dropbox.com/s/9lz7yncy1xtqlvj/mlperf-mobilenet-accuracy-500-hikey.zip
$ ck add repo --zip=mlperf-mobilenet-accuracy-500-hikey.zip
$ ck list --repo_uoa=mlperf-mobilenet-accuracy-500-hikey --print_full
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-neon-500
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-tflite-accuracy-500
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-opencl-500

TFLite vs. ArmNN Neon

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-tflite-accuracy-500 \
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-neon-500
...
{'epsilon': 1e-05,
 'max_delta': 7.000000000034756e-06,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 0,
 'num_mismatched_probabilities': 0,
 'return': 0}

TFLite vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-tflite-accuracy-500 \
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-opencl-500
...
{'epsilon': 1e-05,
 'max_delta': 9.000000000036756e-06,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 0,
 'num_mismatched_probabilities': 0,
 'return': 0}

ArmNN Neon vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-neon-500 \
mlperf-mobilenet-accuracy-500-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-opencl-500
...
{'epsilon': 1e-05,
 'max_delta': 6.0000000000060005e-06,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 0,
 'num_mismatched_probabilities': 0,
 'return': 0}

50,000 images

$ wget https://www.dropbox.com/s/3cdi3lx7jfxwse7/mlperf-mobilenet-accuracy-50000-hikey.zip
$ ck add repo --zip=mlperf-mobilenet-accuracy-50000-hikey.zip
$ ck list --repo_uoa=mlperf-mobilenet-accuracy-50000-hikey --print_full
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-neon-50000
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-tflite-accuracy-50000
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-opencl-50000

TFLite vs. ArmNN Neon

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-tflite-accuracy-50000 \
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-neon-50000
...
{'epsilon': 1e-05,
 'max_delta': 1.3000000000040757e-05,
 'num_mismatched_classes': 10,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 20,
 'num_mismatched_probabilities': 19,
 'return': 0}

TFLite vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-tflite-accuracy-50000 \
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-opencl-50000
...
{'epsilon': 1e-05,
 'max_delta': 1.4000000000014001e-05,
 'num_mismatched_classes': 8,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 20,
 'num_mismatched_probabilities': 18,
 'return': 0}

ArmNN Neon vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-neon-50000 \
mlperf-mobilenet-accuracy-50000-hikey:experiment:mlperf-mobilenet-armnn-tflite-accuracy-opencl-50000
...
Checking ILSVRC2012_val_00033823.JPEG...
- mismatched classes at index 2: 137 != 136
- mismatched classes at index 3: 136 != 137
...
{'epsilon': 1e-05,
 'max_delta': 8.000000000008e-06,
 'num_mismatched_classes': 2,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 1,
 'num_mismatched_probabilities': 0,
 'return': 0}
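
To make the summary keys easier to read, here is a minimal sketch of how a per-image comparison of this kind could work. It is not the code behind `ck compare_experiments`. It assumes each image yields a ranked list of (class, probability) pairs, that classes are compared position by position, and that a probability counts as mismatched when it differs by more than `epsilon`. The probability values are invented for illustration; only the class swap (137 and 136) mirrors the log above.

```python
# Illustrative sketch only; the real comparison logic may differ.
def compare_predictions(reference, candidate, epsilon=1e-5):
    """Compare two ranked lists of (class_id, probability) pairs."""
    summary = {
        "max_delta": 0.0,
        "num_mismatched_classes": 0,
        "num_mismatched_probabilities": 0,
    }
    for (ref_cls, ref_prob), (cand_cls, cand_prob) in zip(reference, candidate):
        if ref_cls != cand_cls:
            summary["num_mismatched_classes"] += 1
        delta = abs(ref_prob - cand_prob)
        summary["max_delta"] = max(summary["max_delta"], delta)
        if delta > epsilon:
            summary["num_mismatched_probabilities"] += 1
    return summary


# Invented probabilities: two classes with equal scores swap ranks.
neon = [(1, 0.5), (2, 0.2), (137, 0.1), (136, 0.1), (5, 0.05)]
opencl = [(1, 0.5), (2, 0.2), (136, 0.1), (137, 0.1), (5, 0.05)]
print(compare_predictions(neon, opencl))
```

Under these assumptions, two classes with near-identical probabilities can trade places, producing class mismatches without any probability mismatch, which is the pattern reported for the Neon vs. OpenCL run.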

`velociti`

500 images

$ wget https://www.dropbox.com/s/j2rdh3uzhz3lqh7/mlperf-mobilenet-accuracy-500-velociti.zip
$ ck add repo --zip=mlperf-mobilenet-accuracy-500-velociti.zip
$ ck list --repo_uoa=mlperf-mobilenet-accuracy-500-velociti --print_full
mlperf-mobilenet-accuracy-500-velociti:experiment:mlperf-mobilenet-armnn-tflite-accuracy-500
mlperf-mobilenet-accuracy-500-velociti:experiment:mlperf-mobilenet-tflite-accuracy-500

TFLite vs. ArmNN Reference

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-500-velociti:experiment:mlperf-mobilenet-armnn-tflite-accuracy-500 \
mlperf-mobilenet-accuracy-500-velociti:experiment:mlperf-mobilenet-tflite-accuracy-500
...
{'epsilon': 1e-05,
 'max_delta': 7.000000000090267e-06,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 0,
 'num_mismatched_probabilities': 0,
 'return': 0}

50,000 images

$ wget https://www.dropbox.com/s/z5bx7aeocwdyrww/mlperf-mobilenet-accuracy-50000-velociti.zip
$ ck add repo --zip=mlperf-mobilenet-accuracy-50000-velociti.zip
$ ck list --repo_uoa=mlperf-mobilenet-accuracy-50000-velociti --print_full
mlperf-mobilenet-accuracy-50000-velociti:experiment:mlperf-mobilenet-armnn-tflite-accuracy-50000
mlperf-mobilenet-accuracy-50000-velociti:experiment:mlperf-mobilenet-tflite-accuracy-50000

TFLite vs. ArmNN Reference

$ ck compare_experiments mlperf \
mlperf-mobilenet-accuracy-50000-velociti:experiment:mlperf-mobilenet-armnn-tflite-accuracy-50000 \
mlperf-mobilenet-accuracy-50000-velociti:experiment:mlperf-mobilenet-tflite-accuracy-50000
...
{'epsilon': 1e-05,
 'max_delta': 1.2000000000012001e-05,
 'num_mismatched_classes': 2,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 14,
 'num_mismatched_probabilities': 17,
 'return': 0}

ResNet

Model

Install the ResNet model:

$ ck install package --tags=model,tflite,mlperf,resnet

More than one package or version found:

 0) model-tflite-mlperf-resnet-no-argmax  Version 1.5  (afb43014ef38f646)
 1) model-tflite-mlperf-resnet  Version 1.5  (d60d4e9a84151271)
 2) model-tflite-convert-from-tf (35e84375ac48dcb1), Variations: resnet

Please select the package to install [ hit return for "0" ]:

Option 0 will download a TFLite model preconverted from the TF model. The ArgMax operator, which causes an issue with ArmNN v19.02, was excluded during the conversion.

Option 1 will download a TFLite model preconverted from the TF model, but including the ArgMax operator. This variant can be used once the above issue is resolved.

Option 2 will download the TF model and convert it to TFLite, while excluding the ArgMax operator. Since the conversion relies on a prebuilt version of TF, this option is only viable on x86. (This constraint can be relaxed, but building TF on Arm is not officially supported.)

TFLite data (reference)

Run on 500 images

$ ck benchmark program:image-classification-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-resnet-tflite-accuracy-500 \
--tags=image-classification,mlperf,resnet,tflite,accuracy,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

NB: You can also run the same command on the full ImageNet validation dataset of 50,000 images (see below).

Run on 50,000 images

$ ck benchmark program:image-classification-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-resnet-tflite-accuracy-50000 \
--tags=image-classification,mlperf,resnet,tflite,accuracy,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN Neon data

Run on 500 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_NEON \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-resnet-armnn-tflite-accuracy-neon-500 \
--tags=image-classification,mlperf,resnet,armnn-tflite,accuracy,neon,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Run on 50,000 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_NEON \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-resnet-armnn-tflite-accuracy-neon-50000 \
--tags=image-classification,mlperf,resnet,armnn-tflite,accuracy,neon,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN OpenCL data

Run on 500 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_OPENCL \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-resnet-armnn-tflite-accuracy-opencl-500 \
--tags=image-classification,mlperf,resnet,armnn-tflite,accuracy,opencl,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Run on 50,000 images

$ ck benchmark program:image-classification-armnn-tflite --env.USE_OPENCL \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-resnet-armnn-tflite-accuracy-opencl-50000 \
--tags=image-classification,mlperf,resnet,armnn-tflite,accuracy,opencl,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN Reference data (NOT RECOMMENDED)

NB: This validation can run on x86 or Arm. However, it is completely unoptimized and hence extremely slow (e.g. 34.8 seconds per image on a Linaro HiKey960 board or 16.9 seconds per image on an Intel Xeon server).
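
To put the per-image latencies in perspective, the sketch below turns them into rough total run times. It assumes the quoted latency stays constant over the whole run and ignores model loading and other setup, so the figures are estimates only. The MobileNet latencies are the ones quoted in the MobileNet section of this README.

```python
SECONDS_PER_HOUR = 3600


def estimated_hours(seconds_per_image, num_images):
    """Rough wall-clock hours, assuming a constant per-image latency."""
    return seconds_per_image * num_images / SECONDS_PER_HOUR


# Per-image latencies quoted in this README for the ArmNN Reference backend.
latencies = {
    ("MobileNet", "HiKey960"): 6.5,
    ("MobileNet", "Xeon"): 2.9,
    ("ResNet", "HiKey960"): 34.8,
    ("ResNet", "Xeon"): 16.9,
}

for (model, platform), seconds in latencies.items():
    for count in (500, 50_000):
        hours = estimated_hours(seconds, count)
        print(f"{model} on {platform}, {count} images: {hours:.1f} h")
```

By this estimate, a full 50,000-image ResNet run would take roughly 483 hours (about 20 days) on the HiKey960 board and about 235 hours on the Xeon server.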

Run on 500 images

$ ck benchmark program:image-classification-armnn-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=500 \
--record --record_repo=local --record_uoa=mlperf-resnet-armnn-tflite-accuracy-500 \
--tags=image-classification,mlperf,resnet,armnn-tflite,accuracy,500 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Run on 50,000 images (NOT RUN)

$ ck benchmark program:image-classification-armnn-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=50000 \
--record --record_repo=local --record_uoa=mlperf-resnet-armnn-tflite-accuracy-50000 \
--tags=image-classification,mlperf,resnet,armnn-tflite,accuracy,50000 \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Validate experimental data

To validate the equivalence of the optimized ArmNN implementation versus the reference TFLite one, we collected experimental data as above on two platforms:

- A Linaro HiKey960 board (hikey): TFLite vs. ArmNN Neon vs. ArmNN OpenCL (500 and 50,000 images).
- An Intel Xeon server (velociti): TFLite vs. ArmNN Reference (500 images only).

The resulting experimental entries were archived e.g. as follows:

hikey$ ck list local:experiment:mlperf-resnet*accuracy*500
mlperf-resnet-tflite-accuracy-500
mlperf-resnet-armnn-tflite-accuracy-neon-500
mlperf-resnet-armnn-tflite-accuracy-opencl-500
hikey$ ck zip local:experiment:mlperf-resnet*accuracy*500 \
                --archive_name=mlperf-resnet-accuracy-500-hikey.zip

The archives were then uploaded to Dropbox. Follow the instructions below to download the archives and validate the accuracy.

`hikey`

500 images

$ wget https://www.dropbox.com/s/eod0bflxxzpudmr/mlperf-resnet-accuracy-500-hikey.zip
$ ck add repo --zip=mlperf-resnet-accuracy-500-hikey.zip
$ ck list --repo_uoa=mlperf-resnet-accuracy-500-hikey --print_full
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-neon-500
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-opencl-500
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-tflite-accuracy-500

TFLite vs. ArmNN Neon

$ ck compare_experiments mlperf \
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-tflite-accuracy-500 \
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-neon-500
...
{'epsilon': 1e-05,
 'max_delta': 1.0999999999983245e-05,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 2,
 'num_mismatched_probabilities': 3,
 'return': 0}

TFLite vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-tflite-accuracy-500 \
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-opencl-500
...
{'epsilon': 1e-05,
 'max_delta': 1.0999999999983245e-05,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 3,
 'num_mismatched_probabilities': 4,
 'return': 0}

ArmNN Neon vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-neon-500 \
mlperf-resnet-accuracy-500-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-opencl-500
...
{'epsilon': 1e-05,
 'max_delta': 6.0000000000060005e-06,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 0,
 'num_mismatched_probabilities': 0,
 'return': 0}

50,000 images

$ wget https://www.dropbox.com/s/1yzv6unriqs18yb/mlperf-resnet-accuracy-50000-hikey.zip
$ ck add repo --zip=mlperf-resnet-accuracy-50000-hikey.zip
$ ck list --repo_uoa=mlperf-resnet-accuracy-50000-hikey --print_full
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-opencl-50000
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-neon-50000
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-tflite-accuracy-50000

TFLite vs. ArmNN Neon

$ ck compare_experiments mlperf \
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-tflite-accuracy-50000 \
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-neon-50000
...
{'epsilon': 1e-05,
 'max_delta': 2.4000000000024002e-05,
 'num_mismatched_classes': 6,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 107,
 'num_mismatched_probabilities': 153,
 'return': 0}

TFLite vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-tflite-accuracy-50000 \
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-opencl-50000
...
{'epsilon': 1e-05,
 'max_delta': 2.5000000000052758e-05,
 'num_mismatched_classes': 4,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 147,
 'num_mismatched_probabilities': 190,
 'return': 0}

ArmNN Neon vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-neon-50000 \
mlperf-resnet-accuracy-50000-hikey:experiment:mlperf-resnet-armnn-tflite-accuracy-opencl-50000
...
{'epsilon': 1e-05,
 'max_delta': 1.0000000000010001e-05,
 'num_mismatched_classes': 6,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 5,
 'num_mismatched_probabilities': 2,
 'return': 0}
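
The absolute mismatch counts are easier to judge as a fraction of the dataset. The sketch below computes that fraction from the `num_mismatched_files` values reported above for the 50,000-image ResNet runs on `hikey`. It assumes that `num_mismatched_files` counts images with at least one mismatch, which is inferred from the key name.

```python
TOTAL_IMAGES = 50_000

# num_mismatched_files values from the three 50,000-image comparisons above.
mismatched_files = {
    "TFLite vs. ArmNN Neon": 107,
    "TFLite vs. ArmNN OpenCL": 147,
    "ArmNN Neon vs. ArmNN OpenCL": 5,
}


def mismatch_rate(num_files, total=TOTAL_IMAGES):
    """Percentage of images flagged as mismatched."""
    return 100.0 * num_files / total


for name, count in mismatched_files.items():
    print(f"{name}: {count} of {TOTAL_IMAGES} files ({mismatch_rate(count):.3f}%)")
```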

`velociti`

500 images

$ wget https://www.dropbox.com/s/1jv4lpfp1ddr2j7/mlperf-resnet-accuracy-500-velociti.zip
$ ck add repo --zip=mlperf-resnet-accuracy-500-velociti.zip
$ ck list --repo_uoa=mlperf-resnet-accuracy-500-velociti --print_full
mlperf-resnet-accuracy-500-velociti:experiment:mlperf-resnet-armnn-tflite-accuracy-500
mlperf-resnet-accuracy-500-velociti:experiment:mlperf-resnet-tflite-accuracy-500

TFLite vs. ArmNN Reference

$ ck compare_experiments mlperf \
mlperf-resnet-accuracy-500-velociti:experiment:mlperf-resnet-armnn-tflite-accuracy-500 \
mlperf-resnet-accuracy-500-velociti:experiment:mlperf-resnet-tflite-accuracy-500
...
{'epsilon': 1e-05,
 'max_delta': 3.0000000000030003e-06,
 'num_mismatched_classes': 0,
 'num_mismatched_elementary_keys': 0,
 'num_mismatched_files': 0,
 'num_mismatched_probabilities': 0,
 'return': 0}

Object detection

Please follow the MLPerf object detection instructions to install dependencies such as Python packages:

- first, the common instructions;
- then, the TFLite instructions.

Caveats

TFLite

The SSD models require TFLite 1.13.1.

Python 3

The COCO API (used to evaluate object detection accuracy on the COCO dataset) requires Python 3. Since many embedded platforms use Python 2 by default (including HiKey960), care must be taken not to mix Python 3 and Python 2 packages. Therefore, all benchmarking commands below use the CK_PYTHON=python3 prefix to ensure CK runs under Python 3.
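
When launching CK from a script rather than an interactive shell, the same effect as the `CK_PYTHON=python3` prefix can be had by setting the variable in the child process environment. The sketch below only prepares such an environment; the helper name `ck_python3_env` is hypothetical, and the snippet does not invoke `ck` itself.

```python
import os


# Hypothetical helper: mirror the `CK_PYTHON=python3` shell prefix.
def ck_python3_env(base_env=None):
    """Return a copy of the environment with CK_PYTHON set to python3."""
    env = dict(os.environ) if base_env is None else dict(base_env)
    env["CK_PYTHON"] = "python3"
    return env


env = ck_python3_env()
print(env["CK_PYTHON"])
# Pass `env=env` to subprocess.run([...]) when starting ck from a script.
```

Copying the environment rather than modifying `os.environ` keeps the setting local to the launched command.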

Download and preprocess the COCO 2017 validation dataset

$ ck install package --tags=object-detection,dataset,coco.2017,val,original,full
$ ck install package --tags=object-detection,dataset,coco.2017,preprocessed,full

SSD-MobileNet

Model

Install the SSD-MobileNet model:

$ ck install package --tags=model,tflite,mlperf,object-detection,ssd-mobilenet

TFLite data (reference)

$ CK_PYTHON=python3 ck benchmark program:object-detection-tflite \
--repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=5000 --env.CK_METRIC_TYPE=COCO \
--record --record_repo=local --record_uoa=mlperf-object-detection-ssd-mobilenet-tflite-accuracy \
--tags=mlperf,object-detection,ssd-mobilenet,tflite,accuracy \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN Neon data

$ CK_PYTHON=python3 ck benchmark program:object-detection-armnn-tflite --env.USE_NMS=regular --env.USE_NEON \
--speed --repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=5000 --env.CK_METRIC_TYPE=COCO \
--record --record_repo=local --record_uoa=mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-neon \
--tags=mlperf,object-detection,ssd-mobilenet,armnn-tflite,accuracy,neon \
--skip_print_timers --skip_stat_analysis --process_multi_keys

ArmNN OpenCL data

$ CK_PYTHON=python3 ck benchmark program:object-detection-armnn-tflite --env.USE_NMS=regular --env.USE_OPENCL \
--speed --repetitions=1 --env.CK_BATCH_SIZE=1 --env.CK_BATCH_COUNT=5000 --env.CK_METRIC_TYPE=COCO \
--record --record_repo=local --record_uoa=mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-opencl \
--tags=mlperf,object-detection,ssd-mobilenet,armnn-tflite,accuracy,opencl \
--skip_print_timers --skip_stat_analysis --process_multi_keys

Validate experimental data

To validate the equivalence of the optimized ArmNN implementation versus the reference TFLite one, we collected experimental data as above on:

A Linaro HiKey960 board (hikey): TFLite vs. ArmNN Neon vs. ArmNN OpenCL.

The resulting experimental entries were archived e.g. as follows:

hikey$ ck list local:experiment:mlperf-object-detection-ssd-mobilenet*accuracy*
...
hikey$ ck zip local:experiment:mlperf-object-detection-ssd-mobilenet*accuracy* \
                --archive_name=mlperf-object-detection-ssd-mobilenet-accuracy-hikey.zip

The archives were then uploaded to Dropbox. Follow the instructions below to download the archives and validate the accuracy.

`hikey`

$ wget https://www.dropbox.com/s/jzpum9fedwgq8rd/mlperf-object-detection-ssd-mobilenet-accuracy-hikey.zip
$ ck add repo --zip=mlperf-object-detection-ssd-mobilenet-accuracy-hikey.zip
$ ck list --repo_uoa=mlperf-object-detection-ssd-mobilenet-accuracy-hikey --print_full
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-opencl
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-neon
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-tflite-accuracy

TFLite vs. ArmNN Neon

$ ck compare_experiments mlperf \
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-tflite-accuracy \
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-neon
...
{'delta_mAP': 0.0007913466136450775,
 'delta_recall': 0.001550580396864898,
 'epsilon_bbox': [1.0, 1.0, 1.0, 1.0],
 'epsilon_dist': 100.0,
 'epsilon_prob': 1e-05,
 'max_delta_bbox': [596.98, 490.93, 606.36, 583.04],
 'max_delta_dist': 173810.622875,
 'max_delta_prob': 0.49000000000000005,
 'num_mismatched_classes': 239,
 'num_mismatched_distances': 1353,
 'num_mismatched_files': 1166,
 'num_mismatched_probabilities': 442,
 'return': 0}

TFLite vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-tflite-accuracy \
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-opencl
...
{'delta_mAP': 0.0007900779499596944,
 'delta_recall': 0.0015451924658304583,
 'epsilon_bbox': [1.0, 1.0, 1.0, 1.0],
 'epsilon_dist': 100.0,
 'epsilon_prob': 1e-05,
 'max_delta_bbox': [596.98, 490.93, 606.36, 583.04],
 'max_delta_dist': 173810.622875,
 'max_delta_prob': 0.49000000000000005,
 'num_mismatched_classes': 239,
 'num_mismatched_distances': 1353,
 'num_mismatched_files': 1167,
 'num_mismatched_probabilities': 444,
 'return': 0}

ArmNN Neon vs. ArmNN OpenCL

$ ck compare_experiments mlperf \
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-neon \
mlperf-object-detection-ssd-mobilenet-accuracy-hikey:experiment:mlperf-object-detection-ssd-mobilenet-armnn-tflite-accuracy-opencl
...
{'delta_mAP': -1.2686636853831423e-06,
 'delta_recall': -5.387931034439575e-06,
 'epsilon_bbox': [1.0, 1.0, 1.0, 1.0],
 'epsilon_dist': 100.0,
 'epsilon_prob': 1e-05,
 'max_delta_bbox': [0.010000000000047748,
                    0.010000000000047748,
                    0.010000000000047748,
                    0.010000000000047748],
 'max_delta_dist': 3.1188250000122935,
 'max_delta_prob': 0.0010000000000000009,
 'num_mismatched_classes': 0,
 'num_mismatched_distances': 0,
 'num_mismatched_files': 7,
 'num_mismatched_probabilities': 7,
 'return': 0}
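
One way to read the bounding-box figures is to check each `max_delta_bbox` coordinate against the matching `epsilon_bbox` entry. The sketch below does this for two of the summaries above. Treating `epsilon_bbox` as a per-coordinate tolerance is an assumption based on the key names, not a documented property of `ck compare_experiments`.

```python
def coordinates_over_tolerance(max_delta_bbox, epsilon_bbox):
    """Return indices of bbox coordinates whose max delta exceeds its epsilon."""
    return [
        index
        for index, (delta, epsilon) in enumerate(zip(max_delta_bbox, epsilon_bbox))
        if delta > epsilon
    ]


EPSILON_BBOX = [1.0, 1.0, 1.0, 1.0]

# Values copied from the summaries above.
tflite_vs_neon = [596.98, 490.93, 606.36, 583.04]
neon_vs_opencl = [0.010000000000047748] * 4

print("TFLite vs. ArmNN Neon:", coordinates_over_tolerance(tflite_vs_neon, EPSILON_BBOX))
print("ArmNN Neon vs. ArmNN OpenCL:", coordinates_over_tolerance(neon_vs_opencl, EPSILON_BBOX))
```

Read this way, the two ArmNN backends agree on every box coordinate within tolerance, while the TFLite comparison has at least one detection that differs substantially on all four coordinates, even though `delta_mAP` stays below 0.001.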
