2020 Q4 Objectives (OKRs)

This Quarter's HIGH-LEVEL THEME

CPU perf burndown. Bring up the perf burndown process and turn the crank a few times on CPU codegen. The initial workload we'll be targeting is the MobileBERT encoder. In parallel we will assess potential alternative workloads for next quarter.
GPU back-end. Continue to land critical infrastructure, take a moment to pause and evaluate where we stand on performance, and pursue any low-hanging perf fruit in anticipation of a GPU perf burndown in Q1. To the extent it makes sense, prioritize work that benefits both CPU and GPU perf.
Infrastructure. Continue to make critical improvements to build infrastructure that improve development velocity. Support the CPU perf burndown effort by ensuring that benchmarking can be performed easily and its results are meaningful and reliable.

SPECIFIC OKRs

P1 O: [Perf burndown] Bring up and execute perf burndown process for MobileBERT workload

P1 KR: IREE CPU codegen achieves near peak throughput on all of the MobileBERT matmul shapes
- At least the ones with > 4% weight in the above profile.
P1 KR: Ensure that IREE CPU codegen achieves decent performance on softmax. That means matching TFLite performance on the softmax layers in the MobileBERT model.
P1 KR: Able to benchmark and profile the whole MobileBERT workload in both TFLite and IREE and compare results.
P1 KR: Performed at least 2 cycles of: (a) assess whole workload performance (b) identify key source of performance delta between TFLite and IREE (c) fix the issue (d) repeat.
P1 KR: Prioritized list of key sources of performance delta between TFLite and IREE
- This list should be largely composed of issues that have non-trivial resolutions, like a re-design of a key IREE component. It should be updated on an ongoing basis.
P2 KR: MobileBERT end-to-end benchmark matches 80% of TFLite performance.
P2 KR: Sources of remaining perf gap in TFLite vs IREE on MobileBERT end-to-end benchmark characterized.

P1 O: [Perf burndown] Improve benchmarking and profiling tooling to support perf burndown

P1 KR: Able to micro-benchmark kernels using a shared, documented tool
- Improve the dump of input and output for each dispatch function.
- Dump dispatch functions to files
- Improve diffing tools
- Report performance for each dispatch function?
P1 KR: Able to perform IREE profiling using Tracy
- See iree-org#1886, https://github.com/wolfpld/tracy
P1 KR: Able to map time spent in execution to back to source using Tracy
- See iree-org#1199
- Source layer (source python, HLO, HAL, etc) is configurable at compile time.
P1 KR: Able to track compile-time performance-related statistics
- See iree-org#1409
- Initial stats to track: number of executables, the serialized size of constant data, the serialized size of the executables, the number of host readbacks (flow.tensor.load), backend specific stats like the number of split dispatches in the SPIR-V backend, dynamic shape info like the number of tensors with dynamic shapes that survive after shape propagation
P1 KR: Internal and external contributors able to confidently assess performance impact of a change.
- Like correctness, have tests to guard against performance regressions
- Either some kind of presubmit using a consistent environment or instructions for running something manually that we believe will offer a useful before/after signal.
- Include TFLite as a baseline.
- Build test suite and mechanism for tracking performance

P1 O: [Perf burndown] Identify target workload for IREE perf credibility burndown

Note: This is in preparation for a 2021Q1 objective: Establish IREE's credibility at delivering competitive production levels of performance on a realistic use case.

P1 KR: Defined criteria for evaluating workloads for the burndown.
P1 KR: Evaluated criteria (including performance analysis) for all candidate workloads.
P1 KR: Selected target workload for the CPU burndown.
P1 KR: Selected target workload for the GPU burndown in Q1.
- Representative for GPU. what to compare. What we can achieve in 1Q

P1 O: [Perf burndown] Add initial support for multi-threaded workloads

P1 KR: Selected a target multi-threaded workload for development and initial benchmarking
P1 KR: Able to benchmark a multi-threaded workload on CPU.
P1 KR: Use GPU tiling pass on CPU, as well as treating CPU as another device
P2 KR: Documentation of known issues / architectural challenges with current approach to multi-threading.

P1 O: [Infra] Improve OSS build infrastructure to support continued development

P1 KR: Minimal-effort merging process for integrating new LLVM commits with no dependency on TF
- Continuous build for OSS LLVM build files. Propose upstreaming to LLVM community.
P1 KR: IREE Core build doesn't depend on TF
- Requires integration with separate MLIR-HLO repo.
P1 KR: Extended build bot / lint coverage catches issues in OSS
- asan, clang-tidy, windows iff volunteer with windows machine, yapf, bazel android, unibeautify for formatting more generally(?), binary size.
P1 KR: General build health
- A bunch of small things that require attention. RBE warnings on diagnostics that are disabled, remove use of globs in cmake, keep up to date with Bazel versions, see if we can speedup Bazel bot builds by moving sandbox root to tmpfs, remove common include directories.
P2 KR: Check tests are fast in dbg mode
- Currently we've got quadratic growth for a slow path in swiftshader dbg. Potential solutions: IREE multi-module -> archive compilation support, conditional sharing of instances in tests.

P1 O: [User-facing] Prepare to support real-world use cases

Notes: Keep a pulse on deployment user journeys, continue to gather requirements from interested users, set ourselves on a path to production use on at least one platform.

P1 KR: A new sample application showing high-level IREE behavior
- Android/Java speech demo and/or desktop Vulkan image-to-image pipeline
P1 KR: A PRD that captures deployment requirements (such as platform and device support, async model download, etc.)
P2 KR: Identify and track list of potential customers, their use cases, and deployment requirements
P2 KR: Proof of concept deployment with at least one other team
- Focus on requirements gathering and prototyping, find what features we're missing (e.g. Android or Vulkan memory sharing, build configurations compatible with Stadia, etc.)

P1 O: [User-facing] Expand Java API to support a useful sample Android app

P1 KR: Support for different input and output types
P1 KR: Support for driver creation
P1 KR: Support for different module types, including: HAL, bytecode, tensorlist, and string
P2 KR: Design for supporting custom modules

P1 O: [Model support] Achieve target Tensorflow front-end fidelity

P1 KR: Full ASR Decoder to HLO
P1 KR: Investigate TF to HLO lowering without TF folding (e.g. linspace)
P2 KR: Transformer ASR to HLO
P2 KR: Documentation detailing unlowerable operations to HLO
P3 KR: Stretch: SASP rewrite for better shape inference

P1 O: [Model support] Implement remaining VMLA ops

P1 KR: FFT operations
P1 KR: Sort operation
P1 KR: Full ASR Decoder executing on VMLA
P2 KR: Transformer ASR executing on VMLA

P1 O: [CPU Codegen] Improve CPU codegen infrastructure

P1 KR: Lowering path MHLO ops --> linalg named Ops (library calls).
- Sort, TopK and similar indexing stye ops
P1 KR: Support Linalg fusion on buffers using stack allocations.
P1 KR: Improve AOT linking and support automatic toolchain discovery
- Link all executables in a single dylib.
- Support exporting/loading dylib to standalone binary.

P1 O: [MLIR codegen] Retargetable codegeneration (Vector dialect-based approach)

P1 KR: Develop mechanisms to distribute vector operation at workgroup level to vector operation at subgroup level / work item level
P1 KR: Handle distribution of producer-consumer vector operations to implement fusion

P1 O: [GPU Codegen] Improve generic GPU codegen performance

P1 KR: Using Linalg fusion on buffers
- Fuse operations like matmul/conv with its producers/consumers, using workgroup memory as intermediate tile storage. Examples: Elementwise -> Matmul -> Elementwise, Padding -> Conv/Pool operations
P1 KR: Fast matmul for mobile GPUs that don't have tensor core units
- - This applies work from Q2 to more architectures (e.g., ARM and Qualcomm GPUs)
- - Match handwritten vulkan kernel (On Pixel4: current iree matmul for a 1K matrix runs in 128ms and handwritten kernel runs in 19ms)
P1 KR: Using subgroup operations for reduction
P1 KR: Achieve reasonable performance for one model on one mobile GPU
- - Decide on a mobile GPU
- - Match TFLite MobileNetV2 performance (f32, imagenet)

P1 O: [GPU Codegen] Add support for targeting different GPU hardware

P1 KR: Define representation for Vulkan/SPIR-V targets
- Define at least one target for NVIDIA, AMD, Qualcomm, ARM
P1 KR: Use Vulkan/SPIR-V targets to guide GPU CodeGen
- Enable promotion when workgroup memory is available
- Choose proper tile parameters
- Choose proper cooperative matrix parameters
- Choose proper workgroup/subgroup size
- Goal is to have a prototype supporting the limited cases we currently have and have a plan to scale. Ex: Support both Mali/Adreno best parameters for tile/workgroup size. Potentially also support Nvidia Turing GPU.

P1 O: [GPU Codegen] Build infrastructure for performance improvement

P1 KR: Document clear profiling tools/flows
- - Mostly covering vendor/platform specific tools
- - For both desktop NVIDIA/AMD and Android AGI
P1 KR: Collect performance metrics of mobile GPUs
- - Get empirical data over data movement performance
- - Get empirical data over subgroup ops performance
- - Get empirical data over best tiling/workgroup size
- - Get empirical data over different matmul impls
- - Produce a repo for hosting such benchmarks
- - Produce a doc listing such results

P1 O: [Strategy] Define high-level strategy for quantization support in IREE

P1 KR: Strategy document describing 2021 roadmap, resourcing, and approach
P1 KR: Two initial quantization projects described and queued up for 2021 Q1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

objectives.md

objectives.md

2020 Q4 Objectives (OKRs)

This Quarter's HIGH-LEVEL THEME

SPECIFIC OKRs

P1 O: [Perf burndown] Bring up and execute perf burndown process for MobileBERT workload

P1 O: [Perf burndown] Improve benchmarking and profiling tooling to support perf burndown

P1 O: [Perf burndown] Identify target workload for IREE perf credibility burndown

P1 O: [Perf burndown] Add initial support for multi-threaded workloads

P1 O: [Infra] Improve OSS build infrastructure to support continued development

P1 O: [User-facing] Prepare to support real-world use cases

P1 O: [User-facing] Expand Java API to support a useful sample Android app

P1 O: [Model support] Achieve target Tensorflow front-end fidelity

P1 O: [Model support] Implement remaining VMLA ops

P1 O: [CPU Codegen] Improve CPU codegen infrastructure

P1 O: [MLIR codegen] Retargetable codegeneration (Vector dialect-based approach)

P1 O: [GPU Codegen] Improve generic GPU codegen performance

P1 O: [GPU Codegen] Add support for targeting different GPU hardware

P1 O: [GPU Codegen] Build infrastructure for performance improvement

P1 O: [Strategy] Define high-level strategy for quantization support in IREE

Files

objectives.md

Latest commit

History

objectives.md

File metadata and controls

2020 Q4 Objectives (OKRs)

This Quarter's HIGH-LEVEL THEME

SPECIFIC OKRs

P1 O: [Perf burndown] Bring up and execute perf burndown process for MobileBERT workload

P1 O: [Perf burndown] Improve benchmarking and profiling tooling to support perf burndown

P1 O: [Perf burndown] Identify target workload for IREE perf credibility burndown

P1 O: [Perf burndown] Add initial support for multi-threaded workloads

P1 O: [Infra] Improve OSS build infrastructure to support continued development

P1 O: [User-facing] Prepare to support real-world use cases

P1 O: [User-facing] Expand Java API to support a useful sample Android app

P1 O: [Model support] Achieve target Tensorflow front-end fidelity

P1 O: [Model support] Implement remaining VMLA ops

P1 O: [CPU Codegen] Improve CPU codegen infrastructure

P1 O: [MLIR codegen] Retargetable codegeneration (Vector dialect-based approach)

P1 O: [GPU Codegen] Improve generic GPU codegen performance

P1 O: [GPU Codegen] Add support for targeting different GPU hardware

P1 O: [GPU Codegen] Build infrastructure for performance improvement

P1 O: [Strategy] Define high-level strategy for quantization support in IREE