- Antares is an automatic engine for multi-platform kernel generation and optimization (targeting CUDA/ROCm/CPU/DirectX12/Graphcore/OneAPI).
- Antares simplifies most of TVM's low-level features, making it easier for DNN developers to map computation onto Microsoft-related platforms.
- Antares follows the "One Language Syntax for All Platforms" principle to reduce description complexity across different platforms (see the sketch after this list).
- Antares can convert computing operators from your DNN models into low-level source code for the target device (e.g. kernels, shaders, ..).
- Antares can also automatically tune and optimize these DNN operators end-to-end on the device using efficient mechanisms and algorithms.
- You want to modify fine-grain DNN workloads, but Tensorflow/Pytorch's built-in implementations are limited.
- You notice some operators are inefficient, and you want to replace them with better ones easily.
- You can port your full DNN models into Windows executables and get acceleration with DirectX12 + Intel/AMD/NVIDIA graphics cards.
- You want to split fine-grain operator workloads into the local tile nodes of Graphcore, which benefits on-chip memory usage and reduces BSP communication overhead.
- Evaluate the compiler or potential runtime efficiency on Antares-supported accelerators, e.g. A100.
- Antares provides a broad space for researchers to work on kernel optimization, e.g. custom tuners, custom schedule policies, custom platforms, etc.
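As a quick taste of the "One Language Syntax for All Platforms" principle, the same Antares IR expression can be compiled for different backends just by switching the BACKEND variable. A minimal sketch (reusing the matmul expression from the quick-start examples below):

# One IR, many targets: only the BACKEND value changes.
COMPUTE_V1='- einstein_v2("output0[N, M] +=! input0[N, K] * input1[K, M]", input_dict={"input0": {"dtype": "float32", "shape": [128, 1024]}, "input1": {"dtype": "float32", "shape": [1024, 1024]}})' BACKEND=c-cuda make
COMPUTE_V1='- einstein_v2("output0[N, M] +=! input0[N, K] * input1[K, M]", input_dict={"input0": {"dtype": "float32", "shape": [128, 1024]}, "input1": {"dtype": "float32", "shape": [1024, 1024]}})' BACKEND=c-mcpu make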
sudo apt install docker.io
git clone https://github.com/microsoft/antares
cd antares/
sudo BACKEND=c-cuda make # If you have NVIDIA GPU with CUDA driver installed
sudo BACKEND=c-rocm make # If you have AMD GPU with ROCm driver installed
# If you need Antares to extend/boost Tensorflow-GPU operators, please also run:
sudo python3 ./frameworks/tensorflow/setup.py
# Reference - Recommended Installation Package Choices for Tensorflow 1.x & 2.x (tested on Ubuntu 20.04):
# Tensorflow-1 for NVIDIA CUDA 10.0: python3 -m pip install --upgrade pip && python3 -m pip install tensorflow-gpu==1.15.4
# Tensorflow-1 for NVIDIA CUDA 11.0: python3 -m pip install --upgrade pip && python3 -m pip install https://github.com/ghostplant/tensorflow-wheel-collections/releases/download/cuda-11/tensorflow_gpu-1.15.4_cuda11+nv-cp38-cp38-linux_x86_64.whl
# Tensorflow-2 for NVIDIA CUDA 11.0: python3 -m pip install --upgrade pip && python3 -m pip install tensorflow-gpu==2.4.0
# Tensorflow-1 for AMD ROCm 4.0: python3 -m pip install tensorflow-rocm==1.15.9
# Tensorflow-2 for AMD ROCm 4.0: python3 -m pip install tensorflow-rocm==2.4.0
# If you need Antares to extend/boost Pytorch-GPU operators, please also run:
sudo python3 ./frameworks/pytorch/setup.py
# Reference - Recommended Installation Package Choices for Pytorch (tested on Ubuntu 20.04):
# Pytorch for NVIDIA CUDA 10.0: python3 -m pip install torch==1.5.0 torchvision==0.6.0 -f https://download.pytorch.org/whl/torch_stable.html
# Pytorch for NVIDIA CUDA 11.0: python3 -m pip install torch===1.7.1+cu110 torchvision===0.8.2+cu110 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
# Pytorch for AMD ROCm 4.0: python3 -m pip install torch torchvision -f https://download.pytorch.org/whl/rocm4.0.1/torch_stable.html
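After installation, a quick sanity check (a minimal sketch, independent of Antares itself) can confirm that the framework actually detects your GPU before you extend it with Antares operators:

# Tensorflow: should list at least one GPU device
python3 -c "import tensorflow as tf; print(tf.config.experimental.list_physical_devices('GPU'))"
# Pytorch: should print True on a working CUDA/ROCm setup
python3 -c "import torch; print(torch.cuda.is_available())"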
This example shows an easy way to quickly add custom operators to Tensorflow/Pytorch; note that the operator itself is not an optimized version (not tuned yet).
# First, launch the Antares REST server (a CUDA example)
BACKEND=c-cuda make rest-server
- Tensorflow Frontend Only (>= 1.15.x / >= 2.4.x):
# For Tensorflow CUDA frontend, execute the following python script:
import tensorflow as tf
from tensorflow.contrib import antares
if tf.version.VERSION.startswith('2.'):
    tf = tf.compat.v1
    tf.disable_eager_execution()
x = tf.get_variable('x', [128, 1024], tf.float32, initializer=tf.initializers.ones(tf.float32), trainable=False)
y = tf.get_variable('y', [1024, 1024], tf.float32, initializer=tf.initializers.ones(tf.float32), trainable=False)
op = antares.make_op(ir='dot_0[N, M] +=! data[N, K] * weight[K, M]', feed_dict={'data': x, 'weight': y}).tune(step=100, use_cache=True, timeout=600).emit()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print('The result of tensor `%s` is:\n%s' % (op, sess.run(op)))
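# Expected output (for reference): since both inputs are all-ones, each output
# element sums 1024 ones, so the result is a 128x1024 matrix filled with 1024.0.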
- Pytorch Frontend Only:
# For Pytorch frontend, execute the following python script:
import torch
from torch.contrib.antares.custom_op import CustomOp
device = torch.device("cuda")
dtype = torch.float32
kwargs = {'dtype': dtype,
          'device': device,
          'requires_grad': False}
x = torch.ones(128, 1024, **kwargs)
y = torch.ones(1024, 1024, **kwargs)
custom_op = CustomOp(ir='dot_0[N, M] +=! data[N, K] * weight[K, M]', feed_dict={'data': x, 'weight': y}).to(device, dtype).tune(step=100, use_cache=True, timeout=600).emit()
result = custom_op()
print('The result of tensor `%s` is:\n%s' % (result.id, result))
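# Sanity check (a sketch, not part of the Antares API): since both inputs are
# all-ones, the custom op should match a plain matmul filled with 1024.0.
assert torch.allclose(result, torch.matmul(x, y))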
Generally, you can generate SYCL source kernels that work for most Intel CPUs, e.g.:
BACKEND=c-sycl_intel COMPUTE_V1='- einstein_v2("output0[N, F, HO, WO] +=! input0[N, C, HO * 4 + KH, WO * 4 + KW] * input1[F, C, KH, KW] where HO in 55, WO in 55", input_dict={"input0": {"dtype": "float32", "shape": [64, 3, 227, 227]}, "input1": {"dtype": "float32", "shape": [96, 3, 11, 11]}});' make
To generate code for Windows 10 with DX12 enabled, you can set up WSL 1.0 and run the following inside WSL 1.0:
sudo make install_host
BACKEND=c-hlsl_win64 COMPUTE_V1='- einstein_v2("output0[N, F, HO, WO] = input0[N] where F in 32, HO in 2, WO in 2", input_dict={"input0": {"dtype": "float32", "shape": [16]}})' make
For multi-core CPU (c-mcpu) or single-core CPU (c-scpu):
BACKEND=c-mcpu COMPUTE_V1='- einstein_v2("output0[N, C, H, W] = input0[N, H, W, C]", input_dict={"input0": {"dtype": "float32", "shape": [32, 229, 229, 3]}})' make
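For reference, the IR in the c-mcpu example above is simply an NHWC-to-NCHW layout transform; a minimal NumPy sketch of the same semantics (illustration only, not part of Antares):

import numpy as np

# output0[N, C, H, W] = input0[N, H, W, C]  ==  moving the channel axis to position 1
input0 = np.ones([32, 229, 229, 3], dtype=np.float32)
output0 = np.transpose(input0, (0, 3, 1, 2))  # shape becomes [32, 3, 229, 229]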
For more syntax usage or examples, please follow documentation here: Antares IR & Examples
Antares supports multi-line statements as long as they are fuse-able, for example ConvReluBias:
conv_out[N, F, HO, WO] +=! input_data[N, C, HO + KH, WO + KW] * kernel[KH, KW, C, F] where HO in 256, WO in 256;
conv_bias[N, F, HO, WO] = conv_out[N, F, HO, WO] + bias[0, F, 0, 0];
output0[N, F, HO, WO] = conv_bias[N, F, HO, WO].when(conv_bias[N, F, HO, WO] > 0.0, 0.0);
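As a sketch, the fused statements above can be passed to Antares in the same COMPUTE_V1 form as the earlier examples; the input_dict shapes below are hypothetical and only chosen to be consistent with the expression:

BACKEND=c-cuda COMPUTE_V1='- einstein_v2("conv_out[N, F, HO, WO] +=! input_data[N, C, HO + KH, WO + KW] * kernel[KH, KW, C, F] where HO in 256, WO in 256; conv_bias[N, F, HO, WO] = conv_out[N, F, HO, WO] + bias[0, F, 0, 0]; output0[N, F, HO, WO] = conv_bias[N, F, HO, WO].when(conv_bias[N, F, HO, WO] > 0.0, 0.0)", input_dict={"input_data": {"dtype": "float32", "shape": [16, 16, 258, 258]}, "kernel": {"dtype": "float32", "shape": [3, 3, 16, 32]}, "bias": {"dtype": "float32", "shape": [1, 32, 1, 1]}})' make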
| | HIP-C(c-rocm/c-rocm_win64) | CUDA(c-cuda/c-cuda_win64) | CPU(c-mcpu/c-scpu) | DirectX12(c-hlsl_win64) | Graphcore(c-gc) | Intel OneAPI(c-sycl_intel) | (..coming soon..) |
|---|---|---|---|---|---|---|---|
| Deploy Environment | Linux/WSL1 | Linux | Linux | WSL1 | Linux | Linux | |
| Target Device | AMDGPU | NVGPU | Generic CPU | Generic Graphic Card | IPU Device | Intel CPU/HD Graphic/FPGA | |
| Global schedules | Y | Y | Y | Y | Y | Y | |
| Local schedules | Y | Y | Y | Y | Y | | |
| Head fusion | Y | Y | Y | Y | Y | Y | |
| Tail fusion | Y | Y | Y | | | | |
| Evaluator | Y | Y | Y | Y | Y | Y | |
| Tensorflow Plugin | Y | Y | | | | | |
| Pytorch Plugin | Y | Y | | | | | |
| Multi Kernel Eval | Y | Y | | | | | |
First, describe your computing logic in the standard Antares IR and set the IR string in the environment variable `COMPUTE_V1`. Together with the environment variable `BACKEND`, which selects the target backend type, these two settings let you quickly generate a reference kernel code, regardless of its execution performance. If you want Antares to further optimize the operator automatically, just add one more variable to your first-run examples: `STEP=1000`, which means Antares will take 1000 chances to try and search for a potentially faster kernel version. For example,
STEP=100 BACKEND=c-cuda COMPUTE_V1='- einstein_v2("output0[N, F, HO, WO] +=! input0[N, C, HO * 4 + KH, WO * 4 + KW] * input1[F, C, KH, KW] where HO in 55, WO in 55", input_dict={"input0": {"dtype": "float32", "shape": [64, 3, 227, 227]}, "input1": {"dtype": "float32", "shape": [96, 3, 11, 11]}});' make
Tuning will take some time to finish. As long as your environment is correctly configured, you will eventually get a JSON-format configuration representing the best kernel version Antares found, and then you can do two things:
- Re-evaluate the Antares-tuned case by adding the `CONFIG` variable, whose content is exactly the JSON-format configuration from your last corresponding tuning report:
CONFIG='{"..": [..], ..}' COMPUTE_V1='- einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})' BACKEND=c-cuda make
- If you want to save the kernel code, append `COMMIT=1` to your case, like:
COMMIT=1 CONFIG='{"..": [..], ..}' COMPUTE_V1='- einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})' BACKEND=c-cuda make
The generated kernel code will be saved in the codehub folder under a deterministic filename.
The environment variable `COMMIT` works not only in the re-evaluation command but also in the tuning command, e.g.:
COMMIT=1 STEP=100 BACKEND=c-cuda COMPUTE_V1='- einstein_v2("output0[N, F, HO, WO] +=! input0[N, C, HO * 4 + KH, WO * 4 + KW] * input1[F, C, KH, KW] where HO in 55, WO in 55", input_dict={"input0": {"dtype": "float32", "shape": [64, 3, 227, 227]}, "input1": {"dtype": "float32", "shape": [96, 3, 11, 11]}});' make
If the same case (with the same `COMPUTE_V1` value) has already been tuned and saved in history, setting `COMMIT=1` will block you from tuning it again, to avoid overwriting the historical kernel code in codehub. You can still set `COMMIT=force` to allow such overwriting.
For more information about the Microsoft Open Source policy, please see the Microsoft Open Source Code of Conduct.