Showing 4 changed files with 191 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,178 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# ACCL Compression Support\n", | ||
"In general, ACCL is datatype-agnostic, as most of its functionality involves moving data without interacting with the data values themselves. However, there are two exceptions to this rule:\n", | ||
"* Elementwise operations (e.g. SUM) performed by ACCL instances on buffers during reduction-type collectives \n", | ||
"* Datatype conversions when the source and destination of a transfer are of different data types. In this scenario we say that the lower-precision buffer is compressed.\n", | ||
"\n", | ||
"To support these elementwise operations and conversions, ACCL must be configured with a reduction plugin and conversion plugins respectively. Each of these plugins is a free-running Vitis kernel. Reduction plugins take two operand AXI Streams and produce one result AXI Stream, and may implement multiple functions internally, selected by an operation ID provided as a sideband to the operands on the AXI Stream TDEST signal. Conversion plugins take one operand AXI Stream as input and produce a result AXI Stream by applying an arbitrary conversion function specified by a function ID on the operand's TDEST. \n", | ||
"\n", | ||
"Example reduction and conversion plugins are provided in the ACCL repo. The example reduction plugin supports five data types: FP16/32/64 and INT32/64. The example conversion plugin converts between floating-point single-precision (FP32) and half-precision (FP16). Together, these plugins enable six datatype configurations. Let's see what they are:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from pyaccl.accl import ACCL_DEFAULT_ARITH_CONFIG\n", | ||
"\n", | ||
"for key in ACCL_DEFAULT_ARITH_CONFIG:\n", | ||
" print(f\"Uncompressed dtype: {key[0]}\\nCompressed dtype: {key[1]}\\n{str(ACCL_DEFAULT_ARITH_CONFIG[key])}\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Five of these configurations are homogeneous, i.e. operate on buffers of identical data types. One is heterogeneous and can operate on combinations of FP32 and FP16 buffers, e.g. source buffers of a primitive can be FP32 and results FP16 or vice-versa, by utilizing the conversion plugin. \n", | ||
"\n", | ||
"The key points of the ACCL datatype configuration are:\n", | ||
"* bytes per element for the compressed and uncompressed datatype. In the case of homogeneous configurations, these are the same datatype.\n", | ||
"* ratio of uncompressed elements to compressed elements, i.e. how many uncompressed buffer elements are consumed by the conversion to produce one compressed element. For elementwise conversions, e.g. FP32 to FP16, this ratio is 1. For block floating-point formats, this ratio could be higher.\n", | ||
"* whether arithmetic should be performed on the compressed data (for higher throughput) or on the uncompressed data (for higher precision). ACCL determines the order of conversions required to meet this specification for each primitive and collective. \n", | ||
"* function IDs to be provided to the plugins when performing compression, decompression, and reduction.\n", | ||
"\n", | ||
"Notice that in the ACCL default FP32/FP16 compression configuration, arithmetic is performed on the lower-precision FP16 datatype. Let's initialize two ACCL instances and see how the FP16 compression feature works." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"RUN_ON_HARDWARE = True\n", | ||
"XCLBIN = \"axis3x.xclbin\"\n", | ||
"\n", | ||
"from pyaccl import accl\n", | ||
"\n", | ||
"if RUN_ON_HARDWARE:\n", | ||
" accl0 = accl(2, 0, xclbin=XCLBIN, cclo_idx=0)\n", | ||
" accl1 = accl(2, 1, xclbin=XCLBIN, cclo_idx=1)\n", | ||
"else:\n", | ||
" accl0 = accl(2, 0, sim_mode=True)\n", | ||
" accl1 = accl(2, 1, sim_mode=True)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Operating on buffers of different data types\n", | ||
"\n", | ||
"Let's do a reduction between two NumPy FP32 buffers, with the result stored in an FP16 buffer. First we'll allocate these buffers using the optional `dtype` argument to `allocate()`, fill the operand buffers with high-precision data, then perform the local reduction. Since in this mixed-precision scenario ACCL by default performs arithmetic in FP16, the sum-combine is equivalent to the following sequence of operations:\n", | ||
"1. convert `op0` and `op1` to FP16\n", | ||
"2. perform the sum in FP16\n", | ||
"3. store the result in `res`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import numpy as np\n", | ||
"from pyaccl import ACCLReduceFunctions\n", | ||
"\n", | ||
"op0 = accl0.allocate((10,), dtype=np.float32)\n", | ||
"op0[:] = [np.pi*i for i in range(10)]\n", | ||
"op1 = accl0.allocate((10,), dtype=np.float32)\n", | ||
"op1[:] = [1.1*i for i in range(10)]\n", | ||
"res = accl0.allocate((10,), dtype=np.float16)\n", | ||
"\n", | ||
"accl0.combine(10, ACCLReduceFunctions.SUM, op0, op1, res)\n", | ||
"\n", | ||
"print(op0+op1)\n", | ||
"print((op0+op1).astype(np.float16))\n", | ||
"print((op0.astype(np.float16)+op1.astype(np.float16)).astype(np.float16))\n", | ||
"print(res)\n", | ||
"np.sum(np.abs(np.subtract(op0+op1, res)))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Notice how the result is slightly different depending on whether we perform the sum in FP32 or FP16. The ACCL result is also slightly different from the NumPy result due to differences in the underlying floating-point ALUs on the FPGA and CPU respectively." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Compressing data over the wire\n", | ||
"\n", | ||
"In addition to local conversions, users can specify FP16 compression for traffic across the backend (typically Ethernet) link between ACCL instances even when all buffers are FP32. This feature reduces network traffic and latency but, as expected, the data loses precision in transit. Let's compress data for a simple send-receive pair. We need to pass the optional `compress_dtype` argument to both `send()` and `recv()`. Note that the compression settings must match on both sides for the receive operation to identify the received buffer." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"src = accl0.allocate((10,))\n", | ||
"dst = accl1.allocate((10,))\n", | ||
"src[:] = [1.1111*i for i in range(10)]\n", | ||
"dst[:] = [0.0 for i in range(10)]\n", | ||
"\n", | ||
"accl0.send(src, len(src), 1, tag=0, compress_dtype=np.dtype('float16'))\n", | ||
"accl1.recv(dst, len(dst), 0, tag=0, compress_dtype=np.dtype('float16'))\n", | ||
"\n", | ||
"print(src)\n", | ||
"print(dst)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## De-Initialize ACCL instances\n", | ||
"The `deinit()` function clears all internal data structures in the ACCL instance." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"accl0.deinit()\n", | ||
"accl1.deinit()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3.8.10 64-bit", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.10" | ||
}, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "916dbcbb3f70747c44a77c7bcd40155683ae19c65e1c03b4aa3499c5328201f1" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
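The notebook's "Operating on buffers of different data types" cell describes a three-step sequence: convert `op0` and `op1` to FP16, sum in FP16, store the result. The following is a minimal NumPy-only sketch of that sequence, placed next to the alternative of summing in FP32 and converting last, to show where the precision difference comes from. It assumes plain `np.ndarray` operands instead of ACCL buffers and only emulates the order of conversions on the CPU; it does not call pyaccl and does not reproduce the arithmetic of the FPGA reduction plugin. All variable names other than `op0` and `op1` are illustrative.

```python
import numpy as np

# Same operand values as the notebook cell, held in plain NumPy arrays
# (assumption: no ACCL instance or FPGA is involved in this sketch).
op0 = np.array([np.pi * i for i in range(10)], dtype=np.float32)
op1 = np.array([1.1 * i for i in range(10)], dtype=np.float32)

# FP32 reference result, before any conversion.
reference = op0 + op1

# Arithmetic on compressed data: convert both operands, then sum in FP16.
sum_in_fp16 = op0.astype(np.float16) + op1.astype(np.float16)

# Arithmetic on uncompressed data: sum in FP32, then convert the result.
sum_fp32_then_cast = (op0 + op1).astype(np.float16)

# Per-element absolute error of each ordering against the FP32 reference.
err_fp16_arith = np.abs(reference - sum_in_fp16.astype(np.float32))
err_cast_last = np.abs(reference - sum_fp32_then_cast.astype(np.float32))

print("total error, sum in FP16:      ", err_fp16_arith.sum())
print("total error, sum in FP32 first:", err_cast_last.sum())
```

Converting last rounds the FP32 sum once to the nearest FP16 value, so its per-element error cannot exceed that of summing already-rounded operands. That is the throughput-versus-precision trade-off the datatype configuration exposes.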
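The effect of the "Compressing data over the wire" cell can also be approximated without hardware. The sketch below is a CPU-only emulation under stated assumptions: the FP32 source is cast to FP16 to stand in for the compressed wire format, then cast back to FP32 to stand in for the receive buffer. It does not use `send()`, `recv()`, or `compress_dtype`, and the name `wire` is hypothetical. It only illustrates the halved payload size and the round-trip precision loss that the notebook text describes.

```python
import numpy as np

# Source values from the notebook cell, as a plain FP32 NumPy array
# (assumption: the ACCL buffers in the notebook are FP32).
src = np.array([1.1111 * i for i in range(10)], dtype=np.float32)

# Stand-in for compression before transport: FP32 -> FP16.
wire = src.astype(np.float16)

# Stand-in for decompression into the FP32 destination buffer.
dst = wire.astype(np.float32)

print("payload bytes uncompressed:", src.nbytes)
print("payload bytes compressed:  ", wire.nbytes)
print("max round-trip error:      ", np.max(np.abs(src - dst)))
```

The payload shrinks by half because each element goes from 4 bytes to 2, while the round-trip error stays within FP16 rounding of the original values.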