OpenCL host code for optimized interfacing with Xilinx Devices.
Examples Table:
Example | Description | Key Concepts/Keywords |
---|---|---|
concurrent_kernel_execution | This example demonstrates how to use multiple command queues and out-of-order command queues to execute multiple kernels simultaneously on an FPGA. | Key Concepts Keywords |
copy_buffer | This Copy Buffer example demonstrates how the contents of one buffer can be copied to another buffer. | Key Concepts Keywords |
data_transfer | This example illustrates several ways to use the OpenCL API to transfer data to and from the FPGA. | Key Concepts Keywords |
debug_profile | This is a simple example of vector addition that prints profile data (wall-clock time taken between start and stop). It also dumps a waveform file that can be reloaded into Vivado to view the waveform. Run the command `vivado -source ./scripts/open_waveform.tcl -tclargs <device_name>-<kernel_name>.<target>.<device_name>.wdb` to launch the waveform viewer. Users can also change batch to gui in the xrt.ini file to see the live waveform while the application is running. The example also demonstrates the use of hls::print to print a format string/int/double argument to standard output, and to the simulation log in cosim and HW_EMU. | Key Concepts Keywords |
device_only_buffer | This example demonstrates how to create buffers in global memory that are not mapped to the host. The device-only memory allocation is done through the host code. The kernel can read data from device memory and write results to device memory. | Key Concepts Keywords |
device_query | This example prints the OpenCL properties of the platform and its devices using the OpenCL C++ APIs. It also displays the limits and capabilities of the hardware. | Key Concepts |
errors | This example discusses the different reasons for errors in OpenCL and how to handle them at runtime. | Key Concepts Keywords |
errors_cpp | This example discusses the different reasons for errors in OpenCL C++ and how to handle them at runtime. | Key Concepts Keywords |
hbm_large_buffers | This is a simple example of vector addition to describe how HBM pseudo-channels can be grouped to handle buffers larger than 256 MB. | Key Concepts Keywords |
hbm_rama_ip | This is a host application to test HBM interface bandwidth for buffers > 256 MB with a pseudo-random 1024-bit data access pattern, mimicking Ethereum Ethash workloads. The design contains 4 compute units of the kernel, 2 with and 2 without the RAMA IP. Each compute unit reads 1024 bits from a pseudo-random address in each of 2 pseudo-channel groups and writes the results of a simple mathematical operation to a pseudo-random address in 2 other pseudo-channel groups. Each buffer is 1 GB in size, requiring 4 HBM banks. Since the first 2 CUs require 4 buffers each, and those buffers are then used again by the other 2 CUs, the .cfg file allocates the buffers across all 32 HBM banks. The host application runs the compute units concurrently to measure the overall bandwidth between the kernel and HBM memory. | Key Concepts Keywords |
hbm_simple | This is a simple example of vector addition to describe how to use HLS kernels with HBM (High Bandwidth Memory) for achieving high throughput. | Key Concepts Keywords |
host_memory_copy_buffer | This is a simple host memory example to describe how host-only memory can be copied to device-only memory and vice versa. | Key Concepts Keywords |
host_memory_copy_kernel | This is a host memory example to describe how data can be copied between a host-only buffer and a device-only buffer using a user copy kernel. | Key Concepts Keywords |
host_memory_simple | This is a simple host memory example to describe how a user kernel can access the host memory. The host memory allocation is done through the host code. The kernel reads data from host memory and writes results to host memory. | Key Concepts Keywords |
iops_test | This is a simple test design to measure input/output operations per second (IOPS). In this design, a simple kernel is enqueued many times and the overall IOPS is measured. | Key Concepts Keywords |
mult_compute_units | This is a simple example of multiple compute units that showcases how a single kernel can be instantiated into multiple compute units. The host code shows how to use multiple compute units and run them concurrently. | Key Concepts Keywords |
multiple_cus_asymmetrical | This is a simple example of vector addition to demonstrate how to connect each compute unit to different banks and how to use these compute units in host applications. | Key Concepts |
overlap | This example demonstrates techniques that allow the user to overlap host (CPU) and FPGA computation in an application. It covers asynchronous operations and event objects. | Key Concepts Keywords |
p2p_bandwidth | This is a simple example to test data transfer between an SSD and an FPGA. | Key Concepts Keywords |
p2p_fpga2fpga | This is a simple example to explain P2P transfer between two FPGA devices. | Key Concepts Keywords |
p2p_overlap_bandwidth | This is a simple example to test synchronous and asynchronous data transfer between an SSD and an FPGA. | Key Concepts Keywords |
p2p_simple | This is a simple example of vector increment to describe P2P between an FPGA and an NVMe SSD. | Key Concepts Keywords |
streaming_free_running_k2k | This is a simple example that demonstrates how to use and configure a free-running kernel. | Key Concepts Keywords |
streaming_k2k_mm | This is a simple kernel-to-kernel streaming Vector Add and Vector Multiply C kernel design with 2 memory-mapped inputs to kernel 1, 1 stream output from kernel 1 to the input of kernel 2, 1 memory-mapped input to kernel 2, and 1 memory-mapped output. It demonstrates how to process a stream of data for computation between two kernels. This design also illustrates how to set the FIFO depth for AXIS connections, i.e. for the stream connecting the two kernels. | Key Concepts Keywords |