Host Examples

OpenCL host code for optimized interfacing with Xilinx devices.

Examples Table:

Example	Description	Key Concepts/Keywords
concurrent_kernel_execution	This example demonstrates how to use multiple command queues and out-of-order command queues to execute multiple kernels on an FPGA simultaneously.	Key Concepts: Concurrent execution, Out of Order Command Queues, Multiple Command Queues. Keywords: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, setCallback
copy_buffer	This example demonstrates how one buffer can be copied from another buffer.	Key Concepts: Copy Buffer. Keywords: cl::CommandQueue, enqueueCopyBuffer, enqueueWriteBuffer, enqueueReadBuffer, enqueueMigrateMemObjects
data_transfer	This example illustrates several ways to use the OpenCL API to transfer data to and from the FPGA.	Key Concepts: OpenCL Host APIs, Data Transfer, Write Buffers, Read Buffers, Map Buffers, Async Memcpy. Keywords: enqueueWriteBuffer, enqueueReadBuffer, enqueueMapBuffer, enqueueUnmapMemObject, enqueueMigrateMemObjects
debug_profile	This is a simple example of vector addition that prints profile data (the wall-clock time taken between start and stop). It also dumps a waveform file that can be reloaded in Vivado to view the waveform. Run the command 'vivado -source ./scripts/open_waveform.tcl -tclargs <device_name>-<kernel_name>.<target>.<device_name>.wdb' to launch the waveform viewer. Users can also change batch to gui in the xrt.ini file to see the live waveform while the application runs. The example also demonstrates the use of hls::print to print a format string/int/double argument to standard output, and to the simulation log in cosim and HW_EMU.	Key Concepts: Use of Profile API, Waveform Dumping and loading. Keywords: debug_mode=gui/batch, user_range, user_event, hls::print
device_only_buffer	This example demonstrates how to create buffers in global memory that are not mapped to the host. The device-only memory allocation is done through the host code. The kernel can read data from device memory and write results to device memory.	Key Concepts: Device only buffer. Keywords: CL_MEM_HOST_NO_ACCESS
device_query	This example prints the OpenCL properties of the platform and its devices using the OpenCL C++ APIs. It also displays the limits and capabilities of the hardware.	Key Concepts: OpenCL API, Querying device properties
errors	This example discusses the different reasons for errors in OpenCL and how to handle them at runtime.	Key Concepts: OpenCL API, Error handling. Keywords: CL_SUCCESS, CL_DEVICE_NOT_FOUND, CL_DEVICE_NOT_AVAILABLE
errors_cpp	This example discusses the different reasons for errors in OpenCL C++ and how to handle them at runtime.	Key Concepts: OpenCL Host API, Error handling. Keywords: CL_SUCCESS, CL_DEVICE_NOT_FOUND, CL_DEVICE_NOT_AVAILABLE, CL_INVALID_VALUE, CL_INVALID_KERNEL_NAME, CL_INVALID_BUFFER_SIZE
hbm_large_buffers	This is a simple example of vector addition that describes how HBM pseudo-channels can be grouped to handle buffers larger than 256 MB.	Key Concepts: High Bandwidth Memory, Multiple HBM Pseudo-channel Groups. Keywords: HBM
hbm_rama_ip	This is a host application that tests HBM interface bandwidth for buffers > 256 MB with a pseudo-random 1024-bit data access pattern, mimicking Ethereum Ethash workloads. The design contains 4 compute units of a kernel, 2 with and 2 without RAMA IP. Each compute unit reads 1024 bits from a pseudo-random address in each of 2 pseudo-channel groups and writes the results of a simple mathematical operation to a pseudo-random address in 2 other pseudo-channel groups. Each buffer is 1 GB, requiring 4 HBM banks. Since the first 2 CUs require 4 buffers each, and these buffers are then used again by the other 2 CUs, the .cfg file allocates the buffers across all 32 HBM banks. The host application runs the compute units concurrently to measure the overall bandwidth between the kernel and HBM memory.	Key Concepts: High Bandwidth Memory, Multiple HBM Pseudo-channels, Random Memory Access, Linear Feedback Shift Register, RAMA IP. Keywords: HBM, ra_master_interface
hbm_simple	This is a simple example of vector addition that describes how to use HLS kernels with HBM (High Bandwidth Memory) to achieve high throughput.	Key Concepts: High Bandwidth Memory, Multiple HBM pseudo-channels. Keywords: HBM, XCL_MEM_TOPOLOGY, cl_mem_ext_ptr_t, trace_memory, trace_buffer_size, opencl_trace
host_memory_copy_buffer	This is a simple host memory example that describes how host-only memory can be copied to device-only memory and vice versa.	Key Concepts: host memory. Keywords: XCL_MEM_EXT_HOST_ONLY, CL_MEM_HOST_NO_ACCESS, enqueueCopyBuffer
host_memory_copy_kernel	This is a host memory example that describes how data can be copied between a host-only buffer and a device-only buffer using a user copy kernel.	Key Concepts: host memory. Keywords: XCL_MEM_EXT_HOST_ONLY, CL_MEM_HOST_NO_ACCESS, enqueueMapBuffer
host_memory_simple	This is a simple host memory example that describes how a user kernel can access host memory. The host memory allocation is done through the host code. The kernel reads data from host memory and writes the result to host memory.	Key Concepts: host memory, address translation unit. Keywords: XCL_MEM_EXT_HOST_ONLY, HOST[0]
iops_test	This is a simple test design that measures input/output operations per second (IOPS). In this design, a simple kernel is enqueued many times and the overall IOPS is measured.	Key Concepts: Input/Output Operations per second. Keywords: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
mult_compute_units	This is a simple example of multiple compute units that showcases how a single kernel can be instantiated into multiple compute units. The host code shows how to use multiple compute units and run them concurrently.	Key Concepts: Multiple Compute Units. Keywords: nk
multiple_cus_asymmetrical	This is a simple example of vector addition that demonstrates how to connect each compute unit to different banks and how to use these compute units in host applications.	Key Concepts: Multiple Compute Units, Task Level Parallelism
overlap	This example demonstrates techniques that allow the user to overlap host (CPU) and FPGA computation in an application. It covers asynchronous operations and event objects.	Key Concepts: OpenCL Host API, Synchronize Host and FPGA, Asynchronous Processing, Events, Asynchronous memcpy. Keywords: cl_event, cl::CommandQueue, CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, enqueueMigrateMemObjects
p2p_bandwidth	This is a simple example that tests data transfer between an SSD and an FPGA.	Key Concepts: P2P, SmartSSD, XDMA. Keywords: XCL_MEM_EXT_P2P_BUFFER, pread, pwrite
p2p_fpga2fpga	This is a simple example that explains P2P transfer between two FPGA devices.	Key Concepts: P2P, Multi-FPGA Execution, XDMA. Keywords: XCL_MEM_EXT_P2P_BUFFER
p2p_overlap_bandwidth	This is a simple example that tests synchronous and asynchronous data transfer between an SSD and an FPGA.	Key Concepts: P2P, SmartSSD, XDMA. Keywords: XCL_MEM_EXT_P2P_BUFFER, pread, pwrite
p2p_simple	This is a simple example of vector increment that describes P2P between an FPGA and an NVMe SSD.	Key Concepts: P2P, NVMe SSD, SmartSSD. Keywords: XCL_MEM_EXT_P2P_BUFFER, pread, pwrite, O_DIRECT, O_RDWR
streaming_free_running_k2k	This is a simple example that demonstrates how to use and configure a free-running kernel.	Key Concepts: Free Running Kernel. Keywords: ap_ctrl_none, stream_connect
streaming_k2k_mm	This is a simple kernel-to-kernel streaming Vector Add and Vector Multiply C kernel design with 2 memory-mapped inputs to kernel 1, 1 stream output from kernel 1 to the input of kernel 2, 1 memory-mapped input to kernel 2, and 1 memory-mapped output. It demonstrates how to process a stream of data for computation between two kernels. This design also illustrates how to set the FIFO depth for AXIS connections, i.e., for the stream connecting the two kernels.	Key Concepts: Read/Write Stream, Create/Release Stream, AXIS FIFO depth. Keywords: stream_connect
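
The bank counts quoted for hbm_rama_ip in the table can be checked with a little arithmetic. The sketch below is illustrative only and is not taken from the example sources: it assumes each HBM pseudo-channel (bank) holds 256 MB, which is implied by the 256 MB threshold and the "1 GB requires 4 banks" figure in the descriptions. The names kPseudoChannelBytes, pseudo_channels_needed, and total_pseudo_channels are hypothetical helpers introduced for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: one HBM pseudo-channel (bank) holds 256 MB, as implied by the
// hbm_large_buffers and hbm_rama_ip descriptions. Hypothetical constant names.
constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;
constexpr std::uint64_t kPseudoChannelBytes = 256ULL * kMiB;

// Number of pseudo-channels a single buffer must span (ceiling division).
inline std::uint64_t pseudo_channels_needed(std::uint64_t buffer_bytes) {
    assert(buffer_bytes > 0 && "buffer size must be non-zero");
    return (buffer_bytes + kPseudoChannelBytes - 1) / kPseudoChannelBytes;
}

// Pseudo-channels occupied when `compute_units` CUs each own
// `buffers_per_cu` distinct buffers of `buffer_bytes` bytes.
inline std::uint64_t total_pseudo_channels(std::uint64_t compute_units,
                                           std::uint64_t buffers_per_cu,
                                           std::uint64_t buffer_bytes) {
    return compute_units * buffers_per_cu * pseudo_channels_needed(buffer_bytes);
}
```

Under these assumptions, pseudo_channels_needed(1024 * kMiB) gives 4 banks for a 1 GB buffer, and total_pseudo_channels(2, 4, 1024 * kMiB) gives 32, matching the description of 2 CUs with 4 buffers each filling all 32 HBM banks (the other 2 CUs reuse the same buffers).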
