Skip to content

Latest commit

 

History

History
937 lines (721 loc) · 40.9 KB

OpenCLCOnVulkan.md

File metadata and controls

937 lines (721 loc) · 40.9 KB

OpenCL C 1.2 Language on Vulkan

Overview

The OpenCL C 1.2 language provides an expressive variant of the C language with which to program heterogeneous architectures. There is already a significant body of code in the wild written in the OpenCL C language - both in open source and proprietary software. This document explains how the OpenCL C language is mapped onto an implementation of the Vulkan standard for high-performance graphics and compute.

The following subjects are covered:

  • Which SPIR-V features are used.
  • How the Vulkan API makes use of the Vulkan variant of SPIR-V produced.
  • How OpenCL C language constructs are mapped down onto Vulkan's variant of SPIR-V.
  • Restrictions on the OpenCL C language as is to be consumed by a Vulkan implementation.

Deprecation Notice

Jan 2023:

Transparent pointers are not supported anymore. clspv only supports opaque pointers. If you encounter regressions in behaviour, please open an issue.

Older:

The descriptor map and it's C++ API are deprecated. Clspv is transitioning to embedding equivalent information directly into the SPIR-V binary via a non-semantic instruction set. The old '-descriptormap' option is no longer accepted by clspv. That file can still be generated by running the new 'clspv-reflection' executable. It will produce an equivalent descriptor map.

SPIR-V Features

The SPIR-V as produced from the OpenCL C language can make use of the following additional extensions:

  • SPV_KHR_variable_pointers - to enable the support of more expressive pointers that the OpenCL C language can make use of.
  • SPV_KHR_storage_buffer_storage_class - required by SPV_KHR_variable_pointers, to enable use of the StorageBuffer storage class.
  • SPV_KHR_non_semantic_info - to enable embedding reflection information directly into the binary.

The SPIR-V as produced from the OpenCL C language can make use of the following capabilities:

  • Shader as we are targeting the OpenCL C language at a Vulkan implementation.
  • VariablePointersStorageBuffer, from the SPV_KHR_variable_pointers extension.
  • VariablePointers, from the SPV_KHR_variable_pointers extension.
    • Note: the compiler attempts to add the minimal variable pointers capability required.
  • Int8 if char or uchar types (or composites of them) are used.
  • Int16 if short or ushort types (or composites of them) are used.
  • Int64 if long or ulong types (or composites of them) are used.
  • Float16 if the half type (or composites of it) is used.
    • Note: this requires enabling the cl_khr_fp16 extension in the source.
  • Float64 if the double type (or composites of it) is used.
  • ImageStorageWriteWithoutFormat if write_only images are used.
  • ImageQuery if any image query is used.
  • Image1D if a write_only image is used.
  • Sampled1D if a read_only image is used.
  • GroupNonUniform (see cl_khr_subgroups below)

The command-line switch '-spv-version' can be used to specify the SPIR-V output version. Only '1.0' and '1.3' are currently supported, corresponding with vk versions '1.0' and '1.1' respectively.

Vulkan Interaction

A Vulkan implementation that is to consume the SPIR-V produced from the OpenCL C language must conform to the following the rules:

  • If the short/ushort types are used in the OpenCL C:
    • The shaderInt16 field of VkPhysicalDeviceFeatures must be set to true.
  • If images are used in the OpenCL C:
    • The shaderStorageImageReadWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
    • The shaderStorageImageWriteWithoutFormat field of VkPhysicalDeviceFeatures must be set to true.
  • The implementation must support extensions VK_KHR_storage_buffer_storage_class and VK_KHR_variable_pointers:
    • A call to vkCreateDevice() where the ppEnabledExtensionNames field of VkDeviceCreateInfo contains extension strings "VK_KHR_storage_buffer_storage_class" and "VK_KHR_variable_pointers" must succeed.
  • If the implementation does not support VK_KHR_shader_non_semantic_info then shaders should be stripped of non-semantic instruction before loading the SPIR-V module.

Storage Capabilities

In order to pass 8- or 16-bit types in the shader interface, Vulkan requires the appropriate bits are set in VkPhysicalDevice8BitStorageFeaturesKHR for 8-bit types or VkPhysicalDevice16BitStorageFeaturesKHR (or the appropriate Vulkan 1.1 or 1.2 feature structs). By default clspv assumes all feature bits are enabled, but provides options to disallow 8- or 16-bit interfaces. -no-8bit-storage reflects 8-bit storage features and -no-16bit-storage reflects 16-bit storage features. Each option can take the following values (can be specified in a comma-separated list or multiple times):

  • ssbo: Represents the storage buffer feature bit.
  • ubo: Represents the uniform and storage buffer feature bit.
  • pushconstant: Represents the push constant feature bit.

For example, if your device only supports storageBuffer16BitAccess (and no 8-bit interfaces), pass the following on the command line:

-no-16bit-storage=ubo,pushconstant -no-8bit-storage=ssbo,ubo,pushconstant

Descriptor Type Mappings

OpenCL C kernel argument types are mapped to Vulkan descriptor types in the following way:

  • If the argument to the kernel is a read only image, the matching Vulkan descriptor set type is VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE.
  • If the argument to the kernel is a write only image, the matching Vulkan descriptor set type is VK_DESCRIPTOR_TYPE_STORAGE_IMAGE.
  • If the argument to the kernel is a sampler, the matching Vulkan descriptor set type is VK_DESCRIPTOR_TYPE_SAMPLER.
  • If the argument to the kernel is a constant or global pointer type, the matching Vulkan descriptor set type is VK_DESCRIPTOR_TYPE_STORAGE_BUFFER. If option -constant-args-ubo' is used and the kernel has constant pointer types, set VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER.
  • If the argument to the kernel is a plain-old-data type, the matching Vulkan descriptor set type is VK_DESCRIPTOR_TYPE_STORAGE_BUFFER by default. If the option -pod-ubo is used the descriptor set type isVK_DESCRIPTOR_TYPE_UNIFORM_BUFFER. If the option -pod-pushconstant is used the arg is instead passed via push constants.

Note: -pod-ubo and -pod-pushconstant are exclusive options.

Note: By default, all plain-old-data kernel arguments are collected into a single structure to be passed in to the compute shader as a single resource. If -cluster-pod-kernel-args=0 is specified, each plain-old-data argument will be passed in a via a unique resource.

Note: -pod-pushconstant cannot be specified with -cluster-pod-kernel-args=0.

OpenCL C Modifications

Some OpenCL C language features that are not natively expressible in Vulkan's variant of SPIR-V, require a subtle mapping to how Vulkan SPIR-V represents the corresponding functionality.

Compilation

An additional preprocessor macro VULKAN is set, to allow developers to guard OpenCL C functionality based on whether the Vulkan API is being targeted or not. This value is set to 100, to match Vulkan version 1.0.

Kernels

OpenCL C language kernels take the form:

void kernel foo(global int* a, global float* b, uint c, float2 local* d);

SPIR-V tracks OpenCL C language kernels using OpEntryPoint opcodes that denote the entry-points where an API interacts with a compute kernel.

Vulkan's variant of SPIR-V requires that the entry-points be void return functions, and that they take no arguments. To pass data into Vulkan SPIR-V shaders, OpVariables are declared outside of the functions, and decorated with DescriptorSet and Binding decorations, to denote that the shaders can interact with their data.

The default way to map an OpenCL C language kernel to a Vulkan SPIR-V compute shader is as follows:

  • All literal samplers use descriptor set 0.
  • By default, all kernels in the translation unit use the same descriptor set number, either 0, 1, or 2.
    • Use option -distinct-kernel-descriptor-sets to get the old behaviour, where each kernel is assigned its own descriptor set number, such that the first kernel has descriptor set 0, and each subsequent kernel is an increment of 1 from the previous.
  • Except for pointer-to-local arguments, each kernel argument is assigned a descriptor binding in that kernel's corresponding DescriptorSet.
  • If the argument to the kernel is a global or constant pointer, it is placed into a SPIR-V OpTypeStruct that is decorated with Block, and an OpVariable of this structure type is created and decorated with the corresponding DescriptorSet and Binding, using the StorageBuffer storage class.
  • If the argument to the kernel is a plain-old-data type, it is placed into a SPIR-V OpTypeStruct that is decorated with Block, and an OpVariable of this structure type is created and decorated with the corresponding DescriptorSet and Binding, using the StorageBuffer storage class.
  • If the argument to the kernel is an image or sampler, an OpVariable of the OpTypeImage or OpTypeSampler type is created and decorated with the corresponding DescriptorSet and Binding, using the UniformConstant storage class.
  • If the argument to the kernel is a pointer to type T in __local storage, then no descriptor is generated. Instead, that argument is mapped to a variable in Workgroup storage class, of type array-of-T. The array size is specified by an integer specialization constant. The specialization ID is reported in the descriptor map file, generated via the -descriptormap option.

The shaders produced use the GLSL450 memory model. As such, there is an assumption of no aliasing by default. The compiler does not generate Aliased decorations currently. Users should be aware of this and ensure they are not relying on aliasing.

Embedded Reflection Instructions

Clspv embeds reflection information via use of the NonSemantic.ClspvReflection non-semantic extended instruction set. It requires SPV_KHR_non_semantic_info. If your Vulkan implementation does not support VK_KHR_shader_non_semantic_info, the reflection instructions should be stripped before loading the module. SPIRV-Tools's optimizer has a transformation to strip non-semantic instructions (use the --strip-reflect option).

The reflection instructions replace the descriptor map.

Descriptor map (DEPRECATED)

The compiler can report the descriptor set and bindings used for literal samplers and for the kernel arguments, and also array sizing information for pointer-to-local arguments. Run clspv-reflection <spirv> -o <descriptor map> to produce the descriptor map.

The descriptor map is a text file with comma-separated values.

Consider this example:

// First kernel in the translation unit.
kernel void foo(global int* a, float f, global float* b, uint c) {...}

It generates the following descriptor map:

kernel_decl,foo
kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer
kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,1,offset,0,argKind,pod,argSize,4
kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,2,offset,0,argKind,buffer
kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,3,offset,0,argKind,pod,argSize,4
spec_constant,workgroup_size_x,spec_id,0
spec_constant,workgroup_size_y,spec_id,1
spec_constant,workgroup_size_z,spec_id,2

The kernels in the module module are declared as follows:

  • kernel_decl
  • kernel name

For kernel arguments of types pointer-to-global, pointer-to-constant, and plain-old-data types, the fields are:

  • kernel to indicate a kernel argument
  • kernel name
  • arg to indicate a kernel argument
  • argument name
  • argOrdinal to indicate a kernel argument ordinal position field
  • the argument's 0-based position in the kernel's parameter list
  • descriptorSet
  • the DescriptorSet value
  • binding
  • the Binding value
  • offset
  • The byte offset inside the storage buffer where you should write the argument value. This will always be zero, unless you cluster plain-old-data kernel arguments. (See below.)
  • argKind
  • a string describing the kind of argument, one of:
    • buffer - OpenCL buffer
    • buffer_ubo - OpenCL constant buffer. Sent in a uniform buffer.
    • pod - Plain Old Data, e.g. a scalar, vector, or structure. Sent in a storage buffer.
    • pod_ubo - Plain Old Data, e.g. a scalar, vector, or structure. Sent in a uniform buffer.
    • pod_pushconstant - Plain Old Data, e.g. a scalar, vector or structure. Sent in push constants.
    • ro_image - Read-only image
    • wo_image - Write-only image
    • sampler - Sampler
  • argSize
  • only present for plain-old-data kernel arguments.

Module-wide specialization constants are specified as follows:

  • spec_constant to describe the use of a module-wide specialization constant
  • specialization constant type (e.g x dimension of WorkgroupSize)
  • spec_id
  • the SpecId value

Consider this example, which uses pointer-to-local arguments:

kernel void foo(local float* L, global float* A, local float4 *L2) {...}

It generates the following descriptor map:

kernel_decl,foo
kernel,foo,arg,L,argOrdinal,0,argKind,local,arrayElemSize,4,arrayNumElemSpecId,3
kernel,foo,arg,A,argOrdinal,1,descriptorSet,0,binding,0,offset,0,argKind,buffer
kernel,foo,arg,L2,argOrdinal,2,argKind,local,arrayElemSize,16,arrayNumElemSpecId,4
spec_constant,workgroup_size_x,spec_id,0
spec_constant,workgroup_size_y,spec_id,1
spec_constant,workgroup_size_z,spec_id,2

The kernels in the module module are declared as follows:

  • kernel_decl
  • kernel name

For kernel arguments of type pointer-to-local, the fields are:

  • kernel to indicate a kernel argument
  • kernel name
  • arg to indicate a kernel argument
  • argument name
  • argOrdinal to indicate a kernel argument ordinal position field
  • the argument's 0-based position in the kernel's parameter list
  • argKind
  • local to indicate a pointer-to-local argument
  • arrayElemSize
  • the number of bytes in each element of the array
  • arrayNumElemSpecId
  • the specialization constant ID used to specify the number of elements to allocate for the array in Workgroup storage. Specifically, it is the SpecId decoration on the integer constant that specficies the array size. (This number is always at least 3 so that specialization IDs 0, 1, and 2 can be use for the workgroup size dimensions along x, y, and z.)

Module-wide specialization constants are specified as follows:

  • spec_constant to describe the use of a module-wide specialization constant
  • specialization constant type (e.g x dimension of WorkgroupSize)
  • spec_id
  • the SpecId value

Notes: Each pointer-to-local argument is assigned its own array type and specialization constant to size the array. Unless you override the array size specialization constant at pipeline creation time, the array will only have one element.

Sending in plain-old-data kernel arguments in uniform buffers

Normally plain-old-data arguments are passed into the kernel via a storage buffer. Use option -pod-ubo to pass these parameters in via a uniform buffer. These can be faster to read in the shader.

When option -pod-ubo is used, the descriptor map list the argKind of a plain-old-data argument as pod_ubo rather than the default of pod.

Sending in plain-old-data kernel arguments in push constants

Normally plain-old-data arguments are passed into the kernel via a storage buffer. Use the option -pod-pushconstant to pass these parameters in via push constants. The option -cluster-pod-kernel-args=0 cannot be specified. Push constants are intended to provide a fast read path and should be faster to access than a buffer.

When the option -pod-pushconstant is used, the descriptor map lists the argKind of plain-old-data arguments as pod_pushconstant rather than the default of pod. There is no descriptorset or binding information for push constants.

Note: Vulkan implementations have limited push constant storage (default is 128B). clspv provides the option -max-pushconstant-size to specify (in bytes) the implementation limit for push constants. This is validated at the start of the compile.

Note: clspv also stores image metadata (to perform functions like get_image_channel_order) into pod_pushconstant. This is done late in the compilation flow, thus if those added bytes exceeds the limit for push constants, it will not be detected and will create an undefined behavior (most probably a crash in the runtime trying to push more than what is possible).

Note: Module scope push constanst are currently incompatible with plain-old-data arguments sent as push constants.

TODO(alan-baker): See #529 for the overall plan to address this.

The descriptor map entry for kernel arguments will not contain descriptorSet or binding entries. For example, an integer arg f to kernel foo might look like:

kernel,foo,arg,f,argOrdinal,1,offset,0,argKind,pod_pushconstant,argSize,4

Sending in pointer-to-constant kernel arguments in uniform buffers

Normally pointer-to-constant kernel arguments are passed into the kernel via a storage buffer. Use option -constant-args-ubo to pass these parameters in via a uniform buffer. Uniform buffers can be faster to read in the shader.

The compiler will generate an error if the layout of the buffer does not satisfy the Standard Uniform Buffer Layout rules of the Vulkan specification (see section 15.5.4).

Clustering plain-old-data kernel arguments to save descriptors

Descriptors can be scarce. So the compiler also has an option -cluster-pod-kernel-args which can be used to reduce the number of descriptors. When the option is used:

  • All plain-old-data (POD) kernel arguments are collected into a single struct and passed into the compute shader via a single storage buffer resource.
  • The binding numbers are assigned as previously, except:
    • Binding numbers for non-POD arguments are assigned as if there were no POD arguments.
    • The binding number for the struct containing the POD arguments is one more than the highest non-POD argument.

By default this option is enabled. To disable this behavior, pass -cluster-pod-kernel-args=0 to the compiler.

Example descriptor set mapping

For example:

// First kernel in the translation unit.
void kernel foo(global int* a, float f, global float* b, uint c);

In the default case, the bindings are:

  • a is mapped to a storage buffer with descriptor set 0, binding 0
  • b is mapped to a storage buffer with descriptor set 0, binding 1
  • f and c are POD arguments, so they are mapped to the first and second members of a struct, and that struct is mapped to a storage buffer with descriptor set 0 and binding 2

If -cluster-pod-kernel-args=0 is used:

  • a is mapped to a storage buffer with descriptor set 0, binding 0
  • f is mapped to a storage buffer with descriptor set 0, binding 1
  • b is mapped to a storage buffer with descriptor set 0, binding 2
  • c is mapped to a storage buffer with descriptor set 0, binding 3

That is, compiling as follows:

clspv foo.cl -descriptormap=myclusteredmap

will produce the following in myclusteredmap:

kernel,foo,arg,a,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer
kernel,foo,arg,b,argOrdinal,2,descriptorSet,0,binding,1,offset,0,argKind,buffer
kernel,foo,arg,f,argOrdinal,1,descriptorSet,0,binding,2,offset,0,argKind,pod,argSize,4
kernel,foo,arg,c,argOrdinal,3,descriptorSet,0,binding,2,offset,4,argKind,pod,argSize,4

If foo were the second kernel in the translation unit, then its arguments would also use descriptor set 0. If foo were the second kernel in the translation unit and option -distinct-kernel-descriptor-sets is used, then its arguments would use descriptor set 1.

TODO(dneto): Give an example using images.

Module-wide Specialization Constants

The descriptor map includes entries for specialization constants that are used module-wide. The following specialization constants are currently generated:

  • workgroup\_size\_x: The x-dimension of workgroup size.
  • workgroup\_size\_y: The y-dimension of workgroup size.
  • workgroup\_size\_z: The z-dimension of workgroup size.
  • work_dim: The work dimensions.
  • global\_offset\_x: The x-dimension of global offset.
  • global\_offset\_y: The y-dimension of global offset.
  • global\_offset\_z: The z-dimension of global offset.

Module scope constants

By default, each module-scope variable in __constant address space is mapped to a SPIR-V variable in Private address space, with an intializer. This works only for simple scenarios, where:

  • The variable is small, so it's reasonable to fit in a single invocations private registers, and
  • The variable is only read, and in particular its address is not taken.

In more general cases, use compiler option -module-constants-in-storage-buffer. In this case:

  • All module-scope constants are collected into a single SPIR-V storage buffer variable in its own descriptor set.
  • The intialization data are written to the descriptor map, and the host program must fill the buffer with that data before the kernel executes.

Consider this example kernel a.cl:

typedef struct {
  char c;
  uint a;
  float f;
} Foo;
__constant Foo ppp[3] = {{'a', 0x1234abcd, 1.0}, {'b', 0xffffffff, 1.5}, {0}};

kernel void foo(global uint* A, uint i) { *A = ppp[i].a; }

Compiling as follows:

clspv a.cl -descriptormap=map -module-constants-in-storage-buffer

Produces the following in file map:

constant,descriptorSet,1,binding,0,hexbytes,61000000cdab34120000803f62000000ffffffff0000c03f000000000000000000000000
kernel,foo,arg,A,argOrdinal,0,descriptorSet,0,binding,0,offset,0,argKind,buffer
kernel,foo,arg,i,argOrdinal,1,descriptorSet,0,binding,1,offset,0,argKind,pod

The initialization data are in the line starting with constant, and its fields are:

  • constant to indicate constant initialization data
  • descriptorSet
  • the DescriptorSet value
  • binding
  • the Binding value
  • kind
  • buffer to indicate the use of a storage buffer
  • hexbytes to indicate the next field is the data, as a sequence of bytes in hexadecimal
  • a sequence of bytes expressed in hexadecimal notation, presented in order from lowest address to highest address.

Take a closer look at the hexadecimal bytes in the example. They are:

  • 61: ASCII character 'a'
  • 000000: zero padding to satisfy alignment for the 32-bit integer value that follows
  • cdab3412: the integer value 0x1234abcd in little-endian format
  • 0000803f: the float value 1.0
  • 62: ASCII character 'b'
  • 000000: zero padding to satisfy alignment for the 32-bit integer value that follows
  • ffffffff: the integer value 0xffffffff
  • 0000c03f: the float value 1.5
  • 000000000000000000000000: 12 zero bytes representing the zero-initialized third Foo value.

Module Scope Push constants

Some features, when enabled, require values to be passed by the application via push constants. For each value requested by clspv, an entry will be present in the descriptor map. The format is:

  • pushconstant to indicate the entry is a push constant
  • name to indicate the name of the requested push constant follows
  • the name of the push constant requested
  • offset to indicate the offset at which the push constant is expected follows
  • the offset within push constants at which the entry is expected
  • size to indicate that the total size follows
  • the total size of the push constant that will be used.

Here is a list of the push constants currently supported:

  • global_offset: the 3D global offset used by get_global_offset() and in global ID calculations. A vector of 3 32-bit integer values. Lower dimensions come first in memory.
  • enqueued_local_size: the 3D local work size returned by get_enqueued_local_size(). A vector of 3 32-bit integer values. Lower dimensions come first in memory.
  • global_size: the 3D global size of the NDRange as returned by get_global_size(). A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.
  • num_workgroups: the 3D number of work groups in the NDRange as returned by get_num_groups(). A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.
  • region_offset: the sum of the global ID offset into the NDRange for this uniform region and the global offset of the NDRange. A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.
  • region_group_offset: the 3D group ID offset into the NDRange for this region. A vector of 3 32-bit integer values. Lower dimensions come first in memory. Only required when non-uniform NDRanges are supported.

Attributes

The following attributes are ignored in the OpenCL C source, and thus have no functional impact on the produced SPIR-V:

  • __attribute__((work_group_size_hint(X, Y, Z)))
  • __attribute__ ((endian(host)))
  • __attribute__ ((endian(device)))
  • __attribute__((vec_type_hint(<typen>)))

The __attribute__((reqd_work_group_size(X, Y, Z))) kernel attribute specifies the work-group size that must be used with that kernel.

Work-Group Size

The OpenCL C language allows the work-group size to be set just before executing the kernel on the device, at clEnqueueNDRangeKernel() time. Vulkan requires that the work-group size be specified no later than when the VkPipeline is created, which in OpenCL terms corresponds to when the cl_kernel is created.

To allow for the maximum flexibility to developers who are used to specifying the work-group size in the host API and not in the device-side kernel language, we can use specialization constants to allow for setting the work-group size at VkPipeline creation time.

If all kernels in the OpenCL C source use the reqd_work_group_size attribute, then that attribute will specify the work-group size that must be used and the required values will be set in the SPIR-V via the LocalSize execution mode.

If at least one of the kernels does not use the reqd_work_group_size attribute, the Vulkan SPIR-V produced by the compiler will contain specialization constants as follows:

  • The x dimension of the work-group size is stored in a specialization constant that is decorated with the SpecId, whose value defaults to 1.
  • The y dimension of the work-group size is stored in a specialization constant that is decorated with the SpecId, whose value defaults to 1.
  • The z dimension of the work-group size is stored in a specialization constant that is decorated with the SpecId, whose value defaults to 1.

In either case, metadata that can be used by the runtime or host code is recorded in the generated SPIR-V module, using either PropertyRequiredWorkgroupSize and/or SpecConstantWorkgroupSize non-semantic instructions.

Types

Signed Integer Types

Signed integer types are mapped down onto their unsigned equivalents in SPIR-V as produced from OpenCL C.

Packed Structs

Packed structs are not normally mapped to Vulkan shaders. If the -rewrite-packed-structs option is passed, packed structs will be transformed to another struct that holds one member which is an array of chars of the same size of the original packed struct, so that we preserve the struct size and layout on compilation. eg: <{ int, char }> -> <{ [5 x char] }>.

OpenCL C Built-In Functions

OpenCL C language built-in functions are mapped, where possible, onto their GLSL 4.5 built-in equivalents. For example, the OpenCL C language built-in function tan() is mapped onto GLSL's built-in function tan().

Common Functions

The OpenCL C built-in sign() function does not differentiate between a signed and unsigned 0.0 input value, nor does it return 0.0 if the input value is a NaN.

Integer Functions

The OpenCL C built-in mad24() and mul24() functions do not perform their operations using 24-bit integers. Instead, they use 32-bit integers, and thus have no performance-improving characteristics over normal 32-bit integer arithmetic.

Work-Item Functions

The OpenCL C work-item functions map to Vulkan SPIR-V as follows:

  • get_work_dim() is mapped to the work_dim spec constant when -work-dim is enabled (on by default), otherwise it always returns 3.
  • get_global_size() is implemented by multiplying the result from get_local_size() by the result from get_num_groups().
  • get_global_id() is mapped to a SPIR-V variable decorated with GlobalInvocationId. The global offset is added to that variable when -global-offset or -global-offset-push-constant is enabled.
  • get_local_size() is mapped to a SPIR-V variable decorated with WorkgroupSize.
  • get_local_id() is mapped to a SPIR-V variable decorated with LocalInvocationId.
  • get_num_groups() is mapped to a SPIR-V variable decorated with NumWorkgroups.
  • get_group_id() is mapped to a SPIR-V variable decorated with WorkgroupId.
  • get_global_offset() is mapped to the global_offset spec constant or push constant when -global-offset or -global-offset-push-constant is enabled, otherwise it always returns 0. Spec constants are used unless -global-offset-push-constant is specified or the language is set to OpenCL C++ or OpenCL 2.0.

OpenCL C Restrictions

Some OpenCL C language features that have no expressible equivalents in Vulkan's variant of SPIR-V are restricted.

Kernels

OpenCL C language kernels must not be called from other kernels.

Pointers of type half must not be used as kernel arguments.

Types

Boolean

Booleans are an abstract type - they have no known compile-time size. Using a boolean type as the argument to the sizeof() operator will result in an undefined value. The boolean type must not be used to form global, or constant variables, nor be used within a struct or union type in the global, or constant address spaces.

8-Bit Types

The char, char2, char3, uchar, uchar2, and uchar3 types can be used. To disable general support for these types, use -int8=0.

64-Bit Types

The double, double2, double3 and double4 types must not be used.

Events

The event_t type must not be used.

Pointers

Pointers are an abstract type - they have no known compile-time size. Using a pointer type as the argument to the sizeof() operator will result in an undefined value.

Pointer-to-integer casts must not be used.

Integer-to-pointer casts must not be used.

Pointers must not be compared for equality or inequality.

Recursive Struct Types

Recursively defined struct types must not be used.

Pointer-Sized Types

Since pointers have no known compile-time size, the pointer-sized types size_t, ptrdiff_t, uintptr_t, and intptr_t do not represent types that are the same size as a pointer. Instead, those types are mapped to 32-bit integer types.

Built-In Functions

For any OpenCL C language built-in functions that are mapped onto their GLSL 4.5 built-in equivalents, the precision requirements of the OpenCL C language built-ins are not necessarily honoured. In general, Clspv authors expect implementations will satisfy the relaxed precision requirements described in the OpenCL C specification. This means kernels will operate as if compiled with --cl-fast-relaxed-math. For higher performance (lower accuracy) variants of some builtin functions, clspv also provides the --cl-native-math option. This option goes beyond fast-relaxed math and provides no precision guarantees (similar to the native_ functions in OpenCL).

Atomic Functions

The atomic_xchg() built-in function that takes a floating-point argument must not be used.

OpenCL 2.0 Atomic Functions

The OpenCL 2.0 atomic functions are supported with the following exceptions:

  • atomic_flag functions are not supported
  • atomic initialization functions and macros are not supported
  • memory_order_seq_cst is weakened to acquire for loads, release for stores and acquire release for read-modify-write operations
  • memory_scope_all_svm_devices and memory_scope_all_devices are not supported
  • atomic_compare_exchange_weak* is implemented as atomic_compare_exchange_strong*
  • Due to Vulkan restrictions, only 32-bit integer types are currently supported

Conversions

All supported.

Math Functions

All supported.

Integer Functions

All supported.

Relational Functions

All supported.

Vector Data Load and Store Functions

All supported.

Note: When 16-bit storage support is not assumed, both vload_half and vstore_half assume the pointers are aligned to 4 bytes, not 2 bytes. See issue 6.

Miscellaneous Vector Functions

The shuffle(), shuffle2() and vec_step() built-in functions are supported.

Printf

The printf() built-in function is supported when the -enable-printf flag is passed to clspv.

The printf() built-in function is supported by adding a program-scope printf buffer, in the form of a storage buffer, to the module when one or more kernels in a program may use printf.

The printf buffer is notionally a buffer of 32-bit unsigned integers. The first integer value represents the offset from the second integer value to the next free location in the buffer. This value is atomically incremented by the printf implementation to allocate regions of the buffer.

The values written to the allocated region by the printf implementation, in the order they are written, are:

  • The unique ID associated with the given printf call site.
  • The value of zero or more arguments as they were passed subsequent to the printf format string. The stored values of the arguments follow the rules described below.

Printf argument values

  • Arguments that are not unsigned integers are bit-cast to integers and stored as their bit representation.
  • Values must have a size that is a multiple of 4 bytes. Arguments that do not meet this requirement after the usual promotion rules are zero-extended.
  • Arguments larger than 4 bytes are stored across multiple 4 byte values.
  • String literal arguments (that appear after the usual format string) are handled specially. They are treated like printf format strings and given a program-wide unique ID. This ID is stored instead of the actual value of the string literal.

Printf reflection data

The NonSemantic.ClspvReflection instructions are used to provide the necessary information for a runtime to interpret the values in the printf buffer. The two relevant instructions describe:

  • The size, descriptor set and binding of the program-scope printf buffer.
  • Each unique printf ID, the associated string (which is either a format string or string literal argument), and the storage size of each associated argument. For printf IDs representing string literal arguments, there are always zero associated arguments.

Image Read and Write Functions

All supported.

Reading 3D images with a unormalized sampler is not allowed in Vulkan. Thus OpenCL source code performing this operation is emulated by normalizing the coordinate before reading the image using a normalized sampler.

The normalization of the coordinate can introduce an accuracy issue as the floating point division used is not correctly rounded.

cl_khr_subgroups extension

The OpenCL extension cl_khr_subgroups requires SPIR-V 1.3 or greater and translates the built-in functions to GroupNonUniform operations and Builtin constants as follows:

  • get_sub_group_size() is mapped to BuiltInSubgroupSize constant. Requires CapabilityGroupNonUniform capability.
  • get_num_sub_groups() is mapped to BuiltInNumSubgroups constant. Requires CapabilityGroupNonUniform capability.
  • get_sub_group_id() is mapped to BuiltInSubgroupId constant. Requires CapabilityGroupNonUniform capability.
  • get_sub_group_local_id() is mapped to BuiltInSubgroupLocalInvocationId constant. Requires CapabilityGroupNonUniform capability.
  • sub_group_broadcast() is mapped to OpGroupNonUniformBroadcast operation. Requires CapabilityGroupNonUniformBallot capability. For SPIR-V version < 1.5 a constant laneId is required.
  • sub_group_all() is mapped to OpGroupNonUniformAll operation. Requires CapabilityGroupNonUniformVote capability.
  • sub_group_any() is mapped to OpGroupNonUniformAny operation. Requires CapabilityGroupNonUniformVote capability.
  • sub_group_<group_op>_add() is mapped to OpGroupNonUniformIAdd operation for Integer types and OpGroupNonUniformFAdd operation for Float types. Requires CapabilityGroupNonUniformArithmetic capability.
  • sub_group_<group_op>_min() is mapped to OpGroupNonUniformSMin operation for Signed-Integer types, OpGroupNonUniformUMin operation for Unsigned-Integer types and OpGroupNonUniformFMin operation for Float types. Requires CapabilityGroupNonUniformArithmetic capability.
  • sub_group_<group_op>_max() is mapped to OpGroupNonUniformSMax operation for Signed-Integer types, OpGroupNonUniformUMax operation for Unsigned-Integer types and OpGroupNonUniformFMax operation for Float types. Requires CapabilityGroupNonUniformArithmetic capability.
  • sub_group_barrier() is mapped to OpControlBarrier with an execution scope of Subgroup. If no memory scope is specified, Subgroup is used. The memory semantics depend on the flags on the barrier.
  • get_max_sub_group_size() is mapped to a specilization constant whose value must be set by the runtime or application.
  • get_enqueued_num_sub_groups() computes its return value from that of get_max_sub_group_size() and get_enqueued_local_size().

The group_op qualifier translates as follows:

  • reduce maps to GroupOperationReduce.
  • scan_exclusive maps to GroupOperationExclusiveScan.
  • scan_inclusive maps to GroupOperationInclusiveScan.

These extension built-in functions are not supported:

  • sub_group_reserve_read_pipe()
  • sub_group_reserve_write_pipe()
  • sub_group_commit_read_pipe()
  • sub_group_commit_write_pipe()
  • get_kernel_sub_group_count_for_ndrange()
  • get_kernel_max_sub_group_size_for_ndrange()

Numerical Compliance

Clspv is not able to reach full accuracy requirements on the supported builtin functions in all cases. Instead, currently, it is able to meet the requirements tested by OpenCL CTS under relaxed precision requirements. In order to achieve this goal, some functions are implemented using the GLSL.std.450 extended or core instructions, some are implemented in the builtin library and some are emulated by the compiler.

Note: Clspv has been tested against five Vulkan implementations from different vendors and is able to achieve the relaxed accuracy requirements broadly.

Implementations Using Core and Extended Instructions

add, subtract, divide, multiply, assignment, not, acos, asin, ceil, cos, cosh, exp, exp2, exp10, fabs, floor, fmax, fmin, log, log2, log10, mad, pow, powr, rint, rsqrt, sin, sinh, tan, trunc, half_cos, half_exp, half_exp2, half_exp10, half_log, half_log2, half_log10, half_powr, half_rsqrt, half_sin, half_tan, clamp, degrees, mix, radians, sign, smoothstep, step, cross, dot, normalize, fast_distance, fast_length, fast_normalize.

Implementations Using Builtin Library

acosh, asinh, atanh, cbrt, erfc, erf, fma, fmod, fract, frexp, hypot, ilogb, ldexp, lgamma, lgamma_r, logb, maxmag, modf, nan, nextafter, remainder, remquo, rootn, sqrt, tanh, tgamma, half_divide, half_recip, half_sqrt, distance, length, atan, atan2pi, atanpi, atan2.

Note: fma has a very high runtime cost unless compiling with -cl-native-math. Its accuracy requirements are not relaxed by -cl-fast-relaxed-math and the library implementation emulates it using integers.

Note: acosh, asinh, atanh, atan, atan2, atanpi and atan2pi, fma, fmod, fract, frexp, ldexp, rsqrt, half_sqrt, sqrt, tanh, distance, and length are all implemented using core or extended instructions when compiling with -cl-native-math.

Implementations Using Emulation

acospi, asinpi, copysign, cospi, expm1, fdim, log1p, pown, round, sincos, sinpi, tanpi.

Known Conformance Issues

  • frexp fails using Swiftshader as an implementation due to an internal error.
  • ldexp and rsqrt fail to meet accuracy requirements on some Adreno GPUs.