8. Shaders

A shader specifies programmable operations that execute for each vertex, control point, tessellated vertex, primitive, fragment, or workgroup in the corresponding stage(s) of the graphics and compute pipelines.

Graphics pipelines include vertex shader execution as a result of primitive assembly, followed, if enabled, by tessellation control and evaluation shaders operating on patches, geometry shaders, if enabled, operating on primitives, and fragment shaders, if present, operating on fragments generated by rasterization. In this specification, vertex, tessellation control, tessellation evaluation, and geometry shaders are collectively referred to as vertex processing stages and occur in the logical pipeline before rasterization. The fragment shader occurs logically after rasterization.

Only the compute shader stage is included in a compute pipeline. Compute shaders operate on compute invocations in a workgroup.

Shaders can read from input variables, and read from and write to output variables. Input and output variables can be used to transfer data between shader stages, or to allow the shader to interact with values that exist in the execution environment. Similarly, the execution environment provides constants that describe capabilities.

Shader variables are associated with execution environment-provided inputs and outputs using built-in decorations in the shader. The available decorations for each stage are documented in the following subsections.

8.1. Shader Modules

Shader modules contain shader code and one or more entry points. Shaders are selected from a shader module by specifying an entry point as part of pipeline creation. The stages of a pipeline can use shaders that come from different modules. The shader code defining a shader module must be in the SPIR-V format, as described by the Vulkan Environment for SPIR-V appendix.

Shader modules are represented by VkShaderModule handles:

// Provided by VK_VERSION_1_0
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkShaderModule)

To create a shader module, call:

// Provided by VK_VERSION_1_0
VkResult vkCreateShaderModule(
    VkDevice                                    device,
    const VkShaderModuleCreateInfo*             pCreateInfo,
    const VkAllocationCallbacks*                pAllocator,
    VkShaderModule*                             pShaderModule);
  • device is the logical device that creates the shader module.

  • pCreateInfo is a pointer to a VkShaderModuleCreateInfo structure.

  • pAllocator controls host memory allocation as described in the Memory Allocation chapter.

  • pShaderModule is a pointer to a VkShaderModule handle in which the resulting shader module object is returned.

Once a shader module has been created, any entry points it contains can be used in pipeline shader stages as described in Compute Pipelines and Graphics Pipelines.

If the shader stage fails to compile, VK_ERROR_INVALID_SHADER_NV will be generated and the compile log will be reported back to the application by VK_EXT_debug_report, if enabled.

Valid Usage (Implicit)
Return Codes
Success
  • VK_SUCCESS

Failure
  • VK_ERROR_OUT_OF_HOST_MEMORY

  • VK_ERROR_OUT_OF_DEVICE_MEMORY

  • VK_ERROR_INVALID_SHADER_NV

The VkShaderModuleCreateInfo structure is defined as:

// Provided by VK_VERSION_1_0
typedef struct VkShaderModuleCreateInfo {
    VkStructureType              sType;
    const void*                  pNext;
    VkShaderModuleCreateFlags    flags;
    size_t                       codeSize;
    const uint32_t*              pCode;
} VkShaderModuleCreateInfo;
  • sType is the type of this structure.

  • pNext is NULL or a pointer to a structure extending this structure.

  • flags is reserved for future use.

  • codeSize is the size, in bytes, of the code pointed to by pCode.

  • pCode is a pointer to code that is used to create the shader module. The type and format of the code is determined from the content of the memory addressed by pCode.

Valid Usage
  • codeSize must be greater than 0

  • If pCode is a pointer to SPIR-V code, codeSize must be a multiple of 4

  • pCode must point to either valid SPIR-V code, formatted and packed as described by the Khronos SPIR-V Specification or valid GLSL code which must be written to the GL_KHR_vulkan_glsl extension specification

  • If pCode is a pointer to SPIR-V code, that code must adhere to the validation rules described by the Validation Rules within a Module section of the SPIR-V Environment appendix

  • If pCode is a pointer to GLSL code, it must be valid GLSL code written to the GL_KHR_vulkan_glsl GLSL extension specification

  • pCode must declare the Shader capability for SPIR-V code

  • pCode must not declare any capability that is not supported by the API, as described by the Capabilities section of the SPIR-V Environment appendix

  • If pCode declares any of the capabilities listed in the SPIR-V Environment appendix, one of the corresponding requirements must be satisfied

Valid Usage (Implicit)
  • sType must be VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO

  • pNext must be NULL or a pointer to a valid instance of VkShaderModuleValidationCacheCreateInfoEXT

  • The sType value of each struct in the pNext chain must be unique

  • flags must be 0

  • pCode must be a valid pointer to an array of uint32_t values
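
The size and pointer rules above for SPIR-V input can be checked on the host before calling vkCreateShaderModule. The following is a minimal sketch, not part of the Vulkan API: the helper name spirv_blob_looks_plausible is hypothetical, and the function only tests codeSize, the five-word SPIR-V header length, and the SPIR-V magic number. It does not validate the instruction stream or declared capabilities, and it does not cover GLSL input.

```c
#include <stddef.h>
#include <stdint.h>

/* First word of every SPIR-V module. */
#define SPIRV_MAGIC 0x07230203u
/* A SPIR-V module begins with a five-word header. */
#define SPIRV_HEADER_WORDS 5u

/* Hypothetical helper (an assumption for illustration, not a Vulkan call).
 * Returns 1 if the blob passes cheap host-side checks, 0 otherwise. */
int spirv_blob_looks_plausible(const uint32_t *pCode, size_t codeSize)
{
    if (pCode == NULL) return 0;                       /* pCode must be a valid pointer */
    if (codeSize == 0) return 0;                       /* codeSize must be greater than 0 */
    if (codeSize % 4 != 0) return 0;                   /* codeSize must be a multiple of 4 */
    if (codeSize / 4 < SPIRV_HEADER_WORDS) return 0;   /* too short to hold a header */
    if (pCode[0] != SPIRV_MAGIC) return 0;             /* not SPIR-V in host byte order */
    return 1;
}
```

A blob that passes these checks can still be rejected by the implementation or by validation layers; the sketch only catches the most common packaging mistakes, such as passing a word count instead of a byte count.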

// Provided by VK_VERSION_1_0
typedef VkFlags VkShaderModuleCreateFlags;

VkShaderModuleCreateFlags is a bitmask type for setting a mask, but is currently reserved for future use.

To use a VkValidationCacheEXT to cache shader validation results, add a VkShaderModuleValidationCacheCreateInfoEXT structure to the pNext chain of the VkShaderModuleCreateInfo structure, specifying the cache object to use.

The VkShaderModuleValidationCacheCreateInfoEXT struct is defined as:

// Provided by VK_EXT_validation_cache
typedef struct VkShaderModuleValidationCacheCreateInfoEXT {
    VkStructureType         sType;
    const void*             pNext;
    VkValidationCacheEXT    validationCache;
} VkShaderModuleValidationCacheCreateInfoEXT;
  • sType is the type of this structure.

  • pNext is NULL or a pointer to a structure extending this structure.

  • validationCache is the validation cache object from which the results of prior validation attempts will be read, and to which new validation results for this VkShaderModule will be written (if not already present).

Valid Usage (Implicit)
  • sType must be VK_STRUCTURE_TYPE_SHADER_MODULE_VALIDATION_CACHE_CREATE_INFO_EXT

  • validationCache must be a valid VkValidationCacheEXT handle

To destroy a shader module, call:

// Provided by VK_VERSION_1_0
void vkDestroyShaderModule(
    VkDevice                                    device,
    VkShaderModule                              shaderModule,
    const VkAllocationCallbacks*                pAllocator);
  • device is the logical device that destroys the shader module.

  • shaderModule is the handle of the shader module to destroy.

  • pAllocator controls host memory allocation as described in the Memory Allocation chapter.

A shader module can be destroyed while pipelines created using its shaders are still in use.

Valid Usage
  • If VkAllocationCallbacks were provided when shaderModule was created, a compatible set of callbacks must be provided here

  • If no VkAllocationCallbacks were provided when shaderModule was created, pAllocator must be NULL

Valid Usage (Implicit)
  • device must be a valid VkDevice handle

  • If shaderModule is not VK_NULL_HANDLE, shaderModule must be a valid VkShaderModule handle

  • If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure

  • If shaderModule is a valid handle, it must have been created, allocated, or retrieved from device

Host Synchronization
  • Host access to shaderModule must be externally synchronized

8.2. Shader Execution

At each stage of the pipeline, multiple invocations of a shader may execute simultaneously. Further, invocations of a single shader produced as the result of different commands may execute simultaneously. The relative execution order of invocations of the same shader type is undefined. Shader invocations may complete in a different order than that in which the primitives they originated from were drawn or dispatched by the application. However, fragment shader outputs are written to attachments in rasterization order.

The relative execution order of invocations of different shader types is largely undefined. However, when invoking a shader whose inputs are generated from a previous pipeline stage, the shader invocations from the previous stage are guaranteed to have executed far enough to generate input values for all required inputs.

8.3. Shader Memory Access Ordering

The order in which image or buffer memory is read or written by shaders is largely undefined. For some shader types (vertex, tessellation evaluation, and in some cases, fragment), even the number of shader invocations that may perform loads and stores is undefined.

In particular, the following rules apply:

  • Vertex and tessellation evaluation shaders will be invoked at least once for each unique vertex, as defined in those sections.

  • Fragment shaders will be invoked zero or more times, as defined in that section.

  • The relative execution order of invocations of the same shader type is undefined. A store issued by a shader when working on primitive B might complete prior to a store for primitive A, even if primitive A is specified prior to primitive B. This applies even to fragment shaders; while fragment shader outputs are always written to the framebuffer in rasterization order, stores executed by fragment shader invocations are not.

  • The relative execution order of invocations of different shader types is largely undefined.

Note

The above limitations on shader invocation order make some forms of synchronization between shader invocations within a single set of primitives unimplementable. For example, having one invocation poll memory written by another invocation assumes that the other invocation has been launched and will complete its writes in finite time.

The Memory Model appendix defines the terminology and rules for how to correctly communicate between shader invocations, such as when a write is Visible-To a read, and what constitutes a Data Race.

Applications must not cause a data race.

8.4. Shader Inputs and Outputs

Data is passed into and out of shaders using variables with input or output storage class, respectively. User-defined inputs and outputs are connected between stages by matching their Location decorations. Additionally, data can be provided by or communicated to special functions provided by the execution environment using BuiltIn decorations.

In many cases, the same BuiltIn decoration can be used in multiple shader stages with similar meaning. The specific behavior of variables decorated as BuiltIn is documented in the following sections.

8.5. Task Shaders

Task shaders operate in conjunction with the mesh shaders to produce a collection of primitives that will be processed by subsequent stages of the graphics pipeline. Their primary purpose is to create a variable number of subsequent mesh shader invocations.

Task shaders are invoked via the execution of the programmable mesh shading pipeline.

The task shader has no fixed-function inputs other than variables identifying the specific workgroup and invocation. The only fixed output of the task shader is a task count, identifying the number of mesh shader workgroups to create. The task shader can write additional outputs to task memory, which can be read by all of the mesh shader workgroups it created.

8.5.1. Task Shader Execution

Task workloads are formed from groups of work items called workgroups and processed by the task shader in the current graphics pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Task shaders execute in global workgroups which are divided into a number of local workgroups with a size that can be set by assigning a value to the LocalSize execution mode or via an object decorated by the WorkgroupSize decoration. An invocation within a local workgroup can share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup.

8.6. Mesh Shaders

Mesh shaders operate in workgroups to produce a collection of primitives that will be processed by subsequent stages of the graphics pipeline. Each workgroup emits zero or more output primitives and the group of vertices and their associated data required for each output primitive.

Mesh shaders are invoked via the execution of the programmable mesh shading pipeline.

The only inputs available to the mesh shader are variables identifying the specific workgroup and invocation and, if applicable, any outputs written to task memory by the task shader that spawned the mesh shader’s workgroup. The mesh shader can operate without a task shader as well.

The invocations of the mesh shader workgroup write an output mesh, comprising a set of primitives with per-primitive attributes, a set of vertices with per-vertex attributes, and an array of indices identifying the mesh vertices that belong to each primitive. The primitives of this mesh are then processed by subsequent graphics pipeline stages, where the outputs of the mesh shader form an interface with the fragment shader.

8.6.1. Mesh Shader Execution

Mesh workloads are formed from groups of work items called workgroups and processed by the mesh shader in the current graphics pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Mesh shaders execute in global workgroups which are divided into a number of local workgroups with a size that can be set by assigning a value to the LocalSize execution mode or via an object decorated by the WorkgroupSize decoration. An invocation within a local workgroup can share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup.

The global workgroups may be generated explicitly via the API, or implicitly through the task shader’s work creation mechanism.

8.7. Vertex Shaders

Each vertex shader invocation operates on one vertex and its associated vertex attribute data, and outputs one vertex and associated data. Graphics pipelines using primitive shading must include a vertex shader, and the vertex shader stage is always the first shader stage in the graphics pipeline.

8.7.1. Vertex Shader Execution

A vertex shader must be executed at least once for each vertex specified by a draw command. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view. During execution, the shader is presented with the index of the vertex and instance for which it has been invoked. Input variables declared in the vertex shader are filled by the implementation with the values of vertex attributes associated with the invocation being executed.

If the same vertex is specified multiple times in a draw command (e.g. by including the same index value multiple times in an index buffer) the implementation may reuse the results of vertex shading if it can statically determine that the vertex shader invocations will produce identical results.

Note

It is implementation-dependent when and if results of vertex shading are reused, and thus how many times the vertex shader will be executed. This is true also if the vertex shader contains stores or atomic operations (see vertexPipelineStoresAndAtomics).
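
As an illustration of the reuse rule, the sketch below counts the distinct index values in an index buffer. Under the assumptions of a single instance and a single view, that count is the number of vertex shader invocations an implementation would need if it reused every repeated vertex, while the total index count is what it would run with no reuse at all. The helper name count_unique_indices is hypothetical, and an actual implementation may land anywhere at or above the unique count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts distinct values in an index buffer.
 * Quadratic on purpose, to keep the sketch free of extra data structures. */
size_t count_unique_indices(const uint32_t *indices, size_t count)
{
    size_t unique = 0;
    for (size_t i = 0; i < count; ++i) {
        int seenBefore = 0;
        for (size_t j = 0; j < i; ++j) {
            if (indices[j] == indices[i]) {
                seenBefore = 1;
                break;
            }
        }
        if (!seenBefore) {
            ++unique;
        }
    }
    return unique;
}
```

For two triangles sharing an edge, indexed as 0, 1, 2, 2, 1, 3, the helper reports four distinct vertices out of six indices. Because the actual invocation count is implementation-dependent, a vertex shader that performs stores or atomics should not rely on either number.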

8.8. Tessellation Control Shaders

The tessellation control shader is used to read an input patch provided by the application and to produce an output patch. Each tessellation control shader invocation operates on an input patch (after all control points in the patch are processed by a vertex shader) and its associated data, and outputs a single control point of the output patch and its associated data, and can also output additional per-patch data. The input patch is sized according to the patchControlPoints member of VkPipelineTessellationStateCreateInfo, as part of input assembly. The size of the output patch is controlled by the OpExecutionMode OutputVertices specified in the tessellation control or tessellation evaluation shaders, which must be specified in at least one of the shaders. The size of the input and output patches must each be greater than zero and less than or equal to VkPhysicalDeviceLimits::maxTessellationPatchSize.
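
The patch size limit can be expressed as a small host-side check. This is a sketch only: patch_sizes_are_valid is a hypothetical helper, and the caller is assumed to have read maxTessellationPatchSize from VkPhysicalDeviceLimits beforehand.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if both patch sizes are greater than zero
 * and no larger than the device limit, 0 otherwise. */
int patch_sizes_are_valid(uint32_t inputControlPoints,
                          uint32_t outputVertices,
                          uint32_t maxTessellationPatchSize)
{
    if (inputControlPoints == 0 || outputVertices == 0) {
        return 0;
    }
    return inputControlPoints <= maxTessellationPatchSize &&
           outputVertices <= maxTessellationPatchSize;
}
```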

8.8.1. Tessellation Control Shader Execution

A tessellation control shader is invoked at least once for each output vertex in a patch. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view.

Inputs to the tessellation control shader are generated by the vertex shader. Each invocation of the tessellation control shader can read the attributes of any incoming vertices and their associated data. The invocations corresponding to a given patch execute logically in parallel, with undefined relative execution order. However, the OpControlBarrier instruction can be used to provide limited control of the execution order by synchronizing invocations within a patch, effectively dividing tessellation control shader execution into a set of phases. Tessellation control shaders will read undefined values if one invocation reads a per-vertex or per-patch attribute written by another invocation at any point during the same phase, or if two invocations attempt to write different values to the same per-patch output in a single phase.

8.9. Tessellation Evaluation Shaders

The tessellation evaluation shader operates on an input patch of control points and their associated data, and a single input barycentric coordinate indicating the invocation’s relative position within the subdivided patch, and outputs a single vertex and its associated data.

8.9.1. Tessellation Evaluation Shader Execution

A tessellation evaluation shader is invoked at least once for each unique vertex generated by the tessellator. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view.

8.10. Geometry Shaders

The geometry shader operates on a group of vertices and their associated data assembled from a single input primitive, and emits zero or more output primitives and the group of vertices and their associated data required for each output primitive.

8.10.1. Geometry Shader Execution

A geometry shader is invoked at least once for each primitive produced by the tessellation stages, or at least once for each primitive generated by primitive assembly when tessellation is not in use. A shader can request that the geometry shader runs multiple instances. A geometry shader is invoked at least once for each instance. If the subpass includes multiple views in its view mask, the shader may be invoked separately for each view.

8.11. Fragment Shaders

Fragment shaders are invoked as the result of rasterization in a graphics pipeline. Each fragment shader invocation operates on a single fragment and its associated data. With few exceptions, fragment shaders do not have access to any data associated with other fragments and are considered to execute in isolation of fragment shader invocations associated with other fragments.

8.11.1. Fragment Shader Execution

Fragment shaders are invoked for each fragment generated by rasterization, or as helper invocations.

For fragment shaders invoked by fragments, the following rules apply:

  • A fragment shader must not be executed if a fragment operation that executes before fragment shading discards the fragment.

  • A fragment shader may not be executed if:

    • An implementation determines that another fragment shader, invoked by a subsequent primitive in primitive order, overwrites all results computed by the shader (including writes to storage resources).

    • Any other fragment operation discards the fragment, and the shader does not write to any storage resources.

  • Otherwise, at least one fragment shader must be executed.

    • If sample shading is enabled and multiple invocations per fragment are required, additional invocations must be executed as specified.

    • If a shading rate image is used and multiple invocations per fragment are required, additional invocations must be executed as specified.

    • Each covered sample must be included in at least one fragment shader invocation.
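
The rules above can be read as a three-way classification of each fragment. The sketch below encodes that reading; the type and field names are hypothetical, and the flags stand in for decisions that an implementation makes internally. It does not model the additional invocations required by sample shading or shading rate images.

```c
#include <stdbool.h>

typedef enum FragmentShaderDecision {
    FS_MUST_NOT_EXECUTE,  /* the shader must not run for this fragment */
    FS_MAY_BE_SKIPPED,    /* the implementation may skip the shader */
    FS_MUST_EXECUTE       /* at least one invocation must run */
} FragmentShaderDecision;

/* Hypothetical per-fragment state, named for illustration only. */
typedef struct FragmentState {
    bool discardedBeforeShading;     /* an operation before shading discarded it */
    bool fullyOverwrittenLater;      /* a later primitive overwrites all results */
    bool discardedByOtherOperation;  /* any other fragment operation discards it */
    bool writesStorageResources;     /* the shader writes to storage resources */
} FragmentState;

FragmentShaderDecision classify_fragment(const FragmentState *s)
{
    if (s->discardedBeforeShading) {
        return FS_MUST_NOT_EXECUTE;
    }
    if (s->fullyOverwrittenLater) {
        return FS_MAY_BE_SKIPPED;
    }
    if (s->discardedByOtherOperation && !s->writesStorageResources) {
        return FS_MAY_BE_SKIPPED;
    }
    return FS_MUST_EXECUTE;
}
```

Note that a fragment discarded by a later operation still requires execution when the shader writes to storage resources, since those writes are observable.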

Note

Multiple fragment shader invocations may be executed for the same fragment for any number of implementation-dependent reasons. When there is more than one fragment shader invocation per fragment, the association of samples to invocations is implementation-dependent. Stores and atomics performed by these additional invocations have the normal effect.

For example, if the subpass includes multiple views in its view mask, a fragment shader may be invoked separately for each view.

Similarly, if the render pass has a fragment density map attachment, more than one fragment shader invocation may be invoked for each covered sample. Such additional invocations are only produced if VkPhysicalDeviceFragmentDensityMapPropertiesEXT::fragmentDensityInvocations is VK_TRUE. Implementations may generate these additional fragment shader invocations in order to make transitions between fragment areas with different fragment densities more smooth.

Note

Relative ordering of execution of different fragment shader invocations is explicitly not defined.

8.11.2. Early Fragment Tests

An explicit control is provided to allow fragment shaders to enable early fragment tests. If the fragment shader specifies the EarlyFragmentTests OpExecutionMode, additional per-fragment tests are performed prior to fragment shader execution.

If the fragment shader additionally specifies the PostDepthCoverage OpExecutionMode, the value of a variable decorated with the SampleMask built-in reflects the coverage after the early fragment tests. Otherwise, it reflects the coverage before the early fragment tests.

If early fragment tests are enabled, any depth value computed by the fragment shader has no effect.
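
The effect of PostDepthCoverage on the SampleMask input can be sketched as a mask operation. This is an illustrative model under stated assumptions, not implementation behavior: sample_mask_input is a hypothetical helper, coverage is limited to a single 32-bit word, and earlyTestPassMask is assumed to hold one bit per sample that survived the early fragment tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns the coverage a fragment shader would observe
 * through a variable decorated with the SampleMask built-in. */
uint32_t sample_mask_input(uint32_t rasterCoverage,
                           uint32_t earlyTestPassMask,
                           bool postDepthCoverage)
{
    if (postDepthCoverage) {
        /* Coverage after the early fragment tests. */
        return rasterCoverage & earlyTestPassMask;
    }
    /* Coverage before the early fragment tests. */
    return rasterCoverage;
}
```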

8.11.3. Fragment Shader Interlock

In normal operation, it is possible for more than one fragment shader invocation to be executed simultaneously for the same pixel if there are overlapping primitives. If the fragmentShaderSampleInterlock, fragmentShaderPixelInterlock, or fragmentShaderShadingRateInterlock features are enabled, it is possible to define a critical section within the fragment shader that is guaranteed to not run simultaneously with another fragment shader invocation for the same sample(s) or pixel(s). It is also possible to control the relative ordering of execution of these critical sections across different fragment shader invocations.

If the FragmentShaderSampleInterlockEXT, FragmentShaderPixelInterlockEXT, or FragmentShaderShadingRateInterlockEXT capabilities are declared in the fragment shader, the OpBeginInvocationInterlockEXT and OpEndInvocationInterlockEXT instructions must be used to delimit a critical section of fragment shader code.

To ensure each invocation of the critical section is executed in primitive order, declare one of the PixelInterlockOrderedEXT, SampleInterlockOrderedEXT, or ShadingRateInterlockOrderedEXT execution modes. If the order of execution of each invocation of the critical section does not matter, declare one of the PixelInterlockUnorderedEXT, SampleInterlockUnorderedEXT, or ShadingRateInterlockUnorderedEXT execution modes.

The PixelInterlockOrderedEXT and PixelInterlockUnorderedEXT execution modes provide mutual exclusion in the critical section for any pair of fragments corresponding to the same pixel, or pixels if the fragment covers more than one pixel. With sample shading enabled, these execution modes are treated like SampleInterlockOrderedEXT or SampleInterlockUnorderedEXT respectively.

The SampleInterlockOrderedEXT and SampleInterlockUnorderedEXT execution modes only provide mutual exclusion for pairs of fragments that both cover at least one common sample in the same pixel; these are recommended for performance if shaders use per-sample data structures. If these execution modes are used in single-sample mode they are treated like PixelInterlockOrderedEXT or PixelInterlockUnorderedEXT respectively.

The ShadingRateInterlockOrderedEXT and ShadingRateInterlockUnorderedEXT execution modes provide mutual exclusion for pairs of fragments that both have at least one common sample in the same pixel, even if none of the common samples are covered by both fragments. With sample shading enabled, these execution modes are treated like SampleInterlockOrderedEXT or SampleInterlockUnorderedEXT respectively.

8.12. Compute Shaders

Compute shaders are invoked via vkCmdDispatch and vkCmdDispatchIndirect commands. In general, they have access to similar resources as shader stages executing as part of a graphics pipeline.

Compute workloads are formed from groups of work items called workgroups and processed by the compute shader in the current compute pipeline. A workgroup is a collection of shader invocations that execute the same shader, potentially in parallel. Compute shaders execute in global workgroups which are divided into a number of local workgroups with a size that can be set by assigning a value to the LocalSize execution mode or via an object decorated by the WorkgroupSize decoration. An invocation within a local workgroup can share data with other members of the local workgroup through shared variables and issue memory and control flow barriers to synchronize with other members of the local workgroup.
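
The relationship between work items, local workgroup size, and the group count passed to vkCmdDispatch can be sketched in one dimension as follows. Both helper names are hypothetical; the second mirrors how a global invocation index is formed from a workgroup index, the local size, and a local invocation index.

```c
#include <stdint.h>

/* Hypothetical helper: number of local workgroups needed to cover itemCount
 * work items along one dimension. Returns 0 if localSize is 0. */
uint32_t workgroups_needed(uint32_t itemCount, uint32_t localSize)
{
    if (localSize == 0) {
        return 0;
    }
    /* Round up without risking overflow in itemCount + localSize. */
    return itemCount / localSize + (itemCount % localSize != 0 ? 1u : 0u);
}

/* Hypothetical helper: one-dimensional global index of an invocation. */
uint32_t global_invocation_index(uint32_t workgroupIndex,
                                 uint32_t localSize,
                                 uint32_t localInvocationIndex)
{
    return workgroupIndex * localSize + localInvocationIndex;
}
```

When the item count is not a multiple of the local size, the last workgroup contains invocations with no corresponding work item, so a shader dispatched this way would typically compare its global index against the item count before doing any work.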

8.13. Interpolation Decorations

Interpolation decorations control the behavior of attribute interpolation in the fragment shader stage. Interpolation decorations can be applied to Input storage class variables in the fragment shader stage’s interface, and control the interpolation behavior of those variables.

Inputs that could be interpolated can be decorated by at most one of the following decorations:

  • Flat: no interpolation

  • NoPerspective: linear interpolation (for lines and polygons)

Fragment input variables decorated with neither Flat nor NoPerspective use perspective-correct interpolation (for lines and polygons).

The presence of and type of interpolation is controlled by the above interpolation decorations as well as the auxiliary decorations Centroid and Sample.

A variable decorated with Flat will not be interpolated. Instead, it will have the same value for every fragment within a triangle. This value will come from a single provoking vertex. A variable decorated with Flat can also be decorated with Centroid or Sample, which will mean the same thing as decorating it only as Flat.
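
The three interpolation behaviors can be compared on a line segment with two endpoints. The sketch below is illustrative only: the Endpoint type and function names are hypothetical, t is assumed to be the fraction of the distance along the segment in screen space, and w is assumed to be the clip-space w coordinate of each endpoint. The perspective-correct form is the standard one, dividing each attribute by w before blending and renormalizing afterward.

```c
/* Hypothetical endpoint: an attribute value and the clip-space w
 * of the vertex that carries it. */
typedef struct Endpoint {
    float value;
    float w;
} Endpoint;

/* Flat: no interpolation, the provoking vertex supplies the value. */
float interpolate_flat(Endpoint provoking)
{
    return provoking.value;
}

/* NoPerspective: linear interpolation in screen space. */
float interpolate_noperspective(Endpoint a, Endpoint b, float t)
{
    return (1.0f - t) * a.value + t * b.value;
}

/* Default (neither Flat nor NoPerspective): perspective-correct. */
float interpolate_perspective(Endpoint a, Endpoint b, float t)
{
    float numerator = (1.0f - t) * a.value / a.w + t * b.value / b.w;
    float denominator = (1.0f - t) / a.w + t / b.w;
    return numerator / denominator;
}
```

When both endpoints have the same w, the two interpolating forms agree. They diverge as the endpoints move to different depths, with the perspective-correct result weighted toward the endpoint with the smaller w.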

For fragment shader input variables decorated with neither Centroid nor Sample, the assigned variable may be interpolated anywhere within the fragment and a single value may be assigned to each sample within the fragment.

If a fragment shader input is decorated with Centroid, a single value may be assigned to that variable for all samples in the fragment, but that value must be interpolated to a location that lies in both the fragment and in the primitive being rendered, including any of the fragment’s samples covered by the primitive. Because the location at which the variable is interpolated may be different in neighboring fragments, and derivatives may be computed by computing differences between neighboring fragments, derivatives of centroid-sampled inputs may be less accurate than those for non-centroid interpolated variables. The PostDepthCoverage execution mode does not affect the determination of the centroid location.

If a fragment shader input is decorated with Sample, a separate value must be assigned to that variable for each covered sample in the fragment, and that value must be sampled at the location of the individual sample. When rasterizationSamples is VK_SAMPLE_COUNT_1_BIT, the fragment center must be used for Centroid, Sample, and undecorated attribute interpolation.

Fragment shader inputs that are signed or unsigned integers, integer vectors, or any double-precision floating-point type must be decorated with Flat.

When the VK_AMD_shader_explicit_vertex_parameter device extension is enabled, inputs can also be decorated with the CustomInterpAMD interpolation decoration, including fragment shader inputs that are signed or unsigned integers, integer vectors, or any double-precision floating-point type. Inputs decorated with CustomInterpAMD can only be accessed by the extended instruction InterpolateAtVertexAMD, which allows accessing the value of the input for individual vertices of the primitive.

When the fragmentShaderBarycentric feature is enabled, inputs can also be decorated with the PerVertexNV interpolation decoration, including fragment shader inputs that are signed or unsigned integers, integer vectors, or any double-precision floating-point type. Inputs decorated with PerVertexNV can only be accessed using an extra array dimension, where the extra index identifies one of the vertices of the primitive that produced the fragment.

8.14. Ray Generation Shaders

A ray generation shader is similar to a compute shader. Its main purpose is to execute ray tracing queries using OpTraceRayKHR instructions and process the results.

8.14.1. Ray Generation Shader Execution

One ray generation shader is executed per ray tracing dispatch. Its location in the shader binding table (see Shader Binding Table for details) is passed directly into vkCmdTraceRaysKHR using the raygenShaderBindingTableBuffer and raygenShaderBindingOffset parameters.

8.15. Intersection Shaders

Intersection shaders enable the implementation of arbitrary, application-defined geometric primitives. An intersection shader for a primitive is executed whenever its axis-aligned bounding box is hit by a ray.

Like other ray tracing shader domains, an intersection shader operates on a single ray at a time. It also operates on a single primitive at a time. It is therefore the purpose of an intersection shader to compute the ray-primitive intersections and report them. To report an intersection, the shader calls the OpReportIntersectionKHR instruction.

An intersection shader communicates with any-hit and closest hit shaders by generating attribute values that they can read. Intersection shaders cannot read or modify the ray payload.

8.15.1. Intersection Shader Execution

The order in which intersections are found along a ray, and therefore the order in which intersection shaders are executed, is unspecified.

The intersection shader of the closest AABB which intersects the ray is guaranteed to be executed at some point during traversal, unless the ray is forcibly terminated.

8.16. Any-Hit Shaders

The any-hit shader is executed after the intersection shader reports an intersection that lies within the current [tmin,tmax] of the ray. The main use of any-hit shaders is to programmatically decide whether or not an intersection will be accepted. The intersection will be accepted unless the shader calls the OpIgnoreIntersectionKHR instruction. Any-hit shaders have read-only access to the attributes generated by the corresponding intersection shader, and can read or modify the ray payload.

8.16.1. Any-Hit Shader Execution

The order in which intersections are found along a ray, and therefore the order in which any-hit shaders are executed, is unspecified.

The any-hit shader of the closest hit is guaranteed to be executed at some point during traversal, unless the ray is forcibly terminated.

8.17. Closest Hit Shaders

Closest hit shaders have read-only access to the attributes generated by the corresponding intersection shader, and can read or modify the ray payload. They also have access to a number of system-generated values. Closest hit shaders can call OpTraceRayKHR to recursively trace rays.

8.17.1. Closest Hit Shader Execution

Exactly one closest hit shader is executed when traversal is finished and an intersection has been found and accepted.

8.18. Miss Shaders

Miss shaders can access the ray payload and can trace new rays through the OpTraceRayKHR instruction, but cannot access attributes since they are not associated with an intersection.

8.18.1. Miss Shader Execution

A miss shader is executed instead of a closest hit shader if no intersection was found during traversal.

8.19. Callable Shaders

Callable shaders can access a callable payload that works similarly to ray payloads to do subroutine work.

8.19.1. Callable Shader Execution

A callable shader is executed by calling OpExecuteCallableKHR from an allowed shader stage.

8.20. Static Use

A SPIR-V module declares a global object in memory using the OpVariable instruction, which results in a pointer x to that object. A specific entry point in a SPIR-V module is said to statically use that object if that entry point’s call tree contains a function containing a memory instruction or image instruction with x as an id operand. See the “Memory Instructions” and “Image Instructions” subsections of section 3 “Binary Form” of the SPIR-V specification for the complete lists of SPIR-V memory and image instructions.

Static use is not used to control the behavior of variables with Input and Output storage. The effects of those variables are applied based only on whether they are present in a shader entry point’s interface.

8.21. Scope

A scope describes a set of shader invocations, where each such set is a scope instance. Each invocation belongs to one or more scope instances, but belongs to no more than one scope instance for each scope.

The operations available between invocations in a given scope instance vary, with smaller scopes generally able to perform more operations, and with greater efficiency.

8.21.1. Cross Device

All invocations executed in a Vulkan instance fall into a single cross device scope instance.

Whilst the CrossDevice scope is defined in SPIR-V, it is disallowed in Vulkan. API synchronization commands can be used to communicate between devices.

8.21.2. Device

All invocations executed on a single device form a device scope instance.

If the vulkanMemoryModel and vulkanMemoryModelDeviceScope features are enabled, this scope is represented in SPIR-V by the Device Scope, which can be used as a Memory Scope for barrier and atomic operations.

If both the shaderDeviceClock and vulkanMemoryModelDeviceScope features are enabled, using the Device Scope with the OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same device scope instance.

There is no method to synchronize the execution of these invocations within SPIR-V, and this can only be done with API synchronization primitives.

Invocations executing on different devices in a device group operate in separate device scope instances.

8.21.3. Queue Family

Invocations executed by queues in a given queue family form a queue family scope instance.

This scope is identified in SPIR-V as the QueueFamily Scope if the vulkanMemoryModel feature is enabled, or if not, the Device Scope, which can be used as a Memory Scope for barrier and atomic operations.

If the shaderDeviceClock feature is enabled, but the vulkanMemoryModelDeviceScope feature is not enabled, using the Device Scope with the OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same queue family scope instance.

There is no method to synchronize the execution of these invocations within SPIR-V, and this can only be done with API synchronization primitives.

Each invocation in a queue family scope instance must be in the same device scope instance.

8.21.4. Command

Any shader invocations executed as the result of a single command such as vkCmdDispatch or vkCmdDraw form a command scope instance. For indirect drawing commands with drawCount greater than one, invocations from separate draws are in separate command scope instances. For ray tracing shaders, an invocation group is an implementation-dependent subset of the set of shader invocations of a given shader stage which are produced by a single trace rays command.

There is no specific Scope for communication across invocations in a command scope instance. As this has a clear boundary at the API level, coordination here can be performed in the API, rather than in SPIR-V.

Each invocation in a command scope instance must be in the same queue-family scope instance.

For shaders without defined workgroups, this set of invocations forms an invocation group as defined in the SPIR-V specification.

8.21.5. Primitive

Any fragment shader invocations executed as the result of rasterization of a single primitive form a primitive scope instance.

There is no specific Scope for communication across invocations in a primitive scope instance.

Any generated helper invocations are included in this scope instance.

Each invocation in a primitive scope instance must be in the same command scope instance.

Any input variables decorated with Flat are uniform within a primitive scope instance.

8.21.6. Shader Call

Any shader-call-related invocations that are executed in one or more ray tracing execution models form a shader call scope instance.

The ShaderCallKHR Scope can be used as Memory Scope for barrier and atomic operations.

Each invocation in a shader call scope instance must be in the same queue family scope instance.

8.21.7. Workgroup

A local workgroup is a set of invocations that can synchronize and share data with each other using memory in the Workgroup storage class.

The Workgroup Scope can be used as both an Execution Scope and Memory Scope for barrier and atomic operations.

Each invocation in a local workgroup must be in the same command scope instance.

Only task, mesh, and compute shaders have defined workgroups; other shader types cannot use workgroup functionality. For shaders that have defined workgroups, this set of invocations forms an invocation group as defined in the SPIR-V specification.

8.21.8. Subgroup

A subgroup (see the subsection “Control Flow” of section 2 of the SPIR-V 1.3 Revision 1 specification) is a set of invocations that can synchronize and share data with each other efficiently.

The Subgroup Scope can be used as both an Execution Scope and Memory Scope for barrier and atomic operations. Other subgroup features allow the use of group operations with subgroup scope.

If the shaderSubgroupClock feature is enabled, using the Subgroup Scope with the OpReadClockKHR instruction will read from a clock that is consistent across invocations in the same subgroup.

For shaders that have defined workgroups, each invocation in a subgroup must be in the same local workgroup.

In other shader stages, each invocation in a subgroup must be in the same device scope instance.

Only shader stages that support subgroup operations have defined subgroups.

8.21.9. Quad

A quad scope instance is formed of four shader invocations.

In a fragment shader, each invocation in a quad scope instance is formed of invocations in neighboring framebuffer locations (xi, yi), where:

  • i is the index of the invocation within the scope instance.

  • w and h are the number of pixels the fragment covers in the x and y axes.

  • w and h are identical for all participating invocations.

  • (x0) = (x1 - w) = (x2) = (x3 - w)

  • (y0) = (y1) = (y2 - h) = (y3 - h)

  • Each invocation has the same layer and sample indices.

In a compute shader, if the DerivativeGroupQuadsNV execution mode is specified, each invocation in a quad scope instance is formed of invocations with adjacent local invocation IDs (xi, yi), where:

  • i is the index of the invocation within the quad scope instance.

  • (x0) = (x1 - 1) = (x2) = (x3 - 1)

  • (y0) = (y1) = (y2 - 1) = (y3 - 1)

  • x0 and y0 are integer multiples of 2.

  • Each invocation has the same z coordinate.

In a compute shader, if the DerivativeGroupLinearNV execution mode is specified, each invocation in a quad scope instance is formed of invocations with adjacent local invocation indices (li), where:

  • i is the index of the invocation within the quad scope instance.

  • (l0) = (l1 - 1) = (l2 - 2) = (l3 - 3)

  • l0 is an integer multiple of 4.

In all shaders, each invocation in a quad scope instance is formed of invocations in adjacent subgroup invocation indices (si), where:

  • i is the index of the invocation within the quad scope instance.

  • (s0) = (s1 - 1) = (s2 - 2) = (s3 - 3)

  • s0 is an integer multiple of 4.
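As an illustrative, non-normative sketch, the index relations listed above can be inverted to recover the quad index i of an invocation from its coordinates. The helper names are hypothetical and the code only models the compute-shader and linear layouts as written above; the fragment shader case additionally scales by the fragment size w and h and is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Quad index for the DerivativeGroupQuadsNV layout: x0 and y0 are even,
 * bit 0 of i advances x by one and bit 1 of i advances y by one. */
static uint32_t quad_index_from_xy(uint32_t x, uint32_t y)
{
    return (x & 1u) | ((y & 1u) << 1);
}

/* Quad index for the linear layouts (local invocation index l or
 * subgroup invocation index s): the base index is a multiple of 4. */
static uint32_t quad_index_from_linear(uint32_t l)
{
    return l & 3u;
}

/* Coordinates of quad member i in the quad that contains (x, y),
 * again for the DerivativeGroupQuadsNV layout. */
static void quad_member_xy(uint32_t x, uint32_t y, uint32_t i,
                           uint32_t *out_x, uint32_t *out_y)
{
    assert(i < 4u);
    *out_x = (x & ~1u) + (i & 1u);
    *out_y = (y & ~1u) + (i >> 1);
}
```

For example, under these assumptions the invocation with local ID (5, 7) has quad index 3, and quad member 2 of the same quad sits at (4, 7).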

Each invocation in a quad scope instance must be in the same subgroup.

Fragment and compute shaders have defined quad scope instances. If the quadOperationsInAllStages limit is supported, any shader stages that support subgroup operations also have defined quad scope instances.

8.21.10. Fragment Interlock

A fragment interlock scope instance is formed of fragment shader invocations based on their framebuffer locations (x,y,layer,sample), executed by commands inside a single subpass.

The specific set of invocations included varies based on the execution mode as follows:

  • If the SampleInterlockOrderedEXT or SampleInterlockUnorderedEXT execution modes are used, only invocations with identical framebuffer locations (x,y,layer,sample) are included.

  • If the PixelInterlockOrderedEXT or PixelInterlockUnorderedEXT execution modes are used, fragments with different sample ids are also included.

  • If the ShadingRateInterlockOrderedEXT or ShadingRateInterlockUnorderedEXT execution modes are used, fragments from neighbouring framebuffer locations are also included, as determined by the shading rate.

Only fragment shaders with one of the above execution modes have defined fragment interlock scope instances.

There is no specific Scope value for communication across invocations in a fragment interlock scope instance. However, this is implicitly used as a memory scope by OpBeginInvocationInterlockEXT and OpEndInvocationInterlockEXT.

Each invocation in a fragment interlock scope instance must be in the same queue family scope instance.

8.21.11. Invocation

The smallest scope is a single invocation; this is represented by the Invocation Scope in SPIR-V.

Fragment shader invocations must be in a primitive scope instance.

All invocations in all stages must be in a command scope instance.

8.22. Group Operations

Group operations are executed by multiple invocations within a scope instance, with each invocation involved in calculating the result. This provides a mechanism for efficient communication between invocations in a particular scope instance.

Group operations all take a Scope defining the desired scope instance to operate within. Only the Subgroup scope can be used for these operations; the subgroupSupportedOperations limit defines which types of operation can be used.

8.22.1. Basic Group Operations

Basic group operations include the use of OpGroupNonUniformElect, OpControlBarrier, OpMemoryBarrier, and atomic operations.

OpGroupNonUniformElect can be used to choose a single invocation to perform a task for the whole group. Only the invocation with the lowest id in the group will return true.
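A minimal host-side model of the election rule may help; it is a sketch, not shader code. It assumes a group of at most 32 invocations described by a hypothetical bitmask of active invocations (bit n set means invocation n is active), and it assumes that only active invocations can be elected.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the id of the elected invocation: the lowest id whose bit is
 * set in active_mask. At least one invocation must be active. */
static uint32_t elect_invocation(uint32_t active_mask)
{
    assert(active_mask != 0u);
    uint32_t id = 0;
    while (((active_mask >> id) & 1u) == 0u) {
        ++id;
    }
    return id;
}

/* The value invocation `id` would observe from the elect operation in
 * this model: nonzero only for the elected invocation. */
static int elect_result(uint32_t active_mask, uint32_t id)
{
    return elect_invocation(active_mask) == id;
}
```

With invocations 2 and 3 active (mask 0xC), only invocation 2 observes a true result in this model.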

The Memory Model appendix defines the operation of barriers and atomics.

8.22.2. Vote Group Operations

The vote group operations allow invocations within a group to compare values across a group. The types of votes enabled are:

  • Do all active group invocations agree that an expression is true?

  • Do any active group invocations evaluate an expression to true?

  • Do all active group invocations have the same value of an expression?

Note

These operations are useful in combination with control flow, in that they allow developers to check whether conditions match across the group and choose potentially faster code paths in these cases.
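The three vote questions can be modeled on the host as simple array predicates. This is only a sketch: the function names are hypothetical, and each array is assumed to hold the values contributed by the active invocations of one group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Do all active invocations agree that the expression is true? */
static bool vote_all(const bool *pred, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (!pred[i]) return false;
    }
    return true;
}

/* Does any active invocation evaluate the expression to true? */
static bool vote_any(const bool *pred, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (pred[i]) return true;
    }
    return false;
}

/* Do all active invocations have the same value of the expression? */
static bool vote_all_equal(const int32_t *value, size_t n)
{
    assert(n > 0);
    for (size_t i = 1; i < n; ++i) {
        if (value[i] != value[0]) return false;
    }
    return true;
}
```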

8.22.3. Arithmetic Group Operations

The arithmetic group operations allow invocations to perform scans and reductions across a group. The operators supported are add, mul, min, max, and, or, xor.

For reductions, every invocation in a group will obtain the cumulative result of these operators applied to all values in the group. For exclusive scans, each invocation in a group will obtain the cumulative result of these operators applied to all values in invocations with a lower index in the group. Inclusive scans are identical to exclusive scans, except the cumulative result includes the operator applied to the value in the current invocation.

The order in which these operators are applied is implementation-dependent.
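The difference between a reduction, an exclusive scan, and an inclusive scan can be sketched with the add operator on a host-side array, where element i stands for the value held by invocation i. The function names are hypothetical. The exclusive scan starts from the identity of the operator (0 for add), which is what the invocation with the lowest index receives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduction: every invocation obtains the sum of all values. */
static void group_reduce_add(const int32_t *in, int32_t *out, size_t n)
{
    int32_t total = 0;
    for (size_t i = 0; i < n; ++i) total += in[i];
    for (size_t i = 0; i < n; ++i) out[i] = total;
}

/* Exclusive scan: invocation i obtains the sum of the values held by
 * invocations with a lower index. in and out must not alias. */
static void group_exclusive_scan_add(const int32_t *in, int32_t *out, size_t n)
{
    assert(in != out);
    int32_t running = 0; /* identity for add */
    for (size_t i = 0; i < n; ++i) {
        out[i] = running;
        running += in[i];
    }
}

/* Inclusive scan: as above, but the invocation's own value is included. */
static void group_inclusive_scan_add(const int32_t *in, int32_t *out, size_t n)
{
    int32_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        running += in[i];
        out[i] = running;
    }
}
```

For the inputs 3, 1, 4, 1 the model yields 9, 9, 9, 9 for the reduction, 0, 3, 4, 8 for the exclusive scan, and 3, 4, 8, 9 for the inclusive scan. The model applies the operator in index order; because the real order is implementation-dependent, floating-point results on a device can differ from such a sequential model.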

8.22.4. Ballot Group Operations

The ballot group operations allow invocations to perform more complex votes across the group. The ballot functionality allows all invocations within a group to provide a boolean value and to receive, as a result, the boolean value that each invocation provided. The broadcast functionality allows values to be broadcast from an invocation to all other invocations within the group.
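As a rough host-side model, a ballot can be pictured as a bitmask with one bit per invocation, and a broadcast as copying one invocation's value to all others. The names below are hypothetical, and the model assumes a group of at most 32 invocations packed into a single 32-bit word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Collects one boolean per invocation into a mask; bit i holds the
 * value provided by invocation i. */
static uint32_t ballot(const bool *pred, uint32_t n)
{
    assert(n <= 32u);
    uint32_t mask = 0;
    for (uint32_t i = 0; i < n; ++i) {
        if (pred[i]) mask |= (1u << i);
    }
    return mask;
}

/* Reads back what invocation `id` provided to the ballot. */
static bool ballot_bit(uint32_t mask, uint32_t id)
{
    assert(id < 32u);
    return ((mask >> id) & 1u) != 0u;
}

/* Broadcast: every invocation receives the value held by `src`. */
static void broadcast(const int32_t *in, int32_t *out, uint32_t n, uint32_t src)
{
    assert(src < n);
    const int32_t value = in[src];
    for (uint32_t i = 0; i < n; ++i) out[i] = value;
}
```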

8.22.5. Shuffle Group Operations

The shuffle group operations allow invocations to read values from other invocations within a group.

8.22.6. Shuffle Relative Group Operations

The shuffle relative group operations allow invocations to read values from other invocations within the group relative to the current invocation in the group. The relative operations supported allow data to be shifted up and down through the invocations within a group.
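The shifting behavior can be sketched on a host-side array in which element i is the value held by invocation i. The function names are hypothetical. When the source index falls outside the group, this model simply keeps the invocation's own value; that choice is an assumption made for the sketch and should not be relied on in shader code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift up: invocation i reads the value of invocation i - delta. */
static void shuffle_up(const int32_t *in, int32_t *out, size_t n, size_t delta)
{
    assert(in != out);
    for (size_t i = 0; i < n; ++i) {
        out[i] = (i >= delta) ? in[i - delta] : in[i];
    }
}

/* Shift down: invocation i reads the value of invocation i + delta. */
static void shuffle_down(const int32_t *in, int32_t *out, size_t n, size_t delta)
{
    assert(in != out);
    for (size_t i = 0; i < n; ++i) {
        /* n - i is at least 1 here, so the comparison cannot wrap. */
        out[i] = (delta < n - i) ? in[i + delta] : in[i];
    }
}
```

With inputs 10, 20, 30, 40 and a delta of 1, the model produces 10, 10, 20, 30 when shifting up and 20, 30, 40, 40 when shifting down.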

8.22.7. Clustered Group Operations

The clustered group operations allow invocations to perform an operation among partitions of a group, such that the operation is performed only among the invocations within each partition. The partitions for clustered group operations are consecutive power-of-two size groups of invocations and the cluster size must be known at pipeline creation time. The operations supported are add, mul, min, max, and, or, xor.
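A clustered reduction can be modeled on the host as an independent reduction over each consecutive partition. The sketch below uses the add operator and a hypothetical function name, and assumes for simplicity that the group size is a multiple of the cluster size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each invocation obtains the sum over its own cluster only.
 * cluster_size must be a nonzero power of two. */
static void clustered_reduce_add(const int32_t *in, int32_t *out,
                                 size_t n, size_t cluster_size)
{
    assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);
    assert(n % cluster_size == 0);
    for (size_t base = 0; base < n; base += cluster_size) {
        int32_t total = 0;
        for (size_t j = 0; j < cluster_size; ++j) total += in[base + j];
        for (size_t j = 0; j < cluster_size; ++j) out[base + j] = total;
    }
}
```

For the inputs 1 through 8 and a cluster size of 4, the first four invocations obtain 10 and the last four obtain 26.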

8.23. Quad Group Operations

Quad group operations (OpGroupNonUniformQuad*) are a specialized type of group operations that only operate on quad scope instances. Whilst these instructions do include a Scope parameter, this scope is always overridden; only the quad scope instance is included in its execution scope.

Fragment shaders that statically execute quad group operations must launch sufficient invocations to ensure their correct operation; additional helper invocations are launched for framebuffer locations not covered by rasterized fragments if necessary.

The index used to select participating invocations is i, as described for a quad scope instance, defined as the quad index in the SPIR-V specification.

For OpGroupNonUniformQuadBroadcast this value is equal to Index. For OpGroupNonUniformQuadSwap, it is equal to the implicit Index used by each participating invocation.

8.24. Derivative Operations

Derivative operations calculate the partial derivative for an expression P as a function of an invocation’s x and y coordinates.

Derivative operations operate on a set of invocations known as a derivative group as defined in the SPIR-V specification. A derivative group is equivalent to the local workgroup for a compute shader invocation, or the primitive scope instance for a fragment shader invocation.

Derivatives are calculated assuming that P is piecewise linear and continuous within the derivative group. All dynamic instances of explicit derivative instructions (OpDPdx*, OpDPdy*, and OpFwidth*) must be executed in control flow that is uniform within a derivative group. For other derivative operations, results are undefined if a dynamic instance is executed in control flow that is not uniform within the derivative group.

Fragment shaders that statically execute derivative operations must launch sufficient invocations to ensure their correct operation; additional helper invocations are launched for framebuffer locations not covered by rasterized fragments if necessary.

Note

In a compute shader, it is the application’s responsibility to ensure that sufficient invocations are launched.

Derivative operations calculate their results as the difference between the result of P across invocations in the quad. For fine derivative operations (OpDPdxFine and OpDPdyFine), the values of DPdx(Pi) are calculated as

DPdx(P0) = DPdx(P1) = P1 - P0

DPdx(P2) = DPdx(P3) = P3 - P2

and the values of DPdy(Pi) are calculated as

DPdy(P0) = DPdy(P2) = P2 - P0

DPdy(P1) = DPdy(P3) = P3 - P1

where i is the index of each invocation as described in Quad.

Coarse derivative operations (OpDPdxCoarse and OpDPdyCoarse) calculate their results in roughly the same manner, but may only calculate two values instead of four (one for each of DPdx and DPdy), reusing the same result no matter the originating invocation. If an implementation does this, it should use the fine derivative calculations described for P0.
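The fine and coarse calculations can be written out as a small host-side model. The function names are hypothetical; p[i] stands for the value of P in quad invocation i, using the index layout described in Quad. The coarse functions model an implementation that follows the recommendation to reuse the fine results of P0.

```c
#include <assert.h>

/* Fine derivative in x: invocations 0 and 1 share P1 - P0,
 * invocations 2 and 3 share P3 - P2. */
static float dpdx_fine(const float p[4], unsigned i)
{
    assert(i < 4u);
    return (i < 2u) ? (p[1] - p[0]) : (p[3] - p[2]);
}

/* Fine derivative in y: invocations 0 and 2 share P2 - P0,
 * invocations 1 and 3 share P3 - P1. */
static float dpdy_fine(const float p[4], unsigned i)
{
    assert(i < 4u);
    return ((i & 1u) == 0u) ? (p[2] - p[0]) : (p[3] - p[1]);
}

/* Coarse derivatives: one value per quad, taken from the fine result
 * of invocation 0. */
static float dpdx_coarse(const float p[4]) { return p[1] - p[0]; }
static float dpdy_coarse(const float p[4]) { return p[2] - p[0]; }
```

For P = (1, 3, 2, 7), the fine x derivative is 2 in the top row and 5 in the bottom row, while the coarse x derivative is 2 for all four invocations.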

Note

Derivative values are calculated between fragments rather than pixels. If the fragment shader invocations involved in the calculation cover multiple pixels, these operations cover a wider area, resulting in larger derivative values. This in turn will result in a coarser level of detail being selected for image sampling operations using derivatives.

Applications may want to account for this when using multi-pixel fragments; if pixel derivatives are desired, applications should use explicit derivative operations and divide the results by the size of the fragment in each dimension as follows:

DPdx(Pn)' = DPdx(Pn) / w

DPdy(Pn)' = DPdy(Pn) / h

where w and h are the size of the fragments in the quad, and DPdx(Pn)' and DPdy(Pn)' are the pixel derivatives.
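As a small worked sketch of this rescaling (the helper name is hypothetical, and the same function serves both axes by passing w or h as the extent):

```c
#include <assert.h>

/* Converts a derivative taken between fragments into a per-pixel
 * derivative by dividing by the fragment size along that axis. */
static float pixel_derivative(float fragment_derivative, unsigned fragment_extent)
{
    assert(fragment_extent > 0u);
    return fragment_derivative / (float)fragment_extent;
}
```

For a fragment that is 2 pixels wide and 4 pixels high, a fragment derivative of 6 in both directions corresponds to pixel derivatives of 3 in x and 1.5 in y.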

The results for OpDPdx and OpDPdy may be calculated as either fine or coarse derivatives, with implementations favouring the most efficient approach. Implementations must choose coarse or fine consistently between the two.

Executing OpFwidthFine, OpFwidthCoarse, or OpFwidth is equivalent to executing the corresponding OpDPdx* and OpDPdy* instructions, taking the absolute value of the results, and summing them.
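In host-side terms, this equivalence amounts to the following sketch, where dpdx and dpdy are assumed to be results already produced by the matching fine, coarse, or unqualified derivative operations. The function names are hypothetical.

```c
/* Absolute value without depending on the math library. */
static float abs_f(float x)
{
    return (x < 0.0f) ? -x : x;
}

/* fwidth: sum of the absolute values of the two derivatives. */
static float fwidth_from(float dpdx, float dpdy)
{
    return abs_f(dpdx) + abs_f(dpdy);
}
```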

Executing an OpImage*Sample*ImplicitLod instruction is equivalent to executing OpDPdx(Coordinate) and OpDPdy(Coordinate), and passing the results as the Grad operands dx and dy.

Note

It is expected that using the ImplicitLod variants of sampling functions will be substantially more efficient than using the ExplicitLod variants with explicitly generated derivatives.

8.25. Helper Invocations

When performing derivative or quad group operations in a fragment shader, additional invocations may be spawned in order to ensure correct results. These additional invocations are known as helper invocations and can be identified by a non-zero value in the HelperInvocation built-in. Stores and atomics performed by helper invocations must not have any effect on memory, and values returned by atomic instructions in helper invocations are undefined.

Helper invocations may become inactive at any time for any reason, with one exception. If a helper invocation would be active if it were not a helper invocation, it must be active for derivative and quad group operations.

Helper invocations may become permanently inactive if all invocations in a quad scope instance become helper invocations.

8.26. Cooperative Matrices

A cooperative matrix type is a SPIR-V type where the storage for and computations performed on the matrix are spread across the invocations in a scope instance. These types give the implementation freedom in how to optimize matrix multiplies.

SPIR-V defines the types and instructions, but does not specify rules about what sizes/combinations are valid, and it is expected that different implementations may support different sizes.

To enumerate the supported cooperative matrix types and operations, call:

// Provided by VK_NV_cooperative_matrix
VkResult vkGetPhysicalDeviceCooperativeMatrixPropertiesNV(
    VkPhysicalDevice                            physicalDevice,
    uint32_t*                                   pPropertyCount,
    VkCooperativeMatrixPropertiesNV*            pProperties);
  • physicalDevice is the physical device.

  • pPropertyCount is a pointer to an integer related to the number of cooperative matrix properties available or queried.

  • pProperties is either NULL or a pointer to an array of VkCooperativeMatrixPropertiesNV structures.

If pProperties is NULL, then the number of cooperative matrix properties available is returned in pPropertyCount. Otherwise, pPropertyCount must point to a variable set by the user to the number of elements in the pProperties array, and on return the variable is overwritten with the number of structures actually written to pProperties. If the value of pPropertyCount is less than the number of cooperative matrix properties available, at most pPropertyCount structures will be written, and VK_INCOMPLETE will be returned instead of VK_SUCCESS to indicate that not all the available cooperative matrix properties were returned.
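The count semantics described above can be illustrated with a self-contained model. Everything in this sketch is a hypothetical stand-in: model_enumerate is not a Vulkan entry point, MODEL_SUCCESS and MODEL_INCOMPLETE merely play the roles of VK_SUCCESS and VK_INCOMPLETE (their numeric values are arbitrary), and the three integers stand for three available property structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for VK_SUCCESS and VK_INCOMPLETE. */
typedef enum { MODEL_SUCCESS, MODEL_INCOMPLETE } ModelResult;

#define MODEL_AVAILABLE 3u
static const int32_t model_items[MODEL_AVAILABLE] = { 11, 22, 33 };

/* Mimics the enumeration contract: query the count when pItems is NULL,
 * otherwise write at most *pCount items and report how many were written. */
static ModelResult model_enumerate(uint32_t *pCount, int32_t *pItems)
{
    assert(pCount != NULL);
    if (pItems == NULL) {
        *pCount = MODEL_AVAILABLE;
        return MODEL_SUCCESS;
    }
    const uint32_t capacity = *pCount;
    const uint32_t written =
        (capacity < MODEL_AVAILABLE) ? capacity : MODEL_AVAILABLE;
    for (uint32_t i = 0; i < written; ++i) {
        pItems[i] = model_items[i];
    }
    *pCount = written;
    return (written < MODEL_AVAILABLE) ? MODEL_INCOMPLETE : MODEL_SUCCESS;
}
```

A caller following this contract typically calls once with a NULL array to obtain the count, allocates that many elements, and calls again. Passing a smaller array is still valid in the model; the call then fills what fits and returns the incomplete status.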

Valid Usage (Implicit)
  • physicalDevice must be a valid VkPhysicalDevice handle

  • pPropertyCount must be a valid pointer to a uint32_t value

  • If the value referenced by pPropertyCount is not 0, and pProperties is not NULL, pProperties must be a valid pointer to an array of pPropertyCount VkCooperativeMatrixPropertiesNV structures

Return Codes
Success
  • VK_SUCCESS

  • VK_INCOMPLETE

Failure
  • VK_ERROR_OUT_OF_HOST_MEMORY

  • VK_ERROR_OUT_OF_DEVICE_MEMORY

Each VkCooperativeMatrixPropertiesNV structure describes a single supported combination of types for a matrix multiply/add operation (OpCooperativeMatrixMulAddNV). The multiply can be described in terms of the following variables and types (in SPIR-V pseudocode):

    %A is of type OpTypeCooperativeMatrixNV %AType %scope %MSize %KSize
    %B is of type OpTypeCooperativeMatrixNV %BType %scope %KSize %NSize
    %C is of type OpTypeCooperativeMatrixNV %CType %scope %MSize %NSize
    %D is of type OpTypeCooperativeMatrixNV %DType %scope %MSize %NSize

    %D = %A * %B + %C // using OpCooperativeMatrixMulAddNV

A matrix multiply with these dimensions is known as an MxNxK matrix multiply.
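To make the roles of M, N, and K concrete, the following host-side sketch computes D = A * B + C for ordinary arrays. It only models the arithmetic and the dimensions: the function name is hypothetical, row-major storage is an assumption of the sketch, and it says nothing about how an implementation distributes a cooperative matrix across invocations.

```c
#include <assert.h>
#include <stddef.h>

/* D = A * B + C, all matrices stored row-major:
 * A is M x K, B is K x N, C and D are M x N. */
static void matrix_mul_add(size_t M, size_t N, size_t K,
                           const float *A, const float *B,
                           const float *C, float *D)
{
    assert(D != A && D != B);
    for (size_t m = 0; m < M; ++m) {
        for (size_t n = 0; n < N; ++n) {
            float acc = C[m * N + n];
            for (size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            D[m * N + n] = acc;
        }
    }
}
```

In this notation a 2x2x3 multiply takes a 2 x 3 matrix A and a 3 x 2 matrix B and produces a 2 x 2 result.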

The VkCooperativeMatrixPropertiesNV structure is defined as:

// Provided by VK_NV_cooperative_matrix
typedef struct VkCooperativeMatrixPropertiesNV {
    VkStructureType      sType;
    void*                pNext;
    uint32_t             MSize;
    uint32_t             NSize;
    uint32_t             KSize;
    VkComponentTypeNV    AType;
    VkComponentTypeNV    BType;
    VkComponentTypeNV    CType;
    VkComponentTypeNV    DType;
    VkScopeNV            scope;
} VkCooperativeMatrixPropertiesNV;
  • sType is the type of this structure.

  • pNext is NULL or a pointer to a structure extending this structure.

  • MSize is the number of rows in matrices A, C, and D.

  • KSize is the number of columns in matrix A and rows in matrix B.

  • NSize is the number of columns in matrices B, C, and D.

  • AType is the component type of matrix A, of type VkComponentTypeNV.

  • BType is the component type of matrix B, of type VkComponentTypeNV.

  • CType is the component type of matrix C, of type VkComponentTypeNV.

  • DType is the component type of matrix D, of type VkComponentTypeNV.

  • scope is the scope of all the matrix types, of type VkScopeNV.

If some types are preferred over other types (e.g. for performance), they should appear earlier in the list enumerated by vkGetPhysicalDeviceCooperativeMatrixPropertiesNV.

At least one entry in the list must have power of two values for all of MSize, KSize, and NSize.

Valid Usage (Implicit)

Possible values for VkScopeNV include:

// Provided by VK_NV_cooperative_matrix
typedef enum VkScopeNV {
    VK_SCOPE_DEVICE_NV = 1,
    VK_SCOPE_WORKGROUP_NV = 2,
    VK_SCOPE_SUBGROUP_NV = 3,
    VK_SCOPE_QUEUE_FAMILY_NV = 5,
} VkScopeNV;
  • VK_SCOPE_DEVICE_NV corresponds to SPIR-V Device scope.

  • VK_SCOPE_WORKGROUP_NV corresponds to SPIR-V Workgroup scope.

  • VK_SCOPE_SUBGROUP_NV corresponds to SPIR-V Subgroup scope.

  • VK_SCOPE_QUEUE_FAMILY_NV corresponds to SPIR-V QueueFamily scope.

All enum values match the corresponding SPIR-V value.

Possible values for VkComponentTypeNV include:

// Provided by VK_NV_cooperative_matrix
typedef enum VkComponentTypeNV {
    VK_COMPONENT_TYPE_FLOAT16_NV = 0,
    VK_COMPONENT_TYPE_FLOAT32_NV = 1,
    VK_COMPONENT_TYPE_FLOAT64_NV = 2,
    VK_COMPONENT_TYPE_SINT8_NV = 3,
    VK_COMPONENT_TYPE_SINT16_NV = 4,
    VK_COMPONENT_TYPE_SINT32_NV = 5,
    VK_COMPONENT_TYPE_SINT64_NV = 6,
    VK_COMPONENT_TYPE_UINT8_NV = 7,
    VK_COMPONENT_TYPE_UINT16_NV = 8,
    VK_COMPONENT_TYPE_UINT32_NV = 9,
    VK_COMPONENT_TYPE_UINT64_NV = 10,
} VkComponentTypeNV;
  • VK_COMPONENT_TYPE_FLOAT16_NV corresponds to SPIR-V OpTypeFloat 16.

  • VK_COMPONENT_TYPE_FLOAT32_NV corresponds to SPIR-V OpTypeFloat 32.

  • VK_COMPONENT_TYPE_FLOAT64_NV corresponds to SPIR-V OpTypeFloat 64.

  • VK_COMPONENT_TYPE_SINT8_NV corresponds to SPIR-V OpTypeInt 8 1.

  • VK_COMPONENT_TYPE_SINT16_NV corresponds to SPIR-V OpTypeInt 16 1.

  • VK_COMPONENT_TYPE_SINT32_NV corresponds to SPIR-V OpTypeInt 32 1.

  • VK_COMPONENT_TYPE_SINT64_NV corresponds to SPIR-V OpTypeInt 64 1.

  • VK_COMPONENT_TYPE_UINT8_NV corresponds to SPIR-V OpTypeInt 8 0.

  • VK_COMPONENT_TYPE_UINT16_NV corresponds to SPIR-V OpTypeInt 16 0.

  • VK_COMPONENT_TYPE_UINT32_NV corresponds to SPIR-V OpTypeInt 32 0.

  • VK_COMPONENT_TYPE_UINT64_NV corresponds to SPIR-V OpTypeInt 64 0.

8.27. Validation Cache

Validation cache objects allow the result of internal validation to be reused, both within a single application run and between multiple runs. Reuse within a single run is achieved by passing the same validation cache object when creating supported Vulkan objects. Reuse across runs of an application is achieved by retrieving validation cache contents in one run of an application, saving the contents, and using them to preinitialize a validation cache on a subsequent run. The contents of the validation cache objects are managed by the validation layers. Applications can manage the host memory consumed by a validation cache object and control the amount of data retrieved from a validation cache object.

Validation cache objects are represented by VkValidationCacheEXT handles:

// Provided by VK_EXT_validation_cache
VK_DEFINE_NON_DISPATCHABLE_HANDLE(VkValidationCacheEXT)

To create validation cache objects, call:

// Provided by VK_EXT_validation_cache
VkResult vkCreateValidationCacheEXT(
    VkDevice                                    device,
    const VkValidationCacheCreateInfoEXT*       pCreateInfo,
    const VkAllocationCallbacks*                pAllocator,
    VkValidationCacheEXT*                       pValidationCache);
  • device is the logical device that creates the validation cache object.

  • pCreateInfo is a pointer to a VkValidationCacheCreateInfoEXT structure containing the initial parameters for the validation cache object.

  • pAllocator controls host memory allocation as described in the Memory Allocation chapter.

  • pValidationCache is a pointer to a VkValidationCacheEXT handle in which the resulting validation cache object is returned.

Note

Applications can track and manage the total host memory size of a validation cache object using the pAllocator. Applications can limit the amount of data retrieved from a validation cache object in vkGetValidationCacheDataEXT. Implementations should not internally limit the total number of entries added to a validation cache object or the total host memory consumed.

Once created, a validation cache can be passed to the vkCreateShaderModule command by adding this object to the VkShaderModuleCreateInfo structure’s pNext chain. If a VkShaderModuleValidationCacheCreateInfoEXT object is included in the VkShaderModuleCreateInfo::pNext chain, and its validationCache field is not VK_NULL_HANDLE, the implementation will query it for possible reuse opportunities and update it with new content. The use of the validation cache object in these commands is internally synchronized, and the same validation cache object can be used in multiple threads simultaneously.

Note

Implementations should make every effort to limit any critical sections to the actual accesses to the cache, which is expected to be significantly shorter than the duration of the vkCreateShaderModule command.

Valid Usage (Implicit)
Return Codes
Success
  • VK_SUCCESS

Failure
  • VK_ERROR_OUT_OF_HOST_MEMORY

The VkValidationCacheCreateInfoEXT structure is defined as:

// Provided by VK_EXT_validation_cache
typedef struct VkValidationCacheCreateInfoEXT {
    VkStructureType                    sType;
    const void*                        pNext;
    VkValidationCacheCreateFlagsEXT    flags;
    size_t                             initialDataSize;
    const void*                        pInitialData;
} VkValidationCacheCreateInfoEXT;
  • sType is the type of this structure.

  • pNext is NULL or a pointer to a structure extending this structure.

  • flags is reserved for future use.

  • initialDataSize is the number of bytes in pInitialData. If initialDataSize is zero, the validation cache will initially be empty.

  • pInitialData is a pointer to previously retrieved validation cache data. If the validation cache data is incompatible (as defined below) with the device, the validation cache will be initially empty. If initialDataSize is zero, pInitialData is ignored.

Valid Usage
  • If initialDataSize is not 0, it must be equal to the size of pInitialData, as returned by vkGetValidationCacheDataEXT when pInitialData was originally retrieved

  • If initialDataSize is not 0, pInitialData must have been retrieved from a previous call to vkGetValidationCacheDataEXT

Valid Usage (Implicit)
  • sType must be VK_STRUCTURE_TYPE_VALIDATION_CACHE_CREATE_INFO_EXT

  • pNext must be NULL

  • flags must be 0

  • If initialDataSize is not 0, pInitialData must be a valid pointer to an array of initialDataSize bytes

// Provided by VK_EXT_validation_cache
typedef VkFlags VkValidationCacheCreateFlagsEXT;

VkValidationCacheCreateFlagsEXT is a bitmask type for setting a mask, but is currently reserved for future use.

Validation cache objects can be merged using the command:

// Provided by VK_EXT_validation_cache
VkResult vkMergeValidationCachesEXT(
    VkDevice                                    device,
    VkValidationCacheEXT                        dstCache,
    uint32_t                                    srcCacheCount,
    const VkValidationCacheEXT*                 pSrcCaches);
  • device is the logical device that owns the validation cache objects.

  • dstCache is the handle of the validation cache to merge results into.

  • srcCacheCount is the length of the pSrcCaches array.

  • pSrcCaches is a pointer to an array of validation cache handles, which will be merged into dstCache. The previous contents of dstCache are included after the merge.

Note

The details of the merge operation are implementation dependent, but implementations should merge the contents of the specified validation caches and prune duplicate entries.

Valid Usage
  • dstCache must not appear in the list of source caches

Valid Usage (Implicit)
  • device must be a valid VkDevice handle

  • dstCache must be a valid VkValidationCacheEXT handle

  • pSrcCaches must be a valid pointer to an array of srcCacheCount valid VkValidationCacheEXT handles

  • srcCacheCount must be greater than 0

  • dstCache must have been created, allocated, or retrieved from device

  • Each element of pSrcCaches must have been created, allocated, or retrieved from device

Host Synchronization
  • Host access to dstCache must be externally synchronized

Return Codes
Success
  • VK_SUCCESS

Failure
  • VK_ERROR_OUT_OF_HOST_MEMORY

  • VK_ERROR_OUT_OF_DEVICE_MEMORY

Data can be retrieved from a validation cache object using the command:

// Provided by VK_EXT_validation_cache
VkResult vkGetValidationCacheDataEXT(
    VkDevice                                    device,
    VkValidationCacheEXT                        validationCache,
    size_t*                                     pDataSize,
    void*                                       pData);
  • device is the logical device that owns the validation cache.

  • validationCache is the validation cache to retrieve data from.

  • pDataSize is a pointer to a value related to the amount of data in the validation cache, as described below.

  • pData is either NULL or a pointer to a buffer.

If pData is NULL, then the maximum size of the data that can be retrieved from the validation cache, in bytes, is returned in pDataSize. Otherwise, pDataSize must point to a variable set by the user to the size of the buffer, in bytes, pointed to by pData, and on return the variable is overwritten with the amount of data actually written to pData.

If the value referenced by pDataSize is less than the maximum size that can be retrieved from the validation cache, at most that many bytes will be written to pData, and vkGetValidationCacheDataEXT will return VK_INCOMPLETE. Any data written to pData is valid and can be provided as the pInitialData member of the VkValidationCacheCreateInfoEXT structure passed to vkCreateValidationCacheEXT.

Two calls to vkGetValidationCacheDataEXT with the same parameters must retrieve the same data unless a command that modifies the contents of the cache is called between them.
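The size query followed by the data fetch described above is commonly written as a two-call pattern. The sketch below illustrates only that control flow. To stay compilable without the Vulkan headers, it takes a hypothetical callback, cache_query_fn, that follows the same pDataSize/pData contract as vkGetValidationCacheDataEXT. The result codes QUERY_OK and QUERY_INCOMPLETE, the helper fetch_cache_blob, and the fake_query test double are likewise assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result codes for this sketch; they only mirror the roles of
 * VK_SUCCESS and VK_INCOMPLETE and are not the Vulkan values. */
enum { QUERY_OK = 0, QUERY_INCOMPLETE = 1 };

/* Hypothetical callback with the same size/data contract as
 * vkGetValidationCacheDataEXT (device and cache handle folded into `user`). */
typedef int (*cache_query_fn)(void *user, size_t *pDataSize, void *pData);

/* Fetches the whole blob with the two-call pattern. Returns a malloc'd buffer
 * (caller frees) and its size in *out_size, or NULL on any failure. */
static void *fetch_cache_blob(cache_query_fn query, void *user, size_t *out_size)
{
    size_t size = 0;

    if (query == NULL || out_size == NULL)
        return NULL;
    *out_size = 0;

    /* Call 1: pData is NULL, so only the maximum retrievable size is reported. */
    if (query(user, &size, NULL) != QUERY_OK || size == 0)
        return NULL;

    void *buf = malloc(size);
    if (buf == NULL)
        return NULL;

    /* Call 2: size is the buffer capacity on input, bytes written on output. */
    if (query(user, &size, buf) != QUERY_OK) {
        /* Partial data would still be valid; this sketch simply discards it
         * instead of retrying. */
        free(buf);
        return NULL;
    }
    *out_size = size;
    return buf;
}

/* Test double: a "cache" holding len bytes, used only to exercise the sketch. */
struct fake_cache { const unsigned char *bytes; size_t len; };

static int fake_query(void *user, size_t *pDataSize, void *pData)
{
    const struct fake_cache *c = user;
    if (pData == NULL) {
        *pDataSize = c->len;
        return QUERY_OK;
    }
    size_t n = *pDataSize < c->len ? *pDataSize : c->len;
    memcpy(pData, c->bytes, n);
    *pDataSize = n;
    return n < c->len ? QUERY_INCOMPLETE : QUERY_OK;
}
```

In a real application the callback would be replaced by a direct call to vkGetValidationCacheDataEXT. Because the cache can be modified between the two calls, a caller may prefer to retry on an incomplete result rather than discard the data as this sketch does.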

Applications can store the data retrieved from the validation cache, and use this data, possibly in a future run of the application, to populate new validation cache objects. The results of validation, however, may depend on the vendor ID, device ID, driver version, and other details of the device. To enable applications to detect when previously retrieved data is incompatible with the device, the initial bytes written to pData must be a header consisting of the following members:

Table 11. Layout for validation cache header version VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT
Offset  Size          Meaning
0       4             length in bytes of the entire validation cache header written as a stream of bytes, with the least significant byte first
4       4             a VkValidationCacheHeaderVersionEXT value written as a stream of bytes, with the least significant byte first
8       VK_UUID_SIZE  a layer commit ID expressed as a UUID, which uniquely identifies the version of the validation layers used to generate these validation results

The first four bytes encode the length of the entire validation cache header, in bytes. This value includes all fields in the header including the validation cache version field and the size of the length field.

The next four bytes encode the validation cache version, as described for VkValidationCacheHeaderVersionEXT. A consumer of the validation cache should use the cache version to interpret the remainder of the cache header.

If the value referenced by pDataSize is less than what is necessary to store this header, nothing will be written to pData and zero will be written to the value referenced by pDataSize.
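The header layout above can be decoded from a stored blob without relying on host byte order. The following is a minimal sketch under stated assumptions: the struct and function names (cache_header, read_le32, parse_cache_header) and the two macros are hypothetical local names that mirror VK_UUID_SIZE (16) and VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT (1) so the example compiles without the Vulkan headers. Rejecting a header length smaller than 8 + VK_UUID_SIZE is a defensive choice of this sketch, not a rule stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_UUID_SIZE 16u          /* mirrors VK_UUID_SIZE */
#define CACHE_HEADER_VERSION_ONE 1u  /* mirrors VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT */

/* Decoded form of the version-one header (hypothetical type for this sketch). */
struct cache_header {
    uint32_t header_size;                 /* offset 0, 4 bytes */
    uint32_t header_version;              /* offset 4, 4 bytes */
    uint8_t  layer_uuid[CACHE_UUID_SIZE]; /* offset 8, VK_UUID_SIZE bytes */
};

/* Reads a 32-bit value stored least significant byte first. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *out if blob begins with a well-formed version-one
 * header, otherwise returns 0. */
static int parse_cache_header(const void *blob, size_t blob_size,
                              struct cache_header *out)
{
    const uint8_t *bytes = blob;
    const size_t min_size = 8u + CACHE_UUID_SIZE;

    assert(out != NULL);
    if (bytes == NULL || blob_size < min_size)
        return 0;

    out->header_size = read_le32(bytes);
    out->header_version = read_le32(bytes + 4);

    /* The version decides how the rest of the header is interpreted. */
    if (out->header_version != CACHE_HEADER_VERSION_ONE)
        return 0;
    /* Defensive assumption: the declared length covers all three fields
     * and does not exceed the blob. */
    if (out->header_size < min_size || out->header_size > blob_size)
        return 0;

    memcpy(out->layer_uuid, bytes + 8, CACHE_UUID_SIZE);
    return 1;
}
```

After a successful parse, an application can compare layer_uuid with a value it trusts before passing the blob as pInitialData. Where that trusted value comes from is outside the scope of this sketch.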

Valid Usage (Implicit)
  • device must be a valid VkDevice handle

  • validationCache must be a valid VkValidationCacheEXT handle

  • pDataSize must be a valid pointer to a size_t value

  • If the value referenced by pDataSize is not 0, and pData is not NULL, pData must be a valid pointer to an array of pDataSize bytes

  • validationCache must have been created, allocated, or retrieved from device

Return Codes
Success
  • VK_SUCCESS

  • VK_INCOMPLETE

Failure
  • VK_ERROR_OUT_OF_HOST_MEMORY

  • VK_ERROR_OUT_OF_DEVICE_MEMORY

Possible values of the second group of four bytes in the header returned by vkGetValidationCacheDataEXT, encoding the validation cache version, are:

// Provided by VK_EXT_validation_cache
typedef enum VkValidationCacheHeaderVersionEXT {
    VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT = 1,
} VkValidationCacheHeaderVersionEXT;
  • VK_VALIDATION_CACHE_HEADER_VERSION_ONE_EXT specifies version one of the validation cache.

To destroy a validation cache, call:

// Provided by VK_EXT_validation_cache
void vkDestroyValidationCacheEXT(
    VkDevice                                    device,
    VkValidationCacheEXT                        validationCache,
    const VkAllocationCallbacks*                pAllocator);
  • device is the logical device that destroys the validation cache object.

  • validationCache is the handle of the validation cache to destroy.

  • pAllocator controls host memory allocation as described in the Memory Allocation chapter.

Valid Usage
  • If VkAllocationCallbacks were provided when validationCache was created, a compatible set of callbacks must be provided here

  • If no VkAllocationCallbacks were provided when validationCache was created, pAllocator must be NULL

Valid Usage (Implicit)
  • device must be a valid VkDevice handle

  • If validationCache is not VK_NULL_HANDLE, validationCache must be a valid VkValidationCacheEXT handle

  • If pAllocator is not NULL, pAllocator must be a valid pointer to a valid VkAllocationCallbacks structure

  • If validationCache is a valid handle, it must have been created, allocated, or retrieved from device

Host Synchronization
  • Host access to validationCache must be externally synchronized