Background
In 2016, the Khronos Group released the Vulkan API, used primarily on Android, with similar advantages.
Each of these modern 3D graphics APIs uses shaders, and WebGPU is no exception. Shaders are programs that take advantage of the GPU's specialized architecture. In particular, GPUs are better than CPUs at heavily parallel numerical processing. To take advantage of both architectures, modern 3D applications use a hybrid design in which CPUs and GPUs accomplish different tasks. By leveraging the best features of each architecture, modern graphics APIs give developers a powerful framework for creating complex, rich, and fast 3D applications. Metal Shading Language is used for Metal, HLSL for Direct3D 12, and SPIR-V or GLSL for Vulkan.
Language requirements
The language needs to be well specified. The language specification must state whether every possible string of characters is a valid program. As with all other Web formats, the Web's shading language must be precisely specified to ensure interoperability between browsers.
The language needs to be translatable to Metal Shading Language, HLSL (or DXIL), and SPIR-V. WebGPU is designed to work on top of Metal, Direct3D 12, and Vulkan, so shaders must be representable in a form that each of those APIs accepts.
The language needs to evolve together with the WebGPU API. WebGPU features such as the binding model and the tessellation model interact deeply with the shading language. While it is possible to develop the language independently of the API, having the WebGPU API and the shading language in the same forum ensures shared goals and simplifies development.
The second part is that the language should be human-readable. The culture of the Web is that anyone can start writing a web page with nothing more than a text editor and a browser. This democratization of content is one of the Web's greatest strengths. It has created a rich ecosystem of tools and reviewers, and tinkerers can use View-Source to investigate how any web page works. A human-readable language with a single specification will greatly help the community adopt the WebGPU API.
Similarly, using a bytecode format such as WebAssembly does not remove the need for browsers to optimize the source code. Every major browser runs optimizations on bytecode before executing it. Unfortunately, the hoped-for simpler compiler never materialized.
A new language? Really?
Metal Shading Language is very similar to C++, which means it has all the power of bit casts and raw pointers. It is very powerful; you can even compile the same source code for the CPU and the GPU. Porting existing CPU-side code to Metal Shading Language is very easy. Unfortunately, all this power has some drawbacks. For example, in Metal Shading Language you can write a shader that casts a pointer to an integer, adds 17, casts it back to a pointer, and dereferences it. This is a security issue because it means the shader can access any resource that happens to be in the application's address space, contrary to the Web's security model. In theory it would be possible to specify a dialect of Metal Shading Language without raw pointers, but pointers are so fundamental to C and C++ that the result would be completely alien. C++ also relies heavily on undefined behavior, so any effort to fully specify each of C++'s many features is unlikely to succeed.
GLSL is the language used by WebGL and was adopted by WebGL for the Web platform. However, cross-browser interoperability has been extremely difficult to achieve because of incompatibilities between GLSL compilers. GLSL is still under investigation for long-standing security and portability bugs. GLSL is also showing its age. It is limited by its lack of pointer-like objects and of variable-length arrays. Its inputs and outputs are global variables with hard-coded names.
Second, SPIR-V contains more than 50 optional capabilities that implementations may or may not support, so shader authors using SPIR-V cannot know whether their shader will work on a given WebGPU implementation. This is the opposite of the Web's write-once, run-anywhere nature.
WebAssembly is another familiar possibility, but it also does not map well onto the architecture of a GPU. For example, WebAssembly assumes a single dynamically sized heap, whereas a GPU program can access multiple dynamically sized buffers. Without recompilation, there is no high-performance way to map between the two models.
WHLSL
VSParticleDrawOut output;
output.pos = g_bufPosVelo[input.id].pos.xyz;
float mag = g_bufPosVelo[input.id].velo.w / 9;
output.color = lerp(float4(1.0f, 0.1f, 0.1f, 1.0f), input.color, mag);
return output;
float intensity = 0.5f - length(float2(0.5f, 0.5f) - input.tex);
intensity = clamp(intensity, 0.0f, 0.5f) * 2.0f;
return float4(input.color.xyz, intensity);
Basics
As in HLSL, the primitive data types are bool, int, uint, float, and half. Doubles are not supported because they do not exist in Metal and software emulation would be too slow. A bool has no specific bit representation and therefore cannot appear in shader inputs/outputs or resources. The same limitation exists in SPIR-V, and we want to be able to use OpTypeBool in the generated SPIR-V code. WHLSL also includes the smaller integer types char, uchar, short, and ushort, which are available directly in Metal Shading Language, can be expressed in SPIR-V by giving OpTypeInt a width of 8 or 16, and can be emulated in HLSL. Emulating these types is faster than emulating double because the types are smaller and their bit representation is less complex.
As in HLSL, WHLSL has vector and matrix types, such as float4 and int3x4. We chose to keep the standard library simple rather than add a bunch of “x1” single-element vectors and matrices, because single-element vectors can already be represented as scalars, and single-element matrices as vectors. This is consistent with the desire to eliminate implicit conversions; requiring an explicit conversion between float1 and float would be cumbersome and unnecessarily verbose.
int a = 7;
a += 3;
float3 b = float3(float(a) * 5, 6, 7);
float3 c = b.xxy;
float3 d = b * c;
One difference between WHLSL and C is that WHLSL zero-initializes every uninitialized variable at its declaration site. This prevents behavior that is non-portable across operating systems and drivers, or even worse, reading whatever values happened to be in the page before the shader started executing. It also means that every constructible type in WHLSL has a zero value.
Enumerations
enum Weekday {
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    PizzaDay
}
Structures
struct Foo {
    int x;
    float y;
}
As in other shading languages, arrays are value types that are passed to and returned from functions by value (also known as “copy-in copy-out,” just like regular scalars). An array can be created with the following syntax:
int[3] x;
Placing the brackets after the type, rather than after the variable name, has two benefits:
- Having all type information in one place makes the parser simpler (it avoids the clockwise/spiral rule)
- It avoids ambiguity when declaring multiple variables in a single statement (e.g. int[10] x, y;)
One of the key ways we ensure the safety of the language is by performing bounds checks on every array access. We make this potentially expensive operation efficient in several ways. Array indices are uint, which reduces the check to a single comparison. Arrays are not sparsely implemented, and they carry a length member that is available at compile time, which brings the cost of the access close to zero.
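The single-comparison claim can be illustrated with a small Python sketch. The helper names `as_uint32` and `checked_read` are hypothetical and only model the idea, assuming a 32-bit index: once the index is unsigned, a negative index shows up as a very large number, so one `index < length` test rejects both directions.

```python
UINT32_MASK = 0xFFFFFFFF  # assumption for this sketch: indices are 32-bit unsigned


def as_uint32(value: int) -> int:
    """Reinterpret a Python int as a 32-bit unsigned integer."""
    return value & UINT32_MASK


def checked_read(array: list, index: int):
    """Read array[index] using a single unsigned comparison as the bounds check."""
    index = as_uint32(index)
    if index < len(array):  # one comparison covers "negative" and "too large"
        return array[index]
    raise IndexError(f"index {index} out of bounds for length {len(array)}")


data = [4, 5, 6]
print(checked_read(data, 1))  # in bounds
try:
    checked_read(data, -1)    # wraps to a huge unsigned value and is rejected
except IndexError as err:
    print("rejected:", err)
```

This models only the check itself; whether a failed check traps or clamps is a separate policy decision.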
To meet its safety requirements, WHLSL uses safe pointers, which are guaranteed either to point to something valid or to be null. As in C, you can use the & operator to create a pointer to an lvalue and the * operator to dereference it. Unlike C, you cannot index through a pointer as if it were an array. You cannot cast a pointer to or from a scalar value, and it has no specific bit-pattern representation. A pointer therefore cannot exist in a buffer or as a shader input/output.
The device address space corresponds to most of the memory on the device. This memory is readable and writable, and corresponds to unordered access views in Direct3D and device memory in Metal Shading Language. The constant address space corresponds to a read-only region of memory, typically optimized for data that is broadcast to every thread. Writing to an lvalue that lives in the constant address space is therefore a compile error. Finally, the threadgroup address space corresponds to a readable and writable region of memory shared among the threads of a threadgroup. It can only be used in compute shaders.
int i = 4;
thread int* j = &i;
*j = 7;
// i is now 7
thread int* i;
Array references
They correspond to the OpTypeRuntimeArray type in SPIR-V and to one of Buffer, RWBuffer, StructuredBuffer, or RWStructuredBuffer in HLSL. In Metal, an array reference is represented as a tuple of a pointer and a length. Just as with array accesses, all operations are bounds-checked against the length of the array reference. Buffers are passed to entry points through the API as either array references or pointers.
int i = 4;
thread int[] j = @i;
j[0] = 7;
// i is 7
// j.length is 1
int i = 4;
thread int* j = &i;
thread int[] k = @j;
k[0] = 7;
// i is 7
// k.length is 1
int[3] i = int[3](4, 5, 6);
thread int[] j = @i;
j[1] = 7;
// i[1] is 7
// j.length is 3
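The pointer-plus-length representation mentioned for Metal can be modeled in a few lines of Python. This is only a sketch: `ArrayRef`, `ref_to_array`, and `ref_to_element` are hypothetical names, and a Python list with an offset stands in for the pointer.

```python
from dataclasses import dataclass


@dataclass
class ArrayRef:
    """Hypothetical model of an array reference: storage, start offset, length."""
    storage: list
    offset: int
    length: int

    def _check(self, index: int) -> int:
        # Every access is checked against the reference's own length.
        if not 0 <= index < self.length:
            raise IndexError(f"index {index} out of bounds for length {self.length}")
        return self.offset + index

    def __getitem__(self, index: int):
        return self.storage[self._check(index)]

    def __setitem__(self, index: int, value) -> None:
        self.storage[self._check(index)] = value


def ref_to_array(array: list) -> ArrayRef:
    """Models `@array`: the reference covers the whole array."""
    return ArrayRef(array, 0, len(array))


def ref_to_element(array: list, index: int) -> ArrayRef:
    """Models `@` applied to a single value: a reference of length 1."""
    return ArrayRef(array, index, 1)


i = [4, 5, 6]
j = ref_to_array(i)
j[1] = 7
print(i[1], j.length)
```

Writes through the reference are visible in the original storage, which mirrors the `// i[1] is 7` comment in the WHLSL snippet.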
Functions
float4 lit(float n_dot_l, float n_dot_h, float m) {
    float ambient = 1;
    float diffuse = max(0, n_dot_l);
    float specular = n_dot_l < 0 || n_dot_h < 0 ? 0 : n_dot_h * m;
    float4 result;
    result.x = ambient;
    result.y = diffuse;
    result.z = specular;
    result.w = 1;
    return result;
}
Operators and operator overloading
But there's something else going on here, too. When the compiler sees n_dot_h * m, it does not intrinsically know how to perform the multiplication. Instead, the compiler turns it into a call to operator*(). The standard overload resolution algorithm then selects the specific operator to execute. This is important because it means you can write your own operator*() function and teach WHLSL how to multiply your own types.
int operator++(int value) {
    return value + 1;
}
Operator overloading is used throughout the language. It is how vector and matrix multiplication are implemented. It is how arrays are indexed. It is how the swizzle operators work. Operator overloading provides power and simplicity; the core language does not have to know about each operation directly, because they are implemented by overloaded operators.
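Python dispatches `a * b` to a user-defined `__mul__` in a way that is loosely analogous to the operator*() lookup described above. The sketch below is an analogy only, not WHLSL's overload resolution; the `Float3` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Float3:
    x: float
    y: float
    z: float

    def __mul__(self, other):
        # Component-wise product for vector * vector, scaling for vector * scalar.
        if isinstance(other, Float3):
            return Float3(self.x * other.x, self.y * other.y, self.z * other.z)
        if isinstance(other, (int, float)):
            return Float3(self.x * other, self.y * other, self.z * other)
        return NotImplemented


b = Float3(1.0, 2.0, 3.0)
c = Float3(2.0, 2.0, 2.0)
print(b * c)    # dispatches to Float3.__mul__(b, c)
print(b * 0.5)  # same operator, different operand type
```

The point of the analogy is that the multiplication itself lives in library code, so the core language only needs to know how to find the right function.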
Generated properties
Getters
float3 operator.xxy(float3 v) {
    float3 result;
    result.x = v.x;
    result.y = v.x;
    result.z = v.y;
    return result;
}
Setters
float4 operator.xyz=(float4 v, float3 c) {
    float4 result = v;
    result.x = c.x;
    result.y = c.y;
    result.z = c.z;
    return result;
}
float4 a = float4(1, 2, 3, 4);
a.xyz = float3(7, 8, 9);
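Because the setter shown above returns a modified copy rather than mutating its argument, an assignment such as a.xyz = float3(7, 8, 9) presumably has to be rewritten into something like "a = setter(a, ...)". The Python sketch below models the getter and the setter as pure functions on tuples; `get_xxy` and `set_xyz` are hypothetical stand-ins for operator.xxy and operator.xyz=.

```python
from typing import Tuple

Float3 = Tuple[float, float, float]
Float4 = Tuple[float, float, float, float]


def get_xxy(v: Float3) -> Float3:
    """Stand-in for `operator.xxy`: builds a new value from existing components."""
    return (v[0], v[0], v[1])


def set_xyz(v: Float4, c: Float3) -> Float4:
    """Stand-in for `operator.xyz=`: returns a modified copy, leaving `v` untouched."""
    return (c[0], c[1], c[2], v[3])


a = (1.0, 2.0, 3.0, 4.0)
a = set_xyz(a, (7.0, 8.0, 9.0))  # models `a.xyz = float3(7, 8, 9);`
print(a)
print(get_xxy((5.0, 6.0, 7.0)))
```

The fourth component survives the assignment, which is the behavior the WHLSL setter encodes with `float4 result = v;`.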
Anders
thread float* operator&.r(thread Foo* value) {
    return &value->x;
}
Indexers
float operator[](float2 v, uint index) {
    switch (index) {
        case 0:
            return v.x;
        case 1:
            return v.y;
        default:
            /* trap or clamp, more on this below */
    }
}
float2 operator[]=(float2 v, uint index, float a) {
    switch (index) {
        case 0:
            v.x = a;
            break;
        case 1:
            v.y = a;
            break;
        default:
            /* trap or clamp, more on this below */
    }
    return v;
}
The standard library
One of the design principles of WHLSL is to keep the language itself small so that as much as possible can be defined in the standard library. Of course, not every function in the standard library can be expressed in WHLSL (such as bool operator*(float, float)), but almost all of them are implemented in WHLSL. For example, this function is part of the standard library:
float smoothstep(float edge0, float edge1, float x) {
    float t = clamp((x - edge0) / (edge1 - edge0), 0, 1);
    return t * t * (3 - 2 * t);
}
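To see what the function computes, here is a direct Python port of the smoothstep above. The `clamp` helper is written out because Python has no built-in equivalent; the sample points are arbitrary.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(value, high))


def smoothstep(edge0: float, edge1: float, x: float) -> float:
    """Hermite interpolation between 0 and 1 as x moves from edge0 to edge1."""
    t = clamp((x - edge0) / (edge1 - edge0), 0.0, 1.0)
    return t * t * (3 - 2 * t)


for x in (-1.0, 0.0, 0.25, 0.5, 1.0, 2.0):
    print(x, smoothstep(0.0, 1.0, x))
```

Inputs below edge0 map to 0 and inputs above edge1 map to 1, with a smooth curve in between (for example, the midpoint maps to 0.5).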
Not every function in the HLSL standard library exists in WHLSL. For example, HLSL supports printf(), but implementing such a function in Metal Shading Language or SPIR-V would be very difficult. We include as many functions from the HLSL standard library as is reasonable in a Web environment.
Variable Lifetime
thread int* foo() {
    int a;
    return &a;
}
...
int b = *foo();
This means that this WHLSL snippet is fully valid and well-defined for two reasons:
This global lifetime is only possible because recursion is not allowed (which is common for shading languages), which means there are no reentrancy problems. Similarly, shaders cannot allocate or free memory, so the compiler knows at compile time every piece of memory that the shader could possibly access.
thread int* foo() {
    int a;
    return &a;
}
...
thread int* x = foo();
*x = 7;
thread int* y = foo();
// *x equals 0, because the variable got zero-filled again
*y = 8;
// *x equals 8, because x and y point to the same variable
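The behavior in the comments above follows if each local variable is given a single storage slot for the whole program and is zero-filled whenever its declaration executes. The Python sketch below models exactly that; `Cell` and `_FOO_A` are hypothetical names, and a real compiler would lay out the storage statically.

```python
class Cell:
    """A mutable storage slot standing in for one shader variable."""

    def __init__(self) -> None:
        self.value = 0


# One slot per local variable, allocated once for the whole program
# (assumption of this sketch, matching the "global lifetime" described above).
_FOO_A = Cell()


def foo() -> Cell:
    _FOO_A.value = 0  # `int a;` zero-fills the slot each time the declaration runs
    return _FOO_A     # `return &a;` hands out a reference to the same slot


x = foo()
x.value = 7
y = foo()
print(x.value)  # 0: the declaration ran again and zero-filled the shared slot
y.value = 8
print(x.value)  # 8: x and y refer to the same variable
```

Nothing dangles, because the slot outlives every call to foo(); the price is that two calls alias the same storage.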
Compilation phases
WHLSL is designed for two-stage compilation. In our research, we found that many 3D engines want to compile large shaders, and each compilation includes a large library of functions that are repeated between compilations. Rather than compiling these support functions multiple times, a better solution would be to compile the entire library at once and then allow the second stage to choose which entry points in the library should be used together.
The second compilation phase also provides a convenient place to specify specialization constants. Recall that WHLSL has no preprocessor, which is the traditional way to enable and disable features in HLSL. Engines typically customize a single shader for a particular situation by enabling rendering effects or by swapping out the BRDF at the flip of a switch. The technique of including every rendering option in a single shader, and specializing that shader according to which effects are enabled, is so common that it has a name: ubershaders. Instead of preprocessor macros, WHLSL programmers can use specialization constants, which work the same way as SPIR-V's specialization constants. From the language's point of view, they are just scalar constants. However, the values of these constants are supplied during the second compilation phase, which makes it very easy to configure the program at run time.
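The two-phase idea can be sketched in Python with closures: the library is built once, and the second phase picks an entry point and supplies the constant values. Everything here (`compile_library`, `specialize`, the `num_lights` and `use_fog` constants) is hypothetical and only illustrates the shape of the workflow, not the WebGPU API.

```python
from typing import Callable, Dict

Shader = Callable[[float], float]


def compile_library() -> Dict[str, Callable[[Dict[str, int]], Shader]]:
    """Phase one (hypothetical): build every entry point once, constants left open."""

    def make_shade(constants: Dict[str, int]) -> Shader:
        num_lights = constants["num_lights"]  # specialization constant
        use_fog = constants["use_fog"]        # specialization constant

        def shade(base: float) -> float:
            lit = base * num_lights
            return lit * 0.5 if use_fog else lit

        return shade

    return {"shade": make_shade}


def specialize(library, entry_point: str, constants: Dict[str, int]) -> Shader:
    """Phase two (hypothetical): choose an entry point and supply constant values."""
    return library[entry_point](constants)


library = compile_library()  # done once for the whole shader library
plain = specialize(library, "shade", {"num_lights": 2, "use_fog": 0})
foggy = specialize(library, "shade", {"num_lights": 2, "use_fog": 1})
print(plain(1.0), foggy(1.0))
```

The same library yields two differently configured shaders without being rebuilt, which is the role the preprocessor plays in an ubershader workflow.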
compute void ComputeKernel(device uint[] b : register(u0)) {
    ...
}
Safety
Another way WHLSL achieves safety is by performing bounds checks on array and pointer accesses. These bounds checks may take three forms:
2. Clamping. An array index operation can clamp the index to the size of the array. This involves no new control flow, so it has no effect on uniformity. It is even possible to “clamp” pointer accesses or zero-length array accesses by ignoring writes and returning 0 for reads. This works because the operations available on a pointer in WHLSL are limited, so each operation can simply do something well defined with a “clamped” pointer.
3. Hardware and driver support. Some hardware and drivers already include a mode in which out-of-bounds accesses cannot occur. With this approach, the mechanism by which the hardware prevents out-of-bounds access is implementation-defined. One example is the ARB_robustness OpenGL extension. Unfortunately, WHLSL should run on almost all modern hardware, and not enough APIs and devices support these modes.
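The clamping behavior in item 2 can be modeled as follows. This is a sketch of the semantics only; the function names are hypothetical, and the Python branches say nothing about how a real implementation would emit branch-free code.

```python
def clamped_read(array: list, index: int):
    """Read with the index clamped into range; an empty array reads as 0."""
    if not array:
        return 0
    return array[min(max(index, 0), len(array) - 1)]


def clamped_write(array: list, index: int, value) -> None:
    """Write with the index clamped into range; writes to an empty array are ignored."""
    if not array:
        return
    array[min(max(index, 0), len(array) - 1)] = value


data = [10, 20, 30]
print(clamped_read(data, 99))  # index clamped to the last element
clamped_write(data, 99, 7)     # lands on the last element as well
print(data)
print(clamped_read([], 0))     # zero-length access reads as 0
```

An out-of-bounds access never touches memory outside the array; it is silently redirected to a valid element, which is safe but can hide bugs in the shader.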
To determine the best behavior for bounds checks, we ran some performance experiments. We took some of the kernels used in the Metal Performance Shaders framework and created two new versions of each: one using clamping and one using trapping. The kernels we chose are ones that perform a lot of array accesses, for example, multiplication of large matrices. We ran this benchmark on a variety of devices with different data sizes. We made sure that no trap was actually hit and that no clamp actually had any effect, so we could be sure we were measuring the common case of a correctly written program.
- Built-in variables, such as uint vertexID : SV_VertexID
- Specialization constants, such as uint numlights : specialized
- Stage input/output semantics, such as float2 coordinate : attribute(0)
- Resource semantics, such as device float[] coordinates : register(u0)
To accommodate this, the return value of a shader can be a structure, and its fields are treated independently. In fact, this works recursively: a structure can contain another structure, whose members are also treated independently. Nested structures are flattened, and all the non-structure fields are collected and treated as shader outputs.
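The flattening step is a plain recursive walk. In the Python sketch below, nested dictionaries stand in for nested structures; the field names are made up for illustration.

```python
from typing import Any, Dict, Iterator, Tuple


def flatten(value: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_name, leaf) pairs for every non-struct field, depth first."""
    for name, field in value.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(field, dict):   # nested struct: recurse into it
            yield from flatten(field, path)
        else:                         # leaf field: one shader output
            yield path, field


# Hypothetical shader return value with one nested structure.
vertex_output = {
    "position": (0.0, 0.0, 0.0, 1.0),
    "material": {"color": (1.0, 0.1, 0.1, 1.0), "uv": (0.5, 0.5)},
}
for name, leaf in flatten(vertex_output):
    print(name, leaf)
```

Each leaf that comes out of the walk is one independent output, which is where a semantic would be attached.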
After flattening all of these structures into a set of inputs and a set of outputs, each item in the collection must have semantics. Each built-in variable must have a specific type and can only be used in a specific shader phase. Specialized constants must have only simple scalar types.
HLSL programmers should be familiar with resource semantics. WHLSL includes both resource semantics and address spaces, but the two serve different purposes. The address space of a variable determines which cache and memory hierarchy it is accessed through. The address space is necessary because it persists through pointer operations; a device pointer cannot be set to point to a thread variable. In WHLSL, resource semantics are used only to identify variables to the WebGPU API. However, for consistency with HLSL, the resource semantic must “match” the address space of the variable it is placed on. For example, you cannot place register(s0) on a texture. You cannot place register(u0) on a constant resource. Arrays in WHLSL have no address space (because they are value types, not reference types), so if an array appears as a shader parameter, it is treated as a device resource for the purpose of matching semantics.
Logical mode restrictions
WHLSL's design requirements mandate compatibility with Metal Shading Language, SPIR-V, and HLSL (or DXIL). SPIR-V has many different operating modes, targeted by different embedding APIs. Specifically, we are interested in the flavor of SPIR-V that Vulkan targets.
Because WHLSL needs to be compatible with SPIR-V, WHLSL cannot be more expressive than SPIR-V. As a result, WHLSL has some restrictions that make it expressible in SPIR-V logical mode. These restrictions are not surfaced as an optional mode of WHLSL; rather, they are part of the language itself. We hope to remove these restrictions in a future version of the language, but until then, the language is restricted.
But not so fast! Recall that thread variables have a global lifetime, which means they behave as if they were declared at the beginning of the entry point. What if the runtime gathered all these local variables together, sorted them by type, and aggregated all variables of the same type into an array? A pointer could then simply be an offset into the appropriate array. In WHLSL, pointers cannot be cast to a different type, which means the compiler can statically determine the corresponding array. Therefore, thread pointers do not need to comply with the above restrictions. However, this technique does not work for pointers in other address spaces; it only applies to thread pointers.
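The "pointer as an offset into a per-type array" idea can be sketched like this. The variable list, `layout`, `store`, and `load` are all hypothetical; the sketch only shows that once variables are grouped by type, a pointer needs nothing more than a type and an index.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical (name, type) pairs for all thread-local variables in a program.
LOCALS: List[Tuple[str, str]] = [("i", "int"), ("mag", "float"), ("count", "int")]


def layout(locals_: List[Tuple[str, str]]):
    """Group variables by type; a pointer becomes (type, offset into that type's array)."""
    arrays: Dict[str, list] = defaultdict(list)
    pointers: Dict[str, Tuple[str, int]] = {}
    for name, type_name in locals_:
        pointers[name] = (type_name, len(arrays[type_name]))
        arrays[type_name].append(0)  # variables start zero-filled
    return arrays, pointers


arrays, pointers = layout(LOCALS)


def store(pointer: Tuple[str, int], value) -> None:
    type_name, offset = pointer
    arrays[type_name][offset] = value  # `*p = value` becomes an array write


def load(pointer: Tuple[str, int]):
    type_name, offset = pointer
    return arrays[type_name][offset]   # `*p` becomes an array read


store(pointers["count"], 7)
print(pointers["count"], load(pointers["count"]), dict(arrays))
```

Because a pointer's type never changes, the array it indexes is known statically and only the offset has to be carried at run time.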
Resources
Depth textures are distinct from non-depth textures because they are different types in Metal Shading Language, so the compiler needs to know which one to emit when it generates Metal Shading Language. Because WHLSL does not support member functions, texture sampling is not written as texture.Sample(…); instead, it uses a free function such as Sample(texture, …).
The WebGPU API automatically issues some resource barriers at specific places, which means the API needs to know which resources a shader uses. Therefore, a “bindless” resource model cannot be used, and all resources are listed as explicit inputs to the shader. Similarly, the API wants to know which resources are read and which are written; the compiler knows this statically by inspecting the program. There is no language-level support for “const” and no distinction between StructuredBuffer and RWStructuredBuffer, because that information is already present in the program.
Current progress
Future directions
For the first proposal, we want to satisfy the constraints outlined at the beginning of this article while leaving ample opportunity to extend the language. A natural evolution of the language could add facilities for type abstraction, such as protocols or interfaces. WHLSL contains simple structures with no access control or inheritance. Other shading languages, such as Slang, model type abstraction as a set of methods that must exist inside a structure. However, Slang runs into the problem that an existing type cannot be made to conform to a new interface. Once a structure is defined, you cannot add new methods to it; the closing curly brace seals the structure forever. This problem can be solved with extensions, similar to those in Objective-C or Swift, which retroactively add methods to a structure after it has been defined. Java addresses the problem by encouraging authors to add new classes (called adapters) that exist solely to implement an interface and forward each call to the implementing type.
Conclusion
Please join us! We are doing this work in the WebGPU GitHub project. We have been working on a formal specification of the language, a reference compiler that emits Metal Shading Language and SPIR-V, and a CPU-side interpreter for verifying correctness. We welcome you to give it a try and let us know how it works!
English text: https://webkit.org/blog/8482/web-high-level-shading-language/