OpenGL ES & Metal on iOS: the Metal Shading Language and what to watch for when writing Metal
Metal Shading Language
1. Definition and function
The Metal Shading Language (MSL) is a programming language for writing 3D graphics rendering logic and parallel compute logic. You need it whenever you use the Metal framework in your app.
Metal source code is compiled with Clang and LLVM.
MSL was originally designed on top of C++11 (later versions of the specification are based on C++14). It is mainly used to write rendering code and general-purpose parallel compute code that runs on the GPU.
2. Points to note
- Restrictions on pointers:
  - Pointer parameters of graphics and compute functions must be declared with an address space modifier (device, threadgroup, or constant)
  - Function pointers are not supported
- A function cannot be named main
- The origin of the Metal pixel coordinate system is the top-left corner of the texture or framebuffer attachment
3. Differences between Metal and C++11
Although Metal is based on C++11, the following C++11 features are not supported:
- Lambda expressions
- Recursive function calls
- The dynamic_cast operator
- Run-time type identification (RTTI)
- The new and delete operators
- The noexcept operator
- goto statements
- The register and thread_local storage class specifiers
- Virtual functions
- Derived classes
- Exception handling
- The C++ standard library is also unavailable in Metal
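Because recursion is on that list, an algorithm that is naturally recursive has to be rewritten as a loop before it can go into a shader. The sketch below shows the idea in plain C++ rather than MSL so that it can be compiled and run on the CPU. The name factorial_iterative is a hypothetical helper chosen for illustration, and the assert header is only there for the range check; it would not be available in Metal.

```cpp
#include <cassert>

// Hypothetical helper (not a Metal API): n! computed with a loop.
// The recursive form, n * factorial(n - 1), would be rejected by the
// Metal compiler because recursive calls are not supported.
inline unsigned int factorial_iterative(unsigned int n) {
    assert(n <= 12u);  // 13! no longer fits in a 32-bit unsigned int
    unsigned int result = 1u;
    for (unsigned int i = 2u; i <= n; ++i) {
        result *= i;
    }
    return result;
}
```

Calling factorial_iterative(5) walks the loop for i = 2, 3, 4, 5 and returns 120. The same loop-based shape applies to other recursive patterns, usually with an explicit fixed-size stack when the recursion is not a simple chain.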
Basic data types
Unlike GLSL, Metal allows a float literal to carry an f or F suffix. Likewise, a half literal may carry an h or H suffix.
1. Scalars
- Note that the unsigned qualifier can be abbreviated to a u prefix. For example, unsigned char can be written as uchar.
    bool a = true;
    char b = 5;
    int c = 15;
    size_t d = 1;
    ptrdiff_t e = 2;
2. Vectors
booln, charn, shortn, intn, ucharn, ushortn, uintn, halfn, floatn (n is the number of components: 2, 3, or 4)
    // Initialization style 1: braces
    bool2 A = {1, 0};
    float4 pos = {1.0, 2.0, 3.0, 4.0};
    // Initialization style 2: constructor call
    bool2 B = bool2(1, 0);
    float4 pos2 = float4(1.0, 2.0, 3.0, 4.0);
    // Read components by index
    float x = pos[0];
    float y = pos[1];
    // Assign to float4 VB in a for loop
    float4 VB;
    for (int i = 0; i < 4; i++) {
        VB[i] = pos[i] * 2.0f;
    }
There are only two sets of component names for accessing vector elements: xyzw and rgba. Within one set the components may appear in any order, but the two sets cannot be mixed in a single access. A component may not be repeated on the left-hand side of an assignment; when reading, components may be repeated.
    int4 test = int4(0, 1, 2, 3);
    int a = test.x;
    int b = test.y;
    int c = test.z;
    int d = test.w;
    int e = test.r;
    int f = test.g;
    int g = test.b;
    int h = test.a;

    float4 col;
    col.xyzw = float4(1.0f, 2.0f, 3.0f, 4.0f);
    col.z = 1.0f;
    col.xy = float2(3.0f, 4.0f);
    col.xyz = float3(3.0f, 4.0f, 5.0f);

    float4 pos = float4(1.0f, 2.0f, 3.0f, 4.0f);
    float4 swiz = pos.wxyz;  // swiz = (4.0f, 1.0f, 2.0f, 3.0f)
    float4 dup = pos.xxyy;   // dup = (1.0f, 1.0f, 2.0f, 2.0f)

    // pos = (5.0f, 2.0f, 3.0f, 6.0f)
    pos.xw = float2(5.0f, 6.0f);
    // pos = (8.0f, 2.0f, 3.0f, 7.0f)
    pos.wx = float2(7.0f, 8.0f);
    // pos = (3.0f, 5.0f, 9.0f, 7.0f)
    pos.xyz = float3(3.0f, 5.0f, 9.0f);

    pos.xx = float2(2.0f, 3.0f);  // Invalid: x cannot appear twice in an assignment
Be careful not to access components that are out of bounds:
    float2 pos;
    pos.x = 1.0f;   // Valid
    pos.z = 1.0f;   // Invalid: a float2 has no z component
    float3 pos2;
    pos2.z = 1.0f;  // Valid
    pos2.w = 1.0f;  // Invalid: only 3 components are defined, so the fourth is out of bounds
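The swizzle rules above (one naming set per access, no repeated component when assigning, no component beyond the vector width) can be written down as a small checker. This is a CPU-side C++ sketch for illustration only, not how the Metal compiler is implemented; parse_swizzle and component_index are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Map a component letter to its index (0..3). `set` receives 0 for xyzw
// and 1 for rgba. Returns -1 for any other character.
static int component_index(char c, int& set) {
    static const std::string xyzw = "xyzw";
    static const std::string rgba = "rgba";
    std::size_t i = xyzw.find(c);
    if (i != std::string::npos) { set = 0; return static_cast<int>(i); }
    i = rgba.find(c);
    if (i != std::string::npos) { set = 1; return static_cast<int>(i); }
    return -1;
}

// Hypothetical checker: returns true and fills `indices` when `pattern`
// is a legal swizzle for a vector with `width` components (2, 3, or 4).
// Pass is_assignment = true when the swizzle is on the left-hand side.
bool parse_swizzle(const std::string& pattern, int width,
                   bool is_assignment, std::vector<int>& indices) {
    assert(width >= 2 && width <= 4);
    if (pattern.empty() || pattern.size() > 4) return false;

    std::vector<int> result;
    bool used[4] = {false, false, false, false};
    int first_set = -1;
    for (char c : pattern) {
        int set = -1;
        const int idx = component_index(c, set);
        if (idx < 0 || idx >= width) return false;     // unknown or out of bounds
        if (first_set == -1) first_set = set;
        if (set != first_set) return false;            // xyzw and rgba mixed
        if (is_assignment && used[idx]) return false;  // repeated on the left side
        used[idx] = true;
        result.push_back(idx);
    }
    indices = result;
    return true;
}
```

With this sketch, "wxyz" on a 4-component vector yields the indices 3, 0, 1, 2, while "xx" as an assignment target, "xg" in any position, and "z" on a 2-component vector are all rejected.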
All the ways to construct two-, three-, and four-component vectors:
    // All possible constructors of a float2 vector
    float2(float x);
    float2(float x, float y);
    float2(float2 x);
    // All possible constructors of a float3 vector
    float3(float x);
    float3(float x, float y, float z);
    float3(float a, float2 b);
    float3(float2 a, float b);
    float3(float3 x);
    // All possible constructors of a float4 vector
    float4(float x);
    float4(float x, float y, float z, float w);
    float4(float2 a, float2 b);
    float4(float2 a, float b, float c);
    float4(float a, float2 b, float c);
    float4(float a, float b, float2 c);
    float4(float3 a, float b);
    float4(float a, float3 b);
    float4(float4 x);
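3. Matrices
halfnxm and floatnxm (n is the number of columns and m is the number of rows; a floatnxm matrix is made of n floatm column vectors)
    // Define a 4x4 matrix m; its last element (4th column, 4th row) is m[3][3]
    float4x4 m;
    // Set the first element of the first column to 1.0f
    m[0][0] = 1.0f;
    // Set every element of the second column to 2.0f
    m[1] = float4(2.0f);
    // Set the fourth element of the third column to 3.0f
    m[2][3] = 3.0f;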
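The indexing order is easy to get backwards: the first index selects a column and the second selects a row within that column. A minimal CPU-side C++ sketch of that layout follows; Mat4 and flat_index are hypothetical stand-ins for illustration and are not Metal types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for float4x4: four columns, each holding four floats.
struct Mat4 {
    std::array<std::array<float, 4>, 4> columns{};  // zero-initialized

    // m[c] selects column c, so m[c][r] is row r of column c.
    std::array<float, 4>& operator[](std::size_t c) {
        assert(c < 4);
        return columns[c];
    }
};

// Offset of element (column c, row r) when the 16 floats are stored
// column after column in one flat array.
inline std::size_t flat_index(std::size_t c, std::size_t r) {
    assert(c < 4 && r < 4);
    return c * 4 + r;
}
```

Under this model, m[2][3] is the fourth element of the third column and sits at flat offset 11, which is the value a host program would use when filling a buffer of 16 floats for a 4x4 matrix.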
Texture and sampler types
1. Texture types
A texture type is a handle to 1D, 2D, or 3D texture data. How a texture may be accessed is described by the following enumeration:
    enum class access {
        sample, // The texture can be sampled, that is, read with or without a sampler
        read,   // A graphics or compute function can read the texture without a sampler
        write   // A graphics or compute function can write data to the texture
    };
    // sample is the default; it allows sampling and reading, but not writing
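There are three texture types:
    texture1d<T, access a = access::sample>
    texture2d<T, access a = access::sample>
    texture3d<T, access a = access::sample>
T is the data type of the color values read from or written to the texture. T can be half, float, short, int, and so on.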
Code example:
    /*
     Type                              Variable   Attribute         Access
     texture2d<float>                  imgA       [[texture(0)]]    sample (default)
     texture2d<float, access::read>    imgB       [[texture(1)]]    read
     texture2d<float, access::write>   imgC       [[texture(2)]]    write
    */
    void foo(texture2d<float> imgA [[texture(0)]],
             texture2d<float, access::read> imgB [[texture(1)]],
             texture2d<float, access::write> imgC [[texture(2)]])
    {
        ...
    }
2. Sampler types
The sampler type determines how a texture is sampled. In the Metal framework, the MTLSamplerState object corresponds to the sampler of the shading language. It is passed as an argument to a graphics or compute function. A sampler has the following properties:
- Whether texture coordinates are normalized when sampling from a texture
    enum class coord { normalized, pixel };
- Filter mode for texture sampling, applied to both magnification and minification
    enum class filter { nearest, linear };
- Minification filter mode for texture sampling
    enum class min_filter { nearest, linear };
- Magnification filter mode for texture sampling
    enum class mag_filter { nearest, linear };
- Addressing mode for all texture coordinates
    enum class address { clamp_to_zero, clamp_to_edge, repeat, mirrored_repeat };
- Addressing mode for the s, t, and r texture coordinates individually
    enum class s_address { clamp_to_zero, clamp_to_edge, repeat, mirrored_repeat };
    enum class t_address { clamp_to_zero, clamp_to_edge, repeat, mirrored_repeat };
    enum class r_address { clamp_to_zero, clamp_to_edge, repeat, mirrored_repeat };
- Mipmap filter mode for texture sampling (with none, only one mipmap level is used)
    enum class mip_filter { none, nearest, linear };
Note: a sampler that is initialized in Metal shader source must be declared with the constexpr qualifier. Code example:
    // Set the coordinate mode, the addressing mode, and the filter mode
    constexpr sampler s(coord::pixel, address::clamp_to_zero, filter::linear);
    // Properties that are not listed keep their default values
    constexpr sampler a(coord::normalized);
    constexpr sampler b(address::repeat);
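The addressing modes decide what happens when a coordinate falls outside the texture. The sketch below models them in CPU-side C++ for one axis and integer texel coordinates only. It ignores filtering and normalized coordinates, so it is a simplified illustration of the idea rather than the exact GPU behavior. Address and resolve_texel are hypothetical names that mirror the enum shown earlier.

```cpp
#include <cassert>

// Mirrors the MSL address enum for illustration.
enum class Address { clamp_to_zero, clamp_to_edge, repeat, mirrored_repeat };

// Resolve texel coordinate i against an axis that is `size` texels long.
// Returns the texel index to read, or -1 when the mode produces the
// out-of-bounds "zero" color instead of reading a texel.
inline int resolve_texel(int i, int size, Address mode) {
    assert(size > 0);
    switch (mode) {
    case Address::clamp_to_zero:
        return (i < 0 || i >= size) ? -1 : i;
    case Address::clamp_to_edge:
        return i < 0 ? 0 : (i >= size ? size - 1 : i);
    case Address::repeat:
        // Wrap around; the extra "+ size" keeps negative inputs positive.
        return ((i % size) + size) % size;
    case Address::mirrored_repeat: {
        // The pattern repeats every 2 * size texels: forward, then backward.
        const int period = 2 * size;
        const int m = ((i % period) + period) % period;
        return m < size ? m : period - 1 - m;
    }
    }
    return -1;  // not reached for valid enum values
}
```

For a 4-texel axis, coordinate 5 resolves to 3 with clamp_to_edge, to 1 with repeat, and to 2 with mirrored_repeat, while clamp_to_zero reports it as out of bounds.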
Function modifiers
Metal has the following three function modifiers: kernel, vertex, and fragment.
1. kernel
Indicates that the function is a data-parallel compute function. It can be dispatched over a 1D, 2D, or 3D grid of threads.
    kernel void foo(...) { ... }
2. vertex
Indicates that the function is a vertex shader function. It runs once for each vertex in the vertex data stream and produces per-vertex output for the rendering pipeline.
3. fragment
Indicates that the function is a fragment shader function. It runs once for each fragment in the fragment stream, together with its associated data, and outputs the color data generated for each fragment to the rendering pipeline.
Note:
- A function declared with one of these three modifiers cannot be called by another function declared with one of them; doing so causes a compilation failure. Ordinary functions can be called from them.
- A function declared with kernel must have a return type of void.
- Only graphics functions can be declared with vertex or fragment. For graphics functions, the return type identifies whether the output is per-vertex or per-fragment.
- A graphics function may return void, but it then produces no output for the rendering pipeline, which makes it pointless.
Address space modifiers (for variables or parameters)
There are four address space modifiers: device, threadgroup, constant, and thread.
An address space modifier indicates which memory region a function variable or parameter is allocated in.
- Every pointer or reference parameter of a shader function (vertex, fragment, kernel) must carry an address space modifier
- For graphics functions, pointer or reference parameters must be declared in the device or constant address space
- For compute functions, pointer or reference parameters must be declared in the device, threadgroup, or constant address space
1. device address space
The device address space refers to buffer objects allocated from the device memory pool (here, device means the GPU and its video memory). They are both readable and writable. A buffer object can be declared as a pointer or reference to a scalar, a vector, or a user-defined structure.
- Data placed in video memory can be read quickly by the GPU
- Texture objects are also placed in video memory by default, without the device modifier
    // 1. A pointer variable in the device address space
    device float4 *color;
    // 2. A pointer to a structure in the device address space
    struct Foo {
        float a[3];
        int b[2];
    };
    device Foo *my_info;
2. threadgroup address space
The threadgroup address space is used to allocate variables for compute functions. These variables are shared by all threads in a threadgroup. Variables allocated in the threadgroup address space cannot be used in graphics functions (vertex or fragment functions).
In a compute function, variables allocated in the threadgroup address space are used by one threadgroup, and their lifetime is the same as that of the threadgroup.
    kernel void my_func(threadgroup float *a [[threadgroup(0)]], ...)
    {
        // Allocate a float x in the threadgroup address space
        threadgroup float x;
        // Allocate an array y of 10 floats in the threadgroup address space
        threadgroup float y[10];
    }
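To make the sharing and lifetime rules concrete, the following CPU-side C++ sketch emulates one common use of threadgroup memory: every thread of a group writes one value into a shared scratch array, and the group then reduces it to a single sum. It is an emulation under stated assumptions, not Metal code: the threads of a group run one after another here, whereas on the GPU they run concurrently and a kernel would need threadgroup_barrier(mem_flags::mem_threadgroup) between the two phases. The name sum_per_threadgroup is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a 1D dispatch in which each threadgroup sums its own slice.
std::vector<float> sum_per_threadgroup(const std::vector<float>& input,
                                       std::size_t threads_per_group) {
    assert(threads_per_group > 0 && input.size() % threads_per_group == 0);
    const std::size_t group_count = input.size() / threads_per_group;
    std::vector<float> group_sums(group_count);

    for (std::size_t group = 0; group < group_count; ++group) {
        // Stand-in for a `threadgroup float scratch[N]` variable: created
        // when the group starts, visible to all of its threads, and gone
        // when the group finishes.
        std::vector<float> scratch(threads_per_group);

        // Phase 1: every thread writes its own slot.
        for (std::size_t t = 0; t < threads_per_group; ++t) {
            scratch[t] = input[group * threads_per_group + t];
        }

        // (On the GPU, a threadgroup barrier would be required here.)

        // Phase 2: one thread reads every slot written by its group.
        float total = 0.0f;
        for (std::size_t t = 0; t < threads_per_group; ++t) {
            total += scratch[t];
        }
        group_sums[group] = total;
    }
    return group_sums;
}
```

With eight inputs 1 through 8 and four threads per group, the first group produces 10 and the second 26. No group ever sees the scratch array of another group, which is the property the threadgroup address space provides.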
3. constant address space
- Buffer objects in the constant address space are also allocated from the device memory pool (video memory), but they are read-only.
- At program scope, a variable declared constant must be initialized at its declaration.
- At program scope, a constant variable has the same lifetime as the program. Whichever compute or graphics function is called, its value stays the same: once initialized, it cannot be changed.
Note: a constant variable that is not initialized, or that is modified later, produces a compilation error.
    // ✅ Valid: initialized at the declaration
    constant float samples[] = {1.0f, 2.0f, 3.0f, 4.0f};
    // ❌ Error: a constant variable cannot be modified
    samples[0] = 3.0f;
    // ❌ Error: not initialized, compilation fails
    constant float a;
4. thread address space
The thread address space is private to each thread. Variables defined in it are not visible to other threads. Variables declared inside a graphics or compute function are allocated in the thread address space.
- Variables declared threadgroup are shared among the threads of a threadgroup; variables declared thread belong to a single thread
- In graphics functions, which cannot use threadgroup, thread can be used instead
    kernel void my_func(...)
    {
        // x and p are both allocated in the thread address space
        float x;
        thread float *p = &x;
    }
Attribute modifiers (for function parameters and variables)
The inputs and outputs of graphics and compute functions are passed as parameters. Apart from variables in the constant address space and samplers defined at program scope, a parameter can be one of the following five kinds:
- Device buffer: a pointer or reference to any data type in the device address space
- Constant buffer: a pointer or reference to any data type in the constant address space
- Texture: a texture object
- Sampler: a sampler object
- Threadgroup: a buffer shared by the threads of a threadgroup
So why are attribute modifiers needed?
- To indicate the location of a resource passed as a parameter. A location can be thought of as a port, comparable to location in OpenGL ES.
- To pass built-in variables between the fixed-function and programmable stages of the pipeline
- To pass data from the vertex function to the fragment function along the render pipeline
For each shader function, an attribute modifier must be specified to set the location of every buffer, texture, and sampler parameter. The attribute modifiers are written as follows:
- device buffer -> [[buffer(index)]]
- constant buffer -> [[buffer(index)]]
- texture -> [[texture(index)]]
- sampler -> [[sampler(index)]]
- threadgroup -> [[threadgroup(index)]]
The index is chosen by the developer. It can be an unsigned integer or a custom enumeration value (for example, one defined in a bridging .h file) and gives the position of the buffer, texture, or sampler parameter in the corresponding argument table. The attribute modifier is usually written after the variable name. The same rule applies to all four forms: [[buffer(index)]] follows a device or constant buffer variable, [[texture(index)]] a texture variable, [[sampler(index)]] a sampler variable, and [[threadgroup(index)]] a threadgroup variable.
Code sample
    // The attribute modifier [[buffer(index)]] sets the buffer location of a shader parameter
    /*
     kernel: marks a compute function
     void: the return type (a kernel function must return void)
     add_vectors: the function name
     const device float4 *inA [[buffer(0)]]:
         const: the data is not modified through this pointer
         device: the address space
         float4: the data type
         inA: the variable name
         [[buffer(0)]]: the buffer location
     uint id [[thread_position_in_grid]]: the position of the current thread in the
         thread grid; it is supplied by Metal and is not passed by the developer
    */
    kernel void add_vectors(const device float4 *inA [[buffer(0)]],
                            const device float4 *inB [[buffer(1)]],
                            device float4 *out [[buffer(2)]],
                            uint id [[thread_position_in_grid]])
    {
        out[id] = inA[id] + inB[id];
    }
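It can help to see what the dispatch of such a kernel amounts to. The CPU-side C++ sketch below runs the same per-thread body once for every position of a 1D grid, with the loop counter playing the role of [[thread_position_in_grid]]. It is an emulation for illustration: Float4, add_vectors_thread, and dispatch_add_vectors are hypothetical names, and a real GPU runs the threads in parallel rather than in a loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Float4 = std::array<float, 4>;  // stand-in for the MSL float4 type

// The kernel body for ONE thread. `id` corresponds to the parameter
// marked [[thread_position_in_grid]] in the shader.
inline void add_vectors_thread(const Float4* inA, const Float4* inB,
                               Float4* out, std::size_t id) {
    for (std::size_t k = 0; k < 4; ++k) {
        out[id][k] = inA[id][k] + inB[id][k];
    }
}

// Stand-in for dispatching a 1D grid with one thread per element.
std::vector<Float4> dispatch_add_vectors(const std::vector<Float4>& a,
                                         const std::vector<Float4>& b) {
    assert(a.size() == b.size());
    std::vector<Float4> out(a.size());
    for (std::size_t id = 0; id < a.size(); ++id) {
        add_vectors_thread(a.data(), b.data(), out.data(), id);
    }
    return out;
}
```

Each thread touches only the element at its own id, so no two threads write the same location. That independence is what allows the GPU to run them in any order or all at once.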
Common built-in variable attribute modifiers
- [[vertex_id]]: the vertex ID; it is not passed by the developer
- [[position]]:
  - In a vertex function, the vertex position, of type float4
  - In a fragment function, the window-relative coordinates (x, y, z, w) of the fragment, that is, the location of the pixel on the screen
- [[point_size]]: the size of a point, of type float
- [[color(m)]]: a color attachment; m must be known at compile time
- [[stage_in]]: the data output by the vertex function, after rasterization, passed into the fragment function. Note that a vertex or fragment function can declare only one parameter with stage_in. The parameter can be a structure whose members are integer or floating-point scalars or vectors.
    struct MyFragmentOutput {
        // Color attachment 0
        float4 clr_f [[color(0)]];
        // Color attachment 1
        int4 clr_i [[color(1)]];
        // Color attachment 2
        uint4 clr_ui [[color(2)]];
    };

    fragment MyFragmentOutput my_frag_shader( ... )
    {
        MyFragmentOutput f;
        ...
        f.clr_f = ...;
        ...
        return f;
    }