DirectXMath is a math library developed by Microsoft. Its API provides C++ types and functions that use SIMD (single instruction, multiple data) to perform common linear algebra and graphics-related operations in DirectX applications. For Windows 8 and later, DirectXMath is part of the Windows SDK and supports the SSE2 (Streaming SIMD Extensions 2) instruction set.

This article briefly introduces vectors in DirectXMath, including the vector types, compiler optimization, and commonly used functions.

SIMD and SSE2

SIMD (Single Instruction, Multiple Data) is a technique in which a single control unit drives multiple processing units so that the same operation is applied simultaneously to every element of a set of data (also known as a “data vector”), achieving spatial parallelism. Examples of SIMD instruction sets in microprocessors include Intel’s MMX and SSE and AMD’s 3DNow!.

SSE2 (Streaming SIMD Extensions 2) is a SIMD instruction set for the IA-32 architecture. It was introduced in 2000 with the release of Intel’s first-generation Pentium 4 processor. It extends the earlier SSE instruction set and is intended to completely replace the MMX instruction set.

Using 128-bit SIMD registers, SSE2 can operate on up to four 32-bit floats or ints simultaneously with a single instruction. Vectors in graphics applications generally have no more than four dimensions, so SIMD instructions map naturally onto them and can improve the performance of graphics applications.
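
To make the "four floats with one instruction" idea concrete, here is a minimal sketch written directly against the SSE intrinsics rather than DirectXMath. The helper name AddFour and the SKETCH_HAS_SSE2 macro are assumptions made for illustration; on targets without SSE2 the sketch falls back to a scalar loop that produces the same result.

```cpp
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE float intrinsics)
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical helper: adds two arrays of four floats element-wise.
void AddFour(const float a[4], const float b[4], float out[4])
{
#if SKETCH_HAS_SSE2
    __m128 va  = _mm_loadu_ps(a);      // load four floats (no alignment required)
    __m128 vb  = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(va, vb);   // a single instruction adds all four lanes
    _mm_storeu_ps(out, sum);           // write the four results back to memory
#else
    // Scalar fallback for targets without SSE2: four separate additions.
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

With the SSE2 path, the four additions collapse into one _mm_add_ps call; the scalar path shows the work that instruction replaces.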

Types and compiler optimization

To take advantage of the performance gains that SIMD brings, there are some compiler, operating system, and hardware related issues to be aware of when using the DirectXMath library.

Visual Studio compiler configuration

Since we’re going to use DirectXMath from Microsoft, the natural choice for developing on Windows is of course Visual Studio, “the first IDE in the universe” (I’m using Visual Studio 2017 Community).

For x86 platforms, we need to enable the SSE2 instruction set:

Project Properties > Configuration Properties > C/C++ > Code Generation > Enable Enhanced Instruction Set > Streaming SIMD Extensions 2 (/arch:SSE2)

On the x64 platform, there is no need to enable the SSE2 instruction set explicitly, because all x64 CPUs support it.

In addition, we should enable the fast floating-point model (/fp:fast) for all platforms:

Project Properties > Configuration Properties > C/C++ > Code Generation > Floating Point Model > Fast (/fp:fast)

Vector types and memory alignment

In DirectXMath, the core vector type is XMVECTOR, which corresponds to a SIMD register. It is a 128-bit type that can hold and process four 32-bit floats at the same time. It is defined as follows:

typedef __m128 XMVECTOR;

__m128 is a SIMD type, and we must use this type for vector calculation to take advantage of SIMD.

Note that XMVECTOR values must be 16-byte aligned in memory (look it up if you’re not sure what memory alignment is and why it matters). For global and local variables, the compiler handles this alignment automatically, but using XMVECTOR directly as a class data member can cause alignment problems, so for data members it is recommended to use XMFLOAT2, XMFLOAT3, or XMFLOAT4 instead. The XMFLOAT3 type is defined as follows (XMFLOAT2 and XMFLOAT4 are similar, differing only in the number of floating-point members):

struct XMFLOAT3
{
    float x;
    float y;
    float z;

    XMFLOAT3() {}
    XMFLOAT3(float _x, float _y, float _z) : x(_x), y(_y), z(_z) {}
    explicit XMFLOAT3(_In_reads_(3) const float *pArray)
        : x(pArray[0]), y(pArray[1]), z(pArray[2]) {}

    XMFLOAT3& operator= (const XMFLOAT3& Float3)
    {
        x = Float3.x; y = Float3.y; z = Float3.z;
        return *this;
    }
};
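
The difference in alignment requirements can be checked with standard C++ alone. The sketch below does not use DirectXMath; Float3Sketch and Vec4Sketch are hypothetical stand-ins for a packed three-float type and a 16-byte-aligned 128-bit vector type, and the two vertex structs are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-in for a packed type like XMFLOAT3: three floats, no special alignment.
struct Float3Sketch { float x, y, z; };

// Hypothetical stand-in for a 128-bit SIMD vector: must start on a 16-byte boundary.
struct alignas(16) Vec4Sketch { float f[4]; };

// A class that stores the packed type keeps ordinary float alignment...
struct MeshVertex { Float3Sketch position; Float3Sketch normal; };

// ...while a class that stores the aligned type inherits the 16-byte requirement,
// which every allocation of that class then has to honor.
struct AlignedVertex { Vec4Sketch position; Vec4Sketch normal; };

static_assert(sizeof(Float3Sketch) == 3 * sizeof(float), "three floats, no padding");
static_assert(alignof(MeshVertex) == alignof(float), "float alignment only");
static_assert(alignof(AlignedVertex) == 16, "inherits the 16-byte requirement");
```

This is one way to see why the packed types are the safer choice for data members: they place no extra demands on how the containing object is allocated.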

However, computing directly on the XMFLOATn types cannot take advantage of SIMD, so we need to convert them to XMVECTOR, perform the operation, and then convert the result back to XMFLOATn. DirectXMath provides load and store functions for these conversions:

// Loading methods: load data from XMFLOATn into XMVECTOR
XMVECTOR XM_CALLCONV XMLoadFloat2(const XMFLOAT2 *pSource);
XMVECTOR XM_CALLCONV XMLoadFloat3(const XMFLOAT3 *pSource);
XMVECTOR XM_CALLCONV XMLoadFloat4(const XMFLOAT4 *pSource);

// Storing methods: store XMVECTOR into XMFLOATn
void XM_CALLCONV XMStoreFloat2(XMFLOAT2 *pDestination, FXMVECTOR V);
void XM_CALLCONV XMStoreFloat3(XMFLOAT3 *pDestination, FXMVECTOR V);
void XM_CALLCONV XMStoreFloat4(XMFLOAT4 *pDestination, FXMVECTOR V);
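
The load, compute, store pattern can be sketched at the intrinsics level as follows. This is not DirectXMath's implementation: PackedFloat3, ScaleFloat3, and SKETCH_HAS_SSE2 are hypothetical names, and a scalar fallback is included for targets without SSE2.

```cpp
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics (also pulls in the SSE float intrinsics)
#define SKETCH_HAS_SSE2 1
#else
#define SKETCH_HAS_SSE2 0
#endif

// Hypothetical packed storage type, playing the role of XMFLOAT3.
struct PackedFloat3 { float x, y, z; };

// Hypothetical helper: scales a packed value by s using load -> compute -> store.
PackedFloat3 ScaleFloat3(const PackedFloat3& src, float s)
{
    PackedFloat3 dst;
#if SKETCH_HAS_SSE2
    // "Load": widen the three floats into a 128-bit register (the w lane is set to 0).
    __m128 v = _mm_set_ps(0.0f, src.z, src.y, src.x);
    // Compute on all lanes at once.
    v = _mm_mul_ps(v, _mm_set1_ps(s));
    // "Store": spill the register to a temporary and copy back only x, y, z.
    float tmp[4];
    _mm_storeu_ps(tmp, v);
    dst.x = tmp[0];
    dst.y = tmp[1];
    dst.z = tmp[2];
#else
    // Scalar fallback for targets without SSE2.
    dst.x = src.x * s;
    dst.y = src.y * s;
    dst.z = src.z * s;
#endif
    return dst;
}
```

In DirectXMath code the same shape would presumably be XMLoadFloat3, followed by vector operations on the XMVECTOR, followed by XMStoreFloat3.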

Sometimes we want to manipulate only one component of an XMVECTOR, and we can use the following getter and setter functions:

// Getter functions
float XM_CALLCONV XMVectorGetX(FXMVECTOR V);
float XM_CALLCONV XMVectorGetY(FXMVECTOR V);
float XM_CALLCONV XMVectorGetZ(FXMVECTOR V);
float XM_CALLCONV XMVectorGetW(FXMVECTOR V);

// Setter functions
XMVECTOR XM_CALLCONV XMVectorSetX(FXMVECTOR V, float x);
XMVECTOR XM_CALLCONV XMVectorSetY(FXMVECTOR V, float y);
XMVECTOR XM_CALLCONV XMVectorSetZ(FXMVECTOR V, float z);
XMVECTOR XM_CALLCONV XMVectorSetW(FXMVECTOR V, float w);

Function parameter passing and calling convention

You might be wondering why the function name is preceded by an XM_CALLCONV modifier, and why the parameter type of the function is not XMVECTOR but FXMVECTOR. This is because DirectXMath uses these modifiers and type aliases to guide the compiler to generate efficient object code that leverages SIMD to improve program performance.

Parameter passing

To improve efficiency, when an XMVECTOR value is passed to a function as a parameter, it can be passed directly in an SSE/SSE2 register rather than on the stack. (If registers, stack memory, and how function calls are compiled are unfamiliar topics, the classic Computer Systems: A Programmer’s Perspective (3rd Edition) is a recommended introduction.)

However, the number of SSE/SSE2 registers, and the maximum number of parameters that can be passed directly in these registers in a function call, are platform- and compiler-dependent. To improve portability, we can use the type aliases provided by DirectXMath. The rules for passing XMVECTOR parameters are as follows:

  1. The first three XMVECTOR parameters use the FXMVECTOR type;
  2. The fourth XMVECTOR parameter uses the GXMVECTOR type;
  3. The fifth and sixth XMVECTOR parameters use the HXMVECTOR type;
  4. Any additional XMVECTOR parameters use the CXMVECTOR type.

Note that the above count applies only to input parameters of type XMVECTOR. Output parameters of type XMVECTOR (references, pointers) and parameters of other types are not included in the count and can be skipped when reading the prototype. Here is an example of a function prototype, in which the float parameters do not affect the count:

inline XMMATRIX XM_CALLCONV XMMatrixTransformation2D(
    FXMVECTOR ScalingOrigin,
    float ScalingOrientation,
    FXMVECTOR Scaling,
    FXMVECTOR RotationOrigin,
    float Rotation,
    GXMVECTOR Translation
);

Calling Conventions

Above we discussed which XMVECTOR type alias should be used at each parameter position, and we mentioned earlier that XMVECTOR is a 16-byte aligned type in memory. To meet these requirements across platforms and compilers, we need to use the appropriate calling convention provided by DirectXMath, which is applied by placing the XM_CALLCONV modifier before the function name.

XM_CALLCONV is defined differently depending on the platform and compiler: as __fastcall or as __vectorcall. Both are Microsoft-specific calling-convention modifiers that allow __m128 values to be passed efficiently. Which one is in effect also determines how aliases such as FXMVECTOR and GXMVECTOR are defined.

__vectorcall is a calling convention supported by newer compilers that can pass more arguments directly in SSE/SSE2 registers than __fastcall. I’m not sure about some of the finer details, so those interested can refer to the documentation provided by Microsoft.

In summary, to improve code portability and platform/compiler independence, we should as a rule put the XM_CALLCONV modifier on functions that take XMVECTOR parameters and declare those parameters with type aliases such as FXMVECTOR and GXMVECTOR.

Vector constants

For constant instances of XMVECTOR, we should use the XMVECTORF32 type. More generally, we should use XMVECTORF32 whenever we want to use C++ initializer syntax. Here are some examples:

static const XMVECTORF32 g_vHalfVector = {0.5f, 0.5f, 0.5f, 0.5f};
XMVECTORF32 vRightTop = {vViewFrust.RightSlope, vViewFrust.TopSlope, 1.0f, 1.0f};

XMVECTORF32 is also a 16-byte aligned structure, and it can be converted implicitly to XMVECTOR. It is defined as follows:

__declspec(align(16)) struct XMVECTORF32
{
    union
    {
        float f[4];
        XMVECTOR v;
    };

    inline operator XMVECTOR() const { return v; }
    inline operator const float*() const { return f; }
#if !defined(_XM_NO_INTRINSICS_) && defined(_XM_SSE_INTRINSICS_)
    inline operator __m128i() const { return _mm_castps_si128(v); }
    inline operator __m128d() const { return _mm_castps_pd(v); }
#endif
};

Commonly used overloaded operators

XMVECTOR overloads some operators to perform associated vector computations:

XMVECTOR XM_CALLCONV operator+ (FXMVECTOR V);
XMVECTOR XM_CALLCONV operator- (FXMVECTOR V);

XMVECTOR& XM_CALLCONV operator+= (XMVECTOR& V1, FXMVECTOR V2);
XMVECTOR& XM_CALLCONV operator-= (XMVECTOR& V1, FXMVECTOR V2);
XMVECTOR& XM_CALLCONV operator*= (XMVECTOR& V1, FXMVECTOR V2);
XMVECTOR& XM_CALLCONV operator/= (XMVECTOR& V1, FXMVECTOR V2);

XMVECTOR& operator*= (XMVECTOR& V, float S);
XMVECTOR& operator/= (XMVECTOR& V, float S);

XMVECTOR XM_CALLCONV operator+ (FXMVECTOR V1, FXMVECTOR V2);
XMVECTOR XM_CALLCONV operator- (FXMVECTOR V1, FXMVECTOR V2);
XMVECTOR XM_CALLCONV operator* (FXMVECTOR V1, FXMVECTOR V2);
XMVECTOR XM_CALLCONV operator/ (FXMVECTOR V1, FXMVECTOR V2);
XMVECTOR XM_CALLCONV operator* (FXMVECTOR V, float S);
XMVECTOR XM_CALLCONV operator* (float S, FXMVECTOR V);
XMVECTOR XM_CALLCONV operator/ (FXMVECTOR V, float S);

Common constants and inline functions

DirectXMath defines some common constants and inline functions.

Some π-related constants:

const float XM_PI = 3.141592654f;
const float XM_2PI = 6.283185307f;
const float XM_1DIVPI = 0.318309886f;
const float XM_1DIV2PI = 0.159154943f;
const float XM_PIDIV2 = 1.570796327f;
const float XM_PIDIV4 = 0.785398163f;

Inline functions for converting between radians and degrees and for taking the minimum/maximum of two values:

inline float XMConvertToRadians(float fDegrees) { return fDegrees * (XM_PI / 180.0f); }
inline float XMConvertToDegrees(float fRadians) { return fRadians * (180.0f / XM_PI); }

template<class T> inline T XMMin(T a, T b) { return (a < b) ? a : b; }
template<class T> inline T XMMax(T a, T b) { return (a > b) ? a : b; }

Commonly used vector functions

DirectXMath provides a set of commonly used functions to manipulate vectors and perform associated vector operations.

Setter functions

The following functions can be used to set the contents of an XMVECTOR value:

// Returns the zero vector 0
XMVECTOR XM_CALLCONV XMVectorZero();
// Returns the vector (1, 1, 1, 1)
XMVECTOR XM_CALLCONV XMVectorSplatOne();
// Returns the vector (x, y, z, w)
XMVECTOR XM_CALLCONV XMVectorSet(float x, float y, float z, float w);
// Returns the vector (s, s, s, s)
XMVECTOR XM_CALLCONV XMVectorReplicate(float Value);
// Returns the vector (vx, vx, vx, vx)
XMVECTOR XM_CALLCONV XMVectorSplatX(FXMVECTOR V);
// Returns the vector (vy, vy, vy, vy)
XMVECTOR XM_CALLCONV XMVectorSplatY(FXMVECTOR V);
// Returns the vector (vz, vz, vz, vz)
XMVECTOR XM_CALLCONV XMVectorSplatZ(FXMVECTOR V);

Vector functions

DirectXMath provides a number of functions to perform vector operations. Some of the commonly used 3D versions are listed below (most of them also have 2D and 4D versions, named by replacing the 3 in the function name with 2 or 4):

XMVECTOR XM_CALLCONV XMVector3Length(        // Returns ||v||
    FXMVECTOR V                              // Input v
);
XMVECTOR XM_CALLCONV XMVector3LengthSq(      // Returns ||v||^2
    FXMVECTOR V                              // Input v
);
XMVECTOR XM_CALLCONV XMVector3Dot(           // Returns v1 · v2
    FXMVECTOR V1,                            // Input v1
    FXMVECTOR V2                             // Input v2
);
XMVECTOR XM_CALLCONV XMVector3Cross(         // Returns v1 × v2
    FXMVECTOR V1,                            // Input v1
    FXMVECTOR V2                             // Input v2
);
XMVECTOR XM_CALLCONV XMVector3Normalize(     // Returns v / ||v||
    FXMVECTOR V                              // Input v
);
XMVECTOR XM_CALLCONV XMVector3Orthogonal(    // Returns a vector orthogonal to v
    FXMVECTOR V                              // Input v
);
XMVECTOR XM_CALLCONV XMVector3AngleBetweenVectors( // Returns the angle between v1 and v2
    FXMVECTOR V1,                            // Input v1
    FXMVECTOR V2                             // Input v2
);
void XM_CALLCONV XMVector3ComponentsFromNormal(
    XMVECTOR* pParallel,                     // Returns proj_n(v)
    XMVECTOR* pPerpendicular,                // Returns perp_n(v)
    FXMVECTOR V,                             // Input v
    FXMVECTOR Normal                         // Input n
);
bool XM_CALLCONV XMVector3Equal(             // Returns v1 == v2
    FXMVECTOR V1,                            // Input v1
    FXMVECTOR V2                             // Input v2
);
bool XM_CALLCONV XMVector3NotEqual(          // Returns v1 != v2
    FXMVECTOR V1,                            // Input v1
    FXMVECTOR V2                             // Input v2
);
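
For readers who want to check the math behind a few of these functions, here is a plain scalar sketch of the dot product, cross product, length, and the parallel/perpendicular decomposition. It does not use DirectXMath or SIMD; Vec3 and the function names are hypothetical, and the decomposition assumes the normal n has unit length. Note that the DirectXMath functions above that return XMVECTOR for a scalar quantity (such as the length or dot product) replicate that scalar into every component, whereas this sketch simply returns a float.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };  // hypothetical plain 3-component vector

// v1 · v2
inline float Dot(const Vec3& a, const Vec3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// v1 × v2
inline Vec3 Cross(const Vec3& a, const Vec3& b)
{
    return Vec3{ a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
}

// ||v||
inline float Length(const Vec3& v)
{
    return std::sqrt(Dot(v, v));
}

// Splits v into proj_n(v) (parallel to n) and perp_n(v) (perpendicular to n).
// Assumes n is already normalized.
inline void ComponentsFromNormal(Vec3* parallel, Vec3* perpendicular,
                                 const Vec3& v, const Vec3& n)
{
    const float d = Dot(v, n);
    *parallel = Vec3{ n.x * d, n.y * d, n.z * d };
    *perpendicular = Vec3{ v.x - parallel->x, v.y - parallel->y, v.z - parallel->z };
}
```

The two output vectors always sum back to v, which is a convenient sanity check when experimenting with the SIMD versions.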

Conclusion

This article began with a brief introduction to SIMD and SSE2. It then covered topics related to types and compiler optimization in DirectXMath: the Visual Studio compiler configuration; the core vector type XMVECTOR and its memory alignment requirements (along with the XMFLOAT2, XMFLOAT3, and XMFLOAT4 types and the related load and store functions); function parameter passing and calling conventions (the XM_CALLCONV modifier and type aliases such as FXMVECTOR); and vector constants (the XMVECTORF32 type). Finally, it introduced some common overloaded operators, constants, inline functions, setter functions, and vector functions in DirectXMath.
