Motivation

A while ago, I heard that WebGL colleagues at the company were investigating GPU compressed textures. I had done some research before and found that the basis_universal tool can quickly transcode UASTC and ETC1S into whichever compressed texture format the target platform supports. However, because its WASM and loader JS are large, we did not adopt it. Later I found a lighter transcode implementation, so I wanted to take advantage of it.

Exploration

basis-universal-transcoders is written by the Khronos Group in AssemblyScript. Compared with the 220+ KB Basis WASM it is very lightweight, but it only supports three transcode targets, and development is not very active.

Later, we learned that LayaAir’s compressed texture scheme is relatively simple and crude: iOS uses PVRTC, Android uses ETC1, and everything else falls back to PNG/JPG. Combined with the hdr-prefilter-texture work done earlier, the same idea can be applied to compressed textures:

Everything that the runtime would otherwise have to process can be preprocessed; the runtime then only needs to load the preprocessed artifacts.

Hence this GPU compressed texture extension: it stores the Basis transcode output, and at runtime the loader downloads the preprocessed artifact in a format the device supports.

Prerequisites

GLTF structure

Since the goal is a GLTF extension, you first need to understand the GLTF format.

`extensionsUsed` tells the parser that an extension is required to parse the GLTF. The other top-level arrays behave like relational tables, linked to each other by indices. For example:

nodes[i].mesh → meshes[i]
meshes[i].primitives[j].material → materials[i]
materials[i].normalTexture.index → textures[i]
textures[i].source → images[i]
images[i].uri / images[i].bufferView → bufferViews[i]
bufferViews[i].buffer → buffers[i]
buffers[i].uri → the binary file
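For example, resolving the image URI behind a node’s normal map is just a chain of index lookups. A minimal sketch over hypothetical GLTF JSON (the data and helper name are for illustration only):

```javascript
// Hypothetical, minimal glTF JSON: each array element points into the next
// table by numeric index, like foreign keys in relational tables.
const gltf = {
  nodes: [{ mesh: 0 }],
  meshes: [{ primitives: [{ material: 0 }] }],
  materials: [{ normalTexture: { index: 0 } }],
  textures: [{ source: 0 }],
  images: [{ uri: 'normal.png' }],
};

// Walk nodes → meshes → materials → textures → images.
function resolveNormalTextureUri(json, nodeIndex) {
  const mesh = json.meshes[json.nodes[nodeIndex].mesh];
  const material = json.materials[mesh.primitives[0].material];
  const texture = json.textures[material.normalTexture.index];
  return json.images[texture.source].uri;
}

console.log(resolveNormalTextureUri(gltf, 0)); // → 'normal.png'
```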

GLTF extension

Now that you have a rough idea of how GLTF data is linked together, you can look at how a GLTF extension is written. The extension to implement here is essentially a fallback (degradation) extension, quite similar to the EXT_texture_webp extension that Google implemented.

function GLTFTextureWebPExtension(parser) {
  this.parser = parser;
  this.name = EXTENSIONS.EXT_TEXTURE_WEBP;
  this.isSupported = null;
}

GLTFTextureWebPExtension.prototype.loadTexture = function (textureIndex) {
  var name = this.name;
  var parser = this.parser;
  var json = parser.json;

  var textureDef = json.textures[textureIndex];

  if (!textureDef.extensions || !textureDef.extensions[name]) {
    return null;
  }

  var extension = textureDef.extensions[name];
  var source = json.images[extension.source];

  var loader = parser.textureLoader;
  if (source.uri) {
    var handler = parser.options.manager.getHandler(source.uri);
    if (handler !== null) loader = handler;
  }

  return this.detectSupport().then(function (isSupported) {
    if (isSupported) return parser.loadTextureImage(textureIndex, source, loader);

    if (json.extensionsRequired && json.extensionsRequired.indexOf(name) >= 0) {
      throw new Error('THREE.GLTFLoader: WebP required by asset but unsupported.');
    }

    // Fall back to PNG or JPEG.
    return parser.loadTexture(textureIndex);
  });
};

GLTFTextureWebPExtension.prototype.detectSupport = function () {
  if (!this.isSupported) {
    this.isSupported = new Promise(function (resolve) {
      var image = new Image();

      image.src = 'data:image/webp;base64,UklGRiIAAABXRUJQVlA4IBYAAAAwAQCdASoBAAEADsD+JaQAA3AAAAAA';
      image.onload = image.onerror = function () {
        resolve(image.height === 1);
      };
    });
  }

  return this.isSupported;
};

`detectSupport` and `loadTexture` are both straightforward. `loadTexture` is invoked by GLTFLoader as a plugin hook.
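One detail worth copying: `detectSupport` caches a Promise rather than a boolean, so concurrent `loadTexture` calls share a single in-flight probe. The pattern in isolation (the probe function here is a stand-in, not the actual WebP check):

```javascript
// Cache the Promise, not the result: every caller gets the same probe.
function makeDetector(probe) {
  let cached = null;
  return function detectSupport() {
    if (!cached) cached = Promise.resolve().then(probe);
    return cached;
  };
}

let probes = 0; // counts how many times the probe actually runs
const detect = makeDetector(() => {
  probes++;
  return true;
});

Promise.all([detect(), detect()]).then(([a, b]) => {
  console.log(probes, a, b); // → 1 true true (probe ran only once)
});
```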

Writing a custom GLTF extension is easy to get started with. Just search for `this._invokeOne` in GLTFLoader to see which hook functions are supported:

  1. loadMesh
  2. loadBufferView
  3. loadMaterial
  4. loadTexture
  5. getMaterialType
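These hooks work on a “first non-null result wins” basis, which is why the `loadTexture` above returns `null` for textures it does not handle. A simplified re-implementation of the dispatch (for illustration only, not three.js’s actual code):

```javascript
// Walk registered plugins and take the first non-null result;
// the built-in loader acts as the final fallback.
function invokeOne(plugins, func) {
  for (const plugin of plugins) {
    const result = func(plugin);
    if (result !== null && result !== undefined) return result;
  }
  return null;
}

const plugins = [
  { loadTexture: (index) => null },               // extension skips this texture
  { loadTexture: (index) => `default:${index}` }, // built-in fallback
];

const tex = invokeOne(plugins, (p) => p.loadTexture(3));
console.log(tex); // → 'default:3'
```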

Implementation

First, sort out the general idea of the implementation.

GLTF extension

  1. Define the extension scheme
  2. `detectSupport` queries the GL context for which compressed texture formats are supported
  3. `loadTexture` loads the corresponding data according to the scheme, produces a `CompressedTexture`, and returns it

Tooling

  1. Load the GLTF/GLB, convert the textures it contains to basis, then transcode them to astc / bc7 / dxt / pvrtc / etc1
  2. Export a GLTF in the scheme’s format

Define the scheme

Referring to EXT_texture_webp, the extension configuration is stored under `extensions.EXT_texture_webp`, so only this part of the format needs to be defined.

{
  "textures": [
    {
      "source": 0,
      "extensions": {
        "EXT_GPU_COMPRESSED_TEXTURE": {
          "astc": 1,
          "bc7": 2,
          "dxt": 3,
          "pvrtc": 4,
          "etc1": 5,
          "width": 2048,
          "height": 2048,
          "hasAlpha": 0,
          "compress": 1
        }
      }
    }
  ],
  "buffers": [
    { "name": "buffer", "byteLength": 207816, "uri": "buffer.bin" },
    { "name": "image3.astc", "byteLength": 48972, "uri": "image3.astc.bin" },
    { "name": "image3.bc7", "byteLength": 50586, "uri": "image3.bc7.bin" },
    { "name": "image3.dxt", "byteLength": 10686, "uri": "image3.dxt.bin" },
    { "name": "image3.pvrtc", "byteLength": 21741, "uri": "image3.pvrtc.bin" },
    { "name": "image3.etc1", "byteLength": 22360, "uri": "image3.etc1.bin" }
  ]
}

The format is simple and self-explanatory: the astc / bc7 / dxt / pvrtc / etc1 fields are indices into `buffers`.
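A minimal sketch of the runtime’s selection logic under this scheme: walk the formats in preference order and take the first one that is both supported by the device and present in the extension definition (the data below is hypothetical):

```javascript
// Hypothetical support table (what the GL context reports) and extension
// definition (buffer indices, as in the scheme above).
const supportInfo = { astc: false, bc7: false, dxt: true, pvrtc: false, etc1: true };
const extensionDef = { astc: 1, bc7: 2, dxt: 3, pvrtc: 4, etc1: 5 };

function pickBufferIndex(supportInfo, extensionDef) {
  for (const fmt of ['astc', 'bc7', 'dxt', 'pvrtc', 'etc1']) {
    if (supportInfo[fmt] && extensionDef[fmt] !== undefined) {
      return { format: fmt, buffer: extensionDef[fmt] };
    }
  }
  return null; // no supported compressed format: fall back to PNG/JPG
}

console.log(pickBufferIndex(supportInfo, extensionDef)); // → { format: 'dxt', buffer: 3 }
```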

Generate GLTF of corresponding structure

Here you can refer to the basis repo’s webgl/texture/index.html, loop over the transcoders to generate the 5 kinds of compressed texture artifacts as bin files, and then write the GLTF file by hand.

At this point, the base version is ready to be written.

export class GLTFGPUCompressedTexture {
  constructor(parser) {
    this.name = 'EXT_GPU_COMPRESSED_TEXTURE';
    this.parser = parser;
  }

  detectSupport(renderer) {
    this.supportInfo = {
      astc: renderer.extensions.has('WEBGL_compressed_texture_astc'),
      bc7: renderer.extensions.has('EXT_texture_compression_bptc'),
      dxt: renderer.extensions.has('WEBGL_compressed_texture_s3tc'),
      etc1: renderer.extensions.has('WEBGL_compressed_texture_etc1'),
      etc2: renderer.extensions.has('WEBGL_compressed_texture_etc'),
      pvrtc:
        renderer.extensions.has('WEBGL_compressed_texture_pvrtc') ||
        renderer.extensions.has('WEBKIT_WEBGL_compressed_texture_pvrtc'),
    };
    return this;
  }

  loadTexture(textureIndex) {
    const { parser, name } = this;
    const json = parser.json;
    const textureDef = json.textures[textureIndex];

    if (!textureDef.extensions || !textureDef.extensions[name]) return null;
    
    const extensionDef = textureDef.extensions[name];
    const { width, height, hasAlpha } = extensionDef;

    for (let name in this.supportInfo) {
      if (this.supportInfo[name] && extensionDef[name] !== undefined) {
        return parser
          .getDependency('buffer', extensionDef[name])
          .then((buffer) => {
            // TODO: support mipmaps for compressed textures
            // TODO: ZSTD compression

            const mipmaps = [
              {
                data: new Uint8Array(buffer),
                width,
                height,
              },
            ];


            // The current buffer is directly passed to the GPU buffer
            const texture = new CompressedTexture(
              mipmaps,
              width,
              height,
              typeFormatMap[name][hasAlpha],
              UnsignedByteType,
            );
            texture.minFilter =
              mipmaps.length === 1 ? LinearFilter : LinearMipmapLinearFilter;
            texture.magFilter = LinearFilter;
            texture.generateMipmaps = false;
            texture.needsUpdate = true;

            return texture;
          });
      }
    }

    // Fall back to PNG or JPEG.
    return parser.loadTexture(textureIndex);
  }
}

Fleshing out the details

  1. Since ETC1S basis output is small but low quality, while UASTC is high quality but large, lossless compression is required on top
  2. Mipmaps must be supported: mipmaps cannot be generated quickly on the GPU for compressed textures, so mipmap loading has to be implemented
  3. Since decompression is required, Web Worker, WASM, and SIMD acceleration may be needed
  4. The CLI conversion tool should support multi-process batch processing and output size statistics
  5. Write performance test cases, compare against the KTX2 + UASTC compressed texture scheme, and record the data in tables
  6. Compare PC and mobile browsers, as well as ImageBitmapLoader, texture count, resolution, and so on
  7. Use UI-thread decode when there are few images and worker decode when there are many
  8. Complete the resource release logic (dispose)
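Point 7 can be reduced to a small heuristic; the threshold below is an assumption for illustration (worker transfer overhead only pays off once there are enough images to decode):

```javascript
// Few images: decode on the UI thread (structured-clone / transfer overhead
// would dominate). Many images: fan out to a worker pool.
const WORKER_THRESHOLD = 4; // assumed cut-off, tune per device

function chooseDecodeTarget(imageCount) {
  return imageCount < WORKER_THRESHOLD ? 'ui-thread' : 'worker';
}

console.log(chooseDecodeTarget(2)); // → 'ui-thread'
console.log(chooseDecodeTarget(8)); // → 'worker'
```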

This led to a relatively complete solution: gltf-gpu-compressed-texture.

A GLTF extension for GPU compressed texture fallback, together with a batch CLI conversion tool, for THREE’s GLTFLoader (see the DEMO and the extension definition).

Performance data

Using ImageBitmapLoader, THREE r129, localhost, disable cache: true

| model | parameters | load | render | total | model size | dependency size |
| --- | --- | --- | --- | --- | --- | --- |
| banzi_blue | gltf-tc zstd no-mipmap no-worker | 36.10 ms | 1.60 ms | 37.70 ms | 506 KB | 22.3 KB |
| banzi_blue | gltf-tc no-zstd mipmap no-worker | 25.80 ms | 1.50 ms | 27.30 ms | 2.2 MB | 22.3 KB |
| banzi_blue | gltf-tc zstd mipmap no-worker | 37.90 ms | 1.60 ms | 39.50 ms | 648 KB | 22.3 KB |
| banzi_blue | gltf ktx2 uastc | 534.70 ms | 1.70 ms | 536.40 ms | 684 KB | 249.3 KB |
| banzi_blue | glb | 32.80 ms | 6.00 ms | 38.80 ms | 443 KB | |
| banzi_blue | gltf | 27.70 ms | 4.90 ms | 32.60 ms | 446 KB | |
| BoomBox | gltf-tc zstd mipmap worker | 153.50 ms | 23.70 ms | 177.20 ms | 6.6 MB | 22.3 KB |
| BoomBox | gltf-tc zstd mipmap no-worker | 241.10 ms | 9.40 ms | 250.50 ms | 6.6 MB | 22.3 KB |
| BoomBox | glb ktx2 uastc | 506.10 ms | 9.30 ms | 515.40 ms | 7.1 MB | 249.3 KB |
| BoomBox | glb | 156.10 ms | 89.50 ms | 245.60 ms | 11.3 MB | |
| BoomBox | gltf | 120.20 ms | 58.80 ms | 179.00 ms | 11.3 MB | |

Since banzi_blue has fewer than 4 maps, ZSTD is decoded in the UI thread; transferring the data to a worker would cost more time than it saves. For comparison, KTX2Loader also decodes entirely in the UI thread (a PR has been submitted to decode in a Web Worker). The 22.3 KB dependency size was measured from the online DEMO; http-server --gzip does not compress it well.

For BoomBox, the gltf-tc zstd mipmap worker load+render time is about the same as plain GLTF, but the model size has a big advantage.

Test data on a Mi 8 can be viewed in the screenshots directory.

In the WeChat WebView, BoomBox loads faster than GLB/GLTF, which is abnormal; in Chrome it behaves normally. banzi_blue is a little slower, and KTX2 is still very slow.

Command-line usage

Make sure `zstd` and `basisu` are already in the PATH before using it.

> npm i gltf-gpu-compressed-texture -S
# View help
> gltf-tc -h

  -h --help
  -i --input [dir] [?outdir] [?compress] [?mipmap]
     Convert GLTF textures to GPU compressed textures, with fallback support

  Examples:
    gltf-tc -i ./examples/glb ./examples/zstd
    gltf-tc -i ./examples/glb ./examples/no-zstd 0
    gltf-tc -i ./examples/glb ./examples/no-mipmap 1 false
    gltf-tc -i ./examples/glb ./examples/no-zstd-no-mipmap 0 false

# Execute
> gltf-tc -i ./examples/glb ./examples/zstd

done: 6417ms   image3.png                     normal: false  sRGB: true
done: 13746ms  image2.png                     normal: true   sRGB: false
done: 14245ms  image0.png                     normal: false  sRGB: true
done: 14491ms  image1.png                     normal: false  sRGB: false
done: 577ms    findi_touming01_nomarL1.jpg    normal: true   sRGB: false
done: 568ms    findi_touming01_basecoler.png  normal: false  sRGB: true
done: 1267ms   lanse_banzi-1.jpg              normal: false  sRGB: true
done: 577ms    findi_touming01_basecoler.png  normal: false  sRGB: true
done: 604ms    findi_touming01_nomarL1.jpg    normal: true   sRGB: false
done: 1280ms   lvse_banzi-1.jpg               normal: false  sRGB: true

cost: 17.75s  compress: 1
summary: bitmap: 11.22MB  astc: 7.18MB  etc1: 1.85MB  bc7: 7.16MB  dxt: 3.04MB  pvrtc: 2.28MB

NPM package usage

import { Scene, WebGLRenderer, CompressedTexture } from 'three';
import { GLTFLoader } from 'three-platformize/examples/jsm/loaders/GLTFLoader';
import GLTFGPUCompressedTexture from 'gltf-gpu-compressed-texture';

const gltfLoader = new GLTFLoader();
const renderer = new WebGLRenderer();
const scene = new Scene();

gltfLoader.register((parser) => {
  return new GLTFGPUCompressedTexture(parser, renderer, {
    CompressedTexture,
  });
});

gltfLoader.loadAsync('./examples/zstd/BoomBox.gltf').then((gltf) => {
  scene.add(gltf.scene);
});

Findings along the way

  1. Compressed textures have limited minFilter and magFilter support
  2. ZSTD decodes faster than PNG, which is why the ZPNG format appeared
  3. AZ64 claims to beat ZSTD, but it is not open source and its real-world performance is unknown
  4. KTX2Loader uses zstddec to decode in the UI thread, so a PR was submitted implementing a worker-pool decode
  5. `new Uint8Array(buffer, dataOffset)` is a view; `Uint8Array.from(new Uint8Array(buffer, dataOffset))` makes a copy
  6. Epic has a similar Basis transcode scheme and the compressed Oodle format, both closed source
  7. ZSTD could also be applied on top of TF models, but TF has its own data compression
  8. There are GPU Huffman decode implementations, e.g. "Massively Parallel Huffman Decoding on GPUs"
  9. The basis-universal-transcoders mentioned earlier are already used by Babylon, though experimentally
  10. The ZSTD WASM used should be a non-SIMD build from last year; the latest WASM build could not be run successfully
  11. On iOS, ordinary texture uploads block GIFs, while compressed texture uploads do not
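Point 5 is easy to trip over: `new Uint8Array(buffer, byteOffset)` is a view sharing memory with the ArrayBuffer, while `Uint8Array.from(...)` produces an independent copy. A small demonstration:

```javascript
const buffer = new ArrayBuffer(8);
new Uint8Array(buffer).set([0, 1, 2, 3, 4, 5, 6, 7]);

const view = new Uint8Array(buffer, 4);                  // shares memory
const copy = Uint8Array.from(new Uint8Array(buffer, 4)); // independent copy

new Uint8Array(buffer)[4] = 99; // mutate the underlying buffer

console.log(view[0]); // → 99 (the view sees the change)
console.log(copy[0]); // → 4  (the copy does not)
```

This matters when slicing a texture’s bytes out of a shared glTF buffer, e.g. before transferring them to a worker.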

Finally

The code is at gltf-gpu-compressed-texture; a star is welcome.