Motivation

A while ago, I heard that WebGL colleagues at the company were investigating GPU compressed textures. I had done some research before and found that the basis_universal tool can quickly transcode UASTC and ETC1S into whichever compressed texture format the target platform supports. However, because its WASM and loader JS are large, we did not adopt it. Later I found a lighter transcode implementation, so I wanted to take advantage of it.

Exploration

basis-universal-transcoders is written by the Khronos Group in AssemblyScript. Compared with the 220+ KB Basis WASM it is very lightweight, but it only supports three transcode targets, and development is not very active.

Later, we learned that LayaAir’s compressed texture scheme is relatively simple and crude: iOS uses PVRTC, Android uses ETC1, and everything else falls back to PNG/JPG. Combined with the hdr-prefilter-texture work done earlier, the same idea can be applied to compressed textures:

Everything that the runtime would otherwise have to process can be preprocessed; the runtime then only needs to load the preprocessed artifacts.

Hence this GPU compressed texture extension: it stores the Basis transcode output, and at runtime the loader downloads the preprocessed artifact in a format the device supports.

Prerequisites

GLTF structure

Since the goal is a GLTF extension, you first need to understand the GLTF format.

`extensionsUsed` tells the parser that an extension is required to parse the GLTF. The other top-level arrays behave like relational tables, linked to each other by indices. For example:

nodes[i].mesh → meshes[i]
meshes[i].primitives[j].material → materials[i]
materials[i].normalTexture.index → textures[i]
textures[i].source → images[i]
images[i].uri / images[i].bufferView → bufferViews[i]
bufferViews[i].buffer → buffers[i]
buffers[i].uri → the binary file
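For example, resolving the image URI behind a node’s normal map is just a chain of index lookups. A minimal sketch over hypothetical GLTF JSON (the data and helper name are for illustration only):

```javascript
// Hypothetical, minimal glTF JSON: each array element points into the next
// table by numeric index, like foreign keys in relational tables.
const gltf = {
  nodes: [{ mesh: 0 }],
  meshes: [{ primitives: [{ material: 0 }] }],
  materials: [{ normalTexture: { index: 0 } }],
  textures: [{ source: 0 }],
  images: [{ uri: 'normal.png' }],
};

// Walk nodes → meshes → materials → textures → images.
function resolveNormalTextureUri(json, nodeIndex) {
  const mesh = json.meshes[json.nodes[nodeIndex].mesh];
  const material = json.materials[mesh.primitives[0].material];
  const texture = json.textures[material.normalTexture.index];
  return json.images[texture.source].uri;
}

console.log(resolveNormalTextureUri(gltf, 0)); // → 'normal.png'
```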

GLTF extension

Now that you have a rough idea of how GLTF data is linked together, you can look at how a GLTF extension is written. The extension to implement here is essentially a fallback (degradation) extension, quite similar to the EXT_texture_webp extension that Google implemented.

function GLTFTextureWebPExtension(parser) {
  this.parser = parser;
  this.name = EXTENSIONS.EXT_TEXTURE_WEBP;
  this.isSupported = null;
}

GLTFTextureWebPExtension.prototype.loadTexture = function (textureIndex) {
  var name = this.name;
  var parser = this.parser;
  var json = parser.json;

  var textureDef = json.textures[textureIndex];

  if (!textureDef.extensions || !textureDef.extensions[name]) {
    return null;
  }

  var extension = textureDef.extensions[name];
  var source = json.images[extension.source];

  var loader = parser.textureLoader;
  if (source.uri) {
    var handler = parser.options.manager.getHandler(source.uri);
    if (handler !== null) loader = handler;
  }

  return this.detectSupport().then(function (isSupported) {
    if (isSupported) return parser.loadTextureImage(textureIndex, source, loader);

    if (json.extensionsRequired && json.extensionsRequired.indexOf(name) >= 0) {
      throw new Error('THREE.GLTFLoader: WebP required by asset but unsupported.');
    }

    // Fall back to PNG or JPEG.
    return parser.loadTexture(textureIndex);
  });
};

GLTFTextureWebPExtension.prototype.detectSupport = function () {
  if (!this.isSupported) {
    this.isSupported = new Promise(function (resolve) {
      var image = new Image();

      image.src = 'data:image/webp;base64,UklGRiIAAABXRUJQVlA4IBYAAAAwAQCdASoBAAEADsD+JaQAA3AAAAAA';
      image.onload = image.onerror = function () {
        resolve(image.height === 1);
      };
    });
  }

  return this.isSupported;
};

`detectSupport` and `loadTexture` are both straightforward. `loadTexture` is invoked by GLTFLoader as a plugin hook.
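One detail worth copying: `detectSupport` caches a Promise rather than a boolean, so concurrent `loadTexture` calls share a single in-flight probe. The pattern in isolation (the probe function here is a stand-in, not the actual WebP check):

```javascript
// Cache the Promise, not the result: every caller gets the same probe.
function makeDetector(probe) {
  let cached = null;
  return function detectSupport() {
    if (!cached) cached = Promise.resolve().then(probe);
    return cached;
  };
}

let probes = 0; // counts how many times the probe actually runs
const detect = makeDetector(() => {
  probes++;
  return true;
});

Promise.all([detect(), detect()]).then(([a, b]) => {
  console.log(probes, a, b); // → 1 true true (probe ran only once)
});
```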

Writing a custom GLTF extension is easy to get started with. Just search for `this._invokeOne` in GLTFLoader to see which hook functions are supported:

  1. loadMesh
  2. loadBufferView
  3. loadMaterial
  4. loadTexture
  5. getMaterialType
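These hooks work on a “first non-null result wins” basis, which is why the `loadTexture` above returns `null` for textures it does not handle. A simplified re-implementation of the dispatch (for illustration only, not three.js’s actual code):

```javascript
// Walk registered plugins and take the first non-null result;
// the built-in loader acts as the final fallback.
function invokeOne(plugins, func) {
  for (const plugin of plugins) {
    const result = func(plugin);
    if (result !== null && result !== undefined) return result;
  }
  return null;
}

const plugins = [
  { loadTexture: (index) => null },               // extension skips this texture
  { loadTexture: (index) => `default:${index}` }, // built-in fallback
];

const tex = invokeOne(plugins, (p) => p.loadTexture(3));
console.log(tex); // → 'default:3'
```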

Implementation

First, sort out the general idea of the implementation.

GLTF extension

  1. Define the extension scheme
  2. `detectSupport` queries the GL context for which compressed texture formats are supported
  3. `loadTexture` loads the corresponding data according to the scheme, produces a `CompressedTexture`, and returns it

Tooling

  1. Load the GLTF/GLB, convert the textures it contains to basis, then transcode them to astc / bc7 / dxt / pvrtc / etc1
  2. Export a GLTF in the scheme’s format

Define the scheme

Referring to EXT_texture_webp, the extension configuration is stored under `extensions.EXT_texture_webp`, so only this part of the format needs to be defined.

{
  "textures": [
    {
      "source": 0,
      "extensions": {
        "EXT_GPU_COMPRESSED_TEXTURE": {
          "astc": 1,
          "bc7": 2,
          "dxt": 3,
          "pvrtc": 4,
          "etc1": 5,
          "width": 2048,
          "height": 2048,
          "hasAlpha": 0,
          "compress": 1
        }
      }
    }
  ],
  "buffers": [
    { "name": "buffer", "byteLength": 207816, "uri": "buffer.bin" },
    { "name": "image3.astc", "byteLength": 48972, "uri": "image3.astc.bin" },
    { "name": "image3.bc7", "byteLength": 50586, "uri": "image3.bc7.bin" },
    { "name": "image3.dxt", "byteLength": 10686, "uri": "image3.dxt.bin" },
    { "name": "image3.pvrtc", "byteLength": 21741, "uri": "image3.pvrtc.bin" },
    { "name": "image3.etc1", "byteLength": 22360, "uri": "image3.etc1.bin" }
  ]
}

The format is simple and self-explanatory: the astc / bc7 / dxt / pvrtc / etc1 fields are indices into `buffers`.
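A minimal sketch of the runtime’s selection logic under this scheme: walk the formats in preference order and take the first one that is both supported by the device and present in the extension definition (the data below is hypothetical):

```javascript
// Hypothetical support table (what the GL context reports) and extension
// definition (buffer indices, as in the scheme above).
const supportInfo = { astc: false, bc7: false, dxt: true, pvrtc: false, etc1: true };
const extensionDef = { astc: 1, bc7: 2, dxt: 3, pvrtc: 4, etc1: 5 };

function pickBufferIndex(supportInfo, extensionDef) {
  for (const fmt of ['astc', 'bc7', 'dxt', 'pvrtc', 'etc1']) {
    if (supportInfo[fmt] && extensionDef[fmt] !== undefined) {
      return { format: fmt, buffer: extensionDef[fmt] };
    }
  }
  return null; // no supported compressed format: fall back to PNG/JPG
}

console.log(pickBufferIndex(supportInfo, extensionDef)); // → { format: 'dxt', buffer: 3 }
```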

Generate GLTF of corresponding structure

Here you can refer to the basis repo’s webgl/texture/index.html, loop over the transcoders to generate the 5 kinds of compressed texture artifacts as bin files, and then write the GLTF file by hand.

At this point, the base version is ready to be written.

export class GLTFGPUCompressedTexture {
  constructor(parser) {
    this.name = 'EXT_GPU_COMPRESSED_TEXTURE';
    this.parser = parser;
  }

  detectSupport(renderer) {
    this.supportInfo = {
      astc: renderer.extensions.has('WEBGL_compressed_texture_astc'),
      bc7: renderer.extensions.has('EXT_texture_compression_bptc'),
      dxt: renderer.extensions.has('WEBGL_compressed_texture_s3tc'),
      etc1: renderer.extensions.has('WEBGL_compressed_texture_etc1'),
      etc2: renderer.extensions.has('WEBGL_compressed_texture_etc'),
      pvrtc:
        renderer.extensions.has('WEBGL_compressed_texture_pvrtc') ||
        renderer.extensions.has('WEBKIT_WEBGL_compressed_texture_pvrtc'),
    };
    return this;
  }

  loadTexture(textureIndex) {
    const { parser, name } = this;
    const json = parser.json;
    const textureDef = json.textures[textureIndex];

    if (!textureDef.extensions || !textureDef.extensions[name]) return null;
    
    const extensionDef = textureDef.extensions[name];
    const { width, height, hasAlpha } = extensionDef;

    for (let name in this.supportInfo) {
      if (this.supportInfo[name] && extensionDef[name] !== undefined) {
        return parser
          .getDependency('buffer', extensionDef[name])
          .then((buffer) => {
            // TODO: support mipmaps for compressed textures
            // TODO: ZSTD compression

            const mipmaps = [
              {
                data: new Uint8Array(buffer),
                width,
                height,
              },
            ];


            // The current buffer is directly passed to the GPU buffer
            const texture = new CompressedTexture(
              mipmaps,
              width,
              height,
              typeFormatMap[name][hasAlpha],
              UnsignedByteType,
            );
            texture.minFilter =
              mipmaps.length === 1 ? LinearFilter : LinearMipmapLinearFilter;
            texture.magFilter = LinearFilter;
            texture.generateMipmaps = false;
            texture.needsUpdate = true;

            return texture;
          });
      }
    }

    // Fall back to PNG or JPEG.
    return parser.loadTexture(textureIndex);
  }
}

Fleshing out the details

  1. Since ETC1S basis output is small but low quality, while UASTC is high quality but large, lossless compression is required on top
  2. Mipmaps must be supported: mipmaps cannot be generated quickly on the GPU for compressed textures, so mipmap loading has to be implemented
  3. Since decompression is required, Web Worker, WASM, and SIMD acceleration may be needed
  4. The CLI conversion tool should support multi-process batch processing and output size statistics
  5. Write performance test cases, compare against the KTX2 + UASTC compressed texture scheme, and record the data in tables
  6. Compare PC and mobile browsers, as well as ImageBitmapLoader, texture count, resolution, and so on
  7. Use UI-thread decode when there are few images and worker decode when there are many
  8. Complete the resource release logic (dispose)
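Point 7 can be reduced to a small heuristic; the threshold below is an assumption for illustration (worker transfer overhead only pays off once there are enough images to decode):

```javascript
// Few images: decode on the UI thread (structured-clone / transfer overhead
// would dominate). Many images: fan out to a worker pool.
const WORKER_THRESHOLD = 4; // assumed cut-off, tune per device

function chooseDecodeTarget(imageCount) {
  return imageCount < WORKER_THRESHOLD ? 'ui-thread' : 'worker';
}

console.log(chooseDecodeTarget(2)); // → 'ui-thread'
console.log(chooseDecodeTarget(8)); // → 'worker'
```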

This led to a relatively complete solution: gltf-gpu-compressed-texture.

A GLTF extension for GPU compressed texture fallback, together with a batch CLI conversion tool, for THREE’s GLTFLoader (see the DEMO and the extension definition).

Performance data

Using ImageBitmapLoader, THREE r129, localhost, disable cache: true

| model | parameters | load | render | total | model size | dependency size |
| --- | --- | --- | --- | --- | --- | --- |
| banzi_blue | gltf-tc zstd no-mipmap no-worker | 36.10 ms | 1.60 ms | 37.70 ms | 506 KB | 22.3 KB |
| banzi_blue | gltf-tc no-zstd mipmap no-worker | 25.80 ms | 1.50 ms | 27.30 ms | 2.2 MB | 22.3 KB |
| banzi_blue | gltf-tc zstd mipmap no-worker | 37.90 ms | 1.60 ms | 39.50 ms | 648 KB | 22.3 KB |
| banzi_blue | gltf ktx2 uastc | 534.70 ms | 1.70 ms | 536.40 ms | 684 KB | 249.3 KB |
| banzi_blue | glb | 32.80 ms | 6.00 ms | 38.80 ms | 443 KB | |
| banzi_blue | gltf | 27.70 ms | 4.90 ms | 32.60 ms | 446 KB | |
| BoomBox | gltf-tc zstd mipmap worker | 153.50 ms | 23.70 ms | 177.20 ms | 6.6 MB | 22.3 KB |
| BoomBox | gltf-tc zstd mipmap no-worker | 241.10 ms | 9.40 ms | 250.50 ms | 6.6 MB | 22.3 KB |
| BoomBox | glb ktx2 uastc | 506.10 ms | 9.30 ms | 515.40 ms | 7.1 MB | 249.3 KB |
| BoomBox | glb | 156.10 ms | 89.50 ms | 245.60 ms | 11.3 MB | |
| BoomBox | gltf | 120.20 ms | 58.80 ms | 179.00 ms | 11.3 MB | |

Since banzi_blue has fewer than 4 maps, ZSTD is decoded in the UI thread; transferring the data to a worker would cost more time than it saves. For comparison, KTX2Loader also decodes entirely in the UI thread (a PR has been submitted to decode in a Web Worker). The 22.3 KB dependency size was measured from the online DEMO; http-server --gzip does not compress it well.

For BoomBox, the gltf-tc zstd mipmap worker load+render time is about the same as plain GLTF, but the model size has a big advantage.

Test data on a Mi 8 can be viewed in the screenshots directory.

In the WeChat WebView, BoomBox loads faster than GLB/GLTF, which is abnormal; in Chrome it behaves normally. banzi_blue is a little slower, and KTX2 is still very slow.

Command-line usage

Make sure `zstd` and `basisu` are already in the PATH before using it.

> npm i gltf-gpu-compressed-texture -S
# View help
> gltf-tc -h

  -h --help
  -i --input [dir] [?outdir] [?compress] [?mipmap]
     Convert GLTF textures to GPU compressed textures, with fallback support

  Examples:
    gltf-tc -i ./examples/glb ./examples/zstd
    gltf-tc -i ./examples/glb ./examples/no-zstd 0
    gltf-tc -i ./examples/glb ./examples/no-mipmap 1 false
    gltf-tc -i ./examples/glb ./examples/no-zstd-no-mipmap 0 false

# Execute
> gltf-tc -i ./examples/glb ./examples/zstd

done: 6417ms   image3.png                     normal: false  sRGB: true
done: 13746ms  image2.png                     normal: true   sRGB: false
done: 14245ms  image0.png                     normal: false  sRGB: true
done: 14491ms  image1.png                     normal: false  sRGB: false
done: 577ms    findi_touming01_nomarL1.jpg    normal: true   sRGB: false
done: 568ms    findi_touming01_basecoler.png  normal: false  sRGB: true
done: 1267ms   lanse_banzi-1.jpg              normal: false  sRGB: true
done: 577ms    findi_touming01_basecoler.png  normal: false  sRGB: true
done: 604ms    findi_touming01_nomarL1.jpg    normal: true   sRGB: false
done: 1280ms   lvse_banzi-1.jpg               normal: false  sRGB: true

cost: 17.75s  compress: 1
summary: bitmap: 11.22MB  astc: 7.18MB  etc1: 1.85MB  bc7: 7.16MB  dxt: 3.04MB  pvrtc: 2.28MB

NPM package usage

import { Scene, WebGLRenderer, CompressedTexture } from 'three';
import { GLTFLoader } from 'three-platformize/examples/jsm/loaders/GLTFLoader';
import GLTFGPUCompressedTexture from 'gltf-gpu-compressed-texture';

const gltfLoader = new GLTFLoader();
const renderer = new WebGLRenderer();
const scene = new Scene();

gltfLoader.register((parser) => {
  return new GLTFGPUCompressedTexture(parser, renderer, {
    CompressedTexture,
  });
});

gltfLoader.loadAsync('./examples/zstd/BoomBox.gltf').then((gltf) => {
  scene.add(gltf.scene);
});

Findings along the way

  1. Compressed textures have limited minFilter and magFilter support
  2. ZSTD decodes faster than PNG, which is why the ZPNG format appeared
  3. AZ64 claims to beat ZSTD, but it is not open source and its real-world performance is unknown
  4. KTX2Loader uses zstddec to decode in the UI thread, so a PR was submitted implementing a worker-pool decode
  5. `new Uint8Array(buffer, dataOffset)` is a view; `Uint8Array.from(new Uint8Array(buffer, dataOffset))` makes a copy
  6. Epic has a similar Basis transcode scheme and the compressed Oodle format, both closed source
  7. ZSTD could also be applied on top of TF models, but TF has its own data compression
  8. There are GPU Huffman decode implementations, e.g. "Massively Parallel Huffman Decoding on GPUs"
  9. The basis-universal-transcoders mentioned earlier are already used by Babylon, though experimentally
  10. The ZSTD WASM used should be a non-SIMD build from last year; the latest WASM build could not be run successfully
  11. On iOS, ordinary texture uploads block GIFs, while compressed texture uploads do not
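Point 5 is easy to trip over: `new Uint8Array(buffer, byteOffset)` is a view sharing memory with the ArrayBuffer, while `Uint8Array.from(...)` produces an independent copy. A small demonstration:

```javascript
const buffer = new ArrayBuffer(8);
new Uint8Array(buffer).set([0, 1, 2, 3, 4, 5, 6, 7]);

const view = new Uint8Array(buffer, 4);                  // shares memory
const copy = Uint8Array.from(new Uint8Array(buffer, 4)); // independent copy

new Uint8Array(buffer)[4] = 99; // mutate the underlying buffer

console.log(view[0]); // → 99 (the view sees the change)
console.log(copy[0]); // → 4  (the copy does not)
```

This matters when slicing a texture’s bytes out of a shared glTF buffer, e.g. before transferring them to a worker.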

Finally

The code is at gltf-gpu-compressed-texture; a star is welcome.