(If you’re too lazy to write your own code, use my demo: github.com/karminski/P…) Don’t panic. Laziness is a virtue.
I’ve seen some people complain that Lua has too few libraries, so they end up reinventing the wheel. Here is a magic trick that makes it easy to use high-performance C/C++ libraries from LuaJIT and spares you the pain of building your own wheels.
The magic is FFI (Foreign Function Interface). I’m not going to go into detail about FFI; in a nutshell, it provides a binary interface for calling across languages. Its advantage is that it is efficient and convenient. Its disadvantage is that you are calling through the ABI directly, so the process will simply crash if something goes wrong; check your data carefully before it crosses that boundary.
Today we’ll just pick a C library and call it from LuaJIT through FFI as a demonstration.
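To make the idea concrete before touching Base64, here is a minimal sketch of an FFI call. It assumes LuaJIT on a POSIX-like system where the C library’s strlen is visible through ffi.C; strlen is only a stand-in chosen for illustration, not part of the library used below.

```lua
local ffi = require "ffi"

-- Declare the C function the same way <string.h> does.
ffi.cdef[[
size_t strlen(const char *s);
]]

-- A Lua string is passed to C as a const char * automatically.
-- On 64-bit platforms the size_t result comes back as a cdata value,
-- so convert it to a plain Lua number.
local n = tonumber(ffi.C.strlen("hello"))
print(n) -- 5
```

No glue code in C is needed: the declaration plus one call is the whole bridge.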
What? Is there a high-performance Base64 library?
Let’s take this repo as an example: github.com/aklomp/base… It is a Base64 encoding/decoding library written in C, with SIMD support.
You can simply run the benchmark of this library:
karminski@router02:/data/works/base64$ make clean && SSSE3_CFLAGS=-mssse3 AVX2_CFLAGS=-mavx2 make && make -C test
...
Testing with buffer size 100 KB, fastest of 10 * 100
AVX2 encode 12718.47 MB/sec
AVX2 decode 14542.81 MB/sec
plain encode 3657.40 MB/sec
plain decode
SSSE3 encode 7269.55 MB/sec
SSSE3 decode 8173.10 MB/sec
...
My CPU is an Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz. As you can see, when the CPU supports AVX2 the library exceeds 12 GB/s, which is very powerful; even an ordinary SSD cannot keep up with it.
The first step is to compile the repo into a dynamic library. However, the repo does not provide a build target for a dynamic library, so we modify the project’s Makefile:
CFLAGS += -std=c99 -O3 -Wall -Wextra -pedantic
# Set OBJCOPY if not defined by environment:
OBJCOPY ?= objcopy

OBJS = \
  lib/arch/avx2/codec.o \
  lib/arch/generic/codec.o \
  lib/arch/neon32/codec.o \
  lib/arch/neon64/codec.o \
  lib/arch/ssse3/codec.o \
  lib/arch/sse41/codec.o \
  lib/arch/sse42/codec.o \
  lib/arch/avx/codec.o \
  lib/lib.o \
  lib/codec_choose.o \
  lib/tables/tables.o

SOOBJS = \
  lib/arch/avx2/codec.so \
  lib/arch/generic/codec.so \
  lib/arch/neon32/codec.so \
  lib/arch/neon64/codec.so \
  lib/arch/ssse3/codec.so \
  lib/arch/sse41/codec.so \
  lib/arch/sse42/codec.so \
  lib/arch/avx/codec.so \
  lib/lib.so \
  lib/codec_choose.so \
  lib/tables/tables.so

HAVE_AVX2 = 0
HAVE_NEON32 = 0
HAVE_NEON64 = 0
HAVE_SSSE3 = 0
HAVE_SSE41 = 0
HAVE_SSE42 = 0
HAVE_AVX = 0

# The user should supply compiler flags for the codecs they want to build.
# Check which codecs we're going to include:
ifdef AVX2_CFLAGS
HAVE_AVX2 = 1
endif
ifdef NEON32_CFLAGS
HAVE_NEON32 = 1
endif
ifdef NEON64_CFLAGS
HAVE_NEON64 = 1
endif
ifdef SSSE3_CFLAGS
HAVE_SSSE3 = 1
endif
ifdef SSE41_CFLAGS
HAVE_SSE41 = 1
endif
ifdef SSE42_CFLAGS
HAVE_SSE42 = 1
endif
ifdef AVX_CFLAGS
HAVE_AVX = 1
endif
ifdef OPENMP
CFLAGS += -fopenmp
endif
.PHONY: all analyze clean

all: bin/base64 lib/libbase64.o lib/libbase64.so

bin/base64: bin/base64.o lib/libbase64.o lib/libbase64.so
	$(CC) $(CFLAGS) -o $@ $^

lib/libbase64.o: $(OBJS)
	$(LD) -r -o $@ $^
	$(OBJCOPY) --keep-global-symbols=lib/exports.txt $@

lib/libbase64.so: $(SOOBJS)
	$(LD) -shared -fPIC -o $@ $^
	$(OBJCOPY) --keep-global-symbols=lib/exports.txt $@

lib/config.h:
	@echo "#define HAVE_AVX2 $(HAVE_AVX2)" > $@
	@echo "#define HAVE_NEON32 $(HAVE_NEON32)" >> $@
	@echo "#define HAVE_NEON64 $(HAVE_NEON64)" >> $@
	@echo "#define HAVE_SSSE3 $(HAVE_SSSE3)" >> $@
	@echo "#define HAVE_SSE41 $(HAVE_SSE41)" >> $@
	@echo "#define HAVE_SSE42 $(HAVE_SSE42)" >> $@
	@echo "#define HAVE_AVX $(HAVE_AVX)" >> $@

$(OBJS): lib/config.h
$(SOOBJS): lib/config.h
# o
lib/arch/avx2/codec.o: CFLAGS += $(AVX2_CFLAGS)
lib/arch/neon32/codec.o: CFLAGS += $(NEON32_CFLAGS)
lib/arch/neon64/codec.o: CFLAGS += $(NEON64_CFLAGS)
lib/arch/ssse3/codec.o: CFLAGS += $(SSSE3_CFLAGS)
lib/arch/sse41/codec.o: CFLAGS += $(SSE41_CFLAGS)
lib/arch/sse42/codec.o: CFLAGS += $(SSE42_CFLAGS)
lib/arch/avx/codec.o: CFLAGS += $(AVX_CFLAGS)
# so
lib/arch/avx2/codec.so: CFLAGS += $(AVX2_CFLAGS)
lib/arch/neon32/codec.so: CFLAGS += $(NEON32_CFLAGS)
lib/arch/neon64/codec.so: CFLAGS += $(NEON64_CFLAGS)
lib/arch/ssse3/codec.so: CFLAGS += $(SSSE3_CFLAGS)
lib/arch/sse41/codec.so: CFLAGS += $(SSE41_CFLAGS)
lib/arch/sse42/codec.so: CFLAGS += $(SSE42_CFLAGS)
lib/arch/avx/codec.so: CFLAGS += $(AVX_CFLAGS)
%.o: %.c
	$(CC) $(CFLAGS) -o $@ -c $<

%.so: %.c
	$(CC) $(CFLAGS) -shared -fPIC -o $@ -c $<

analyze: clean
	scan-build --use-analyzer=`which clang` --status-bugs make

# Also remove the per-file .so objects so a rebuild starts clean.
clean:
	rm -f bin/base64 bin/base64.o lib/libbase64.o lib/libbase64.so lib/config.h $(OBJS) $(SOOBJS)
It doesn’t matter if you don’t understand it. The Makefile is complicated enough that I don’t fully understand it either; I just changed it by feel, and it happens to work… Note that recipe lines in a Makefile must be indented with a tab character (“\t”), not spaces; otherwise make reports a syntax error.
Then we compile:
AVX2_CFLAGS=-mavx2 SSSE3_CFLAGS=-mssse3 SSE41_CFLAGS=-msse4.1 SSE42_CFLAGS=-msse4.2 AVX_CFLAGS=-mavx make lib/libbase64.so
This gives us the libbase64.so dynamic library (in lib). The various SIMD codecs are enabled along the way; leave out the corresponding *_CFLAGS variable for any codec you don’t need.
Let the magic begin
Of course, this is just magic, not alchemy, so it takes some effort: we need to bridge the dynamic library by hand. First we need to see what parameters the functions we are calling take. The two declarations are simple; both take:
- The string we want to encode or decode
const char *src
- The length of that string
size_t srclen
- A pointer to the buffer that receives the result
char *out
- A pointer to where the length of the result is stored
size_t *outlen
- A flags value
int flags
void base64_encode(const char *src, size_t srclen, char *out, size_t *outlen, int flags);
int base64_decode(const char *src, size_t srclen, char *out, size_t *outlen, int flags);
Then we can start writing the FFI bridge. First, pull in everything you need. Note that using local pays off: a local variable is looked up efficiently, whereas a global requires a slower table lookup. Even functions from other packages can be cached in locals to improve performance.
Dynamic libraries are loaded with the dedicated ffi.load function.
Then we define a _M table to wrap our library. This is a lot like JavaScript: JavaScript has window in the browser, and Lua has _G. We want to avoid leaking the wrapped library into the global environment as much as possible, so wrapping it in a table is a good idea.
-- init
local ffi = require "ffi"
local floor = math.floor
local ffi_new = ffi.new
local ffi_str = ffi.string
local ffi_typeof = ffi.typeof
local C = ffi.C
local libbase64 = ffi.load("./libbase64.so") -- change this path when needed.
local _M = { _VERSION = '0.0.1' }
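The effect of the local plus _M pattern can be seen in a small stand-alone sketch. The module name mymod and its half function are made up for illustration, and package.preload is used only so that the example fits in a single file.

```lua
-- Register a throwaway module in package.preload so that require can find it
-- without a separate file. "mymod" and "half" are hypothetical names.
package.preload["mymod"] = function()
    local floor = math.floor           -- cache the global lookup in a local
    local _M = { _VERSION = "0.0.1" }  -- everything public lives in this table

    function _M.half(n)
        return floor(n / 2)
    end

    return _M
end

local mymod = require "mymod"
print(mymod.half(7))       -- 3
print(rawget(_G, "half"))  -- nil: nothing leaked into the global table
```

Callers reach the function only through the table returned by require, which is the same shape the Base64 wrapper takes.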
ffi.cdef is used to declare the ABI interface. The easiest way is to copy the function declarations from the library’s header file.
-- cdef
-- flags is declared as int, matching the C declarations shown above
ffi.cdef[[
void base64_encode(const uint8_t *src, size_t srclen, uint8_t *out, size_t *outlen, int flags);
int base64_decode(const uint8_t *src, size_t srclen, uint8_t *out, size_t *outlen, int flags);
]]
Next comes the most important conversion:
-- define types
local uint8t = ffi_typeof("uint8_t[?]") -- variable-length byte buffer, passed as uint8_t *
local psizet = ffi_typeof("size_t[1]") -- one-element array, passed as size_t *
-- package function
function _M.base64_encode(src, flags)
    -- padded Base64 emits 4 output bytes for every 3 input bytes, rounded up
    local dlen = floor((#src + 2) / 3) * 4
    local out = ffi_new(uint8t, dlen)
    local outlen = ffi_new(psizet, 1)
    libbase64.base64_encode(src, #src, out, outlen, flags)
    return ffi_str(out, outlen[0])
end
function _M.base64_decode(src, flags)
    -- decoded data needs at most 3/4 of the input length
    local dlen = floor((#src + 1) * 6 / 8)
    local out = ffi_new(uint8t, dlen)
    local outlen = ffi_new(psizet, 1)
    libbase64.base64_decode(src, #src, out, outlen, flags)
    return ffi_str(out, outlen[0])
end
We use ffi_typeof to define the data types to be mapped, and ffi_new to instantiate them and allocate memory. Specifically:
We define two data types. local uint8t = ffi_typeof("uint8_t[?]") is the type used to carry string data. The question mark makes it a variable-length array whose size is supplied to ffi_new: in local out = ffi_new(uint8t, dlen), the second parameter specifies how many elements to allocate. This gives us an empty byte buffer to hold the result returned by the C function. dlen is the length of the source string after Base64 encoding, and we allocate a buffer of that length.
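The size calculation deserves a closer look, because a buffer that is too small lets the C function write past its end. The sketch below is plain Lua and assumes the encoder always emits “=” padding (the demo output at the end of this post ends in ==, which suggests it does); the helper names encoded_len and decoded_max_len are hypothetical.

```lua
local floor = math.floor

-- Padded Base64 turns every group of up to 3 input bytes into 4 output bytes.
local function encoded_len(n)
    return floor((n + 2) / 3) * 4
end

-- Every group of up to 4 Base64 characters decodes to at most 3 bytes.
local function decoded_max_len(n)
    return floor((n + 3) / 4) * 3
end

print(encoded_len(#"https://example.com"))  -- 28
print(decoded_max_len(28))                  -- 21
```

The 28 matches the length of the encoded string aHR0cHM6Ly9leGFtcGxlLmNvbQ== printed by the demo, so an output buffer for this input has to hold at least 28 bytes.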
Similarly, local psizet = ffi_typeof("size_t[1]") gives us something usable as a size_t *: when passed to a C function, a size_t array behaves as a size_t *. So we take a size_t array with only one element and get a pointer to a size_t. As for local outlen = ffi_new(psizet, 1), the second argument is merely an initial value for that element; it doesn’t matter what we put there because the C function overwrites it, so we just pass a placeholder.
After the call, we read the value back by indexing it like an array: return ffi_str(out, outlen[0]).
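The one-element-array trick is not specific to this library. As a rough illustration, the sketch below uses frexp from the C math library, which also hands back part of its result through a pointer argument. It assumes LuaJIT on a POSIX-like system where libm symbols are reachable through ffi.C; frexp is just a convenient stand-in.

```lua
local ffi = require "ffi"

ffi.cdef[[
double frexp(double x, int *exp);
]]

-- int[1] is a one-element array; passed to C it behaves as an int *.
local pexp = ffi.new("int[1]")

-- frexp splits x into mantissa * 2^exponent and stores the exponent
-- through the pointer argument.
local mantissa = ffi.C.frexp(8.0, pexp)

-- Index the array to read what the C function wrote: 8.0 = 0.5 * 2^4.
print(mantissa, pexp[0]) -- prints 0.5 and 4
```

The array is owned by Lua and garbage-collected like any other value, so there is nothing to free by hand.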
Note that require "ffi" and ffi.load must be placed at the top level of the code (module scope, not inside a function that is called repeatedly), so that they run only once; otherwise a table overflow error can occur.
Finally, the file looks like this:
--[[
    ffi-base64.lua

    @version 20201228:1
    @author  karminski <…@outlook.com>
]]--
-- init
local ffi = require "ffi"
local floor = math.floor
local ffi_new = ffi.new
local ffi_str = ffi.string
local ffi_typeof = ffi.typeof
local C = ffi.C
local libbase64 = ffi.load("./libbase64.so") -- change this path when needed.
local _M = { _VERSION = '0.0.1' }
-- cdef
-- flags is declared as int, matching the C declarations shown above
ffi.cdef[[
void base64_encode(const uint8_t *src, size_t srclen, uint8_t *out, size_t *outlen, int flags);
int base64_decode(const uint8_t *src, size_t srclen, uint8_t *out, size_t *outlen, int flags);
]]
-- define types
local uint8t = ffi_typeof("uint8_t[?]") -- variable-length byte buffer, passed as uint8_t *
local psizet = ffi_typeof("size_t[1]") -- one-element array, passed as size_t *
-- package function
function _M.base64_encode(src, flags)
    -- padded Base64 emits 4 output bytes for every 3 input bytes, rounded up
    local dlen = floor((#src + 2) / 3) * 4
    local out = ffi_new(uint8t, dlen)
    local outlen = ffi_new(psizet, 1)
    libbase64.base64_encode(src, #src, out, outlen, flags)
    return ffi_str(out, outlen[0])
end
function _M.base64_decode(src, flags)
    -- decoded data needs at most 3/4 of the input length
    local dlen = floor((#src + 1) * 6 / 8)
    local out = ffi_new(uint8t, dlen)
    local outlen = ffi_new(psizet, 1)
    libbase64.base64_decode(src, #src, out, outlen, flags)
    return ffi_str(out, outlen[0])
end
return _M
Ok, we are done, let’s write a demo call to try:
-- main.lua
local ffi_base64 = require "ffi-base64"
local target = "https://example.com"
local r = ffi_base64.base64_encode(target, 0)
print("base64 encode result: \n"..r)
local r = ffi_base64.base64_decode(r, 0)
print("base64 decode result: \n"..r)
root@router02:/data/works/libbase64-ffi# luajit -v
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2020 Mike Pall. https://luajit.org/
root@router02:/data/works/libbase64-ffi# luajit ./main.lua
base64 encode result:
aHR0cHM6Ly9leGFtcGxlLmNvbQ==
base64 decode result:
https://example.com
Done! Isn’t that easy? There are many similar FFI libraries, and each language supports FFI to a different degree. Give it a try. Finally, when you run into a similar problem, remember that FFI is a handy weapon in your arsenal. That’s all.