In the previous article, we covered the PDF format, including embedded fonts, but limited space kept us from going into detail. This article uses the common TTF font format to introduce some font fundamentals. If you have time, I suggest reading the previous article first, especially the section on how to view binary data.

1. Background

In front-end development we often see a font-family declaration like this:

font-family: -apple-system, system-ui, Segoe UI, Roboto, Ubuntu, Cantarell, Noto Sans, sans-serif, BlinkMacSystemFont, "Helvetica Neue", "PingFang SC", "Hiragino Sans GB", "Microsoft YaHei", Arial;

The main reason for writing such a long fallback list is how the system handles fonts. If you ask a program on a Windows machine to render a string in an Apple font, the system cannot do that unless the font happens to be installed there. If it is not, the text may be rendered in the system default font (depending on the program's logic). This is why a Word document often looks fine on your own computer but not on someone else's.

One big advantage of PDF over other formats is that it is cross-platform: a document looks the same whether you view it on Windows, macOS, or a mobile device, and that includes fonts. There is no dark magic behind this. The font file is embedded in the PDF, and the renderer uses the embedded font instead of a system font, so the problem goes away.

2. Common font formats

  • TTF: TrueType Font (TTF) is a font format developed by Apple, and later adopted by Microsoft, as an alternative to Adobe's PostScript fonts. It is probably the format most people know or have heard of.
  • OTF: OpenType Font (OTF) evolved from TTF and is the result of a joint effort between Adobe and Microsoft. A single OTF file contains both the screen and the printer font data. OTF adds several features, including support for multiple platforms and extended character sets. As I understand it, it is a superset of TTF.
  • WOFF: Web Open Font Format (WOFF) is essentially metadata plus an sfnt-based font (such as TTF, OTF, or another open font format). Created entirely for the web, the format is a collaboration between the Mozilla Foundation, Microsoft, and Opera Software. WOFF fonts are compressed, so a file is generally about 40% smaller than the TTF, which loads faster and suits embedding in web pages.
  • WOFF2: WOFF2 is the next generation of WOFF. It compresses about 30% better than the original WOFF.

The list above covers some common font formats. TTF is not only the most familiar of them but also closely tied to the history of the others, so this article focuses on the TTF format.

3. TTF file structure

3.1 Overview

Most TTF files have a .ttf suffix, but some have a .ttc suffix. TTC stands for TrueType Collection.

Let’s take Arial as an example to uncover the mystery of TTF.

The image above is a screenshot of Arial.ttf opened in a binary viewer. Binary files look like this, and much of the content reads as gibberish in ASCII mode. In my opinion, the big advantage of binary is its small size: it carries no redundant information, whereas JSON serialization adds a lot of it. The downside is inflexibility. A JSON array can store elements of any type, but a binary array cannot hold elements of two different types unless extra information is stored alongside them. This flexibility is also why JSON serialization and deserialization are slower than binary.

Because of this restriction, the parsing rules of a binary file are fixed. TTF files have their own rules, and you must parse them according to those rules to get the correct result.
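The trade-off between size and flexibility can be sketched in a few lines of Python. This is only an illustration: the list of numbers is made up, and the 16-bit layout is an assumption chosen for the example, not something taken from the TTF format.

```python
import json
import struct

offsets = [0, 100, 250, 1000]  # made-up sample data

# Binary: four big-endian unsigned 16-bit integers, 2 bytes each, no separators.
packed = struct.pack(">4H", *offsets)

# JSON: the same numbers as text, with brackets, commas and spaces.
as_json = json.dumps(offsets).encode("utf-8")

print(len(packed), "bytes as binary,", len(as_json), "bytes as JSON")

# The binary layout only works because every element has the same fixed type.
# A mixed list is fine for JSON but cannot be packed with this format string.
mixed = [0, "a", 250, 1000]
json.dumps(mixed)  # succeeds
try:
    struct.pack(">4H", *mixed)
    mixed_packs = True
except struct.error:
    mixed_packs = False
print("mixed list packs as binary:", mixed_packs)
```

The reader of the binary data has to know the format string in advance, which is exactly the "fixed rules" point made above.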

3.2 Directory

Books generally have two important parts: the table of contents and the body text. The table of contents lets you find the corresponding text quickly. A TTF file has a similar concept called the font directory, which records how many parts the file body has and where each part is located. The font tables are the body of the TTF file. Each table has its own function, which is explained later. Next we compare a real file against the rules of the directory.

As you can see, the first four bytes of the file represent the file type: 0x74727565 ("true") or 0x00010000 means TTF, and this file starts with 0x00010000. The next two bytes are the number of tables, 24 in total here. We do not care about the remaining six bytes of the header for now.
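Reading this 12-byte header could look roughly like the following Python sketch. The function name and the returned dictionary are my own. The names of the last three fields (searchRange, entrySelector, rangeShift) come from the specification rather than from this article, and the sample bytes are built by hand instead of being read from Arial.

```python
import struct

def parse_offset_subtable(data: bytes) -> dict:
    """Decode the first 12 bytes of a TrueType file (all fields are big-endian)."""
    scaler_type, num_tables, search_range, entry_selector, range_shift = struct.unpack(
        ">IHHHH", data[:12]
    )
    return {
        # 0x74727565 is the ASCII string "true"; 0x00010000 is the other TTF marker.
        "is_truetype": scaler_type in (0x74727565, 0x00010000),
        "num_tables": num_tables,
        # The six bytes the article skips; they help with binary search in the directory.
        "search_range": search_range,
        "entry_selector": entry_selector,
        "range_shift": range_shift,
    }

# Hand-built header matching the values described above: type 0x00010000, 24 tables.
sample_header = struct.pack(">IHHHH", 0x00010000, 24, 256, 4, 128)
header = parse_offset_subtable(sample_header)
print(header["is_truetype"], header["num_tables"])
```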

This is only the overview of the file. It is followed by one entry for each of the 24 tables, mainly recording where each table's content is located, defined as follows:

Here we parse the first table entry:

Let’s translate the bytes:

  • Tag: 0x44534947 = DSIG
  • Checksum: 0x7232A231
  • Offset: 0x000BA844 = 763972
  • Length: 0x00002430 = 9264

That is, the content of the DSIG table starts at byte 763972 and is 9264 bytes long. How to interpret that content is defined by the DSIG table's own rules.
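The same translation can be done in code. Below is a minimal Python sketch that decodes one 16-byte directory entry, using the four values listed above as input. The function name is hypothetical, and the field order (tag, checksum, offset, length, each 4 bytes) is taken from the list above.

```python
import struct

def parse_table_record(record: bytes) -> dict:
    """Decode one 16-byte directory entry: tag, checksum, offset, length."""
    tag, checksum, offset, length = struct.unpack(">4sIII", record[:16])
    return {
        "tag": tag.decode("ascii"),
        "checksum": checksum,
        "offset": offset,
        "length": length,
    }

# The bytes of the first entry, as listed above.
dsig_record = bytes.fromhex("44534947" "7232A231" "000BA844" "00002430")
entry = parse_table_record(dsig_record)
print(entry["tag"], entry["offset"], entry["length"])
```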

This is where the directory comes in, and it highlights an advantage of binary: random access on demand. I can parse only the part of the file I need. Try to imagine parsing only part of a JSON file.
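Here is a rough Python sketch of that on-demand access: it reads only the directory and then jumps straight to the one table it was asked for. Both function names are hypothetical, and the "font" is a toy container built in memory (zero checksums, no padding), so it is not a valid font file.

```python
import io
import struct

def read_table(stream, wanted_tag: str) -> bytes:
    """Return the raw bytes of one table, reading only the directory and that table."""
    stream.seek(0)
    _scaler_type, num_tables = struct.unpack(">IH", stream.read(6))
    stream.seek(12)  # skip the rest of the 12-byte header
    for _ in range(num_tables):
        tag, _checksum, offset, length = struct.unpack(">4sIII", stream.read(16))
        if tag.decode("ascii") == wanted_tag:
            stream.seek(offset)  # jump directly to the table content
            return stream.read(length)
    raise KeyError(wanted_tag)

def build_toy_font(tables: dict) -> bytes:
    """Build a toy container: header, directory, then the table data back to back."""
    header = struct.pack(">IHHHH", 0x00010000, len(tables), 0, 0, 0)
    offset = 12 + 16 * len(tables)
    directory = b""
    body = b""
    for tag, payload in tables.items():
        directory += struct.pack(">4sIII", tag.encode("ascii"), 0, offset, len(payload))
        body += payload
        offset += len(payload)
    return header + directory + body

toy = io.BytesIO(build_toy_font({"head": b"HEAD-DATA", "glyf": b"GLYPH-DATA"}))
print(read_table(toy, "glyf"))
```

With a real file, the stream would come from open(path, "rb") and the rest of the file would never be read.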

3.3 Font tables

Each table has its own function. Many tables are complicated and cannot be covered fully here; for details, refer to the official documentation. This section introduces only some basic or commonly used tables.

The official documentation lists the following tables as required.

glyf

This is the heart of the font: it describes how each glyph is drawn. The letter B below, for example, is drawn with straight lines and Bezier curves, and the coordinates of every line are stored in glyf. The glyf table itself is complex, with simple and compound glyphs, which are not covered here.

cmap

From our code's point of view, we only know that the code point of the letter B is 66. How do we find the corresponding glyph from 66? The simplest approach is to treat glyf as an array in which element 66 is exactly B.

But if the font file contains only one glyph and its code point is 65536, every slot before it is wasted. And if the code points are not consecutive, such as 97, 107, 207, the slots between them are wasted too.

So we need a map from code point to index in the glyf array. That mapping table is cmap. I'll cover just one of its more clever formats here, format 4.

According to the documentation, it suits characters that fall into contiguous ranges, possibly with large gaps between them, because each contiguous range is compressed as far as possible. For example, if the code points run from 10 to 100, a plain list stores 91 numbers, whereas format 4 stores only startCode 10 and endCode 100.

Let’s take an example from official documentation:

The code points are 10-20, 30-90, and 100-153, 126 characters in total, so the glyf array has length 127 (glyph 0 is reserved for the missing-character glyph and cannot be assigned to a character). The glyph indices therefore run continuously from 1 to 126, but the code points of our characters are not completely contiguous.

Format 4 uses startCode for the beginning of a range, endCode for its end, and idDelta for the difference between a code point and its glyph index:

  • Code point 10 maps to glyph index 10 - 9 = 1
  • Code point 20 maps to glyph index 20 - 9 = 11
  • Code point 30 maps to glyph index 30 - 18 = 12
  • Code point 90 maps to glyph index 90 - 18 = 72
  • Code point 100 maps to glyph index 100 - 27 = 73
  • Code point 153 maps to glyph index 153 - 27 = 126

The final glyph indices are exactly the 1 to 126 we wanted, and just three segments store the mapping for all 126 characters. Clever, isn't it?
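The lookup for this example can be sketched in Python as follows. It is a simplified sketch, not a full format 4 parser: it assumes idRangeOffset is 0 for every segment and leaves out the final 0xFFFF segment that a real table ends with. The constant and function names are mine.

```python
from bisect import bisect_left

# The three segments from the example: 10-20, 30-90, 100-153.
START_CODES = [10, 30, 100]
END_CODES = [20, 90, 153]
ID_DELTAS = [-9, -18, -27]

def glyph_index(code_point: int) -> int:
    """Map a code point to a glyph index; 0 means the character is not in the font."""
    # Binary search for the first segment whose endCode is >= code_point.
    i = bisect_left(END_CODES, code_point)
    if i == len(END_CODES) or code_point < START_CODES[i]:
        return 0  # falls in a gap or past the last segment
    # The addition is done modulo 65536, as glyph indices are 16-bit values.
    return (code_point + ID_DELTAS[i]) & 0xFFFF

for cp in (10, 20, 30, 90, 100, 153, 25):
    print(cp, "->", glyph_index(cp))
```

Because the segments are sorted by endCode, the search cost grows with the number of segments rather than the number of characters.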

loca

Once we have the glyph index, we need the offset of its content, which is stored in loca. The loca array has one entry per glyph plus one extra entry at the end, so each glyph's length can be computed from the offset that follows it.
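As a sketch, turning a glyph index into a byte range could look like this in Python. It relies on details from the specification that this article does not cover: the table holds one more entry than there are glyphs, and the indexToLocFormat field in head selects between a short form (16-bit values storing offset / 2) and a long form (32-bit values). The function names and sample data are made up.

```python
import struct

def parse_loca(data: bytes, num_glyphs: int, index_to_loc_format: int) -> list:
    """Return num_glyphs + 1 byte offsets, relative to the start of the glyf table."""
    count = num_glyphs + 1
    if index_to_loc_format == 0:
        # Short form: unsigned 16-bit values that store the offset divided by 2.
        return [v * 2 for v in struct.unpack(f">{count}H", data[: count * 2])]
    # Long form: unsigned 32-bit values that store the offset directly.
    return list(struct.unpack(f">{count}I", data[: count * 4]))

def glyph_range(loca: list, glyph_id: int) -> tuple:
    """Return (offset, length) of a glyph; length 0 means the glyph has no outline."""
    start, end = loca[glyph_id], loca[glyph_id + 1]
    return start, end - start

# Made-up short-form table describing 3 glyphs.
sample = struct.pack(">4H", 0, 0, 50, 120)
loca = parse_loca(sample, num_glyphs=3, index_to_loc_format=0)
print(loca, glyph_range(loca, 2))
```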

head

Global font information, including the font version number, creation time, modification time, and the coordinates of the bounding box. unitsPerEm can be regarded as the base unit: if it is 1000 and a character's height is 2000, divide 2000 by 1000 to get the height in em, then multiply by the font size to get the real height.
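The unitsPerEm arithmetic is short enough to write down. A minimal sketch in Python, with a hypothetical function name and a 16px font size chosen only for illustration:

```python
def font_units_to_pixels(value: float, units_per_em: int, font_size_px: float) -> float:
    """Convert a length in font units to pixels at the given font size."""
    return value / units_per_em * font_size_px

# A height of 2000 units with unitsPerEm 1000 is 2 em, so 32px at a 16px font size.
print(font_units_to_pixels(2000, units_per_em=1000, font_size_px=16))
```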

hhea

Font information for horizontal layout, including ascent, descent, and lineGap.
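One common use of these three values is estimating the default line height. The sketch below assumes the usual convention that descent is stored as a negative number and uses made-up metrics. Real renderers may take their metrics from other tables, so treat this as an approximation.

```python
def line_height_px(ascent: int, descent: int, line_gap: int,
                   units_per_em: int, font_size_px: float) -> float:
    """Estimate the line height in pixels from hhea-style metrics in font units."""
    # descent is negative (below the baseline), so subtracting it adds its size.
    return (ascent - descent + line_gap) * font_size_px / units_per_em

# Made-up metrics: 800 above the baseline, 200 below, gap 100, unitsPerEm 1000.
print(line_height_px(800, -200, 100, units_per_em=1000, font_size_px=16))
```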

3.4 Font styles

Here font style mainly means bold and italic. The font file itself contains basically no information about this. In fact, each style is a separate font file. For example, Arial.ttf is the regular font, Arial Bold.ttf is the bold font, Arial Italic.ttf is the italic font, and Arial Bold Italic.ttf is the bold italic font. Many fonts, however, have no italic design, including many Chinese fonts. Yet Chrome can still italicize Chinese text, because the browser's italic is a fake italic. As mentioned earlier, glyphs are drawn with lines, so a transform matrix can be used to slant them.

transform: matrix(1, 0, -0.2, 1, 0, 0);

If you are interested, try the style above and see what happens to your text.
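To see why this matrix produces a slant, apply it to a few points. In CSS, matrix(a, b, c, d, e, f) maps a point (x, y) to (a*x + c*y + e, b*x + d*y + f), with the y axis pointing down. The Python sketch below uses a made-up rectangle whose top edge lies at y = -20, that is, above the origin.

```python
def apply_css_matrix(a, b, c, d, e, f, point):
    """Apply a CSS matrix(a, b, c, d, e, f) transform to an (x, y) point."""
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)

SKEW = (1, 0, -0.2, 1, 0, 0)

# Bottom-left, bottom-right, top-left, top-right of a 10 x 20 box.
corners = [(0, 0), (10, 0), (0, -20), (10, -20)]
slanted = [apply_css_matrix(*SKEW, p) for p in corners]
print(slanted)
```

The bottom edge stays where it is, while the top edge shifts to the right by 0.2 times its height. No y coordinate changes, which is the forward lean of an italic.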

4. Tools

As you can see, reading these binaries by hand is inefficient, so you need parsing tools. opentype.js is currently one of the better known in the industry. It supports TTF, OTF, and WOFF and is quite powerful, and many font libraries build on it, such as the font subsetting tool font-carrier.

However, because it parses most of the font in full (glyf parsing was later optimized to be on demand), parsing is relatively slow and memory intensive.

Back to embedding fonts in PDF files, as mentioned at the beginning: font files are large, especially Chinese fonts, which are typically over 10 MB. If you simply embed every font used in a PDF in full, the final file becomes very large, so the font needs to be subset according to the text actually used.
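The first step of subsetting, working out which characters are needed, can be sketched in Python as below. The cmap here is a made-up dictionary from code point to glyph index and both function names are hypothetical. A real subsetter would also have to rewrite glyf, loca, and the other tables, which this sketch does not attempt.

```python
def used_code_points(texts) -> list:
    """Collect the sorted, unique code points that appear in the given strings."""
    return sorted({ord(ch) for text in texts for ch in text})

def subset_cmap(cmap: dict, texts) -> dict:
    """Keep only the cmap entries whose code point is actually used."""
    needed = set(used_code_points(texts))
    return {cp: gid for cp, gid in cmap.items() if cp in needed}

# Made-up mapping from code point to glyph index.
toy_cmap = {ord("a"): 1, ord("b"): 2, ord("c"): 3, ord("中"): 4, ord("文"): 5}
kept = subset_cmap(toy_cmap, ["abba", "中"])
print(kept)
```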

5. Summary

This article covers only the tip of the iceberg of the TTF font format. Beyond the format itself, I was also struck by binary formats: without a ready-made map structure, there is a lot of know-how behind storing complex data in a binary file while keeping it extremely compact. Finally, corrections are welcome.

References:

  • Introduction to Web fonts: TTF, OTF, WOFF, EOT & SVG
  • TTF official documentation
  • OTF official documentation