In modern life, we interact with text on a screen almost every day. Text seems so mundane that many UI professionals have little understanding of the complexity behind it. This series is a popular-science introduction from a developer’s perspective, following text from the binary data in a font file to the pixels on the screen, in the hope of inspiring readers who are interested in the topic.

Font standards and formats

If you like to poke around in your system, you will be familiar with the common font formats. On Windows, C:\Windows\Fonts has long held a large number of font files in TTF format. Likewise, the /Library/Fonts directory on macOS contains a bunch of fonts, but alongside TTF there are also files with the suffixes TTC and OTF. What is the difference?

An interesting question is why TTF fonts work on both macOS and Windows. Behind this lies the history of a backroom deal between Microsoft and Apple. In the 1980s, Adobe developed the proprietary vector font format Type 1 and the page description language PostScript (the predecessor of the PDF format). Compared with the dot-matrix (bitmap) fonts of the time, vector fonts stay sharp at any size instead of turning into a blocky mosaic when enlarged.

Adobe was doing well, but because of some non-technical (money) issues, Apple and Microsoft decided to start over. Apple developed TrueType, a standard for vector fonts, while Microsoft developed TrueImage, an alternative to PostScript. The two companies licensed these technologies to each other for Mac and Windows, but the only one that really became a standard was Apple’s TrueType, which corresponds to the TTF font format.

Knowing that TTF stands for TrueType Font, all the other formats can be deduced:

  • How do you pack up a bunch of TTF fonts and publish them together? Make a Collection: hence the TTC format.
  • Stop fighting, let’s collaborate and open things up: hence the OpenType (OTF) format.
  • On the Web, size is everything, and these files are too big to ship as they are: hence the WOFF format.
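A suffix is only a hint. A parser normally identifies the container from the first four bytes of the file. The sketch below assumes the commonly documented signatures; `sniff_font_format` and the label strings are hypothetical names chosen for illustration. Note that an OpenType font with TrueType outlines starts with the same 0x00010000 signature as a plain TTF, so the signature alone does not always tell you which suffix the file had.

```python
# Commonly documented 4-byte signatures at the start of font files.
SIGNATURES = {
    b"\x00\x01\x00\x00": "TrueType outlines (usually .ttf)",
    b"true": "TrueType (legacy Apple signature)",
    b"OTTO": "OpenType with CFF outlines (usually .otf)",
    b"ttcf": "font collection (.ttc)",
    b"wOFF": "WOFF",
    b"wOF2": "WOFF2",
}


def sniff_font_format(data: bytes) -> str:
    """Guess the container format from the first four bytes (hypothetical helper)."""
    return SIGNATURES.get(data[:4], "unknown")
```

In practice you would pass in the first few bytes read from a font file opened in binary mode.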

Of course, knowing the suffixes is about as meaningful as being proficient at spelling “Java” and “C++”. So let’s take a look: what is actually hidden inside a font file?

Explore the TTF file

Many font format specifications emphasize that font files are made up of tables. Wait, what? Is this a spreadsheet like Excel? When you open a TTF font file as raw bytes (for example, in a hex viewer), your first impression is probably not of a table:

Where are the rows and columns? “Table”, however, only implies a relatively neat data structure. Sharp-eyed readers may have noticed that the right-most column of such a dump contains groups of four letters. This is no accident: it follows from the TTF format specification.

Before moving on to what they mean, let’s consider the following question: How do you design a data format that meets these requirements?

  • You have a variety of different fields to store, each of which has a fixed format but a variable length.
  • The set of fields to be stored may vary, and new fields may be added later. The format should stay backward and forward compatible.
  • You should be able to get basic information about each field (location, length, and so on) without scanning the whole file.
  • The data needs to be as small as possible, and it needs to support verification of content integrity.

JSON, currently popular in application-layer development, is the first candidate to be ruled out by the “as small as possible” requirement. The TTF specification offers an engineering practice worth referring to when designing a data format:

  • Give each field a unique four-letter name, and store each field’s content as a contiguous piece of binary data.
  • At the head of the file, first store a table that describes the overall table structure: how many fields there are, along with the length, starting position, and so on of each. This table is called the Offset table.
  • Immediately after this table, concatenate the contents of the individual field tables to obtain the final TTF font.
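The layout described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard layout of a 12-byte header followed by 16-byte directory records (tag, checksum, offset, length), all big-endian. `TableRecord`, `parse_table_directory`, and `build_demo_font` are hypothetical names, and the demo blob carries placeholder payloads and zeroed checksums, so it is not a valid font.

```python
import struct
from typing import NamedTuple


class TableRecord(NamedTuple):
    tag: str
    checksum: int
    offset: int
    length: int


def parse_table_directory(data: bytes) -> list[TableRecord]:
    """Read the header and the per-table records at the start of the file."""
    # Header: uint32 version, then uint16 numTables and three
    # binary-search helper fields that this sketch ignores.
    _version, num_tables, _sr, _es, _rs = struct.unpack_from(">IHHHH", data, 0)
    records = []
    for i in range(num_tables):
        # Each record: 4-byte tag, then uint32 checksum, offset, length.
        tag, checksum, offset, length = struct.unpack_from(">4sIII", data, 12 + 16 * i)
        records.append(TableRecord(tag.decode("latin-1"), checksum, offset, length))
    return records


def build_demo_font() -> bytes:
    """Build a synthetic blob with two tables (placeholder payloads, checksums left at 0)."""
    tables = [(b"cmap", b"hello"), (b"glyf", b"\x01\x02\x03\x04")]
    header = struct.pack(">IHHHH", 0x00010000, len(tables), 0, 0, 0)
    offset = 12 + 16 * len(tables)
    directory, body = b"", b""
    for tag, payload in tables:
        directory += struct.pack(">4sIII", tag, 0, offset, len(payload))
        padded = payload + b"\x00" * (-len(payload) % 4)  # keep 4-byte alignment
        body += padded
        offset += len(padded)
    return header + directory + body


font = build_demo_font()
for record in parse_table_directory(font):
    print(record.tag, record.offset, record.length)
```

Because every record carries its own offset and length, a reader can jump straight to the table it needs with `font[record.offset:record.offset + record.length]` and ignore any tag it does not recognize.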

Let’s take a look at how this design addresses these requirements:

  • The tables that store font information are completely free in length and order.
  • Adding new field types later causes no compatibility problems: a parser reads the Offset table, decides which tables it supports, and skips the rest.
  • The offset and length of each field’s data can be read directly from the Offset table.
  • Each field’s data is stored in plain binary form, and its checksum is also stored in the Offset table as a basic means of verifying integrity.
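As a rough illustration of the integrity check, the sketch below computes a table checksum the way the TrueType and OpenType specifications describe it: the table is padded with zeros to a multiple of four bytes and summed as big-endian 32-bit words, modulo 2^32. `table_checksum` and `is_intact` are hypothetical helper names. The 'head' table needs special handling in real fonts (its checkSumAdjustment field is treated as zero while summing), which this sketch leaves out.

```python
import struct


def table_checksum(table: bytes) -> int:
    """Sum the table as big-endian uint32 words, modulo 2**32."""
    padded = table + b"\x00" * (-len(table) % 4)  # pad to a 4-byte boundary
    total = 0
    for (word,) in struct.iter_unpack(">I", padded):
        total = (total + word) & 0xFFFFFFFF
    return total


def is_intact(table: bytes, stored_checksum: int) -> bool:
    """Compare a table's computed checksum with the value from the directory."""
    return table_checksum(table) == stored_checksum
```

A plain sum like this catches accidental corruption such as truncated files, but it is not a defense against deliberate tampering.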

A simple example illustrates the compactness of binary data structures. To express metadata such as whether a font is bold, italic, or monospaced, JSON must spell out a field of the form XXX: true for each state, which takes at least five bytes per field. With bitwise operations, on the other hand, a single 8-bit byte can hold eight such true/false Boolean values, often with room to spare. Binary layouts have a similar advantage when values of different precision need to be stored. TTF’s table-driven design is therefore a useful reference when you build your own binary data formats. In addition, the control flow of traditional imperative programming is quite handy for parsing such binary formats: don’t let the noisy voices in the community drown this out, and learn the techniques that really fit each scenario.
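The size difference is easy to see in a small experiment. In this sketch the bit assignments in `Style` are hypothetical and chosen only for illustration; they are not the layout of any real TTF field. `pack_style` and `unpack_style` are likewise made-up helper names.

```python
import json
from enum import IntFlag


class Style(IntFlag):
    # Hypothetical bit positions, for illustration only.
    BOLD = 0x01
    ITALIC = 0x02
    MONOSPACE = 0x04


def pack_style(bold: bool, italic: bool, monospace: bool) -> bytes:
    """Pack three Booleans into a single byte using bitwise OR."""
    flags = Style(0)
    if bold:
        flags |= Style.BOLD
    if italic:
        flags |= Style.ITALIC
    if monospace:
        flags |= Style.MONOSPACE
    return bytes([int(flags)])


def unpack_style(data: bytes) -> dict:
    """Recover the three Booleans from the packed byte using bitwise AND."""
    flags = Style(data[0])
    return {
        "bold": bool(flags & Style.BOLD),
        "italic": bool(flags & Style.ITALIC),
        "monospace": bool(flags & Style.MONOSPACE),
    }


packed = pack_style(bold=True, italic=False, monospace=True)
as_json = json.dumps(unpack_style(packed), separators=(",", ":")).encode()
print(len(packed), "byte packed vs", len(as_json), "bytes of JSON")
```

Five of the eight bits in the packed byte are still unused, which is the redundancy mentioned above.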

Back to the original topic: what tables are stored inside a font to express its different kinds of content? Typr.js provides a very simple web tool that works out of the box with a click, and it looks something like this:

See the contents of the separate tables? They store the critical information needed to get from binary bits to screen pixels. So far, we have covered the format of a font file and the basic way to parse it. But how do you render text based on glyph data? See you in the next post (if there is one).

P.S.1 If this popular-science article is not enough and you want to learn more about the data structure of fonts, start with the official TrueType Reference Manual. Note that the Apple document begins with a link to Microsoft’s official website, which is hard to see elsewhere…

P.S.2 Our front-end team welcomes readers who are interested in rendering. To find out more, contact xuebi at gaoding.com.