This series of articles is a translation of the InnoDB series from Jeremy Cole’s blog. This is the first of 16 articles: “On Learning InnoDB: A Journey to the Core”.

Because my translation skills are limited, some proper nouns are followed by the original text in square brackets [] to avoid misleading readers.

On Learning InnoDB: A Journey to the Core

I’ve been using InnoDB for about a decade, and by now I understand it pretty well; most of the time I can make it do what I want. However, to achieve some of my efficiency-related goals, I found it necessary to take my understanding to a new level. Unfortunately, the InnoDB documentation lacks a clear explanation of InnoDB’s internal data structures, and reading the source code turned out to be the only way to find the information I needed.

However, I soon found that InnoDB’s internal structures and their uses (especially their interrelationships) were too complex to fully understand just by reading the code. Worse, if you try to understand these structures by reading the code alone, you will quite likely misunderstand them (at least, I misunderstood a lot that way).

I have long used a three-step approach to understanding complex but poorly documented subjects:

  1. Read the existing documentation and code until I reach a basic understanding. Serious errors of understanding or analysis often creep in at this stage.
  2. Write my own implementation, even a very basic and incomplete one, preferably in a completely different language (to avoid copying and pasting). Revise my understanding based on what works and what doesn’t.
  3. Create new documentation and diagrams based on the new understanding. Refactor my implementation as needed (reviewing everything in order to document it often reveals errors in the analysis), and correct the documentation based on what the refactoring teaches me. Repeat this process until everything is correct.

Implementing InnoDB’s on-disk data structures

I started the innodb_ruby project, in which I implemented InnoDB’s on-disk data structures in Ruby. I chose Ruby because it’s very flexible, very fast for prototyping, and by far my favorite language. This could be done in any language, and performance is not really a concern (although it shouldn’t be so slow that testing becomes annoying).

At the beginning of the project, it took me only a few minutes to parse the FIL header of a 16 KiB page (the FIL header is a structure common to all page types in InnoDB). Within a few hours, I could parse the INDEX page header and answer some very basic questions, such as how many records each INDEX page contains, which was an immediately useful result.
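To give a sense of how little code this first step needs, here is a minimal Ruby sketch. It is not the actual innodb_ruby code. The field offsets, the 38-byte header size, and the INDEX page type value 17855 are assumptions drawn from the InnoDB source headers (fil0fil.h and page0page.h), not from this article, and the names `FilHeader`, `parse_fil_header`, and `index_record_count` are hypothetical.

```ruby
# Assumed constants (from the InnoDB source headers, not from this article).
PAGE_SIZE       = 16 * 1024  # default page size in bytes
FIL_HEADER_SIZE = 38         # the FIL header occupies the first 38 bytes of a page
PAGE_TYPE_INDEX = 17_855     # FIL_PAGE_INDEX (0x45BF)
N_RECS_OFFSET   = 16         # offset of the record count inside the INDEX page header

# Hypothetical container for the decoded FIL header fields.
FilHeader = Struct.new(:checksum, :page_number, :prev_page, :next_page,
                       :lsn, :page_type, :flush_lsn, :space_id)

# Decode the FIL header from a binary page string.
# All fields are big-endian: four 32-bit integers, a 64-bit LSN,
# a 16-bit page type, a 64-bit flush LSN, and a 32-bit space ID.
def parse_fil_header(page)
  raise ArgumentError, "page too short" if page.bytesize < FIL_HEADER_SIZE
  FilHeader.new(*page.unpack("N4 Q> n Q> N"))
end

# Return the number of user records on an INDEX page,
# or nil if the page is not an INDEX page.
def index_record_count(page)
  return nil unless parse_fil_header(page).page_type == PAGE_TYPE_INDEX
  field = page.byteslice(FIL_HEADER_SIZE + N_RECS_OFFSET, 2)
  raise ArgumentError, "page too short" if field.nil? || field.bytesize < 2
  field.unpack("n").first
end
```

With a tablespace file opened in binary mode, each page would be read as `PAGE_SIZE` bytes and passed to these methods. The file name and the page to read are up to the reader.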

I implemented the other key data structures as I needed them, and each one gave me a deeper understanding of another layer of InnoDB’s storage. At this point Davi joined the project and completed some of the trickier parts, such as handling variable-width field types in records.

Now we basically have a read-only implementation of InnoDB’s main data structures.

Diagramming InnoDB’s on-disk data structures

Once I had uncovered enough of InnoDB’s secrets, I felt I could start drawing diagrams without making serious mistakes, so I set about building clear, easy-to-understand diagrams of all the major InnoDB on-disk data structures. For this I started the innodb_diagrams project and chose to build the diagrams in OmniGraffle.

So far, most of the diagrams are complete for tablespace files stored on disk in the Barracuda format (with the COMPACT record format). Diagrams for the Antelope format (REDUNDANT record format) and for InnoDB compressed tables still need to be added, as do diagrams for the log file format.

Using the code and diagrams

Now that we have code to demonstrate with and diagrams to complement it, I plan to write a few articles describing some of the more interesting but poorly documented data structures. Stay tuned!