How to implement a simple database system from 0 to 1

As Web practitioners, we work with relational databases every day, but it’s just a black box to us, so we want to know how databases work. So I was trying to figure out how the database works, what are its basic principles? So I decided to start with the following questions to understand the database system

In what format is the data stored in the database in memory or on disk? When is data moved from memory to disk? (Persistence) Why does each table in the database have one and only one primary key? How does transaction rollback work in a database? How are indexes formatted in a database? When and how to scan the full table? In what format are prepared statements saved? In other words, how does a database work? I should point out that I wrote a simple mini database from scratch, which is modeled after SQLite, because its design is smaller than MySQL or PostgreSQL, and its design is easier to move around, understand, and imitate! I also have a better understanding of how databases work. I keep the entire database in a single file.

There are many articles on their websites about the actual implementation of SQLite. SQLite database system: Design and implementation.

A query can retrieve or modify data through a series of components. The front end consists of the following parts

The Tokenizer Code Generator inputs an SQL query from the front end and outputs SQLite virtual machine bytecode (essentially a compiler that can operate on a database). In other words we have a console for input and output, and when an SQL query is entered on the console, the console displays the bytecode of the SLqite VIRTUAL machine. The backend consists of the following

The bytecode generated at the front of a VIRTUAL machine is an instruction, which can then operate on a table or index of a database. These database tables or indexes are stored in the b-tree data structure. A VM is essentially a large switch statement of bytecode instruction type.

Each B-tree consists of multiple nodes. Each node is one page in length. B-tree can retrieve a page from disk or save it back to disk by sending a command to pager.

Pager can receive commands to read and write pages data. It is responsible for reading/writing database files at the appropriate offset. It also caches recently accessed pages in memory and determines when those cached pages are written back to disk.

At the OS interface level, depending on which operating system sqLite is compiled on, there will be no support for multiple platforms in this course.

A journey of a thousand miles begins with a single step, so let’s start with something more straightforward: the REPL

Make a simple REPL SQLite starting with the read-execut-print loop when you type certain commands

~ sqlite3 SQLite version 3.16.0 2016-11-04 19:09:39 Enter ". Help "for Usage hints. Connected to a transient in-memory database. Use ".open FILENAME" to reopen on a persistent database. sqlite> create table users (id int, username varchar(255), email varchar(255)); sqlite> .tables users sqlite> .exit ~Copy the code

To do this, our function needs an infinite loop to print the identifier, get a line of data, and then process the command:

int main(int argc, char* argv[]) { InputBuffer* input_buffer = new_input_buffer(); while (true) { print_prompt(); read_input(input_buffer); if (strcmp(input_buffer->buffer, ".exit") == 0) { close_input_buffer(input_buffer); exit(EXIT_SUCCESS); } else { printf("Unrecognized command '%s'.\n", input_buffer->buffer); }}}Copy the code

We need to define the InputBuffer as a little wrapper that contains the states we need to interact with getLine () (more on getLine later).

typedef struct {
  char* buffer;
  size_t buffer_length;
  size_t input_length;
} InputBuffer;

InputBuffer* new_input_buffer() {
  InputBuffer* input_buffer = (InputBuffer*)malloc(sizeof(InputBuffer));
  input_buffer->buffer = NULL;
  input_buffer->buffer_length = 0;
  input_buffer->input_length = 0;
  return input_buffer;
}
Copy the code

Next, we need to define a print_prompt() function to print the identifier and call it before each line is read

void print_prompt() { printf("db > "); }
Copy the code

Read each line of input, using the getLine () function

size_t getline(char **lineptr, size_t *n, FILE *stream);
Copy the code

Lineptr: pointer to a variable used to point to the buffer containing read rows. If set to ‘NULL’ it is allocated by getLine (), so the user will release it even if the command fails. N: Pointer to the variable used to allocate buffer size. Stream: Reads the input stream, which we will output as standard output. Return value: The size of the byte read, which may be smaller than the size of buffer. We will tell getLine () to store the read row in input_buffer->buffer and the allocated buffer size in input_buffer->buffer_lenth. We store the return value in input_buffer->input_length. Buffer starts null, so getLine () allocates enough memory to hold the input and make buffer point to the input.

void read_input(InputBuffer* input_buffer){
  ssize_t bytes_read =
      getline(&(input_buffer->buffer), &(input_buffer->buffer_length), stdin);

  if (bytes_read <= 0) {
    printf("Error reading input\n");
    exit(EXIT_FAILURE);
  }

  // Ignore trailing newline
  input_buffer->input_length = bytes_read - 1;
  input_buffer->buffer[bytes_read - 1] = 0;
}
Copy the code

We have now defined a suitable function to free the memory allocated to InputBuffer * and the respective structures of the elements in the buffer. The getline() function in read_input() allocates memory to input_buffer->buffer.

void close_input_buffer(InputBuffer* input_buffer){
  free(input_buffer->buffer);
  free(input_buffer);
}
Copy the code

Eventually we will parse and execute the command. Currently we have only one parse. exit command, which is used to terminate the program. Otherwise we print an error message and continue the loop.

if (strcmp(input_buffer->buffer, ".exit") == 0) {
  close_input_buffer(input_buffer);
  exit(EXIT_SUCCESS);
} else {
  printf("Unrecognized command '%s'.\n", input_buffer->buffer);
}
Copy the code

Wow we can finally test our results!! To run the program, we first type.tables, which prints out Unrecognized command. Tables and then.exit.

~ ./db
db > .tables
Unrecognized command '.tables'.
db > .exit
~
Copy the code

Now that we have successfully completed a REPL, in the next section we will start developing our command language. In the meantime, here is our code for this chapter:

#include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <string.h> typedef struct { char* buffer; size_t buffer_length; size_t input_length; } InputBuffer; InputBuffer* new_input_buffer() { InputBuffer* input_buffer = malloc(sizeof(InputBuffer)); input_buffer->buffer = NULL; input_buffer->buffer_length = 0; input_buffer->input_length = 0; return input_buffer; } void print_prompt() { printf("db > "); } void read_input(InputBuffer* input_buffer) { size_t bytes_read = getline(&(input_buffer->buffer), &(input_buffer->buffer_length), stdin); if (bytes_read <= 0) { printf("Error reading input\n"); exit(EXIT_FAILURE); } // Ignore trailing newline input_buffer->input_length = bytes_read - 1; input_buffer->buffer[bytes_read - 1] = 0; } void close_input_buffer(InputBuffer* input_buffer) { free(input_buffer->buffer); free(input_buffer); } int main(int argc, char* argv[]) { InputBuffer* input_buffer = new_input_buffer(); while (true) { print_prompt(); read_input(input_buffer); if (strcmp(input_buffer->buffer, ".exit") == 0) { close_input_buffer(input_buffer); exit(EXIT_SUCCESS); } else { printf("Unrecognized command '%s'.\n", input_buffer->buffer); }}}Copy the code

Follow me, there will be more wonderful content just for you

How to implement a simple database system from 0 to 1

Related Posts

Microservices series: Seata for Distributed Transactions Spring Cloud Alibaba

3-04 Multithreading related

NullPointerException when Java injects a Service into a static utility class