Original article: Building a Production Database in Ten Years or Less
Published January 25, 2022 by Yury Selivanov (@1st1).
This article has a dedicated discussion thread on Hacker News (HN).
So you have finally grown tired of plugging your usual database brand into every project, and you decide to go off and build a new database. You have a bunch of ideas, a bunch of notes, and an unwavering belief that this is the database the world has been waiting for. Now all you have to do is quit your job and ship 1.0 in a few months, right?
Maybe you can do it! But that is, to put it mildly, a little different from our experience.
Before we enter the post-1.0 era of EdgeDB, we want to reflect on the road we’ve traveled: bumpy, meaningful, and always much longer than we thought. Along the way we built two major open source projects (uvloop and asyncpg), introduced the async/await syntax to Python, grew into a 10-person open source company, and (hopefully) learned some lessons that will be useful to others who want to build a database. 🚀
After 2,100 pull requests and 4,600 integration tests, we will release the first stable version of EdgeDB on the 10th day of the first lunar month of the Year of the Tiger, just a few weeks from now. The release is part of the first EdgeDB Day, a two-hour “mini conference” streamed live online, where a series of lightning talks will answer your questions about EdgeDB: what it does, what a graph-relational database is, what makes EdgeQL the star of the show, and more.
The live stream is on YouTube, and registration is free. After the conference, the translator will try to add Chinese subtitles and post the recording on Bilibili.
Let’s go back to the beginning.
2008: MagicStack
That year, Elvis, my EdgeDB co-founder, and I started MagicStack, a small boutique software development shop. Over the years, we served a wide variety of clients, from early-stage startups to Fortune 500 companies like General Electric and Microsoft.
2009-2014: Caos
We realized early on that we were solving the same problems over and over again on different projects. It was a drag, and it took time away from what should have been fun and creative work.
Tired of the existing technology and spurred on by Google Wave, we chose to incubate our own toolset, mostly written in Python, for use across our consulting projects. It included:
- A component-based declarative user interface generator. (Sound familiar?)
- An RPC library for building back-end services, with support for decorators and some “metaprogramming” capabilities.
- A bundler for loading media assets and stylesheets into Python files.
The ace in the hole was a data layer “super ORM” called Caos, which provided:
- An object-oriented schema definition language with a syntax similar to YAML.
- Support for schema mixin composition, indexes, constraints, dynamically computed properties, and rich introspection.
- A query language that supported query composition and deep nesting (naturally called CaosQL).
Caos parsed CaosQL queries, validated them against the current schema, and compiled them into equivalent SQL statements. It was our secret weapon at the time, and it let us deliver fast and happily. With every project, Caos got a little better. Many of the concepts that are important to EdgeDB today, such as links, multiple inheritance in the schema, simple nested queries, and an emphasis on introspection, were already present in the Caos of those days.
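The parse, validate, compile pipeline described above can be sketched in a few lines of Python. This is a toy illustration only: the query syntax, the `SCHEMA` dictionary, and every function name here are assumptions made for the example, not the actual Caos or EdgeQL design.

```python
from dataclasses import dataclass

# Hypothetical schema for illustration: object type -> allowed property names.
SCHEMA = {
    "User": {"id", "name", "email"},
    "Post": {"id", "title", "author_id"},
}


@dataclass(frozen=True)
class Query:
    type_name: str
    fields: tuple


def parse(text: str) -> Query:
    """Parse a toy query such as 'User { name, email }'."""
    head, _, rest = text.partition("{")
    body, closing, _ = rest.partition("}")
    if not closing:
        raise SyntaxError("expected '{ ... }' after the type name")
    fields = tuple(f.strip() for f in body.split(",") if f.strip())
    if not fields:
        raise SyntaxError("a query must select at least one field")
    return Query(head.strip(), fields)


def validate(query: Query, schema: dict) -> None:
    """Reject queries that mention types or fields missing from the schema."""
    if query.type_name not in schema:
        raise LookupError(f"unknown type: {query.type_name}")
    unknown = [f for f in query.fields if f not in schema[query.type_name]]
    if unknown:
        raise LookupError(f"unknown fields on {query.type_name}: {unknown}")


def compile_to_sql(query: Query) -> str:
    """Emit the SQL equivalent of an already validated query."""
    columns = ", ".join(f'"{f}"' for f in query.fields)
    return f'SELECT {columns} FROM "{query.type_name}"'


def translate(text: str, schema: dict = SCHEMA) -> str:
    query = parse(text)
    validate(query, schema)
    return compile_to_sql(query)


if __name__ == "__main__":
    print(translate("User { name, email }"))
```

Validating against the schema before generating SQL means a typo in a field name fails early with a readable error instead of surfacing as a database error at run time.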
Our code relied heavily on “metaprogramming,” so in late 2012, when the Python community started looking for volunteers to design a better introspection API for Python (PEP 362), we jumped at the chance to make our first big push into open source. Our RPC library needed this introspection API too.
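To give a sense of why an RPC layer wants this kind of introspection, here is a minimal sketch using `inspect.signature`, the API that PEP 362 added to the standard library. The helpers `rpc_call` and `required_params` and the `create_user` handler are hypothetical names invented for this example; they are not taken from our RPC library.

```python
import inspect


def rpc_call(func, *args, **kwargs):
    """Check the arguments against func's signature, then call it."""
    sig = inspect.signature(func)
    try:
        bound = sig.bind(*args, **kwargs)
    except TypeError as exc:
        # Report the mismatch before the handler body ever runs.
        raise ValueError(f"bad call to {func.__name__}: {exc}") from None
    bound.apply_defaults()
    return func(*bound.args, **bound.kwargs)


def required_params(func):
    """List the named parameters that have no default value."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind in named_kinds and param.default is inspect.Parameter.empty
    ]


def create_user(name, email, *, admin=False):
    """Hypothetical RPC handler used only for this example."""
    return {"name": name, "email": email, "admin": admin}


if __name__ == "__main__":
    print(required_params(create_user))
    print(rpc_call(create_user, "Ann", "ann@example.com"))
```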
Soon after, I started submitting pull requests to asyncio, and I became a Python core developer in 2013. Between 2013 and 2015, Elvis and I went into output mode, contributing more code and projects to the Python language itself and its ecosystem.
2014: Yacht Collection
It was 2014, and we chose to start a product of our own because we didn’t want to do consulting anymore. Caos had given us an outstanding technical foundation, so our first product was naturally… yacht chartering. You heard that right: the product was called “Yacht Collection”.
Those months of hard work were fun, though ill-fated. With Caos and the rest of the toolset we had accumulated, Elvis and I were able to build the core product in record time, but we weren’t as quick to realize that yacht chartering wasn’t a solid business. It was a lesson for life.
We looked back at the tools we had built up over the previous 7 years, and most of them had been superseded by the JavaScript ecosystem: React replaced our component UI library, GraphQL handled most of the scenarios covered by our RPC library, and our bundler could now be replaced by the far better Webpack.
We hadn’t seen similar breakthroughs at the database level, and frankly things seemed to be going downhill. We watched schemaless NoSQL data stores take off, and it felt as if the entire software industry, after years of struggling with SQL and the relational model, had suddenly decided to capitulate en masse.
Then one night, as Elvis and I were walking briskly down a back road to work in Toronto, we had an epiphany: Caos pointed to a new direction for the technology, not as an ORM framework, but as a database.
2015: I/O tangle
To make progress on EdgeDB, we still needed to build several important pieces of infrastructure.
Our original plan was to refactor Caos from a Python framework into a full-fledged database server capable of handling queries received over the network. To support tens of thousands of concurrent connections, we couldn’t use a multithreaded blocking I/O model (a topic for another blog post); we had to go asynchronous.
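As a rough sketch of what the asynchronous model buys you, the snippet below serves many connections from a single thread with asyncio streams. It is written in today's async/await syntax for readability, and the line-based "protocol", the handler, and the helper names are assumptions made for this example; it is not EdgeDB's server code.

```python
import asyncio


async def handle_client(reader, writer):
    """Serve one connection; each await yields control to other connections."""
    while True:
        line = await reader.readline()
        if not line:  # the client closed the connection
            break
        writer.write(b"result: " + line)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def query_once(port, text):
    """Toy client: send one line and read one reply."""
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(text.encode() + b"\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply.decode().strip()


async def demo(n_clients=50):
    # Port 0 asks the operating system for any free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        # All clients are in flight at once, yet no threads are created.
        return await asyncio.gather(
            *(query_once(port, f"q{i}") for i in range(n_clients))
        )


if __name__ == "__main__":
    print(asyncio.run(demo())[:3])
```

With blocking I/O, each of those connections would need its own thread or process; here an idle connection costs little more than a suspended coroutine.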
Around the same time, we were designing the API for EdgeDB’s Python client library, and we soon hit a snag with the database transaction API. Coroutines were still written with `yield from`, which made the code far too verbose:
    # Inside a generator-based coroutine (the pre-Python 3.5 style)
    tx = yield from conn.start_transaction()
    try:
        print((yield from tx.query('...')))
    except Exception:
        yield from tx.rollback()
        raise
    else:
        yield from tx.commit()
We realized that if Python natively supported an `async with` syntax, transactions could be written elegantly. So I decided to start a PEP.
    async with conn.transaction() as tx:
        print(await tx.query('...'))
Soon, the draft PEP grew to include `async for` (to support asynchronous database cursors) and the basic async/await syntax. It took me a few weeks to get PEP 492 out of my head and onto paper. It was finished in April 2015 and approved just before the Python 3.5 feature freeze, making PEP 492 an official part of Python. Seeing a proposal of this magnitude adopted by the Python community is one of the most exciting things I have ever experienced.
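For readers who have not implemented these protocols themselves, here is a minimal sketch of the two hooks PEP 492 defines: `__aenter__`/`__aexit__` for `async with`, and `__aiter__`/`__anext__` for `async for`. The `FakeConnection`, `Transaction`, and `Cursor` classes are hypothetical in-memory stand-ins invented for this example, not the API of any real driver.

```python
import asyncio


class Cursor:
    """Asynchronous iterator over rows, consumed with 'async for'."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def __aiter__(self):
        return self

    async def __anext__(self):
        await asyncio.sleep(0)  # stand-in for waiting on the network
        try:
            return next(self._rows)
        except StopIteration:
            raise StopAsyncIteration from None


class Transaction:
    """Asynchronous context manager: commit on success, roll back on error."""

    def __init__(self, conn):
        self.conn = conn
        self.pending = []

    async def __aenter__(self):
        self.conn.log.append("BEGIN")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.conn.rows.extend(self.pending)
            self.conn.log.append("COMMIT")
        else:
            self.conn.log.append("ROLLBACK")
        return False  # let any exception propagate to the caller

    async def insert(self, row):
        await asyncio.sleep(0)
        self.pending.append(row)


class FakeConnection:
    """Hypothetical in-memory connection used only for this example."""

    def __init__(self):
        self.rows = []
        self.log = []

    def transaction(self):
        return Transaction(self)

    def cursor(self):
        return Cursor(list(self.rows))


async def demo():
    conn = FakeConnection()
    async with conn.transaction() as tx:
        await tx.insert("alice")
        await tx.insert("bob")
    try:
        async with conn.transaction() as tx:
            await tx.insert("mallory")
            raise RuntimeError("boom")  # triggers the rollback path
    except RuntimeError:
        pass
    names = []
    async for row in conn.cursor():
        names.append(row)
    return names, conn.log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The commit-or-rollback decision lives in one place, `__aexit__`, which is what removes the try/except/else boilerplate from every call site.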
Early 2016: Cython debuts
When we began to realize that pure Python was no longer fast enough to handle I/O for EdgeDB, we turned to Cython, which let us prototype quickly in a Python-like syntax while getting performance close to native code. So we spent a couple of weeks wrapping libuv, the non-blocking I/O library originally built for Node.js, in Cython.
The result is called uvloop. It is a drop-in replacement for Python’s built-in asyncio event loop and delivers a two- to four-fold performance boost. The uvloop blog post got a lot of traffic at the time, and uvloop went viral almost overnight. Today it is one of the building blocks of EdgeDB.
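A minimal sketch of what "drop-in" means in practice: the coroutine code stays the same and only the loop factory changes. The fallback to the standard loop when uvloop is not installed, and the helper names `new_loop`, `run`, and `main`, are choices made for this example.

```python
import asyncio

try:
    import uvloop  # third-party package: pip install uvloop

    new_loop = uvloop.new_event_loop
except ImportError:
    # Fall back to the standard library loop; the code below is unchanged.
    new_loop = asyncio.new_event_loop


async def main():
    # Ordinary asyncio code; it does not know which loop is running it.
    results = await asyncio.gather(
        *(asyncio.sleep(0, result=i) for i in range(5))
    )
    return sum(results)


def run(coro):
    """Run a coroutine to completion on a freshly created event loop."""
    loop = new_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


if __name__ == "__main__":
    print(run(main()))
```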
Mid-2016: High speed train to Postgres
EdgeDB uses Postgres as its foundation because Postgres, as the world’s most advanced open source SQL database, is the ideal base for building a better database abstraction.
Aside: if Postgres is already a database, and EdgeDB is built on top of Postgres, doesn’t that make EdgeDB an ORM (object-relational mapper)?
We don’t think so. EdgeDB formally defines its own query language, EdgeQL, which not only matches the full power of SQL but also surpasses it in simplicity. EdgeDB is not tied to the user’s programming language, whereas most ORM libraries are designed for one particular language. With its own type system, standard library, binary protocol, client libraries for multiple programming languages, command-line tools, development workflow, and conventions, EdgeDB is a database by any measure.
With the async/await syntax and a high-performance asynchronous event loop in hand, we began to evaluate the state of asynchronous Postgres drivers in Python. Unfortunately, none of them worked for us, so we had to write our own, and in the process we learned a lot about the strengths and weaknesses of the Postgres binary protocol.
At EuroPython 2016 in Bilbao, Spain, I was able to demo the result, asyncpg, but only just: we finished the final development work two hours before the presentation, with Elvis, as always, making a crucial last-minute fix. In August, we launched asyncpg on HN, and it has been widely used ever since, earning 5,200 stars on GitHub.
2017-2018: Let’s make a database
With async/await, Cython, uvloop, and asyncpg, we could finally implement EdgeDB. Although we were still doing consulting full time, we spent our nights designing at the whiteboard, and the main structure of today’s EdgeDB took shape there: the type system, the EdgeQL syntax, and SDL (schema definition language). With that in place, we set out to ship a technology preview of EdgeDB.
To force ourselves to deliver on time, we signed a gold sponsorship contract for PyCon 2018, which was only five months away. We booked a booth, printed 3,500 brochures, and did everything else we could think of. Naturally, we once again shipped the preview right at the buzzer, after a long night in a nondescript B&B in Cleveland, USA.
It turned out to be great. People at PyCon loved EdgeDB, we answered hundreds of questions from attendees, received a mountain of feedback, and delivered on our EdgeDB promise.
Early 2019: First alpha release
After that, we scaled back our consulting work to subsistence level, freeing up time to focus on the first alpha version of EdgeDB. The technology preview had been a good prototype, but EdgeDB needed to mature before its first release.
We thoroughly overhauled EdgeDB’s type system, improved the safety and usability of operators and type casts, and added built-in GraphQL support. Drawing on our experience with asyncpg and our understanding of the Postgres protocol, we designed EdgeDB’s own binary protocol and a Python client built on top of it. By this point, the core of EdgeDB had taken its final shape.
In April 2019, exactly one year after showing the EdgeDB technology preview, we published our first alpha release, which was well received on HN (see “How to Make Databases Ten Times More Efficient” in Chinese). Not long after, another of our blog posts, “That’s SQL? We Can Do Better,” reached the top of HN (and was widely discussed on V2EX). This reinforced our belief that there was considerable pent-up dissatisfaction with the developer experience of existing databases and that we had to keep working on it.
Late 2019: EdgeDB Inc
In the second half of 2019, we moved our headquarters from Toronto, Canada to San Francisco and incorporated EdgeDB Inc.
2020: The Year of Alpha
Over the course of the year, we grew from three people to nine and shipped six more alpha releases.
These releases brought a whole set of new tools and features, including:
- A new command-line tool (CLI) written in Rust;
- A data structure migration system and accompanying CLI development workflow;
- Continuous optimization of binary communication protocols;
- A JavaScript client library;
- Backup and recovery functions;
- A new EdgeQL syntax for automatically creating or updating data (UPSERT);
- And a best-in-class interactive query tool.
If you’re curious, you can check out our blog for details on how all six alpha releases evolved.
2021: Year of beta
The first beta of EdgeDB came out in February 2021, and over the course of that year we shipped a total of three beta releases and three RC releases.
As EdgeDB’s core matured, we shifted our focus to the development workflow: how could EdgeDB’s developer experience surpass that of every existing database? This covers installing the database, managing database instances, running migrations, reading the documentation, using the command-line tools, querying data, and so on.
Well… after a lot of hard work, we redesigned our command-line tool (RFC 1006), upgraded the client libraries with a more ergonomic API, added support for Go and Deno, and added TLS ALPN support to Deno. We rewrote the EdgeDB documentation, published an interactive book called Easy EdgeDB (Chinese version on the way!), and created a gorgeous notebook-style EdgeQL tutorial (Chinese version already scheduled!). We also started developing a query builder for TypeScript, which may yet consign the ORM to the dustbin of history. We’ll see what happens next. 👀
2022: To stability and beyond!
That’s it for now. In a couple of weeks, we’ll be releasing the stable EdgeDB 1.0. We hope you enjoy it. It has been a remarkable journey, to say the least.
This article has not said much about what EdgeDB actually is. If you are interested, please join our launch event on the 10th day of the Chinese New Year (the event is in English on YouTube, and a version with Chinese subtitles will be posted on Bilibili). It is a two-hour mini conference that we hope will answer all your questions. Get your ticket here:
Thanks to Colin McDonnell and Elvis for feedback and edits on this post.
For more information, visit our official website, OSCHINA project home page, Zhihu column, and Juejin column.