Hello world! The team is excited to announce Prefect, an open source framework for building robust data applications. Prefect was inspired by the friction we observed between data engineers and data scientists, and it addresses that friction with a functional API for defining and executing data workflows.
The biggest problem data engineers face is the never-ending task we call “negative engineering.”
- Positive engineering is what we usually think of engineers doing: writing code to achieve a goal.
- Negative engineering is when engineers write defensive code to make sure the positive code actually runs. For example: What if the data arrives in an incorrect format? What if the database goes down? What if the computer on which the code is running fails? What if the code executes successfully but the computer fails before it can report success? Negative engineering is characterized by the need to anticipate this effectively unbounded set of possible failures.
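To make the distinction concrete, here is a minimal sketch in plain Python. It does not use Prefect's API; every name in it (`total_revenue`, `validate`, `with_retries`, `make_flaky_fetch`) is hypothetical and chosen only for illustration. One short function does the goal-oriented work, and everything else is defensive code guarding against malformed data and a briefly unavailable database.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def total_revenue(rows: list[dict]) -> float:
    """The goal-oriented code: compute the number we actually want."""
    return sum(row["amount"] for row in rows)


def validate(rows: list[dict]) -> list[dict]:
    """Defensive code: reject data that arrives in an incorrect format."""
    for row in rows:
        if not isinstance(row.get("amount"), (int, float)):
            raise ValueError(f"malformed row: {row!r}")
    return rows


def with_retries(fn: Callable[[], T], attempts: int = 3, delay: float = 0.0) -> T:
    """Defensive code: retry a call that may fail transiently."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(delay)
    raise AssertionError("loop always returns or raises")


def make_flaky_fetch(failures: int) -> Callable[[], list[dict]]:
    """Hypothetical data source that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def fetch() -> list[dict]:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ConnectionError("database unavailable")
        return [{"amount": 10.0}, {"amount": 32.0}]

    return fetch


# The source fails twice, the third attempt succeeds, and the data is checked
# before the goal-oriented function ever sees it.
fetch = make_flaky_fetch(failures=2)
rows = validate(with_retries(fetch, attempts=3))
result = total_revenue(rows)
print(result)
```

Even in this toy case the defensive code outweighs the one-line function it protects, and it still covers only two of the failure modes listed above.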
Engineers tell us that they typically spend 90% of their time on negative, defensive problems and only 10% on the positive solutions they were hired to build. This means that focusing on negative engineering has an outsized impact: if we can reduce the share of negative engineering to just 80%, we effectively double the engineers’ positive productivity, because they can now spend 20% of their time working on functional code.
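The arithmetic behind that claim can be checked in a few lines. This is only an illustration of the percentages quoted above; the helper name `productive_share` is hypothetical.

```python
def productive_share(defensive_percent: int) -> int:
    """Percent of time left for goal-oriented work."""
    return 100 - defensive_percent


before = productive_share(90)  # 10% of time on functional code
after = productive_share(80)   # 20% of time on functional code
gain = after / before          # ratio of productive time, after vs. before

print(f"productive time goes from {before}% to {after}%: a {gain:.0f}x gain")
```

The leverage comes from the small starting base: a reduction of about one ninth in defensive work doubles the time available for everything else.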
However, if you look across the entire data ecosystem, you will see hardly any recognition of this problem, let alone a focus on negative engineering. Most people will tell you that these problems are too difficult for third parties to solve because, by their very nature, they are specific to each company's business practices.
At Prefect, we know better. Considering a large number of unexpected but important adverse outcomes is something I have had to do throughout my career as a risk manager. In risk management, one does not try to predict and avoid every possible outcome; instead, one relies on a set of concepts and tools that are specific enough to be useful, yet general enough to deal robustly with the unknown. One of the key requirements for doing this effectively is to gather as much relevant experience as possible, as quickly as possible.
For the past three years, I have been a PMC member of Apache Airflow, the most widely used open source software for data engineering workflows. In addition to giving me insight into a range of technical challenges, the role meant that I received thousands of emails from data engineers and data scientists asking for help with problems they were encountering. Through these conversations, I gained unusual insight into negative engineering problems. Taken in isolation, each problem does seem unique. But in aggregate, striking patterns emerged: the same common problems resurfaced over and over again. For a long time I tried to resolve these issues within the scope of Airflow; when I reached the limits of what Airflow could do, I began to design Prefect. That was almost two years ago.
Today, Prefect encodes the patterns we have observed in modern data engineering. We worked very hard to build a system that automatically enables best practices, even for data applications that we have never seen before. To see how this works, consider how you can instantly tell whether a sentence like “the sky is blue” is well-formed without memorizing every combination of English words. Just as your brain has broad rules for language, Prefect can detect when something is wrong, even if we can’t pinpoint exactly what or why. This ability makes negative engineering much easier and saves our users a lot of time and headaches.
In practice, Prefect is simple. Negative engineering problems are not always convoluted, complex, or difficult. On the contrary: they are usually minor, annoying, and repetitive. As a result, although their cumulative impact is large, they slip through the sieve we use to identify major problems. At Prefect, we have found that most data applications can be decomposed into simple structures, and that by focusing on these basic building blocks we can solve negative engineering problems without sacrificing any functionality or adding complexity to positive engineering. Our users keep full creative license to combine these pieces in compelling and unexpected ways, while Prefect acts as a beacon of safety for them.
At our core, we provide two things. One is our open source framework, which works like a hardware store: it has all the components you need to build great data applications. The other is our platform, which we think of as the store manager: it guides users to the right tools and helps ensure the success of their projects. Together, they provide compelling solutions to both positive and negative engineering problems.
Basically, we like to solve problems that users hate.
Positive and Negative Engineering