“What will be hot and what to learn in 2022? This article is participating in the” Talk about 2022 Technology Trends “essay campaign.

Original text: Batteries – Included Workflow Orchestration Tool: Flyte | by ODSC – Open Data Science | Medium author: Ketan Umare

Machine learning (ML) has been used for decades, and tools to support researchers and engineers in this area are still evolving. Robots once dubbed “data scientists” have now become data engineers, machine learning engineers, and machine learning researchers, and of course, data scientists still play a key role in an organization’s data team.

As models become complex and data sources become diverse, infrastructure becomes a bottleneck. Often, data scientists have to get bogged down in low-level infrastructure issues: Kubernetes, networking, Gpus, resource management, etc. In addition, ML experiments require the ability to perform independent experiments quickly, which requires a collaborative approach.

If the infrastructure is tied to the work of the data scientist, then the range is too large to handle. Building pipelines for training production models, as well as running increasingly complex models and team collaborations, will generate a lot of infrastructure debate. This is a large area that requires a dedicated team or platform.

Developers need DevOps; Similarly, machine learning requires MLOps. For most data scientists, combining the various components in the MLOps technology stack is a requirement, which has led to the development of tools like Flyte.

About the Flyte

Flyte is an open source, container-native, structured programming and distributed processing platform that enables building highly concurrent, scalable and maintainable workflows for machine learning and data processing. It enables users to focus on business logic while outsourcing infrastructure management to a more appropriate team. It also allows the team that maintains the platform to provide a self-service platform for their users.

Here is a simple example of Flyte code that uses the Python Flytekit API to define a Flyte task to calculate the total compensation. The output is pandas DataFrame:

import pandas as pd
from flytekit import Resources, task@task(limits=Resources(cpu="2", mem="150Mi"))
def total_pay(hourly_pay: float, hours_worked: int, df: pd.DataFrame) ->
pd.DataFrame:
     return df.assign(total_pay=hourly_pay * hours_worked)
Copy the code

The main advantages of Flyte

  • K8s native workflow automation
  • User-friendly SDK for Python, Java and Scala
  • Version control and auditing
  • Reproducible Pipeline
  • Strongly typed system

Flyte is a feature-rich platform. In this article, we’ll learn about three basic features of Flyte.

Type checking

  • Flytekit SDK supports Python, Java, and Scala.

Python is known for its ease of use in large part because of its dynamic nature. Recently, however, that perception has changed. The Python community encourages the introduction of types in Python code because it helps to write error-free code and improves the readability of code.

The Flyte team believes that typing is an important aspect of writing code; Therefore, we have introduced a native type system in Flyte.

The Flytekit Python SDK automatically maps Python type hints to cross-language types that Flyte understands. Refer to the documentation to see all types mapped from Python to the Flyte type system.

To understand why we care about types in Flyte, let’s examine a Python function that has no type.

def concat(a, b) :
    return a + b
Copy the code

In this function, we cannot predict the types exactly: they can be integers, floating point numbers, or strings. Because they’re all possible. When we try to develop code on this basis, we may need to re-examine the code, simulating a type system in our head every time we use the output returned by this function, which can be quite tedious. Also, when errors occur, it is difficult to debug because we may have to analyze all data types to find the source of the error.

All in all, types can help eliminate this tedious process to some extent:

def concat(a: int, b: int) - >int:
    return a + b
Copy the code

Now we know that a and b are integers. We can easily explain what the function output is! It’s easy to understand and easy to debug!

For this reason, Flyte needs to use types. It supports strong data typing and has a robust type system to support multiple types.

The following example shows using dataclass JSON as a type in Flyte:

import typing
from dataclasses import dataclass
from dataclasses_json import dataclass_json
from flytekit import task, workflow

@dataclass_json
@dataclass
class Datum(object) :
     """ Example of a simple custom class that is modeled as a dataclass. """     x: int
     y: str
     z: typing.Dict[int.str]@task
def stringify(x: int) -> Datum:
     """ A dataclass return will be regarded as a complex single JSON return. """
     return Datum(x=x, y=str(x), z={x: str(x)})@task
def add(x: Datum, y: Datum) -> Datum:
     """ Flytekit will automatically convert the passed in JSON to a DataClass. If the structures do not match, it will raise a runtime failure. """
     x.z.update(y.z)
     return Datum(x=x.x + y.x, y=x.y + y.y, z=x.z)@workflow
def wf(x: int, y: int) -> Datum:
     """ Dataclasses (JSON) can be returned from a workflow. """
     return add(x=stringify(x=x), y=stringify(x=y))if __name__ == "__main__":
     """ This workflow can be run locally. During local execution, the dataclasses are marshalled to and from json. """
     wf(x=10, y=20)
Copy the code

Extending the type system is easy, and Flyte has many custom types.

Here is an example of FlyteFile, which is a custom file type in Flyte.

from flytekit import task, workflow
from flytekit.types.file import FlyteFile@task
def t1(f: FlyteFile) - >str:
     with open(f) as fp:
          data = fp.readlines()
     return data[0]@workflow
def wf() - >str:
     return t1(
          f="https://raw.githubusercontent.com/mwaskom/seaborndata/master/iris.csv"
     )if __name__ == "__main__":
     print(f"Running {__file__} main...")
     print(f"Running wf(), first line of the file is '{wf()}'")
Copy the code

FlyteFile tries to download a remote CSV file; The URL of the code examples, however, does not exist (the actual URL is raw.githubusercontent.com/mwaskom/sea…). .

When we run the code, we see the following error:

flytekit.common.exceptions.user.FlyteAssertion: Failed to get data from https://raw.githubusercontent.com/mwaskom/seaborndata/master/iris.csv to /var/folders/6r/9pdkgpkd5nx1t34ndh1f_3q80000gn/T/flyteabhpuq8t/20211011_163211/local_flytekit/d50f36f4119018dda42d601f76 ea0999/iris.csv (recursive=False).Original exception: Value error! Received: 404. Request for data @ https://raw.githubusercontent.com/mwaskom/seaborndata/master/iris.csv failed. Expected status code 200Copy the code

Therefore, with the Flyte type system, we can avoid pipeline errors by simplifying the debugging process and code readability.

We understand that not all software can migrate to usage types overnight. Therefore, we have been working hard to provide enough flexibility for the type system in Flyte to add support for arbitrary Python data types, which will be implemented in Flytekit 0.24.0. So we still recommend that you use types for maintainability.

Reproducibility and fault tolerance

Can reproducibility

From an ML or data pipeline perspective, rolling back to the past version is very important because sometimes the previous version is better than the new version. Machine learning is a big field where we can process huge amounts of data through multiple pipelines and algorithms. Similarly, data pipelines handle a lot of data processing. In this case, we should carefully manage our work (at least code and data), a task made more complicated by the inherent complexity of machine learning.

Flyte essentially supports reproducibility. Reproducibility is about the ability to “reproduce” or “reuse” our work. The idea of including reproducibility in Flyte came out of a problem the founding members of the Flyte team faced at Lyft. One notable example we observed was when one of our colleagues left Lyft. This colleague has developed a system that uses Geohash to quantify pickup and drop-off points and estimate round-trip times. Due to the peculiar nature of machine learning at the time, it took the team several months to restore the original algorithm in order to reproduce the results.

That’s why Flyte was built for deterministic computing, which plays a crucial role in machine learning. We want our algorithm to produce the same set of outputs given the same set of inputs. To achieve this, we need to plug related repeatability mechanisms into our system.

Here’s Flyte’s reproducibility support:

  • Each task runs in a separate environment; This keeps tasks from affecting each other.
  • Use Docker, Protobuf and powerful version control system to take snapshots of task code.

Fault tolerance

We don’t want to lose services when a sudden failure interrupts workflow. Tolerance of fault is tolerance of failure. It enables the service to continue running without problems.

In machine learning and data pipelines, the complexity of code and computation requires fault tolerance. Flyte fully understands this; It has some built-in fault tolerance. Flyte therefore ensures that the platform is designed to be recoverable.

Here are Flyte’s fault tolerance support:

  • User and system retry
  • suspended
  • Caching/persistence
  • Ensure that any past workflows are restored to the point of failure

Incremental development

Incremental development is the process of gradually developing a system. When building complex models through machine learning or data pipelines, we want to do this by building a module step by step on different modules.

Flyte is firmly committed to incremental growth. Here are some examples:

  • Start locally, run the code in Python, and then continue to deploy the model into production
  • Run a single task in remote Flyte and then run a workflow
  • Flyte supports usage domains: development, staging, and production
  • Run the entire workflow, caching the successful steps and iterating over the failed steps

The cache

Caching helps retrieve work faster, execute faster, and minimize waste of computing resources. Caching is useful when you need to perform the same task repeatedly with the same input.

Flyte has a neat way to enable caching. Here’s an example:

@ task (cache = True, cache_version = "1.0") def fetch_dataset (...). - >... :Copy the code

The cache_Version field indicates that the task function has changed. Raising cache_Version is similar to invalidating a cache.

conclusion

This article only Outlines Flyte’s capabilities. It is a K8S-native platform that supports end-to-end orchestration of workflows, from converting raw data into usable form to deploying models in production to serve users. Once deployed on a system, it becomes the perfect partner for machine learning and data processing pipelines.