
The service definition, an embodiment of service-oriented architecture (SOA), is the core building block of BentoML. It is where users define the service's runtime architecture and model-serving logic.

This article examines the key components of a service definition, giving you an overview of what a service definition consists of and the responsibilities of each component.

Components

The model service definition created in the earlier quickstart guide is shown below:

```python
# bento.py
import bentoml
import numpy as np

from bentoml.io import NumpyNdarray

# Load the runner for the latest ScikitLearn model we just saved
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")

# Create the iris_classifier_service with the ScikitLearn runner
svc = bentoml.Service("iris_classifier_service", runners=[runner])

# Create an API function with pre- and post-processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Define pre-processing logic
    result = runner.run(input_array)
    # Define post-processing logic
    return result
```

As shown above, a BentoML service consists of three components:

  • Inference APIs
  • Runners
  • Service

Inference APIs

The inference API defines how to remotely access service functionality and customize pre-processing and post-processing logic.

```python
# Create an API function with pre- and post-processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_array: np.ndarray) -> np.ndarray:
    # Define pre-processing logic
    result = runner.run(input_array)
    # Define post-processing logic
    return result
```

By decorating a function with @svc.api, we declare that the function is part of the API and can be accessed remotely. A service can have one or more APIs. The input and output parameters of the @svc.api decorator further define the expected input and output (IO) formats of the API.

In the example above, the API declares its IO types as numpy.ndarray through the NumpyNdarray IO descriptor. IO descriptors verify that inputs and outputs conform to the expected format and schema, and convert them to and from their native types. BentoML supports a variety of IO descriptors, including PandasDataFrame, String, Image, and File.
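The validation and conversion an IO descriptor performs can be sketched in plain Python. This is only an illustration of the idea, not BentoML's implementation; `to_ndarray` and `from_ndarray` are hypothetical helpers:

```python
import json
import numpy as np

def to_ndarray(raw: bytes) -> np.ndarray:
    """Deserialize a JSON request body into an ndarray, validating its shape."""
    data = json.loads(raw)
    arr = np.asarray(data, dtype=np.float64)
    if arr.ndim != 2:  # e.g. the iris model expects a batch of feature rows
        raise ValueError(f"expected a 2-D array, got {arr.ndim}-D")
    return arr

def from_ndarray(arr: np.ndarray) -> bytes:
    """Serialize an ndarray result back into a JSON response body."""
    return json.dumps(arr.tolist()).encode()

body = b"[[5.1, 3.5, 1.4, 0.2]]"
parsed = to_ndarray(body)             # validated ndarray of shape (1, 4)
response = from_ndarray(parsed * 2)   # back to JSON bytes for the HTTP response
```

The real descriptors additionally handle content-type negotiation and schema metadata, but the round trip between wire format and native type is the core job.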

The API is also a good place to define pre- and post-processing logic for model serving. In the example above, the logic defined in the predict function is packaged and deployed as part of the serving logic.
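To make the pre- and post-processing steps concrete, here is a hypothetical sketch of what they might look like for the iris classifier. The runner is stubbed out so the example runs standalone; `fake_runner_run` and `CLASS_NAMES` are illustrative stand-ins, not BentoML APIs:

```python
import numpy as np

# Stand-in for runner.run(); the real runner would invoke the saved model.
def fake_runner_run(arr: np.ndarray) -> np.ndarray:
    return np.zeros(arr.shape[0], dtype=int)  # pretend every row is class 0

CLASS_NAMES = ["setosa", "versicolor", "virginica"]  # illustrative labels

def predict(input_array: np.ndarray) -> np.ndarray:
    # Pre-processing: coerce to float and ensure a 2-D batch shape
    batch = np.atleast_2d(np.asarray(input_array, dtype=np.float64))
    result = fake_runner_run(batch)
    # Post-processing: map predicted class indices to human-readable labels
    return np.array([CLASS_NAMES[i] for i in result])
```

A single sample and a batch both work, because the pre-processing step normalizes the input shape before calling the runner.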

BentoML parallelizes API logic by starting multiple instances of the API server based on available system resources. For best performance, we recommend defining asynchronous APIs. For more information, refer to the API and IO descriptors documentation.
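The benefit of asynchronous APIs can be shown with plain asyncio, independent of BentoML: while one handler awaits a slow runner call, the event loop is free to serve other requests. `slow_runner` below is a hypothetical stand-in for a remote model call:

```python
import asyncio
import time

async def slow_runner(x: int) -> int:
    await asyncio.sleep(0.1)  # stands in for a slow remote model call
    return x * 2

async def predict(x: int) -> int:
    # An async handler yields control while awaiting the runner,
    # letting the server process other requests in the meantime.
    return await slow_runner(x)

async def main() -> float:
    start = time.perf_counter()
    # Ten concurrent requests complete in ~0.1 s total, not 10 x 0.1 s,
    # because the awaits overlap on the event loop.
    results = await asyncio.gather(*(predict(i) for i in range(10)))
    assert results == [i * 2 for i in range(10)]
    return time.perf_counter() - start

elapsed = asyncio.run(main())
```

A synchronous handler doing the same blocking sleep would serialize the ten calls and take roughly ten times as long.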

Runners

Runners represent a unit of service logic that can scale horizontally to maximize throughput.

```python
# Load the runner for the latest ScikitLearn model we just saved
runner = bentoml.sklearn.load_runner("iris_classifier_model:latest")
```

Runners can be created by calling a framework-specific load_runner() function, or by implementing a custom Runner class.

Framework-specific functions intelligently load the optimal configuration for that ML framework, so the runner achieves the best possible performance.

For example, if the ML framework releases the Python GIL and natively supports concurrent access, BentoML creates a single global instance of the runner and routes all API requests to it; otherwise, BentoML creates multiple runner instances based on available system resources.
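The scheduling decision described above can be sketched as a small policy function. The names (`RunnerSpec`, `plan_instances`, `releases_gil`) are hypothetical, chosen only to illustrate the branch between one shared instance and one instance per CPU:

```python
from dataclasses import dataclass

@dataclass
class RunnerSpec:
    name: str
    releases_gil: bool  # framework supports concurrent access natively

def plan_instances(spec: RunnerSpec, cpu_count: int) -> int:
    # If the framework releases the GIL, one shared runner instance can
    # serve all API workers concurrently; otherwise fan out one per CPU.
    return 1 if spec.releases_gil else max(1, cpu_count)

print(plan_instances(RunnerSpec("sklearn", releases_gil=False), cpu_count=4))  # 4
print(plan_instances(RunnerSpec("onnx", releases_gil=True), cpu_count=4))      # 1
```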

Don’t worry: you can also customize the runtime configuration to fine-tune runner performance.

The load_runner() function takes the name and version of the model we saved earlier. Using the latest tag ensures that the newest version of the model is loaded. Loading a runner also declares to the builder that this specific model and version should be packaged into the Bento when the service is built. A service can also define multiple runners.

For more information, see the Runner advanced Guide.

Service

The Service is composed of APIs and Runners and can be initialized via bentoml.Service().

```python
# Create the iris_classifier_service with the ScikitLearn runner
svc = bentoml.Service("iris_classifier_service", runners=[runner])
```

The first argument to the Service is its name, which becomes the name of the Bento after the service is built.

Runners are passed to the Service through the runners keyword argument. The build-time and runtime behavior of the service can then be customized through the svc instance.