Author: Chen Qiukai (Qiusuo)

Preface

KubeDL is Alibaba's open-source Kubernetes-based AI workload management framework. The name is short for "Kubernetes Deep Learning", and the project aims to give back to the community Alibaba's experience in scheduling and managing large-scale machine learning jobs in production scenarios. KubeDL has been accepted into the CNCF Sandbox for incubation, and we will continue to explore best practices in cloud-native AI scenarios to help algorithm scientists innovate simply and efficiently.

With the latest release, KubeDL 0.4.0, we have introduced model version management: AI scientists can now track, tag, and store models as easily as they manage container images. More importantly, in a classical machine learning pipeline the "training" and "inference" stages are relatively independent, so the "training -> model -> inference" pipeline seen by algorithm scientists lacks a connecting layer. The "model", as the intermediate product of the two stages, is exactly what can bridge them.

GitHub: https://github.com/kubedl-io/kubedl

Website: https://kubedl.io/model/intro/

Current status of model management

Model files are the product of distributed training: they are the essence of an algorithm, retained after sufficient iteration and search. In the industry, algorithm models have become valuable digital assets. For example, a TensorFlow training job typically outputs CheckPoint (.ckpt), GraphDef (.pb), or SavedModel files, while PyTorch typically outputs files with the .pth suffix. When loading a model, each framework parses the file to reconstruct the runtime data-flow graph, run parameters, and weights; to the file system, a model is just a file (or a set of files) in a special format, much like JPEG and PNG image files.

Therefore, the typical management mode is to treat models as plain files and host them in a unified object storage (such as Alibaba Cloud OSS or AWS S3). Each tenant/team is assigned a directory, each member stores model files in their corresponding subdirectory, and SRE centrally controls the read and write permissions.
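A typical layout looks roughly like this (a hypothetical sketch; the bucket, team, and file names are made up for illustration):

oss://company-models/
  team-recommend/
    alice/
      ctr_model_v3.pb
    bob/
      rank_model_final.ckpt
  team-vision/
    carol/
      resnet50_0915.pth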

The pros and cons of this management style are obvious:

  • The advantage is that users' API habits are preserved: in the training code they specify their own directory as the output path, and the corresponding cloud-storage directory is then mounted into the inference service container to load the model.

  • However, this places high demands on SRE: improper read/write authorization or misoperation may leak file permissions, or even cause large-scale accidental deletion. At the same time, file-based management makes model versioning hard to implement; users usually have to encode versions in file names, or an upper-layer platform has to absorb the complexity of version management. In addition, model files cannot be directly mapped back to the algorithm code and training parameters that produced them, and the same file may even be overwritten repeatedly across multiple training runs, making the history hard to trace.

Given the above, KubeDL draws on the strengths of Docker image management and introduces a set of image-based model management APIs, which tie distributed training and inference services together more closely and naturally while greatly simplifying the complexity of model management.

Starting from the image

The image is the soul of Docker and the core infrastructure of the container era. An image is a layered, immutable file system, in which a model file can naturally serve as an independent image layer, and combining the two creates new possibilities:

  • Users no longer have to manage raw files directly; instead they use the ModelVersion API provided by KubeDL, and training and inference services are bridged through the ModelVersion API;

  • Just like an image, a model can be tagged for version tracing and pushed to a unified registry with authentication. The storage backend of the image registry can also be replaced with the user's own OSS/S3, giving users a smooth transition;

  • Once a model image is built, it becomes a read-only template that can no longer be overwritten or tampered with, following the concept of "immutable infrastructure";

  • Image layers, with compression and content hashing, reduce the storage cost of model files and speed up distribution.

On top of "model images", we can also combine open-source image management components to get the most out of images:

  • In large-scale inference scale-out scenarios, Dragonfly can be used to accelerate image distribution. When traffic bursts, stateless inference service instances can be spun up quickly, avoiding the throttling that can occur when cloud storage volumes are mounted concurrently by a large number of inference instances;

  • For day-to-day inference deployments, ImagePullJob in OpenKruise can be used to warm up model images on nodes in advance, improving the efficiency of scale-out and release (a sketch follows this list).
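As a sketch of that pre-warming step, the ImagePullJob below pre-pulls a model image onto matching nodes before a release (this assumes OpenKruise is installed; the image name reuses the ModelVersion example later in this article, and the node label is hypothetical):

apiVersion: apps.kruise.io/v1alpha1
kind: ImagePullJob
metadata:
  name: warmup-model-image
spec:
  # The model image to pre-pull onto nodes
  image: modelhub/resnet:v0.1
  # Pull on at most 10 nodes at the same time
  parallelism: 10
  # Only warm up nodes carrying this (hypothetical) label
  selector:
    matchLabels:
      node-type: inference
  completionPolicy:
    type: Always
    ttlSecondsAfterFinished: 300
  pullPolicy:
    backoffLimit: 3
    timeoutSeconds: 300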

Model and ModelVersion

KubeDL model management introduces two resource objects: Model and ModelVersion. Model represents a specific Model, and ModelVersion represents a specific version during the iteration of that Model. A set of ModelVersions is derived from the same Model. Here’s an example:

apiVersion: model.kubedl.io/v1alpha1
kind: ModelVersion
metadata:
  name: my-mv
  namespace: default
spec:
  # The model name for the model version
  modelName: model1
  # The entity (user or training job) that creates the model
  createdBy: user1
  # The image repo to push the generated model
  imageRepo: modelhub/resnet
  imageTag: v0.1
  # The storage will be mounted at /kubedl-model inside the training container.
  # Therefore, the training code should export the model at /kubedl-model path.
  storage:
    # The local storage to store the model
    localStorage:
      # The local host path to export the model
      path: /foo
      # The node where the chief worker runs, used to export the model
      nodeName: kind-control-plane
    # The remote NAS to store the model
    nfs:
      # The NFS server address
      server: ***.cn-beijing.nas.aliyuncs.com
      # The path under which the model is stored
      path: /foo
      # The mounted path inside the container
      mountPath: /kubedl/models


---
apiVersion: model.kubedl.io/v1alpha1
kind: Model
metadata:
  name: model1
spec: 
  description: "this is my model"
status:
  latestVersion:
    imageName: modelhub/resnet:v1c072
    modelVersion: mv-3

The Model resource itself only describes a certain type of model and tracks its latest version and image name for the user. Users mainly configure a model through ModelVersion:

  • ModelName: the name of the Model object this version belongs to;

  • CreatedBy: the entity (a user or a training job) that created this ModelVersion, used to trace the upstream producer, which is usually a distributed training job;

  • ImageRepo: the address of the image registry to which the image will be pushed after the model image is built;

  • Storage: the storage carrier of the model files. Currently we support NAS, AWS EFS, and LocalStorage, and more mainstream storage backends will be supported in the future. The example above shows two output modes (a local storage volume and a NAS volume) for illustration; normally only one storage mode should be specified.
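For instance, a ModelVersion that uses only the NFS backend can be as small as the following (a pared-down sketch of the example above; the object name and image tag are made up):

apiVersion: model.kubedl.io/v1alpha1
kind: ModelVersion
metadata:
  name: my-mv-nfs
  namespace: default
spec:
  modelName: model1
  createdBy: user1
  imageRepo: modelhub/resnet
  imageTag: v0.2
  storage:
    # Only a single storage backend is specified
    nfs:
      server: ***.cn-beijing.nas.aliyuncs.com
      path: /foo
      mountPath: /kubedl/models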

When KubeDL detects that a ModelVersion has been created, it triggers the model build workflow:

  1. Listen for ModelVersion events and initiate a model build;
  2. Create a PV and PVC according to the storage type and wait until the volume is ready;
  3. Create a Model Builder to build the image in user space. The Model Builder is based on Kaniko, so the build process and the resulting image format are exactly the same as with standard Docker, except that everything happens in user space, with no dependency on any host Docker daemon;
  4. The Builder copies the model file (either a single file or a directory) from the corresponding path in the volume and uses it as an independent image layer to build a complete model image;
  5. Push the resulting model image to the image registry specified in the ModelVersion object;
  6. Finish the build workflow.

At this point, the model corresponding to that ModelVersion is solidified in the image registry and can be distributed to downstream inference services for consumption.

From training to model

Although a ModelVersion can be created independently to initiate a build, we prefer to trigger the model build automatically after a distributed training job completes successfully, naturally cascading the two into a pipeline.

KubeDL supports this submission mode. Take a TFJob as an example: when launching distributed training, you specify the output path of the model file and the image repository to which the model image should be pushed. If the job fails or exits early, no ModelVersion is created.

The following is a distributed MNIST training example that outputs the model file to the /models/model-example-v1 path on the local node and triggers the model build when the job completes successfully:

apiVersion: "training.kubedl.io/v1alpha1" kind: "TFJob" metadata: name: "tf-mnist-estimator" spec: cleanPodPolicy: None # modelVersion defines the location where the model is stored. modelVersion: modelName: mnist-model-demo # The dockerhub repo to push the generated image imageRepo: simoncqk/models storage: localStorage: path: /models/model-example-v1 mountPath: /kubedl-model nodeName: kind-control-plane tfReplicaSpecs: Worker: replicas: 3 restartPolicy: Never template: spec: containers: - name: tensorflow image: Kubedl/tf - mnist - estimator - API: v0.1 imagePullPolicy: Always the command: - "python" - "/keras_model_to_estimator.py" - "/tmp/tfkeras_example/" # model checkpoint dir - "/kubedl-model" # export dir for the saved_model formatCopy the code
% kubectl get tfjob
NAME                  STATE       AGE   MAX-LIFETIME   MODEL-VERSION
tf-mnist-estimator   Succeeded   10min              mnist-model-demo-e7d65
% kubectl get modelversion
NAME                      MODEL                    IMAGE                CREATED-BY          FINISH-TIME
mnist-model-demo-e7d65  tf-mnist-model-example   simoncqk/models:v19a00  tf-mnist-estimator   2021-09-19T15:20:42Z
% kubectl get po
NAME                                              READY   STATUS  RESTARTS   AGE
image-build-tf-mnist-estimator-v19a00   0/1     Completed     0         9min

With this mechanism, other artifact files that are only produced when a job runs successfully can also be solidified into images and used in subsequent stages.

From model to inference

With this foundation, you can directly reference a built ModelVersion when deploying an inference service; the corresponding model is loaded and the inference service is exposed directly. At this point, the stages of the algorithm model's life cycle (code -> training -> model -> serving) are linked together through the model-related APIs.

When deploying an inference service through the Inference resource object provided by KubeDL, you simply fill in the name of the corresponding ModelVersion in one of the Predictor templates. The Inference Controller injects a Model Loader when creating the Predictor; the loader pulls the image carrying the model file to the node and mounts the model file into the main container through a volume shared between containers, thus loading the model. As mentioned above, combined with OpenKruise's ImagePullJob, model images can easily be warmed up in advance to speed up model loading. For consistency of user perception, the model mount path in the inference service is by default the same as the model output path of the distributed training job.

apiVersion: serving.kubedl.io/v1alpha1
kind: Inference
metadata:
  name: hello-inference
spec:
  framework: TFServing
  predictors:
  - name: model-predictor
    # model built in previous stage.
    modelVersion: mnist-model-demo-abcde
    replicas: 3
    batching:
      batchSize: 32
    template:
      spec:
        containers:
        - name: tensorflow
          args:
          - --port=9000
          - --rest_api_port=8500
          - --model_name=mnist
          - --model_base_path=/kubedl-model/
          command:
          - /usr/bin/tensorflow_model_server
          image: tensorflow/serving:1.11.1
          imagePullPolicy: IfNotPresent
          ports:
          - containerPort: 9000
          - containerPort: 8500
          resources:
            limits:
              cpu: 2048m
              memory: 2Gi
            requests:
              cpu: 1024m
              memory: 1Gi

A complete inference service may serve several Predictors with different model versions at the same time, for example for A/B testing to compare the effects of multiple model iterations, which is common in search and recommendation scenarios. This is easy to do with Inference + ModelVersion: point each Predictor at a different model version and assign it an appropriate traffic weight, and a single inference service can serve different versions of a model and compare their results:

apiVersion: serving.kubedl.io/v1alpha1
kind: Inference
metadata:
  name: hello-inference-multi-versions
spec:
  framework: TFServing
  predictors:
  - name: model-a-predictor-1
    modelVersion: model-a-version1
    replicas: 3
    trafficWeight: 30  # 30% traffic will be routed to this predictor.
    batching:
      batchSize: 32
    template:
      spec:
        containers:
        - name: tensorflow
          # ...
  - name: model-a-predictor-2
    modelVersion: model-a-version2
    replicas: 3
    trafficWeight: 50  # 50% traffic will be routed to this predictor.
    batching:
      batchSize: 32
    template:
      spec:
        containers:
        - name: tensorflow
          # ...
  - name: model-a-predictor-3
    modelVersion: model-a-version3
    replicas: 3
    trafficWeight: 20  # 20% traffic will be routed to this predictor.
    batching:
      batchSize: 32
    template:
      spec:
        containers:
        - name: tensorflow
          # ...

Conclusion

By introducing the Model and ModelVersion resource objects, KubeDL, combined with standard container images, provides model building, tagging, version tracing, immutable storage, and distribution, replacing the previous coarse file-based model management. Model images can also be combined with other excellent open-source projects in the community to accelerate image distribution, warm up model images, and otherwise improve the efficiency of model deployment. At the same time, the model management API connects the previously separate stages of distributed training and inference serving, significantly improving the automation of the machine learning pipeline as well as the experience and efficiency of algorithm scientists when putting models online and comparing experiments. We welcome more users to try KubeDL and give us valuable suggestions, and we look forward to more developers paying attention to and participating in building the KubeDL community!

KubeDL GitHub

https://github.com/kubedl-io/kubedl

Click here to learn about the KubeDL project now!