Written by Alibaba Taobao Technology Department F(x) Team - Mou mou
Pipcook 1.0 let web developers get started with machine learning at a relatively low barrier, opening and accelerating the era of front-end intelligence. In practice, we also ran into some problems. The most frequent user feedback was that Pipcook was hard to install and that the installation success rate was low: installation often failed because of network problems, and even on a good network it took at least 3 minutes, which cost users a lot of time.
We did a lot of refactoring and optimization in 2.0 to address the 1.0 issues. Let’s take a look at the installation speed of Pipcook 2.0:
From a few minutes in 1.0 to less than 20 seconds! You no longer need to install the daemon through pipcook init. Thanks to the decoupling of the heavy machine learning framework, package size is effectively controlled, with an installation success rate close to 100%.
Let’s also look at how long it takes to go from model training to a running service for a text classification task:
Yes, it takes only 20 seconds to go from model training to a text classification service that is online!
Next, let’s take a hands-on look at how Pipcook 2.0 can be used to quickly train models and deploy them online.
Introducing Pipelines
Before we get started, we need to understand the Pipeline. In Pipcook, we use pipelines to represent the workflow of a model. Currently, four pipelines are implemented, which are:
Pipeline name | Task type | Pipeline file (CDN link) |
---|---|---|
image classification MobileNet | Image classification | image-classification-mobilenet.json |
image classification ResNet | Image classification | image-classification-resnet.json |
text classification Bayes | Text classification | text-classification-bayes.json |
object detection YOLO | Object detection | object-detection-yolo.json |
The Text Classification Bayes model is demonstrated at the beginning of this article.
So what does this Pipeline look like? Pipeline uses JSON to describe the sample collection, data flow, model training/prediction phases and parameters associated with each phase.
{
  "specVersion": "2.0",
  "type": "ImageClassification",
  "datasource": "https://cdn.jsdelivr.net/gh/imgcook/pipcook-script@5ec4cdf/scripts/image-classification/build/datasource.js?url=http://ai-sample.oss-cn-hangzhou.aliyuncs.com/image_classification/datasets/imageclass-test.zip",
  "dataflow": [
    "https://cdn.jsdelivr.net/gh/imgcook/pipcook-script@5ec4cdf/scripts/image-classification/build/dataflow.js?size=224&size=224"
  ],
  "model": "https://cdn.jsdelivr.net/gh/imgcook/pipcook-script@5ec4cdf/scripts/image-classification/build/model.js",
  "artifacts": [],
  "options": {
    "framework": "tfjs@3.8",
    "train": {
      "epochs": 10
    }
  }
}
As the JSON above shows, a pipeline consists of a spec version, the pipeline type, the datasource, dataflow, and model scripts, the build plug-ins (artifacts), and the pipeline options. The currently supported pipeline types are ImageClassification (image classification), TextClassification (text classification), and ObjectDetection (object detection); support for other task types will be added in later iterations of Pipcook. Each script receives its parameters through the URI query string, and the parameters of the model script can also be defined through options.train. The artifacts field defines a set of build plug-ins, each of which is called in order after training to transform, package, or deploy the output model. The options field contains the framework definition and the training parameters.
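To make the "parameters via URI query" convention concrete, here is a minimal sketch of how a script URL can be split into the script location and its parameters. `parseScriptUrl` is a hypothetical helper written for illustration; it is not Pipcook's own implementation, and it relies only on the standard `URL` class available in Node.js.

```typescript
// Hypothetical helper (not part of Pipcook): separate a script URL from the
// parameters carried in its query string.
function parseScriptUrl(scriptUrl: string): { script: string; params: Record<string, string[]> } {
  const url = new URL(scriptUrl);
  const params: Record<string, string[]> = {};
  // A key may appear more than once (for example size=224&size=224),
  // so every key maps to a list of values.
  url.searchParams.forEach((value, key) => {
    const values = params[key] || [];
    values.push(value);
    params[key] = values;
  });
  return { script: url.origin + url.pathname, params };
}
```

Applied to the dataflow entry of the example pipeline, this would yield the `dataflow.js` location plus a `size` parameter with two values, which presumably stand for the two dimensions of the resized image.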
In this example pipeline, the task type is ImageClassification. We also define the data source, data processing, and model scripts that image classification requires. The sample data is stored on OSS (ai-sample.oss-cn-hangzhou.aliyuncs.com/image_class…) and contains two categories, avatar and blurBackground. You can replace it with a custom dataset to train your own classification model. The framework that the pipeline runs on is defined as TFJS 3.8, and the training parameter is 10 epochs.
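Swapping in a custom dataset amounts to changing the `url` parameter of the datasource script, and the epoch count lives under options.train. The sketch below shows one way to derive such a pipeline from an existing one. The `PipelineConfig` interface and `customizePipeline` function are illustrative names, not Pipcook APIs, and the sketch assumes the new dataset archive has the same layout as the sample one.

```typescript
// Illustrative shape of a pipeline file, based only on the fields shown in the example.
interface PipelineConfig {
  specVersion: string;
  type: string;
  datasource: string;
  dataflow: string[];
  model: string;
  artifacts: unknown[];
  options: { framework: string; train: Record<string, unknown> };
}

// Hypothetical helper: return a copy of the pipeline that points at another
// dataset archive and trains for a different number of epochs.
function customizePipeline(base: PipelineConfig, datasetUrl: string, epochs: number): PipelineConfig {
  const datasource = new URL(base.datasource);
  // searchParams.set percent-encodes the value; it decodes back to the same URL
  // when the query string is parsed with standard URL parsing.
  datasource.searchParams.set("url", datasetUrl);
  return {
    ...base,
    datasource: datasource.toString(),
    options: { ...base.options, train: { ...base.options.train, epochs } },
  };
}
```

The result can be serialized with `JSON.stringify` and saved as a new pipeline file; the original object is left untouched.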
Next, we can run it through Pipcook.
Run the Pipeline
Installation
To install and run Pipcook, several conditions need to be met:
- Operating system: macOS or Linux (Windows has basic support but is not fully tested)
- Node.js v12.17.0 or above (v13 is not supported)
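The Node.js requirement can also be checked programmatically. The following is a minimal sketch; `isSupportedNode` is a hypothetical helper, not part of the Pipcook CLI, and it encodes only the two constraints stated in the list above (v12.17.0 or above, v13 excluded).

```typescript
// Hypothetical check for the Node.js requirement listed above.
function isSupportedNode(version: string): boolean {
  // Accept strings such as "v12.22.1" or "12.22.1".
  const match = /^v?(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (!match) return false;
  const major = Number(match[1]);
  const minor = Number(match[2]);
  if (major === 13) return false; // v13 is explicitly unsupported
  if (major > 12) return true;
  return major === 12 && minor >= 17; // v12.17.0 or above
}
```

In a Node.js script, the current version string is available as `process.version` and can be passed to this function directly.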
Then run the command:
$ npm install @pipcook/cli -g
Wait until the installation is complete.
Training
We save the pipeline file as image-classification.json and execute:
$ pipcook train ./image-classification.json -o my-pipcook
ℹ preparing framework
████████████████████████████████████████ 100% 133 MB / 133 MB
ℹ preparing scripts
████████████████████████████████████████ 100% 1.12 MB / 231 kB
████████████████████████████████████████ 100% 11.9 kB / 3.29 kB
████████████████████████████████████████ 100% 123 kB / 23.2 kB
ℹ preparing artifact plugins
ℹ initializing framework packages
ℹ running datasource script
downloading dataset ...
unzip and collecting data...
ℹ running data flow script
ℹ running model script
Platform node has already been set. Overwriting the platform with [object Object].
2021-08-29 23:32:08.647853: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
loading model ...
Epoch 0/10 start
Iteration 0/20 result -- loss: 0.8201805353164673 accuracy: 0.5
Iteration 2/20 result -- loss: 0.03593956679105759 accuracy: 1
...
Epoch 9/10 start
Iteration 0/20 result -- loss: 1.1920930376163597e-7 accuracy: 1
Iteration 2/20 result -- loss: 2.0116573296036222e-7 accuracy: 1
Iteration 4/20 result -- loss: 2.5331991082566674e-7 accuracy: 1
Iteration 6/20 result -- loss: 2.123416322774574e-7 accuracy: 1
Iteration 8/20 result -- loss: 1.937151523634384e-7 accuracy: 1
Iteration 10/20 result -- loss: 0.000002644990900080302 accuracy: 1
Iteration 12/20 result -- loss: 0.000003799833848461276 accuracy: 1
Iteration 14/20 result -- loss: 2.8312223321336205e-7 accuracy: 1
Iteration 16/20 result -- loss: 1.49011640360186e-7 accuracy: 1
Iteration 18/20 result -- loss: 5.438936341306544e-7 accuracy: 1
ℹ pipeline finished, the model has been saved at /Users/pipcook-playground/my-pipcook/model
This example is also saved in the repository, so you can run it directly from its URL:
$ pipcook train https://cdn.jsdelivr.net/gh/alibaba/pipcook@main/example/pipelines/image-classification-mobilenet.json -o my-pipcook
The -o argument sets the training workspace, here ./my-pipcook.
From the log, we can see that Pipcook downloads the dependencies it needs during the preparation phase:
- Framework: A framework is a set of packages that provides the dependencies a pipeline needs to run in a particular environment (operating system, Node.js version, and so on). Our environment is macOS with Node.js 12.22, and the pipeline depends on the TFJS 3.8 framework; Pipcook automatically selects the appropriate framework file for the current environment. Framework files are maintained by Pipcook. The default mirror is on Alibaba Cloud OSS in China, so users with a poor network connection there need not worry about download problems, and a copy is also maintained in us-west so that users outside China can download it easily. The file for each framework URL is downloaded only once and is read from the cache when it is used again.
- Scripts: A Pipcook model task is made up of a set of bundled scripts. In this example, the scripts are stored on GitHub and served through the jsDelivr CDN. Scripts work with the framework to perform data processing or model training.
- Artifact plug-ins: Build plug-ins process the trained model, for example post-training operations such as uploading it to OSS. Because these plug-ins are lightweight and are cached once installed, they are designed as npm packages and installed through the npm client. No build plug-in is configured in this example, so this step is skipped.
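The "download once per URL, then reuse" behavior described for framework files can be sketched as a small cache keyed by URL. This is an in-memory illustration under stated assumptions, not Pipcook's code: `FrameworkCache` and `Downloader` are hypothetical names, and the real cache lives on disk (the workspace listing later in this article shows a directory under .pipcook/framework).

```typescript
// Hypothetical downloader signature: fetch a framework archive and return
// the local path where it was stored.
type Downloader = (url: string) => string;

// Minimal sketch of a per-URL cache: each URL is downloaded at most once.
class FrameworkCache {
  private readonly entries = new Map<string, string>();
  private readonly download: Downloader;

  constructor(download: Downloader) {
    this.download = download;
  }

  // Return the cached path for this URL, downloading only on the first request.
  resolve(url: string): string {
    const cached = this.entries.get(url);
    if (cached !== undefined) return cached;
    const localPath = this.download(url);
    this.entries.set(url, localPath);
    return localPath;
  }
}
```

Injecting the downloader keeps the sketch independent of any particular network or file-system API.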
After the preparation is complete, Pipcook runs the datasource, dataflow, and model scripts in sequence: it pulls the training data, processes the samples, and feeds them to the model for training.
We define the training parameter of Pipeline as 10 epochs, so the model training stops after 10 epochs.
At this point, our training artifacts are stored in the model folder of the workspace ./my-pipcook.
my-pipcook
├── framework -> /Users/pipcook-playground/.pipcook/framework/c4903fcee957e1dbead6cc61e52bb599
├── image-classification.json
├── model
└── scripts
As you can see, the workspace contains everything needed for this training run. The framework is downloaded to the shared framework directory and then symlinked into the workspace.
Prediction
Next, we prepare an avatar image.
We pass it to the model with the pipcook predict command, which takes two arguments: the workspace and the path of the image to classify.
$ pipcook predict ./my-pipcook -s ./avatar.jpg
ℹ preparing framework
ℹ preparing scripts
ℹ preparing artifact plugins
ℹ initializing framework packages
ℹ prepare data source
ℹ running data flow script
ℹ running model script
Platform node has already been set. Overwriting the platform with [object Object].
2021-08-30 00:08:28.070916: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
predict result: [{"id":0,"category":"avatar","score":0.9999955892562866}]
✔ Origin result: [{"id":0,"category":"avatar","score":0.9999955892562866}]
As with training, the necessary dependencies (framework, scripts, and artifacts) are prepared first, and any that already exist are skipped. This means that even if we move the model to another device, we can run it directly, because Pipcook prepares the runtime environment automatically. From the output log, the model predicts that the image belongs to the avatar category, with a confidence above 0.999.
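When the prediction result is consumed by other code, the usual step is to pick the highest-scoring category and optionally reject low-confidence answers. The sketch below assumes the result is an array of objects with the `id`, `category`, and `score` fields seen in the log; `Prediction` and `topPrediction` are illustrative names, not Pipcook APIs, and whether a result can hold several entries is an assumption.

```typescript
// Shape of one entry in the prediction result, as printed in the log above.
interface Prediction {
  id: number;
  category: string;
  score: number;
}

// Hypothetical helper: return the entry with the highest score, or undefined
// when the list is empty or no entry reaches minScore.
function topPrediction(results: Prediction[], minScore: number = 0): Prediction | undefined {
  let best: Prediction | undefined;
  for (const candidate of results) {
    if (candidate.score < minScore) continue;
    if (best === undefined || candidate.score > best.score) best = candidate;
  }
  return best;
}
```

A threshold such as 0.9 is a design choice for the calling application; the right value depends on how costly a wrong classification is.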
Deployment
To deploy the machine learning model using Pipcook, run the pipcook serve <workspace-path> command.
$ pipcook serve ./my-pipcook
ℹ preparing framework
ℹ preparing scripts
ℹ preparing artifact plugins
ℹ initializing framework packages
Pipcook has served at: http://localhost:9091
The default port is 9091; you can specify a different one with the -p parameter. We can then open http://localhost:9091 in a browser to try the interactive page for the image classification task.
Select the image and click the Predict button:
Of course, we can also call the prediction API directly:
$ curl http://localhost:9091/predict -F "image=@/Users/pipcook-playground/avatar.jpg" -v
* Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 9091 (#0)
> POST /predict HTTP/1.1
> Host: localhost:9091
> User-Agent: curl/7.64.1
> Accept: */*
> Content-Length: 60452
> Content-Type: multipart/form-data; boundary=------------------------6917c53e808d414f
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Content-Type: application/json; charset=utf-8
< Content-Length: 64
< ETag: W/"40-kCOJxkKqWqcndfPNbdrICzIiW+A"
< Date: Mon, 30 Aug 2021 03:59:49 GMT
< Connection: keep-alive
< Keep-Alive: timeout=5
<
* Connection #0 to host localhost left intact
{"success":true,"data":[{"id":0,"category":"avatar","score":1}]}
* Closing connection 0
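The same request can be made from code. The sketch below mirrors the curl call: a multipart POST to /predict with the file in the `image` field, and a JSON response with `success` and `data`. It assumes a runtime that provides the standard `fetch`, `FormData`, and `Blob` globals (for example Node.js 18 or later, or a browser); `predictEndpoint` and `predictImage` are hypothetical names, and the response shape is inferred from the single response shown above.

```typescript
// Response envelope inferred from the curl output above.
interface PredictResponse {
  success: boolean;
  data: Array<{ id: number; category: string; score: number }>;
}

// Build the prediction URL for a served pipeline, e.g. http://localhost:9091/predict.
function predictEndpoint(baseUrl: string): string {
  return new URL("/predict", baseUrl).toString();
}

// Hypothetical client: upload an image as multipart/form-data and return the predictions.
async function predictImage(baseUrl: string, image: Blob, fileName: string): Promise<PredictResponse["data"]> {
  const form = new FormData();
  form.append("image", image, fileName); // same field name as in `curl -F "image=@..."`
  const response = await fetch(predictEndpoint(baseUrl), { method: "POST", body: form });
  if (!response.ok) {
    throw new Error("predict request failed with HTTP status " + response.status);
  }
  const body = (await response.json()) as PredictResponse;
  if (!body.success) {
    throw new Error("the service reported a failed prediction");
  }
  return body.data;
}
```

In a browser, the `Blob` can come straight from a file input; in Node.js it can be built from the bytes of the image file.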
Pipcook provides different interactive pages and prediction APIs depending on the pipeline type.
Conclusion
This concludes the introduction to Pipcook 2.0. If you are interested, you are welcome to star the repository, open issues and PRs, and join the discussion in our DingTalk group.
Pipcook repository: github.com/alibaba/pip…
Script repository: github.com/imgcook/pip…