Written by F(X) Team-Rem

Two months on, the second stable release (V1.2) of Pipcook is out. Here are some of the improvements.

List of Important Features

In the past two months, the development team has made targeted optimizations to service startup, plugin installation, and Pipeline execution time. In particular, the Pipeline training start-up time, the issue most criticized by internal users, has dropped from more than 5 minutes to about 10 seconds in optimized pipelines.

Training Models Faster

In V1.0, each Pipeline was divided into stages, such as DataCollect to collect data sets, ModelDefine to define models, or DataProcess to process data sets. In the last stable release, training a simple component (image) classification task took nearly 2 minutes just to process the data, and this time grew linearly with the size of the dataset. There are two reasons for this:

  • In the V1.0 Pipeline definition, the next stage would not proceed until the previous stage had completely processed its data, even though data collection and processing involve long stretches of I/O waiting and CPU idle time.
  • In the V1.0 Pipeline definition, data plugins (DataCollect, DataAccess, DataProcess) passed data to each other via file paths, which not only caused a large number of repeated disk reads and writes during the Pipeline run, but also made it impossible to perform numerically focused in-memory computations such as normalization.

Therefore, PR#410 introduced an asynchronous Pipeline mechanism that uses the Sample as the unit of data transferred between plugins. The benefits are as follows:

  • As soon as an upstream plugin produces its first Sample, downstream plugins can start working, which removes the need for each plugin to wait for all data to be processed and brings the start of training forward significantly.
  • Unnecessary, repeated read and write operations are eliminated: Samples are passed between plugins in memory, and processed values stay in memory for plugins in later stages to use.
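The sample-by-sample hand-off described above can be sketched with async generators. This is a minimal illustration of the idea, not Pipcook's actual plugin API; the stage names and the Sample shape are assumptions made up for the example:

```javascript
// Minimal sketch of a sample-streaming Pipeline built on async generators.
// Plugin names and Sample shapes here are illustrative, not Pipcook's real plugin API.

// "DataCollect" stage: yields Samples one at a time instead of writing files to disk.
async function* collect() {
  for (let i = 0; i < 3; i++) {
    yield { data: [i, i + 1], label: i % 2 };
  }
}

// "DataProcess" stage: starts as soon as the first Sample arrives; values stay
// in memory, so numeric transforms such as normalization become cheap.
async function* normalize(samples) {
  for await (const sample of samples) {
    const max = Math.max(...sample.data, 1);
    yield { ...sample, data: sample.data.map((v) => v / max) };
  }
}

// "ModelTrain" consumer: can begin consuming before collection has finished.
async function train(samples) {
  const batch = [];
  for await (const sample of samples) batch.push(sample);
  return batch;
}

async function runPipeline() {
  // Stages are wired lazily: no stage waits for the previous one to finish all data.
  return train(normalize(collect()));
}
```

Because each stage pulls Samples from the previous one on demand, I/O waiting in one stage overlaps with computation in another, which is exactly where the start-up time savings come from.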

With the help of the asynchronous Pipeline, we reduced the Pipeline start-up time from 1 minute 15 seconds to 11 seconds, and also shortened the overall training time.

Faster Plugin Installation

In the new version, we have also optimized plugin installation. Most pipelines in Pipcook still depend on the Python ecosystem, so they install both Python and Node.js dependencies. Prior to V1.2, these were installed serially; in PR#477, we parallelized the installation of Python and Node.js packages to reduce the overall installation time. In future releases, we will continue to explore the gains from parallelization, analyzing each installation task (Python and Node.js packages) and scheduling them for a more efficient parallel install.

Pipboard as an Online Service

Starting with Pipcook 1.2, users no longer need to install Pipboard locally. We deployed Pipboard as an online service through Vercel and migrated all of its code to imgcook/pipboard. Users can access Pipboard at pipboard.vercel.app, although some features are still being adjusted; for example, remote Pipcook daemons are not yet supported. The Pipboard release cycle is now independent of Pipcook's, which means we encourage people to build their own Pipboard based on the Pipcook SDK, while Pipboard itself will be provided as a demo, a sample application shipped by default.

Support for Google Colab

Users who have been following Pipcook for a while may have noticed that some tutorials in the official documentation now begin with a link to Google Colab. Yes, Pipcook supports running on Google Colab, which means beginners without a GPU can learn Pipcook using the free GPU/TPU available there. Start your front-end component recognition journey with the following two links:

  • Classify front-end components in an image
  • Identify front-end components in an image

Python Plugin Runtime

To make it easier for algorithm engineers to contribute models to Pipcook, we added support for a pure Python plugin runtime. Apart from defining an additional package.json, a (model-class) plugin can now be developed without writing any JavaScript code. To demonstrate this, we developed an NLP (NER) Pipeline based on the Python plugin runtime, with the following plugins:

  • pipcook-plugin-tensorflow-bert-ner-model
  • pipcook-plugin-tensorflow-bert-ner-model-train
  • pipcook-plugin-tensorflow-bert-ner-model-evaluate

Pipcook SDK

As mentioned earlier, we moved Pipboard out of Pipcook and released it independently, in the hope that developers will use the Pipcook SDK to build Pipboard or any other application that fits their needs. Therefore, V1.2 officially ships the Pipcook SDK, which supports managing Pipelines and training jobs on a specified Pipcook service from Node.js and other JavaScript runtime environments.

```javascript
const client = new PipcookClient('your pipcook daemon host', port);
const pipelines = await client.pipeline.list(); // list all current pipelines
```

The Pipcook SDK API documentation is available here.

Release Cycle

To help users be more deliberate about which Pipcook version they adopt, we have updated our release cycle over the past two months with the following rules:

  • Run pipcook init beta or pipcook init --beta if you want to try the latest version.
  • Release versions
    • Odd-numbered versions (e.g. 1.1, 1.3) are unstable releases, mainly incorporating larger experimental features
    • Even-numbered versions (e.g. 1.0, 1.2) are stable releases, with more fixes and optimizations for stability and performance
    • All releases follow the Semver 2.0 specification
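The odd/even rule can be expressed as a tiny parity check on the minor version number. This helper is only a sketch to illustrate the convention; it is not part of the Pipcook CLI:

```javascript
// Classify a Pipcook release by the parity of its minor version.
// A sketch of the versioning convention, not part of the Pipcook CLI.
function releaseChannel(version) {
  const match = /^(\d+)\.(\d+)(?:\.(\d+))?/.exec(version);
  if (!match) throw new Error(`not a semver-like version: ${version}`);
  const minor = Number(match[2]);
  // Odd minor (1.1, 1.3, ...) => unstable; even minor (1.0, 1.2, ...) => stable.
  return minor % 2 === 0 ? "stable" : "unstable";
}
```

For example, releaseChannel("1.2.0") reports a stable release, while releaseChannel("1.3.0") reports an unstable one.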

Next Release Plan (V1.4)

We’re scheduled to release Pipcook V1.4 in two months, and the team remains focused on making Pipcook “fast.” For example, if you want to use the trained model in a Node.js environment, you currently still need to run a very lengthy npm install (which also installs Python and its dependencies); we want the model to be ready to use without any tedious steps. On the model side, we will support more lightweight object detection models (YOLO/SSD), which can easily handle object detection tasks in simple scenarios.

Further Reading