Two months on, we have shipped the second stable release (V1.2) of Pipcook. Here are some of the improvements.
List of Important Features
In the past two months, the development team has made targeted optimizations to service startup, plug-in installation, and Pipeline execution time. In particular, the Pipeline training start-up time, the pain point most criticized by internal users, has improved dramatically: it used to take more than 5 minutes before model training started, while optimized pipelines can now start training in about 10 seconds.
Training model is faster
In V1.0, each Pipeline consists of different stages, such as DataCollect, which collects datasets; ModelDefine, which defines models; and DataProcess, which processes datasets. In the last stable release, training a simple component (image) classification task took nearly 2 minutes just to process the data (and this time grows linearly with the size of the dataset).
There are two reasons for this:
- In the V1.0 Pipeline definition, the next stage does not start until the previous stage has completely processed the data; in practice, however, data collection and processing involve a lot of I/O waiting and CPU idle time.
- In the V1.0 Pipeline definition, data-type plug-ins (DataCollect, DataAccess, DataProcess) passed data via file paths, which not only caused a large number of repeated disk reads and writes during a Pipeline run, but also made it impossible to perform more numerically focused computations such as normalization.
So in PR#410 (github.com/alibaba/pip…), we introduced an asynchronous Pipeline mechanism and made Sample the unit of data transfer between plug-ins. The benefits are as follows:
- As soon as an upstream plug-in produces its first Sample, the downstream plug-ins can start working. This removes the need for adjacent plug-ins to wait for all data to be processed and brings the start of training forward significantly.
- Unnecessary and repeated read/write operations are reduced: Samples are passed between plug-ins in memory, and the processed values are kept in memory for plug-ins in later stages to use.
With the help of the asynchronous Pipeline, we reduced the Pipeline start-up time from 1 minute 15 seconds to 11 seconds, and shortened the overall training time as well.
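The idea behind the asynchronous Pipeline can be sketched with async generators. The code below is a minimal illustration, not Pipcook's actual implementation: each stage streams Samples one at a time, so a downstream stage can start as soon as the first Sample arrives, and data stays in memory between stages.

```javascript
// Stage 1: collect samples (simulating slow I/O with a short delay).
async function* dataCollect() {
  for (let i = 0; i < 3; i++) {
    await new Promise((resolve) => setTimeout(resolve, 10)); // simulated I/O wait
    yield { data: [i, i + 1], label: i % 2 };
  }
}

// Stage 2: process each sample in memory (e.g. normalization), no disk round-trip.
async function* dataProcess(samples) {
  for await (const sample of samples) {
    const max = Math.max(...sample.data) || 1;
    yield { ...sample, data: sample.data.map((x) => x / max) };
  }
}

// Stage 3: "training" consumes samples as they stream in, instead of waiting
// for the whole dataset to be written to disk first.
async function train(samples) {
  const seen = [];
  for await (const sample of samples) {
    seen.push(sample); // a real model would update its weights here
  }
  return seen;
}

async function runPipeline() {
  return train(dataProcess(dataCollect()));
}
```

Because the stages are chained generators, the total latency before the first Sample reaches `train` is one collection step, not the whole dataset.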
Faster plug-in installation
In the new version, we also optimized the plug-in installation process. Most pipelines in Pipcook still depend on the Python ecosystem, so both Python and Node.js dependencies are installed when a plug-in is installed. Prior to V1.2, Pipcook installed them serially; in PR#477 (github.com/alibaba/pip…), we parallelized the installation of the Python and Node.js packages to reduce the overall installation time.
In future releases, we will continue to explore the gains from parallelization, and try to analyze each installation task (Python and Node.js packages) and schedule them for a more reasonable parallel installation.
Faster initial startup
Starting with Pipcook 1.2, users no longer need to install Pipboard locally. We deployed Pipboard as an online service through Vercel and moved all of its code under github.com/imgcook/pip…
Users can access Pipboard at pipboard.vercel.app, though some features are still being adjusted; for example, remote Pipcook daemons are not yet supported.
Going forward, Pipboard's release cycle will be independent of Pipcook's. This means we encourage people to develop their own Pipboard based on the Pipcook SDK; the Pipboard we ship will serve as a demo, a default sample application.
Support Google Colab
If you’ve been following Pipcook for a while, you’ll have noticed that some tutorials in the official documentation start with a link to Google Colab! Yes, Pipcook supports running on Google Colab, which means that beginners without a GPU can now learn Pipcook using the free GPU/TPU available on Google Colab. Start your front-end component recognition journey with the following two links:
- Front-end component classification (alibaba.github.io/pipcook/#/z…)
- Front-end component recognition (alibaba.github.io/pipcook/#/z…)
A Python plug-in runtime for algorithm engineers
To lower the hurdle for algorithm engineers who want to contribute models to Pipcook, we added support for a pure Python runtime. Apart from defining an additional package.json, contributors can develop a (model-type) plug-in without writing any JavaScript code. For the convenience of algorithm engineers, we have developed an NLP (NER) Pipeline based on the Python plug-in runtime; the related plug-ins are as follows:
- github.com/imgcook/pip…
- github.com/imgcook/pip…
- github.com/imgcook/pip…
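To give a sense of what "only an extra package.json" means, here is a hypothetical shape such a manifest might take. The field names under `pipcook` below (`runtime`, `category`) are illustrative assumptions for this sketch, not the documented schema:

```json
{
  "name": "my-python-model-define",
  "version": "0.1.0",
  "description": "A model-define plug-in implemented purely in Python",
  "pipcook": {
    "runtime": "python",
    "category": "modelDefine"
  },
  "dependencies": {}
}
```

The model logic itself would then live in Python files alongside this manifest, with no JavaScript glue code required.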
Pipcook SDK release
As mentioned earlier, we moved Pipboard out of Pipcook and released it independently, in the hope that developers can use the Pipcook SDK to build Pipboard or any other application that suits their needs. Therefore, we are officially releasing the Pipcook SDK in V1.2, which supports managing Pipelines and training tasks against a specified Pipcook service from Node.js and JavaScript runtime environments.
```js
const client = new PipcookClient('your pipcook daemon host', port);
const pipelines = await client.pipeline.list(); // display all current pipelines
```
Pipcook SDK API documentation: alibaba.github.io/pipcook/typ…
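The client above talks to a running daemon. The sketch below mimics the same resource-style call pattern (`client.pipeline.list()`) with a stubbed in-memory transport, purely for illustration; it is not the real SDK, and the port and route names in comments are assumptions:

```javascript
// A toy stand-in for PipcookClient that shows the call pattern without
// needing a running daemon. The real SDK issues HTTP requests instead.
class FakePipcookClient {
  constructor(host, port) {
    this.endpoint = `http://${host}:${port}`; // kept for illustration only
    // Stubbed store standing in for the daemon's pipeline state.
    const store = [{ id: 'pipeline-1', name: 'image-classification' }];
    this.pipeline = {
      list: async () => store.slice(), // real client: a GET to the daemon
    };
  }
}

async function demo() {
  const client = new FakePipcookClient('127.0.0.1', 8080); // port is illustrative
  return client.pipeline.list();
}
```

Structuring the client as nested resource objects (`client.pipeline`, `client.job`, ...) keeps the SDK surface discoverable, which is why the snippet in the release notes reads `client.pipeline.list()`.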
Daily (Beta) and Release versions
To give users more choice, we have updated our release cycle over the past two months with the following rules:
- Beta version: run `pipcook init beta` or `pipcook init --beta` if you want to try the latest version.
- Release version:
  - Odd-numbered minor versions (e.g. 1.1, 1.3) are unstable versions, mainly incorporating larger experimental features.
  - Even-numbered minor versions (e.g. 1.0, 1.2) are stable versions, with more fixes and optimizations for stability and performance.
  - All releases follow the SemVer 2.0 specification.
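The convention above can be captured in a small helper; this is just an illustration of the even/odd minor-version rule, not part of Pipcook:

```javascript
// Under this release scheme, an even minor version (1.0, 1.2, ...) is a
// stable release and an odd minor version (1.1, 1.3, ...) is unstable.
function isStableRelease(version) {
  const parts = version.split('.').map(Number);
  if (parts.length < 2 || parts.some(Number.isNaN)) {
    throw new Error(`invalid version: ${version}`);
  }
  return parts[1] % 2 === 0; // even minor => stable
}

// isStableRelease('1.2') => true, isStableRelease('1.3') => false
```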
Plans for the next release (V1.4)
We’re scheduled to release Pipcook V1.4 in two months, and the team is still focused on making Pipcook “fast.”
For example, to use a trained model in a Node.js environment, you still need to go through a very lengthy `npm install` step (which installs Python and its dependencies); we want the model to be ready to use without any such tedious steps.
On the model side, we will support more lightweight object detection models (YOLO/SSD), which can easily handle object detection tasks in some simple scenarios.
Further reading
- Pipcook 1.2 roadmap (github.com/alibaba/pip…)
- Pipcook 1.3 roadmap (github.com/alibaba/pip…)
- imgcook/awesome-imgcook (github.com/imgcook/awe…), which includes the Pipcook Book, a plug-in list, and front-end intelligence tutorials
Community
DingTalk group number: 30624012
Discord chat room: discord.gg/UbfXzGY