Written by F(X) Team – Zebai
This article introduces Pipboard, an online interactive tutorial site. By training a model interactively in the browser, users can pick up the basic concepts and capabilities of machine learning. The training task is image classification: after training on image samples from different categories, the model can predict the category of a new image.
Foreword
It all started with what looked like an alert. One day, a visit to our GitHub Pages domain, pipboard.imgcook.com, showed an unfamiliar page, which we first took to be a GitHub warning.
I checked the official GitHub Pages documentation and suspected a problem with the CNAME file: the new build pipeline no longer generated it. We changed the build logic and republished the gh-pages branch.
After publishing, the page still showed the PoC (proof of concept).
I then checked the custom domain configuration in the Settings panel and found it empty. The missing CNAME file had left the custom domain setting empty.
The gh-pages branch already contained a CNAME file, so clicking Save for the custom domain failed when it tried to create the CNAME file.
The next attempt: delete the CNAME file, set the custom domain in the panel, and click Save.
Still no luck.
At this point, I began to suspect that the DNS provider's records were affecting the GitHub Pages lookup, and I considered contacting GitHub Support.
On second thought, the page seemed to provide personal contact information, which is not GitHub's official style. I searched GitHub and found a repository whose README matched the page exactly.
That repository also contained a CNAME file. When I opened it, it turned out to hold the Pipboard domain name.
That is why GitHub warned that the domain name was already taken. The commit was made roughly 40 minutes after the pipboard repository merged the pull request that dropped the CNAME file.
At this point, the course of the attack was fairly clear. The white hat had presumably been scanning the repositories of well-known companies and individuals for changes to their CNAME files. Once a commit dropped a CNAME file, he pushed a commit containing the same CNAME to his own repository, thereby taking over the domain on GitHub Pages. This is arguably a GitHub bug.
Why call him a white hat? He reports such problems through HackerOne, where he negotiates a reward with the affected companies or individuals.
If you have an application that uses GitHub Pages, take note of this scenario to avoid having your domain name taken over.
After reviewing the GitHub Action and the build commands, we made two changes: the build command now ensures the CNAME file is packaged into the assets, and the GitHub Actions deployment checks that the CNAME file exists and aborts if it does not. Refer to the Action code for details.
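The guard described above can be sketched as a small check that runs before deployment. This is a minimal sketch under stated assumptions, not the project's actual Action code: the function name `cnameProblem` and its parameters are hypothetical, and it assumes the build script has already read the text of the CNAME file from the build output (or has `null` when the file is absent).

```typescript
// Hypothetical pre-deploy guard: returns a human-readable problem, or null
// when the CNAME content names the expected custom domain.
function cnameProblem(content: string | null, expectedDomain: string): string | null {
  // `content` is the text of the CNAME file, or null when the file is missing.
  if (content === null) {
    return "CNAME file is missing from the build output";
  }
  const domain = content.trim().toLowerCase();
  if (domain === "") {
    return "CNAME file is empty";
  }
  if (domain !== expectedDomain.toLowerCase()) {
    return `CNAME contains "${domain}", expected "${expectedDomain}"`;
  }
  return null;
}

// Example: a healthy build output keeps the deployment going.
const problem = cnameProblem("pipboard.imgcook.com\n", "pipboard.imgcook.com");
console.log(problem === null ? "CNAME ok, continue deployment" : `abort: ${problem}`);
```

In a CI step, a non-null result would be turned into a non-zero exit code so the deployment stops before the branch is published without its CNAME.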
Why Pipboard
Pipcook is a machine learning framework for front-end developers, with the mission of bringing intelligence to the front end. During its development and promotion, however, we found that front-end developers still had doubts about what machine learning can do, and that using Pipcook for model training still had a learning threshold.
This is why Pipboard was born. Beyond letting users experience firsthand what machine learning can do, we also want to convey the concepts inside the model (such as the training parameters described below) to newcomers and beginners through low-cost interaction. In this way Pipboard works as a tutorial: users learn or come to understand machine learning concepts while using it.
Trying it out
Step 1: Open the Pipboard page
The page is divided into three areas: left, middle, and right. The left area is for collecting sample data, the middle is for model training, and the right is for model prediction.
Step 2: Collect sample data
In the left area, you can turn on the camera and hold down the capture button to collect image samples, or upload files instead. The more samples you provide for each category, the better the model will perform.
To rename a category, click on its title.
We also provide a sample set of images.
Step 3: Train the model
Once sample data has been collected for every category, you can train the model in the middle area.
The training parameters can be adjusted in the middle panel, and each parameter comes with a short explanation. After adjusting the parameters, click the train button. Feedback during training is shown in the data panel that pops up on the left.
Accuracy is the proportion of samples that the model classifies correctly during training. If the model correctly predicts the category of 70 samples out of 100, the accuracy is 70/100 = 0.7.
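The accuracy calculation above can be written out as a short function. This is an illustrative sketch only: the function name and the "cup" and "phone" labels are made up for the example and are not taken from Pipboard's code.

```typescript
// Accuracy = number of correct predictions / total number of samples.
function accuracy(predicted: string[], actual: string[]): number {
  if (predicted.length !== actual.length || actual.length === 0) {
    throw new Error("predicted and actual must be non-empty and the same length");
  }
  let correct = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] === actual[i]) correct++;
  }
  return correct / actual.length;
}

// 100 samples labelled "cup"; the model gets the first 70 right.
const actualLabels: string[] = [];
const predictedLabels: string[] = [];
for (let i = 0; i < 100; i++) {
  actualLabels.push("cup");
  predictedLabels.push(i < 70 ? "cup" : "phone");
}
console.log(accuracy(predictedLabels, actualLabels)); // 0.7
```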
Loss measures how well the model has learned to predict the correct category for a given set of samples. If the model's predictions are perfect, the loss is 0. Otherwise, the loss is greater than 0. Suppose you have two models, A and B. Model A predicts a sample's category correctly, but with only 60% confidence. Model B also predicts it correctly, but with 90% confidence. The two models have the same accuracy, but model B has the lower loss.
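To make the A versus B comparison concrete, here is a minimal sketch using cross-entropy, a common loss for classification. The article does not say which loss function Pipboard uses, so cross-entropy is an assumption here, and the function name is hypothetical.

```typescript
// Cross-entropy loss for one sample: -ln(probability assigned to the true class).
// Assumption: the model outputs one probability per category.
function crossEntropy(probabilities: number[], trueIndex: number): number {
  const p = probabilities[trueIndex];
  if (p === undefined || p <= 0 || p > 1) {
    throw new Error("probability of the true class must be in (0, 1]");
  }
  return -Math.log(p);
}

// Both models pick the correct class (index 0), with different confidence.
const lossA = crossEntropy([0.6, 0.4], 0); // model A: 60% confident
const lossB = crossEntropy([0.9, 0.1], 0); // model B: 90% confident
console.log(lossA.toFixed(3), lossB.toFixed(3)); // 0.511 0.105
```

Both predictions count as correct for accuracy, yet the more confident model is penalized less, which is why loss is the more sensitive signal to watch during training.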
Step 4: Predict the results
After training completes, the prediction panel on the right shows how well the model performs. The panel takes camera input: Pipboard classifies each input image into the categories defined by the data samples, predicting in real time and displaying the result for each category as a percentage.
As shown, the model recognizes the cup.
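One common way to arrive at such per-category percentages is to pass the model's raw scores through a softmax. The sketch below assumes the model outputs one raw score (logit) per category; the article does not confirm this, and the function names, labels, and scores are illustrative.

```typescript
// Softmax turns raw scores into probabilities that sum to 1.
function softmax(logits: number[]): number[] {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Format each category's probability as a percentage string.
function toPercentages(labels: string[], logits: number[]): string[] {
  return softmax(logits).map((p, i) => `${labels[i]}: ${(p * 100).toFixed(1)}%`);
}

// Hypothetical scores for one camera frame.
console.log(toPercentages(["cup", "phone", "book"], [2.0, 0.5, 0.1]));
```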
Step 5: Export the model
Finally, click the Export Model button to open the export popup, then download the model and code to try them locally.
At this point, the whole model training process is complete. The current model implementation has not been optimized much. For scenarios with many categories or imbalanced data samples, predictions may deviate considerably, and we will keep iterating on this. If you have any questions during training, please open an issue. In later iterations we will try to explain these questions visually to help you understand the model training process.
Future plans
At present, the overall interaction of the first version is modeled on that of Teachable Machine 🐶. Many thanks to Google for the inspiration. We also hope to bring this kind of interaction to users who cannot easily access overseas sites.
Compared with Teachable Machine, Pipboard will go beyond the online training experience. In the future it will visualize more of the data collection (such as audio and images), the training process, and the model's results. For example, visual assembly of model layers and a Blockly-like visual JavaScript program editor could express the model in visual form to analyze and explain machine learning concepts.
When using Pipcook, developers can customize their own model training process by using existing plug-ins or developing new ones. We also hope to reduce the cost of plug-in development through visualization.
Conclusion
Developing the Pipboard online interactive tutorial was itself a learning process. I am still a beginner in machine learning, and facing real scenarios and solving real problems is the quickest way to pick up practical knowledge, for example how to convert an uploaded image into a tensor. Also, as a front-end developer, I usually focus on browser compatibility in day-to-day business development and know little about less commonly used features. Using the webcam and related APIs this time was really interesting, and the browser turned out to be quite powerful.
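As an illustration of the image-to-tensor step mentioned above, here is a minimal sketch in plain TypeScript. It is not Pipboard's implementation: it assumes the pixels are already available as an RGBA byte buffer (in a browser, typically obtained from a canvas via `getImageData`), and the `ImageTensor` type and `rgbaToTensor` function are hypothetical names. Libraries such as TensorFlow.js offer helpers like `tf.browser.fromPixels` for the same purpose.

```typescript
// A bare-bones "tensor": a flat array of numbers plus its shape.
interface ImageTensor {
  shape: [number, number, number]; // [height, width, channels]
  data: Float32Array;              // row-major, values scaled to [0, 1]
}

// Convert RGBA bytes into a normalized RGB tensor.
function rgbaToTensor(rgba: Uint8ClampedArray, width: number, height: number): ImageTensor {
  if (rgba.length !== width * height * 4) {
    throw new Error("unexpected pixel buffer size");
  }
  const data = new Float32Array(width * height * 3);
  for (let px = 0; px < width * height; px++) {
    data[px * 3] = rgba[px * 4] / 255;         // R
    data[px * 3 + 1] = rgba[px * 4 + 1] / 255; // G
    data[px * 3 + 2] = rgba[px * 4 + 2] / 255; // B (alpha at px * 4 + 3 is dropped)
  }
  return { shape: [height, width, 3], data };
}

// A 2x1 image: one red pixel followed by one white pixel.
const tiny = rgbaToTensor(new Uint8ClampedArray([255, 0, 0, 255, 255, 255, 255, 255]), 2, 1);
console.log(tiny.shape, tiny.data);
```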
Finally, do you now have a preliminary understanding of machine learning? If you are interested, scan the QR code below to join the group for discussion.