Introduction
TensorFlow is the most popular open-source framework for deep learning and machine learning. It was originally developed by Google's research team for machine learning research on deep neural networks, and it has been widely used since it was open-sourced in 2015. TensorBoard, in particular, is a powerful tool that helps data scientists work effectively.
Jupyter Notebook is a powerful data analysis tool that enables rapid development and sharing of machine learning code. It is great for data science teams to experiment with data and collaborate within groups, and it is also a friendly starting point for machine learning beginners.
Running TensorFlow in Jupyter is therefore the first choice of many data scientists, but building such an environment from scratch and configuring the GPU to support the latest version of TensorFlow is complex and a waste of a data scientist's effort. On Alibaba Cloud's Kubernetes cluster, you can create a complete TensorFlow experiment environment with a simple submission, develop the model in Jupyter Notebook, and tune it with TensorBoard.
Prepare the Kubernetes environment
Alibaba Cloud Container Service for Kubernetes 1.9.3 is now online. To purchase a pay-as-you-go GPU compute server, you need to submit an ECS ticket. For details, see Creating a Kubernetes Cluster.
Experience deploying the TensorFlow lab from the application directory
This article uses Helm to deploy the TensorFlow experiment environment from the application directory and shows you how to quickly run it on Container Service. If you need a different image, you just need to replace it in the deployment parameters.
2.1 From the application directory, click Ack-tensorflow-dev
2.2 Click Parameters. You can modify the parameters and click Deploy
The password here is Tensorflow, or you can change it to your own password
You can also log in to Kubernetes Master and run the following command
$ helm install --name tensorflow incubator/ack-tensorflow
2.3 After the TensorFlow application is running, you can log in to the console to check the startup status of the TensorFlow application
Log in to the TensorFlow experiment environment
1. Log in to the Kubernetes cluster using SSH and view the list of TensorFlow applications
$ helm list
NAME        REVISION  UPDATED                   STATUS    CHART                     NAMESPACE
tensorflow  1         Thu Apr 12 07:54:59 2018  DEPLOYED  ack-tensorflow-dev-0.1.0  default
2. Use helm status to check the application configuration
$ helm status tensorflow
LAST DEPLOYED: Thu Apr 12 07:54:59 2018
NAMESPACE: default
STATUS: DEPLOYED
RESOURCES:
==> v1/Service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tensorflow-ack-tensorflow-dev  LoadBalancer  172.19.2.39  10.0.0.1  6006:32483/TCP,80:32431/TCP  13m
==> v1beta2/Deployment
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
tensorflow-ack-tensorflow-dev 1 1 1 1 13m
NOTES:
1. Get the application URL by running these commands:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status by running 'kubectl get svc -w tensorflow-ack-tensorflow-dev'
export SERVICE_IP=$(kubectl get svc --namespace default tensorflow-ack-tensorflow-dev -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo http://$SERVICE_IP
You can see that the IP address of the external SLB is 10.0.0.1, the Jupyter Notebook port is 80, and the TensorBoard port is 6006.
3. Log in to Jupyter through the Jupyter endpoint. In this example, the Jupyter address is http://10.0.0.1
4. Click the Terminal button
5. Run the nvidia-smi command in the Terminal to view the GPU configuration
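Besides nvidia-smi, you can verify from the notebook itself that TensorFlow sees the GPU. The following is a minimal sketch assuming a TensorFlow 2.x image; on the TensorFlow 1.x images of that era, tf.test.is_gpu_available() serves the same purpose.

```python
import tensorflow as tf

# List the GPUs visible to TensorFlow (TF 2.x API).
# An empty list means TensorFlow will fall back to the CPU.
gpus = tf.config.list_physical_devices("GPU")
print("Num GPUs visible to TensorFlow:", len(gpus))
for gpu in gpus:
    print(gpu)
```

If the list is empty even though nvidia-smi shows a GPU, the container is likely missing the GPU device plugin or a GPU-enabled TensorFlow build.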
6. Use the git command to download the TensorFlow sample code.
$ git clone https://code.aliyun.com/kubernetes/Tensorflow-Examples.git
7. Go back to the home page, and you will see that Tensorflow-Examples has been downloaded to your working directory
8. Open http://10.0.0.1/notebooks/Tensorflow-Examples/notebooks/4_Utils/tensorboard_basic.ipynb and run the program
Note: If you want to use TensorBoard to observe the training progress, write the training logs to /output/training_logs.
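To illustrate the note above, here is a minimal sketch of a training loop that writes scalar summaries TensorBoard can display. This is not the tensorboard_basic.ipynb example itself: the toy model and the relative log directory are illustrative, and it assumes the TensorFlow 2.x summary API (tf.summary.create_file_writer). Inside the notebook environment you would point the writer at /output/training_logs.

```python
import os
import tensorflow as tf

# Illustrative local log directory; in the notebook environment,
# use /output/training_logs so TensorBoard picks the events up.
LOG_DIR = os.path.join("output", "training_logs")
writer = tf.summary.create_file_writer(LOG_DIR)

# Toy data: fit y = 2x with a single weight and plain gradient descent.
xs = tf.constant([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs
w = tf.Variable(0.0)

for step in range(50):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((w * xs - ys) ** 2)
    grad = tape.gradient(loss, w)
    w.assign_sub(0.05 * grad)  # gradient-descent update
    with writer.as_default():
        # This scalar is what TensorBoard plots as a curve over steps.
        tf.summary.scalar("loss", loss, step=step)
writer.flush()
print("final w:", float(w.numpy()))
```

After a run like this, a TensorBoard pointed at the same log directory (as in step 10 below, under the SCALARS tab) shows the loss curve converging toward zero.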
9. The following is the output of training results
10. You can log in to TensorBoard to view the training results. In this example, the TensorBoard address is http://10.0.0.1:6006. Here you can see the model definition and the convergence trend of the training.
Conclusion
With Alibaba Cloud Container Service for Kubernetes, we can easily set up a TensorFlow environment in the cloud, run deep learning experiments, and track the training results with TensorBoard. We welcome you to try the GPU container service on Alibaba Cloud: with the efficient computing power of GPUs, you can start your model development work easily and quickly.