Introduction

TensorFlow is the most popular open-source framework for deep learning and machine learning. It was originally developed by Google's research team for machine learning research on deep neural networks, and it has been widely adopted since it was open-sourced in 2015. Tensorboard, in particular, is a powerful tool that helps data scientists work effectively.

Jupyter Notebook is a powerful data analysis tool that enables rapid development and sharing of machine learning code. It lets data science teams experiment with data and collaborate within groups, and it is also a good starting point for beginners in machine learning.

Running TensorFlow in Jupyter is therefore the first choice of many data scientists, but building such an environment from scratch and configuring a GPU to support the latest version of TensorFlow is complex and a waste of data scientists' effort. On Alibaba Cloud's Kubernetes cluster, you can create a complete TensorFlow experiment environment with a simple form submission, develop your model in a Jupyter Notebook, and tune it with Tensorboard.

Prepare the Kubernetes environment

Alibaba Cloud Container Service for Kubernetes 1.9.3 is now available, but purchasing pay-as-you-go GPU compute instances requires an ECS ticket. For details, see Creating a Kubernetes Cluster.

Experience deploying the TensorFlow lab from the application directory

This article uses Helm to deploy the TensorFlow experiment environment from the application directory and shows you how to quickly run it on Container Service. In fact, if you need a different environment, you just need to replace the image.

2.1 From the application directory, click ack-tensorflow-dev

2.2 Click Parameters. You can modify the parameters and click Deploy

The default password here is tensorflow; you can change it to your own password
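If you prefer the command line, the password can also be set at install time. This is a sketch under an assumption: the value name password is hypothetical here, so check the chart's actual keys first with helm inspect values incubator/ack-tensorflow.

```shell
# Assumed value name "password" -- verify against the chart's values:
#   helm inspect values incubator/ack-tensorflow
helm install --name tensorflow \
  --set password=MySecretPassword \
  incubator/ack-tensorflow
```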

You can also log in to the Kubernetes master node and run the following command

$ helm install --name tensorflow incubator/ack-tensorflow

2.3 After the TensorFlow application is running, you can log in to the console to check the startup status of the TensorFlow application

Log in to the TensorFlow experiment environment

  1. Log in to the Kubernetes cluster using SSH to view the list of TensorFlow applications
$ helm list
NAME          REVISION    UPDATED                     STATUS      CHART                       NAMESPACE
tensorflow    1           Thu Apr 12 07:54:59 2018    DEPLOYED    ack-tensorflow-dev-0.1.0    default

2. Use helm status to check the application configuration

$ helm status tensorflow
LAST DEPLOYED: Thu Apr 12 07:54:59 2018
NAMESPACE: default
STATUS: DEPLOYED

RESOURCES:
==> v1/Service
NAME                           TYPE          CLUSTER-IP   EXTERNAL-IP     PORT(S)                      AGE
tensorflow-ack-tensorflow-dev  LoadBalancer  172.19.2.39  10.0.0.1  6006:32483/TCP,80:32431/TCP  13m

==> v1beta2/Deployment
NAME                           DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
tensorflow-ack-tensorflow-dev  1        1        1           1          13m


NOTES:
1. Get the application URL by running these commands:
     NOTE: It may take a few minutes for the LoadBalancer IP to be available.
           You can watch the status of it by running 'kubectl get svc -w tensorflow-ack-tensorflow-dev'
  export SERVICE_IP=$(kubectl get svc --namespace default tensorflow-ack-tensorflow-dev -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
  echo http://$SERVICE_IP

You can see that the external SLB IP address is 10.0.0.1, the Jupyter Notebook port is 80, and the Tensorboard port is 6006.
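Putting the NOTES commands above together, the two endpoints can be assembled in one short script (a sketch, assuming the default namespace and the service name shown in the helm status output):

```shell
# Read the external IP of the LoadBalancer service created by the chart
SERVICE_IP=$(kubectl get svc --namespace default tensorflow-ack-tensorflow-dev \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Ports from the PORT(S) column: 80 is Jupyter, 6006 is Tensorboard
echo "Jupyter:     http://$SERVICE_IP"
echo "Tensorboard: http://$SERVICE_IP:6006"
```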

3. Log in to Jupyter through the Jupyter endpoint. In this example, the Jupyter address is http://10.0.0.1

4. Click the Terminal button

5. Run the nvidia-smi command in the Terminal to view the GPU configuration

6. Use the git command to download the TensorFlow sample code:

$ git clone https://code.aliyun.com/kubernetes/Tensorflow-Examples.git

7. Go back to the home page; you will see that Tensorflow-Examples has been downloaded to your working directory

8. Open http://10.0.0.1/notebooks/Tensorflow-Examples/notebooks/4_Utils/tensorboard_basic.ipynb and run the program

Note: If you need to use Tensorboard to observe the training progress, please write your logs to /output/training_logs.
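For reference, a minimal sketch of what such logging looks like in the TensorFlow 1.x API in use at the time of writing (the graph and the "loss" tag are illustrative, not taken from the sample notebook; only the /output/training_logs path comes from this environment):

```python
import tensorflow as tf  # TensorFlow 1.x API

# Build a trivial graph and attach a scalar summary to it
x = tf.constant(3.0, name="x")
loss = tf.square(x, name="loss")
tf.summary.scalar("loss", loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    # Tensorboard in this environment watches /output/training_logs
    writer = tf.summary.FileWriter("/output/training_logs", sess.graph)
    summary, _ = sess.run([merged, loss])
    writer.add_summary(summary, global_step=0)
    writer.close()
```

Each call to add_summary records one point per tag; Tensorboard picks up the event files from the watched directory and plots them over global_step.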

9. The following is the output of training results

10. You can open Tensorboard to view the training progress. In this example, the Tensorboard address is http://10.0.0.1:6006. Here you can see the model's graph definition and the convergence trend of the training.

Conclusion

With Alibaba Cloud Container Service for Kubernetes, we can easily set up a TensorFlow environment in the cloud, run deep learning experiments, and use Tensorboard to track the training progress. We welcome you to try the GPU container service on Alibaba Cloud: while enjoying the efficient computing power of GPUs, you can start your model development work easily and quickly.
