This article was originally published by AI Front.
Binding TensorFlow and opening up the TPU: can Google Cloud use its AI advantage to switch lanes and overtake AWS?
Compiled by | Debra
Author | AI Front editorial team
Cloud TPU is available in beta as of today, and Jeff Dean introduced it in a string of ten tweets. According to Google, Cloud TPU not only trains models faster but also costs only $6.50 per hour.
Google’s new killer: Cloud TPU
According to the official blog, Cloud TPUs are a family of hardware accelerators designed by Google and optimized to speed up and scale up specific ML workloads programmed with TensorFlow. Each Cloud TPU consists of four custom ASICs (application-specific integrated circuits) and packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board.
These boards can be used individually or linked together over an ultra-fast dedicated network to form multi-petaflop ML supercomputers called TPU pods. Later this year, Google will offer these larger supercomputers on GCP.
Cloud TPU has the following advantages:
- A customizable Google Compute Engine virtual machine gives you interactive, exclusive access to a network-attached Cloud TPU, so you do not have to wait for a job to be scheduled on a shared computing cluster.
- Instead of waiting days or weeks to train business-critical ML models, users can train several variants of the same model overnight on a fleet of Cloud TPUs and deploy the most accurate trained model in production the next day.
- Users can train ResNet-50 to the expected accuracy on the ImageNet benchmark challenge in less than a day for less than $200.
On May 18, 2017, at Google I/O, we got our first look at the rumored TPU 2.0 architecture: a board of four chips delivering 180 teraflops. Google also found a way to link 64 TPUs over a new network into so-called TPU pods, which provide about 11,500 teraflops of computing power.
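The figures quoted above can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not official pricing logic: it uses the $6.50 per hour, 180 teraflops, and 64-TPU numbers from the article, and it assumes that the cost of the host virtual machine and storage is ignored. The function names are made up for illustration.

```python
# Back-of-the-envelope checks of figures quoted in the article.
# Assumption: VM and storage charges are ignored; only the TPU rate is counted.
CLOUD_TPU_USD_PER_HOUR = 6.50   # beta price quoted in the article
TERAFLOPS_PER_CLOUD_TPU = 180   # one board of four chips
TPUS_PER_POD = 64               # boards linked into one TPU pod


def training_cost_usd(hours, usd_per_hour=CLOUD_TPU_USD_PER_HOUR):
    """Cost of renting a single Cloud TPU for the given number of hours."""
    return hours * usd_per_hour


def pod_teraflops(boards=TPUS_PER_POD, teraflops_per_board=TERAFLOPS_PER_CLOUD_TPU):
    """Aggregate peak throughput of a pod built from identical boards."""
    return boards * teraflops_per_board


if __name__ == "__main__":
    # A full 24 hours on one Cloud TPU stays under the $200 mentioned above.
    print(f"24 h on one Cloud TPU: ${training_cost_usd(24):.2f}")
    # 64 boards x 180 teraflops is consistent with "about 11,500 teraflops".
    print(f"Full pod: {pod_teraflops():,} teraflops")
```

Under these assumptions, a full day of training costs $156, and a 64-board pod reaches 11,520 teraflops, which matches the rounded figure in the text.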
Easy ML model training
Traditionally, programming for custom ASICs and supercomputers has required deep expertise. In contrast, you can program a Cloud TPU using high-level TensorFlow APIs, and Google has open-sourced a set of high-performance reference Cloud TPU model implementations to help you get started quickly:
- ResNet-50 Cloud.google.com/tpu/docs/tu…
- Other popular image classification models Github.com/tensorflow/…
- Transformer for machine translation and language modeling cloud.google.com/tpu/docs/tu… , research.googleblog.com/2017/08/tra…
- RetinaNet for object detection Github.com/tensorflow/…
To save users time and effort, Google continuously tests these model implementations for both performance and convergence to the expected accuracy on standard datasets.
More model implementations will be open-sourced over time. Adventurous ML experts can also optimize other TensorFlow models for Cloud TPUs themselves, using the documentation and tools Google provides.
As Google announced at NIPS 2017, on a full TPU pod the training time for both ResNet-50 and Transformer drops from most of a day to less than 30 minutes, with no code changes required.
The blog also mentions one company using Cloud TPU, investment management firm Two Sigma:
“We decided to focus our deep learning research on the cloud for a number of reasons, but mainly to get access to the latest machine learning infrastructure. Moving our TensorFlow workloads to TPUs greatly reduced both the complexity of programming new models and the time needed to train them, which increased our productivity. Using Cloud TPUs instead of clusters of other accelerators allowed us to focus on building our models without being distracted by the complexity of cluster communication patterns.” Alfred Spector, CTO, Two Sigma
Cloud TPU also simplifies the planning and management of ML computing resources:
- You can give your teams state-of-the-art ML acceleration and adjust capacity dynamically as demand changes.
- Instead of committing the capital, time, and expertise needed to design, install, and maintain an on-site ML computing cluster with specialized power, cooling, networking, and storage, you benefit from large-scale, tightly integrated ML infrastructure that Google has optimized over many years.
- There is no need to struggle to keep drivers up to date across a large number of workstations and servers: Cloud TPUs are preconfigured, with no driver installation required.
- Enjoy the same sophisticated security protections as all Google Cloud services.
Binding its own TensorFlow and taking on the GPU
Google’s monopoly is getting stronger and stronger.
In May 2016, Google released a dedicated chip for machine learning called the TPU, and last year it launched the second-generation product, Cloud TPU. While the first generation could only handle inference, the second-generation TPU can also be used to train machine learning models. Google services such as Search, Street View, Google Photos, and Google Translate now have one thing in common: they all use tensor processing units (TPUs) to speed up their neural network computations behind the scenes.
On the software side, Google’s reputation in machine learning is sure to bring new users to the Cloud TPU service, and developers already using TensorFlow, Google’s deep learning framework, can use the service without changing their code. The TPU has also been optimized for TensorFlow, which means that TensorFlow runs better on the TPU. In other words, people who want to use TPUs are unlikely to choose another deep learning framework. In the long run, this gives Google Cloud a distinctive way to stand out from AWS and Azure. After all, most vendors now offer the same basic cloud computing services, and containers make it easier to move workloads from one platform to another. By tying TensorFlow to the TPU, Google has a unique advantage in the short term.
On the hardware side, comparing TPUs with chips from Intel and Nvidia, Google has claimed that “we’ve found TPU performance to be 15-30 times better than today’s CPUs and GPUs, and 30-80 times better per watt.” These advantages help many of Google’s services run state-of-the-art neural networks at scale and at an affordable cost. Google gave an example at its I/O conference to show how powerful the chip can be: a large-scale translation model that takes 32 GPUs a full day to train can be trained in a single afternoon on one eighth of a TPU pod.
In December, Nvidia unveiled the Volta-based Titan V, billed as the “world’s most powerful GPU.” It targets researchers in AI and deep learning, offers 110 teraflops of computing power, and costs $2,999. Google, by contrast, does not sell TPU chips directly. It offers them as a cloud service, so that anyone can use them at an affordable price and with technical support, without having to worry about the hardware.
In the global cloud computing market, Amazon, Microsoft, Google, and Alibaba make up the first tier. Amazon was an early mover in cloud computing and offers a full range of services. Microsoft has a unique advantage in security. Alibaba, the leader in cloud computing services in China, has maintained a high growth rate for many years in a row. Google’s core is its search business, which makes most of its money from advertising, but cloud computing services are increasingly important to it. Cloud TPUs are offered through Google Cloud, which lets anyone rent a Cloud TPU for the same price as a GPU. That certainly adds to the distinctive advantages of Google’s cloud services.
Which cloud giant is stronger?
It’s no surprise that the cloud computing craze has swept the world. For most companies, the days of grinding through complex server rooms and networks are over. Over the past decade, cloud computing has become more cost-effective, secure, and reliable. Major vendors in the cloud computing industry are now investing heavily in hardware, software, and global network infrastructure to win market share and deliver better computing performance. Healthy competition is good for consumers and partners because it keeps costs down and pushes providers to innovate to stay ahead.
Typically, when we talk about cloud computing providers, we mostly mean the big three: Microsoft Azure, Google Cloud, and Amazon AWS. In this article, we’ll compare two of them, Google Cloud and AWS, and we’ll try to keep the comparison unbiased and explain everything in plain English. Both providers have their pros and cons, so no matter which vendor you end up with, you’re likely to run into some problems.
Cloud Computing Trends
Before we dive into comparing Google Cloud and AWS, let’s take a look at the latest cloud computing trends. In January 2017, RightScale conducted its sixth annual State of the Cloud survey, interviewing over 1,000 IT professionals to analyze current cloud computing trends. It produced a number of interesting findings:
- In 2016, 32 percent of respondents cited a lack of resources or expertise as the biggest challenge in cloud computing, a figure that dropped to 25 percent in 2017.
- In 2016, 29 percent of respondents said they were concerned about security issues related to cloud computing, while that number dropped to 25 percent in 2017.
- In 2016, 15 percent of respondents identified performance as an important challenge in cloud computing, while by 2017 only 11 percent did.
Challenges in cloud computing (from: RightScale)
The drop in concerns about expertise shows that the barriers to entry in cloud computing are falling quickly. You no longer need to be an expert to host your site on Google Cloud or AWS. Many cloud hosting providers now let you use cloud services directly without deep technical knowledge. Large companies are even investing in getting their employees and engineers certified on Google Cloud, AWS, and Azure:
- Google Cloud certifications: Cloud Architect, Data Engineer, G Suite Administrator
- AWS certifications: Solutions Architect, DevOps Engineer, Developer, SysOps Administrator
- Azure certifications: MCSA: Cloud Platform, MTA: IT Infrastructure, MCSA: Linux on Azure, MCSE: Cloud Platform and Infrastructure, etc.
Performance and security have also advanced by leaps and bounds in the past few years, as cloud providers have had to invent new ways to host data more securely while providing faster computing speeds. Most providers now encrypt traffic between data centers by default.
Another interesting finding from the survey is the change in public cloud adoption between 2016 and 2017. AWS’s public cloud adoption remained flat, while Azure and Google Cloud both saw considerable growth. AWS is still the undisputed leader, but that is mainly because it was the first to get into cloud computing. Google Cloud and Azure will certainly continue to catch up.
Public cloud Adoption (from: RightScale)
Here are some other statistics and predictions for the cloud computing industry:
Deloitte Technology Forecasts
Www2.deloitte.com/content/dam…
According to Deloitte Technology Forecasts, spending on IT-as-a-service for data centers, software, and services will reach $547 billion by the end of 2018.
BDO Technology Outlook survey
www.bdo.com/getattachme…
The BDO Technology Outlook survey found that 74% of technology chief financial officers (CFOs) said cloud computing would have a considerable impact on their business in 2018.
IDC FutureScape
Cofinaeventos.pt/portugaldig…
IDC FutureScape predicts that by 2018 at least half of IT spending will be cloud-based, and that by 2020 the cloud will account for 60% of all IT infrastructure spending and 60-70% of all software, services, and technology spending.
Wikibon
Siliconangle.com/blog/2017/0…
Wikibon predicts that enterprise cloud spending will grow at a 16% compound annual growth rate (CAGR) between 2016 and 2026.
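To put a 16% CAGR into perspective, compound growth over n years multiplies the starting value by (1 + rate)^n. The sketch below applies that standard formula to the 2016-2026 period. The function names and the base spend of 100 are illustrative assumptions, not figures from the forecast.

```python
def growth_multiple(cagr, years):
    """Factor by which a quantity grows at a constant compound annual rate."""
    return (1.0 + cagr) ** years


def projected_spend(base_spend, cagr, years):
    """Projected value after compounding base_spend for the given years."""
    return base_spend * growth_multiple(cagr, years)


if __name__ == "__main__":
    # 2016 to 2026 is ten compounding periods at 16% per year.
    multiple = growth_multiple(0.16, 10)
    print(f"Growth multiple over 10 years: {multiple:.2f}x")
    # Hypothetical base of 100 (arbitrary units), for illustration only.
    print(f"Spend index in 2026: {projected_spend(100.0, 0.16, 10):.0f}")
```

In other words, a 16% CAGR sustained for ten years implies spending a little more than four times its 2016 level.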
It’s also interesting to look at the Google Trends interest index for the same period: interest in cloud computing has grown steadily over the past five years.
Cloud computing provider Google Trends
Stack Overflow, one of the largest online communities for developers, also has a nifty trends tool that analyzes patterns based on the percentage of questions asked each month. Developers are an important part of the industry. While the chief technology officer may have the final say, it is the developers, engineers, and system administrators who actually implement cloud computing solutions.
Cloud trends on Stack Overflow
Jefferies analyst John DiFucci released a quarterly snapshot of public cloud services on December 20, 2017. Despite Amazon’s dominance in the public cloud, Google’s cloud platform grew an incredible 125% year-over-year in 2017. Alibaba and Microsoft Azure are also growing fast.
Google Cloud vs AWS (IaaS/PaaS market) (source: MarketWatch)
Google Cloud Platform
Google Cloud Platform is made up of a number of different services and solutions that let users take advantage of the same software and hardware infrastructure that Google uses for its own products, such as YouTube and Gmail. Google launched its first service, Google App Engine, as a public preview in 2008.
Google Cloud Platform has more than 50 products, including Google Compute Engine, Google App Engine, Google Container Engine, Google Cloud Bigtable, Google BigQuery, Google Cloud Functions, Google Cloud Datastore, Google Cloud Storage, Google Cloud CDN, and Google Cloud DNS.
In this article, we’ll focus on Google Compute Engine and its associated services, which let users start virtual machines on demand.
Google Compute Engine was released as a public preview in June 2012 and became generally available in December 2013. It is now used by HTC, Best Buy, Ubisoft, Philips, Domino’s Pizza, Leadpages, Heathrow, PayPal, Coca-Cola, Evernote, Sony Music, and many more. Google CEO Sundar Pichai has said that Google Cloud Platform is one of the company’s top three priorities. Research firm Canalys estimates that Google’s cloud platform business generated $870 million in revenue in the third quarter, up 76% from a year earlier.
Companies that use Google Compute Engine
For more information, check out the history of Google Cloud Platform, annotated in depth by Reto Meier:
medium.com/@retomeier/…
Amazon Web Services (AWS)
Amazon Web Services (AWS), a subsidiary of Amazon.com, began offering cloud computing services to businesses and individuals in 2006. Just like Google Cloud Platform, AWS has many different services and solutions. Amazon has definitely paved the way for cloud computing. For the story of how AWS came to be, we refer you to this TechCrunch article:
Techcrunch.com/2016/07/02/…
AWS has more than 200 products, including Amazon EC2, AWS Elastic Beanstalk, Amazon EC2 Container Service, Amazon DynamoDB, Amazon Redshift, Amazon S3, and more.
This article will focus on Amazon EC2 and its associated services, which are the counterpart of Google Compute Engine. Amazon EC2 was released to the public in beta in August 2006, six years before Google Compute Engine. Amazon EC2 is currently used by Netflix, Time, NASA, Expedia, Airbnb, and Lamborghini, among others.
Companies using Amazon EC2
Google Cloud vs AWS
Because Google Cloud and AWS are so similar, we can compare them along several dimensions. Due to space constraints, it’s impossible to cover every detail of the two platforms in this article, as they each have over 50 services! This article will focus on the following comparisons: compute instances, storage and disks, and billing and pricing.
Compute instances
The first comparison is how Google Compute Engine and AWS EC2 handle their virtual machines (instances). Google Cloud virtual machines are built on KVM, while AWS EC2 virtual machines are built on Xen. Both offer a rich variety of predefined instance configurations with specific amounts of virtual CPU, RAM, and network capacity. However, they use different naming conventions, which can be confusing: Google Compute Engine calls them machine types, while Amazon EC2 calls them instance types.
- You can configure Google Compute Engine instances with up to 96 vCPUs and 624 GB of RAM (new machine type released October 5, 2017).
- You can configure AWS EC2 instances with up to 128 vCPUs and 3,904 GB of RAM.
The following is a comparison of similar VMs from the two cloud vendors, such as high-memory, high-CPU, and SSD-backed instances.
It is important to note that Google Cloud lets users deviate from the predefined configurations and customize the CPU and RAM of their instances to match their workload. These are called custom machine types. Other options include Google Cloud Preemptible VMs and AWS EC2 Spot Instances.
Storage and Disk
The types of storage and disks a cloud provider uses matter a great deal because they directly affect performance: expected throughput (I/O), the maximum IOPS per volume or instance, and the ability to burst for short periods. When comparing Google Cloud with AWS, there are two main types of storage: block storage and object storage.
Block storage
Block storage is essentially a virtual disk volume used with a cloud-based virtual machine. Google Compute Engine provides persistent disks, while AWS EC2 provides them through Elastic Block Store (EBS).
Object storage
Object storage (sometimes called distributed object storage) is essentially a managed service for storing and accessing large numbers of binary objects, or blobs. Google provides this through Google Cloud Storage, while AWS provides it through Amazon S3.
In addition to standard networked block and object storage, both Compute Engine and Amazon EC2 let users attach disks that are physically local to the machine running the instance. Compared to persistent disks, local storage offers excellent performance, with very high input/output operations per second (IOPS) and very low latency. This type of storage can even reach read/write speeds of several gigabytes per second.
Google Cloud calls these local SSDs, and AWS EC2 calls them instance store volumes. While Google lets you attach local SSDs to any instance type, AWS supports them only on the following instance types: C3, F1, G2, HI1, I2, I3, M3, R3, and X1. In August 2017, Google Cloud also announced price cuts on local SSDs for on-demand and preemptible instances.
Billing and Pricing
Google Cloud and AWS handle billing quite differently. To be honest, unless users are very familiar with these platforms, the complexity of billing can be confusing. If you are new to either platform, these monthly cost calculators are a good place to start:
- AWS Simple Monthly Calculator calculator.s3.amazonaws.com/index.html
- Google Cloud Platform Pricing Calculator Cloud.google.com/products/ca…
Estimating the monthly cost of either cloud service is a challenge. There are even specialist tools, such as reOptimize and Cloudability, that help customers analyze their bills. AWS provides a dedicated dashboard where users can view their bills. Google Cloud Platform lets users analyze their spending through BigQuery. Both providers are looking for ways to cut costs and simplify billing.
Per-second billing
AWS announced per-second billing for EC2 instances, with a one-minute minimum, in September 2017. This gives more flexibility to customers who need to spin up new instances and do a lot of work in a short amount of time. Not surprisingly, Google Cloud Platform announced its own per-second billing, also with a one-minute minimum for Compute Engine instances. The fact that AWS and GCP launch new features at nearly the same time speaks volumes about how competitive the cloud platform market is.
Google gives a good example in their post:
If, on average, per-minute billing rounds the lifetime of your virtual machines up by 30 seconds, the savings from running 2,600 vCPUs each day would be enough to pay for your morning coffee (at 99 cents, assuming you can somehow find a 99-cent cup of coffee). By comparison, the waste from per-hour billing would be enough to buy a coffee maker every morning (more than $100 in this example).
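The effect of billing granularity is easy to see in code. The sketch below is a simplified model, not either provider's actual billing logic: it rounds a VM's runtime up to the billing increment and applies an optional minimum charge. The function names and the hourly rate are hypothetical.

```python
import math


def billable_seconds(runtime_s, granularity_s, minimum_s=0):
    """Round runtime up to the billing increment, after applying a minimum charge."""
    charged = max(runtime_s, minimum_s)
    return math.ceil(charged / granularity_s) * granularity_s


def cost_usd(runtime_s, hourly_rate_usd, granularity_s, minimum_s=0):
    """Cost of one VM run under the given billing granularity."""
    return billable_seconds(runtime_s, granularity_s, minimum_s) * hourly_rate_usd / 3600


if __name__ == "__main__":
    HYPOTHETICAL_RATE_USD_PER_HOUR = 0.10  # illustrative rate, not a real price
    runtime = 150  # a job that runs for two and a half minutes

    schemes = {
        "per-second (1-minute minimum)": (1, 60),
        "per-minute": (60, 0),
        "per-hour": (3600, 0),
    }
    for name, (granularity, minimum) in schemes.items():
        seconds = billable_seconds(runtime, granularity, minimum)
        price = cost_usd(runtime, HYPOTHETICAL_RATE_USD_PER_HOUR, granularity, minimum)
        print(f"{name}: billed {seconds} s, ${price:.4f}")
```

For a 150-second job, per-second billing charges 150 seconds, per-minute billing charges 180 seconds, and per-hour billing charges the full 3,600 seconds. This is why short-lived workloads benefit most from finer granularity.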
Committed use discounts and reserved instances
Google Cloud and AWS both offer benefits to users who commit to their platforms.
AWS EC2 offers what it calls Reserved Instances (RIs), which provide significant discounts (up to 75%) compared to on-demand pricing, as well as a capacity reservation when used in a specific Availability Zone. There are different types of reserved instances:
- Standard reserved instances
- Convertible reserved instances
- Scheduled reserved instances
Google Cloud offers “committed use discounts,” available to all Compute Engine customers as of September 2017. In essence, you purchase a committed use contract in return for deeply discounted virtual machine prices.
RightScale compared Google Cloud’s committed use discounts with AWS reserved instances and came to the following conclusions:
- Comparing Google’s one-year committed use discount with the AWS one-year standard RI, Google’s total cost is 28% lower than AWS’s.
- Comparing Google’s three-year committed use discount with the AWS three-year convertible RI, the total cost of the Google environment is 35% lower than AWS’s.
Google Cloud committed use discounts vs. AWS reserved instances
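The percentages in this section are relative figures, and it helps to be explicit about how they are computed. The sketch below shows the arithmetic behind "up to 75% off on-demand pricing" and "X% lower total cost". All dollar amounts are hypothetical placeholders, not figures from the RightScale comparison, and the function names are invented for illustration.

```python
def discounted_total(on_demand_total_usd, discount):
    """Total cost after a fractional discount (0.75 means 75% off)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_total_usd * (1.0 - discount)


def percent_cheaper(cost_a_usd, cost_b_usd):
    """How much cheaper A is than B, as a percentage of B's cost."""
    return (cost_b_usd - cost_a_usd) / cost_b_usd * 100.0


if __name__ == "__main__":
    # Hypothetical on-demand bill with the maximum 75% reserved discount applied.
    print(f"Reserved price: ${discounted_total(1000.0, 0.75):.2f}")
    # Hypothetical totals chosen so that A is 28% cheaper than B.
    aws_total, google_total = 10_000.0, 7_200.0
    print(f"Google vs AWS: {percent_cheaper(google_total, aws_total):.1f}% lower")
```

Note that "28% lower" is measured against the AWS total, so the same gap would be a larger percentage if expressed relative to the Google total.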
Sustained use discounts
Another major cost saver on Google Cloud is what is officially called the sustained use discount. These discounts are applied automatically by Google Cloud Platform the longer an instance runs and, unlike on AWS, no advance reservation of instances is required.
Users who are new to the cloud may not know which provider to choose, but there is no need to worry, as both platforms offer free trials.
Google Cloud offers a 12-month trial with $300 in credit. As of March 2017, it also has a free tier with no time limit. Here is an example of what users can run for free on GCP:
- f1-micro instance with 0.2 vCPU and 0.60 GB of memory, backed by a shared physical core (US only)
- 5 GB of cloud storage plus 30 GB of disk
AWS also offers a 12-month free trial. Here is an example of what you can run:
- t2.micro instance, 750 hours/month
- 30 GB of disk (including a hosted MySQL database, 750 hours/month) and 5 GB of cloud storage
Be sure to check each provider’s website for more details, as both offer free trials of many products, not just instances.
Who wins, Google Cloud or AWS?
Both providers have their advantages and disadvantages. However, based on reviews, Google Cloud Platform is definitely the one you want when it comes to pricing and speed!
AWS, for its part, has provided cloud computing services to enterprises around the world for the past decade. It really was the pioneer that pushed the cloud computing industry forward, and it is still what cloud services such as Google Cloud and Azure are trying to match and surpass. Its support, redundancy, and availability in each region are excellent.
Google Cloud and AWS offer many other products and services that we can’t cover in this article. But rest assured, the ongoing battle by cloud computing providers to gain more market share will only benefit consumers and partners. That means lower prices, more products and services, and higher performance.
References
Google Cloud vs AWS in 2018 (Comparing the Giants)