Abstract:

Aliyun e-MapReduce product updates

  • Cloud HBase capacity exceeding 300 GB is now available; apply for it by submitting a work order
  • Phoenix is now supported in cloud HBase; Phoenix enables real-time analysis of massive data

News

  • The Cloud Computing Technology Conference was held in Beijing, where Song Jun, a technical expert from Alibaba, gave a talk titled “SparkSQL in ETL”. According to Song Jun, ETL has three main steps: extraction, transformation, and loading: data sources are read, cleaned, and integrated, and the results are written to the target storage. An ETL tool should be simple and easy to use, support multiple data sources, tolerate faults, offer rich operators and complex data types, and compute quickly. Song Jun explained how SparkSQL achieves these goals from five angles: DataSource, rich operators, Hive compatibility, performance, and cloud ETL. A minimal sketch of the three ETL steps follows this list.
  • The HBaseCon West 2017 big data conference is just around the corner. Apache HBase is a distributed, scalable open source database built on the Hadoop framework and modeled on Google’s Bigtable. In a blog post, Google said the HBase open source community has grown significantly thanks to support from enterprise users such as Alibaba, Apple, Facebook, and Visa, and is building a robust big data “ecosystem” whose key components include Apache Phoenix, OpenTSDB, Apache Trafodion, and Apache Kylin.
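
A rough Java sketch of those three ETL steps with Spark SQL is below; the input path, output path, and column names are placeholders for illustration, not details from the talk.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SimpleEtlJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("simple-etl")
                .getOrCreate();

        // Extraction: read the raw data source (placeholder path)
        Dataset<Row> raw = spark.read().json("/data/input/events.json");

        // Transformation: clean and integrate with Spark SQL
        raw.createOrReplaceTempView("events");
        Dataset<Row> cleaned = spark.sql(
                "SELECT user_id, event_type, CAST(ts AS timestamp) AS event_time "
              + "FROM events WHERE user_id IS NOT NULL");

        // Loading: write the result to the target storage (placeholder path)
        cleaned.write().mode("overwrite").parquet("/data/output/events_clean");

        spark.stop();
    }
}
```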

Technology

  • HBase Phoenix helps real-time analysis of massive data: Phoenix meets the real-time analysis requirements of massive data. By building indexes, it queries a small slice of a massive table and returns results in real time, and it supports some complex SQL operations, including joins and sub-queries. It is not suitable for ETL-style workloads, such as turning 10 TB of data into another 10 TB of data. A minimal Phoenix query sketch follows this list.
  • Building a VPN to access cloud HBase from the development environment: cloud HBase is currently in public beta and many customers are already using it. Most developers need to connect to the cloud HBase service from their own machines and have modest performance requirements. This article describes how to build a test environment through a VPN, a VPC, and similar means to meet development needs; a minimal client connection sketch also follows this list.
  • HBase as the core storage system for index construction and the online machine learning platform of Taobao search: HBase is an important part of Alibaba's search infrastructure. This article describes the history, scale, and application scenarios of HBase in Alibaba search, as well as the problems encountered and the optimizations made.
  • This article briefly introduces and compares the performance of Hive, Impala, Shark, Stinger, and Presto, five mainstream open source big data query and analysis engines, and closes with a summary and outlook. Figure 1 shows how these five engines evolved; let's take a look.
  • Kudu: Kudu is a Raft-based distributed storage system that aims to combine low-latency writes with high-performance analytical scans. It is well embedded in the Hadoop ecosystem and integrates with systems such as Cloudera Impala and Apache Spark.
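
Here is a minimal sketch of the Phoenix usage pattern described in the first article above: create a secondary index, then run a selective SQL query over a massive HBase table through the Phoenix JDBC driver. The ZooKeeper address, table, and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixQueryExample {
    public static void main(String[] args) throws Exception {
        // ZooKeeper quorum of the HBase cluster (placeholder address)
        String url = "jdbc:phoenix:zk1,zk2,zk3:2181";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {

            // Secondary index so selective queries on USER_ID avoid a full table scan
            stmt.execute("CREATE INDEX IF NOT EXISTS IDX_ORDERS_USER "
                       + "ON ORDERS (USER_ID) INCLUDE (AMOUNT)");

            // A selective query over a massive table, served by the index in near real time
            try (PreparedStatement ps = conn.prepareStatement(
                     "SELECT ORDER_ID, AMOUNT FROM ORDERS WHERE USER_ID = ?")) {
                ps.setLong(1, 42L);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " " + rs.getBigDecimal(2));
                    }
                }
            }
        }
    }
}
```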
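
For the VPN article, once the VPN or VPC route to the cloud HBase instance is in place, a plain HBase Java client on a developer machine can reach the cluster through its ZooKeeper endpoints. The addresses and table name below are placeholders, not values from the article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CloudHBaseSmokeTest {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // ZooKeeper quorum of the cloud HBase instance, reachable over the VPN/VPC (placeholders)
        conf.set("hbase.zookeeper.quorum", "zk1.vpc.internal,zk2.vpc.internal,zk3.vpc.internal");
        conf.set("hbase.zookeeper.property.clientPort", "2181");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("test_table"))) {
            // A single Get is enough to verify the developer machine can reach the cluster
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println("Row found: " + !result.isEmpty());
        }
    }
}
```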