Abstract:
Aliyun E-MapReduce updates
New features:
- EMR Hadoop clusters will add the Flink component, version 1.4.0.
- EMR Kafka clusters will add the Schema Registry and REST Proxy components.
News
- This article reviews the big data projects that were promoted to Apache Top-Level Project (TLP) status in 2017: Apache Beam (next-generation big data processing standard), Apache Eagle (distributed real-time Hadoop data security solution), Apache Ranger (unified authorization management framework), Apache Metron (real-time network security detection framework), Apache SystemML (declarative machine learning platform optimized for big data), Apache CarbonData (columnar storage file format), Apache Fluo (large-scale incremental processing system), Apache DistributedLog (high-performance distributed replicated log system), Apache MADlib (SQL-based extensible machine learning library), Apache RocketMQ (distributed messaging and streaming data platform), Apache Impala (new-generation open source big data analysis engine), and Apache Trafodion (transactional database engine built on the Hadoop platform).
- Classic case of government data applications | Jiangxi teacher quality monitoring, evaluation, and big data service platform: improving the education quality assurance system. The Jiangxi Province teacher quality monitoring, evaluation, and big data service platform centers on assessing classroom teaching behavior in the province's primary and secondary schools. It provides technical support for dynamic monitoring of teacher quality, builds a scientific system for assessing and diagnosing teacher quality, and gives the government and schools a scientific basis for decisions on strengthening the teaching workforce.
- Relying on a big data platform to implement preferential tax policies. In Chongqing, many enterprises like Digital City Technology benefit from preferential tax policies. The Chongqing Local Taxation Bureau gives full play to the supporting role of these policies and, relying on its big data platform, focuses on the precise implementation of measures such as the additional tax deduction for the R&D expenses of high-tech enterprises, so that tax “deduction” yields “addition” in enterprise development and economic restructuring.
Technology
- How to use the Alibaba Cloud SLS plug-in feature together with EMR to transmit MySQL binlog data in near real time.
- With the explosive growth of the Internet, IT, big data, and related technologies, enterprise systems produce massive amounts of data. Business data stored in databases can be synchronized non-intrusively and delivered in real time to any sink for downstream analysis, using the log-based approach of the DBus data bus together with the Wormhole stream processing platform. The log data generated by business systems is just as valuable: it records business peaks and troughs, user activity traces, system exceptions and error messages, call chains, and more. Some companies use event tracking and similar techniques to write the information they want to monitor and track into their log data, which gives them objective data to support faster and more accurate decisions.
- SparkRDMA: The SparkRDMA ShuffleManager, developed and open-sourced by Mellanox Technologies, uses RDMA instead of standard TCP to shuffle data in Spark. In the reported test results, shuffling data over RDMA was 2.18 times faster than the standard method.
- Performance comparison between Flink and Storm: This article compares the performance of the two frameworks. It offers recommendations and supporting data for resource planning, framework selection, and performance tuning when building a real-time computing platform or a Flink platform, and it can serve as a reference for subsequent SLA work.