A quality 2021 blog series in the field of big data that takes you from beginner to expert. The blog is updated every day, gradually building out a system of big data articles to help you learn more efficiently.

If you are interested in big data, you can follow the WeChat public account "sanbang Big Data".

Table of Contents

Hadoop Applications at Home and Abroad

Some Foreign Enterprises Using Hadoop

1. Yahoo!

2. Facebook

3. IBM

Some Domestic Enterprises Using Hadoop

1. Baidu

2. Alibaba

3. Huawei

4. Tencent

Hadoop Applications at Home and Abroad

Some Foreign Enterprises Using Hadoop

1. Yahoo!

Yahoo! is the biggest proponent of Hadoop: its Hadoop clusters comprise more than 42,000 nodes and over 100,000 CPU cores running Hadoop. The largest single-master cluster has 4,500 nodes (each node with two quad-core CPUs, 4 × 1 TB disks, and 16 GB of RAM). Total cluster storage capacity exceeds 350 PB, and more than 10 million jobs are submitted every month.

Yahoo!'s Hadoop applications mainly cover the following areas:

  • Advertising system support
  • User behavior analysis
  • Web search support
  • Anti-spam systems
  • Personalized recommendation

2. Facebook

Facebook uses Hadoop primarily to store copies of internal logs as a source for data mining and log statistics. It runs two main clusters: one of 1,100 nodes with 8,800 CPU cores (8 cores per machine) and 12,000 TB of raw storage (12 TB of disk per machine), and one of 300 nodes with 2,400 CPU cores (8 cores per machine) and 3,000 TB of raw storage (again 12 TB of disk per machine). On top of these clusters Facebook developed Hive, a project that provides a SQL-like query syntax.

In total, Facebook's Hadoop clusters span more than 1,400 machine nodes, with 11,200 CPU cores and over 15 PB of raw storage capacity. Each commodity node is configured with an 8-core CPU and 12 TB of storage, and programming is done mainly through the Streaming API and the Java API. Facebook also built Hive, an advanced data warehouse framework, on top of Hadoop; Hive has since become an Apache top-level project.
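The Streaming API mentioned above refers to Hadoop Streaming, which passes records between stages as tab-separated lines over stdin/stdout, so mappers and reducers can be written in any language. Below is a minimal word-count sketch in Python; the command-line wiring is illustrative only and not Facebook's actual code:

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit one "word<TAB>1" pair per token -- the tab-separated
    # text format Hadoop Streaming passes between stages.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    # Hadoop sorts mapper output by key before the reduce stage,
    # so identical words arrive grouped together.
    split = (p.split("\t") for p in pairs)
    for word, group in groupby(split, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(c) for _, c in group)}"

if __name__ == "__main__":
    # Run as `script.py map` for the map stage, `script.py reduce`
    # for the reduce stage, reading lines from stdin either way.
    stage = sys.argv[1] if len(sys.argv) > 1 else "map"
    fn = mapper if stage == "map" else reducer
    for out in fn(line.rstrip("\n") for line in sys.stdin):
        print(out)
```

On a real cluster the same script would be submitted via the `hadoop jar hadoop-streaming.jar` launcher with `-mapper` and `-reducer` options; locally it can be simulated by piping through `sort` between the two stages.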

3. IBM

IBM's Blue Cloud also uses Hadoop to build its cloud infrastructure. Blue Cloud combines Linux operating system images virtualized with Xen and PowerVM with Hadoop for parallel workload scheduling, and IBM has released its own Hadoop distribution and big data solutions.

Some Domestic Enterprises Using Hadoop

1. Baidu

Baidu runs nearly ten Hadoop clusters, with more than 2,800 machine nodes in a single cluster and tens of thousands of Hadoop machines in total. Total storage capacity exceeds 100 PB, of which more than 74 PB is in use. Thousands of jobs are submitted every day; daily data input exceeds 7,500 TB and daily output exceeds 1,700 TB.

Baidu's Hadoop clusters provide unified computing and storage services for the company's data team, big search team, community products team, advertising team, and LBS group. The main applications include:

  • Data mining and analysis
  • Log Analysis platform
  • Data warehouse system
  • Recommendation engine system
  • User behavior analysis system

2. Alibaba

Alibaba's Hadoop cluster has about 3,200 servers, roughly 30,000 physical CPU cores, 100 TB of memory, and more than 60 PB of storage capacity. It runs more than 150,000 jobs per day, including over 6,000 daily Hive queries, scanning about 7.5 PB of data and roughly 400 million files per day. Storage utilization is about 80%; average CPU utilization is 65%, with peaks of up to 80%.

The Hadoop cluster serves 150 user groups and 4,500 cluster users, providing basic computing and storage services for Alibaba's e-commerce platform. Its main applications include:

  • Data platform system
  • Search support
  • E-commerce data
  • Recommendation engine system
  • Search leaderboards

3. Huawei

Huawei is one of the major contributors to Hadoop, ranking ahead of Google and Cisco in contributions. It has done in-depth research on Hadoop HA solutions and HBase, and has brought its own Hadoop-based big data solutions to the industry.

4. Tencent

TDW (Tencent Distributed Data Warehouse) is built on the open-source Hadoop and Hive, overcoming the traditional data warehouse's limitations of poor linear scalability and controllability. It has also been heavily optimized and adapted for Tencent's specific circumstances of massive data volumes and complex computations.

TDW serves most of Tencent's business products. Its largest single cluster has 4,400 machines, about 100,000 CPU cores in total, and 100 PB of storage capacity. It runs more than 1 million jobs per day, computing about 4 PB daily with roughly 2,000 concurrent jobs. The actual amount of stored data is 80 PB, with more than 600 million files and blocks; storage usage is about 83% and CPU usage about 85%. After more than four years of continuous investment and construction, TDW has become Tencent's largest offline data processing platform. TDW's functional modules include Hive, MapReduce, HDFS, TDBank, and Lhotse.

  • 📢 : lansonli.blog.csdn.net
  • 📢 Welcome to like 👍, favorite ⭐, and comment 📝 — if there is an error, please point it out!
  • 📢 This article was originally written by Lansonli and first appeared on the CSDN blog 🙉
  • 📢 Big data series articles are updated every day. When you stop to rest, don't forget that others are still running; I hope we all make the most of our time to learn and strive for a better life ✨