Spark is an open-source, general-purpose parallel computing framework, similar to Hadoop MapReduce, developed by the UC Berkeley AMPLab. It is a fast, general-purpose big data processing...
The Hadoop Distributed File System (HDFS) is an open-source implementation of the Google File System (GFS). It is one of the two core components of...
This article is based on a guest talk at the 11th Meitu Internet Technology Salon. Reply "Meitu Big Data Platform" to the public account to get...
Why does the Hadoop project develop its own file system, HDFS? To better support distributed computing. Storage model; architecture design; roles and functions; metadata persistence; safe mode...
HDFS is essentially a distributed file system; compared with storage on an ordinary single machine, it scales out relatively easily and comes with its own high availability...
1.1 Campus network NAT: configuring a static IP address. Note: the hotspot cannot be enabled while NIC sharing is on. You can cancel NIC sharing when the...
Hadoop is an open-source big data framework written in Java that provides reliable, scalable distributed computing and is now hosted by Apache. Hadoop's...
A medium-sized e-commerce platform generates millions of raw data records and hundreds of millions of user behavior records every day. Generally speaking, there are...
To use the HDFS API, import the hadoop-client dependency. For the CDH version of Hadoop, you also need to specify the repository address: FileSystem is...
1. When it comes to big data processing, I believe that many people's first thought is Hadoop MapReduce. Yes, Hadoop MapReduce laid the groundwork for...
1. HDFS read process: the client, through DistributedFileSystem, asks the NameNode for the file it wants to download. The NameNode queries its metadata and finds the address of...
1. Inconsistent class types (Mapper and Reducer). This happens because the classes that extend Mapper and Reducer use inconsistent generic type parameters. For example,...
With the rapid growth of ByteDance's services, the HDFS service scale has reached the "double 10" level: 100,000 nodes in a single cluster and 10 EB...
Mechanism of the SecondaryNameNode. 1) Stage 1: NameNode startup. (1) On the first start after formatting the NameNode, create fsimage and Ed
This article briefly describes how ReduceTask runs, focusing on how it uses the iterator pattern to read data, avoiding the problem of OOM...