1. core-site.xml (tool module). Contains the common Hadoop utility classes; this module was renamed from the original Hadoop Core. It mainly includes the Configuration system, the remote procedure call (RPC) framework, the serialization mechanism, and the abstract FileSystem class. Together they provide basic services for building a cloud computing environment on commodity hardware and supply APIs for software running on the platform.
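As a small illustration of the Common APIs, the sketch below creates a Configuration (which picks up core-site.xml from the classpath automatically) and reads the fs.defaultFS property; the fallback value is only an assumed example, not a required setting.

```java
import org.apache.hadoop.conf.Configuration;

public class ConfDemo {
    public static void main(String[] args) {
        // Configuration automatically loads core-default.xml and
        // core-site.xml from the classpath.
        Configuration conf = new Configuration();

        // Read the default file system URI; "file:///" here is just an
        // assumed fallback for illustration.
        String fsUri = conf.get("fs.defaultFS", "file:///");
        System.out.println("fs.defaultFS = " + fsUri);
    }
}
```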
2. hdfs-site.xml (data storage module). HDFS is a distributed file system that provides high-throughput, highly scalable, and highly fault-tolerant access to application data, and it is the foundation of data storage management in the Hadoop system. Designed to run on low-cost commodity hardware, it detects and responds to hardware failures. HDFS simplifies the file consistency model and provides high-throughput access to application data through streaming reads and writes, which makes it well suited to applications with large data sets. A usage sketch follows the daemon list below.
NameNode + DataNode + SecondaryNameNode
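A minimal sketch of HDFS access through the abstract FileSystem API; the path /tmp/demo.txt and the cluster URI are assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumes fs.defaultFS points at the cluster's NameNode,
        // e.g. hdfs://localhost:9000 (illustrative value).
        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/demo.txt"); // hypothetical path

            // Streaming write: the client streams bytes out to DataNodes.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.writeUTF("hello hdfs");
            }

            // Streaming read back.
            try (FSDataInputStream in = fs.open(path)) {
                System.out.println(in.readUTF());
            }
        }
    }
}
```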
3. mapred-site.xml (data processing module). MapReduce is a parallel processing system for large data sets that runs on YARN; it is a computational model for large-volume data computation. Hadoop's MapReduce implementation, together with Common and HDFS, made up the three components of early Hadoop development. MapReduce divides an application into Map and Reduce steps: Map performs a specific operation on individual elements of the data set and produces intermediate results as key-value pairs, and Reduce combines all the values sharing the same key in the intermediate results to obtain the final result (see the sketch below). Functional decomposition of this kind is very well suited to data processing in a distributed parallel environment made up of a large number of machines.
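To make the Map and Reduce steps concrete, here is the classic word-count sketch: the Mapper emits (word, 1) pairs as intermediate results, and the Reducer sums all values that share the same key. The driver (Job setup) is omitted to keep the example minimal.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Map: for each input line, emit (word, 1) key-value pairs.
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce: sum all counts that share the same word (key).
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }
}
```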
4. yarn-site.xml (job scheduling + resource management platform). YARN handles task scheduling and cluster resource management. ResourceManager + NodeManager.
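A sketch of the ResourceManager/NodeManager split: the YarnClient call below asks the ResourceManager for reports on running NodeManagers. It assumes a yarn-site.xml on the classpath pointing at a reachable ResourceManager.

```java
import java.util.List;

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnNodesDemo {
    public static void main(String[] args) throws Exception {
        // YarnConfiguration also loads yarn-site.xml from the classpath.
        YarnConfiguration conf = new YarnConfiguration();

        YarnClient client = YarnClient.createYarnClient();
        client.init(conf);
        client.start();
        try {
            // Ask the ResourceManager for all NodeManagers in RUNNING state.
            List<NodeReport> nodes = client.getNodeReports(NodeState.RUNNING);
            for (NodeReport node : nodes) {
                System.out.println(node.getNodeId()
                        + " capability=" + node.getCapability());
            }
        } finally {
            client.stop();
        }
    }
}
```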