The fastest shortcut in the world is to keep your feet on the ground. A place focused on sharing.

Recently I had to rebuild a Hadoop cluster with Ambari at work, so I recorded the setup process in the hope that it can serve as a reference for anyone with the same needs.

Author: Graph header data

Ambari for Ubuntu 14.04: latest version 2.2.1

HDP for Ubuntu 14.04: latest version 2.4.3.0

What is Ambari

Apache Ambari is a Web-based tool that supports provisioning, management, and monitoring of Apache Hadoop clusters.

Ambari supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper, Sqoop, and HCatalog.

Apache Ambari provides centralized management of all of these components and is one of the top five Hadoop management tools. (In short, it is an open-source, one-click Hadoop installation service.)

What can we do with it, and why use it?

We can use Ambari to quickly set up and manage Hadoop and frequently used service components.

For example: HDFS, YARN, Hive, HBase, Oozie, Sqoop, Flume, ZooKeeper, and Kafka. (In plain terms, it saves you a lot of legwork.)

Why use it

  • First, Ambari was one of the earliest Hadoop cluster management tools.
  • Second, Ambari is now recommended on the Hadoop website.
  • Cluster provisioning is simplified with a step-by-step installation wizard.
  • You can check whether Hadoop Core (HDFS and MapReduce) and related projects (such as HBase, Hive, and HCatalog) are healthy by configuring key operation and maintenance metrics.
  • Supports visualization and analysis of job and task execution to better view dependencies and performance.
  • Monitoring information is exposed through a complete RESTful API, so it can be integrated with existing O&M tools.
  • The user interface is intuitive, allowing users to view information and control the cluster easily and efficiently.
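To get a feel for the RESTful API mentioned in the list above, here is a minimal sketch of a read-only query. It assumes the server from this guide (`master:8080`) and the default `admin/admin` login; `ambari_url`, `ambari_get` and the `DRY_RUN` switch are hypothetical helper names, not part of Ambari itself.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for read-only calls to the Ambari REST API.
# Assumptions: server on master:8080 and admin/admin credentials (override via env).
AMBARI_HOST="${AMBARI_HOST:-master}"
AMBARI_PORT="${AMBARI_PORT:-8080}"
AMBARI_USER="${AMBARI_USER:-admin}"
AMBARI_PASS="${AMBARI_PASS:-admin}"

# Build the full URL for a resource path such as "clusters".
ambari_url() {
  printf 'http://%s:%s/api/v1/%s\n' "$AMBARI_HOST" "$AMBARI_PORT" "$1"
}

# GET a resource; with DRY_RUN=1 the curl command is printed instead of run.
ambari_get() {
  local url
  url="$(ambari_url "$1")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "curl -s -u ${AMBARI_USER}:${AMBARI_PASS} -H 'X-Requested-By: ambari' $url"
  else
    curl -s -u "${AMBARI_USER}:${AMBARI_PASS}" -H 'X-Requested-By: ambari' "$url"
  fi
}

# Example on a live cluster: list clusters, then one service's state.
#   ambari_get clusters
#   ambari_get clusters/<your-cluster-name>/services/HDFS
```

Only GET requests are shown here. Ambari expects the `X-Requested-By` header on requests that change state, and sending it on a GET does no harm.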

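Ambari 1.x used Ganglia to collect metrics and Nagios for system alerts, which are emailed to administrators when something needs attention (for example, when a node is down or disk space is running low). Starting with Ambari 2.0 (and so in the 2.2.1 release used here), these roles are taken over by the built-in Ambari Metrics System and Ambari Alerts.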

In addition, Ambari supports Hadoop security by being able to install secure (Kerberos-based) Hadoop clusters, providing role-based user authentication, authorization, and auditing capabilities, and integrating LDAP and Active Directory for user management.

Setting up the cluster

1. Let’s do some preparatory work before installation

## Tell the servers who they are and what their nicknames are (edit the hosts file)
vim /etc/hosts
10.1.10.1 master
10.1.10.2 slave1
10.1.10.3 slave2
## Then give ourselves an access card to get in and out of their houses (passwordless SSH)
ssh-keygen -t rsa  ## on every node, if it has no key pair yet
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  ## write the public key to the authorized_keys file
# First, collect every node's public key into authorized_keys on the master server
# Second, copy the master's authorized_keys to slave1 and slave2
#I'm not going to tell you my password is "What time is it, Wolf?"
scp ~/.ssh/authorized_keys slave1:~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave2:~/.ssh/authorized_keys
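Before moving on, it helps to confirm that every node resolves the others by name. The sketch below checks a hosts file for the three entries used in this guide; `check_host_entry` and `check_cluster_hosts` are hypothetical helpers, and the IP/hostname pairs are simply the ones from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if <hosts-file> maps <name> to <ip>.
# Usage: check_host_entry <hosts-file> <ip> <name>
check_host_entry() {
  local file="$1" ip="$2" name="$3"
  awk -v ip="$ip" -v name="$name" '
    $1 ~ /^#/ { next }    # skip commented-out entries
    $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$file"
}

# Report the three nodes used in this guide (defaults to /etc/hosts).
check_cluster_hosts() {
  local file="${1:-/etc/hosts}" entry ip name
  for entry in "10.1.10.1 master" "10.1.10.2 slave1" "10.1.10.3 slave2"; do
    read -r ip name <<< "$entry"
    if check_host_entry "$file" "$ip" "$name"; then
      echo "ok      $name -> $ip"
    else
      echo "MISSING $name -> $ip"
    fi
  done
}
```

Passwordless login can be checked in the same spirit with `ssh -o BatchMode=yes slave1 true`, which fails instead of prompting for a password when key-based login is not set up.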

# Update the time zone and system locale configuration
apt-get install localepurge  ## press Enter through the prompts
dpkg-reconfigure localepurge && locale-gen zh_CN.UTF-8 en_US.UTF-8  ## press Enter
apt-get update && apt-get install -y tzdata
echo "Asia/Shanghai" > /etc/timezone  ## tzdata rebuilds /etc/localtime from this file
dpkg-reconfigure -f noninteractive tzdata
# Point NTP at the master so all nodes keep the same time
vi /etc/ntp.conf
server 10.1.10.1
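If these steps end up in a script that is re-run, appending to `/etc/ntp.conf` blindly will duplicate the `server` line. A minimal idempotent sketch, assuming the `ntp` package is installed and using a hypothetical helper `ensure_line`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append <line> to <file> unless an identical line exists.
ensure_line() {
  local file="$1" line="$2"
  # -q quiet, -x whole-line match, -F literal string (no regex)
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example on slave1/slave2 (as root), then restart NTP to pick up the change:
#   ensure_line /etc/ntp.conf "server 10.1.10.1"
#   service ntp restart
```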

2. Next, tune the Ubuntu system

### 1.1 Disable swap partitions
swapoff -a
vim /etc/fstab  ## swap was on /dev/sda2 during installation; comment out its entry
#UUID=8aba5009-d557-4a4a-8fd6-8e6e8c687714 none swap sw 0 0
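Editing `/etc/fstab` by hand works, but the same change can be scripted. The hypothetical `disable_swap_in_fstab` helper below comments out every active entry whose filesystem type (the third field) is `swap` and keeps a `.bak` copy; `swapoff -a` is still needed for the running system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out active swap entries in an fstab-style file.
# A backup is written to <file>.bak before the file is rewritten in place.
disable_swap_in_fstab() {
  local fstab="${1:-/etc/fstab}"
  cp -p "$fstab" "$fstab.bak" || return 1
  # Field 3 of an fstab entry is the filesystem type; leave comments untouched.
  awk '!/^[ \t]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "$fstab.bak" > "$fstab"
}
```

Running it twice is harmless, because entries that are already commented out are left as they are.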

### 1.2 Raise the open file descriptor limit: add a ulimit line at the end of /etc/profile
vi /etc/profile
ulimit -SHn 512000
vim /etc/security/limits.conf
## increase all limits roughly 10x
* soft nofile 600000
* hard nofile 655350
* soft nproc 600000
* hard nproc 655350
### Apply the changes with the following command
source /etc/profile
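A misspelled item name in `limits.conf` (for example `nofiles` instead of `nofile`) means that limit is not applied. The hypothetical `lint_limits` helper below flags any item that is not in its list; the list is taken from limits.conf(5), may differ slightly between PAM versions, and leaves out Debian's `chroot` item.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report limits.conf lines whose <item> (3rd column) is not
# a known pam_limits item. Exit status is 1 if anything was reported.
lint_limits() {
  local file="${1:-/etc/security/limits.conf}"
  awk '
    BEGIN {
      # Items listed in limits.conf(5); the Debian-only "chroot" is omitted.
      list = "core data fsize memlock nofile rss stack cpu nproc as"
      list = list " maxlogins maxsyslogins priority locks sigpending msgqueue nice rtprio"
      n = split(list, items, " ")
      for (i = 1; i <= n; i++) valid[items[i]] = 1
    }
    /^[ \t]*#/ || NF == 0 { next }    # skip comments and blank lines
    !($3 in valid) { printf "line %d: unknown item \"%s\"\n", NR, $3; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$file"
}
```

Limits from `limits.conf` apply to new login sessions, so log out and back in before checking the result with `ulimit -n`.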

### 1.3 Modify kernel configuration
vi /etc/sysctl.conf
# Paste in the following
fs.file-max = 65535000
net.core.somaxconn = 30000
vm.swappiness = 0
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
# Caution: tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12
net.ipv4.tcp_tw_recycle = 1
net.ipv4.ip_local_port_range = 1024 65000
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
#Run the command to make the configuration take effect
sysctl -p
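After `sysctl -p`, one way to confirm the values actually took effect is to compare the file with `/proc/sys`, where each dotted key maps to a path (`net.core.somaxconn` → `/proc/sys/net/core/somaxconn`). `check_sysctl` is a hypothetical helper, and `PROC_SYS` is an assumed override that exists only so the function can be tried against a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare "key = value" lines of a sysctl.conf-style file
# with the live values under /proc/sys and print any differences.
# PROC_SYS is an assumed override (default /proc/sys) used only for testing.
check_sysctl() {
  local conf="${1:-/etc/sysctl.conf}" root="${PROC_SYS:-/proc/sys}"
  local key want have path rc=0
  while IFS='=' read -r key want || [ -n "$key" ]; do
    key="$(printf '%s' "$key" | tr -d ' \t')"
    case "$key" in ''|'#'*|';'*) continue ;; esac
    # Collapse whitespace: the kernel prints multi-value keys tab-separated.
    want="$(printf '%s\n' "$want" | awk '{ $1 = $1; print }')"
    path="$root/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$path" ]; then
      echo "missing  $key"; rc=1; continue
    fi
    have="$(awk '{ $1 = $1; print }' "$path")"
    if [ "$have" != "$want" ]; then
      echo "mismatch $key: want '$want', have '$have'"; rc=1
    fi
  done < "$conf"
  return "$rc"
}
```

A `missing` line for `net.ipv4.tcp_tw_recycle` is expected on newer kernels, where that setting no longer exists.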

# Disable transparent huge pages (THP) in the running kernel
echo never > /sys/kernel/mm/transparent_hugepage/enabled
## Disable it permanently: add the lines below to /etc/rc.local, before its final "exit 0"
vi /etc/rc.local
if test -f /sys/kernel/mm/transparent_hugepage/enabled; then  
   echo never > /sys/kernel/mm/transparent_hugepage/enabled  
fi  
if test -f /sys/kernel/mm/transparent_hugepage/defrag; then  
   echo never > /sys/kernel/mm/transparent_hugepage/defrag  
fi  
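The active THP setting is the word shown in brackets in the sysfs file, so a quick check after a reboot could look like this (`thp_mode` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active THP mode, i.e. the bracketed word in a
# file such as /sys/kernel/mm/transparent_hugepage/enabled ("always madvise [never]").
thp_mode() {
  local file="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}

# After a reboot, both of these should print "never":
#   thp_mode /sys/kernel/mm/transparent_hugepage/enabled
#   thp_mode /sys/kernel/mm/transparent_hugepage/defrag
```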


3. Install and deploy ambari-server (Ubuntu 14.04 + Ambari 2.2.1)

## Add the Ambari package source
wget -O /etc/apt/sources.list.d/ambari.list http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0/ambari.list
apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
apt-get update
# Install ambari-server on the master node
apt-get install ambari-server -y
#Install ambari-agent on all nodes 
apt-get install ambari-agent -y

4. Modify the ambari-agent configuration to point to ambari-server

vi /etc/ambari-agent/conf/ambari-agent.ini
## set hostname to the ambari-server host
[server] 
hostname=master
url_port=8440
secured_url_port=8441

# Run the setup wizard; it creates the Ambari server database (JDK 1.7 was selected here)
ambari-server setup
## start ambari
ambari-server start
ambari-agent start
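The `[server]` edit in `ambari-agent.ini` has to be repeated on every node, so it is a natural candidate for a script. A minimal sketch, assuming the file layout shown above; `set_agent_server` is a hypothetical helper that only rewrites `hostname=` inside `[server]` and keeps a `.bak` copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set hostname=<server> inside the [server] section of an
# ambari-agent.ini-style file, leaving other sections untouched (.bak kept).
set_agent_server() {
  local ini="$1" server="$2"
  cp -p "$ini" "$ini.bak" || return 1
  awk -v server="$server" '
    /^\[/ { section = $0; sub(/[ \t]+$/, "", section) }   # track current [section]
    section == "[server]" && /^hostname[ \t]*=/ { print "hostname=" server; next }
    { print }
  ' "$ini.bak" > "$ini"
}

# Example on each node, then restart the agent so it picks up the new server:
#   set_agent_server /etc/ambari-agent/conf/ambari-agent.ini master
#   ambari-agent restart
```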


5. After the headache-inducing shell commands, we finally get to something more human-friendly.

Open http://10.1.10.1:8080/ in your browser; the default username/password is admin/admin. Click LAUNCH INSTALL WIZARD and let's get started.

6. Give the cluster a name

7. Pay attention here: make sure you pick the right HDP version, or there will be trouble later.

**HDP 2.4.3**

Example: public-repo-1.hortonworks.com/HDP/debian7…

8. Click Next to check whether the repository is reachable. If an error is reported, select "Skip Repository Base URL Validation (Advanced)" to skip the check.

9. Enter the hostnames master, slave1, and slave2 so that ambari-agent is installed on the slaves.

10. Confirm the host status. This takes a while; if it takes too long, restart ambari-server.

11. Select the required services: HDFS, YARN, and ZooKeeper.

12. Keep Ambari's default assignment of components to hosts and click Next to start the installation.

13. Now it all comes down to your network speed.

14. When the installation completes, click Next; the refreshed dashboard shows that the Hadoop cluster has been started by default.

15. Go to HDFS and click Restart All to restart all components.

16. To verify the installation, click NameNode UI.

17. Basic Information page

18. Want to run a job and try it out?

# Run the following on the server
## Create an HDFS directory (it can then be browsed at http://master:50070/explorer.html#/)
hdfs dfs -mkdir -p /data/input 
# Upload a file from the server to HDFS
hdfs dfs -put file /data/input/
### Test with the example jar shipped with Hadoop
## hadoop jar needs a local jar path, so fetch the jar from HDFS first
hdfs dfs -get hdfs://tesla-cluster/data/hadoop-mapreduce-examples-2.7.1.2.4.0.0-169.jar .
hadoop jar hadoop-mapreduce-examples-2.7.1.2.4.0.0-169.jar wordcount /data/input /data/output1

19. The output directory contains a _SUCCESS marker and the result file.
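To double-check the `wordcount` output, the same counts can be computed locally on the input file. This is a rough sketch: `local_wordcount` is a hypothetical helper that splits on whitespace and sorts in byte order, which should match the example's tokenizer for plain text, and the `part-r-00000` file name assumes a single reducer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count whitespace-separated words in local files (or stdin)
# and print "word<TAB>count" in byte order, similar to the wordcount output.
local_wordcount() {
  cat "$@" | tr -s ' \t\r\f\n' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
    awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Compare with the cluster result (assumes one reducer wrote part-r-00000):
#   hdfs dfs -cat /data/output1/part-r-00000 > cluster.txt
#   local_wordcount file | diff - cluster.txt && echo "results match"
```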

A few closing remarks, off the record:

In this setup, NameNode and ResourceManager both run in single-point mode. Ambari supports HA (high availability), but for reasons of space that will be covered in a separate article.