The big data ecosystem has fairly strict requirements on which version of each component you pair with the others; if the versions do not match, all sorts of problems are likely to occur.
Hadoop 1.x, 2.x, and 3.x differ considerably from one another, so when building a Hive data warehouse or an HBase database on top of Hadoop HDFS, version selection comes first.
There are plenty of articles online about the commonly used version combinations, but ones based on Hadoop 2.10.0 are rare, so here is a summary.
Resource address (Hive, Hadoop, ZooKeeper, HBase, MySQL database driver, etc.):
Link: https://pan.baidu.com/s/1n4wRfi9G5Ff9yfcKlMdVLg extraction code: s8yx
One: Hadoop 2.10.0 installation
Reference environment:
- Mac Pro with 16 GB of RAM
- Parallels Desktop
- CentOS 7 installed on the VMs
- JDK version: Sun JDK 1.8
- Hadoop 2.10.0 in high-availability (HA) mode, Hive 2.3.7 single node, HBase 2.2.4 cluster (no standby Master configured), ZooKeeper 3.4.14 (three-node cluster); Hive metadata is stored in MySQL
- The Hadoop cluster runs both HDFS and YARN
- Four CentOS virtual machines
Pre-preparation:
- Install JDK 1.8 on the four VMs and configure the JAVA_HOME and PATH environment variables in /etc/profile. For reference:
export JAVA_HOME=/usr/local/jdk1.8.0_65
export HADOOP_HOME=/home/hadoop/hadoop-2.10.0
export HIVE_HOME=/home/hadoop/apache-hive-2.3.7-bin
export HBASE_HOME=/home/hadoop/hbase-2.2.4
export PATH=$JAVA_HOME/bin:$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin:$HBASE_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
- Set the host names, and add the node01, node02, node03, and node04 IP address mappings to /etc/hosts. The settings are identical on all four machines; use the scp command to distribute the file. Example /etc/hosts configuration:
19.211.55.3 node01
19.211.55.4 node02
19.211.55.5 node03
19.211.55.6 node04
- Configure passwordless SSH login among the four VMs (run this on each VM as root; the keys end up under /root/.ssh/):
ssh-keygen
ssh-copy-id -i /root/.ssh/id_rsa.pub node01   # repeat for each node name
- Synchronize the time on the four VMs against the Aliyun NTP server:
yum install ntpdate
ntpdate ntp1.aliyun.com
- Upload the hadoop-2.10.0 tar.gz package to the /home/hadoop directory on the VMs and decompress it. (For convenience, no dedicated user is configured; the root user performs all start operations.)
- Set the environment variables in /etc/profile, run source /etc/profile, and distribute the file to the other nodes for the same operation, as sketched below.
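A minimal sketch of the distribution step (node names are the ones defined in /etc/hosts above); remember to run source /etc/profile on each node afterwards:

# push /etc/profile to the other nodes
for n in node02 node03 node04; do
  scp /etc/profile $n:/etc/profile
done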
- Prepare the ZooKeeper cluster on node02, node03, and node04. The /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg configuration is shown below. (Also create a myid file under /var/zfg/zookeeper on each node containing 1, 2, or 3 as the ZooKeeper identification tag, matching that node's server.N entry; see the sketch after the config.)
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/var/zfg/zookeeper
server.1=node02:2888:3888
server.2=node03:2888:3888
server.3=node04:2888:3888
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
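A sketch of creating the myid files; the value written on each node must match its server.N line in zoo.cfg:

# on node02:
mkdir -p /var/zfg/zookeeper && echo 1 > /var/zfg/zookeeper/myid
# on node03:
mkdir -p /var/zfg/zookeeper && echo 2 > /var/zfg/zookeeper/myid
# on node04:
mkdir -p /var/zfg/zookeeper && echo 3 > /var/zfg/zookeeper/myid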
- Set the Hadoop configuration files (write both the HDFS and YARN configuration files so they can be distributed together): add or modify hdfs-site.xml, mapred-site.xml, core-site.xml, yarn-site.xml, and slaves, and set the parameters in hadoop-env.sh, including the JDK directory.
- hdfs-site.xml configuration file reference:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>node01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>node02:8020</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>node01:50070</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>node02:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/var/sxt/hadoop/ha/jn</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
- core-site.xml configuration file reference:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/abc/hadoop/cluster</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node02:2181,node03:2181,node04:2181</value>
  </property>
</configuration>
- yarn-site.xml configuration file reference:
<?xml version="1.0"?>
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node02:2181,node03:2181,node04:2181</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>mashibing</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node03</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node04</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>1</value>
  </property>
  <property>
    <!-- Address that clients use to submit applications to the RM -->
    <name>yarn.resourcemanager.address.rm1</name>
    <value>node03:8032</value>
  </property>
  <property>
    <!-- Scheduler address exposed to the ApplicationMaster, used to request or release resources from the RM -->
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>node03:8030</value>
  </property>
  <property>
    <!-- RM web UI address, for viewing cluster information -->
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>node03:8088</value>
  </property>
  <property>
    <!-- Address through which NodeManagers exchange information with the RM -->
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>node03:8031</value>
  </property>
  <property>
    <!-- Address through which administrators send management commands to the RM -->
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>node03:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm1</name>
    <value>node03:23142</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>node04:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>node04:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>node04:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>node04:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>node04:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.admin.address.rm2</name>
    <value>node04:23142</value>
  </property>
</configuration>
- slaves file reference:
node02
node03
node04
- mapred-site.xml file reference:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- Distribute the unzipped Hadoop directory to the node02, node03, and node04 machines, as sketched below.
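A sketch of the distribution, using the directory layout from above:

# copy the unpacked Hadoop directory to the other nodes
for n in node02 node03 node04; do
  scp -r /home/hadoop/hadoop-2.10.0 $n:/home/hadoop/
done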
- Start the ZooKeeper cluster
- Initialize the Hadoop cluster (first start only).
- Start the HDFS cluster and YARN (or use start-all.sh). A first-time startup sequence is sketched after this list.
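The individual commands are not spelled out above; as a rough sketch (node roles taken from the configuration files above, using the standard Hadoop 2.x scripts), a first-time start could look like this:

# 1. on node02/node03/node04: start ZooKeeper
/home/hadoop/zookeeper-3.4.14/bin/zkServer.sh start
/home/hadoop/zookeeper-3.4.14/bin/zkServer.sh status

# 2. on node01/node02/node03: start the JournalNodes
hadoop-daemon.sh start journalnode

# 3. on node01 (nn1): format the NameNode and start it
hdfs namenode -format
hadoop-daemon.sh start namenode

# 4. on node02 (nn2): copy the formatted metadata from nn1
hdfs namenode -bootstrapStandby

# 5. on node01: initialize the HA state in ZooKeeper
hdfs zkfc -formatZK

# 6. start HDFS and YARN; the ResourceManagers live on node03/node04 in this layout,
#    so start them there with `yarn-daemon.sh start resourcemanager` if start-yarn.sh
#    is not run on those nodes. Verify the daemons on each node with jps.
start-dfs.sh
start-yarn.sh
jps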
Two: Hive 2.3.7 installation and startup (single node + MySQL)
Pre-preparation:
- Install MySQL (or reuse an existing instance). Also note that its settings must allow access from machines other than the local host; see the sketch below.
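A minimal sketch of opening up remote access, assuming MySQL 5.x and the root/123456 account used in hive-site.xml below:

mysql -uroot -p -e "GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '123456'; FLUSH PRIVILEGES;"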
- Upload the tar.gz package of Hive 2.3.7 to /home/hadoop/ and decompress it
- Configure hive-site.xml, including the MySQL connection parameters and the HDFS warehouse path
- hive-site.xml reference:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://46.77.56.200:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
</configuration>
- Configure the /etc/profile environment variables and copy the MySQL JDBC driver to Hive's lib directory
- Initialize the Hive metadata so that it is stored in MySQL, for example as sketched below
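A minimal sketch of the last two steps (the driver jar name is an assumption; use the one you downloaded):

# copy the MySQL JDBC driver into Hive's lib directory
cp /home/hadoop/mysql-connector-java-5.1.47.jar $HIVE_HOME/lib/
# create the metastore schema in the MySQL database configured in hive-site.xml
schematool -dbType mysql -initSchema
# quick sanity check
hive -e "show databases;"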
Three: HBase 2.2.4 cluster installation
Pre-preparation:
- Upload the tar.gz package of HBase 2.2.4 to the /home/hadoop directory and decompress it
- Configure hbase-site.xml and hbase-env.sh, and copy the hdfs-site.xml configuration file from the Hadoop cluster into HBase's conf directory. hbase-site.xml reference:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>node02,node03,node04</value>
  </property>
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
</configuration>
- Set the environment variables in /etc/profile, run source /etc/profile, and distribute to the other nodes for the same operation. Set the hbase-env.sh parameters, including the JDK path, and do not use the ZooKeeper bundled with HBase; see the sketch below.
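The relevant hbase-env.sh lines might look like this (JDK path taken from the /etc/profile settings above):

export JAVA_HOME=/usr/local/jdk1.8.0_65
# use the external ZooKeeper cluster instead of the bundled one
export HBASE_MANAGES_ZK=false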
- Modify the regionservers file so that the RegionServers run on node02, node03, and node04
- Distribute HBase to the other nodes (the four machines use the same directory layout)
- Start the HBase cluster and perform basic operations. After entering the hbase shell, run the status and processlist commands to check whether the cluster started properly, for example:
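A quick check, assuming the start script is run on the node that should host the HMaster:

start-hbase.sh
hbase shell
# inside the shell:
#   status
#   processlist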
Possible problems during installation, and how to solve them:
Problems during the Hadoop cluster installation and configuration:
- Different Hadoop versions can cause all kinds of problems during cluster setup, because the versions differ considerably from each other. What you need to do is read the error logs carefully and check the port numbers to locate the problem quickly, and then fix it.
Problems during ZooKeeper cluster installation and configuration:
- When configuring the ZooKeeper cluster, make sure the value in each node's myid file matches that node's server.N entry in the zoo.cfg configuration file. Otherwise the nodes cannot find each other and the cluster fails to start.
Problems during the Hive installation and configuration:
- Note that the MySQL driver must be copied into Hive's lib directory, and the driver version must match the MySQL server being connected to.
Problems during MySQL installation and configuration:
- Note that MySQL installed on Linux treats table names as case-sensitive by default, while on Windows it is case-insensitive by default. If this is not configured consistently, various problems can occur during web development or automatic table creation.
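As a sketch, assuming the config file is /etc/my.cnf and the service name is mysqld (both vary by installation method), table names can be made case-insensitive like on Windows:

# add the option under the existing [mysqld] section, then restart MySQL
sed -i '/^\[mysqld\]/a lower_case_table_names=1' /etc/my.cnf
systemctl restart mysqld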
Problems during HBase cluster installation and configuration:
- The HMaster process disappears a few seconds after being started (check with the jps command), or the HMaster process briefly exists on the standby server but then disappears
- In the hbase shell command line, typing status or processlist and pressing Enter returns an error
- hbase error: KeeperErrorCode = NoNode for /hbase/master
- In the log:
java.lang.RuntimeException: HMaster Aborted
    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:160)
    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:104)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:76)
    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2120)
- A NULL or 500 error occurs after the web UI is opened
These problems are related: either the default node path /hbase/master was never created in ZooKeeper (which can itself be caused by a failed startup), or the check and synchronization failed. The details can be found in the log files.
Add this property to hbase-site.xml (to resolve the check failure):
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
To open the HBase web UI, add the following to hbase-site.xml:
<property>
<name>hbase.master.info.port</name>
<value>60010</value>
</property>
Use port 60010 to access the hbase webUI.
For KeeperErrorCode = NoNode for /hbase/master, stop the HBase cluster first. If stop-hbase.sh cannot stop it (it takes too long), kill the processes with kill -9, and then delete the /hbase node path in the ZooKeeper cluster.
Use zkCli.sh to enter the ZooKeeper command line, run ls / to view the nodes, and then run rmr /hbase to delete the node path.
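For example, using the ZooKeeper install path from earlier in this guide:

/home/hadoop/zookeeper-3.4.14/bin/zkCli.sh -server node02:2181
# inside the ZooKeeper CLI:
#   ls /
#   rmr /hbase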
Check the logs to see whether other errors, such as sync check failures, are recorded. If so, add the first configuration item above and restart HBase; then enter the hbase shell and run processlist and status.