I. Virtual machine environment

1.1 Configuring a static IP address for campus network NAT

Reference:

  • Ubuntu Server 20.04 LTS: configuring a static IP address in NAT mode
  • Win10: resolving ping errors

Note that the hotspot cannot be enabled while NIC sharing is active. Disable NIC sharing when you need the hotspot, then re-enable it afterwards.

The IP address assigned to the host VMnet8 NIC is 192.168.137.1/24, so the subnet of the VM's VMnet8 network is set to 192.168.137.0/24 and the gateway IP address is set to that of the host VMnet8 NIC.


1.2 Installing Ubuntu 20.04

Reference: Install ubuntu-20.04-live-server-amd64.iso for the VM

Mirror source choice: http://mirrors.aliyun.com/ubuntu.

Configure the static IP address, hostname, and mirror source during the installation to make subsequent modification easier.

Static IP

ip addr
vim /etc/netplan/00-installer-config.yaml
network:
  ethernets:
    ens33:
      addresses:
        - 192.168.137.100/24
      gateway4: 192.168.137.1
      nameservers:
        addresses:
          - 192.168.137.1
        search: []
  version: 2

One caveat: Vim's default indentation handling for YAML is not particularly good, so you may need to adjust the indentation manually.

netplan apply

hostname

hostnamectl set-hostname ubuntu
#Log back in

Mirror source

Reference: [Linux] Switching Ubuntu 20.04 to the Aliyun mirror source

vim /etc/apt/sources.list
deb http://mirrors.aliyun.com/ubuntu focal main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu focal-updates main restricted  universe multiverse
deb http://mirrors.aliyun.com/ubuntu focal-backports main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu focal-security main restricted universe multiverse

Enable root SSH login

#Setting the root Password
sudo passwd root

#Uncomment PermitRootLogin and change its value to yes
sudo vim /etc/ssh/sshd_config

#Restart the SSHD
sudo systemctl restart sshd

By default the root user's bash prompt is not color-highlighted:

sudo cp /etc/skel/.bashrc /root

#Uncomment force_color_prompt=yes
sudo vim /root/.bashrc

#Log back in

hosts

vim /etc/hosts
# ...
192.168.137.1   win10
192.168.137.101 node101
192.168.137.102 node102
192.168.137.103 node103

Note: to access HDFS normally through its web pages, you also need to configure hostname resolution on the host (C:\Windows\System32\drivers\etc\hosts).

Passwordless SSH login

# node101 node102 node103
ssh-keygen -t rsa
ssh-copy-id root@node101
ssh-copy-id root@node102
ssh-copy-id root@node103

Install OpenJDK 8

apt install openjdk-8-jdk

The default installation location is /usr/lib/jvm/java-8-openjdk-amd64.

vim /etc/profile
# ...
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
source /etc/profile && source ~/.bashrc


1.3 SSH and FTP Schemes

Windows Terminal

Documentation: Windows/Development Environment/Windows Terminal/Overview

Cascadia Code: github.com/microsoft/c…

To make up for the missing "send command to all windows" feature, a small script can be used instead:

vim ~/bin/cluster.sh
#!/bin/bash
case $1 in
    exec)
        for host in node101 node102 node103; do
            echo "========== ${host} =========="
            ssh "${host}" "${@:2}"
        done
        ;;
    rsync)
        for host in node101 node102 node103; do
            if [[ "$(hostname)" != "${host}" ]]; then
                echo "========== ${host} =========="
                rsync -a -r -v "$2" "${host}":"$2/../"
            fi
        done
        ;;
esac
chmod 777 ~/bin/cluster.sh
~/bin/cluster.sh exec "jps"
~/bin/cluster.sh rsync "$HADOOP_HOME/etc"

To transfer files between the host and the virtual machines you may need tools such as Xftp or MobaXterm, but there are other options as well.

sftp

cd <local_dir>

sftp <user>@<host>

cd <remote_dir>

put -r <local_dir>

get -r <remote_file>

Nginx file server + curl

# ...
http {
    # ...
    server {
        # ...
        root /usr/share/nginx/html;
        charset utf-8;
        # ...
        location /public {
            autoindex on;
            autoindex_localtime on;
            autoindex_exact_size off;
        }
    }
}
curl <url> -o <filename>

Docker

docker cp <local_file> <container_id | container_name>:<container_dir>

docker cp <container_id | container_name>:<container_file> <local_dir>


1.4 Other

Vim

vim /etc/vim/vimrc
" ...
filetype plugin indent on
set showcmd
set showmatch
set ignorecase
set smartcase
set incsearch
set autowrite
set hidden
set number
set ruler
set expandtab
set tabstop=4
set cursorline
set confirm
set hlsearch

Time zone

timedatectl set-timezone Asia/Shanghai

timedatectl


II. Hadoop

2.1 Installing hadoop-3.2.2

Documents:

  • Apache Hadoop 3.2.2 – Hadoop: Setting up a Single Node Cluster.
  • Apache Hadoop 3.2.2 – Hadoop Cluster Setup
curl https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz -o hadoop-3.2.2.tar.gz
mkdir /opt/hadoop
tar -zxvf hadoop-3.2.2.tar.gz -C /opt/hadoop
vim /etc/profile
# ...
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/opt/hadoop/hadoop-3.2.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin


2.2 ZooKeeper cluster

You can run the following docker-compose file directly on the host:

version: '3'

services:
  zk1:
    image: zookeeper:3.7
    hostname: zk1
    ports:
      - "2181:2181"
    volumes:
      - ./zk1/data/:/data/
      - ./zk1/datalog/:/datalog/
    environment:
      ZOO_MY_ID: 1
      ZOO_SERVERS: server.1=0.0.0.0:2888:3888;2181 server.2=zk2:2888:3888;2181 server.3=zk3:2888:3888;2181

  zk2:
    image: zookeeper:3.7
    hostname: zk2
    ports:
      - "2182:2181"
    volumes:
      - ./zk2/data/:/data/
      - ./zk2/datalog/:/datalog/
    environment:
      ZOO_MY_ID: 2
      ZOO_SERVERS: server.1=zk1:2888:3888;2181 server.2=0.0.0.0:2888:3888;2181 server.3=zk3:2888:3888;2181

  zk3:
    image: zookeeper:3.7
    hostname: zk3
    ports:
      - "2183:2181"
    volumes:
      - ./zk3/data/:/data/
      - ./zk3/datalog/:/datalog/
    environment:
      ZOO_MY_ID: 3
      ZOO_SERVERS: server.1=zk1:2888:3888;2181 server.2=zk2:2888:3888;2181 server.3=0.0.0.0:2888:3888;2181

networks:
  default:
    external: true
    name: local_net
docker network create local_net

docker-compose up -d
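
To verify that the quorum is up, the following is a minimal Java sketch (assuming the org.apache.zookeeper:zookeeper client dependency is on the classpath; the connection string mirrors the ports mapped above):

import org.apache.zookeeper.ZooKeeper;

public class ZkQuorumCheck {
    public static void main(String[] args) throws Exception {
        // Connect to the three ZooKeeper instances exposed on the host (ports 2181-2183)
        ZooKeeper zk = new ZooKeeper("win10:2181,win10:2182,win10:2183", 30000, event -> {});
        // Give the session a moment to establish, then query the root znode
        Thread.sleep(2000);
        System.out.println("state    = " + zk.getState());
        System.out.println("children = " + zk.getChildren("/", false));
        zk.close();
    }
}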


2.3 Hadoop cluster

Configuration files

Default configuration files:

  • core-default.xml
  • hdfs-default.xml
  • hdfs-rbf-default.xml
  • mapred-default.xml
  • yarn-default.xml

Reference:

  • Hadoop: Why do I need to reconfigure JAVA_HOME in hadoop-env.sh?
  • HDFS_NAMENODE_USER, HDFS_DATANODE_USER & HDFS_SECONDARYNAMENODE_USER not defined
hadoop-env.sh
vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# ...
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# ...
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
workers
vim $HADOOP_HOME/etc/hadoop/workers
node101
node102
node103
core-site.xml
vim $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hdfs-cluster</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/hadoop-3.2.2/tmp</value>
    </property>
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>

    <!-- HDFS ZooKeeper address -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>win10:2181,win10:2182,win10:2183</value>
    </property>

    <!-- YARN ZooKeeper address -->
    <property>
        <name>hadoop.zk.address</name>
        <value>win10:2181,win10:2182,win10:2183</value>
    </property>
</configuration>

Note: hadoop.tmp.dir cannot reference environment variables.

hdfs-site.xml
vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
    <!-- HDFS HA -->
    <property>
        <name>dfs.nameservices</name>
        <value>hdfs-cluster</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.hdfs-cluster</name>
        <value>nn1,nn2,nn3</value>
    </property>

    <!-- NameNode RPC communication address -->
    <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
        <value>node101:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
        <value>node102:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.hdfs-cluster.nn3</name>
        <value>node103:8020</value>
    </property>

    <!-- NameNode HTTP communication address -->
    <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn1</name>
        <value>node101:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn2</name>
        <value>node102:9870</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.hdfs-cluster.nn3</name>
        <value>node103:9870</value>
    </property>

    <!-- JournalNode -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node101:8485;node102:8485;node103:8485/hdfs-cluster</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/hadoop/hadoop-3.2.2/tmp/dfs/journalnode/</value>
    </property>

    <!-- Split-brain fencing -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <!-- HDFS automatic failover -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
</configuration>
mapred-site.xml
vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>

    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
yarn-site.xml
vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn-cluster</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2,rm3</value>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node101</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node102</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm3</name>
        <value>node103</value>
    </property>

    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>node101:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>node102:8088</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm3</name>
        <value>node103:8088</value>
    </property>

    <!-- ResourceManager automatic recovery -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>

Log aggregation

mapred-site.xml
<configuration>
    <!-- ... -->

    <!-- JobHistoryServer -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node101:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node101:19888</value>
    </property>
</configuration>
yarn-site.xml
<configuration>
    <!-- ... -->

    <!-- Log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log.server.url</name>
        <value>http://node101:19888/jobhistory/logs</value>
    </property>
    <!-- 3600 * 24 * 7 -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
</configuration>

Synchronizing configuration files

# node101
rsync -a -r -v $HADOOP_HOME/etc node102:$HADOOP_HOME
rsync -a -r -v $HADOOP_HOME/etc node103:$HADOOP_HOME

Initializing the cluster

#Start the JournalNode
# node101 node102 node103
hdfs --daemon start journalnode

#Format the NameNode
# node101
hdfs namenode -format

#Synchronize NameNode metadata
# node101
hdfs --daemon start namenode
# node102 node103
hdfs namenode -bootstrapStandby

#Format the ZKFailoverController (zkfc)
# node101
hdfs zkfc -formatZK

Delete tmp and logs before reformatting:

rm -rf $HADOOP_HOME/tmp $HADOOP_HOME/logs

Start the cluster

Reference: In Hadoop HA mode, after the active NameNode is killed, the standby NameNode fails to take over automatically

# node101
start-dfs.sh
start-yarn.sh
mapred --daemon start historyserver

Start separately:

hdfs --daemon <start|stop> namenode
hdfs --daemon <start|stop> secondarynamenode
hdfs --daemon <start|stop> datanode

yarn --daemon <start|stop> resourcemanager
yarn --daemon <start|stop> nodemanager

mapred --daemon <start|stop> historyserver

Check the HA status:

hdfs haadmin -getAllServiceState

yarn rmadmin -getAllServiceState

WordCount

cd $HADOOP_HOME
mkdir input
echo -e "i keep saying no\nthis can not be the way it was supposed to be\ni keep saying no\nthere has gotta be a way to get you close to me" > input/word.txt
hadoop fs -mkdir /input
hadoop fs -put input/word.txt /input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output


2.4 HDFS

HDFS Architecture

NameNode
  • Manages the HDFS namespace
  • Configures the replica policy
  • Manages data block mapping information
  • Handles client read/write requests
DataNode
  • Stores the actual data blocks
  • Performs read/write operations on data blocks
SecondaryNameNode
  • Assists the NameNode and shares its workload, for example by periodically merging the Fsimage and Edits and pushing the result to the NameNode
  • In an emergency, it can help recover the NameNode
Client
  • File splitting: when uploading a file to HDFS, the Client splits it into blocks and then uploads them
  • Interacts with NameNode to obtain file location information
  • Interacts with Datanodes to read or write data
  • The Client provides several commands to manage HDFS, such as NameNode formatting
  • The Client can use some commands to access the HDFS, such as adding, deleting, modifying, and querying HDFS

Common HDFS Commands

Documents: FileSystemShell

#Move (cut) and upload
hadoop fs -moveFromLocal <local_file> <hdfs_dir>

#Copy and upload
hadoop fs -copyFromLocal <local_file> <hdfs_dir>
hadoop fs -put <local_file> <hdfs_dir>

#Append to an HDFS file
hadoop fs -appendToFile <local_file> <hdfs_file>

#Download
hadoop fs -copyToLocal <hdfs_file> <local_dir>
hadoop fs -get <hdfs_file> <local_dir>

#Set the replication factor
hadoop fs -setrep <replication> <hdfs_file>

HDFS Java API

Install Hadoop 3.3.0 on Windows 10 Step by Step Guide

Download: github.com/kontext-tec…

pom.xml:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.2.2</version>
</dependency>

Check file status:

public static void main(String[] args) throws IOException, InterruptedException {
    FileSystem fs = FileSystem.get(URI.create("hdfs://node101:8020"), new Configuration(), "root");
    FileStatus status = fs.getFileStatus(new Path("/input/word.txt"));
    System.out.println(status);
    fs.close();
}
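
A similar sketch for uploading a local file and reading it back (hypothetical local path; same hadoop-client dependency and root user as above):

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadWrite {
    public static void main(String[] args) throws IOException, InterruptedException {
        FileSystem fs = FileSystem.get(URI.create("hdfs://node101:8020"), new Configuration(), "root");

        // Upload a local file into /input
        fs.copyFromLocalFile(new Path("word.txt"), new Path("/input/word.txt"));

        // Read it back and print the contents to stdout
        try (FSDataInputStream in = fs.open(new Path("/input/word.txt"))) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}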


2.5 MapReduce

MapReduce processes

A complete MapReduce program runs in distributed mode with three types of instance processes:

  • ApplicationMaster: Responsible for process scheduling and state coordination of the entire program
  • MapTask: Responsible for the whole data processing process of the Map phase
  • ReduceTask: Responsible for the entire data processing process in the Reduce phase

MapTask working mechanism

  1. Read phase

    The RecordReader obtained by MapTask in InputFormat is used to parse KV from InputSplit.

  2. The Map phase

    In this stage, the resolved KV is handed over to the map() function written by the user for processing, and a series of new KV are generated.

  3. Collect phase

    In the user-written map() function, outputCollector.collect() is typically called to output the results when the data processing is complete.

    Inside collect(), the generated KV pairs are partitioned and written to a ring buffer (a minimal Partitioner sketch follows this list).

  4. Spill phase

    When the ring cache is full, MapReduce writes data to the local disk to generate a temporary file.

    Note that before data is written to the local disk, the data must be sorted locally and merged or compressed if necessary.
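
The partitioning done in the Collect phase is pluggable. Below is a minimal Java sketch that mirrors the default HashPartitioner behaviour (the class name is made up for illustration; registering it is optional, since this is already the default):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Records with the same key always get the same partition number,
// so they end up in the same ReduceTask.
public class WordPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// In the driver:
// job.setPartitionerClass(WordPartitioner.class);
// job.setNumReduceTasks(3);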

ReduceTask Working mechanism

  1. Copy stage

    ReduceTask Remotely copies a piece of data from each MapTask. If the size exceeds a certain threshold, the data is written to the disk; otherwise, the data is directly stored in the memory.

  2. Sort stage

    When remotely copying data, ReduceTask starts two background threads to merge files on the memory and disk to prevent excessive memory usage or excessive files on the disk.

    According to the MapReduce semantics, the input data for the user-written Reduce () function is a set of data aggregated by Key.

    To bring together records that share the same key, Hadoop uses a sort-based strategy.

    Since each MapTask has implemented local sorting of its own processing results, the ReduceTask only needs to perform a merge sort for all data.

  3. Reduce phase

    The reduce() function writes the calculated results to HDFS.

WordCount

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        Job job = Job.getInstance();

        job.setJarByClass(WordCount.class);

        job.setMapperClass(WordCount.WordCountMapper.class);
        job.setReducerClass(WordCount.WordCountReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean successful = job.waitForCompletion(true);
        System.exit(successful ? 0 : 1);
    }

    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(" ");
            for (String word : words) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }

    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable value : values) {
                count += value.get();
            }
            context.write(key, new IntWritable(count));
        }
    }
}

Package upload execution:

#Delete the /output directory
hadoop fs -rmdir --ignore-fail-on-non-empty /output
#Execute
hadoop jar word-count-0.0.1.jar xyz.icefery.mr.wc.WordCount /input/word.txt /output


2.6 YARN

YARN Architecture

ResourceManager
  • Handles client requests
  • Monitors NodeManagers
  • Starts and monitors ApplicationMasters
  • Allocates and schedules resources
NodeManager
  • Manages resources on a single node
  • Processes commands from the ResourceManager
  • Processes commands from the ApplicationMaster
ApplicationMaster
  • Requests resources for the application and assigns them to internal tasks
  • Monitors tasks and handles fault tolerance
Container
  • Container is a resource abstraction in YARN. It encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network resources

YARN working mechanism

  1. The MR program is submitted to the node where the client resides
  2. The YarnRunner applies for an Application from ResourceManager
  3. ResourceManager returns the resource path of the application to YarnRunner
  4. The program submits the required resources to the HDFS
  5. After application resources are submitted, apply to run ApplicationMaster
  6. ResourceManager initializes user requests into a Task
  7. One NodeManager receives the Task
  8. The NodeManager creates the Container and generates the ApplicationMaster
  9. The Container copies resources from HDFS to the local node
  10. ApplicationMaster applies to ResourceManager for resources to run MapTasks
  11. ResourceManager assigns the MapTasks to two other NodeManagers, which each receive the task and create a Container
  12. The MR program sends startup scripts to the two NodeManagers that received the tasks, and each of them starts a MapTask, which partitions the data
  13. ApplicationMaster waits for all MapTasks to finish, then applies to ResourceManager for containers and runs the ReduceTasks
  14. Each ReduceTask obtains the data of its partition from the MapTasks
  15. After the program finishes, the MR program asks ResourceManager to deregister itself
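
The client-to-ResourceManager interaction above goes through the YARN client API. As a minimal sketch (assuming hadoop-yarn-client is on the classpath and yarn-site.xml is readable from it), the applications currently tracked by the ResourceManager can be listed like this:

import java.util.List;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class YarnApps {
    public static void main(String[] args) throws Exception {
        // Picks up yarn-site.xml from the classpath (e.g. $HADOOP_HOME/etc/hadoop)
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Ask the active ResourceManager for all applications it knows about
        List<ApplicationReport> apps = yarnClient.getApplications();
        for (ApplicationReport app : apps) {
            System.out.println(app.getApplicationId() + "\t" + app.getName() + "\t" + app.getYarnApplicationState());
        }

        yarnClient.stop();
    }
}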

III. Hive

3.1 Installing hive-3.1.2

Reference: Hive startup error: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument

curl https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz -o apache-hive-3.1.2-bin.tar.gz
mkdir /opt/hive
tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/hive

Environment variables:

# ...
export HIVE_HOME=/opt/hive/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin

Jar package conflict:

rm -rf $HIVE_HOME/lib/guava-19.0.jar
cp $HADOOP_HOME/share/hadoop/common/lib/guava-27.0-jre.jar $HIVE_HOME/lib


3.2 QuickStart

MySQL

Start MySQL on the host and create the Hive metadata database:

create database hive;

Add a driver to the Hive lib directory:

curl https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.24/mysql-connector-java-8.0.24.jar -o $HIVE_HOME/lib/mysql-connector-java-8.0.24.jar

Configuration files

Reference:

  • Hive: HiveServer2/Beeline startup error "User: root is not allowed to impersonate root"
  • Apache Hadoop 3.2.2 – Proxy user-superusers Acting On Behalf Of Other Users
core-site.xml
vim $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
    <!-- ... -->
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>
hive-site.xml
vim $HIVE_HOME/conf/hive-site.xml
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://win10:3306/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node101:9083</value>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node101</value>
    </property>
</configuration>

Initialize the metadata database

schematool -initSchema -dbType mysql

Start the MetaStore

nohup hive --service metastore 1>/dev/null 2>&1 &
#Stop it later with jps and kill -9

Hive Shell

create database test;
use test;
create table student (
    name      string,
    gender    tinyint,
    deskmates array<string>,
    score     struct<chinese:int, math:int, english:int>,
    refs      map<string, string>
) row format delimited fields terminated by '|' collection items terminated by ',' map keys terminated by ':' lines terminated by '\n';

The insert statement:

insert into student values ('leader', 1, array('Vivian jade', 'fang'), named_struct('chinese', 90, 'math', 90, 'english', 90), map('height', '165', 'weight', '55', 'eyesight', '0.2'));

File import:

vim ~/student.txt
fang|2|hierarch|120,110,120|height:165,weight:50,eyesight:1.0
Vivian jade|2|hierarch|110,120,120|height:160,weight:45,eyesight:0.2
load data local inpath '/root/student.txt' into table student;
select * from student;
dfs -ls /user/hive/warehouse;

JDBC access to Hive

#Start the hiveserver2
nohup hive --service hiveserver2 1>/dev/null 2>&1 &

The WEB interface is http://node101:10002

Connect using the Beeline client:

beeline -u jdbc:hive2://node101:10000 -n root

DataGrip also supports Hive connections.
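
The same endpoint can also be reached from plain JDBC. A minimal Java sketch (assuming the org.apache.hive:hive-jdbc dependency and the test.student table created above):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // Not strictly required with JDBC 4 drivers, but makes the driver dependency explicit
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://node101:10000/test", "root", "");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select name, score.math from student")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
            }
        }
    }
}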

View logs:

tail -n 300 /tmp/root/hive.log


IV. HBase

4.1 Installing hbase-2.3.5

curl https://mirrors.tuna.tsinghua.edu.cn/apache/hbase/2.3.5/hbase-2.3.5-bin.tar.gz -o hbase-2.3.5-bin.tar.gz
mkdir -p /opt/hbase
tar -zxvf hbase-2.3.5-bin.tar.gz -C /opt/hbase

Environment variables:

# ...
export HBASE_HOME=/opt/hbase/hbase-2.3.5
export PATH=$PATH:$HBASE_HOME/bin


4.2 HBase cluster

Configuration files

HDFS configuration file soft links
ln -s $HADOOP_HOME/etc/hadoop/core-site.xml $HBASE_HOME/conf/core-site.xml
ln -s $HADOOP_HOME/etc/hadoop/hdfs-site.xml $HBASE_HOME/conf/hdfs-site.xml
hbase-env.sh
# ...
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# ...
export HBASE_MANAGES_ZK=false
# ...
export HBASE_DISABLE_HADOOP_CLASSPATH_LOOKUP=true
# ...
regionservers
node101
node102
node103
backup-masters
node102
node103
hbase-site.xml
<configuration>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://hdfs-cluster/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>win10:2181,win10:2182,win10:2183</value>
    </property>
</configuration>

Start the cluster

# node101
start-hbase.sh

The web UIs are at http://node101:16010 (HMaster) and http://node101:16030 (RegionServer)

HBase Shell

#Create a table
create 'student', 'info', 'score'

#List the table
list

#Viewing table structure
describe 'student'
#Insert data
put 'student', '1', 'info:name', 'icefery'
put 'student', '1', 'info:gender', '1'
put 'student', '1', 'score:math', '120'
put 'student', '2', 'info:name', 'fang'
put 'student', '2', 'info:gender', '2'
put 'student', '2', 'score:math', '110'
put 'student', '3', 'info:name', 'wenyu'
put 'student', '3', 'info:gender', '2'
put 'student', '3', 'score:math', '120'
#Scan for table data
scan 'student'

#Count table rows
count 'student'

#View specified rows
get 'student', '1'
get 'student', '1', 'info:name'
#Delete the specified row
delete 'student', '1', 'info:name'
deleteall 'student', '1'

#Disable the table
disable 'student'
#Delete table
drop 'student'
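
The same operations are available from the HBase Java client. A minimal sketch (assuming the org.apache.hbase:hbase-client dependency; the quorum mirrors hbase-site.xml above, and the row key and value are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "win10:2181,win10:2182,win10:2183");

        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("student"))) {

            // Equivalent of: put 'student', '4', 'info:name', 'someone'
            Put put = new Put(Bytes.toBytes("4"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("someone"));
            table.put(put);

            // Equivalent of: get 'student', '2', 'info:name'
            Result result = table.get(new Get(Bytes.toBytes("2")));
            System.out.println(Bytes.toString(result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}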