This technical column is the author's (Qin Kaixin) summary and distillation of everyday work. It shares cases extracted from real business environments, together with application tuning suggestions, cluster capacity planning, and related content. Please keep following this blog series. QQ email: [email protected]; feel free to contact me for academic exchange.

0 Initial operation and maintenance practice

1: Check the listening ports:

[root@Master conf]# netstat -an | grep 8050
tcp6       0      0 10.44.219.80:8050       :::*        LISTEN
[root@Master conf]# netstat -apn | grep 8088
tcp6       0      0 :::8088                 :::*        LISTEN      2468/java

2: Replace localhost with qinkaixin:

Master=qinkaixin
sed "s/localhost/"$Master"/g" aaa.xml > bbb.xml

kubectl commands:

1: Enter a Pod: kubectl exec -it web-67c6b4476c-hds7q /bin/bash
2: kubectl version
3: kubectl get nodes
4: Run an image: kubectl run sonarqube --image=192.168.32.131:5000/sonarqube:5.6.5 --replicas=1 --port=9000
5: List deployments: kubectl get deployment
6: kubectl get pods -o wide
7: Follow pod logs: kubectl logs sonarqube-7c45b4d4bb-b77q6 -f
9: kubectl exec web-67c6b4476c-hds7q hostname
10: kubectl exec -it web-67c6b4476c-hds7q /bin/bash
11: View overall cluster information: kubectl cluster-info
    Kubernetes master is running at https://rancher.k8s.cn/k8s/clusters/c-cc2mt
    KubeDNS is running at https://rancher.k8s.cn/k8s/clusters/c-cc2mt/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
12: Filter services by name: kubectl get service | grep nginx
13: kubectl apply changes configuration from a file or standard input: kubectl apply -f nginx/nginx.yaml
14: kubectl scale scales a deployment, e.g. from 3 to 6 replicas: kubectl scale --current-replicas=3 --replicas=6 deployment/nginx-deployment

1 Kylin configuration (start StandAlone mode)

1 kylin.properties

kylin.env.hadoop-conf-dir=/usr/local/soft/install/hadoop-2.7.6/etc/hadoop
kylin.server.query-metrics2-enabled=true
kylin.metrics.reporter-query-enabled=true
kylin.metrics.reporter-job-enabled=true
kylin.metrics.monitor-enabled=true
kylin.web.dashboard-enabled=true
##kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.master=spark://Master:7077
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=false
kylin.engine.spark-conf.spark.driver.memory=2G
kylin.engine.spark-conf.spark.executor.memory=6G
kylin.engine.spark-conf.spark.executor.cores=6
kylin.engine.spark-conf.spark.network.timeout=600
kylin.engine.spark-conf.spark.shuffle.service.enabled=false
kylin.engine.spark.rdd-partition-cut-mb=10
kylin.engine.spark-conf.spark.yarn.archive=hdfs://Master:9000/kylin/spark/spark-libs.jar

2: Create the sample cube

./sample.sh       # create the Kylin sample cube
./kylin.sh start  # start Kylin

3. Kylin intermediate data cleaning:

./kylin.sh org.apache.kylin.storage.hbase.util.StorageCleanupJob --delete true

This technical column is the author's (Qin Kaixin) summary and distillation of everyday work. It shares cases extracted from real business environments, together with application tuning suggestions, cluster capacity planning, and related content. Please keep following this blog series. I look forward to joining the most combative team of the IoT era. QQ email: [email protected]; feel free to contact me for academic exchange.

2 Spark configuration

1 spark-defaults.conf

spark.eventLog.enabled true
spark.eventLog.dir hdfs://Master:9000/spark-events
spark.eventLog.compress true
spark.yarn.jars=hdfs://Master:9000/sparkJars/jars/*

2 spark-env.sh

export JAVA_HOME=/usr/local/soft/install/jdk1.8.0_171
export SPARK_MASTER_IP=Master
export SPARK_WORKER_CORES=8
export SPARK_WORKER_MEMORY=7g
export SPARK_EXECUTOR_MEMORY=6g
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_WEBUI_PORT=8080
export HADOOP_CONF_DIR=/usr/local/soft/install/hadoop-2.7.6/etc/hadoop
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=50 -Dspark.history.fs.logDirectory=hdfs://Master:9000/spark-events"

3 Soft link hive-site.xml
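A minimal sketch of this step, assuming the HIVE_HOME and SPARK_HOME paths from the environment-variable section at the end of this post; adjust to your own layout:

ln -s $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/hive-site.xml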

4 Copy the Mysql JAR package to jars
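A hedged sketch of the copy, assuming a locally downloaded MySQL connector jar (the exact jar name depends on the driver version you use):

cp mysql-connector-java-*.jar $SPARK_HOME/jars/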

5 Save the Spark JAR package in the specified directory for Kylin to use

jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .
hadoop fs -mkdir -p /kylin/spark/
hadoop fs -put spark-libs.jar /kylin/spark/

5: Copy the Spark JAR package to /sparkJars/jars/. Yarn needs to use the jar

hadoop fs -mkdir -p /sparkJars/jars/
hadoop fs -put jars/* /sparkJars/jars/

6 Spark join

Hive dimension table:

create external table customer(
    id string, base_create_time string, base_last_update_time string,
    customer_code string, customized_domain string, b_full_name string,
    b_title string, b_two_domain string, user_id string, user_name string,
    company_id string
) partitioned by (env string)
row format delimited fields terminated by '|';

spark.sql("use accesslog")
spark.sql("load data inpath '/DB_warehouse/customer/' overwrite into table customer partition (env='dev')")

val left = spark.sql("select host, count(*) as total from accesslog a group by host order by total desc limit 5")
case class LeftcaseFrame(host: String, total: Long)
val leftDs = left.as[LeftcaseFrame]
val joinleft = leftDs.map(a => LeftcaseFrame(a.host.substring(1), a.total))
val joinright = spark.sql("select distinct * from elev")

joinleft.join(joinright, joinleft("host") === joinright("sub_domain"), "outer").show
joinleft.join(joinright, joinleft("host") === joinright("sub_domain"), "left_outer").show

MySQL aggregation analysis:

val jdbc = spark.read.format("jdbc").option("url", "jdbc:mysql://192.168.1.160:3306/test").option("dbtable", "tb_customer").option("user", "root").option("password", "123").load()

7: Start the Spark shell

spark-shell --master spark://bd-master:7077 --total-executor-cores 40 --executor-memory 4096m --executor-cores 4
spark-shell --master yarn --executor-memory 4096m --num-executors 10 --executor-cores 4

8 spark-submit, standalone cluster, client mode:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://master:7077 --executor-memory 512m --num-executors 3 ./examples/jars/spark-examples_2.11-2.0.0.jar 100

9. Local single-machine mode:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[1] ./examples/jars/spark-examples_2.11-2.0.0.jar 10

10: Packaging and submitting your own application

1: class
2: running mode
3: JAR package location
4: input parameter 1
5: input parameter 2
6: running memory
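As a sketch of how these pieces line up on the command line, assuming a hypothetical user jar my-app.jar with main class com.example.MyApp that takes two input arguments (the names and paths are illustrative, not from this post):

# 1: class, 2: running mode, 6: running memory,
# 3: JAR package location, 4/5: input parameters
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --executor-memory 512m \
  ./my-app.jar \
  /input/path /output/path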

10: spark-submit debugging. Here we run Spark's built-in Streaming example; the submission method is as follows:

./bin/spark-submit --class  org.apache.spark.examples.streaming.NetworkWordCount  --master spark://master:7077 --executor-memory 512m --num-executors 3  ./examples/jars/spark-examples_2.11-2.0.0.jar localhost 9999

3 hive

Home: /usr/local/soft/cdh_support/hive-1.1.0-cdh5.9.3/conf

1 Configure hive-site.xml

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://Master:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

2: Define the log path

mkdir -p /log_all/hive
chown admin:admin /log_all/hive

3: Initialize the database

schematool -dbType mysql -initSchema    # initialize the Hive schema in MySQL

4: Copy the mysql driver

Copy the MySQL driver JAR to hive/lib
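A hedged sketch, assuming a locally downloaded MySQL connector jar and the HIVE_HOME from the environment-variable section (the jar name depends on your driver version):

cp mysql-connector-java-*.jar $HIVE_HOME/lib/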

5 Start

Home: /usr/local/soft/cdh_support/hbase-1.2.0-cdh5.9.3

hive --service metastore &        # required for Presto
$HIVE_HOME/bin/hiveserver2 &      # required for Zeppelin

6: Configure transactions

<property>
    <name>hive.support.concurrency</name>
    <value>true</value>
</property>
<property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
</property>
<property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
</property>
<property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
</property>

7: Set up transaction tests

use test;
create table t1(id int, name string) 
clustered by (id) into 8 buckets 
stored as orc TBLPROPERTIES ('transactional'='true');

insert into t1 values (1,'aaa');
insert into t1 values (2,'bbb');
update t1 set name='ccc' where id=1;
delete from t1 where id=2; 

8 hive-env.sh

export JAVA_HOME=/usr/local/soft/install/jdk1.8.0_171
export HADOOP_HOME=/usr/local/soft/cdh_support/hadoop-2.6.0-cdh5.9.3
export HIVE_CONF_DIR=/usr/local/soft/cdh_support/hive-1.1.0-cdh5.9.3/conf

9 Process case test

    #local to hive
hive -e "USE bd_device_health; LOAD DATA LOCAL INPATH '/root/opendir/cdDataTemp/"HtData`date +%Y%m%d`"/cdonline' OVERWRITE INTO TABLE cdonline_tds;" >>/root/opendir/test-data/LocalToHive.log

4 hbase

1: Deploy ZooKeeper and set the path

mkdir -p /zookeeper/dataDir
chown admin:admin /zookeeper/dataDir

2. Configure hbase-site.xml.

    <property>
		<name>hbase.rootdir</name>
		<value>hdfs://Master:9000/hbase</value>
    </property>
    <property>
		<name>hbase.cluster.distributed</name>
		<value>true</value>
    </property>
    <property>
		<name>hbase.master</name>
		<value>60000</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
		<name>hbase.zookeeper.quorum</name>
		<value>Master,Worker1,Worker2</value>
    </property>
    <property>
		<name>hbase.zookeeper.property.dataDir</name>
		<value>/zookeeper/dataDir</value>
    </property>
    <property>
		 <name>hbase.rpc.controllerfactory.class</name>
		 <value>org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory</value>
     </property>

3 Create soft links to core-site.xml and hdfs-site.xml

ln -s /usr/local/soft/cdh_support/hadoop-2.6.0-cdh5.9.3/etc/hadoop/core-site.xml
ln -s /usr/local/soft/cdh_support/hadoop-2.6.0-cdh5.9.3/etc/hadoop/hdfs-site.xml

4: transfer

scp -r hbase-1.2.0-cdh5.9.3 Worker2:/usr/local/soft/cdh_support/

5 Access the HBase web UI

http://master:60010/master-status (old versions)
http://master:16010/master-status (new versions)

6 Configure hbase-env.sh
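A minimal hbase-env.sh sketch under this setup's assumptions: the JAVA_HOME below is the JDK used elsewhere in this post, and HBASE_MANAGES_ZK=false assumes ZooKeeper is started separately (as the start commands in step 8 suggest); verify against your own deployment:

export JAVA_HOME=/usr/local/soft/install/jdk1.8.0_171
# let the externally managed ZooKeeper quorum handle coordination
export HBASE_MANAGES_ZK=false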

7 Configure regionservers

Master
Worker1
Worker2

8 Start the Hbase service

zkServer.sh start                        # start ZooKeeper first
$bin/hbase-daemon.sh start master
$bin/hbase-daemon.sh start regionserver

Or:

$bin/start-hbase.sh

Corresponding stop command:

$bin/stop-hbase.sh

./hbase-daemon.sh start thrift           # start the Thrift service

5 hadoop

1 home

/usr/local/soft/cdh_support/hadoop-2.6.0-cdh5.9.3
cd /usr/local/soft/cdh_support

2 Log location:

mkdir -p /log_all/hadoop

3 metaData location:

mkdir -p /hadoop/datanode
mkdir -p /hadoop/namenode
chown admin:admin /log_all/hadoop
chown admin:admin /hadoop/datanode
chown admin:admin /hadoop/namenode

4 core-site.xml:

    	<property>
    		<name>hadoop.tmp.dir</name>
    		<value>/log_all/hadoop</value>
    	</property>
    	<property>
    		<name>fs.defaultFS</name>
    		<value>hdfs://Master:9000</value>
    	</property>	

5 hdfs-site.xml:

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
<property>
    <name>dfs.datanode.max.xcievers</name>
    <value>8192</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/datanode</value>
</property>
<property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
</property>
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/namenode</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>

6 yarn-site.xml:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8025</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8050</value>
</property>
<property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>40960</value>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>4096</value>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>10</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
</property>
<property>
    <name>hadoop.proxyuser.admin.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.admin.groups</name>
    <value>*</value>
</property>

Distribute to the workers:

scp -r hadoop-2.6.0-cdh5.9.3 Worker2:/usr/local/soft/cdh_support
scp -r hadoop-2.6.0-cdh5.9.3 Worker1:/usr/local/soft/cdh_support

7 hadoop-env.sh:
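The post leaves this file's content out; a minimal sketch, assuming only JAVA_HOME needs to be set explicitly (the path is the JDK used elsewhere in this post):

export JAVA_HOME=/usr/local/soft/install/jdk1.8.0_171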

8 mapred-site.xml

    <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
     </property>
     <property>
            <name>mapred.job.tracker</name>
            <value>master:54311</value>
     </property>

9 start

hadoop namenode -format
mr-jobhistory-daemon.sh start historyserver

Web UIs: 8088 (YARN), 50070 (HDFS)

Hadoop balancing and safe mode:

hadoop dfsadmin -safemode leave
hdfs dfsadmin -refreshNodes
start-balancer.sh

6 telnet

Telnet installation:

yum install telnet-server.x86_64 -y
rpm -qa | grep telnet
yum -y install telnet
rpm -qa | grep telnet

This technical column is the author's (Qin Kaixin) summary and distillation of everyday work. It shares cases extracted from real business environments, together with application tuning suggestions, cluster capacity planning, and related content. Please keep following this blog series. I look forward to joining the most combative team of the IoT era. QQ email: [email protected]; feel free to contact me for academic exchange.

7 Flume

1: Socket simulation experiment:

a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

$ bin/flume-ng agent --conf conf/ --name a1 --conf-file conf/flume-telnet.conf 
-Dflume.root.logger=INFO,console

telnet localhost 44444

2: Monitor a log file and upload it dynamically to HDFS:

Copy hadoop-related JARS to Flume lib

share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar
share/hadoop/common/lib/commons-configuration-1.6.jar
share/hadoop/mapreduce1/lib/hadoop-hdfs-2.5.0-cdh5.3.6.jar
share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar
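A hedged sketch of the copy itself, assuming $HADOOP_HOME and $FLUME_HOME point at your Hadoop and Flume installations (FLUME_HOME is not defined in this post, so substitute your actual path):

cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar \
   $HADOOP_HOME/share/hadoop/common/lib/commons-configuration-1.6.jar \
   $HADOOP_HOME/share/hadoop/mapreduce1/lib/hadoop-hdfs-2.5.0-cdh5.3.6.jar \
   $HADOOP_HOME/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar \
   $FLUME_HOME/lib/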

Create the flume-hdfs.conf file

a2.sources = r2
a2.sinks = k2
a2.channels = c2

a2.sources.r2.type = exec
a2.sources.r2.command = tail -f /usr/local/soft/log_all/hive/hive.log
a2.sources.r2.shell = /bin/bash -c

a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://Master:9000/flume/%Y%m%d/%H
# prefix of the uploaded files
a2.sinks.k2.hdfs.filePrefix = events-hive-
# roll folders by time
a2.sinks.k2.hdfs.round = true
# how many time units before creating a new folder
a2.sinks.k2.hdfs.roundValue = 1
# time unit
a2.sinks.k2.hdfs.roundUnit = hour
# use the local timestamp
a2.sinks.k2.hdfs.useLocalTimeStamp = true
# number of events accumulated before flushing to HDFS
a2.sinks.k2.hdfs.batchSize = 100
# set the file type; compression can be supported
a2.sinks.k2.hdfs.fileType = DataStream
# how long before rolling a new file
a2.sinks.k2.hdfs.rollInterval = 30
# rolling size of each file
a2.sinks.k2.hdfs.rollSize = 134217700
# rolling is independent of the number of events
a2.sinks.k2.hdfs.rollCount = 0
# minimum block replicas
a2.sinks.k2.hdfs.minBlockReplicas = 1

a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 1000

a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

Start the agent with this configuration:

$ bin/flume-ng agent --conf conf/ --name a2 --conf-file conf/flume-hdfs.conf

3.1 Avro server configuration (flume-tailsource-avro-server):

a1.sources = r1
a1.sinks = k1
a1.channels = c1

#source
a1.sources.r1.type = avro
a1.sources.r1.bind = Master
a1.sources.r1.port = 55555

#sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = TestTopic
a1.sinks.k1.kafka.bootstrap.servers = Master:9092,Worker1:9092,Worker2:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1

#channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#bind
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3.2 Avro client configuration (flume-tailsource-avro-client):

a2.sources = r2
a2.sinks = k2
a2.channels = c2

a2.sources.r2.type = exec
a2.sources.r2.command = tail -f /usr/local/openresty/nginx/logs/access.log
a2.sources.r2.shell = /bin/bash -c

a2.sinks.k2.type = avro
a2.sinks.k2.hostname = Master
a2.sinks.k2.port = 55555

a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 1000

a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

# alternative logger sink configuration
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

3.3 Avro Client Configuration (Flume-Avro-client) :

a2.sources = r2
a2.sinks = k2
a2.channels = c2

#source
a2.sources.r2.type = TAILDIR
a2.sources.r2.positionFile = /usr/local/soft/log_all/WAF_log/taildir_position.json
a2.sources.r2.filegroups=f1
a2.sources.r2.filegroups.f1=/usr/local/openresty/nginx/logs/access.log
a2.sources.r2.fileHeader=true

#sinks
a2.sinks.k2.type = avro
a2.sinks.k2.hostname=Master
a2.sinks.k2.port=55555

#channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 1000

#bind
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

3.4 test

Simulated client:

bin/flume-ng avro-client -c conf -H Master -p 55555 -F /usr/local/apache-flume-1.6.0-cdh5.5.4-bin/testdata/testdata

Real client startup:

bin/flume-ng agent --conf conf --name a2 --conf-file conf/flume-avro.conf -Dflume.root.logger=INFO,console

Real server startup:

bin/flume-ng agent -c conf -f conf/flume-avro-hdfs.conf -n a2 -Dflume.root.logger=INFO,console

8 sqoop

sqoop-version

Import user information into the default warehouse directory:

bin/sqoop import --connect jdbc:mysql://192.168.1.160:3306/test --username root --password 123456 --table tb_customer --m 2

Import user information to a specified directory:

bin/sqoop import --connect jdbc:mysql://192.168.1.160:3306/test --username root --password 123456 --target-dir /DB_warehouse/elev --table tb_customer --m 2

Import elevator user information to the specified directory according to the conditions:

bin/sqoop import --connect jdbc:mysql://192.168.1.160:3306/test --username root --password inovance321 --where "back_color!='null'" --target-dir /DB_warehouse/elev/test --table tb_customer --m 1

Import elevator user information to the specified directory with a free-form query:

bin/sqoop import --connect jdbc:mysql://192.168.1.160:3306/test --username root --password inovance321 --target-dir /DB_warehouse/elev/test9 --query 'select id, account from tb_customer WHERE address != " " and $CONDITIONS' --split-by id --fields-terminated-by '\t' --m 1

From the above, $CONDITIONS looks like a Linux variable that gets replaced (for example with (1=0)) during execution, which makes the SQL that actually runs look strange. To find out exactly what $CONDITIONS is, let's start with the official documentation.

If you want to import the results of a query in parallel, then each map task will need to execute a copy 
of the query, with results partitioned by bounding conditions inferred by Sqoop. Your query must include 
the token $CONDITIONS which each Sqoop process will replace with a unique condition expression. 
You must also select a splitting column with --split-by.

Translation: if you want to import the results of a query in parallel, each map task needs to execute a copy of the SQL query, with the results partitioned by bounding conditions inferred by Sqoop. The query must contain the token $CONDITIONS, which each Sqoop process replaces with its own unique condition expression. You must also specify the splitting column with --split-by.

For example:
$ sqoop import \
  --query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
  --split-by a.id --target-dir /user/foo/joinresults

9 ZooKeeper setup:

1: In the ZooKeeper root directory, configure zoo.cfg

initLimit: A ZooKeeper cluster contains multiple ZooKeeper processes; one of them is the leader and the rest are followers. When a follower first connects to the leader, a considerable amount of data is transferred between them, especially when the follower's data lags far behind the leader's. initLimit configures the maximum time allowed for this synchronization after the connection is established.
syncLimit: Sets the maximum duration for sending messages, requests, and replies between a follower and the leader.
tickTime: The basic unit for the two timeouts above. For example, if initLimit is 5 and tickTime is 2000 ms, the timeout is 2000 ms x 5 = 10 seconds.
server.id=host:port1:port2: id is a number identifying the ZK process and is also the content of the myid file under the dataDir directory; host is the IP address of the ZK process; port1 is the port followers use to exchange messages with the leader; port2 is the port used for leader election.
dataDir: Has the same meaning as in standalone mode, except that in cluster mode it also holds a myid file. The myid file contains a single line with a number between 1 and 255, which is the id from server.id above.

# heartbeat interval, in milliseconds
tickTime=2000
# data directory
dataDir=/modules/zookeeper-3.4.5-cdh5.11.1/data
# log directory
dataLogDir=/modules/zookeeper-3.4.5-cdh5.11.1/dataLog
# client port
clientPort=2181
# maximum number of heartbeat intervals the leader and followers can tolerate during the initial connection
initLimit=5
# maximum number of tickTimes for message exchange between the leader and followers
syncLimit=2
# list of ZooKeeper machines; the server.N number increases with each machine in the cluster,
# and server1, server2, server3 are the machine addresses
server.1=server1:2888:3888
server.2=server2:2888:3888
server.3=server3:2888:3888

2: In the dataDir directory on each node, create the myid file in the correct order

Then set the number in the myid file under the data directory on each node: change server2's myid to 2 and server3's myid to 3. For other clusters, adjust as needed so the numbers are consistent with the order in the configuration file. Start each node with:

./zkServer.sh start
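A minimal sketch of creating those files, assuming the dataDir configured above (run the matching line on each node):

echo 1 > /modules/zookeeper-3.4.5-cdh5.11.1/data/myid    # on server1
echo 2 > /modules/zookeeper-3.4.5-cdh5.11.1/data/myid    # on server2
echo 3 > /modules/zookeeper-3.4.5-cdh5.11.1/data/myid    # on server3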


10 Kafka setup

1: Start Kafka
nohup bin/kafka-server-start.sh config/server.properties &

2: Create a topic
bin/kafka-topics.sh --zookeeper master:2181,data1:2181,data2:2181 --topic TestTopic4 --replication-factor 1 --partitions 1 --create

3: Create a producer
bin/kafka-console-producer.sh --broker-list master:9092,data1:9092,data2:9092 --topic TestTopic4

4: Create a consumer
bin/kafka-console-consumer.sh --zookeeper master:2181,data1:2181,data2:2181 --topic AdRealTimeLog --from-beginning

11 MySQL setup

Installing MySQL on Ubuntu is very easy with just a few commands.

1. sudo apt-get install mysql-server
2. apt-get install mysql-client
3. sudo apt-get install libmysqlclient-dev
4. During installation you will be prompted to set a password; remember it. After installation, check whether it succeeded:
   sudo netstat -tap | grep mysql
5. If a mysql socket is in LISTEN state, the installation succeeded. Log in to the mysql database:
   mysql -u root -p
6. Add a user that allows remote access, or enable remote access for an existing user. Grant root full access to any database from any host (%):
   grant all privileges on *.* to 'root'@'%' identified by 'root' with grant option;
   Or change an existing user:
   update user set host='%' where user='root' and host='localhost';
7. Restart mysql:
   sudo service mysql restart
8. Remove anonymous users:
   delete from user where user='';

12 Phoenix deployment

1: Set environment variables.
2: Go to the Phoenix installation directory, find phoenix-4.8.1-HBase-1.2-server.jar, and copy it to the HBase lib directory of every node in the cluster (including the master node).
3: Restart HBase.
4: Start Phoenix:

sqlline.py Master,Worker1,Worker2:2181

6: Create a table in the HBase shell:

create 'test1','cf1'
put 'test1','rk0001','cf1:NAME','zhang'

7: In the Phoenix shell:

create view "test1"(user_id varchar primary key,"cf1".NAME varchar);
!describe "test1"
select * from "test1"

8: Configure the SQuirreL client.
(1) Download phoenix-4.11.0-HBase-1.2-client.jar from the Phoenix installation directory on the server into the lib folder of the squirrel-sql installation directory on Windows, then start squirrel-sql.bat.
(2) Add a new Driver (Drivers -> New Driver).



Query data:

select * from person

create 'techer11','cf1','cf2'
put 'techer11','rk0001','cf1:NAME','zhang'
put 'techer11','rk0001','cf1:age',12
put 'techer11','rk0001','cf2:num',16
put 'techer11','rk0001','cf2:sex','male'
put 'techer11','rk0002','cf1:NAME','qinkaixin'
put 'techer11','rk0002','cf1:age',12
put 'techer11','rk0002','cf2:num',16

create view "techer11"(user_id varchar primary key, "cf1".NAME varchar, "cf1"."age" varchar, "cf2"."num" varchar, "cf2"."sex" varchar);
select * from "techer11"

13 zeppelin deployment

1. Deployment. Download address: http://zeppelin.apache.org/download.html. The versions I use are Zeppelin 0.7.3, HBase 1.4.4, Phoenix 4.14.0-HBase-1.4, Hive 2.2.0, and Presto. Unzip zeppelin-0.7.3-bin-all.tgz on Linux, then rename zeppelin-site.xml.template and zeppelin-env.sh.template in the conf directory to zeppelin-site.xml and zeppelin-env.sh respectively. zeppelin.server.port defaults to 8080 and is the Zeppelin server port (note: make sure it does not clash with the Zeppelin web application development port). Change the port in zeppelin-site.xml; I configured 8085. zeppelin.server.ssl.port defaults to 8443 and is the Zeppelin server SSL port (used when SSL is enabled). Next, configure zeppelin-env.sh in the conf directory: JAVA_HOME must be set (Zeppelin is developed in Java); configure the corresponding paths for everything else.
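A minimal sketch of the two edits described above; the 8085 port and the JDK path are the values used in this post, so adjust them to your environment:

<!-- conf/zeppelin-site.xml -->
<property>
    <name>zeppelin.server.port</name>
    <value>8085</value>
</property>

# conf/zeppelin-env.sh
export JAVA_HOME=/usr/local/soft/install/jdk1.8.0_171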

Start Zeppelin by executing the following in the bin directory; you can also stop and restart it the same way:

$ zeppelin-daemon.sh start
$ zeppelin-daemon.sh stop
$ zeppelin-daemon.sh restart

zeppelin.anonymous.allowed in conf/zeppelin-site.xml defaults to true, which allows anonymous users. To require an account and password instead, set it to false, rename zeppelin/conf/shiro.ini.template to zeppelin/conf/shiro.ini, and then modify its content as follows:

Configure the account names and passwords in the [users] section. I configured two accounts: admin with password admin, and qinkaixin with password 123456. role1, role2, and role3 are the roles they play, and their permissions can be set in the web UI. The /** = authc entry on the last line means that any URL access requires authentication.
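A minimal shiro.ini sketch matching the accounts described above (the role names are illustrative):

[users]
admin = admin, role1
qinkaixin = 123456, role2

[urls]
/** = authc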

Enter localhost:8085 in the browser address bar to reach the login page. You can specify the Owners, Readers, and Writers of each notebook, so that every notebook belongs to particular users; this avoids conflicts when several users work in Zeppelin at the same time. I chose to make role1 read-only.

1 HBase. Zeppelin configures an HBase interpreter by default. If the HBASE environment variables are set, the HBase interpreter reads your HBase path automatically. Check the versions of the jars in the interpreter/hbase directory: I use HBase 1.4.4, while Zeppelin ships jars for HBase 1.0 by default. So in zeppelin/interpreter/hbase, delete every jar except zeppelin-hbase-0.7.3.jar, then copy all the jars from the cluster's HBase lib directory into zeppelin/interpreter/hbase.
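A hedged sketch of that jar swap, assuming ZEPPELIN_HOME and HBASE_HOME are set as in the environment-variable section below:

cd $ZEPPELIN_HOME/interpreter/hbase
# keep only the Zeppelin interpreter jar, drop the bundled HBase 1.0 jars
find . -maxdepth 1 -name '*.jar' ! -name 'zeppelin-hbase-0.7.3.jar' -delete
# copy in the cluster's own HBase jars
cp $HBASE_HOME/lib/*.jar .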

Then create a notebook to use the HBase service. If you make hbase the default interpreter, you can write statements without the %hbase prefix. 2 MySQL. Zeppelin has no built-in MySQL interpreter: in Interpreters click Create, enter a name for your interpreter, and select jdbc as its group.

Name                Value
default.driver      com.mysql.jdbc.Driver
default.url         jdbc:mysql://localhost:3306/
default.user        mysql_user
default.password    mysql_password

Add the following dependency:

Artifact                                Excludes
mysql:mysql-connector-java:5.1.46

3 Hive. Like MySQL, Hive needs its own interpreter, also in the jdbc group. The configuration is as follows:

Name                Value
default.driver      org.apache.hive.jdbc.HiveDriver
default.url         jdbc:hive2://localhost:10000
default.user        hive_user
default.password    hive_password

Added dependencies:

Artifact                                Excludes
org.apache.hive:hive-jdbc:1.2.1
org.apache.hadoop:hadoop-common:2.7.6

My Hive is version 2.2.0; a lower-version JDBC driver still works (a version that is too high may not be supported by Zeppelin).

4 Presto. Select the jdbc group; my Presto port is 9001. The configuration is as follows:

Name                Value
default.driver      com.facebook.presto.jdbc.PrestoDriver
default.url         jdbc:presto://localhost:9001/hive/default
default.user        user
default.password    password

Added dependencies:

Artifact                                Excludes
com.facebook.presto:presto-jdbc:0.170
org.apache.hive:hive-jdbc:1.2.1
org.apache.hadoop:hadoop-common:2.7.6

My Presto is 0.206; in my tests the 0.206 JDBC driver did not work, while 0.170 runs perfectly.

5 Phoenix. Select the jdbc group. The configuration is as follows:

Name                Value
default.driver      org.apache.phoenix.jdbc.PhoenixDriver
default.url         jdbc:phoenix:localhost:2181
default.user        user
default.password    password

Added dependency:

Artifact                                Excludes
org.apache.phoenix:phoenix-core:4.14.0-HBase-1.4

14 Superset deployment

Install Python 3 on CentOS:

1. Download Python 3 from https://www.python.org/downloads/
   wget https://www.python.org/ftp/python/3.6.1/Python-3.6.1.tgz
2. Install Python 3. I prefer to put it under /usr/local/python3. Create the directory:
   mkdir -p /usr/local/python3
   Decompress the Python-3.x.x.tgz package (the name depends on the Python version you downloaded; mine is Python-3.6.1.tgz):
   tar -zxvf Python-3.6.1.tgz
3. Enter the decompressed directory, then compile and install:
   cd Python-3.6.1
   ./configure --prefix=/usr/local/python3
   make && make install
4. Create the python3 soft link:
   ln -s /usr/local/python3/bin/python3 /usr/bin/python3
5. Add /usr/local/python3/bin to PATH:
   vim ~/.bash_profile

   # .bash_profile
   # Get the aliases and functions
   if [ -f ~/.bashrc ]; then
       . ~/.bashrc
   fi
   # User specific environment and startup programs
   PATH=$PATH:$HOME/bin:/usr/local/python3/bin
   export PATH

   source ~/.bash_profile
6. Check whether python3 and pip3 work properly:
   python3 -V    # Python 3.6.1
   pip3 -V       # pip 9.0.1 from /usr/local/python3/lib/python3.6/site-packages (python 3.6)
7. If pip3 is missing, create its soft link; from here on use pip3 instead of pip for every command:
   ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3
   pip3 install --upgrade setuptools pip

Install and initialize Superset:

fabmanager create-admin --app superset    # create an admin user
superset db upgrade                       # initialize the database
superset load_examples                    # load some data to play with
superset init                             # create default roles and permissions
superset runserver -d                     # development web server on port 1000; use -p to bind to another port

To modify default configuration information (such as the icon and default language), edit config.py under /usr/local/python3/lib/python3.6/site-packages/superset:

# ------------------------------ GLOBALS FOR APP Builder ------------------------------
# Uncomment to setup your App name
APP_NAME = '**** BI system'
# Uncomment to setup an App icon
APP_ICON = '/static/assets/images/inovance.png'

Add the inovance.png logo image to /usr/local/python3/lib/python3.6/site-packages/superset/static/assets/images/

8. Switch to the MySQL database. Superset uses the lightweight SQLite database by default, which is a performance bottleneck, so replace it with MySQL.
1. Create a superset_db database with utf8 encoding in MySQL:
   CREATE DATABASE superset_db CHARACTER SET utf8 COLLATE utf8_general_ci;
2. The Superset server also needs the MySQL dependencies:
   yum install mysql-community-client
   yum install mysql-devel
   Install the Python MySQL connector:
   pip3 install mysqlclient
3. Modify the configuration file ${superset}/config.py and comment out the previous SQLAlchemy connection string:
   # The SQLAlchemy connection string.
   # SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
   # SQLALCHEMY_DATABASE_URI = 'mysql://myapp@localhost/myapp'
   SQLALCHEMY_DATABASE_URI = 'mysql://root:password@localhost/superset_db'
4. Configure the Redis cache (installing and using Redis is not covered here). Run pip3 install redis in the Superset environment, then modify config.py:
   CACHE_DEFAULT_TIMEOUT = 900
   CACHE_CONFIG = {
       'CACHE_TYPE': 'redis',
       'CACHE_REDIS_HOST': 'localhost',
       'CACHE_REDIS_PORT': 6379,
       'CACHE_REDIS_URL': 'redis://localhost:6379'
   }
   TABLE_NAMES_CACHE_CONFIG = {'CACHE_TYPE': 'redis'}
5. Open config.py, search for the ROW keyword, and change every ROW-related setting to four digits (thousands). Too much data can make Superset stall.
   ROW_LIMIT = 5000
   VIZ_ROW_LIMIT = 1000

10. Initialize Superset again after the preceding operations are complete:

fabmanager create-admin --app superset
superset db upgrade
superset load_examples
superset init
superset runserver -d

11. Containerized run (Dockerfile):

RUN \cp -rf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
    echo 'Asia/Shanghai' > /etc/timezone
RUN mkdir -p /usr/local/python3
# copy Python
ADD Python-3.6.1.tgz /usr/local/python3
WORKDIR /root
# compile and install
RUN cd /usr/local/python3/Python-3.6.1 && \
    ./configure --prefix=/usr/local/python3 && \
    make && make install
# create the python3 soft link
RUN rm -rf /usr/bin/python3 && \
    rm -rf /usr/bin/lsb_release && \
    ln -s /usr/local/python3/bin/python3 /usr/bin/python3
# environment variables
ENV LANG='C.UTF-8' LANGUAGE='zh_CN.UTF-8' LC_ALL='C.UTF-8'
ENV PATH=$PATH:/usr/local/python3/bin
RUN pip3 install --upgrade setuptools pip mysqlclient && \
    pip3 install superset && \
    chmod +x /superset-entrypoint.sh
# copy Superset
ADD superset.tar.gz /usr/local/
# entrypoint
ENTRYPOINT ["/superset-entrypoint.sh"]
EXPOSE 1000

superset-entrypoint.sh:

#!/bin/bash
# copy the customized config
cp -f /usr/local/superset/config.py /usr/local/python3/lib/python3.6/site-packages/superset/
# run superset in the background
superset runserver -d &
tail -f /dev/null

Base image (Dockerfile header):

FROM ubuntu:14.04
# SSH config
RUN apt-get update && apt-get install -y openssh-server wget vim gcc make build-essential libssl-dev libffi-dev python3.5-dev python-pip libsasl2-dev libldap2-dev libsqlite3-dev libmysqlclient-dev && \
    ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys && \
    apt-get clean

15 Environment Variables

# JAVA_HOME
export JAVA_HOME=/usr/local/soft/install/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
export PATH=$PATH:$JAVA_HOME/bin

# HADOOP_HOME
export HADOOP_HOME=/usr/local/soft/cdh_support/hadoop-2.6.0-cdh5.9.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# SPARK_HOME
export SPARK_HOME=/usr/local/soft/install/spark-2.3.0-bin-hadoop2.7
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

# SCALA_HOME
export SCALA_HOME=/usr/local/soft/install/scala-2.11.12
export PATH=$PATH:$SCALA_HOME/bin

# ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/usr/local/soft/install/zookeeper-3.4.9
export PATH=$PATH:$ZOOKEEPER_HOME/bin

# HIVE_HOME
export HIVE_HOME=/usr/local/soft/cdh_support/hive-1.1.0-cdh5.9.3
export HIVE_CONF_DIR=/usr/local/soft/cdh_support/hive-1.1.0-cdh5.9.3/conf
export PATH=$PATH:$HIVE_HOME/bin
export HADOOP_CLASSPATH=$PATH:$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
export hive_dependency=/usr/local/soft/cdh_support/hive-1.1.0-cdh5.9.3/conf:/usr/local/soft/cdh_support/hive-1.1.0-cdh5.9.3/lib/*:/usr/local/soft/cdh_support/hive-1.1.0-cdh5.9.3/hcatalog/share/hcatalog/hive-hcatalog-core-2.2.0.jar

# HBASE_HOME
export HBASE_HOME=/usr/local/soft/cdh_support/hbase-1.2.0-cdh5.9.3
export HBASE_CONF_DIR=/usr/local/soft/cdh_support/hbase-1.2.0-cdh5.9.3/conf
export PATH=$PATH:$HBASE_HOME/bin

# KYLIN
export KYLIN_HOME=/usr/local/soft/cdh_support/apache-kylin-2.5.1-bin-cdh57
export KYLIN_CONF=/usr/local/soft/cdh_support/apache-kylin-2.5.1-bin-cdh57/conf
export PATH=$PATH:$KYLIN_HOME/bin

# PHOENIX
export PHOENIX_HOME=/usr/local/soft/install/apache-phoenix-4.14.0-HBase-1.4-bin
export PHOENIX_CLASSPATH=$PHOENIX_HOME
export PATH=$PATH:$PHOENIX_HOME/bin

# ZEPPELIN_HOME
export ZEPPELIN_HOME=/usr/local/soft/install/zeppelin-0.7.3-bin-all
export PATH=$PATH:$ZEPPELIN_HOME/bin

Conclusion

Qin Kaixin in Shenzhen

This technical column is the author's (Qin Kaixin) summary and distillation of everyday work. It shares cases extracted from real business environments, together with application tuning suggestions, cluster capacity planning, and related content. Please keep following this blog series. I look forward to joining the most combative team of the IoT era. QQ email: [email protected]; feel free to contact me for academic exchange.