0 Background

Apache DolphinScheduler is a task scheduling framework, led by Chinese developers, that supports task types such as Spark, Flink, Shell, and DataX.

Apache DolphinScheduler is a decentralized, easily extensible, visual DAG workflow task scheduling system. It is dedicated to untangling the complex dependencies in data processing so that the scheduling system can be used out of the box. (From the Apache DolphinScheduler project)

Apache DolphinScheduler is reliable, easy to use, versatile, and scalable. It happened that the company was building a new big data cluster, so we decided to try it.

Cloudera Manager (CM) is used for cluster management. Components such as Hadoop and Spark are deployed through CM. Flink was built against CDH and integrated into CM.

1 DolphinScheduler cluster planning

Here we chose Cluster deployment, with three nodes as recommended by official best practices. The node planning is as follows:

node     role
node1    master/api
node2    worker/alert
node3    worker
  • Master: The MasterServer is responsible for DAG splitting, task submission, and task monitoring, and it also monitors the health of the other nodes.
  • Worker: The WorkerServer is responsible for executing tasks and providing logging services.
  • Alert: The AlertServer provides alarm services, including email alarms.
  • API: The ApiServer provides the RESTful APIs and the user interface (UI).

2 Installation Preparations

2.1 Prerequisites

  • Install PostgreSQL (8.2.15+) or MySQL (5.7 series)
  • Install the JDK (1.8+) and configure the environment variables
  • Install ZooKeeper (3.4.6+). DolphinScheduler depends heavily on ZooKeeper, so prepare the ZooKeeper cluster in advance; here it is node2:2181,node3:2181,node4:2181
  • Install Hadoop (2.6+) or MinIO if you need the resource upload feature
  • Configure the hosts file and set the hostname on each machine
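
The minimum versions listed above can be checked with a small helper before going further. This is only a minimal sketch and not part of DolphinScheduler: the `version_ge` function and the `zk_version` value are hypothetical, and the comparison assumes a `sort` that supports `-V` (GNU coreutils does).

```shell
#!/bin/sh
# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical value; replace it with the version your ZooKeeper install reports.
zk_version="3.4.14"

if version_ge "$zk_version" "3.4.6"; then
    echo "ZooKeeper $zk_version meets the 3.4.6+ requirement"
else
    echo "ZooKeeper $zk_version is older than 3.4.6" >&2
fi
```

A version sort is used instead of a plain string comparison because 3.4.14 would otherwise sort before 3.4.6.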

2.2 Download

Download the DolphinScheduler binary installation package and unpack it: dolphinscheduler.apache.org/zh-cn/downl…

mv apache-dolphinscheduler-incubating-1.3.5-dolphinscheduler-bin dolphinscheduler-bin

3 Create a DolphinScheduler user

3.1 Creating a User

Create a new user on every machine you want to deploy to, and give that user passwordless sudo permission.

# Create the user as root. The user name can be changed
useradd dolphinscheduler;

# Set the password. dolphinscheduler123 is only an example; choose your own
echo "dolphinscheduler123" | passwd --stdin dolphinscheduler

3.2 Passwordless sudo

The steps for giving the user passwordless sudo are easy to find online.
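
One common way to do this is a sudoers drop-in file. The sketch below stages such a file in a temporary location and checks its syntax before anything is installed. It is an illustration, not the official procedure: the drop-in name /etc/sudoers.d/dolphinscheduler is an assumption, and the final install step has to be run as root.

```shell
#!/bin/sh
# Stage a sudoers drop-in for the deployment user and validate it first.
DEPLOY_USER="dolphinscheduler"
STAGED="$(mktemp)"

# Allow the user to run any command through sudo without a password.
printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$DEPLOY_USER" > "$STAGED"

# visudo -c -f checks the syntax of the given file without installing it.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$STAGED" || echo "syntax check failed: $STAGED" >&2
fi

# As root, install the staged file with the permissions sudo expects.
# The target file name is an assumption; pick any name under /etc/sudoers.d:
#   install -m 0440 "$STAGED" /etc/sudoers.d/dolphinscheduler
```

Validating first matters because a broken sudoers file can lock every user out of sudo.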

3.3 Setting Directory Permissions

Change the owner of the previously unpacked directory to the deployment user:

sudo chown -R dolphinscheduler:dolphinscheduler dolphinscheduler-bin

4 Initialize the database

4.1 Creating the dolphinscheduler database and granting user privileges

Execute the following SQL in MySQL:

CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler123';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'localhost' IDENTIFIED BY 'dolphinscheduler123';
flush privileges;

Replace the user name and password with your own values.

4.2 Importing the MySQL driver JAR Package

Copy mysql-connector-java-5.1.47.jar (or some other version) to the lib directory.

4.3 Modifying the Database Configuration File

vim conf/datasource.properties

# postgresql
#spring.datasource.driver-class-name=org.postgresql.Driver
#spring.datasource.url=jdbc:postgresql://localhost:5432/dolphinscheduler
# mysql: adjust the following settings to your own MySQL instance
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://XXX:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&allowMultiQueries=true
spring.datasource.username=dolphinscheduler
spring.datasource.password=dolphinscheduler123

4.4 Running the script for creating a table

sh script/create-dolphinscheduler.sh

5 Modify the environment variable configuration file

vim conf/env/dolphinscheduler_env.sh

Point each component's installation directory to its CDH location under /opt/cloudera/parcels/CDH/lib/. Comment out the components you do not need.

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop
#export SPARK_HOME1=/opt/soft/spark1
export SPARK_HOME2=/opt/cloudera/parcels/CDH/lib/spark
#export PYTHON_HOME=/opt/soft/python
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export FLINK_HOME=/opt/cloudera/parcels/FLINK/lib/flink
export DATAX_HOME=/export/datax/bin/datax.py

Create a symbolic link to the JDK at /usr/bin/java:

sudo ln -s /usr/java/jdk1.8.0_181-cloudera/bin/java /usr/bin/java
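
Since every path in the environment file has to exist on each node, a quick check can catch typos before installation. The following is a rough sketch, not something shipped with DolphinScheduler: `check_env_paths` is a hypothetical helper, and it only understands plain, unquoted `export NAME=/absolute/path` lines.

```shell
#!/bin/sh
# check_env_paths FILE: print every uncommented `export NAME=/path` line whose
# path does not exist; returns non-zero if at least one path is missing.
check_env_paths() {
    missing=0
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            export\ *=/*) ;;      # plain export of an absolute path
            *) continue ;;        # comments, blank lines, anything else
        esac
        name="${line#export }"
        name="${name%%=*}"
        path="${line#*=}"
        if [ ! -e "$path" ]; then
            echo "missing: $name -> $path"
            missing=$((missing + 1))
        fi
    done < "$1"
    [ "$missing" -eq 0 ]
}

# Run from the unpacked dolphinscheduler-bin directory on each node.
ENV_FILE="conf/env/dolphinscheduler_env.sh"
if [ -f "$ENV_FILE" ]; then
    check_env_paths "$ENV_FILE" || echo "fix the paths listed above" >&2
fi
```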

6 Modify the deployment configuration file

vim conf/config/install_config.conf

# Database type: mysql or postgresql
dbtype="mysql"
# Database address and port
dbhost="node1:3306"
# Database name
dbname="dolphinscheduler"
# Database user name
username="dolphinscheduler"
# Database password. If it contains special characters, escape them with \.
# Replace it with the password set in section 4.1
password="dolphinscheduler123"
# ZooKeeper address
zkQuorum="node2:2181,node3:2181,node4:2181"
# Directory to install DolphinScheduler into, e.g. /opt/soft/dolphinscheduler.
# It must be different from the current directory
installPath="/export/dolphinscheduler"
# Deployment user: the user created in section 3
deployUser="dolphinscheduler"

# Mail configuration, using QQ Mail as an example
# Mail protocol
mailProtocol="SMTP"
# Mail server address
mailServerHost="smtp.qq.com"
# Mail server port
mailServerPort="25"
# Sender
mailSender="[email protected]"
# Mail user
mailUser="[email protected]"
# Mail password
mailPassword="XXX"
# Set to "true" for a mailbox that uses TLS, otherwise "false"
starttlsEnable="true"
# Set to "true" for a mailbox that uses SSL, otherwise "false".
# Note: starttlsEnable and sslEnable must not both be "true"
sslEnable="false"
# Same value as mailServerHost
sslTrust="smtp.qq.com"

# Resource storage type: HDFS, S3, or NONE. To use the local file system on a
# single node, set it to HDFS, because HDFS supports local file systems.
# Select NONE if resource upload is not required.
# Note: using the local file system does not require deploying Hadoop
resourceStorageType="HDFS"
# If the NameNode has HA enabled, put Hadoop's core-site.xml and hdfs-site.xml
# into the conf directory of the installation path (in the official example,
# /opt/soft/dolphinscheduler/conf) and configure the NameNode cluster name here
defaultFS="hdfs://node1:8020"
# If YARN is not used, keep the default value. If ResourceManager is HA, set the
# IPs or hostnames of the active and standby ResourceManager nodes, such as
# "192.168.xx.xx,192.168.xx.xx". For a single ResourceManager, set yarnHaIps=""
yarnHaIps=""
# If ResourceManager is HA or YARN is not used, keep the default value.
# For a single ResourceManager, set its hostname or IP
singleYarnIp="node1"
# Root path for uploaded resources; HDFS and S3 are supported.
# HDFS also supports local file systems
resourceUploadPath="/data/dolphinscheduler"
# User with permission to create the resource upload path
hdfsRootUser="hdfs"

# Machines on which to deploy DolphinScheduler services; use localhost for a single machine
ips="node1,node2,node3"
# SSH port
sshPort="22"
# Machines on which the Master service is deployed (node1 in the plan from section 1)
masters="node1"
# Machines on which the Worker service is deployed, and the worker group each
# one belongs to; "default" below is the default group name
workers="node2:default,node3:default"
# Machine on which the alert service is deployed
alertServer="node2"
# Machine on which the back-end API service is deployed
apiServers="node1"

If the HDFS NameNode is configured for high availability, copy core-site.xml and hdfs-site.xml to the conf directory. On CDH, these configuration files are located in /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop

You also need to modify the zookeeper.properties file

vim conf/zookeeper.properties

Add the ZooKeeper cluster information:

zookeeper.quorum=node2:2181,node3:2181,node4:2181
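
Because the ZooKeeper quorum is now written in two files, it is easy for them to drift apart. The sketch below compares the two values. The helper names are hypothetical, and it assumes that install_config.conf is plain shell variable assignments (as the values above suggest) so that it can be sourced in a subshell.

```shell
#!/bin/sh
# quorum_from_properties FILE: print the value of zookeeper.quorum.
quorum_from_properties() {
    sed -n 's/^zookeeper\.quorum=//p' "$1" | tail -n 1
}

# quorum_from_install_conf FILE: print zkQuorum. The file is sourced inside a
# subshell so its variables do not leak into the calling shell.
# FILE must contain a slash, otherwise `.` searches PATH for it.
quorum_from_install_conf() {
    ( . "$1" && printf '%s\n' "$zkQuorum" )
}

# Run from the unpacked dolphinscheduler-bin directory.
PROPS="conf/zookeeper.properties"
INSTALL_CONF="conf/config/install_config.conf"
if [ -f "$PROPS" ] && [ -f "$INSTALL_CONF" ]; then
    a="$(quorum_from_properties "$PROPS")"
    b="$(quorum_from_install_conf "$INSTALL_CONF")"
    if [ "$a" = "$b" ]; then
        echo "quorum matches: $a"
    else
        echo "quorum mismatch: '$a' vs '$b'" >&2
    fi
fi
```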

7 Installation

Switch to the deployment user dolphinscheduler

Execute the one-click deployment script

sh install.sh

After the script has finished, you can run the jps command on each node to check which services have started.

MasterServer         ----- Master service
WorkerServer         ----- Worker service
LoggerServer         ----- Logger service
ApiApplicationServer ----- API service
AlertServer          ----- Alert service

Which of these services run on a node depends on the cluster plan.
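
To avoid reading the jps output by eye on every node, the check can be scripted. This is a small sketch with a hypothetical helper, `check_services`. The service lists in the usage comments follow the plan in section 1 and assume that a LoggerServer runs alongside each WorkerServer.

```shell
#!/bin/sh
# check_services JPS_OUTPUT SERVICE...: report whether each expected service
# name appears in the given jps output; returns non-zero if any is missing.
check_services() {
    jps_output="$1"
    shift
    rc=0
    for svc in "$@"; do
        if printf '%s\n' "$jps_output" | grep -qw "$svc"; then
            echo "running: $svc"
        else
            echo "NOT running: $svc"
            rc=1
        fi
    done
    return "$rc"
}

# Usage, run as the deployment user on each node:
#   node1: check_services "$(jps)" MasterServer ApiApplicationServer
#   node2: check_services "$(jps)" WorkerServer LoggerServer AlertServer
#   node3: check_services "$(jps)" WorkerServer LoggerServer
```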

8 Login

After the installation is complete, log in to the system through the web page at http://node1:12345/dolphinscheduler. The default administrator account is admin/dolphinscheduler123.

9 Starting and stopping the service

Stop all services in the cluster

sh ./bin/stop-all.sh

Start all services in the cluster

sh ./bin/start-all.sh

Start/stop the Master

sh ./bin/dolphinscheduler-daemon.sh start master-server
sh ./bin/dolphinscheduler-daemon.sh stop master-server

Start/stop the Worker

sh ./bin/dolphinscheduler-daemon.sh start worker-server
sh ./bin/dolphinscheduler-daemon.sh stop worker-server

Start/stop the API

sh ./bin/dolphinscheduler-daemon.sh start api-server
sh ./bin/dolphinscheduler-daemon.sh stop api-server

Start/stop the Logger

sh ./bin/dolphinscheduler-daemon.sh start logger-server
sh ./bin/dolphinscheduler-daemon.sh stop logger-server

Start/stop the Alert

sh ./bin/dolphinscheduler-daemon.sh start alert-server
sh ./bin/dolphinscheduler-daemon.sh stop alert-server
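
The start and stop commands above can be combined into a restart helper. This is just a convenience sketch: `restart_service` and the `DAEMON` variable are hypothetical names, and only the five service names shown above are accepted.

```shell
#!/bin/sh
# Path to the daemon script; override DAEMON when not running from the
# installation directory.
DAEMON="${DAEMON:-./bin/dolphinscheduler-daemon.sh}"

# restart_service NAME: stop, then start, one DolphinScheduler service.
restart_service() {
    case "$1" in
        master-server|worker-server|api-server|logger-server|alert-server) ;;
        *) echo "unknown service: $1" >&2; return 2 ;;
    esac
    # The result of stop is ignored so that a service that is not running
    # still gets started.
    sh "$DAEMON" stop "$1"
    sh "$DAEMON" start "$1"
}

# Usage, for example after changing the worker configuration:
#   restart_service worker-server
```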