Azkaban is a batch workflow job scheduler open-sourced by LinkedIn. It runs a set of jobs and processes in a particular order within a workflow. Azkaban defines a key-value (KV) file format for declaring dependencies between jobs and provides an easy-to-use web interface for maintaining and tracking your workflows.

1. Install

Preparation

The following archives are needed:

azkaban-web-server-2.5.0.tar.gz
azkaban-executor-server-2.5.0.tar.gz
azkaban-sql-script-2.5.0.tar.gz

azkaban-web-server-2.5.0.tar.gz is the web server, azkaban-executor-server-2.5.0.tar.gz is the execution server, and azkaban-sql-script-2.5.0.tar.gz contains the SQL scripts to run.

2. Create the tables in MySQL

After downloading, unpack each archive separately. We also need to create a database in MySQL and then run the SQL script provided by Azkaban to create the tables it requires.

mysql -uroot -p
mysql> create database azkaban;
mysql> use azkaban;
Database changed
mysql> source /home/fantj/azkaban/azkaban-2.5.0/create-all-sql-2.5.0.sql;
mysql> show tables;
+------------------------+
| Tables_in_azkaban      |
+------------------------+
| active_executing_flows |
| active_sla             |
| execution_flows        |
| execution_jobs         |
| execution_logs         |
| project_events         |
| project_files          |
| project_flows          |
| project_permissions    |
| project_properties     |
| project_versions       |
| projects               |
| properties             |
| schedules              |
| triggers               |
+------------------------+
15 rows in set (0.00 sec)

3. Create an SSL configuration

1. Run the command keytool -keystore keystore -alias jetty -genkey -keyalg RSA. This generates a file named keystore in the current directory. The command prompts you for some information, such as your name and organization; fill it in as prompted.
2. Copy the keystore file to the bin directory of the Azkaban web server.

4. Configure the time zone

[root@s166 azkaban]# tzselect
Please identify a location so that time zone rules can be set correctly.
Please select a continent or ocean.
 1) Africa
 2) Americas
 3) Antarctica
 4) Arctic Ocean
 5) Asia
 6) Atlantic Ocean
 7) Australia
 8) Europe
 9) Indian Ocean
10) Pacific Ocean
11) none - I want to specify the time zone using the Posix TZ format.
#? 5
Please select a country.
 1) Afghanistan		  18) Israel		    35) Palestine
 2) Armenia		  19) Japan		    36) Philippines
 3) Azerbaijan		  20) Jordan		    37) Qatar
 4) Bahrain		  21) Kazakhstan	    38) Russia
 5) Bangladesh		  22) Korea (North)	    39) Saudi Arabia
 6) Bhutan		  23) Korea (South)	    40) Singapore
 7) Brunei		  24) Kuwait		    41) Sri Lanka
 8) Cambodia		  25) Kyrgyzstan	    42) Syria
 9) China		  26) Laos		    43) Taiwan
10) Cyprus		  27) Lebanon		    44) Tajikistan
11) East Timor		  28) Macau		    45) Thailand
12) Georgia		  29) Malaysia		    46) Turkmenistan
13) Hong Kong		  30) Mongolia		    47) United Arab Emirates
14) India		  31) Myanmar (Burma)	    48) Uzbekistan
15) Indonesia		  32) Nepal		    49) Vietnam
16) Iran		  33) Oman		    50) Yemen
17) Iraq		  34) Pakistan
#? 9
Please select one of the following time zone regions.
1) Beijing Time
2) Xinjiang Time
#? 1

The following information has been given:

	China
	Beijing Time

Therefore TZ='Asia/Shanghai' will be used.
Local time is now:	Sat Jul 28 18:29:58 CST 2018.
Universal Time is now:	Sat Jul 28 10:29:58 UTC 2018.
Is the above information OK?
1) Yes
2) No
#? 1

You can make this change permanent for yourself by appending the line
	TZ='Asia/Shanghai'; export TZ
to the file '.profile' in your home directory; then log out and log in again.

Here is that TZ value again, this time on standard output so that you
can use the /usr/bin/tzselect command in shell scripts:
Asia/Shanghai

This must be configured on every host in the cluster, because job scheduling depends on accurate time. Alternatively, you can overwrite /etc/localtime with the zone file directly, on this host and on the others:

cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
[root@s166 azkaban]# scp /usr/share/zoneinfo/Asia/Shanghai root@s168:/etc/localtime
Shanghai                                                                                              100%  388   500.8KB/s   00:00    
[root@s166 azkaban]# scp /usr/share/zoneinfo/Asia/Shanghai root@s169:/etc/localtime
Shanghai   
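Since every host has to agree on the time zone, a quick check on each machine can confirm that the change took effect. The sketch below is only an illustration and not part of Azkaban. It assumes Python 3 is available on the hosts, and the function names are made up for this example. It compares the UTC offset the host currently applies with the +08:00 offset of Asia/Shanghai seen in the tzselect output above.

```python
from datetime import datetime, timedelta, timezone

# Asia/Shanghai is UTC+8, as the tzselect transcript shows
# (18:29:58 local time versus 10:29:58 universal time).
EXPECTED_OFFSET = timedelta(hours=8)


def local_utc_offset():
    """Return the UTC offset this host currently applies to local time."""
    return datetime.now(timezone.utc).astimezone().utcoffset()


def offset_matches(offset, expected=EXPECTED_OFFSET):
    """Return True if the given offset equals the expected one."""
    return offset == expected


# Report the result for the host this script runs on.
offset = local_utc_offset()
status = "OK" if offset_matches(offset) else "MISMATCH"
print("local UTC offset:", offset, status)
```

Running it on s166, s168 and s169 in turn should report OK on each host once /etc/localtime has been replaced.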

5. Modify the configuration

5.1 Modifying the Web Server Configuration
5.1.1 Modify azkaban.properties in the /webserver/conf directory (I renamed the unpacked web server directory to webserver)
#Azkaban Personalization Settings
azkaban.name=Test
azkaban.label=My Local Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai

#Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml

#Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects

database.type=mysql
mysql.port=3306
mysql.host=localhost
mysql.database=azkaban
mysql.user=root
mysql.password=root
mysql.numconnections=100

# Velocity dev mode
velocity.dev.mode=false

# Azkaban Jetty server properties.
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
jetty.keystore=keystore
jetty.password=jiaoroot
jetty.keypassword=jiaoroot
jetty.truststore=keystore
jetty.trustpassword=jiaoroot

# Azkaban Executor settings
executor.port=12321

# mail settings
[email protected]
mail.host=smtp.qq.com
job.failure.email=
job.success.email=

lockdown.create.projects=false

cache.directory=cache

Adjust the time zone, MySQL settings, SSL passwords, file paths, and mail settings to match your environment. I have not annotated each setting, since they are easy to understand at a glance.

5.1.2 Modify azkaban-users.xml in the /conf/ directory
<azkaban-users>
        <user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
        <user username="metrics" password="metrics" roles="metrics"/>
        <user username="admin" password="admin" roles="admin" />

        <role name="admin" permissions="ADMIN" />
        <role name="metrics" permissions="METRICS"/>
</azkaban-users>
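A hand-edited user file is easy to break, for example with an unclosed tag or a user that references a role nobody defined. As a rough sketch (not an Azkaban tool), the following Python check parses the file content and reports such problems. The function name check_users is hypothetical, and treating the roles attribute as a comma-separated list is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def check_users(xml_text):
    """Parse azkaban-users.xml content and return a list of problems found."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    # Roles declared as direct children of <azkaban-users>.
    defined_roles = {role.get("name") for role in root.findall("role")}
    problems = []
    for user in root.findall("user"):
        # Assumption: roles is a comma-separated list of role names.
        for role in (user.get("roles") or "").split(","):
            role = role.strip()
            if role and role not in defined_roles:
                problems.append(
                    f"user {user.get('username')!r} uses undefined role {role!r}"
                )
    return problems


SAMPLE = """<azkaban-users>
  <user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
  <user username="metrics" password="metrics" roles="metrics"/>
  <user username="admin" password="admin" roles="admin" />
  <role name="admin" permissions="ADMIN" />
  <role name="metrics" permissions="METRICS"/>
</azkaban-users>"""

print(check_users(SAMPLE) or "no problems found")
```

To check a real file, read conf/azkaban-users.xml into a string and pass it to check_users before starting the web server.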
5.2 Modifying the Executor Server Configuration

Modify azkaban.properties in the /executor/conf directory

#Azkaban
default.timezone.id=Asia/Shanghai

# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes

#Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects

database.type=mysql
mysql.port=3306
mysql.host=localhost
mysql.database=azkaban
mysql.user=root
mysql.password=root
mysql.numconnections=100

# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
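The web server and executor configurations repeat several settings (time zone, executor port, MySQL connection), and in a single-host setup like this one they should match. The sketch below shows one way to compare the two files. It is an illustration only: the parser handles plain key=value lines rather than the full Java properties syntax, and the list of shared keys is my own choice for this example.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Keys that both servers are expected to agree on (chosen for this example).
SHARED_KEYS = [
    "default.timezone.id", "executor.port", "database.type",
    "mysql.host", "mysql.port", "mysql.database",
    "mysql.user", "mysql.password",
]


def find_mismatches(web_text, executor_text, keys=SHARED_KEYS):
    """Return {key: (web_value, executor_value)} for keys whose values differ."""
    web = parse_properties(web_text)
    executor = parse_properties(executor_text)
    return {
        key: (web.get(key), executor.get(key))
        for key in keys
        if web.get(key) != executor.get(key)
    }
```

Pass in the contents of webserver/conf/azkaban.properties and executor/conf/azkaban.properties. An empty result means the shared settings agree.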

6. Startup

6.1 Starting the Web Server

From the webserver directory, run the start script under bin to start the service in the background:

[root@s166 webserver]# nohup bin/azkaban-web-start.sh 1>/tmp/azstd.out 2>/tmp/azerr.out &

Tip: do not use nohup at first, because you will get no immediate feedback. Run the script in the foreground until it starts cleanly, and only then switch to nohup:

[root@s166 webserver]# bin/azkaban-web-start.sh

Some of the errors I ran into:

  1. The keystore file is not in the /bin/ directory. Copy it into bin.
  2. Various configuration files cannot be found. I fixed this by using absolute paths for them in the configuration file.
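Both errors come down to configured paths that do not exist where the server looks for them. A small pre-flight check can list them before startup. This is a sketch under an assumption: that relative paths are resolved against the directory the start script is launched from, which fits the errors described above but is not confirmed here. The function name and the list of path-valued keys are hypothetical choices for this example.

```python
import os

# Settings from the web server config whose values are files or directories.
PATH_KEYS = [
    "jetty.keystore", "jetty.truststore", "user.manager.xml.file",
    "executor.global.properties", "web.resource.dir",
]


def missing_paths(props, start_dir, keys=PATH_KEYS):
    """Return {key: resolved_path} for configured paths that do not exist.

    props is a dict of property names to values. Relative values are
    resolved against start_dir (assumed to be the launch directory).
    """
    missing = {}
    for key in keys:
        value = props.get(key)
        if not value:
            continue
        path = value if os.path.isabs(value) else os.path.join(start_dir, value)
        if not os.path.exists(path):
            missing[key] = path
    return missing
```

For example, missing_paths({"jetty.keystore": "keystore"}, "/path/to/webserver") returns an empty dict only if a keystore file exists in that directory. The directory shown is a placeholder.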
6.2 Starting the Execution Server

From the executor directory, run:

[root@s166 executor]# bin/azkaban-executor-start.sh

6.3 Access https://s166:8443/ in a Browser

If the page appears without its styling, the server was started the wrong way: out of habit, the start script was run from inside the bin directory rather than from the server's root directory, so most of the CSS fails to load.

Log in with the specified account and password.

7. Using Azkaban

7.1 Example of a Single Job
  1. Create a job description file
vim command.job

#command.job
type=command                                                    
command=echo fantj666
  2. Package the job file command.job into a zip archive

  3. Create a project in Azkaban's web management console and upload the zip package
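The packaging step can be done with any zip tool. As one possible sketch, the Python function below writes job files into an archive using the standard zipfile module. The function name package_jobs and the archive name command.zip are assumptions made for this example.

```python
import zipfile
from pathlib import Path


def package_jobs(job_files, archive_path):
    """Write the given .job files into a zip archive, stored at the archive root."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for job_file in job_files:
            job_file = Path(job_file)
            # arcname drops leading directories so each job sits at the top level.
            archive.write(job_file, arcname=job_file.name)
    return archive_path
```

Calling package_jobs(["command.job"], "command.zip") in the directory that holds command.job produces an archive ready to upload.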

7.2 Multi-job Workflow
  1. Create jobs with dependencies. The first job, foo.job:
# foo.job
type=command
command=echo foo

The second job, bar.job, depends on foo:

# bar.job
type=command
dependencies=foo
command=echo bar
  2. Package all job files into one zip archive
  3. Upload the zip package and start the flow
  4. Check the job log
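To see how the dependencies line shapes the run order, the sketch below reads the job definitions and sorts them so that every job comes after the jobs it depends on. It illustrates the idea with Python's graphlib (Python 3.9 or later) and is not how Azkaban itself schedules flows. The function names are hypothetical.

```python
from graphlib import TopologicalSorter  # available from Python 3.9


def parse_job(text):
    """Parse key=value lines of a .job file, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def execution_order(jobs):
    """Take {job name: .job file content} and return names in dependency order."""
    graph = {}
    for name, text in jobs.items():
        deps = parse_job(text).get("dependencies", "")
        # dependencies is a comma-separated list of job names.
        graph[name] = {dep.strip() for dep in deps.split(",") if dep.strip()}
    return list(TopologicalSorter(graph).static_order())


jobs = {
    "bar": "# bar.job\ntype=command\ndependencies=foo\ncommand=echo bar\n",
    "foo": "# foo.job\ntype=command\ncommand=echo foo\n",
}
print(execution_order(jobs))
```

With the two jobs above, foo is listed before bar, matching the order in which the flow runs them.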
7.3 Operating Hadoop
  1. Create the job file: vim fs.job
# fs.job
type=command
command=/home/fantj/hadoop/bin/hadoop fs -lsr /
  2. Package it as a zip and upload it
  3. Start the job and view the log
7.4 Operating Hive

The Hive script test.sql:

use default;
drop table aztest;
create table aztest(id int,name string,age int) row format delimited fields terminated by ',';
load data inpath '/aztest/hiveinput' into table aztest;
create table azres as select * from aztest;
insert overwrite directory '/aztest/hiveoutput' select count(1) from aztest; 
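The script loads whatever is under /aztest/hiveinput, so the input file has to match the table layout: one row per line with id, name and age. The sketch below shows how such a file could be produced, assuming the field delimiter is a single comma. The function name and the sample values are invented for illustration and are not data from the original setup.

```python
def format_rows(rows):
    """Render (id, name, age) tuples as comma-delimited lines without a header."""
    lines = []
    for row_id, name, age in rows:
        # Delimited text has no quoting here, so the delimiter cannot appear in a field.
        if "," in name:
            raise ValueError("name must not contain the field delimiter")
        lines.append(f"{int(row_id)},{name},{int(age)}")
    return "\n".join(lines) + "\n"


# Made-up sample rows, only to show the expected layout.
sample = format_rows([(1, "alice", 30), (2, "bob", 25)])
print(sample, end="")
```

The resulting text can be saved to a local file and copied to /aztest/hiveinput in HDFS before the job runs.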

The job file hivef.job:

# hivef.job
type=command
command=/home/fantj/hive/bin/hive -f 'test.sql'

Package it as a zip, upload it, execute it, and check the log.