Official documentation: https://dolphinscheduler.apache.org/zh-cn/docs/3.1.9
Introduction to DolphinScheduler
From the official site: Apache DolphinScheduler is a distributed, easily extensible, visual DAG workflow task scheduling open-source system. It is suited to enterprise-level scenarios and provides a visual solution for operating on tasks, workflows, and the full data-processing lifecycle.
Apache DolphinScheduler aims to untangle complex dependencies between big-data tasks and to expose the relationships between data and various OPS orchestrations to applications. It addresses the problem of intricate ETL dependencies in data development, where the health of tasks cannot be monitored. DolphinScheduler assembles tasks in a DAG (Directed Acyclic Graph) streaming fashion, monitors task execution state in real time, and supports operations such as retry, recovery from a specified failed node, pause, resume, and kill.
Installation dependencies
- Linux CentOS == 7.6.18 (3 machines)
- JDK == 1.8.151
- Zookeeper == 3.8.3
- MySQL == 5.7.30
- DolphinScheduler == 3.1.9
Environment preparation
General cluster environment preparation
Prepare the virtual machines
IP address | Hostname | CPU | Memory | Disk | Role |
---|---|---|---|---|---|
192.168.10.100 | hadoop01 | 4U | 8G | 100G | DS NODE |
192.168.10.101 | hadoop02 | 4U | 8G | 100G | DS NODE |
192.168.10.102 | hadoop03 | 4U | 8G | 100G | DS NODE |
Run the following command on all hosts:
cat >> /etc/hosts << "EOF"
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
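To verify the mappings, you can pull a hostname's address straight out of an /etc/hosts-style file. The snippet below runs against a sample copy so it is runnable anywhere; on the cluster you would point awk at /etc/hosts itself (or simply use `getent hosts hadoop02`):

```shell
# Look up the IP recorded for a hostname in an /etc/hosts-style file.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
ip=$(awk '$2 == "hadoop02" {print $1}' "$hosts_file")
echo "$ip"
```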
Change the package mirror
Replace the yum mirror with the Tsinghua mirror
sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
-e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
-i.bak \
/etc/yum.repos.d/CentOS-*.repo
Customize the terminal prompt colors
cat << EOF >> ~/.bashrc
PS1="\[\e[37;47m\][\[\e[32;47m\]\u\[\e[34;47m\]@\h \[\e[36;47m\]\w\[\e[0m\]]\\$ "
EOF
Apply the change
source ~/.bashrc
Tune the sshd service
sed -ri 's@UseDNS yes@UseDNS no@g' /etc/ssh/sshd_config
sed -ri 's#GSSAPIAuthentication yes#GSSAPIAuthentication no#g' /etc/ssh/sshd_config
grep ^UseDNS /etc/ssh/sshd_config
grep ^GSSAPIAuthentication /etc/ssh/sshd_config
Disable the firewall
systemctl disable --now firewalld && systemctl is-enabled firewalld
systemctl status firewalld
Disable SELinux
sed -ri 's#(SELINUX=)enforcing#\1disabled#' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config
setenforce 0
getenforce
Configure passwordless login across the cluster and a sync script
1) Update the host list
cat >> /etc/hosts << 'EOF'
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
2) Generate a key pair on the hadoop01 node
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q
3) On hadoop01, set up passwordless login to all cluster nodes
for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done
4) Test passwordless login
ssh hadoop01
ssh hadoop02
ssh hadoop03
5) Install the rsync data-sync tool on all nodes
Online install
yum install -y rsync
Offline install, option 1
yum localinstall -y rsync-2.7.0.rpm
Offline install, option 2
rpm -ivh rsync-2.7.0.rpm --force --nodeps
6) Write the sync script
vim /usr/local/sbin/data_rsync.sh
The script content is as follows:
#!/bin/bash
# Author: kkarma
if [ $# -ne 1 ];then
echo "Usage: $0 /path/to/file (absolute path)"
exit
fi
# Check that the file or directory exists
if [ ! -e "$1" ];then
echo "[ $1 ] dir or file not found!"
exit
fi
# Get the parent path
fullpath=$(dirname "$1")
# Get the file/directory name
basename=$(basename "$1")
# Change into the parent path
cd "$fullpath"
for ((host_id=1;host_id<=3;host_id++))
do
# Switch terminal output to green
tput setaf 2
echo ==== rsyncing hadoop0${host_id}: $basename ====
# Restore the terminal's original color
tput setaf 7
# Sync the data to the other two nodes
rsync -az "$basename" "$(whoami)@hadoop0${host_id}:$fullpath"
if [ $? -eq 0 ];then
echo "Command executed successfully!"
fi
done
7) Make the sync script executable
chmod 755 /usr/local/sbin/data_rsync.sh
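The heart of the script is splitting its single argument into a parent directory and a name with dirname/basename before handing them to rsync; the decomposition works like this:

```shell
# How data_rsync.sh decomposes its argument before syncing
path=/usr/local/sbin/data_rsync.sh
fullpath=$(dirname "$path")     # parent directory of the argument
basename=$(basename "$path")    # final path component to sync
echo "$fullpath"
echo "$basename"
```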
Cluster time synchronization
1) Install common Linux tools
yum install -y vim net-tools
2) Install the chrony service
yum install -y ntpdate chrony
3) Edit the chrony service configuration file
vim /etc/chrony.conf
Comment out the official time servers and replace them with a domestic time server
server ntp.aliyun.com iburst
4) Enable the chronyd service at boot
systemctl enable --now chronyd
5) Check the chronyd service
systemctl status chronyd
Tune the sysctl.conf system configuration
Edit the /etc/sysctl.conf file
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
# see details in https://help.aliyun.com/knowledge_detail/39428.html
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
# see details in https://help.aliyun.com/knowledge_detail/41334.html
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
fs.file-max = 6815744
vm.max_map_count = 262144
fs.aio-max-nr = 1048576
kernel.shmall = 2097152
kernel.shmmax = 536870912
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.suid_dumpable=1
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048586
Modify the limits.conf configuration file
Append the following to the end of /etc/security/limits.conf
If you have already created a dedicated account for managing Elasticsearch (for example, an account named elastic), configure it like this:
elastic soft nofile 65535
elastic hard nofile 65535
If that feels like too much trouble, the blanket configuration below also works
* soft nofile 65535
* hard nofile 65535
After completing the changes above, it is recommended to reboot the server so the system configuration takes effect.
Installing the JDK
Skipping this part; it is simple enough that following almost any blog post will get it done.
ZooKeeper cluster installation
I originally intended to skip this install and reuse the ZooKeeper cluster from CDH, but in practice, even after adapting the DolphinScheduler build for the older ZooKeeper version, the deployed cluster kept hitting assorted startup problems. So rather than keep fighting it, I installed a separate ZooKeeper cluster; the deployment steps are described below.
Download and install
First configure the cluster hostnames so the nodes can reach each other by name
vim /etc/hosts
Append the following to the file (do this on all nodes)
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
ZooKeeper download page: https://zookeeper.apache.org/releases.html#download
After downloading, upload the package to every cluster host and extract it to /opt/software
Create the data and logs directories under the install directory (on all nodes)
mkdir -p /opt/software/zookeeper/data
mkdir -p /opt/software/zookeeper/logs
Cluster configuration
Go to the conf directory under the install path, /opt/software/zookeeper/conf, to set up ZooKeeper's configuration file zoo.cfg: copy the zoo_sample.cfg file and rename it to zoo.cfg (do this on all nodes)
cp /opt/software/zookeeper/conf/zoo_sample.cfg /opt/software/zookeeper/conf/zoo.cfg
Modify the configuration file as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/software/zookeeper/data
# the port at which the clients will connect
# The client port and the inter-node communication ports are changed here to avoid clashing with the ZooKeeper cluster that the Hadoop cluster on these hosts depends on
clientPort=2191
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true
# The audit log introduced in newer ZooKeeper versions is disabled by default; it needs to be enabled when starting on Windows
#audit.enable=true
# Internal communication settings for the ZooKeeper cluster; add one line per node
server.1=hadoop01:2999:3999
server.2=hadoop02:2999:3999
server.3=hadoop03:2999:3999
Configure each node's server id; it must match the server.N entries in the zoo.cfg file:
On the hadoop01 node run the following command
echo 1 > /opt/software/zookeeper/data/myid
On the hadoop02 node run the following command
echo 2 > /opt/software/zookeeper/data/myid
On the hadoop03 node run the following command
echo 3 > /opt/software/zookeeper/data/myid
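Because the hostnames follow the uniform hadoop0N pattern, the myid value can also be derived from the hostname rather than typed per node. A sketch (on a real node you would substitute $(hostname) for the fixed string):

```shell
# Derive the ZooKeeper server id from a hadoop0N-style hostname
host=hadoop02                 # stand-in for $(hostname)
server_id=${host#hadoop0}     # strip the "hadoop0" prefix, leaving the digit
echo "$server_id"
```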
Test and verify
First create the cluster start/stop script
vim /opt/software/zookeeper/zk-start-all.sh
The script content is as follows:
Note:
- Starting the ZooKeeper cluster depends on the JDK and uses the JAVA_HOME variable, so install the JDK and set the JAVA environment variables first.
- Without passwordless cluster login configured, every run of the script below will prompt for passwords, so set up passwordless login first.
#!/bin/bash
case $1 in
"start"){
# Iterate over all machines in the cluster
for i in hadoop01 hadoop02 hadoop03
do
# Log to the console
echo =============zookeeper $i start====================
# Start command
ssh $i "/opt/software/zookeeper/bin/zkServer.sh start"
done
}
;;
"stop"){
for i in hadoop01 hadoop02 hadoop03
do
echo =============zookeeper $i stop====================
ssh $i "/opt/software/zookeeper/bin/zkServer.sh stop"
done
}
;;
"status"){
for i in hadoop01 hadoop02 hadoop03
do
echo =============zookeeper $i status====================
ssh $i "/opt/software/zookeeper/bin/zkServer.sh status"
done
}
;;
esac
chmod 755 /opt/software/zookeeper/zk-start-all.sh
My cluster here is already up and in use, so instead of demonstrating startup I will demonstrate the status query, /opt/software/zookeeper/zk-start-all.sh status, which produced the following error:
Fix: on every node, open the /opt/software/zookeeper/bin/zkEnv.sh file and add your own JAVA_HOME path at the very top of the script's code.
Go into the /opt/software/zookeeper directory on hadoop01 and run ./zk-start-all.sh status to check the ZooKeeper cluster state; the result is shown in the figure below. OK, the cluster start/stop script works fine.
The commands to start, stop, and check the zk cluster:
# Start the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh start
# Stop the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh stop
# Query each node's status and role
sh /opt/software/zookeeper/zk-start-all.sh status
Installing MySQL
For MySQL installation, you can refer to my other blog post on installing the MySQL database from a .tar.gz package on a CentOS 7 Linux server.
Cluster deployment
Download DolphinScheduler
Download URL: https://dlcdn.apache.org/dolphinscheduler/3.1.9/apache-dolphinscheduler-3.1.9-bin.tar.gz
Download it directly to a path on the server with wget. If the servers have no internet access, download the binary package locally first, then upload it to every node of the server cluster with an ssh client tool.
Create and configure the account that runs the DolphinScheduler cluster
Create the user that installs and runs the DolphinScheduler cluster
As root, run the command to add a regular user
useradd dolphinscheduler
Set the dolphinscheduler user's password
passwd dolphinscheduler
Give the dolphinscheduler user passwordless sudo privileges
sed -i '$adolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
Copy the binary package apache-dolphinscheduler-3.1.9-bin.tar.gz to the /opt/packages directory (create it if it does not exist)
Change the owner and group of the apache-dolphinscheduler-3.1.9-bin.tar.gz package to dolphinscheduler
chown -R dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz
Configure passwordless cluster login for the user
Switch to the dolphinscheduler user and configure passwordless login (this only needs to be done on hadoop01)
1) Generate a key pair on the hadoop01 node
ssh-keygen -t rsa
2) On hadoop01, set up passwordless login to all cluster nodes
for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done
3) Test passwordless login
ssh hadoop01
ssh hadoop02
ssh hadoop03
Database initialization
The default database name used by DolphinScheduler is dolphinscheduler; first create the database, then create an admin user and grant it privileges
create database `dolphinscheduler` DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_general_ci;
-- Create the dolphinscheduler user dedicated to managing the dolphinscheduler database
CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';
-- Grant access to the database
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
-- Make the privilege changes take effect
flush privileges;
Extract the binary package
tar -zxf /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz
mv
Modify the install script and parameter configuration
DolphinScheduler consists mainly of the api-server, master-server, and worker-server services. The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/install_env.sh defines which machines DolphinScheduler will be installed on and which services each machine runs.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
#ips=${ips:-"ds1,ds2,ds3,ds4,ds5"}
# Which hosts DolphinScheduler will be installed on; separate multiple hosts with commas
ips="hadoop01,hadoop02,hadoop03"
# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}
# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
#masters=${masters:-"hadoop01"}
# Which cluster nodes are designated as master nodes; separate multiple hosts with commas
masters="hadoop01,hadoop02"
# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
#workers=${workers:-"ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"}
# Which cluster nodes are designated as worker nodes; separate multiple hosts with commas, and append ":default" to any node assigned to the default worker group
workers="hadoop02:default,hadoop03:default"
# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
#alertServer=${alertServer:-"ds3"}
# Which cluster node is designated as the alert node; separate multiple hosts with commas
alertServer="hadoop03"
# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
#apiServers=${apiServers:-"ds1"}
# Which node the api-server service is installed on
apiServers="hadoop01"
# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
#installPath=${installPath:-"/tmp/dolphinscheduler"}
#installPath="/opt/software/dolphinscheduler"
# Default install path for dolphinscheduler across the cluster: /opt/software/dolphinscheduler
installPath="/opt/software/dolphinscheduler"
# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
# The deployment user for the dolphinscheduler cluster
deployUser=${deployUser:-"dolphinscheduler"}
# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
#zkRoot=${zkRoot:-"/dolphinscheduler"}
# The registry root path for the dolphinscheduler cluster in zookeeper
zkRoot=${zkRoot:-"/dolphinscheduler"}
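The comments in install_env.sh require masters, workers, and alertServer to be subsets of ips. A small pre-flight check along these lines can catch hostname typos before running install.sh (a sketch, not part of the official tooling):

```shell
# Check that every worker host (the part before ":workerGroup") appears in ips
ips="hadoop01,hadoop02,hadoop03"
workers="hadoop02:default,hadoop03:default"
missing=0
for entry in $(echo "$workers" | tr ',' ' '); do
  host=${entry%%:*}            # drop the ":default" worker-group suffix
  case ",$ips," in
    *",$host,"*) echo "$host: ok" ;;
    *)           echo "$host: NOT in ips"; missing=1 ;;
  esac
done
echo "missing=$missing"
```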
The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/dolphinscheduler_env.sh holds DolphinScheduler's database connection settings and the external dependency paths or libraries for the task types DolphinScheduler supports; JAVA_HOME, DATAX_HOME, and SPARK_HOME, for example, are all defined here.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# JAVA_HOME, will use it to start DolphinScheduler server
#export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}
#Set the JAVA_HOME variable
export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_181-cloudera}
# Database related configuration, set database type, username and password
#export SPRING_DATASOURCE_URL
#Configure DolphinScheduler's database connection
export SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/dolphinscheduler?useTimezone=true&useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai"
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=dolphinscheduler
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
#Set DolphinScheduler's registry type to zookeeper
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
#export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2191}
#Connection string for the zookeeper registry cluster
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2191,hadoop02:2191,hadoop03:2191}
# Tasks related configurations, need to change the configuration if you use the related tasks.
#Environment variables for the various task types in DolphinScheduler: set the server install paths of any services your task types may need; it is best to configure these before installing the cluster
#export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
#export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
#export HADOOP_CONF_DIR=etc/hadoop/conf
#export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
#export PYTHON_HOME=/opt/soft/python
#export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
#export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
#export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
#export SQOOP_HOME=${SQOOP_HOME:-/opt/soft/sqoop}
export PATH=$HADOOP_HOME/bin:$SQOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
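REGISTRY_ZOOKEEPER_CONNECT_STRING is a comma-separated list of host:port pairs; splitting it confirms the three-node quorum configured above:

```shell
# Split the registry connect string into its quorum members
zk_connect="hadoop01:2191,hadoop02:2191,hadoop03:2191"
IFS=','
set -- $zk_connect             # word-split on commas into $1..$n
unset IFS
node_count=$#
first_node=$1
echo "$node_count nodes, first: $first_node"
```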
Disable the Python gateway (enabled by default)
The Python gateway service starts together with api-server by default. If you do not want it started, set python-gateway.enabled: false in the api-server configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/conf/application.yaml to disable it.
vim ./api-server/conf/application.yaml
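If you would rather flip the flag non-interactively than edit in vim, a sed substitution works. The demo below runs against a minimal sample fragment (assumption: the real application.yaml spells the key as `enabled: true` under `python-gateway:`); on the server you would target the application.yaml path given above.

```shell
# Flip python-gateway.enabled from true to false in a sample YAML fragment
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
python-gateway:
  enabled: true
EOF
sed -i 's/^\(  enabled:\) true/\1 false/' "$cfg"
result=$(grep 'enabled:' "$cfg")
echo "$result"
```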
Run the database initialization script
#Change into the directory containing the database scripts
cd /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/sql/sql
#Restore the database from the SQL backup file
mysql -udolphinscheduler -p dolphinscheduler < dolphinscheduler_mysql.sql
Configure the data source driver files
The MySQL driver must be JDBC Driver 8.0.16 or later. Download mysql-connector-java manually and move it into each DolphinScheduler module's libs directory; there are 5 of them:
/opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs
Copy the mysql driver into each module's dependency path
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs/
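The five cp commands can be collapsed into one loop over the module names. The demo below recreates the libs layout under a temporary directory so it is runnable anywhere; on the server the base would be /opt/packages/apache-dolphinscheduler-3.1.9-bin and the jar would be the real driver file:

```shell
# Copy one driver jar into every module's libs directory
base=$(mktemp -d)
jar="$base/mysql-connector-j-8.0.16.jar"
touch "$jar"                                  # stand-in for the real driver
for mod in api-server alert-server master-server worker-server tools; do
  mkdir -p "$base/ds/$mod/libs"
  cp "$jar" "$base/ds/$mod/libs/"
done
copied=$(ls "$base"/ds/*/libs/*.jar | wc -l)
echo "$copied"
```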
Besides MySQL, you may also need drivers for data sources such as SQLServer, Oracle, and Hive; they are integrated the same way as MySQL. Ideally, add all required dependencies to the corresponding modules' libs directories before installing the cluster, so nothing needs handling afterwards, although adding data source dependencies later also works.
If you need any of these database dependencies, message me your email address and I will send them to you.
Run the cluster install
First, change the owner and group of /opt/packages/apache-dolphinscheduler-3.1.9-bin to dolphinscheduler once more
chown -R dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin
Switch to the dolphinscheduler user
su - dolphinscheduler
Change into the extracted root directory
cd /opt/packages/apache-dolphinscheduler-3.1.9-bin
Run the cluster install script install.sh
./bin/install.sh
Once the install script finishes, it automatically detects the status of each cluster node
Cluster start/stop test
After installation, the default install directory of the DolphinScheduler services on every node is /opt/software/dolphinscheduler
Before starting, make sure the zookeeper service is running normally, otherwise the cluster cannot start successfully.
On the hadoop01 node, switch to the dolphinscheduler system user
su - dolphinscheduler
Change into the dolphinscheduler install directory
cd /opt/software/dolphinscheduler
Run the common cluster operation commands
#Start the whole cluster with one command
./bin/start-all.sh
#Stop the whole cluster with one command
./bin/stop-all.sh
#Query the whole cluster's status with one command
./bin/status-all.sh
UI address: http://<hadoop01's IP>:12345/dolphinscheduler/ui
Username: admin
Password: dolphinscheduler123
OK, with that, the DolphinScheduler distributed cluster is fully set up.
This post is published with the support of WhaleOps (白鯨開源).