Official documentation: https://dolphinscheduler.apache.org/zh-cn/docs/3.1.9
Introduction to DolphinScheduler
From the official site: Apache DolphinScheduler is a distributed, easily extensible, visual DAG workflow task scheduling open-source system. It is suited to enterprise-level scenarios and provides a visual solution for operating on tasks, workflows, and the full data-processing lifecycle.
Apache DolphinScheduler aims to untangle complex dependencies between big-data tasks and to expose the relationships between data and various OPS orchestrations to applications. It addresses the problem of intricate ETL dependencies in data development, where the health of tasks cannot be monitored. DolphinScheduler assembles tasks in a DAG (Directed Acyclic Graph) streaming fashion, monitors task execution state in real time, and supports operations such as retry, recovery from a specified failed node, pause, resume, and kill.
Installation dependencies
- Linux CentOS == 7.6.18 (3 machines)
- JDK == 1.8.151
- Zookeeper == 3.8.3
- MySQL == 5.7.30
- DolphinScheduler == 3.1.9
Environment preparation
General cluster environment preparation
Prepare the virtual machines
IP address | Hostname | CPU | Memory | Disk | Role |
---|---|---|---|---|---|
192.168.10.100 | hadoop01 | 4U | 8G | 100G | DS NODE |
192.168.10.101 | hadoop02 | 4U | 8G | 100G | DS NODE |
192.168.10.102 | hadoop03 | 4U | 8G | 100G | DS NODE |
Run the following command on all hosts:
cat >> /etc/hosts << "EOF"
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
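To verify the mappings, you can pull a hostname's address straight out of an /etc/hosts-style file. The snippet below runs against a sample copy so it is runnable anywhere; on the cluster you would point awk at /etc/hosts itself (or simply use `getent hosts hadoop02`):

```shell
# Look up the IP recorded for a hostname in an /etc/hosts-style file.
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
ip=$(awk '$2 == "hadoop02" {print $1}' "$hosts_file")
echo "$ip"
```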
Change the package mirror
Replace the yum mirror with the Tsinghua mirror
sudo sed -e 's|^mirrorlist=|#mirrorlist=|g' \
-e 's|^#baseurl=http://mirror.centos.org|baseurl=https://mirrors.tuna.tsinghua.edu.cn|g' \
-i.bak \
/etc/yum.repos.d/CentOS-*.repo
Customize the terminal prompt colors
cat << EOF >> ~/.bashrc
PS1="\[\e[37;47m\][\[\e[32;47m\]\u\[\e[34;47m\]@\h \[\e[36;47m\]\w\[\e[0m\]]\\$ "
EOF
Apply the change
source ~/.bashrc
Tune the sshd service
sed -ri 's@UseDNS yes@UseDNS no@g' /etc/ssh/sshd_config
sed -ri 's#GSSAPIAuthentication yes#GSSAPIAuthentication no#g' /etc/ssh/sshd_config
grep ^UseDNS /etc/ssh/sshd_config
grep ^GSSAPIAuthentication /etc/ssh/sshd_config
Disable the firewall
systemctl disable --now firewalld && systemctl is-enabled firewalld
systemctl status firewalld
Disable SELinux
sed -ri 's#(SELINUX=)enforcing#\1disabled#' /etc/selinux/config
grep ^SELINUX= /etc/selinux/config
setenforce 0
getenforce
Configure passwordless login across the cluster and a sync script
1) Update the host list
cat >> /etc/hosts << 'EOF'
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
EOF
2) Generate a key pair on the hadoop01 node
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q
3) On hadoop01, set up passwordless login to all cluster nodes
for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done
4) Test passwordless login
ssh hadoop01
ssh hadoop02
ssh hadoop03
5) Install the rsync data-sync tool on all nodes
Online install
yum install -y rsync
Offline install, option 1
yum localinstall -y rsync-2.7.0.rpm
Offline install, option 2
rpm -ivh rsync-2.7.0.rpm --force --nodeps
6) Write the sync script
vim /usr/local/sbin/data_rsync.sh
The script content is as follows:
#!/bin/bash
# Author: kkarma
if [ $# -ne 1 ];then
echo "Usage: $0 /path/to/file (absolute path)"
exit
fi
# Check that the file or directory exists
if [ ! -e "$1" ];then
echo "[ $1 ] dir or file not found!"
exit
fi
# Get the parent path
fullpath=$(dirname "$1")
# Get the file/directory name
basename=$(basename "$1")
# Change into the parent path
cd "$fullpath"
for ((host_id=1;host_id<=3;host_id++))
do
# Switch terminal output to green
tput setaf 2
echo ==== rsyncing hadoop0${host_id}: $basename ====
# Restore the terminal's original color
tput setaf 7
# Sync the data to the other two nodes
rsync -az "$basename" "$(whoami)@hadoop0${host_id}:$fullpath"
if [ $? -eq 0 ];then
echo "Command executed successfully!"
fi
done
7) Make the sync script executable
chmod 755 /usr/local/sbin/data_rsync.sh
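The heart of the script is splitting its single argument into a parent directory and a name with dirname/basename before handing them to rsync; the decomposition works like this:

```shell
# How data_rsync.sh decomposes its argument before syncing
path=/usr/local/sbin/data_rsync.sh
fullpath=$(dirname "$path")     # parent directory of the argument
basename=$(basename "$path")    # final path component to sync
echo "$fullpath"
echo "$basename"
```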
Cluster time synchronization
1) Install common Linux tools
yum install -y vim net-tools
2) Install the chrony service
yum install -y ntpdate chrony
3) Edit the chrony service configuration file
vim /etc/chrony.conf
Comment out the official time servers and replace them with a domestic time server
server ntp.aliyun.com iburst
4) Enable the chronyd service at boot
systemctl enable --now chronyd
5) Check the chronyd service
systemctl status chronyd
Tune the sysctl.conf system configuration
Edit the /etc/sysctl.conf file
vm.swappiness = 0
kernel.sysrq = 1
net.ipv4.neigh.default.gc_stale_time = 120
# see details in https://help.aliyun.com/knowledge_detail/39428.html
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.arp_announce = 2
net.ipv4.conf.lo.arp_announce = 2
net.ipv4.conf.all.arp_announce = 2
# see details in https://help.aliyun.com/knowledge_detail/41334.html
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 1024
net.ipv4.tcp_synack_retries = 2
fs.file-max = 6815744
vm.max_map_count = 262144
fs.aio-max-nr = 1048576
kernel.shmall = 2097152
kernel.shmmax = 536870912
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.suid_dumpable=1
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048586
Modify the limits.conf configuration file
Append the following to the end of /etc/security/limits.conf
If you have already created a dedicated account for managing Elasticsearch (for example, an account named elastic), configure it like this:
elastic soft nofile 65535
elastic hard nofile 65535
If that feels like too much trouble, the blanket configuration below also works
* soft nofile 65535
* hard nofile 65535
After completing the changes above, it is recommended to reboot the server so the system configuration takes effect.
Installing the JDK
Skipping this part; it is simple enough that following almost any blog post will get it done.
ZooKeeper cluster installation
I originally intended to skip this install and reuse the ZooKeeper cluster from CDH, but in practice, even after adapting the DolphinScheduler build for the older ZooKeeper version, the deployed cluster kept hitting assorted startup problems. So rather than keep fighting it, I installed a separate ZooKeeper cluster; the deployment steps are described below.
Download and install
First configure the cluster hostnames so the nodes can reach each other by name
vim /etc/hosts
Append the following to the file (do this on all nodes)
192.168.10.100 hadoop01
192.168.10.101 hadoop02
192.168.10.102 hadoop03
ZooKeeper download page: https://zookeeper.apache.org/releases.html#download
After downloading, upload the package to every cluster host and extract it to /opt/software
Create the data and logs directories under the install directory (on all nodes)
mkdir -p /opt/software/zookeeper/data
mkdir -p /opt/software/zookeeper/logs
Cluster configuration
Go to the conf directory under the install path, /opt/software/zookeeper/conf, to set up ZooKeeper's configuration file zoo.cfg: copy the zoo_sample.cfg file and rename it to zoo.cfg (do this on all nodes)
cp /opt/software/zookeeper/conf/zoo_sample.cfg /opt/software/zookeeper/conf/zoo.cfg
Modify the configuration file as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/opt/software/zookeeper/data
# the port at which the clients will connect
# The client port and the inter-node communication ports are changed here to avoid clashing with the ZooKeeper cluster that the Hadoop cluster on these hosts depends on
clientPort=2191
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
## Metrics Providers
#
# https://prometheus.io Metrics Exporter
#metricsProvider.className=org.apache.zookeeper.metrics.prometheus.PrometheusMetricsProvider
#metricsProvider.httpPort=7000
#metricsProvider.exportJvmInfo=true
# The audit log introduced in newer ZooKeeper versions is disabled by default; it needs to be enabled when starting on Windows
#audit.enable=true
# Internal communication settings for the ZooKeeper cluster; add one line per node
server.1=hadoop01:2999:3999
server.2=hadoop02:2999:3999
server.3=hadoop03:2999:3999
Configure each node's server id; it must match the server.N entries in the zoo.cfg file:
On the hadoop01 node run the following command
echo 1 > /opt/software/zookeeper/data/myid
On the hadoop02 node run the following command
echo 2 > /opt/software/zookeeper/data/myid
On the hadoop03 node run the following command
echo 3 > /opt/software/zookeeper/data/myid
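Because the hostnames follow the uniform hadoop0N pattern, the myid value can also be derived from the hostname rather than typed per node. A sketch (on a real node you would substitute $(hostname) for the fixed string):

```shell
# Derive the ZooKeeper server id from a hadoop0N-style hostname
host=hadoop02                 # stand-in for $(hostname)
server_id=${host#hadoop0}     # strip the "hadoop0" prefix, leaving the digit
echo "$server_id"
```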
Test and verify
First create the cluster start/stop script
vim /opt/software/zookeeper/zk-start-all.sh
The script content is as follows:
Note:
- Starting the ZooKeeper cluster depends on the JDK and uses the JAVA_HOME variable, so install the JDK and set the JAVA environment variables first.
- Without passwordless cluster login configured, every run of the script below will prompt for passwords, so set up passwordless login first.
#!/bin/bash
case $1 in
"start"){
# Iterate over all machines in the cluster
for i in hadoop01 hadoop02 hadoop03
do
# Log to the console
echo =============zookeeper $i start====================
# Start command
ssh $i "/opt/software/zookeeper/bin/zkServer.sh start"
done
}
;;
"stop"){
for i in hadoop01 hadoop02 hadoop03
do
echo =============zookeeper $i stop====================
ssh $i "/opt/software/zookeeper/bin/zkServer.sh stop"
done
}
;;
"status"){
for i in hadoop01 hadoop02 hadoop03
do
echo =============zookeeper $i status====================
ssh $i "/opt/software/zookeeper/bin/zkServer.sh status"
done
}
;;
esac
chmod 755 /opt/software/zookeeper/zk-start-all.sh
My cluster here is already up and in use, so instead of demonstrating startup I will demonstrate the status query, /opt/software/zookeeper/zk-start-all.sh status, which produced the following error:
Fix: on every node, open the /opt/software/zookeeper/bin/zkEnv.sh file and add your own JAVA_HOME path at the very top of the script's code.
Go into the /opt/software/zookeeper directory on hadoop01 and run ./zk-start-all.sh status to check the ZooKeeper cluster state; the result is shown in the figure below. OK, the cluster start/stop script works fine.
The commands to start, stop, and check the zk cluster:
# Start the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh start
# Stop the zookeeper cluster
sh /opt/software/zookeeper/zk-start-all.sh stop
# Query each node's status and role
sh /opt/software/zookeeper/zk-start-all.sh status
Installing MySQL
For MySQL installation, you can refer to my other blog post on installing the MySQL database from a .tar.gz package on a CentOS 7 Linux server.
Cluster deployment
Download DolphinScheduler
Download URL: https://dlcdn.apache.org/dolphinscheduler/3.1.9/apache-dolphinscheduler-3.1.9-bin.tar.gz
Download it directly to a path on the server with wget. If the servers have no internet access, download the binary package locally first, then upload it to every node of the server cluster with an ssh client tool.
Create and configure the account that runs the DolphinScheduler cluster
Create the user that installs and runs the DolphinScheduler cluster
As root, run the command to add a regular user
useradd dolphinscheduler
Set the dolphinscheduler user's password
passwd dolphinscheduler
Give the dolphinscheduler user passwordless sudo privileges
sed -i '$adolphinscheduler ALL=(ALL) NOPASSWD: ALL' /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
Copy the binary package apache-dolphinscheduler-3.1.9-bin.tar.gz to the /opt/packages directory (create it if it does not exist)
Change the owner and group of the apache-dolphinscheduler-3.1.9-bin.tar.gz package to dolphinscheduler
chown -R dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz
Configure passwordless cluster login for the user
Switch to the dolphinscheduler user and configure passwordless login (this only needs to be done on hadoop01)
1) Generate a key pair on the hadoop01 node
ssh-keygen -t rsa
2) On hadoop01, set up passwordless login to all cluster nodes
for ((host_id=1;host_id<=3;host_id++));do ssh-copy-id hadoop0${host_id} ;done
3) Test passwordless login
ssh hadoop01
ssh hadoop02
ssh hadoop03
Database initialization
The default database name used by DolphinScheduler is dolphinscheduler; first create the database, then create an admin user and grant it privileges
create database `dolphinscheduler` DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_general_ci;
-- Create the dolphinscheduler user dedicated to managing the dolphinscheduler database
CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';
-- Grant access to the database
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
-- Make the privilege changes take effect
flush privileges;
Extract the binary package
tar -zxf /opt/packages/apache-dolphinscheduler-3.1.9-bin.tar.gz
mv
Modify the install script and parameter configuration
DolphinScheduler consists mainly of the api-server, master-server, and worker-server services. The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/install_env.sh defines which machines DolphinScheduler will be installed on and which services each machine runs.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ---------------------------------------------------------
# INSTALL MACHINE
# ---------------------------------------------------------
# A comma separated list of machine hostname or IP would be installed DolphinScheduler,
# including master, worker, api, alert. If you want to deploy in pseudo-distributed
# mode, just write a pseudo-distributed hostname
# Example for hostnames: ips="ds1,ds2,ds3,ds4,ds5", Example for IPs: ips="192.168.8.1,192.168.8.2,192.168.8.3,192.168.8.4,192.168.8.5"
#ips=${ips:-"ds1,ds2,ds3,ds4,ds5"}
# Which hosts DolphinScheduler will be installed on; separate multiple hosts with commas
ips="hadoop01,hadoop02,hadoop03"
# Port of SSH protocol, default value is 22. For now we only support same port in all `ips` machine
# modify it if you use different ssh port
sshPort=${sshPort:-"22"}
# A comma separated list of machine hostname or IP would be installed Master server, it
# must be a subset of configuration `ips`.
# Example for hostnames: masters="ds1,ds2", Example for IPs: masters="192.168.8.1,192.168.8.2"
#masters=${masters:-"hadoop01"}
# Which cluster nodes are designated as master nodes; separate multiple hosts with commas
masters="hadoop01,hadoop02"
# A comma separated list of machine <hostname>:<workerGroup> or <IP>:<workerGroup>.All hostname or IP must be a
# subset of configuration `ips`, And workerGroup have default value as `default`, but we recommend you declare behind the hosts
# Example for hostnames: workers="ds1:default,ds2:default,ds3:default", Example for IPs: workers="192.168.8.1:default,192.168.8.2:default,192.168.8.3:default"
#workers=${workers:-"ds1:default,ds2:default,ds3:default,ds4:default,ds5:default"}
# Which cluster nodes are designated as worker nodes; separate multiple hosts with commas, and append ":default" to any node assigned to the default worker group
workers="hadoop02:default,hadoop03:default"
# A comma separated list of machine hostname or IP would be installed Alert server, it
# must be a subset of configuration `ips`.
# Example for hostname: alertServer="ds3", Example for IP: alertServer="192.168.8.3"
#alertServer=${alertServer:-"ds3"}
# Which cluster node is designated as the alert node; separate multiple hosts with commas
alertServer="hadoop03"
# A comma separated list of machine hostname or IP would be installed API server, it
# must be a subset of configuration `ips`.
# Example for hostname: apiServers="ds1", Example for IP: apiServers="192.168.8.1"
#apiServers=${apiServers:-"ds1"}
# Which node the api-server service is installed on
apiServers="hadoop01"
# The directory to install DolphinScheduler for all machine we config above. It will automatically be created by `install.sh` script if not exists.
# Do not set this configuration same as the current path (pwd). Do not add quotes to it if you using related path.
#installPath=${installPath:-"/tmp/dolphinscheduler"}
#installPath="/opt/software/dolphinscheduler"
# Default install path for dolphinscheduler across the cluster: /opt/software/dolphinscheduler
installPath="/opt/software/dolphinscheduler"
# The user to deploy DolphinScheduler for all machine we config above. For now user must create by yourself before running `install.sh`
# script. The user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled than the root directory needs
# to be created by this user
# The deployment user for the dolphinscheduler cluster
deployUser=${deployUser:-"dolphinscheduler"}
# The root of zookeeper, for now DolphinScheduler default registry server is zookeeper.
#zkRoot=${zkRoot:-"/dolphinscheduler"}
# The registry root path for the dolphinscheduler cluster in zookeeper
zkRoot=${zkRoot:-"/dolphinscheduler"}
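The comments in install_env.sh require masters, workers, and alertServer to be subsets of ips. A small pre-flight check along these lines can catch hostname typos before running install.sh (a sketch, not part of the official tooling):

```shell
# Check that every worker host (the part before ":workerGroup") appears in ips
ips="hadoop01,hadoop02,hadoop03"
workers="hadoop02:default,hadoop03:default"
missing=0
for entry in $(echo "$workers" | tr ',' ' '); do
  host=${entry%%:*}            # drop the ":default" worker-group suffix
  case ",$ips," in
    *",$host,"*) echo "$host: ok" ;;
    *)           echo "$host: NOT in ips"; missing=1 ;;
  esac
done
echo "missing=$missing"
```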
The configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/bin/env/dolphinscheduler_env.sh holds DolphinScheduler's database connection settings and the external dependency paths or libraries for the task types DolphinScheduler supports; JAVA_HOME, DATAX_HOME, and SPARK_HOME, for example, are all defined here.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# JAVA_HOME, will use it to start DolphinScheduler server
#export JAVA_HOME=${JAVA_HOME:-/opt/java/openjdk}
#Set the JAVA_HOME variable
export JAVA_HOME=${JAVA_HOME:-/usr/java/jdk1.8.0_181-cloudera}
# Database related configuration, set database type, username and password
#export SPRING_DATASOURCE_URL
#Configure DolphinScheduler's database connection
export SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/dolphinscheduler?useTimezone=true&useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai"
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-GMT+8}
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=dolphinscheduler
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
#Set DolphinScheduler's registry type to zookeeper
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
#export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2191}
#Connection string for the zookeeper registry cluster
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-hadoop01:2191,hadoop02:2191,hadoop03:2191}
# Tasks related configurations, need to change the configuration if you use the related tasks.
#Environment variables for the various task types in DolphinScheduler: set the server install paths of any services your task types may need; it is best to configure these before installing the cluster
#export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
#export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
#export HADOOP_CONF_DIR=etc/hadoop/conf
#export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
#export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
#export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
#export PYTHON_HOME=/opt/soft/python
#export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
#export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
#export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
#export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
#export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
#export SQOOP_HOME=${SQOOP_HOME:-/opt/soft/sqoop}
export PATH=$HADOOP_HOME/bin:$SQOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
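REGISTRY_ZOOKEEPER_CONNECT_STRING is a comma-separated list of host:port pairs; splitting it confirms the three-node quorum configured above:

```shell
# Split the registry connect string into its quorum members
zk_connect="hadoop01:2191,hadoop02:2191,hadoop03:2191"
IFS=','
set -- $zk_connect             # word-split on commas into $1..$n
unset IFS
node_count=$#
first_node=$1
echo "$node_count nodes, first: $first_node"
```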
Disable the Python gateway (enabled by default)
The Python gateway service starts together with api-server by default. If you do not want it started, set python-gateway.enabled: false in the api-server configuration file /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/conf/application.yaml to disable it.
vim ./api-server/conf/application.yaml
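If you would rather flip the flag non-interactively than edit in vim, a sed substitution works. The demo below runs against a minimal sample fragment (assumption: the real application.yaml spells the key as `enabled: true` under `python-gateway:`); on the server you would target the application.yaml path given above.

```shell
# Flip python-gateway.enabled from true to false in a sample YAML fragment
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
python-gateway:
  enabled: true
EOF
sed -i 's/^\(  enabled:\) true/\1 false/' "$cfg"
result=$(grep 'enabled:' "$cfg")
echo "$result"
```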
Run the database initialization script
#Change into the directory containing the database scripts
cd /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/sql/sql
#Restore the database from the SQL backup file
mysql -udolphinscheduler -p dolphinscheduler < dolphinscheduler_mysql.sql
Configure the data source driver files
The MySQL driver must be JDBC Driver 8.0.16 or later. Download mysql-connector-java manually and move it into each DolphinScheduler module's libs directory; there are 5 of them:
/opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs
/opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs
Copy the mysql driver into each module's dependency path
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/api-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/alert-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/master-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/worker-server/libs/
cp /opt/packages/mysql-connector-j-8.0.16.jar /opt/packages/apache-dolphinscheduler-3.1.9-bin/tools/libs/
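The five cp commands can be collapsed into one loop over the module names. The demo below recreates the libs layout under a temporary directory so it is runnable anywhere; on the server the base would be /opt/packages/apache-dolphinscheduler-3.1.9-bin and the jar would be the real driver file:

```shell
# Copy one driver jar into every module's libs directory
base=$(mktemp -d)
jar="$base/mysql-connector-j-8.0.16.jar"
touch "$jar"                                  # stand-in for the real driver
for mod in api-server alert-server master-server worker-server tools; do
  mkdir -p "$base/ds/$mod/libs"
  cp "$jar" "$base/ds/$mod/libs/"
done
copied=$(ls "$base"/ds/*/libs/*.jar | wc -l)
echo "$copied"
```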
Besides MySQL, you may also need drivers for data sources such as SQLServer, Oracle, and Hive; they are integrated the same way as MySQL. Ideally, add all required dependencies to the corresponding modules' libs directories before installing the cluster, so nothing needs handling afterwards, although adding data source dependencies later also works.
If you need any of these database dependencies, message me your email address and I will send them to you.
Run the cluster install
First, change the owner and group of /opt/packages/apache-dolphinscheduler-3.1.9-bin to dolphinscheduler once more
chown -R dolphinscheduler:dolphinscheduler /opt/packages/apache-dolphinscheduler-3.1.9-bin
Switch to the dolphinscheduler user
su - dolphinscheduler
Change into the extracted root directory
cd /opt/packages/apache-dolphinscheduler-3.1.9-bin
Run the cluster install script install.sh
./bin/install.sh
Once the install script finishes, it automatically detects the status of each cluster node
Cluster start/stop test
After installation, the default install directory of the DolphinScheduler services on every node is /opt/software/dolphinscheduler
Before starting, make sure the zookeeper service is running normally, otherwise the cluster cannot start successfully.
On the hadoop01 node, switch to the dolphinscheduler system user
su - dolphinscheduler
Change into the dolphinscheduler install directory
cd /opt/software/dolphinscheduler
Run the common cluster operation commands
#Start the whole cluster with one command
./bin/start-all.sh
#Stop the whole cluster with one command
./bin/stop-all.sh
#Query the whole cluster's status with one command
./bin/status-all.sh
UI address: http://<hadoop01's IP>:12345/dolphinscheduler/ui
Username: admin
Password: dolphinscheduler123
OK, with that, the DolphinScheduler distributed cluster is fully set up.
This post is published with the support of WhaleOps (白鯨開源).