Apache DolphinScheduler 使用文档(4/8):软件部署
阅读原文时间:2023年07月10日阅读:4

本文章经授权转载,原文链接:

https://blog.csdn.net/MiaoSO/article/details/104770720

目录

4. 软件部署

  • 4.1 为 dolphinscheduler 创建 Mysql 数据库

  • 4.2 解压 dolphinscheduler 安装包

  • * 4.2.1 dolphinscheduler-backend

    • 4.2.2 dolphinscheduler-ui
  • 4.3 dolphinscheduler-backend 部署

  • * 4.3.1 数据库配置

    • 4.3.2 初始化数据库

    • 4.3.3 修改环境变量配置

    • 4.3.4 修改集群部署配置

    • 4.3.5 添加 Hadoop 配置文件

    • 4.3.6 一键部署

    • 4.3.7 指令

    • 4.3.8 数据库升级(略)

  • 4.4 dolphinscheduler-ui 部署

  • * 4.4.6.1 CentOS7 安装 Nginx

    • 4.4.6.2 Nginx 指令

    • 4.4.1 dolphinscheduler-ui 部署说明

    • 4.4.2 自动部署

    • 4.4.3 手动部署

    • 4.4.4 修改上传文件大小限制

    • 4.4.5 dolphinscheduler 首次登录

    • 4.4.6 Nginx 相关


4.1 为 dolphinscheduler 创建 Mysql 数据库

CREATE DATABASE dolphinscheduler DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dscheduler'@'10.10.7.%' IDENTIFIED BY 'Ds@12345';
#GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dscheduler'@'10.158.1.%' IDENTIFIED BY 'Ds@12345';
#drop user dscheduler@'%';
flush privileges;

4.2 解压 dolphinscheduler 安装包

4.2.1 dolphinscheduler-backend

cd /opt/dolphinscheduler && tar -zxf apache-dolphinscheduler-incubating-1.2.0-dolphinscheduler-backend-bin.tar.gz
ln -s apache-dolphinscheduler-incubating-1.2.0-dolphinscheduler-backend-bin dolphinscheduler-backend

# 目录介绍
cd dolphinscheduler-backend && tree -L 1
.
├── bin           # 基础服务启动脚本
├── conf          # 项目配置文件
├── DISCLAIMER-WIP# DISCLAIMER文件
├── install.sh    # 一键部署脚本
├── lib           # 项目依赖jar包,包括各个模块jar和第三方jar
├── LICENSE       # LICENSE文件
├── licenses      # 运行时license
├── NOTICE        # NOTICE文件
├── script        # 集群启动、停止和服务监控启停脚本
└── sql           # 项目依赖sql文件

4.2.2 dolphinscheduler-ui

cd /opt/dolphinscheduler && tar -zxf apache-dolphinscheduler-incubating-1.2.0-dolphinscheduler-front-bin.tar.gz
ln -s apache-dolphinscheduler-incubating-1.2.0-dolphinscheduler-front-bin dolphinscheduler-front

###

4.3 dolphinscheduler-backend 部署

4.3.1 数据库配置

1.修改配置文件

vim /opt/dolphinscheduler/dolphinscheduler-backend/conf/application-dao.properties
# postgre
#spring.datasource.driver-class-name=org.postgresql.Driver
#spring.datasource.url=jdbc:postgresql://192.168.xx.xx:5432/dolphinscheduler
# mysql
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://10.10.7.209:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
spring.datasource.username=dscheduler
spring.datasource.password=Ds@12345
  1. 添加 mysql 驱动

    cp /usr/share/java/mysql-connector-java.jar /opt/dolphinscheduler/dolphinscheduler-backend/lib

    cd /opt/dolphinscheduler && wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
    tar zxvf mysql-connector-java-5.1.46.tar.gz
    cp mysql-connector-java-5.1.46/mysql-connector-java-5.1.46-bin.jar /opt/dolphinscheduler/dolphinscheduler-backend/lib

4.3.2 初始化数据库

sh /opt/dolphinscheduler/dolphinscheduler-backend/script/create-dolphinscheduler.sh
# create dolphinscheduler success -> 表示数据库初始化成功

4.3.3 修改环境变量配置

vim /opt/dolphinscheduler/dolphinscheduler-backend/conf/env/.dolphinscheduler_env.sh

# ==========
# CDH 版
# ==========
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop
export SPARK_HOME1=/opt/cloudera/parcels/CDH/lib/spark
export SPARK_HOME2=/opt/cloudera/parcels/CDH/lib/spark
export PYTHON_HOME=/usr/bin/python
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export FLINK_HOME=/opt/soft/flink
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH:$FLINK_HOME/bin:$PATH

4.3.4 修改集群部署配置

cp /opt/dolphinscheduler/dolphinscheduler-backend/install.sh /opt/dolphinscheduler/dolphinscheduler-backend/install.sh_b
vim /opt/dolphinscheduler/dolphinscheduler-backend/install.sh

# 注:以下参数仅为核心部分配置,并未包含 install.sh 脚本全部内容
......................................................
source ${workDir}/conf/config/run_config.conf
source ${workDir}/conf/config/install_config.conf

# 1. 数据库配置
# ${installPath}/conf/quartz.properties
#dbtype="postgresql"
dbtype="mysql"
dbhost="10.10.7.209"
dbname="dolphinscheduler"
username="dscheduler"
# Note: if there are special characters, please use the \ transfer character to transfer
passowrd="Ds@12345"

# 2. 集群部署环境配置
# ${installPath}/conf/config/install_config.conf
installPath="/opt/dolphinscheduler/dolphinscheduler-agent"
# deployment user
# Note: the deployment user needs to have sudo privileges and permissions to operate hdfs. If hdfs is enabled, the root directory needs to be created by itself
deployUser="dscheduler"
# zk cluster
zkQuorum="test01:2181,test02:2181,test03:2181"
# install hosts
ips="test01,test02,test03"

# 3. 各节点服务配置
# ${installPath}/conf/config/run_config.conf
# run master machine
masters="test02,test03"
# run worker machine
workers="test01,test02,test03"
# run alert machine
alertServer="test03"
# run api machine
apiServers="test03"

# 4. alert 配置
# ${installPath}/conf/alert.properties
# 若公司未开启 SSL 服务,可设置: mailServerPort="25" ; starttlsEnable="false" ; sslEnable="false"
# mail protocol
mailProtocol="SMTP"
# mail server host
mailServerHost="smtp.sohh.cn"
# mail server port
mailServerPort="465"
# sender
mailSender="dashuju@sohh.cn"
# user
mailUser="dashuju@sohh.cn"
# sender password
mailPassword="dashuju@123"
# TLS mail protocol support
starttlsEnable="false"
sslTrust="*"
# SSL mail protocol support
# note: The SSL protocol is enabled by default.
# only one of TLS and SSL can be in the true state.
sslEnable="true"
# download excel path
xlsFilePath="/tmp/xls"
# Enterprise WeChat Enterprise ID Configuration
enterpriseWechatCorpId="xxxxxxxxxx"
# Enterprise WeChat application Secret configuration
enterpriseWechatSecret="xxxxxxxxxx"
# Enterprise WeChat Application AgentId Configuration
enterpriseWechatAgentId="xxxxxxxxxx"
# Enterprise WeChat user configuration, multiple users to , split
enterpriseWechatUsers="xxxxx,xxxxx"
# alert port
alertPort=7789

# 5. 开启监控自启动脚本
# 控制是否启动自启动脚本(监控master,worker状态,如果掉线会自动启动)
# whether to start monitoring self-starting scripts
monitorServerState="true"

# 6. 资源中心配置
# ${installPath}/conf/common/ 中
# resource Center upload and select storage method:HDFS,S3,NONE
resUploadStartupType="HDFS"
# if resUploadStartupType is HDFS,defaultFS write namenode address,HA you need to put core-site.xml and hdfs-site.xml in the conf directory.
# if S3,write S3 address,HA,for example :s3a://dolphinscheduler,
# Note,s3 be sure to create the root directory /dolphinscheduler
defaultFS="hdfs://stcluster:8020"

# if S3 is configured, the following configuration is required.
s3Endpoint="http://192.168.xx.xx:9010"
s3AccessKey="xxxxxxxxxx"
s3SecretKey="xxxxxxxxxx"

# resourcemanager HA configuration, if it is a single resourcemanager, here is yarnHaIps=""
yarnHaIps="test03,test02"
# if it is a single resourcemanager, you only need to configure one host name. If it is resourcemanager HA, the default configuration is fine.
singleYarnIp="ark1"

# hdfs root path, the owner of the root path must be the deployment user.
# versions prior to 1.1.0 do not automatically create the hdfs root directory, you need to create it yourself.
hdfsPath="/dolphinscheduler"
# have users who create directory permissions under hdfs root path /
# Note: if kerberos is enabled, hdfsRootUser="" can be used directly.
hdfsRootUser="hdfs"

# 7. common 配置
# ${installPath}/conf/common/common.properties 中
# common config
# Program root path
programPath="/tmp/dolphinscheduler"
# download path
downloadPath="/tmp/dolphinscheduler/download"
# task execute path
execPath="/tmp/dolphinscheduler/exec"
# SHELL environmental variable path
shellEnvPath="$installPath/conf/env/.dolphinscheduler_env.sh"
# suffix of the resource file
resSuffixs="txt,log,sh,conf,cfg,py,java,sql,hql,xml"
# development status, if true, for the SHELL script, you can view the encapsulated SHELL script in the execPath directory.
# If it is false, execute the direct delete
devState="true"
# kerberos config
# kerberos whether to start
kerberosStartUp="false"
# kdc krb5 config file path
krb5ConfPath="$installPath/conf/krb5.conf"
# keytab username
keytabUserName="hdfs-mycluster@ESZ.COM"
# username keytab path
keytabPath="$installPath/conf/hdfs.headless.keytab"

# 8. zk 配置
# ${installPath}/conf/zookeeper.properties
# zk config
# zk root directory
zkRoot="/dolphinscheduler"
# used to record the zk directory of the hanging machine
zkDeadServers="$zkRoot/dead-servers"
# masters directory
zkMasters="$zkRoot/masters"
# workers directory
zkWorkers="$zkRoot/workers"
# zk master distributed lock
mastersLock="$zkRoot/lock/masters"
# zk worker distributed lock
workersLock="$zkRoot/lock/workers"
# zk master fault-tolerant distributed lock
mastersFailover="$zkRoot/lock/failover/masters"
# zk worker fault-tolerant distributed lock
workersFailover="$zkRoot/lock/failover/workers"
# zk master start fault tolerant distributed lock
mastersStartupFailover="$zkRoot/lock/failover/startup-masters"
# zk session timeout
zkSessionTimeout="300"
# zk connection timeout
zkConnectionTimeout="300"
# zk retry interval
zkRetrySleep="100"
# zk retry maximum number of times
zkRetryMaxtime="5"

# 9. master config
# ${installPath}/conf/master.properties
# master execution thread maximum number, maximum parallelism of process instance
masterExecThreads="100"
# the maximum number of master task execution threads, the maximum degree of parallelism for each process instance
masterExecTaskNum="20"
# master heartbeat interval
masterHeartbeatInterval="10"
# master task submission retries
masterTaskCommitRetryTimes="5"
# master task submission retry interval
masterTaskCommitInterval="100"
# master maximum cpu average load, used to determine whether the master has execution capability
masterMaxCpuLoadAvg="10"
# master reserve memory to determine if the master has execution capability
masterReservedMemory="1"
# master port
masterPort=5566

# 10. worker config
# ${installPath}/conf/worker.properties
# worker execution thread
workerExecThreads="100"
# worker heartbeat interval
workerHeartbeatInterval="10"
# worker number of fetch tasks
workerFetchTaskNum="3"
# worker reserve memory to determine if the master has execution capability
workerReservedMemory="1"
# master port
workerPort=7788

# 11. api config
# ${installPath}/conf/application.properties
# api server port
apiServerPort="12345"
# api session timeout
apiServerSessionTimeout="7200"
# api server context path
apiServerContextPath="/dolphinscheduler/"
# spring max file size
springMaxFileSize="1024MB"
# spring max request size
springMaxRequestSize="1024MB"
# api max http post size
apiMaxHttpPostSize="5000000"

# 1,replace file
echo "1,replace file"
......................................................

4.3.5 添加 Hadoop 配置文件

# 若 install.sh 中,resUploadStartupType 为 HDFS,且配置为 HA,则需拷贝 hadoop 配置文件到 conf 目录下
cp /etc/hadoop/conf.cloudera.yarn/hdfs-site.xml /opt/dolphinscheduler/dolphinscheduler-backend/conf/
cp /etc/hadoop/conf.cloudera.yarn/core-site.xml /opt/dolphinscheduler/dolphinscheduler-backend/conf/

# 若需要修改 hadoop 配置文件,则需拷贝 hadoop 配置文件到 $installPath/conf 目录下,并重启 api-server 服务
#cp /etc/hadoop/conf.cloudera.yarn/hdfs-site.xml /opt/dolphinscheduler/dolphinscheduler-agent/conf/
#cp /etc/hadoop/conf.cloudera.yarn/core-site.xml /opt/dolphinscheduler/dolphinscheduler-agent/conf/
#sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh start api-server
#sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh stop api-server

4.3.6 一键部署

执行脚本部署并启动

sh /opt/dolphinscheduler/dolphinscheduler-backend/install.sh

查看日志

tree /opt/dolphinscheduler/dolphinscheduler/logs
-------------------------------------------------
/opt/DolphinScheduler/dolphinscheduler/logs
├── dolphinscheduler-alert.log
├── dolphinscheduler-alert-server-node-b.test.com.out
├── dolphinscheduler-alert-server.pid
├── dolphinscheduler-api-server-node-b.test.com.out
├── dolphinscheduler-api-server.log
├── dolphinscheduler-api-server.pid
├── dolphinscheduler-logger-server-node-b.test.com.out
├── dolphinscheduler-logger-server.pid
├── dolphinscheduler-master.log
├── dolphinscheduler-master-server-node-b.test.com.out
├── dolphinscheduler-master-server.pid
├── dolphinscheduler-worker.log
├── dolphinscheduler-worker-server-node-b.test.com.out
├── dolphinscheduler-worker-server.pid
└── {processDefinitionId}
    └── {processInstanceId}
        └── {taskInstanceId}.log

查看Java进程

jps
8138 MasterServer              # master服务
8165 WorkerServer              # worker服务
8206 LoggerServer              # logger服务
8240 AlertServer               # alert服务
8274 ApiApplicationServer      # api服务

Worker 启动失败

less /opt/dolphinscheduler/dolphinscheduler-agent/logs/dolphinscheduler-worker-server-test01.out
nohup: 无法运行命令"/bin/java": 没有那个文件或目录

解决方法:创建 java 软链
cd /usr/bin/ && sudo ln -s /usr/java/jdk1.8.0_181-cloudera/bin/java /usr/bin/java

4.3.7 指令

# 一键部署(含暂停、重发安装包、启动等操作)
sh /opt/dolphinscheduler/dolphinscheduler-backend/install.sh

# 一键启停集群所有服务
sh /opt/dolphinscheduler/dolphinscheduler-backend/bin/start-all.sh
sh /opt/dolphinscheduler/dolphinscheduler-backend/bin/stop-all.sh
或
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/start-all.sh
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/stop-all.sh

# 启停 Master
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh start master-server
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh stop master-server

# 启停 Worker
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh start worker-server
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh stop worker-server

# 启停 Api
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh start api-server
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh stop api-server

# 启停 Logger
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh start logger-server
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh stop logger-server

# 启停Alert
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh start alert-server
sh /opt/dolphinscheduler/dolphinscheduler-agent/bin/dolphinscheduler-daemon.sh stop alert-server

4.3.8 数据库升级(略)

# 数据库升级是在1.0.2版本增加的功能,执行以下命令即可自动升级数据库
sh /opt/dolphinscheduler/dolphinscheduler-agent/script/upgrade_dolphinscheduler.sh

4.4 dolphinscheduler-ui 部署

4.4.1 dolphinscheduler-ui 部署说明

在部署 ApiApplicationServer 的服务器上部署 UI 服务。
前端部署分自动和手动两种方式:

  • 自动部署脚本会用 yum 安装 Nginx,通过引导设置后的 Nginx 配置文件为 /etc/nginx/conf.d/dolphinscheduler.conf

  • 如果本地已经存在 Nginx,则需手动部署,创建 Nginx 配置文件 /etc/nginx/conf.d/dolphinscheduler.conf

4.4.2 自动部署

sudo sh /opt/dolphinscheduler/dolphinscheduler-front/install-dolphinscheduler-ui.sh

············
请输入nginx代理端口,不输入,则默认8888 :8886
请输入api server代理ip,必须输入,例如:192.168.xx.xx :10.10.7.209
请输入api server代理端口,不输入,则默认12345 :12345
=================================================
1.CentOS6安装
2.CentOS7安装
3.Ubuntu安装
4.退出
=================================================
请输入安装编号(1|2|3|4):2
············
Complete!
port option is needed for add
FirewallD is not running
setenforce: SELinux is disabled
请浏览器访问:http://10.10.7.209:8886

4.4.3 手动部署

vim /etc/nginx/conf.d/dolphinscheduler.conf

server {
    listen       8886;# access port
    server_name  localhost;
    #charset koi8-r;
    #access_log  /var/log/nginx/host.access.log  main;
    location / {
    root   /opt/dolphinscheduler/dolphinscheduler-front/dist; # static file directory
    index  index.html index.html;
    }
    location /dolphinscheduler {
    proxy_pass http://10.10.7.209:12345; # interface address
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header x_real_ipP $remote_addr;
    proxy_set_header remote_addr $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_http_version 1.1;
    proxy_connect_timeout 300s;
    proxy_read_timeout 300s;
    proxy_send_timeout 300s;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection upgrade;
    }
    #error_page  404              /404.html;
    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
    root   /usr/share/nginx/html;
    }
}

4.4.4 修改上传文件大小限制

sudo vim /etc/nginx/nginx.conf

# 在 http 内加入
client_max_body_size 1024m;

重启 nginx 服务

systemctl restart nginx

4.4.5 dolphinscheduler 首次登录

访问 http://10.10.7.209:8886
初始用户:admin
初始密码:dolphinscheduler123
注:若访问网址提示 404,则删除 /etc/nginx/conf.d/default.conf 文件

4.4.6 Nginx 相关

4.4.6.1 CentOS7 安装 Nginx
rpm -Uvh http://nginx.org/packages/centos/7/noarch/RPMS/nginx-release-centos-7-0.el7.ngx.noarch.rpm
yum install nginx
systemctl start nginx.service
4.4.6.2 Nginx 指令
# 启动
systemctl start nginx
# 重启
systemctl restart nginx
# 状态
systemctl status nginx
# 停止
systemctl stop nginx

文章目录:

DS 1.2.0 使用文档(1/8):架构及名词解释
DS 1.2.0 使用文档(2-3/8):集群规划及环境准备
DS 1.2.0 使用文档(4/8):软件部署
DS 1.2.0 使用文档(5/8):使用与测试
DS 1.2.0 使用文档(6/8):任务节点类型与任务参数设置
DS 1.2.0 使用文档(7/8):系统参数及自定义参数
DS 1.2.0 使用文档(8/8):附录