Oozie分散式任務的工作流——郵件篇

来源:http://www.cnblogs.com/xing901022/archive/2016/11/17/6075665.html
-Advertisement-
Play Games

在大數據的當下,各種spark和hadoop的框架層出不窮。各種高端的計算框架,分散式任務如亂花般迷眼。你是否有這種困惑!——有了許多的分散式任務,但是每天需要固定時間跑任務,自己寫個調度,既不穩定,又沒有可靠的通知。 想要瞭解 "Oozie的基礎知識,可以參考這裡" 那麼你應該是在找——Oozie ...


在大數據的當下,各種spark和hadoop的框架層出不窮。各種高端的計算框架,分散式任務如亂花般迷眼。你是否有這種困惑!——有了許多的分散式任務,但是每天需要固定時間跑任務,自己寫個調度,既不穩定,又沒有可靠的通知。

想要瞭解Oozie的基礎知識,可以參考這裡

那麼你應該是在找——Oozie。

Oozie是一款支持分散式任務調度的開源框架,它支持很多的分散式任務,比如map reduce,spark,sqoop,pig甚至shell等等。你可以以各種方式調度它們,把它們組成工作流。每個工作流節點可以串列也可以並行執行。

如果你定義好了一系列的任務,就可以開啟工作流,設置一個coordinator調度器進行定時的調度了。

有了這些工作以後,還需要一個很重要的環節—— 就是郵件提醒。不管是任務執行成功還是失敗,都可以發送郵件提醒。這樣每天晚上收到任務成功的消息,就可以安心睡覺了。

因此,本篇就帶你來看看如何在Oozie中使用Email。

郵箱服務

Email Action

在Oozie中每個工作流的環節都被設計成一個Action,email就是其中的一個Action.

Email action可以在oozie中發送信息,在email action中必須指定接收的地址,主題subject和內容body。在接收地址參數中支持使用逗號分隔,添加多個郵箱地址。

email action是同步執行的,因此必須等到郵件發出後,這個action才算完成,才能執行下一個action。

email action裡面的所有參數都可以使用EL表達式。

語法規則

<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <email xmlns="uri:oozie:email-action:0.2">
            <to>[COMMA-SEPARATED-TO-ADDRESSES]</to>
            <cc>[COMMA-SEPARATED-CC-ADDRESSES]</cc> <!-- cc is optional -->
            <subject>[SUBJECT]</subject>
            <body>[BODY]</body>
            <content_type>[CONTENT-TYPE]</content_type> <!-- content_type is optional -->
            <attachment>[COMMA-SEPARATED-HDFS-FILE-PATHS]</attachment> <!-- attachment is optional -->
        </email>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>

to和cc命令指定了誰來接收郵件。可以通過逗號分隔來指定多個郵箱地址。to是必填項,cc是可選的。

主題subject和正文body用於指定郵件的標題和正文,email-action:0.2支持text/html這種格式的正文,預設是普通的文本"text/plain"

attachment用於在郵件中添加一個hdfs文件的附件,也可以通過逗號分隔符指定多個附件。如果路徑聲明的不全,那麼也會被當做hdfs中的文件。本地文件是不能添加到附件中的。

配置

email action需要在oozie-site.xml中配置SMTP伺服器配置。下麵是需要配置的值:

oozie.email.smtp.host

這個值是SMTP伺服器的地址,預設是loalhost

oozie.email.smtp.port

是SMTP伺服器的埠號,預設是25.

oozie.email.from.address

發送郵件的地址,預設是oozie@localhost

oozie.email.smtp.auth

是否開啟認證,預設不開啟

oozie.email.smtp.username

如果開啟認證,登錄的用戶名,預設是空

oozie.email.smtp.password

如果開啟認證,用戶對應的密碼,預設是空

PS. 在linux可以通過find -name oozie-site.xml在當前目錄下查找。在我們的CDH版本中這個文件在./etc/oozie/conf.dist/oozie-site.xml

樣例

<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="an-email">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>[email protected],[email protected]</to>
            <cc>[email protected]</cc>
            <subject>Email notifications for ${wf:id()}</subject>
            <body>The wf ${wf:id()} successfully completed.</body>
        </email>
        <ok to="myotherjob"/>
        <error to="errorcleanup"/>
    </action>
    ...
</workflow-app>

上面的例子中,郵件發給了bob,the.other.bob以及抄送給will,並指定了郵件的標題和正文以及workflow的id。

附錄

為了更多的瞭解Oozie,這裡直接給出了Oozie相關的重要配置

oozie-site.xml配置

<?xml version="1.0"?>
<configuration>
    <!--oozie-default.xml文件是預設的配置-->
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
        <value>*</value>
    </property>
</configuration>

oozie-defualt.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<configuration>

    <!-- ************************** VERY IMPORTANT  ************************** -->
    <!-- This file is in the Oozie configuration directory only for reference. -->
    <!-- It is not loaded by Oozie, Oozie uses its own privatecopy.            -->
    <!-- ************************** VERY IMPORTANT  ************************** -->

    <property>
        <name>oozie.output.compression.codec</name>
        <value>gz</value>
        <description>
            The name of the compression codec to use.
            where codec class implements the interface org.apache.oozie.compression.CompressionCodec.
            If oozie.compression.codecs is not specified, gz codec implementation is used by default.
        </description>
    </property>

    <property>
        <name>oozie.action.mapreduce.uber.jar.enable</name>
        <value>false</value>
        <description>
            which specify the oozie.mapreduce.uber.jar configuration property will fail.
        </description>
    </property>

    <property>
        <name>oozie.processing.timezone</name>
        <value>UTC</value>
        <description>
            is changed, note that GMT(+/-)#### timezones do not observe DST changes.
        </description>
    </property>

    <!-- Base Oozie URL: <SCHEME>://<HOST>:<PORT>/<CONTEXT> -->

    <property>
        <name>oozie.base.url</name>
        <value>http://localhost:8080/oozie</value>
        <description>
             Base Oozie URL.
        </description>
    </property>

    <!-- Services -->

    <property>
        <name>oozie.system.id</name>
        <value>oozie-${user.name}</value>
        <description>
            The Oozie system ID.
        </description>
    </property>

    <property>
        <name>oozie.systemmode</name>
        <value>NORMAL</value>
        <description>
            System mode for  Oozie at startup.
        </description>
    </property>

    <property>
        <name>oozie.delete.runtime.dir.on.shutdown</name>
        <value>true</value>
        <description>
            If the runtime directory should be kept after Oozie shutdowns down.
        </description>
    </property>

    <property>
        <name>oozie.services</name>
        <value>
            org.apache.oozie.service.SchedulerService,
            org.apache.oozie.service.InstrumentationService,
            org.apache.oozie.service.MemoryLocksService,
            org.apache.oozie.service.UUIDService,
            org.apache.oozie.service.ELService,
            org.apache.oozie.service.AuthorizationService,
            org.apache.oozie.service.UserGroupInformationService,
            org.apache.oozie.service.HadoopAccessorService,
/email
            IMPORTANT: if the StoreServicePasswordService is active, it will reset this value with the
value given in
                       the console.
        </description>
    </property>

    <property>
        <name>oozie.service.JPAService.pool.max.active.conn</name>
        <value>10</value>
        <description>
             Max number of connections.
        </description>
    </property>

   <!-- SchemaService -->

    <property>
        <name>oozie.service.SchemaService.wf.schemas</name>
        <value>
            oozie-workflow-0.1.xsd,oozie-workflow-0.2.xsd,oozie-workflow-0.2.5.xsd,oozie-workflow-0.3.x
sd,oozie-workflow-0.4.xsd,
            oozie-workflow-0.4.5.xsd,oozie-workflow-0.5.xsd,
            shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,
            email-action-0.1.xsd,email-action-0.2.xsd,
            hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,hive-action
-0.6.xsd,
            sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,
            ssh-action-0.1.xsd,ssh-action-0.2.xsd,
            distcp-action-0.1.xsd,distcp-action-0.2.xsd,
            oozie-sla-0.1.xsd,oozie-sla-0.2.xsd,
            hive2-action-0.1.xsd, hive2-action-0.2.xsd,
            spark-action-0.1.xsd,spark-action-0.2.xsd
        </value>
        <description>
            List of schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.wf.ext.schemas</name>
        <value> </value>
        <description>
            List of additional schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.coord.schemas</name>
/email
        <description>
             Base console URL for a workflow job.
        </description>
    </property>


    <!-- ActionService -->

    <property>
        <name>oozie.service.ActionService.executor.classes</name>
        <value>
            org.apache.oozie.action.decision.DecisionActionExecutor,
            org.apache.oozie.action.hadoop.JavaActionExecutor,
            org.apache.oozie.action.hadoop.FsActionExecutor,
            org.apache.oozie.action.hadoop.MapReduceActionExecutor,
            org.apache.oozie.action.hadoop.PigActionExecutor,
            org.apache.oozie.action.hadoop.HiveActionExecutor,
            org.apache.oozie.action.hadoop.ShellActionExecutor,
            org.apache.oozie.action.hadoop.SqoopActionExecutor,
            org.apache.oozie.action.hadoop.DistcpActionExecutor,
            org.apache.oozie.action.hadoop.Hive2ActionExecutor,
            org.apache.oozie.action.ssh.SshActionExecutor,
            org.apache.oozie.action.oozie.SubWorkflowActionExecutor,
            org.apache.oozie.action.email.EmailActionExecutor,
            org.apache.oozie.action.hadoop.SparkActionExecutor
        </value>
        <description>
            List of ActionExecutors classes (separated by commas).
            Only action types with associated executors can be used in workflows.
        </description>
    </property>

    <property>
        <name>oozie.service.ActionService.executor.ext.classes</name>
        <value> </value>
        <description>
            List of ActionExecutors extension classes (separated by commas). Only action types with ass
ociated
            executors can be used in workflows. This property is a convenience property to add extensio
ns to the built
            in executors without having to include all the built in ones.
        </description>
    </property>

    <!-- ActionCheckerService -->

    <property>
        <name>oozie.service.ActionCheckerService.action.check.interval</name>
/email
        <description>
            Comma separated AUTHORITY=SPARK_CONF_DIR, where AUTHORITY is the HOST:PORT of
            the ResourceManager of a YARN cluster. The wildcard '*' configuration is
            used when there is no exact match for an authority. The SPARK_CONF_DIR contains
            the relevant spark-defaults.conf properties file. If the path is relative is looked within
            the Oozie configuration directory; though the path can be absolute.  This is only used
            when the Spark master is set to either "yarn-client" or "yarn-cluster".
        </description>
    </property>

    <property>
        <name>oozie.service.SparkConfigurationService.spark.configurations.ignore.spark.yarn.jar</name>
        <value>true</value>
        <description>
            If true, Oozie will ignore the "spark.yarn.jar" property from any Spark configurations spec
ified in
            oozie.service.SparkConfigurationService.spark.configurations.  If false, Oozie will not ign
ore it.  It is recommended
            to leave this as true because it can interfere with the jars in the Spark sharelib.
        </description>
    </property>

    <property>
        <name>oozie.email.attachment.enabled</name>
        <value>true</value>
        <description>
            This value determines whether to support email attachment of a file on HDFS.
            Set it false if there is any security concern.
        </description>
    </property>

    <property>
        <name>oozie.actions.default.name-node</name>
        <value> </value>
        <description>
            The default value to use for the &lt;name-node&gt; element in applicable action types.  Thi
s value will be used when
            neither the action itself nor the global section specifies a &lt;name-node&gt;.  As expecte
d, it should be of the form
            "hdfs://HOST:PORT".
        </description>
    </property>

    <property>
        <name>oozie.actions.default.job-tracker</name>
        <value> </value>
        <description>
@                                                                                                      
search hit BOTTOM, continuing at TOP
            IMPORTANT: if the StoreServicePasswordService is active, it will reset this value with the
value given in
                       the console.
        </description>
    </property>

    <property>
        <name>oozie.service.JPAService.pool.max.active.conn</name>
        <value>10</value>
        <description>
             Max number of connections.
        </description>
    </property>

   <!-- SchemaService -->

    <property>
        <name>oozie.service.SchemaService.wf.schemas</name>
        <value>
            oozie-workflow-0.1.xsd,oozie-workflow-0.2.xsd,oozie-workflow-0.2.5.xsd,oozie-workflow-0.3.x
sd,oozie-workflow-0.4.xsd,
            oozie-workflow-0.4.5.xsd,oozie-workflow-0.5.xsd,
            shell-action-0.1.xsd,shell-action-0.2.xsd,shell-action-0.3.xsd,
            email-action-0.1.xsd,email-action-0.2.xsd,
            hive-action-0.2.xsd,hive-action-0.3.xsd,hive-action-0.4.xsd,hive-action-0.5.xsd,hive-action
-0.6.xsd,
            sqoop-action-0.2.xsd,sqoop-action-0.3.xsd,sqoop-action-0.4.xsd,
            ssh-action-0.1.xsd,ssh-action-0.2.xsd,
            distcp-action-0.1.xsd,distcp-action-0.2.xsd,
            oozie-sla-0.1.xsd,oozie-sla-0.2.xsd,
            hive2-action-0.1.xsd, hive2-action-0.2.xsd,
            spark-action-0.1.xsd,spark-action-0.2.xsd
        </value>
        <description>
            List of schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.wf.ext.schemas</name>
        <value> </value>
        <description>
            List of additional schemas for workflows (separated by commas).
        </description>
    </property>

    <property>
        <name>oozie.service.SchemaService.coord.schemas</name>
/email
        <description>
             Base console URL for a workflow job.
        </description>
    </property>


    <!-- ActionService -->

    <property>
        <name>oozie.service.ActionService.executor.classes</name>
        <value>
            org.apache.oozie.action.decision.DecisionActionExecutor,
            org.apache.oozie.action.hadoop.JavaActionExecutor,
            org.apache.oozie.action.hadoop.FsActionExecutor,
            org.apache.oozie.action.hadoop.MapReduceActionExecutor,
            org.apache.oozie.action.hadoop.PigActionExecutor,
            org.apache.oozie.action.hadoop.HiveActionExecutor,
            org.apache.oozie.action.hadoop.ShellActionExecutor,
            org.apache.oozie.action.hadoop.SqoopActionExecutor,
            org.apache.oozie.action.hadoop.DistcpActionExecutor,
            org.apache.oozie.action.hadoop.Hive2ActionExecutor,
            org.apache.oozie.action.ssh.SshActionExecutor,
            org.apache.oozie.action.oozie.SubWorkflowActionExecutor,
            org.apache.oozie.action.email.EmailActionExecutor,
            org.apache.oozie.action.hadoop.SparkActionExecutor
        </value>
        <description>
            List of ActionExecutors classes (separated by commas).
            Only action types with associated executors can be used in workflows.
        </description>
    </property>

    <property>
        <name>oozie.service.ActionService.executor.ext.classes</name>
        <value> </value>
        <description>
            List of ActionExecutors extension classes (separated by commas). Only action types with ass
ociated
            executors can be used in workflows. This property is a convenience property to add extensio
ns to the built
            in executors without having to include all the built in ones.
        </description>
    </property>

    <!-- ActionCheckerService -->

    <property>
        <name>oozie.service.ActionCheckerService.action.check.interval</name>
/email
        <description>
            used when there is no exact match for an authority. The SPARK_CONF_DIR contains
            the relevant spark-defaults.conf properties file. If the path is relative is looked within
            the Oozie configuration directory; though the path can be absolute.  This is only used
            when the Spark master is set to either "yarn-client" or "yarn-cluster".
        </description>
    </property>

    <property>
        <name>oozie.service.SparkConfigurationService.spark.configurations.ignore.spark.yarn.jar</name>
        <value>true</value>
        <description>
            If true, Oozie will ignore the "spark.yarn.jar" property from any Spark configurations spec
ified in
            oozie.service.SparkConfigurationService.spark.configurations.  If false, Oozie will not ign
ore it.  It is recommended
            to leave this as true because it can interfere with the jars in the Spark sharelib.
        </description>
    </property>

    <property>
        <name>oozie.email.attachment.enabled</name>
        <value>true</value>
        <description>
            This value determines whether to support email attachment of a file on HDFS.
            Set it false if there is any security concern.
        </description>
    </property>

    <property>
        <name>oozie.actions.default.name-node</name>
        <value> </value>
        <description>
            The default value to use for the &lt;name-node&gt; element in applicable action types.  Thi
s value will be used when
            neither the action itself nor the global section specifies a &lt;name-node&gt;.  As expecte
d, it should be of the form
            "hdfs://HOST:PORT".
        </description>
    </property>

    <property>
        <name>oozie.actions.default.job-tracker</name>
        <value> </value>
        <description>
            The default value to use for the &lt;job-tracker&gt; element in applicable action types.  T
his value will be used when
            neither the action itself nor the global section specifies a &lt;job-tracker&gt;.  As expec
ted, it should be of the form
            "HOST:PORT".
        </description>
    </property>

</configuration>

您的分享是我們最大的動力!

-Advertisement-
Play Games
更多相關文章
  • Redis Desktop Manager是Redis圖形化管理工具,方便管理人員更方便直觀地管理Redis數據。 然而在使用Redis Desktop Manager之前,有幾個要素需要註意: 一、註釋redis.conf文件中的:bind 127.0.0.1(在一段文字之前打#號為註釋) 二、設 ...
  • mysql 官方客戶端 MySQL-Workbench 下載鏈接 http://dev.mysql.com/downloads/workbench/ 具體安裝步驟就不寫了,直接一直下一步就可以了。 下麵說一下基礎操作: 登錄成功後,界面如下所示。其中,區域1顯示的是資料庫伺服器中已經創建的資料庫列表 ...
  • 前面的話 由於編碼錯誤,造成的資料庫中文識別成亂碼或問號的問題非常常見,本文將詳細說明解決辦法 配置文件 解決中文識別問題的第一步是修改mysql的配置文件my.ini 在[client]下添加 在[mysqld]下添加 然後重新啟動服務 資料庫編碼 首先,新建一個資料庫 通過下列代碼查看資料庫的編 ...
  • 由於SQL優化優化起來比較複雜,並且還受環境限制,在開發過程中,寫SQL必須遵循以下幾點原則: 1.Oracle 採用自下而上的順序解析WHERE子句,根據這個原理,表之間的連接必須寫在其他Where條件之前,那些可以過濾掉最大數量記錄的條件必須寫在Where子句的末尾. 2.Select 語句避免 ...
  • 最近一直在跟Oracle打交道,從最初的一臉懵逼到現在的略有所知,也來總結一下自己最近所學,不定時更新ing… 一:什麼是Oracle執行計劃? 執行計劃是一條查詢語句在Oracle中的執行過程或訪問路徑的描述 二:怎樣查看Oracle執行計劃? 因為我一直用的PLSQL遠程連接的公司資料庫,所以這 ...
  • SQL server基礎知識 一、基礎知識 (1)、存儲結構:資料庫->表->數據 (2)、管理資料庫 增加:create database 資料庫名稱 刪除:drop database 資料庫名稱 查詢:select name from master..sysdatabases 修改:alter ...
  • [1]插入記錄 [2]更新記錄 [3]刪除記錄 [4]查詢表達式 [5]結果處理 ...
  • 環境:CentOS6.8 Mongodb3.2.10 啟動 啟動mongoDB伺服器 # service mongod start 啟動mongoDB客戶端 # mongo 該客戶端是一個JavaScript shell,我們可以直接在裡面寫js的代碼,並執行。該客戶端在安裝mongoDB時是可選的... ...
一周排行
    -Advertisement-
    Play Games
  • 移動開發(一):使用.NET MAUI開發第一個安卓APP 對於工作多年的C#程式員來說,近來想嘗試開發一款安卓APP,考慮了很久最終選擇使用.NET MAUI這個微軟官方的框架來嘗試體驗開發安卓APP,畢竟是使用Visual Studio開發工具,使用起來也比較的順手,結合微軟官方的教程進行了安卓 ...
  • 前言 QuestPDF 是一個開源 .NET 庫,用於生成 PDF 文檔。使用了C# Fluent API方式可簡化開發、減少錯誤並提高工作效率。利用它可以輕鬆生成 PDF 報告、發票、導出文件等。 項目介紹 QuestPDF 是一個革命性的開源 .NET 庫,它徹底改變了我們生成 PDF 文檔的方 ...
  • 項目地址 項目後端地址: https://github.com/ZyPLJ/ZYTteeHole 項目前端頁面地址: ZyPLJ/TreeHoleVue (github.com) https://github.com/ZyPLJ/TreeHoleVue 目前項目測試訪問地址: http://tree ...
  • 話不多說,直接開乾 一.下載 1.官方鏈接下載: https://www.microsoft.com/zh-cn/sql-server/sql-server-downloads 2.在下載目錄中找到下麵這個小的安裝包 SQL2022-SSEI-Dev.exe,運行開始下載SQL server; 二. ...
  • 前言 隨著物聯網(IoT)技術的迅猛發展,MQTT(消息隊列遙測傳輸)協議憑藉其輕量級和高效性,已成為眾多物聯網應用的首選通信標準。 MQTTnet 作為一個高性能的 .NET 開源庫,為 .NET 平臺上的 MQTT 客戶端與伺服器開發提供了強大的支持。 本文將全面介紹 MQTTnet 的核心功能 ...
  • Serilog支持多種接收器用於日誌存儲,增強器用於添加屬性,LogContext管理動態屬性,支持多種輸出格式包括純文本、JSON及ExpressionTemplate。還提供了自定義格式化選項,適用於不同需求。 ...
  • 目錄簡介獲取 HTML 文檔解析 HTML 文檔測試參考文章 簡介 動態內容網站使用 JavaScript 腳本動態檢索和渲染數據,爬取信息時需要模擬瀏覽器行為,否則獲取到的源碼基本是空的。 本文使用的爬取步驟如下: 使用 Selenium 獲取渲染後的 HTML 文檔 使用 HtmlAgility ...
  • 1.前言 什麼是熱更新 游戲或者軟體更新時,無需重新下載客戶端進行安裝,而是在應用程式啟動的情況下,在內部進行資源或者代碼更新 Unity目前常用熱更新解決方案 HybridCLR,Xlua,ILRuntime等 Unity目前常用資源管理解決方案 AssetBundles,Addressable, ...
  • 本文章主要是在C# ASP.NET Core Web API框架實現向手機發送驗證碼簡訊功能。這裡我選擇是一個互億無線簡訊驗證碼平臺,其實像阿裡雲,騰訊雲上面也可以。 首先我們先去 互億無線 https://www.ihuyi.com/api/sms.html 去註冊一個賬號 註冊完成賬號後,它會送 ...
  • 通過以下方式可以高效,並保證數據同步的可靠性 1.API設計 使用RESTful設計,確保API端點明確,並使用適當的HTTP方法(如POST用於創建,PUT用於更新)。 設計清晰的請求和響應模型,以確保客戶端能夠理解預期格式。 2.數據驗證 在伺服器端進行嚴格的數據驗證,確保接收到的數據符合預期格 ...