Big Data Hadoop - Setting Up a Local Flink Development Environment (Windows 10)

Source: https://www.cnblogs.com/liugp/archive/2022/05/08/16245546.html




I. Download and Install IDEA

IDEA 2020.2.3: https://www.cnblogs.com/liugp/p/13868346.html
For installation of the latest version, see: https://www.jb51.net/article/196349.htm

II. Set Up a Local Hadoop Environment (Windows 10)

See my earlier article: Big Data Hadoop - Deploying a Hadoop + Hive Environment (Windows 10).
You can of course also deploy on a Linux system and connect remotely; see these two articles:
Big Data Hadoop - Concepts + Installation + Hands-on Practice (HDFS + YARN + MapReduce)
Big Data Hadoop - The Hive Data Warehouse

III. Install Maven

See my earlier article: Java Maven Explained.

IV. Create the Project and Module

1) Create a Maven project

Because I have created this project before, the name shows up in red here.

Delete the auto-generated src directory; from now on the project is managed through modules, since a project usually contains many of them.

2) Create the flink module


Directory structure: create any directories that are missing.

Set the directory attributes (mark each directory as Sources, Resources, and so on).

Since I had created a project before, I create a new one here for the demo: bigdata-test2023.

V. Configure the IDEA Environment (Scala)

1) Download and install the Scala plugin

File -> Settings

Out of the box, IntelliJ IDEA cannot develop Scala programs, but it can after some configuration. I have already installed the plugin; if you have not, just click here to install it.

2) Configure the Scala SDK for the module or globally




After the Scala support is added, you can create Scala programs.

3) Create a Scala program

Create an Object

[Tip] A class is only compiled; it cannot be executed directly. Only an object with a main method can be run.
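
A minimal illustration (the names HelloScala and Greeter below are made up for this example): an object with a main method can be run directly from IDEA, while a plain class only gets compiled and has to be used from somewhere else.

object HelloScala {
  def main(args: Array[String]): Unit = {
    // runnable entry point: right-click and choose Run in IDEA
    println(new Greeter().greet("flink"))
  }
}

// a plain class: it compiles, but has no entry point of its own
class Greeter {
  def greet(name: String): String = s"hello, $name"
}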

4) DataStream API configuration

1. Maven configuration

Add the following to the pom.xml in the flink module directory:

[Tip] The Scala version here must match the plugin version configured above.

<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-scala_2.12</artifactId>
	<version>1.14.3</version>
	<scope>provided</scope>
</dependency>

<!-- https://mvnrepository.com/artifact/org.apache.flink/flink-streaming-scala -->
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-streaming-scala_2.12</artifactId>
	<version>1.14.3</version>
	<scope>provided</scope>
</dependency>


[Problem] When running the application from IDEA in a Maven project, dependencies with provided scope are not loaded, which causes errors at startup.
[Cause] On Run Application, IDEA does not put provided-scope dependencies on the classpath, so startup fails; this is an IDEA bug.
[Solution] Change the setting in IDEA as described below.
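
Concretely, in the application's Run/Debug Configuration you can tick the option that includes dependencies with "provided" scope on the classpath. If you would rather handle it in the pom, one common workaround (a sketch modeled on the profile shipped in the Flink quickstart poms; the profile id is just a conventional name) is a profile that re-declares the Flink dependencies with compile scope only when Maven is driven by IDEA:

<profiles>
    <profile>
        <id>add-dependencies-for-IDEA</id>
        <activation>
            <property>
                <!-- IDEA defines this property when it imports and runs the Maven project -->
                <name>idea.version</name>
            </property>
        </activation>
        <dependencies>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-scala_2.12</artifactId>
                <version>1.14.3</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-scala_2.12</artifactId>
                <version>1.14.3</version>
                <scope>compile</scope>
            </dependency>
        </dependencies>
    </profile>
</profiles>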

2. Example

The example from the official website:

package com
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
import org.apache.flink.streaming.api.windowing.time.Time

object WindowWordCount {
  def main(args: Array[String]) {

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // read text lines from a local socket on port 9999
    val text = env.socketTextStream("localhost", 9999)

    // split lines into words and count each word per 5-second tumbling processing-time window
    val counts = text.flatMap { _.toLowerCase.split("\\W+") filter { _.nonEmpty } }
      .map { (_, 1) }
      .keyBy(_._1)
      .window(TumblingProcessingTimeWindows.of(Time.seconds(5)))
      .sum(1)

    counts.print()

    env.execute("Window Stream WordCount")
  }
}

Start a service listening on port 9999 from the command line (on Windows you can use ncat from the Nmap package, or run nc inside WSL):

$ nc -lk 9999

Run and test.
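
As a quick check (the input line below is just sample text; the numeric prefixes are subtask ids and depend on your parallelism), typing a line into the nc session should print per-window counts on the program's console about five seconds later:

# typed into the nc session
hello world hello

# program console output
3> (hello,2)
7> (world,1)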

5) Table API & SQL configuration

1. Maven configuration

<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-table-planner_2.12</artifactId>
	<version>1.14.3</version>
	<scope>provided</scope>
</dependency>
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-streaming-scala_2.12</artifactId>
	<version>1.14.3</version>
	<scope>provided</scope>
</dependency>
<dependency>
	<groupId>org.apache.flink</groupId>
	<artifactId>flink-table-common</artifactId>
	<version>1.14.3</version>
	<scope>provided</scope>
</dependency>

2. Example

This example uses the filesystem connector, which needs no extra Maven dependency; connectors such as Kafka or Elasticsearch do need their own. It does, however, use the CSV format, so the corresponding dependency must be added:

For more connectors, see the official documentation.

<!-- csv format, used below -->
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-csv</artifactId>
    <version>1.14.3</version>
</dependency>

Source code:

package com

import org.apache.flink.table.api._

object TableSQL {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.inStreamingMode()
    val tableEnv = TableEnvironment.create(settings)

    // define the schema shared by the source and sink tables
    val schema = Schema.newBuilder()
      .column("a", DataTypes.STRING())
      .column("b", DataTypes.STRING())
      .column("c", DataTypes.STRING())
      .build()

    tableEnv.createTemporaryTable("CsvSourceTable", TableDescriptor.forConnector("filesystem")
      .schema(schema)
      .option("path", "flink/data/source")
      .format(FormatDescriptor.forFormat("csv")
        .option("field-delimiter", "|")
        .build())
      .build())

    tableEnv.createTemporaryTable("CsvSinkTable", TableDescriptor.forConnector("filesystem")
      .schema(schema)
      .option("path", "flink/data/")
      .format(FormatDescriptor.forFormat("csv")
        .option("field-delimiter", "|")
        .build())
      .build())

    // create a query
    val sourceTable = tableEnv.sqlQuery("SELECT * FROM CsvSourceTable limit 2")

    // write the query result to the downstream sink
    sourceTable.executeInsert("CsvSinkTable")
  }
}
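
To make the example concrete (the file name and rows below are made-up sample data), the filesystem source reads every file under flink/data/source with '|' as the field delimiter, and the sink writes CSV part files with the same delimiter under flink/data/. A sample source file could look like this:

# flink/data/source/part-0.csv
1|tom|beijing
2|jerry|shanghai
3|spike|shenzhen

After the job runs, the sink directory should contain CSV files holding the first two rows returned by the query.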

6) HiveCatalog

1. Maven configuration

  • Basic configuration
<!-- Flink Dependency -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_2.12</artifactId>
  <version>1.14.3</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-java-bridge_2.12</artifactId>
  <version>1.14.3</version>
  <scope>provided</scope>
</dependency>

<!-- Hive Dependency -->
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.2</version>
    <scope>provided</scope>
</dependency>

[Tip] When a dependency's scope is set to provided, the corresponding run configuration in IDEA must be set to load provided-scope dependencies onto the classpath. Also make sure the Scala suffix of the Flink artifacts (_2.12 here) matches the project's Scala version.

  • Log4j2 configuration (log4j2.xml)
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
        </Console>

        <RollingFile name="RollingFile" fileName="log/test.log"
                     filePattern="log/%d{yyyyMMddHHmmss}-fargo.log">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
            <Policies>
                <SizeBasedTriggeringPolicy size="10 MB" />
            </Policies>
            <DefaultRolloverStrategy max="20" />
        </RollingFile>

    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console" />
            <AppenderRef ref="RollingFile" />
        </Root>
    </Loggers>
</Configuration>

  • Configure hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

    <!-- JDBC URL of the MySQL database backing the metastore; the database (hive here) is created automatically, and the name can be anything you like -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
    </property>

    <!-- MySQL driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
    </property>

    <!-- MySQL user -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>user name for connecting to mysql server</description>
    </property>

    <!-- MySQL password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password for connecting to mysql server</description>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://localhost:9083</value>
        <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
    </property>

    <!-- host -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>localhost</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>

    <!-- HiveServer2 port; the default is 10000, but I use a non-default port here to tell it apart -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10001</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>true</value>
    </property>

</configuration>

[Tip] The metastore and hiveserver2 services must be running; if you are unsure how to start them, see my earlier article: Big Data Hadoop - Deploying a Hadoop + Hive Environment (Windows 10).

$ hive --service metastore
$ hive --service hiveserver2
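
A quick way to confirm that both services are up (ports 9083 and 10001 match the hive-site.xml above) is to check that their ports are listening:

$ netstat -ano | findstr "9083 10001"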

2. Guava conflict between Hadoop and Hive

[Problem] The Guava versions used by Hadoop and by hive-exec-3.1.2 conflict, causing Flink jobs to fail at startup.
[Solution] Delete guava-19.0.jar from the %HIVE_HOME%\lib directory, then copy %HADOOP_HOME%\share\hadoop\common\lib\guava-27.0-jre.jar into %HIVE_HOME%\lib.
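
On Windows the swap can be done from a command prompt, assuming %HIVE_HOME% and %HADOOP_HOME% are set as in the Hadoop/Hive article above:

$ del "%HIVE_HOME%\lib\guava-19.0.jar"
$ copy "%HADOOP_HOME%\share\hadoop\common\lib\guava-27.0-jre.jar" "%HIVE_HOME%\lib\"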

3. Example

package com

import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}
import org.apache.flink.table.catalog.hive.HiveCatalog

object HiveCatalogTest {
  def main(args: Array[String]): Unit = {
    val settings = EnvironmentSettings.inStreamingMode()
    val tableEnv = TableEnvironment.create(settings)
    val name            = "myhive"
    val defaultDatabase = "default"
    val hiveConfDir     = "flink/data/"
    val hive = new HiveCatalog(name, defaultDatabase, hiveConfDir)
    // register the catalog; the registration only lasts for the session
    tableEnv.registerCatalog("myhive", hive)
    // list the available catalogs
    tableEnv.executeSql("show catalogs").print()
    // switch to the myhive catalog
    tableEnv.useCatalog("myhive")
    // create a database; it is persisted in Hive and still exists after the session ends
    tableEnv.executeSql("CREATE DATABASE IF NOT EXISTS mydatabase")
    // list the databases
    tableEnv.executeSql("show databases").print()
    // switch database
    tableEnv.useDatabase("mydatabase")
    // create a table (persisted in the Hive metastore; the Kafka connector dependency is only needed when the table is actually read or written)
    tableEnv.executeSql("CREATE TABLE IF NOT EXISTS user_behavior (\n  user_id BIGINT,\n  item_id BIGINT,\n  category_id BIGINT,\n  behavior STRING,\n  ts TIMESTAMP(3)\n) WITH (\n 'connector' = 'kafka',\n 'topic' = 'user_behavior',\n 'properties.bootstrap.servers' = 'hadoop-node1:9092',\n 'properties.group.id' = 'testGroup',\n 'format' = 'json',\n 'json.fail-on-missing-field' = 'false',\n 'json.ignore-parse-errors' = 'true'\n)")
    tableEnv.executeSql("show tables").print()

  }
}
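
If the metastore is reachable, the two print() calls should produce output roughly like the following (the layout comes from TableResult.print(); widths and alignment may differ):

+-----------------+
|    catalog name |
+-----------------+
| default_catalog |
|          myhive |
+-----------------+

+---------------+
| database name |
+---------------+
|       default |
|    mydatabase |
+---------------+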

Below, connecting with the Hive client shows that the database and table created by the program above still exist.

The verification above shows everything is OK. Remember that whenever you use a connector during development, you need to add its corresponding Maven dependency.

7) Download Flink and start a local cluster (Windows)

Download page: https://flink.apache.org/downloads.html

flink-1.14.3:https://dlcdn.apache.org/flink/flink-1.14.3/flink-1.14.3-bin-scala_2.12.tgz
[Tip] Newer Flink releases no longer ship the Windows start-cluster and flink scripts, but you can copy them from an older release. Download the old version below:
flink-1.9.1:https://archive.apache.org/dist/flink/flink-1.9.1/flink-1.9.1-bin-scala_2.11.tgz

Essentially, you just copy the following two files from flink-1.9.1 into the new version.

The download is rather slow, so I also provide the contents of the two files here:

  • flink.cmd
::###############################################################################
::  Licensed to the Apache Software Foundation (ASF) under one
::  or more contributor license agreements.  See the NOTICE file
::  distributed with this work for additional information
::  regarding copyright ownership.  The ASF licenses this file
::  to you under the Apache License, Version 2.0 (the
::  "License"); you may not use this file except in compliance
::  with the License.  You may obtain a copy of the License at
::
::      http://www.apache.org/licenses/LICENSE-2.0
::
::  Unless required by applicable law or agreed to in writing, software
::  distributed under the License is distributed on an "AS IS" BASIS,
::  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
::  See the License for the specific language governing permissions and
:: limitations under the License.
::###############################################################################

@echo off
setlocal

SET bin=%~dp0
SET FLINK_HOME=%bin%..
SET FLINK_LIB_DIR=%FLINK_HOME%\lib
SET FLINK_PLUGINS_DIR=%FLINK_HOME%\plugins

SET JVM_ARGS=-Xmx512m

SET FLINK_JM_CLASSPATH=%FLINK_LIB_DIR%\*

java %JVM_ARGS% -cp "%FLINK_JM_CLASSPATH%"; org.apache.flink.client.cli.CliFrontend %*

endlocal

  • start-cluster.bat
::###############################################################################
::  Licensed to the Apache Software Foundation (ASF) under one
::  or more contributor license agreements.  See the NOTICE file
::  distributed with this work for additional information
::  regarding copyright ownership.  The ASF licenses this file
::  to you under the Apache License, Version 2.0 (the
::  "License"); you may not use this file except in compliance
::  with the License.  You may obtain a copy of the License at
::
::      http://www.apache.org/licenses/LICENSE-2.0
::
::  Unless required by applicable law or agreed to in writing, software
::  distributed under the License is distributed on an "AS IS" BASIS,
::  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
::  See the License for the specific language governing permissions and
:: limitations under the License.
::###############################################################################

@echo off
setlocal EnableDelayedExpansion

SET bin=%~dp0
SET FLINK_HOME=%bin%..
SET FLINK_LIB_DIR=%FLINK_HOME%\lib
SET FLINK_PLUGINS_DIR=%FLINK_HOME%\plugins
SET FLINK_CONF_DIR=%FLINK_HOME%\conf
SET FLINK_LOG_DIR=%FLINK_HOME%\log

SET JVM_ARGS=-Xms1024m -Xmx1024m

SET FLINK_CLASSPATH=%FLINK_LIB_DIR%\*

SET logname_jm=flink-%username%-jobmanager.log
SET logname_tm=flink-%username%-taskmanager.log
SET log_jm=%FLINK_LOG_DIR%\%logname_jm%
SET log_tm=%FLINK_LOG_DIR%\%logname_tm%
SET outname_jm=flink-%username%-jobmanager.out
SET outname_tm=flink-%username%-taskmanager.out
SET out_jm=%FLINK_LOG_DIR%\%outname_jm%
SET out_tm=%FLINK_LOG_DIR%\%outname_tm%

SET log_setting_jm=-Dlog.file="%log_jm%" -Dlogback.configurationFile=file:"%FLINK_CONF_DIR%/logback.xml" -Dlog4j.configuration=file:"%FLINK_CONF_DIR%/log4j.properties"
SET log_setting_tm=-Dlog.file="%log_tm%" -Dlogback.configurationFile=file:"%FLINK_CONF_DIR%/logback.xml" -Dlog4j.configuration=file:"%FLINK_CONF_DIR%/log4j.properties"

:: Log rotation (quick and dirty)
CD "%FLINK_LOG_DIR%"
for /l %%x in (5, -1, 1) do ( 
SET /A y = %%x+1 
RENAME "%logname_jm%.%%x" "%logname_jm%.!y!" 2> nul
RENAME "%logname_tm%.%%x" "%logname_tm%.!y!" 2> nul
RENAME "%outname_jm%.%%x" "%outname_jm%.!y!"  2> nul
RENAME "%outname_tm%.%%x" "%outname_tm%.!y!"  2> nul
)
RENAME "%logname_jm%" "%logname_jm%.0"  2> nul
RENAME "%logname_tm%" "%logname_tm%.0"  2> nul
RENAME "%outname_jm%" "%outname_jm%.0"  2> nul
RENAME "%outname_tm%" "%outname_tm%.0"  2> nul
DEL "%logname_jm%.6"  2> nul
DEL "%logname_tm%.6"  2> nul
DEL "%outname_jm%.6"  2> nul
DEL "%outname_tm%.6"  2> nul

for %%X in (java.exe) do (set FOUND=%%~$PATH:X)
if not defined FOUND (
    echo java.exe was not found in PATH variable
    goto :eof
)

echo Starting a local cluster with one JobManager process and one TaskManager process.

echo You can terminate the processes via CTRL-C in the spawned shell windows.

echo Web interface by default on http://localhost:8081/.

start java %JVM_ARGS% %log_setting_jm% -cp "%FLINK_CLASSPATH%"; org.apache.flink.runtime.entrypoint.StandaloneSessionClusterEntrypoint --configDir "%FLINK_CONF_DIR%" > "%out_jm%" 2>&1
start java %JVM_ARGS% %log_setting_tm% -cp "%FLINK_CLASSPATH%"; org.apache.flink.runtime.taskexecutor.TaskManagerRunner --configDir "%FLINK_CONF_DIR%" > "%out_tm%" 2>&1

endlocal

Starting the Flink cluster is simple: just double-click start-cluster.bat.

Verify it with the SQL client:

$ SELECT 'Hello World';

[Error] NoResourceAvailableException: Could not acquire the minimum required resources
[Solution] The configured resources are too small to run the task. Increase them by changing the following settings in conf/flink-conf.yaml:

jobmanager.memory.process.size: 3200m
taskmanager.memory.process.size: 2728m
taskmanager.memory.flink.size: 2280m

Even after increasing these values it was still not enough on my machine, whose specs are limited. If your machine has more memory, increase them further and verify.

8) Complete configuration

1. Maven configuration

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>bigdata-test2023</artifactId>
        <groupId>com.bigdata.test2023</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>flink</artifactId>

    <!-- DataStream API maven settings begin -->
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-scala_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.14.3</version>
        </dependency>
        <!-- DataStream API maven settings end -->

        <!-- Table and SQL maven settings begin-->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>
        <!-- already declared above -->
        <!--<dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-scala_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>-->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>1.14.3</version>
        </dependency>
        <!-- Table and SQL maven settings end-->

        <!-- Hive Catalog maven settings begin -->
        <!-- Flink Dependency -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <!-- Hive Dependency -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.2</version>
            <scope>provided</scope>
        </dependency>

        <!-- Hive Catalog maven settings end -->


        <!--hadoop start-->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <!--hadoop end-->

    </dependencies>

</project>
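
Note that the module above relies on IDEA's Scala plugin to compile the Scala sources; if you also want mvn package to compile them, a build section along these lines can be added (the scala-maven-plugin version shown is only an example):

<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>4.5.6</version>
            <executions>
                <execution>
                    <goals>
                        <!-- compile Scala main and test sources during the normal Maven lifecycle -->
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>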

2. log4j2.xml configuration

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
        </Console>

        <RollingFile name="RollingFile" fileName="log/test.log"
                     filePattern="log/%d{yyyyMMddHHmmss}-fargo.log">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
            <Policies>
                <SizeBasedTriggeringPolicy size="10 MB" />
            </Policies>
            <DefaultRolloverStrategy max="20" />
        </RollingFile>

    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console" />
            <AppenderRef ref="RollingFile" />
        </Root>
    </Loggers>
</Configuration>

3. hive-site.xml configuration

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

    <!-- JDBC URL of the MySQL database backing the metastore; the database (hive here) is created automatically, and the name can be anything you like -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
    </property>

    <!-- MySQL driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
    </property>

    <!-- MySQL user -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>user name for connecting to mysql server</description>
    </property>

    <!-- MySQL password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password for connecting to mysql server</description>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://localhost:9083</value>
        <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
    </property>

    <!-- host -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>localhost</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>

    <!-- HiveServer2 port; the default is 10000, but I use a non-default port here to tell it apart -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10001</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>true</value>
    </property>

</configuration>

VI. Configure the IDEA Environment (Java)

1) Maven configuration

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <parent>
        <artifactId>bigdata-test2023</artifactId>
        <groupId>com.bigdata.test2023</groupId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <modelVersion>4.0.0</modelVersion>

    <artifactId>flink</artifactId>

    <!-- DataStream API maven settings begin -->
    <dependencies>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-java</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.14.3</version>
        </dependency>
        <!-- DataStream API maven settings end -->

        <!-- Table and SQL maven settings begin-->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-planner_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>
        <!-- already declared above -->
        <!--<dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-streaming-java_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>-->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-common</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-csv</artifactId>
            <version>1.14.3</version>
        </dependency>
        <!-- Table and SQL maven settings end-->

        <!-- Hive Catalog maven settings begin -->
        <!-- Flink Dependency -->
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-connector-hive_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-table-api-java-bridge_2.12</artifactId>
            <version>1.14.3</version>
            <scope>provided</scope>
        </dependency>

        <!-- Hive Dependency -->
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.2</version>
            <scope>provided</scope>
        </dependency>

        <!-- Hive Catalog maven settings end -->


        <!--hadoop start-->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>3.3.1</version>
            <scope>provided</scope>
        </dependency>
        <!--hadoop end-->

    </dependencies>

</project>

[Tip] log4j2.xml and hive-site.xml are actually the same for Java and Scala; for convenience I copy them here again anyway.

2) log4j2.xml configuration

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_OUT">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
        </Console>

        <RollingFile name="RollingFile" fileName="log/test.log"
                     filePattern="log/%d{yyyyMMddHHmmss}-fargo.log">
            <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss} [%t] %-5p %c{1}:%L - %msg%n" />
            <Policies>
                <SizeBasedTriggeringPolicy size="10 MB" />
            </Policies>
            <DefaultRolloverStrategy max="20" />
        </RollingFile>

    </Appenders>
    <Loggers>
        <Root level="info">
            <AppenderRef ref="Console" />
            <AppenderRef ref="RollingFile" />
        </Root>
    </Loggers>
</Configuration>

3) hive-site.xml configuration

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

    <!-- JDBC URL of the MySQL database backing the metastore; the database (hive here) is created automatically, and the name can be anything you like -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
    </property>

    <!-- MySQL driver -->
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>MySQL JDBC driver class</description>
    </property>

    <!-- MySQL user -->
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>user name for connecting to mysql server</description>
    </property>

    <!-- MySQL password -->
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password for connecting to mysql server</description>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://localhost:9083</value>
        <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
    </property>

    <!-- host -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>localhost</value>
        <description>Bind host on which to run the HiveServer2 Thrift service.</description>
    </property>

    <!-- HiveServer2 port; the default is 10000, but I use a non-default port here to tell it apart -->
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10001</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>true</value>
    </property>

</configuration>

More big data content is on the way; stay tuned.


  • ZY樹洞 前言 ZY樹洞是一個基於.NET Core開發的簡單的評論系統,主要用於大家分享自己心中的感悟、經驗、心得、想法等。 好了,不賣關子了,這個項目其實是上班無聊的時候寫的,為什麼要寫這個項目呢?因為我單純的想吐槽一下工作中的不滿而已。 項目介紹 項目很簡單,主要功能就是提供一個簡單的評論系統 ...