Supported engines
- Spark
- Flink
- SeaTunnel Zeta
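Key features
- Batch processing
- Exactly-once processing
- Column projection
- Parallel processing
- Support for user-defined splits
- Support for SQL queries, which can also be used to achieve projection
Description
Reads data from an external data source through JDBC.
Supported data source information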
Datasource | Supported versions | Driver | URL | Maven |
---|---|---|---|---|
Vertica | The driver class differs between dependency versions. | com.vertica.jdbc.Driver | jdbc:vertica://localhost:5433/vertica | Download |
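Database dependency
Download the driver listed in the 'Maven' column above and copy it into the '$SEATUNNEL_HOME/plugins/jdbc/lib/' working directory.
For example, for the Vertica data source: cp vertica-jdbc-xxx.jar $SEATUNNEL_HOME/plugins/jdbc/lib/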
Data type mapping
Vertica data type | SeaTunnel data type |
---|---|
BIT | BOOLEAN |
TINYINT, TINYINT UNSIGNED, SMALLINT, SMALLINT UNSIGNED, MEDIUMINT, MEDIUMINT UNSIGNED, INT, INTEGER, YEAR | INT |
INT UNSIGNED, INTEGER UNSIGNED, BIGINT | LONG |
BIGINT UNSIGNED | DECIMAL(20,0) |
DECIMAL(x,y) (column size < 38) | DECIMAL(x,y) |
DECIMAL(x,y) (column size > 38) | DECIMAL(38,18) |
DECIMAL UNSIGNED | DECIMAL(column size + 1, number of digits to the right of the decimal point) |
FLOAT, FLOAT UNSIGNED | FLOAT |
DOUBLE, DOUBLE UNSIGNED | DOUBLE |
CHAR, VARCHAR, TINYTEXT, MEDIUMTEXT, TEXT, LONGTEXT, JSON | STRING |
DATE | DATE |
TIME | TIME |
DATETIME, TIMESTAMP | TIMESTAMP |
TINYBLOB, MEDIUMBLOB, BLOB, LONGBLOB, BINARY, VARBINARY, BIT(n) | BYTES |
GEOMETRY, UNKNOWN | Not supported yet |
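The three DECIMAL rows of the mapping table can be read as one small rule. The following is a minimal sketch of that rule in Python, not the connector's actual code; `map_decimal` is a hypothetical helper, and its handling of a column size of exactly 38 is an assumption, since the table only lists the "< 38" and "> 38" cases.

```python
def map_decimal(precision, scale, unsigned=False):
    """Return the SeaTunnel type name for a source DECIMAL column (sketch)."""
    if unsigned:
        # DECIMAL UNSIGNED: one extra digit of precision, same scale.
        return f"DECIMAL({precision + 1},{scale})"
    if precision > 38:
        # A column size above 38 falls back to the fixed DECIMAL(38,18).
        return "DECIMAL(38,18)"
    # Assumption: a size of exactly 38 is kept as declared, like sizes below 38.
    return f"DECIMAL({precision},{scale})"


print(map_decimal(10, 2))                 # DECIMAL(10,2)
print(map_decimal(40, 5))                 # DECIMAL(38,18)
print(map_decimal(10, 2, unsigned=True))  # DECIMAL(11,2)
```

Note that the fallback to DECIMAL(38,18) changes both precision and scale, so values from very wide columns may not fit the mapped type.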
Source options
Name | Type | Required | Default | Description |
---|---|---|---|---|
url | String | Yes | - | The URL of the JDBC connection, for example: jdbc:vertica://localhost:5433/vertica |
driver | String | Yes | - | The JDBC class name used to connect to the remote data source. For Vertica the value is com.vertica.jdbc.Driver. |
user | String | No | - | User name of the connection instance |
password | String | No | - | Password of the connection instance |
query | String | Yes | - | Query statement |
connection_check_timeout_sec | Int | No | 30 | The time in seconds to wait for the database operation used to validate the connection to complete |
partition_column | String | No | - | The column used to partition the read for parallelism. Only a numeric primary key column is supported, and only one column can be configured. |
partition_lower_bound | Long | No | - | The minimum partition_column value to scan. If not set, SeaTunnel queries the database for the minimum value. |
partition_upper_bound | Long | No | - | The maximum partition_column value to scan. If not set, SeaTunnel queries the database for the maximum value. |
partition_num | Int | No | job parallelism | The number of partitions. Only positive integers are supported. The default is the job parallelism. |
fetch_size | Int | No | 0 | For queries that return a large number of objects, you can configure the row fetch size used in the query to improve performance by reducing the number of database hits required to satisfy the selection criteria. Zero means use the JDBC default value. |
common-options | | No | - | Source plugin common parameters. See Source Common Options for details. |
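The effect of fetch_size can be pictured as a batched read loop. The sketch below is only an analogy: it uses Python's standard sqlite3 module as a stand-in for a JDBC connection, and `read_in_batches` and the in-memory `type_bin` table are hypothetical names that are not part of SeaTunnel.

```python
import sqlite3


def read_in_batches(conn, query, fetch_size):
    """Yield lists of rows from `query`, pulling `fetch_size` rows per fetch call."""
    if fetch_size <= 0:
        # This sketch needs an explicit size; the driver-default case (0) is not modelled.
        raise ValueError("fetch_size must be a positive integer in this sketch")
    cursor = conn.cursor()
    cursor.execute(query)
    while True:
        batch = cursor.fetchmany(fetch_size)
        if not batch:
            break
        yield batch


# Stand-in data: ten rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type_bin (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO type_bin (id) VALUES (?)", [(i,) for i in range(1, 11)])

batch_sizes = [len(b) for b in read_in_batches(conn, "SELECT id FROM type_bin ORDER BY id", 4)]
print(batch_sizes)  # [4, 4, 2]
```

A larger fetch size means fewer fetch calls for the same result set, at the cost of holding more rows in memory at once.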
Tips
If partition_column is not set, the read runs in a single concurrency; if partition_column is set, it runs in parallel according to the concurrency of the job.
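When partition_column is set but a bound is missing, the options table says SeaTunnel queries the database for the minimum or maximum value. A minimal sketch of that idea follows, again with sqlite3 standing in for the real database; `resolve_bounds` and the exact SQL it builds are assumptions for illustration, not the statement SeaTunnel actually issues.

```python
import sqlite3


def resolve_bounds(conn, query, partition_column, lower=None, upper=None):
    """Return (lower, upper), asking the database only for bounds that are not set."""
    if lower is not None and upper is not None:
        return lower, upper
    # Hypothetical bounds query: wrap the user's query as a subquery.
    sql = (
        f"SELECT MIN({partition_column}), MAX({partition_column}) "
        f"FROM ({query}) AS t"
    )
    db_min, db_max = conn.execute(sql).fetchone()
    return (
        db_min if lower is None else lower,
        db_max if upper is None else upper,
    )


# Stand-in data: ids 1..500 in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE type_bin (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO type_bin (id) VALUES (?)", [(i,) for i in range(1, 501)])

print(resolve_bounds(conn, "select * from type_bin", "id"))             # (1, 500)
print(resolve_bounds(conn, "select * from type_bin", "id", lower=100))  # (100, 500)
```

Setting both bounds in the job configuration avoids this extra round trip to the database.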
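Task examples
Simple example:
This example queries 16 rows from the type_bin table in your test database with single parallelism and reads all of its fields. You can also specify which fields to query. The final output is printed to the console.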
env {
  # You can set Flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    query = "select * from type_bin limit 16"
  }
}
transform {
  # For more information about how to configure SeaTunnel and for the full list of transform plugins,
  # see https://seatunnel.apache.org/docs/transform-v2/sql
}
sink {
  Console {}
}
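Parallel example:
Reads the queried table in parallel, using the shard field (partition_column) and the number of shards (partition_num) that you configure. Use this approach if you want to read the whole table.
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    # Field used to shard the parallel read
    partition_column = "id"
    # Number of shards
    partition_num = 10
  }
}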
Parallel boundary example:
It is more efficient to specify upper and lower bounds for the query, so that the data source is read according to the bounds you configure.
source {
  Jdbc {
    url = "jdbc:vertica://localhost:5433/vertica"
    driver = "com.vertica.jdbc.Driver"
    connection_check_timeout_sec = 100
    user = "root"
    password = "123456"
    # Define the query logic as required
    query = "select * from type_bin"
    partition_column = "id"
    # Lower bound of the read
    partition_lower_bound = 1
    # Upper bound of the read
    partition_upper_bound = 500
    partition_num = 10
  }
}
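To see what the boundary settings above imply, the range 1..500 with partition_num = 10 can be divided into ten sub-ranges of 50 ids each. The sketch below assumes an even split into contiguous inclusive ranges; `split_range` is a hypothetical helper, and both the split strategy and the generated SQL are illustrative assumptions, so SeaTunnel's actual splits may differ.

```python
def split_range(lower, upper, partition_num):
    """Split the inclusive range [lower, upper] into contiguous inclusive sub-ranges."""
    if partition_num <= 0:
        raise ValueError("partition_num must be a positive integer")
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    total = upper - lower + 1
    parts = min(partition_num, total)   # never create empty sub-ranges
    base, extra = divmod(total, parts)  # the first `extra` ranges get one more value
    ranges = []
    start = lower
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


splits = split_range(1, 500, 10)
print(splits[0], splits[-1])  # (1, 50) (451, 500)

# One illustrative query per split, each of which a parallel reader could run.
for lo, hi in splits:
    print(f"select * from (select * from type_bin) t where id >= {lo} and id <= {hi}")
```

With execution.parallelism lower than partition_num, several of these sub-ranges would be handled by the same reader one after another.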
Publication of this article is supported by 白鯨開源.