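In Flink SQL, metadata is organized in three levels: catalog -> database -> table.
Flink SQL relies on the Calcite framework to build, validate, and optimize the SQL execution tree, so this article describes how Flink SQL works with Calcite to manage metadata.
Interfaces exposed by Calcite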
public interface Schema {
    Table getTable(String name);
    Schema getSubSchema(String name);
    ....
}
As the interface shows, a Schema can return a table given a table name, and a sub-schema given a schema name.
public interface Table {
    RelDataType getRowType(RelDataTypeFactory typeFactory);
    ....
}
The main job of the Table interface is to return the table's RelDataType, that is, its row type.
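To make the contract of these two interfaces concrete, here is a minimal sketch that does not depend on Calcite. `MiniSchema`, `MiniTable`, `MapSchema`, and `MiniSchemaDemo` are hypothetical stand-ins invented for illustration, and a plain map of field name to type name stands in for `RelDataType`. The sketch assumes that an unknown name yields `null`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for Calcite's Schema and Table, reduced to the
// methods quoted above. These are NOT the real Calcite types.
interface MiniTable {
    // Field name -> type name, standing in for RelDataType.
    Map<String, String> getRowType();
}

interface MiniSchema {
    MiniTable getTable(String name);       // null when the table is unknown
    MiniSchema getSubSchema(String name);  // null when the sub-schema is unknown
}

// A simple in-memory schema backed by two maps.
class MapSchema implements MiniSchema {
    final Map<String, MiniTable> tables = new LinkedHashMap<>();
    final Map<String, MiniSchema> subSchemas = new LinkedHashMap<>();

    @Override
    public MiniTable getTable(String name) {
        return tables.get(name);
    }

    @Override
    public MiniSchema getSubSchema(String name) {
        return subSchemas.get(name);
    }
}

class MiniSchemaDemo {
    // Builds root -> "sales" -> "orders(id BIGINT, name VARCHAR)".
    static MapSchema buildRoot() {
        Map<String, String> rowType = new LinkedHashMap<>();
        rowType.put("id", "BIGINT");
        rowType.put("name", "VARCHAR");

        MapSchema sales = new MapSchema();
        sales.tables.put("orders", () -> rowType);

        MapSchema root = new MapSchema();
        root.subSchemas.put("sales", sales);
        return root;
    }

    public static void main(String[] args) {
        MiniSchema root = buildRoot();
        MiniTable orders = root.getSubSchema("sales").getTable("orders");
        System.out.println(orders.getRowType());
        System.out.println(root.getSubSchema("missing"));
    }
}
```

Because a schema can contain both tables and further schemas, any depth of nesting can be expressed with the same two lookups.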
Flink's implementation
Next, let's look at how Flink implements these interfaces:
public class CatalogManagerCalciteSchema extends FlinkSchema {
    // Root level: each sub-schema is a catalog.
    @Override
    public Schema getSubSchema(String schemaName) {
        if (catalogManager.schemaExists(schemaName)) {
            return new CatalogCalciteSchema(schemaName, catalogManager, isStreamingMode);
        } else {
            return null;
        }
    }
}
public class CatalogCalciteSchema extends FlinkSchema {
    // Catalog level: each sub-schema is a database in this catalog.
    @Override
    public Schema getSubSchema(String schemaName) {
        if (catalogManager.schemaExists(catalogName, schemaName)) {
            return new DatabaseCalciteSchema(schemaName, catalogName, catalogManager, isStreamingMode);
        } else {
            return null;
        }
    }
}
public class DatabaseCalciteSchema extends FlinkSchema {
    private final String databaseName;
    private final String catalogName;
    private final CatalogManager catalogManager;

    @Override
    public Table getTable(String tableName) {
        ObjectIdentifier identifier = ObjectIdentifier.of(catalogName, databaseName, tableName);
        return catalogManager.getTable(identifier)
            .map(result -> {
                CatalogBaseTable table = result.getTable();
                FlinkStatistic statistic = getStatistic(result.isTemporary(), table, identifier);
                return new CatalogSchemaTable(identifier,
                    table,
                    statistic,
                    catalogManager.getCatalog(catalogName)
                        .flatMap(Catalog::getTableFactory)
                        .orElse(null),
                    isStreamingMode,
                    result.isTemporary());
            })
            .orElse(null);
    }

    @Override
    public Schema getSubSchema(String name) {
        return null;
    }
}
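It is easy to see that CatalogManagerCalciteSchema returns a CatalogCalciteSchema, CatalogCalciteSchema returns a DatabaseCalciteSchema, and DatabaseCalciteSchema returns a Table,
which makes Flink's three-level structure easy to understand. Note also that the actual metadata all lives in the catalogManager.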
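The delegation pattern in the three classes above can be sketched without Flink or Calcite. Everything in the following block (`ToyCatalogManager`, `ToyRootSchema`, `ToyCatalogSchema`, `ToyDatabaseSchema`, and the string used as a table description) is a hypothetical simplification, not Flink's real API. It only illustrates that each schema level is a thin view created on demand, and that every lookup ends up in the central manager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the catalog manager: the single place where metadata lives.
class ToyCatalogManager {
    // catalog -> database -> table -> table description
    private final Map<String, Map<String, Map<String, String>>> store = new HashMap<>();

    void register(String catalog, String database, String table, String description) {
        store.computeIfAbsent(catalog, c -> new HashMap<>())
             .computeIfAbsent(database, d -> new HashMap<>())
             .put(table, description);
    }

    boolean catalogExists(String catalog) {
        return store.containsKey(catalog);
    }

    boolean databaseExists(String catalog, String database) {
        return catalogExists(catalog) && store.get(catalog).containsKey(database);
    }

    String getTable(String catalog, String database, String table) {
        return databaseExists(catalog, database)
                ? store.get(catalog).get(database).get(table)
                : null;
    }
}

// Level 3: a database view that resolves tables through the manager.
class ToyDatabaseSchema {
    private final String catalog;
    private final String database;
    private final ToyCatalogManager manager;

    ToyDatabaseSchema(String catalog, String database, ToyCatalogManager manager) {
        this.catalog = catalog;
        this.database = database;
        this.manager = manager;
    }

    String getTable(String table) {
        return manager.getTable(catalog, database, table);
    }
}

// Level 2: a catalog view that hands out database views.
class ToyCatalogSchema {
    private final String catalog;
    private final ToyCatalogManager manager;

    ToyCatalogSchema(String catalog, ToyCatalogManager manager) {
        this.catalog = catalog;
        this.manager = manager;
    }

    ToyDatabaseSchema getSubSchema(String database) {
        return manager.databaseExists(catalog, database)
                ? new ToyDatabaseSchema(catalog, database, manager)
                : null;
    }
}

// Level 1: the root view that hands out catalog views.
class ToyRootSchema {
    private final ToyCatalogManager manager;

    ToyRootSchema(ToyCatalogManager manager) {
        this.manager = manager;
    }

    ToyCatalogSchema getSubSchema(String catalog) {
        return manager.catalogExists(catalog) ? new ToyCatalogSchema(catalog, manager) : null;
    }
}

class ThreeLevelWalkDemo {
    public static void main(String[] args) {
        ToyCatalogManager manager = new ToyCatalogManager();
        manager.register("my_catalog", "my_db", "orders", "orders(id BIGINT, amount DOUBLE)");
        ToyRootSchema root = new ToyRootSchema(manager);
        System.out.println(
                root.getSubSchema("my_catalog").getSubSchema("my_db").getTable("orders"));
    }
}
```

Since the views hold no metadata of their own, a table registered with the manager after the root view was created is still visible through it.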
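The Table returned by DatabaseCalciteSchema is a CatalogSchemaTable; let's look at how it is structured.
As mentioned above, the key method of the Table interface is getRowType, which returns the type information of a table.
TableSchema is the class Flink uses internally to hold the type information of each field; conversion functions turn it into Calcite's type representation.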
public class CatalogSchemaTable extends AbstractTable implements TemporalTable {
    private final ObjectIdentifier tableIdentifier;
    private final CatalogBaseTable catalogBaseTable;
    private final FlinkStatistic statistic;
    private final boolean isStreamingMode;
    private final boolean isTemporary;
    ...
    private static RelDataType getRowType(RelDataTypeFactory typeFactory,
            CatalogBaseTable catalogBaseTable,
            boolean isStreamingMode) {
        final FlinkTypeFactory flinkTypeFactory = (FlinkTypeFactory) typeFactory;
        TableSchema tableSchema = catalogBaseTable.getSchema();
        final DataType[] fieldDataTypes = tableSchema.getFieldDataTypes();
        if (!isStreamingMode
            && catalogBaseTable instanceof ConnectorCatalogTable
            && ((ConnectorCatalogTable) catalogBaseTable).getTableSource().isPresent()) {
            // If the table source is bounded, materialize the time attributes to normal TIMESTAMP type.
            // Now for ConnectorCatalogTable, there is no way to
            // deduce if it is bounded in the table environment, so the data types in TableSchema
            // always patched with TimeAttribute.
            // See ConnectorCatalogTable#calculateSourceSchema
            // for details.
            // Remove the patched time attributes type to let the TableSourceTable handle it.
            // We should remove this logic if the isBatch flag in ConnectorCatalogTable is fixed.
            // TODO: Fix FLINK-14844.
            for (int i = 0; i < fieldDataTypes.length; i++) {
                LogicalType lt = fieldDataTypes[i].getLogicalType();
                if (lt instanceof TimestampType
                    && (((TimestampType) lt).getKind() == TimestampKind.PROCTIME
                    || ((TimestampType) lt).getKind() == TimestampKind.ROWTIME)) {
                    int precision = ((TimestampType) lt).getPrecision();
                    fieldDataTypes[i] = DataTypes.TIMESTAMP(precision);
                }
            }
        }
        return TableSourceUtil.getSourceRowType(flinkTypeFactory,
            tableSchema,
            scala.Option.empty(),
            isStreamingMode);
    }
}
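The loop inside getRowType replaces time-attribute timestamps with ordinary timestamps of the same precision when the planner is not in streaming mode. A minimal sketch of just that step follows. `ToyTimestamp`, `Kind`, and `materializeForBatch` are hypothetical names, not Flink types, and the sketch leaves out the extra condition in the original that the table is a ConnectorCatalogTable with a table source present.

```java
import java.util.Arrays;

class TimeAttributeDemo {
    // Hypothetical counterpart of a timestamp kind: regular, or a time attribute.
    enum Kind { REGULAR, ROWTIME, PROCTIME }

    // Hypothetical, immutable stand-in for a timestamp type.
    static final class ToyTimestamp {
        final Kind kind;
        final int precision;

        ToyTimestamp(Kind kind, int precision) {
            this.kind = kind;
            this.precision = precision;
        }

        @Override
        public String toString() {
            String suffix = kind == Kind.REGULAR ? "" : " *" + kind + "*";
            return "TIMESTAMP(" + precision + ")" + suffix;
        }
    }

    // In batch mode, turn ROWTIME/PROCTIME fields into regular timestamps,
    // keeping the precision. In streaming mode, return the types unchanged.
    // The input array is never modified; a copy is returned.
    static ToyTimestamp[] materializeForBatch(ToyTimestamp[] fields, boolean isStreamingMode) {
        ToyTimestamp[] result = Arrays.copyOf(fields, fields.length);
        if (!isStreamingMode) {
            for (int i = 0; i < result.length; i++) {
                if (result[i].kind != Kind.REGULAR) {
                    result[i] = new ToyTimestamp(Kind.REGULAR, result[i].precision);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ToyTimestamp[] fields = {
            new ToyTimestamp(Kind.ROWTIME, 3),
            new ToyTimestamp(Kind.REGULAR, 6),
            new ToyTimestamp(Kind.PROCTIME, 3)
        };
        System.out.println(Arrays.toString(materializeForBatch(fields, true)));
        System.out.println(Arrays.toString(materializeForBatch(fields, false)));
    }
}
```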
The CatalogBaseTable interface is defined as follows. All of a Flink table's parameters (schema parameters and connector parameters) can ultimately be represented as a map.
public interface CatalogBaseTable {
    /**
     * Get the properties of the table.
     *
     * @return property map of the table/view
     */
    Map<String, String> getProperties();

    /**
     * Get the schema of the table.
     *
     * @return schema of the table/view.
     */
    TableSchema getSchema();

    /**
     * Get comment of the table or view.
     *
     * @return comment of the table/view.
     */
    String getComment();

    /**
     * Get a deep copy of the CatalogBaseTable instance.
     *
     * @return a copy of the CatalogBaseTable instance
     */
    CatalogBaseTable copy();

    /**
     * Get a brief description of the table or view.
     *
     * @return an optional short description of the table/view
     */
    Optional<String> getDescription();

    /**
     * Get a detailed description of the table or view.
     *
     * @return an optional long description of the table/view
     */
    Optional<String> getDetailedDescription();
}
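To illustrate how schema and connector parameters can both be flattened into one string map, here is a small sketch. The key layout (`schema.<index>.name`, `schema.<index>.data-type`, `connector.<option>`) is an assumption chosen for illustration and may not match the exact keys a given Flink version writes; `TablePropertiesDemo` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TablePropertiesDemo {
    // Flattens fields (each entry is {name, type}) and connector options into one map.
    // The key names are illustrative assumptions, not a documented Flink format.
    static Map<String, String> toProperties(String[][] fields, Map<String, String> connectorOptions) {
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < fields.length; i++) {
            props.put("schema." + i + ".name", fields[i][0]);
            props.put("schema." + i + ".data-type", fields[i][1]);
        }
        for (Map.Entry<String, String> option : connectorOptions.entrySet()) {
            props.put("connector." + option.getKey(), option.getValue());
        }
        return props;
    }

    // Reads the field names back by probing consecutive indexes until one is missing.
    static List<String> fieldNames(Map<String, String> props) {
        List<String> names = new ArrayList<>();
        for (int i = 0; props.containsKey("schema." + i + ".name"); i++) {
            names.add(props.get("schema." + i + ".name"));
        }
        return names;
    }

    public static void main(String[] args) {
        String[][] fields = { {"id", "BIGINT"}, {"name", "STRING"} };
        Map<String, String> connector = new LinkedHashMap<>();
        connector.put("type", "kafka");
        connector.put("topic", "orders");

        Map<String, String> props = toProperties(fields, connector);
        for (Map.Entry<String, String> entry : props.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
        System.out.println(fieldNames(props));
    }
}
```

A flat string map is easy to persist in an external catalog, which is one plausible reason for choosing such a representation.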
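Using FlinkSchema
The classes above are all implementations that Flink uses to adapt to Calcite's metadata framework.
So where exactly are these classes called, and when?
In Calcite, the schema is used mainly during validation, to obtain the field information of each table and the return type of each function,
which ensures that the field names and field types in the SQL are correct.
The class dependencies are:
validator ---> schemaReader ---> schema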
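The dependency chain can be pictured with a toy example in which the validator never touches the schema directly and only asks the reader. `ToySchema`, `ToyCatalogReader`, `ToyValidator`, and the message texts are hypothetical and far simpler than Calcite's real classes; the sketch checks nothing beyond table and column existence.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ValidationChainDemo {
    // "schema": owns the metadata. Table name -> (column name -> type name).
    static final class ToySchema {
        private final Map<String, Map<String, String>> tables = new HashMap<>();

        void addTable(String name, Map<String, String> rowType) {
            tables.put(name, rowType);
        }

        Map<String, String> getRowType(String name) {
            return tables.get(name); // null when the table is unknown
        }
    }

    // "schemaReader": the validator's only window onto the schema.
    static final class ToyCatalogReader {
        private final ToySchema schema;

        ToyCatalogReader(ToySchema schema) {
            this.schema = schema;
        }

        Map<String, String> getRowType(String table) {
            return schema.getRowType(table);
        }
    }

    // "validator": checks that the table and every referenced column exist.
    static final class ToyValidator {
        private final ToyCatalogReader reader;

        ToyValidator(ToyCatalogReader reader) {
            this.reader = reader;
        }

        String validate(String table, List<String> columns) {
            Map<String, String> rowType = reader.getRowType(table);
            if (rowType == null) {
                return "Table '" + table + "' not found";
            }
            for (String column : columns) {
                if (!rowType.containsKey(column)) {
                    return "Column '" + column + "' not found in table '" + table + "'";
                }
            }
            return "OK";
        }
    }

    public static void main(String[] args) {
        Map<String, String> rowType = new HashMap<>();
        rowType.put("id", "BIGINT");
        rowType.put("amount", "DOUBLE");
        ToySchema schema = new ToySchema();
        schema.addTable("orders", rowType);

        ToyValidator validator = new ToyValidator(new ToyCatalogReader(schema));
        System.out.println(validator.validate("orders", Arrays.asList("id", "amount")));
        System.out.println(validator.validate("orders", Arrays.asList("id", "price")));
        System.out.println(validator.validate("users", Arrays.asList("id")));
    }
}
```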
In FlinkPlannerImpl.scala:
private def createSqlValidator(catalogReader: CatalogReader) = {
  val validator = new FlinkCalciteSqlValidator(
    operatorTable,
    catalogReader,
    typeFactory)
  validator.setIdentifierExpansion(true)
  // Disable implicit type coercion for now.
  validator.setEnableTypeCoercion(false)
  validator
}
PlanningConfigurationBuilder.java
private CatalogReader createCatalogReader(
        boolean lenientCaseSensitivity,
        String currentCatalog,
        String currentDatabase) {
    SqlParser.Config sqlParserConfig = getSqlParserConfig();
    final boolean caseSensitive;
    if (lenientCaseSensitivity) {
        caseSensitive = false;
    } else {
        caseSensitive = sqlParserConfig.caseSensitive();
    }
    SqlParser.Config parserConfig = SqlParser.configBuilder(sqlParserConfig)
        .setCaseSensitive(caseSensitive)
        .build();
    return new CatalogReader(
        rootSchema,
        asList(
            asList(currentCatalog, currentDatabase),
            singletonList(currentCatalog)
        ),
        typeFactory,
        CalciteConfig.connectionConfig(parserConfig));
}
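The two lists passed to the CatalogReader, `[currentCatalog, currentDatabase]` and `[currentCatalog]`, act as default schema paths: a partially qualified table name is tried against each prefix in order. The sketch below illustrates that idea only. `SchemaPathDemo`, `pathsFor`, and `resolve` are hypothetical, and the trailing empty path, which lets fully qualified names resolve, is an assumption about the reader's behavior that the snippet above does not show.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SchemaPathDemo {
    // Default paths in the order they are tried. The first two mirror the
    // snippet above; the empty path at the end is an assumption of this sketch.
    static List<List<String>> pathsFor(String currentCatalog, String currentDatabase) {
        return Arrays.asList(
                Arrays.asList(currentCatalog, currentDatabase),
                Collections.singletonList(currentCatalog),
                Collections.<String>emptyList());
    }

    // Prepends each path to the given name and returns the first fully
    // qualified name that is known, or null when none matches.
    static List<String> resolve(
            Set<List<String>> knownTables, List<List<String>> schemaPaths, List<String> name) {
        for (List<String> prefix : schemaPaths) {
            List<String> candidate = new ArrayList<>(prefix);
            candidate.addAll(name);
            if (knownTables.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<List<String>> known = new HashSet<>();
        known.add(Arrays.asList("cat", "db", "orders"));
        known.add(Arrays.asList("cat", "other_db", "users"));

        List<List<String>> paths = pathsFor("cat", "db");
        // Bare table name: found through [cat, db].
        System.out.println(resolve(known, paths, Collections.singletonList("orders")));
        // database.table: found through [cat].
        System.out.println(resolve(known, paths, Arrays.asList("other_db", "users")));
        // Bare name of a table in another database: not found.
        System.out.println(resolve(known, paths, Collections.singletonList("users")));
    }
}
```

Case sensitivity, which createCatalogReader also configures, is left out of the sketch; the comparison here is always case-sensitive.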
To sum up, we have now seen how Flink uses Calcite's schema to manage its table information.