Information Schema in Confluent Cloud for Apache Flink¶
An information schema, or data dictionary, is a standard SQL schema with a collection of predefined views that provide access to metadata about objects in Confluent Cloud for Apache Flink®. The Confluent INFORMATION_SCHEMA is based on the SQL-92 ANSI Information Schema, with the addition of views and functions that are specific to Confluent Cloud for Apache Flink. The ANSI standard uses “catalog” to refer to a database; in Confluent Cloud, “schema” also refers to a database. Conceptually, the terms are equivalent.
The views in the INFORMATION_SCHEMA provide information about database objects, such as tables, columns, and constraints. The views behave like read-only tables that you can query by using standard SQL statements.
For example, you can use the INFORMATION_SCHEMA.COLUMNS table to get details about the columns in a table, like the column name, data type, and whether it allows null values.
Similarly, you can use the INFORMATION_SCHEMA.TABLES table to get static, table-level configuration details of a relation, whether it’s a view, a table, or a system table. For example, you can query for details like the watermark definition and the number of partitions in the topic.
Every Flink catalog has a corresponding INFORMATION_SCHEMA, so you can always run a statement like the following:
SELECT (...) FROM <catalog-name>.INFORMATION_SCHEMA.TABLES WHERE (...);
Global views are available in every INFORMATION_SCHEMA, which means that you can query for information across catalogs. For example, you can query the global INFORMATION_SCHEMA.CATALOGS view to list all catalogs.
The information schema is a powerful tool for querying metadata about your Flink catalogs and databases, and you can use it for a variety of purposes, such as generating reports, documenting a schema, and troubleshooting performance issues.
The following views are supported in the Confluent INFORMATION_SCHEMA: CATALOGS, INFORMATION_SCHEMA_CATALOG_NAME, SCHEMATA (DATABASES), COLUMNS, KEY_COLUMN_USAGE, SYSTEM_TABLES, TABLES, TABLE_CONSTRAINTS, TABLE_OPTIONS, VIEWS, ROUTINES, and PARAMETERS.
Query syntax in INFORMATION_SCHEMA¶
Metadata queries on the INFORMATION_SCHEMA tables support the following syntax.
Supported data types:
- INT
- STRING
Supported operators:
- SELECT
- WHERE
- UNION ALL
Supported expressions:
- CAST(NULL AS dt), CAST(x as dt)
- UNION ALL (see the UNION ALL example in the COLUMNS section)
- AND, OR
- = , <>, IS NULL, IS NOT NULL
- AS
- STRING and INT literals
The following limitations apply to INFORMATION_SCHEMA:
- You can use INFORMATION_SCHEMA views only in SELECT statements, not in INSERT INTO statements.
- You can’t use INFORMATION_SCHEMA in joins with real tables.
- Only the previously listed equality and basic expressions are supported.
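For example, the following statement stays within the supported syntax, combining WHERE, AND, and <> to filter the TABLES view. The '<current-database>' placeholder stands for a database name in your environment.
SELECT `TABLE_NAME` FROM `INFORMATION_SCHEMA`.`TABLES` WHERE `TABLE_SCHEMA` = '<current-database>' AND `TABLE_NAME` <> 'INFORMATION_SCHEMA';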
Catalogs and databases¶
CATALOGS¶
The global catalogs view.
The rows returned are limited to the schemas that you have permission to interact with.
This view is an extension to the SQL standard.
Column Name | Data Type | Standard | Description |
---|---|---|---|
CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk . |
CATALOG_NAME | STRING NOT NULL | No | The human readable name of the catalog/environment, for example, default . |
- Example
Run the following code to query for all catalogs across environments.
SELECT `CATALOG_ID`, `CATALOG_NAME` FROM `INFORMATION_SCHEMA`.`CATALOGS`;
INFORMATION_SCHEMA_CATALOG_NAME¶
Local catalog view. Returns the name of the current information schema’s catalog.
Column Name | Data Type | Standard | Description |
---|---|---|---|
CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk . |
CATALOG_NAME | STRING NOT NULL | Yes | The human readable name of the catalog/environment, for example, default . |
- Example
Run the following code to query for the name of this information schema’s catalog.
SELECT `CATALOG_ID`, `CATALOG_NAME` FROM `INFORMATION_SCHEMA`.`INFORMATION_SCHEMA_CATALOG_NAME`;
SCHEMATA / DATABASES¶
Describes databases within the catalog.
For convenience, DATABASES is an alias for SCHEMATA.
The rows returned are limited to the schemas that you have permission to interact with.
Column Name | Data Type | Standard | Description |
---|---|---|---|
CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk . |
CATALOG_NAME | STRING NOT NULL | Yes | The human readable name of the catalog/environment, for example, default . |
SCHEMA_ID | STRING NOT NULL | No | The ID of the database/cluster, for example, lkc-kgjwwv . |
SCHEMA_NAME | STRING NOT NULL | Yes | The human readable name of the database/cluster, for example, MyCluster. |
- Example
Run the following code to list all Flink databases within a catalog (Kafka clusters within an environment), excluding the information schema.
SELECT `SCHEMA_ID`, `SCHEMA_NAME` FROM `INFORMATION_SCHEMA`.`SCHEMATA` WHERE `SCHEMA_NAME` <> 'INFORMATION_SCHEMA';
COLUMNS¶
Describes columns of tables and virtual tables (views) in the catalog.
Column Name | Data Type | Standard | Description |
---|---|---|---|
COLUMN_NAME | STRING NOT NULL | Yes | Column reference. |
COMMENT | STRING NULL | No (Like Databricks, Snowflake) | An optional comment that describes the relation. |
DATA_TYPE | STRING NOT NULL | Yes | Type root, for example, VARCHAR or ROW. |
DISTRIBUTION_ORDINAL_POSITION | INT NULL | No (Like BigQuery for clustering key) | If the table IS_DISTRIBUTED, contains the position of the key in a DISTRIBUTED BY clause. |
FULL_DATA_TYPE | STRING NOT NULL | No (Like Databricks) | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. |
GENERATION_EXPRESSION | STRING NULL | Yes (Like BigQuery and Databricks) | For computed columns. |
IS_GENERATED | STRING NOT NULL | Yes | Indicates whether column is a computed column. Values are YES or NO. |
IS_HIDDEN | STRING NOT NULL | No (Like BigQuery) | Indicates whether a column is a system column. Values are YES or NO. |
IS_METADATA | STRING NOT NULL | No | Indicates whether column is a metadata column. Values are YES or NO. |
IS_NULLABLE | STRING NOT NULL | No | Indicates whether the column is nullable. Values are YES or NO. |
IS_PERSISTED | STRING NOT NULL | No | Indicates whether a metadata column is stored during INSERT INTO. Also YES if a physical column. Values are YES or NO. |
METADATA_KEY | STRING NULL | No | For metadata columns. |
ORDINAL_POSITION | INT NOT NULL | Yes | Position of the column within the table, starting at 1. |
TABLE_CATALOG | STRING NOT NULL | Yes | The human readable name of the catalog. |
TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
- Examples
This example shows a complex query. The complexity comes from reducing the number of requests: because the views are in normal form, instead of issuing three requests, you can batch them into a single one by using UNION ALL, which avoids the need for various inner/outer joins. The result is a sparse table that contains different “sections”.
The overall schema looks like this:
( section, column_name, column_pos, column_type, constraint_name, constraint_type, constraint_enforced )
Run the following code to list columns, like name, position, data type, and their primary key characteristics.
(
  SELECT
    'COLUMNS' AS `section`,
    `COLUMN_NAME` AS `column_name`,
    `ORDINAL_POSITION` AS `column_pos`,
    `FULL_DATA_TYPE` AS `column_type`,
    CAST(NULL AS STRING) AS `constraint_name`,
    CAST(NULL AS STRING) AS `constraint_type`,
    CAST(NULL AS STRING) AS `constraint_enforced`
  FROM `<current-catalog>`.`INFORMATION_SCHEMA`.`COLUMNS`
  WHERE `TABLE_CATALOG` = '<current-catalog>'
    AND `TABLE_SCHEMA` = '<current-database>'
    AND `TABLE_NAME` = '<current-table>'
    AND `IS_HIDDEN` = 'NO'
)
UNION ALL
(
  SELECT
    'TABLE_CONSTRAINTS' AS `section`,
    CAST(NULL AS STRING) AS `column_name`,
    CAST(NULL AS INT) AS `column_pos`,
    CAST(NULL AS STRING) AS `column_type`,
    `CONSTRAINT_NAME` AS `constraint_name`,
    `CONSTRAINT_TYPE` AS `constraint_type`,
    `ENFORCED` AS `constraint_enforced`
  FROM `<current-catalog>`.`INFORMATION_SCHEMA`.`TABLE_CONSTRAINTS`
  WHERE `CONSTRAINT_CATALOG` = '<current-catalog>'
    AND `CONSTRAINT_SCHEMA` = '<current-database>'
    AND `TABLE_CATALOG` = '<current-catalog>'
    AND `TABLE_SCHEMA` = '<current-database>'
    AND `TABLE_NAME` = '<current-table>'
)
UNION ALL
(
  SELECT
    'KEY_COLUMN_USAGE' AS `section`,
    `COLUMN_NAME` AS `column_name`,
    `ORDINAL_POSITION` AS `column_pos`,
    CAST(NULL AS STRING) AS `column_type`,
    `CONSTRAINT_NAME` AS `constraint_name`,
    CAST(NULL AS STRING) AS `constraint_type`,
    CAST(NULL AS STRING) AS `constraint_enforced`
  FROM `<current-catalog>`.`INFORMATION_SCHEMA`.`KEY_COLUMN_USAGE`
  WHERE `TABLE_CATALOG` = '<current-catalog>'
    AND `TABLE_SCHEMA` = '<current-database>'
    AND `TABLE_NAME` = '<current-table>'
);
KEY_COLUMN_USAGE¶
Side view of TABLE_CONSTRAINTS for key columns.
Column Name | Data Type | Standard | Description |
---|---|---|---|
COLUMN_NAME | STRING NOT NULL | Yes | The name of the constrained column. |
CONSTRAINT_CATALOG | STRING NOT NULL | Yes | Catalog name containing the constraint. |
CONSTRAINT_CATALOG_ID | STRING NOT NULL | No | Catalog ID containing the constraint. |
CONSTRAINT_SCHEMA | STRING NOT NULL | Yes | Schema name containing the constraint. |
CONSTRAINT_SCHEMA_ID | STRING NOT NULL | No | Schema ID containing the constraint. |
CONSTRAINT_NAME | STRING NOT NULL | Yes | Name of the constraint. |
ORDINAL_POSITION | INT NOT NULL | Yes | The ordinal position of the column within the constraint key (starting at 1). |
TABLE_CATALOG | STRING NOT NULL | Yes | The human readable name of the catalog. |
TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
TABLE_SCHEMA | STRING NOT NULL | Yes | The human readable name of the database. |
TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
- Example
Run the following code to query for a side view of TABLE_CONSTRAINTS for key columns.
SELECT * FROM `INFORMATION_SCHEMA`.`KEY_COLUMN_USAGE`;
SYSTEM_TABLES¶
Contains the object-level metadata for virtual tables within the catalog. Virtual tables do not represent physical data. Instead, they expose specific metadata.
The rows returned are limited to the schemas the user has permission to interact with.
Column Name | Data Type | Description |
---|---|---|
BASE_TABLE_NAME | STRING NULL | The name of the relation to which this system view corresponds. |
COMMENT | STRING NULL | An optional comment that describes the system view. |
TABLE_CATALOG | STRING NOT NULL | The human-readable name of the catalog. |
TABLE_CATALOG_ID | STRING NOT NULL | The ID of the catalog. |
TABLE_NAME | STRING NOT NULL | The name of the relation. |
TABLE_SCHEMA | STRING NOT NULL | The human-readable name of the database. |
TABLE_SCHEMA_ID | STRING NOT NULL | The ID of the database. |
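- Example
Like the other views, SYSTEM_TABLES can be queried with a plain SELECT. Run the following code to list all system tables within the catalog and the base tables they correspond to.
SELECT `TABLE_SCHEMA`, `TABLE_NAME`, `BASE_TABLE_NAME` FROM `INFORMATION_SCHEMA`.`SYSTEM_TABLES`;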
TABLES¶
Contains the object level metadata for tables and virtual tables (views) within the catalog.
The rows returned are limited to the schemas that you have permission to interact with.
Column Name | Data Type | Standard | Description |
---|---|---|---|
COMMENT | STRING NULL | No (Like Databricks and Snowflake) | An optional comment that describes the relation. |
DISTRIBUTION_ALGORITHM | STRING NULL | No | Only HASH is supported. |
DISTRIBUTION_BUCKETS | INT NULL | No | Number of buckets, if defined. |
IS_DISTRIBUTED | STRING NOT NULL | No (Like Snowflake for clustering key) | Indicates whether the table is bucketed using the DISTRIBUTED BY clause. Values are YES or NO . |
IS_WATERMARKED | STRING NOT NULL | No | Indicates whether the table has a watermark from the WATERMARK FOR clause. Values are YES or NO . |
TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
TABLE_TYPE | STRING NOT NULL | Yes | Values are BASE TABLE, EXTERNAL TABLE, SYSTEM TABLE, or VIEW [1]. |
WATERMARK_COLUMN | STRING NULL | No | Time attribute column for which the watermark is defined. |
WATERMARK_EXPRESSION | STRING NULL | No | Watermark expression. |
WATERMARK_IS_HIDDEN | STRING NULL | No | Indicates whether the watermark is the default, system-provided one. |
[1] | These are the valid values for the TABLE_TYPE column. |
- BASE TABLE: For Confluent-native tables that can be used conceptually for reading and writing, like a regular database table.
- EXTERNAL TABLE: For non-native Confluent tables, for example, non-Kafka and Tableflow tables. Usually, these tables are read-only.
- SYSTEM TABLE: For tables that the system creates, either with a BASE TABLE or on its own. Only $error is supported. Compared with BASE TABLE, these tables are read-only.
- VIEW: For SQL views.
- Examples
Run the following code to list all tables within a catalog (Kafka topics within an environment), excluding the information schema.
SELECT `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME` FROM `INFORMATION_SCHEMA`.`TABLES` WHERE `TABLE_SCHEMA` <> 'INFORMATION_SCHEMA';
Run the following code to list all tables within a database (Kafka topics within a cluster).
SELECT `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME` FROM `<current-catalog>`.`INFORMATION_SCHEMA`.`TABLES` WHERE `TABLE_SCHEMA` = '<current-database>';
TABLE_CONSTRAINTS¶
Side view of TABLES for all primary key constraints within the catalog.
Column Name | Data Type | Standard | Description |
---|---|---|---|
CONSTRAINT_CATALOG | STRING NOT NULL | Yes | Catalog name containing the constraint. |
CONSTRAINT_CATALOG_ID | STRING NOT NULL | No | Catalog ID containing the constraint. |
CONSTRAINT_SCHEMA | STRING NOT NULL | Yes | Schema name containing the constraint. |
CONSTRAINT_SCHEMA_ID | STRING NOT NULL | No | Schema ID containing the constraint. |
CONSTRAINT_NAME | STRING NOT NULL | Yes | Name of the constraint. |
CONSTRAINT_TYPE | STRING NOT NULL | Yes | Currently, only PRIMARY KEY. |
ENFORCED | STRING NOT NULL | Yes | YES if constraint is enforced, otherwise NO. |
TABLE_CATALOG | STRING NOT NULL | Yes | The human readable name of the catalog. |
TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
TABLE_SCHEMA | STRING NOT NULL | Yes | The human readable name of the database. |
TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
- Examples
Run the following code to query for a side view of TABLES for all primary key constraints within the catalog.
SELECT * FROM `INFORMATION_SCHEMA`.`TABLE_CONSTRAINTS`;
TABLE_OPTIONS¶
Side view of TABLES that lists the options from the WITH clause.
Extension to the SQL Standard Information Schema.
Column Name | Data Type | Description |
---|---|---|
TABLE_CATALOG | STRING NOT NULL | The human readable name of the catalog. |
TABLE_CATALOG_ID | STRING NOT NULL | The ID of the catalog. |
TABLE_NAME | STRING NOT NULL | The name of the relation. |
TABLE_SCHEMA | STRING NOT NULL | The human readable name of the database. |
TABLE_SCHEMA_ID | STRING NOT NULL | The ID of the database. |
OPTION_KEY | STRING NOT NULL | Option key. |
OPTION_VALUE | STRING NOT NULL | Option value. |
- Examples
Run the following code to list the WITH clause options of all tables within the catalog.
SELECT * FROM `INFORMATION_SCHEMA`.`TABLE_OPTIONS`;
VIEWS¶
Contains the object-level metadata for views within the catalog.
The rows returned are limited to the schemas the user has permission to interact with.
Column Name | Data Type | Standard | Description |
---|---|---|---|
COMMENT | STRING NULL | No (Like Databricks and Snowflake) | An optional comment that describes the relation. |
TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
VIEW_DEFINITION | STRING NULL | Yes | Text of the view’s expanded query expression. Like Databricks, NULL if the user does not own the view. |
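- Example
Run the following code to list all views within the catalog and their definitions, excluding the information schema.
SELECT `TABLE_SCHEMA`, `TABLE_NAME`, `VIEW_DEFINITION` FROM `INFORMATION_SCHEMA`.`VIEWS` WHERE `TABLE_SCHEMA` <> 'INFORMATION_SCHEMA';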
Functions¶
Confluent Cloud for Apache Flink supports a number of features for routines.
- overloading
- structured types, that is, Java POJOs
- var-args
- procedures and polymorphic table functions (PTFs)
- table, model arguments with traits for PTFs
- input type strategies and return type strategies
These special cases are considered in the INFORMATION_SCHEMA design:
- Overloading: SPECIFIC_NAME with _1, _2 behavior, similar to Databricks.
- Structured types, return type strategies: FULL_DATA_TYPE and DATA_TYPE = NULL.
- Var-args, input type strategies: at least indicated with IS_STATIC = NO.
- Procedures and PTFs, table and model arguments with traits for PTFs: special columns TRAITS and DATA_TYPE = NULL.
ROUTINES¶
Contains the object-level metadata for functions within the catalog.
Column Name | Data Type | Standard | Description |
---|---|---|---|
CREATED | TIMESTAMP_LTZ(9) NOT NULL | Yes | Creation time of the function. |
DATA_TYPE | STRING NULL | Yes | Type root, for example, VARCHAR or ROW. NULL is not standard but is reserved for user-defined functions with type strategies instead of a static return type or procedures. Table-valued functions always return a ROW type: it is either an automatic wrapper in case of a non-row type or a ROW returned by the function. |
EXTERNAL_ARTIFACTS | STRING NULL | No | Contains the content from USING JAR. NULL if not external. For example: confluent-artifact://<artifact-id>/<version-id>. If multiple artifacts are supported, use a semicolon-separated list. The information about this being a JAR can be derived from EXTERNAL_LANGUAGE for now. |
EXTERNAL_LANGUAGE | STRING NULL | Yes | JAVA or PYTHON . NULL if not external. |
EXTERNAL_NAME | STRING NULL | Yes | Identifier in the external language. For example, for Java, it’s the fully qualified class path. NULL is for non-external functions, that is, functions implemented in SQL. Contains the content of the AS clause, for example, ‘<class>’: CREATE FUNCTION … AS ‘<class>’ |
FULL_DATA_TYPE | STRING NULL | No | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. NULL is not standard but reserved for user-defined functions with type strategies instead of a static return type or procedures. Table-valued functions always return a ROW type: it is either an automatic wrapper in case of a non-row type or a ROW returned by the function. |
FUNCTION_KIND | STRING NULL | No | For ROUTINE_TYPE of FUNCTION or PTF, defines a more specific function kind, corresponding to Flink’s FunctionKind. Values are: TABLE , SCALAR , AGGREGATE , PROCESS_TABLE . Null is reserved for PROCEDURES. |
FUNCTION_REQUIREMENTS | STRING NULL | No | Semicolon-separated list of requirements for ROUTINE_TYPE of FUNCTION or PTF. Corresponds to Flink’s FunctionRequirement. Values are: OVER_WINDOW_ONLY. NULL if there are no calling requirements. |
IS_DETERMINISTIC | STRING NOT NULL | Yes | YES or NO . |
IS_DYNAMIC | STRING NOT NULL | No | YES or NO . Whether the signature has static arguments or uses a strategy. In the latter case, PARAMETERS doesn’t contain information about the given specific routine. |
ROUTINE_BODY | STRING NOT NULL | Yes | EXTERNAL for Java and Python. Deviates from standard because PTFs are also EXTERNAL . |
ROUTINE_CATALOG | STRING NOT NULL | Yes | Matches SPECIFIC_CATALOG. |
ROUTINE_CATALOG_ID | STRING NOT NULL | No | Matches SPECIFIC_CATALOG_ID. |
ROUTINE_NAME | STRING NOT NULL | Yes | Name of the routine. |
ROUTINE_SCHEMA | STRING NOT NULL | Yes | Matches SPECIFIC_SCHEMA. |
ROUTINE_SCHEMA_ID | STRING NOT NULL | No | Matches SPECIFIC_SCHEMA_ID. |
ROUTINE_TYPE | STRING NOT NULL | Yes | FUNCTION for user-defined functions. PTF for user-defined PTFs. |
SPECIFIC_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
SPECIFIC_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment. |
SPECIFIC_NAME | STRING NOT NULL | Yes | Uniquely identifies a potentially overloaded routine signature. For example, a function f takes both f(INT) and f(STRING) . Each overload gets a specific name such as f_1 or f_2 . The specific name is not callable in SQL but is used for references by other INFORMATION_SCHEMA views such as PARAMETERS. |
SPECIFIC_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
SPECIFIC_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
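- Example
Run the following code to list all user-defined functions within the catalog, with their specific names, kinds, and return types.
SELECT `ROUTINE_SCHEMA`, `ROUTINE_NAME`, `SPECIFIC_NAME`, `FUNCTION_KIND`, `FULL_DATA_TYPE` FROM `INFORMATION_SCHEMA`.`ROUTINES` WHERE `ROUTINE_TYPE` = 'FUNCTION';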
PARAMETERS¶
The parameters view supports only functions with static arguments.
Column Name | Data Type | Standard | Description |
---|---|---|---|
DATA_TYPE | STRING NULL | Yes | Type root, for example, VARCHAR or ROW. NULL is not standard but is reserved for PTFs with untyped table arguments. |
FULL_DATA_TYPE | STRING NULL | No | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. |
IS_OPTIONAL | STRING NOT NULL | No | YES or NO whether this parameter is optional. |
ORDINAL_POSITION | INT NOT NULL | Yes | Position (1-based) of the parameter in the signature. |
PARAMETER_MODE | STRING NOT NULL | Yes | Always IN . Reserved for future use. |
PARAMETER_NAME | STRING NOT NULL | Yes | Name of the parameter. |
ROUTINE_NAME | STRING NOT NULL | No | Name of the routine. |
SPECIFIC_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
SPECIFIC_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment. |
SPECIFIC_NAME | STRING NOT NULL | Yes | Uniquely identifies a potentially overloaded routine signature. For example, a function f takes both f(INT) and f(STRING). Then each overload gets a specific name such as: f_1 or f_2. The specific name is not callable in SQL but is used for references by other INFORMATION_SCHEMA views such as ROUTINES. |
SPECIFIC_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
SPECIFIC_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
TRAITS | STRING NOT NULL | No | Semicolon-separated list of traits. By default, SCALAR only. |
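- Example
Because INFORMATION_SCHEMA views can’t be joined, look up a routine’s SPECIFIC_NAME in the ROUTINES view first, then filter PARAMETERS on it. The '<specific-name>' placeholder stands for a value from the SPECIFIC_NAME column, for example, f_1.
SELECT `ORDINAL_POSITION`, `PARAMETER_NAME`, `FULL_DATA_TYPE`, `IS_OPTIONAL` FROM `INFORMATION_SCHEMA`.`PARAMETERS` WHERE `SPECIFIC_NAME` = '<specific-name>';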