Information Schema in Confluent Cloud for Apache Flink

An information schema, or data dictionary, is a standard SQL schema with a collection of predefined views that provide access to metadata about objects in Confluent Cloud for Apache Flink®. The Confluent INFORMATION_SCHEMA is based on the SQL-92 ANSI Information Schema, with the addition of views and functions that are specific to Confluent Cloud for Apache Flink. The ANSI standard uses "catalog" and "schema" where Confluent Cloud for Apache Flink uses "environment" and "database": a catalog corresponds to an environment, and a schema corresponds to a database (a Kafka cluster). Conceptually, the terms are equivalent.

The views in the INFORMATION_SCHEMA provide information about database objects, such as tables, columns, and constraints. You can query these views like tables by using standard SQL statements.

For example, you can use the INFORMATION_SCHEMA.COLUMNS view to get details about the columns in a table, like the column name, data type, and whether the column allows null values.

Similarly, you can use the INFORMATION_SCHEMA.TABLES view to get static, relation-level configuration details, such as whether a relation is a view, a table, or a system table. For example, you can query for details like the watermark definition and the number of partitions in the backing topic.
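
As a sketch using only the syntax supported by metadata queries, the following query lists the watermark column and expression for every watermarked table in the current catalog:

SELECT
  `TABLE_SCHEMA`,
  `TABLE_NAME`,
  `WATERMARK_COLUMN`,
  `WATERMARK_EXPRESSION`
FROM `INFORMATION_SCHEMA`.`TABLES`
WHERE `IS_WATERMARKED` = 'YES';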

Every Flink catalog has a corresponding INFORMATION_SCHEMA, so you can always run a statement like SELECT (...) FROM <catalog-name>.INFORMATION_SCHEMA.TABLES WHERE (...).

Global views are available in every INFORMATION_SCHEMA, which means that you can query for information across catalogs. For example, you can query the global INFORMATION_SCHEMA.CATALOGS view to list all catalogs.

The information schema is a powerful tool for querying metadata about your Flink catalogs and databases, and you can use it for a variety of purposes, such as generating reports, documenting a schema, and troubleshooting performance issues.

The views supported in the Confluent INFORMATION_SCHEMA are described in the following sections.

Query syntax in INFORMATION_SCHEMA

Metadata queries on the INFORMATION_SCHEMA tables support the following syntax.

Supported data types:

  • INT
  • STRING

Supported operators:

  • SELECT
  • WHERE
  • UNION ALL

Supported expressions:

  • CAST(NULL AS dt), CAST(x AS dt)
  • UNION ALL
  • AND, OR
  • =, <>, IS NULL, IS NOT NULL
  • AS
  • STRING and INT literals
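
As a sketch, the following query combines several of these expressions: it unions catalog names and database names into one result, labeling each row with a CAST string literal. The NAME and KIND aliases are illustrative only:

SELECT `CATALOG_NAME` AS `NAME`, CAST('catalog' AS STRING) AS `KIND`
FROM `INFORMATION_SCHEMA`.`CATALOGS`
UNION ALL
SELECT `SCHEMA_NAME` AS `NAME`, CAST('database' AS STRING) AS `KIND`
FROM `INFORMATION_SCHEMA`.`SCHEMATA`
WHERE `SCHEMA_NAME` <> 'INFORMATION_SCHEMA';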

The following limitations apply to INFORMATION_SCHEMA:

  • You can use INFORMATION_SCHEMA views only in SELECT statements, not in INSERT INTO statements.
  • You can’t use INFORMATION_SCHEMA in joins with real tables.
  • Only the previously listed equality and basic expressions are supported.

Catalogs and databases

CATALOGS

The global catalogs view.

The rows returned are limited to the schemas that you have permission to interact with.

This view is an extension to the SQL standard.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk. |
| CATALOG_NAME | STRING NOT NULL | No | The human-readable name of the catalog/environment, for example, default. |

Example

Run the following code to query for all catalogs across environments.

SELECT
  `CATALOG_ID`,
  `CATALOG_NAME`
FROM `INFORMATION_SCHEMA`.`CATALOGS`;

INFORMATION_SCHEMA_CATALOG_NAME

Local catalog view. Returns the name of the current information schema’s catalog.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk. |
| CATALOG_NAME | STRING NOT NULL | Yes | The human-readable name of the catalog/environment, for example, default. |

Example

Run the following code to query for the name of this information schema’s catalog.

SELECT
  `CATALOG_ID`,
  `CATALOG_NAME`
FROM `INFORMATION_SCHEMA`.`INFORMATION_SCHEMA_CATALOG_NAME`;

SCHEMATA / DATABASES

Describes databases within the catalog.

For convenience, DATABASES is an alias for SCHEMATA.

The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment, for example, env-xmzdkk. |
| CATALOG_NAME | STRING NOT NULL | Yes | The human-readable name of the catalog/environment, for example, default. |
| SCHEMA_ID | STRING NOT NULL | No | The ID of the database/cluster, for example, lkc-kgjwwv. |
| SCHEMA_NAME | STRING NOT NULL | Yes | The human-readable name of the database/cluster, for example, MyCluster. |

Example

Run the following code to list all Flink databases within a catalog (Kafka clusters within an environment), excluding the information schema.

SELECT
  `SCHEMA_ID`,
  `SCHEMA_NAME`
FROM `INFORMATION_SCHEMA`.`SCHEMATA`
WHERE `SCHEMA_NAME` <> 'INFORMATION_SCHEMA';

COLUMNS

Describes columns of tables and virtual tables (views) in the catalog.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COLUMN_NAME | STRING NOT NULL | Yes | Column reference. |
| COMMENT | STRING NULL | No | (Like Databricks, Snowflake) An optional comment that describes the relation. |
| DATA_TYPE | STRING NOT NULL | Yes | Type root, for example, VARCHAR or ROW. |
| DISTRIBUTION_ORDINAL_POSITION | INT NULL | No | (Like BigQuery for clustering key) If the table IS_DISTRIBUTED, contains the position of the key in a DISTRIBUTED BY clause. |
| FULL_DATA_TYPE | STRING NOT NULL | No | (Like Databricks) Fully qualified data type, for example, VARCHAR(32) or ROW<…>. |
| GENERATION_EXPRESSION | STRING NULL | Yes | (Like BigQuery and Databricks) For computed columns. |
| IS_GENERATED | STRING NOT NULL | Yes | Indicates whether the column is a computed column. Values are YES or NO. |
| IS_HIDDEN | STRING NOT NULL | No | (Like BigQuery) Indicates whether the column is a system column. Values are YES or NO. |
| IS_METADATA | STRING NOT NULL | No | Indicates whether the column is a metadata column. Values are YES or NO. |
| IS_NULLABLE | STRING NOT NULL | No | Indicates whether the column is nullable. Values are YES or NO. |
| IS_PERSISTED | STRING NOT NULL | No | Indicates whether a metadata column is stored during INSERT INTO. Also YES if a physical column. Values are YES or NO. |
| METADATA_KEY | STRING NULL | No | For metadata columns. |
| ORDINAL_POSITION | INT NOT NULL | Yes | Position of the column in the relation, starting at 1. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
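
Example

As a sketch, run code like the following to list the columns of a single table, with their data types and nullability. The <database-name> and <table-name> placeholders stand for your own database and table:

SELECT
  `COLUMN_NAME`,
  `FULL_DATA_TYPE`,
  `IS_NULLABLE`
FROM `INFORMATION_SCHEMA`.`COLUMNS`
WHERE `TABLE_SCHEMA` = '<database-name>' AND `TABLE_NAME` = '<table-name>';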

KEY_COLUMN_USAGE

Side view of TABLE_CONSTRAINTS for key columns.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COLUMN_NAME | STRING NOT NULL | Yes | The name of the constrained column. |
| CONSTRAINT_CATALOG | STRING NOT NULL | Yes | Catalog name containing the constraint. |
| CONSTRAINT_CATALOG_ID | STRING NOT NULL | No | Catalog ID containing the constraint. |
| CONSTRAINT_SCHEMA | STRING NOT NULL | Yes | Schema name containing the constraint. |
| CONSTRAINT_SCHEMA_ID | STRING NOT NULL | No | Schema ID containing the constraint. |
| CONSTRAINT_NAME | STRING NOT NULL | Yes | Name of the constraint. |
| ORDINAL_POSITION | INT NOT NULL | Yes | The ordinal position of the column within the constraint key (starting at 1). |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |

Example

Run the following code to query for a side view of TABLE_CONSTRAINTS for key columns.

SELECT *
FROM `INFORMATION_SCHEMA`.`KEY_COLUMN_USAGE`;

SYSTEM_TABLES

Contains the object-level metadata for virtual tables within the catalog. Virtual tables do not represent physical data. Instead, they expose specific metadata.

The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Description |
|---|---|---|
| BASE_TABLE_NAME | STRING NULL | The name of the relation to which this system view corresponds. |
| COMMENT | STRING NULL | An optional comment that describes the system view. |
| TABLE_CATALOG | STRING NOT NULL | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | The ID of the database. |
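
Example

As a sketch, run the following code to list the system tables in the catalog along with the base tables they correspond to:

SELECT
  `TABLE_SCHEMA`,
  `TABLE_NAME`,
  `BASE_TABLE_NAME`
FROM `INFORMATION_SCHEMA`.`SYSTEM_TABLES`;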

TABLES

Contains the object level metadata for tables and virtual tables (views) within the catalog.

The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COMMENT | STRING NULL | No | (Like Databricks and Snowflake) An optional comment that describes the relation. |
| DISTRIBUTION_ALGORITHM | STRING NULL | No | Only HASH is supported. |
| DISTRIBUTION_BUCKETS | INT NULL | No | Number of buckets, if defined. |
| IS_DISTRIBUTED | STRING NOT NULL | No | (Like Snowflake for clustering key) Indicates whether the table is bucketed using the DISTRIBUTED BY clause. Values are YES or NO. |
| IS_WATERMARKED | STRING NOT NULL | No | Indicates whether the table has a watermark from the WATERMARK FOR clause. Values are YES or NO. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
| TABLE_TYPE | STRING NOT NULL | Yes | Values are BASE TABLE, EXTERNAL TABLE, SYSTEM TABLE, or VIEW [1]. |
| WATERMARK_COLUMN | STRING NULL | No | Time attribute column for which the watermark is defined. |
| WATERMARK_EXPRESSION | STRING NULL | No | Watermark expression. |
| WATERMARK_IS_HIDDEN | STRING NULL | No | Indicates whether the watermark is the default, system-provided one. |

[1] These are the valid values for the TABLE_TYPE column:

  • BASE TABLE: For Confluent-native tables that can be used conceptually for reading and writing, like a regular database table.
  • EXTERNAL TABLE: For non-native Confluent tables, for example, non-Kafka and Tableflow tables. Usually, these tables are read-only.
  • SYSTEM TABLE: For tables that the system creates, either alongside a BASE TABLE or on its own. Only $error is supported. Compared with BASE TABLE, these tables are read-only.
  • VIEW: For SQL views.
Examples

Run the following code to list all tables within a catalog (Kafka topics within an environment), excluding the information schema.

SELECT
  `TABLE_CATALOG`,
  `TABLE_SCHEMA`,
  `TABLE_NAME`
FROM `INFORMATION_SCHEMA`.`TABLES`
WHERE `TABLE_SCHEMA` <> 'INFORMATION_SCHEMA';

Run the following code to list all tables within a database (Kafka topics within a cluster).

SELECT
  `TABLE_CATALOG`,
  `TABLE_SCHEMA`,
  `TABLE_NAME`
FROM `<current-catalog>`.`INFORMATION_SCHEMA`.`TABLES`
WHERE `TABLE_SCHEMA` = '<current-database>';

TABLE_CONSTRAINTS

Side view of TABLES for all primary key constraints within the catalog.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CONSTRAINT_CATALOG | STRING NOT NULL | Yes | Catalog name containing the constraint. |
| CONSTRAINT_CATALOG_ID | STRING NOT NULL | No | Catalog ID containing the constraint. |
| CONSTRAINT_SCHEMA | STRING NOT NULL | Yes | Schema name containing the constraint. |
| CONSTRAINT_SCHEMA_ID | STRING NOT NULL | No | Schema ID containing the constraint. |
| CONSTRAINT_NAME | STRING NOT NULL | Yes | Name of the constraint. |
| CONSTRAINT_TYPE | STRING NOT NULL | Yes | Currently, only PRIMARY KEY. |
| ENFORCED | STRING NOT NULL | Yes | YES if the constraint is enforced, otherwise NO. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |

Examples

Run the following code to query for a side view of TABLES for all primary key constraints within the catalog.

SELECT *
FROM `INFORMATION_SCHEMA`.`TABLE_CONSTRAINTS`;

TABLE_OPTIONS

Side view of TABLES showing the options from the WITH clause.

Extension to the SQL Standard Information Schema.

| Column Name | Data Type | Description |
|---|---|---|
| TABLE_CATALOG | STRING NOT NULL | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | The ID of the database. |
| OPTION_KEY | STRING NOT NULL | Option key. |
| OPTION_VALUE | STRING NOT NULL | Option value. |

Examples

Run the following code to query for a side view of TABLES showing the WITH clause options.

SELECT *
FROM `INFORMATION_SCHEMA`.`TABLE_OPTIONS`;

VIEWS

Contains the object-level metadata for views within the catalog.

The rows returned are limited to the schemas that you have permission to interact with.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| COMMENT | STRING NULL | No | (Like Databricks and Snowflake) An optional comment that describes the relation. |
| TABLE_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| TABLE_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog. |
| TABLE_NAME | STRING NOT NULL | Yes | The name of the relation. |
| TABLE_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| TABLE_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
| VIEW_DEFINITION | STRING NULL | Yes | Text of the view's expanded query expression. Like Databricks, NULL if the user does not own the view. |
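
Example

As a sketch, run the following code to list all views in the catalog together with their definitions, excluding the information schema:

SELECT
  `TABLE_SCHEMA`,
  `TABLE_NAME`,
  `VIEW_DEFINITION`
FROM `INFORMATION_SCHEMA`.`VIEWS`
WHERE `TABLE_SCHEMA` <> 'INFORMATION_SCHEMA';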

Functions

Confluent Cloud for Apache Flink supports a number of features for routines:

  • overloading
  • structured types, that is, Java POJOs
  • var-args
  • procedures and polymorphic table functions (PTFs)
  • table, model arguments with traits for PTFs
  • input type strategies and return type strategies

These special cases are considered in the INFORMATION_SCHEMA design:

  • Overloading: SPECIFIC_NAME with _1, _2 behavior, similar to Databricks.
  • Structured types and return type strategies: FULL_DATA_TYPE and DATA_TYPE = NULL.
  • Var-args and input type strategies: at least indicated with IS_DYNAMIC = YES.
  • Procedures, PTFs, and table/model arguments with traits for PTFs: the special TRAITS column and DATA_TYPE = NULL.

ROUTINES

Contains the object-level metadata for functions within the catalog.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| CREATED | TIMESTAMP_LTZ(9) NOT NULL | Yes | Creation time of the function. |
| DATA_TYPE | STRING NULL | Yes | Type root, for example, VARCHAR or ROW. NULL is not standard but is reserved for user-defined functions with type strategies instead of a static return type, or for procedures. Table-valued functions always return a ROW type: either an automatic wrapper in the case of a non-row type, or a ROW returned by the function. |
| EXTERNAL_ARTIFACTS | STRING NULL | No | Contains the content from USING JAR. NULL if not external. For example: confluent-artifact://<artifact-id>/<version-id>. If multiple artifacts are supported, a semicolon-separated list is used. The information that this is a JAR can be derived from EXTERNAL_LANGUAGE for now. |
| EXTERNAL_LANGUAGE | STRING NULL | Yes | JAVA or PYTHON. NULL if not external. |
| EXTERNAL_NAME | STRING NULL | Yes | Identifier in the external language. For example, for Java, it's the fully qualified class path. NULL for non-external functions, that is, functions implemented in SQL. Contains the content of the AS clause, for example, '<class>' in CREATE FUNCTION … AS '<class>'. |
| FULL_DATA_TYPE | STRING NULL | No | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. NULL is not standard but is reserved for user-defined functions with type strategies instead of a static return type, or for procedures. Table-valued functions always return a ROW type: either an automatic wrapper in the case of a non-row type, or a ROW returned by the function. |
| FUNCTION_KIND | STRING NULL | No | For a ROUTINE_TYPE of FUNCTION or PTF, defines a more specific function kind, corresponding to Flink's FunctionKind. Values are TABLE, SCALAR, AGGREGATE, or PROCESS_TABLE. NULL is reserved for procedures. |
| FUNCTION_REQUIREMENTS | STRING NULL | No | Semicolon-separated list of requirements for a ROUTINE_TYPE of FUNCTION or PTF. Corresponds to Flink's FunctionRequirement. Values are OVER_WINDOW_ONLY. NULL if there are no calling requirements. |
| IS_DETERMINISTIC | STRING NOT NULL | Yes | YES or NO. |
| IS_DYNAMIC | STRING NOT NULL | No | YES or NO. Indicates whether the signature has static arguments or uses a strategy. In the latter case, PARAMETERS doesn't contain information about the given specific routine. |
| ROUTINE_BODY | STRING NOT NULL | Yes | EXTERNAL for Java and Python. Deviates from the standard because PTFs are also EXTERNAL. |
| ROUTINE_CATALOG | STRING NOT NULL | Yes | Matches SPECIFIC_CATALOG. |
| ROUTINE_CATALOG_ID | STRING NOT NULL | No | Matches SPECIFIC_CATALOG_ID. |
| ROUTINE_NAME | STRING NOT NULL | Yes | Name of the routine. |
| ROUTINE_SCHEMA | STRING NOT NULL | Yes | Matches SPECIFIC_SCHEMA. |
| ROUTINE_SCHEMA_ID | STRING NOT NULL | No | Matches SPECIFIC_SCHEMA_ID. |
| ROUTINE_TYPE | STRING NOT NULL | Yes | FUNCTION for user-defined functions. PTF for user-defined PTFs. |
| SPECIFIC_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| SPECIFIC_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment. |
| SPECIFIC_NAME | STRING NOT NULL | Yes | Uniquely identifies a potentially overloaded routine signature. For example, a function f may take both f(INT) and f(STRING). Each overload gets a specific name such as f_1 or f_2. The specific name is not callable in SQL but is used for references by other INFORMATION_SCHEMA views, such as PARAMETERS. |
| SPECIFIC_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| SPECIFIC_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
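
Example

As a sketch, run the following code to list each routine signature in the catalog with its kind and return type:

SELECT
  `ROUTINE_SCHEMA`,
  `ROUTINE_NAME`,
  `SPECIFIC_NAME`,
  `FUNCTION_KIND`,
  `FULL_DATA_TYPE`
FROM `INFORMATION_SCHEMA`.`ROUTINES`;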

PARAMETERS

The parameters view supports only functions with static arguments.

| Column Name | Data Type | Standard | Description |
|---|---|---|---|
| DATA_TYPE | STRING NULL | Yes | Type root, for example, VARCHAR or ROW. NULL is not standard but is reserved for PTFs with untyped table arguments. |
| FULL_DATA_TYPE | STRING NULL | No | Fully qualified data type, for example, VARCHAR(32) or ROW<…>. |
| IS_OPTIONAL | STRING NOT NULL | No | YES or NO, depending on whether this parameter is optional. |
| ORDINAL_POSITION | INT NOT NULL | Yes | Position (1-based) of the parameter in the signature. |
| PARAMETER_MODE | STRING NOT NULL | Yes | Always IN. Reserved for future use. |
| PARAMETER_NAME | STRING NOT NULL | Yes | Name of the parameter. |
| ROUTINE_NAME | STRING NOT NULL | No | Name of the routine. |
| SPECIFIC_CATALOG | STRING NOT NULL | Yes | The human-readable name of the catalog. |
| SPECIFIC_CATALOG_ID | STRING NOT NULL | No | The ID of the catalog/environment. |
| SPECIFIC_NAME | STRING NOT NULL | Yes | Uniquely identifies a potentially overloaded routine signature. For example, a function f may take both f(INT) and f(STRING). Each overload gets a specific name such as f_1 or f_2. The specific name is not callable in SQL but is used for references by other INFORMATION_SCHEMA views, such as ROUTINES. |
| SPECIFIC_SCHEMA | STRING NOT NULL | Yes | The human-readable name of the database. |
| SPECIFIC_SCHEMA_ID | STRING NOT NULL | No | The ID of the database. |
| TRAITS | STRING NOT NULL | No | Semicolon-separated list of traits. By default, SCALAR only. |
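
Example

As a sketch, run the following code to list the declared parameters of every routine signature; correlate the SPECIFIC_NAME values with the ROUTINES view to identify the routine:

SELECT
  `SPECIFIC_NAME`,
  `ORDINAL_POSITION`,
  `PARAMETER_NAME`,
  `FULL_DATA_TYPE`,
  `IS_OPTIONAL`
FROM `INFORMATION_SCHEMA`.`PARAMETERS`;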