How to Control the Case of Identifiers in ksqlDB for Confluent Platform¶
Context¶
You have identifiers, like row names, that will be used outside of ksqlDB. You want to control the exact casing (capitalization) of how they are represented to make them consumable by downstream programs. Because ksqlDB uppercases all identifiers by default, you need to use backticks to preserve the desired casing.
In action¶
CREATE STREAM `s1` (
`foo` VARCHAR KEY,
`BAR` INT,
`Baz` VARCHAR
) WITH (
kafka_topic = 's1',
partitions = 1,
value_format = 'avro'
);
Backticks¶
Begin by telling ksqlDB to start all queries from the earliest point in each topic.
SET 'auto.offset.reset' = 'earliest';
Declare a new stream named s2
. In this example, you override
ksqlDB’s default behavior to uppercase all identifiers. Use backticks to
control the casing of the stream name and column names. To contrast this
behavior, qux
is declared without backticks to demonstrate the
default behavior of uppercasing.
CREATE STREAM `s2` (
`foo` VARCHAR KEY,
`BAR` INT,
`Baz` VARCHAR,
`grault` STRUCT<
`Corge` VARCHAR,
`garply` INT
>,
qux INT
) WITH (
kafka_topic = 's2',
partitions = 1,
value_format = 'avro'
);
Insert some rows into s2
. Notice how you need to use backticks each
time to reference a field that doesn’t have the default casing:
INSERT INTO `s2` (
`foo`, `BAR`, `Baz`, `grault`, qux
) VALUES (
'k1', 1, 'x', STRUCT(`Corge` := 'v1', `garply` := 5), 2
);
INSERT INTO `s2` (
`foo`, `BAR`, `Baz`, `grault`, qux
) VALUES (
'k2', 3, 'y', STRUCT(`Corge` := 'v2', `garply` := 6), 4
);
INSERT INTO `s2` (
`foo`, `BAR`, `Baz`, `grault`, qux
) VALUES (
'k3', 5, 'z', STRUCT(`Corge` := 'v3', `garply` := 8), 6
);
Issue a push query to select the rows. The relevant identifiers are
again surrounded with backticks. Notice that since qux
wasn’t
declared with backticks, it can be referred to with any casing. ksqlDB
will find the matching identifier by uppercasing it automatically.
SELECT `foo`,
`BAR`,
`Baz`,
`grault`->`Corge`,
`grault`->`garply`,
qux,
QUX AS qux2
FROM `s2`
EMIT CHANGES;
Your output should resemble the following. Notice the casing of the headers that ksqlDB prints:
+---------------+---------------+---------------+---------------+---------------+---------------+---------------+
|foo |BAR |Baz |Corge |garply |QUX |QUX2 |
+---------------+---------------+---------------+---------------+---------------+---------------+---------------+
|k1 |1 |x |v1 |5 |2 |2 |
|k2 |3 |y |v2 |6 |4 |4 |
|k3 |5 |z |v3 |8 |6 |6 |
User-defined functions¶
Another area where casing is important is user-defined functions (UDFs) that work with struct parameters. This is the case whenever you use a UDF that either receives an incoming struct or returns a custom struct. Any references to struct fields must exactly match the casing ksqlDB expects. ksqlDB will reject any UDF invocations that do not match it.
Here is a quick example of using backticks in a UDF that returns a custom struct. The first two fields override the default behavior. The last uses the default casing. That means that when you work with the data returned from the UDF in ksqlDB, all select statements must use this exact casing (see the example above). For more information on working with structs in UDFs, see the struct section of the how to create a user-defined function guide.
public static final Schema AGGREGATE_SCHEMA = SchemaBuilder.struct().optional()
.field("`min`", Schema.OPTIONAL_INT64_SCHEMA)
.field("`max`", Schema.OPTIONAL_INT64_SCHEMA)
.field("COUNT", Schema.OPTIONAL_INT64_SCHEMA)
.build();