Process Encrypted Data with Flink on Confluent Cloud

Confluent Cloud for Apache Flink provides a fully managed service for stream processing that enables you to build and run real-time data pipelines and applications. When combined with client-side field level encryption (CSFLE), Apache Flink® can process encrypted data while maintaining data privacy through the use of deterministic encryption. This combination lets you use stream processing while your sensitive data remains encrypted. Currently, Flink does not decrypt the data, but can:

  • Process non-encrypted fields without limitations.
  • Run a subset of operations on encrypted fields when deterministic encryption is used. Other operations are not technically prevented, but they return semantically incorrect results (for example, MAX on an encrypted column does not return the maximum of the underlying plaintext values).

Deterministic encryption is a cryptographic method where encrypting the same plaintext with the same key always produces the same ciphertext. This property enables operations on encrypted data without needing to decrypt it first, making it particularly useful for stream processing applications.
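The property described above can be illustrated with a minimal sketch. It assumes a toy SIV-style scheme built from the Python standard library: the function names (toy_encrypt, toy_decrypt), the keys, and the sample values are hypothetical, and the code is neither the Tink API nor the AES256_SIV algorithm that CSFLE uses. It only shows that deriving the IV from the key and the plaintext, instead of from random bytes, makes the output repeatable while still allowing decryption.

```python
import hashlib
import hmac


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from key and IV (toy counter mode)."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = iv + counter.to_bytes(4, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(mac_key: bytes, enc_key: bytes, plaintext: bytes) -> bytes:
    """Toy deterministic encryption: illustration only, not for real use."""
    # Synthetic IV: computed from the key and the plaintext, with no randomness.
    iv = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:16]
    stream = _keystream(enc_key, iv, len(plaintext))
    body = bytes(p ^ s for p, s in zip(plaintext, stream))
    return iv + body


def toy_decrypt(mac_key: bytes, enc_key: bytes, ciphertext: bytes) -> bytes:
    """Reverse toy_encrypt and check that the IV matches the plaintext."""
    iv, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(enc_key, iv, len(body))
    plaintext = bytes(c ^ s for c, s in zip(body, stream))
    expected_iv = hmac.new(mac_key, plaintext, hashlib.sha256).digest()[:16]
    if not hmac.compare_digest(iv, expected_iv):
        raise ValueError("authentication failed")
    return plaintext


# Hypothetical keys and values, for illustration only.
MAC_KEY = b"demo-mac-key"
ENC_KEY = b"demo-enc-key"

first = toy_encrypt(MAC_KEY, ENC_KEY, b"alice@example.com")
second = toy_encrypt(MAC_KEY, ENC_KEY, b"alice@example.com")
other = toy_encrypt(MAC_KEY, ENC_KEY, b"bob@example.com")

same_plaintext_same_ciphertext = first == second
different_plaintext_differs = first != other
recovered = toy_decrypt(MAC_KEY, ENC_KEY, first)
```

Because equal plaintexts map to equal ciphertexts, a processor that never sees the key can still test two encrypted values for equality.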

When you use CSFLE with Apache Flink® and deterministic encryption, you can process encrypted data while maintaining data privacy. The following points describe how this works and what to consider:

  • You can use deterministic encryption (AES256_SIV) to ensure that the same input data always produces the same encrypted output. This enables:
    • Equality filtering on encrypted columns.
    • Joining of encrypted data across different streams.
    • Equality-based aggregation on encrypted fields, such as COUNT and COUNT(DISTINCT column).
  • Flink can process the encrypted data while maintaining the security of sensitive information, as the actual decryption only happens when the data is accessed by authorized users or applications.
  • The deterministic nature of AES256_SIV encryption allows Flink to perform operations on encrypted data without needing to decrypt it first, which is particularly useful for:
    • Equality comparisons in WHERE clauses.
    • GROUP BY operations on encrypted fields.
    • JOIN operations between streams with encrypted fields.
    • The aggregation functions COUNT and COUNT(DISTINCT column) on encrypted fields.
  • When using deterministic encryption with Flink, you can:
    • Write SQL queries that operate directly on encrypted fields.
    • Perform stream processing operations while maintaining data privacy.
    • Enable secure data sharing between different applications.
    • Meet compliance requirements while still allowing data processing.
  • Deterministic encryption enables these operations, but it also reveals when two plaintext values are equal, because they encrypt to the same ciphertext. This is a necessary trade-off to enable processing of encrypted data, and you should consider it when choosing which fields to encrypt deterministically.
  • For aggregation functions on encrypted fields, only functions that rely on equality comparison work correctly:
    • COUNT and COUNT(DISTINCT) work because they only need to determine whether values are the same or different.
    • The LEAD and LAG window functions work, but the values they return are still encrypted, so they are useful only for equality comparisons.
    • Other aggregation functions, such as SUM, AVG, MIN, and MAX, do not return correct results on encrypted fields because they require access to the plaintext values.
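The behavior listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Flink or CSFLE code: det is a hypothetical keyed HMAC used as a stand-in for a deterministically encrypted value (it is not reversible and is not AES256_SIV), and the order and customer data are invented. The sketch shows that equality-based operations give the same answer on ciphertext as on plaintext, while ordering is lost.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key"  # hypothetical key, for illustration only


def det(value: str) -> str:
    """Deterministic keyed stand-in for an encrypted field value."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()


# Hypothetical order events: (customer email, amount).
orders = [
    ("alice@example.com", 30),
    ("bob@example.com", 75),
    ("alice@example.com", 12),
    ("carol@example.com", 40),
]
# Hypothetical customer table keyed by the same encrypted email.
customers = {det("alice@example.com"): "gold", det("bob@example.com"): "silver"}

# What a processor without the key would see: both fields are opaque.
enc_orders = [(det(email), det(str(amount))) for email, amount in orders]

# COUNT(DISTINCT email): same result on ciphertext as on plaintext.
distinct_plain = len({email for email, _ in orders})
distinct_enc = len({email for email, _ in enc_orders})

# GROUP BY email with COUNT(*): the group sizes match.
groups_plain = sorted(Counter(email for email, _ in orders).values())
groups_enc = sorted(Counter(email for email, _ in enc_orders).values())

# JOIN on the encrypted email: equal plaintexts produce equal join keys.
joined = [(email, customers[email]) for email, _ in enc_orders if email in customers]

# Ordering is not preserved, so MIN, MAX, SUM, and AVG computed on the
# ciphertext say nothing about the underlying plaintext values.
amounts = [str(n) for n in range(20)]
plain_order = sorted(amounts, key=int)
cipher_order = sorted(amounts, key=det)
order_preserved = plain_order == cipher_order
```

The matching group sizes also illustrate the trade-off noted earlier: anyone who can read the encrypted stream can tell which records share a value, even without the key.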

CSFLE uses the Google Tink Cryptographic library for deterministic encryption. For more information on Google Tink, see the following: