Redact Confluent Logs in Confluent Platform

Modern software that runs in a Java Virtual Machine (JVM) is usually built from hundreds of component libraries, which come from a wide variety of vendors and open source projects. Typically, each component library creates log messages to capture errors, warnings, informative messages, and debug information throughout its classes and methods. In rare cases, a log statement may inadvertently include sensitive information without the component author being aware of it, and once the component is packaged, end users may encounter scenarios that expose that information in application logs. This can lead to security concerns and should be reported to the software provider. At a minimum, contact the provider to request a fix, if one is available.

Regardless of the availability of such fixes for your component libraries, you can use the Confluent Log Redactor plugin for Apache Log4j 2 to configure regular expression patterns (redaction rules) that identify and redact specific patterns of sensitive information from your logs before they are delegated to other appenders and emitted. You can configure Log Redactor for a component (such as Kafka or Connect) by updating its Log4j 2 configuration file.

For example, you might see that an HTTP Authorization header appears in your logs and file a fix request ticket with the component provider. In the meantime, you can configure Confluent Log Redactor with a simple rule that matches Authorization: Basic [0-9a-zA-Z\+\=\/]+ and replaces it with Authorization: Basic *****.
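
The effect of that rule can be tried out with a small Python sketch. The pattern and replacement are the ones quoted above; the sample log line and the function name are hypothetical, and Python's `re` module stands in for the JVM regular expression engine that the plugin actually uses, so treat this as an illustration rather than the plugin's implementation.

```python
import re

# Pattern and replacement copied from the example rule in the text.
AUTH_PATTERN = re.compile(r"Authorization: Basic [0-9a-zA-Z\+\=\/]+")
AUTH_REPLACEMENT = "Authorization: Basic *****"


def redact_basic_auth(message: str) -> str:
    """Replace every Basic credential in `message` with a fixed mask."""
    return AUTH_PATTERN.sub(AUTH_REPLACEMENT, message)


# Hypothetical log line; the credential is the Base64 form of "user:pass".
sample = "DEBUG Sending request with Authorization: Basic dXNlcjpwYXNz to host"
redacted = redact_basic_auth(sample)
print(redacted)
# DEBUG Sending request with Authorization: Basic ***** to host
```

Messages that do not contain the header pass through unchanged, which is the behavior you want from a narrowly scoped rule.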

Install the Confluent Log Redactor plugin

For fresh installations, the Confluent Log Redactor plugin is installed by default for Connect.

Configure the Log Redactor plugin

To configure the Log Redactor plugin, you need to:

  1. Back up your configuration files before making changes.
  2. Ensure that the JAR file is in the classpath for Connect.
  3. Update the log4j2.yaml files.
  4. Reference the Log Redactor class.

Configuration steps are covered in the following subsections.

Set up redaction rules

The redaction rules are specified in JSON format. The file content looks like the following example:

{
  "version": "1",
  "rules": [
    {
      "description": "This is the first rule",
      "trigger": "triggerstring 1",
      "search": "regex 1",
      "replace": "replace 1"
    },
    {
      "description": "This is the second rule",
      "trigger": "triggerstring 2",
      "caseSensitive": false,
      "search": "regex 2",
      "replace": "replace 2"
    }
  ]
}

If a log message contains a rule's trigger string (or the rule has no trigger), the search regular expression is applied to the message, and every occurrence it matches is replaced with the value of replace.

| Field | Description | Required? |
| --- | --- | --- |
| trigger | A simple string compare. May be used to provide a performance hint; a simple string compare may be faster than a regular expression. If it does not exist, the message will always apply search. Note that in future versions, this hint might be ignored. | No |
| caseSensitive | A boolean indicating if the trigger and search are to be used in case-sensitive or case-insensitive matching. Defaults to true (case-sensitive matching). | No |
| search | A regular expression. Make sure that proper escaping is used. | Yes |
| replace | A simple string. In practice, it usually looks something like XXXXXXX. If missing, the rule will be detect-only and not redact. | No |
| description | Intended for self-documentation purposes. | No |
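
To make the interaction of these fields concrete, here is a minimal Python sketch of how a single rule could be evaluated. It is a simplified model built only from the field descriptions above, not the plugin's code: the function name `apply_rule` is hypothetical, Python's `re` module stands in for the JVM regular expression engine, and treating `replace` as a literal string is an assumption.

```python
import re
from typing import Optional, Tuple


def apply_rule(rule: dict, message: str) -> Tuple[str, bool]:
    """Return (possibly redacted message, whether `search` matched)."""
    case_sensitive = rule.get("caseSensitive", True)  # documented default
    trigger: Optional[str] = rule.get("trigger")

    # Cheap substring check first: skip the regex when the trigger is absent.
    if trigger is not None:
        haystack = message if case_sensitive else message.lower()
        needle = trigger if case_sensitive else trigger.lower()
        if needle not in haystack:
            return message, False

    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(rule["search"], flags)  # `search` is required
    if pattern.search(message) is None:
        return message, False

    # A rule without `replace` only detects; the message is left unchanged.
    if "replace" not in rule:
        return message, True

    # Assumption: `replace` is inserted literally (no group references).
    return pattern.sub(lambda _match: rule["replace"], message), True


# Hypothetical rules and messages for illustration.
token_rule = {
    "trigger": "token",
    "caseSensitive": False,
    "search": r"token=\S+",
    "replace": "token=XXXXXXX",
}
print(apply_rule(token_rule, "login ok TOKEN=abc123 user=bob"))
# ('login ok token=XXXXXXX user=bob', True)

detect_only = {"search": r"\d{16}"}
print(apply_rule(detect_only, "card 1234567812345678"))
# ('card 1234567812345678', True)
```

The second call shows the detect-only case: the rule matches, but because it has no `replace` value the message is returned as it was.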

In the current implementation, the ordering of the rules is significant. The rules are evaluated strictly in the order given. Thus, in theory, later rules might process the output of earlier rules (for example, aaa->bbb followed by bbb->ccc). Rules that depend on this behavior are discouraged, because Confluent offers no guarantee that it will be maintained in future versions.
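
The chaining effect is easy to see in a short Python sketch. The two rules and the helper `apply_in_order` are hypothetical, and the loop models only the "strictly in the order given" behavior described above, not the plugin itself.

```python
import re

# Two hypothetical rules mirroring the aaa->bbb, bbb->ccc case.
rules = [
    {"search": "aaa", "replace": "bbb"},
    {"search": "bbb", "replace": "ccc"},
]


def apply_in_order(ordered_rules: list, message: str) -> str:
    """Apply each rule to the output of the previous one."""
    for rule in ordered_rules:
        message = re.sub(rule["search"], rule["replace"], message)
    return message


print(apply_in_order(rules, "aaa"))                  # ccc
print(apply_in_order(list(reversed(rules)), "aaa"))  # bbb
```

The same two rules give different results depending on their order, which is why rules that rely on chaining are fragile.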

Example of a rules.json file

Here is an example of a rules.json file:

{
  "version": 1,
  "rules": [
    {
      "description": "No more vowels",
      "search": "[aeiou]",
      "replace": "x"
    },
    {
      "description": "Passwords",
      "trigger": "password",
      "search": "password=.*",
      "replace": "password=xxxxx"
    }
  ]
}

This example has two rules. The first rule banishes lowercase vowels from all log messages and replaces them with x’s. The second rule looks for lines containing password and replaces any password=... occurrences with password=xxxxx.
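
The following Python sketch loads the same two rules and applies each one on its own to a hypothetical log message, so that the two effects stay visible separately. The helper `apply_single` and the sample messages are illustrative assumptions, and Python's `re` module stands in for the JVM regular expression engine.

```python
import json
import re

# The example rules.json from above, embedded as a string.
RULES_JSON = """
{
  "version": 1,
  "rules": [
    {"description": "No more vowels", "search": "[aeiou]", "replace": "x"},
    {"description": "Passwords", "trigger": "password",
     "search": "password=.*", "replace": "password=xxxxx"}
  ]
}
"""


def apply_single(rule: dict, message: str) -> str:
    """Apply one rule: honor the optional trigger, then search and replace."""
    if "trigger" in rule and rule["trigger"] not in message:
        return message
    return re.sub(rule["search"], rule["replace"], message)


vowels, passwords = json.loads(RULES_JSON)["rules"]

print(apply_single(vowels, "Started Connect worker"))
# Stxrtxd Cxnnxct wxrkxr
print(apply_single(passwords, "db.user=app password=s3cret ssl=true"))
# db.user=app password=xxxxx
print(apply_single(passwords, "no secrets here"))
# no secrets here
```

Note that `password=.*` matches to the end of the line, so everything after `password=` (including `ssl=true` in the sample) is replaced. A narrower pattern such as `password=\S+` would keep the rest of the line.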

Watch for policy rule changes and update at runtime

The Log Redactor can read its redaction rules from a file on the filesystem and pick up changes to that file while the component is running.

To set up the log redactor, use the following configuration in the log4j2.yaml file:

Rewrite:
  - name: RedactorAppender
    RedactorPolicy:
      name: "io.confluent.log4j2.redactor.RedactorPolicy"
      rules: "path/to/log-redactor-rules.json"
      refreshInterval: 60000
    AppenderRef:
      - ref: STDOUT
      - ref: ConnectAppender

refreshInterval
(Optional) Specifies the interval in milliseconds for how often the file system is checked for changes. When a change is detected, the redactor automatically adopts the changes. If unspecified, the policy rules are only read once at startup.
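
As a rough mental model of what a refresh interval implies, the Python sketch below polls a rules file and reloads it when its modification time changes. This is one plausible way such polling could work, not a description of the plugin's internals; the class name `ReloadingRules` and its methods are hypothetical.

```python
import json
import os
import time


class ReloadingRules:
    """Re-read a rules file when its modification time changes."""

    def __init__(self, path, refresh_interval_ms=None):
        self.path = path
        self.refresh_interval_ms = refresh_interval_ms
        self._last_check = time.monotonic()
        self._mtime = os.stat(path).st_mtime_ns
        self.rules = self._load()

    def _load(self):
        with open(self.path, encoding="utf-8") as handle:
            return json.load(handle)["rules"]

    def current(self, now=None):
        """Return the active rules, reloading first if a check is due."""
        if self.refresh_interval_ms is None:
            return self.rules  # read once at startup, never refreshed
        now = time.monotonic() if now is None else now
        if (now - self._last_check) * 1000 >= self.refresh_interval_ms:
            self._last_check = now
            mtime = os.stat(self.path).st_mtime_ns
            if mtime != self._mtime:
                self._mtime = mtime
                self.rules = self._load()
        return self.rules
```

With `refresh_interval_ms=60000`, a call to `current()` touches the filesystem at most once a minute; with no interval, the rules loaded at construction time are used for the life of the object, mirroring the "read once at startup" behavior described above.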

Configure Log Redactor for Metadata Service (MDS)

The Confluent Log Redactor is included in Confluent Platform. To configure it for the Metadata Service (MDS), update the Log4j 2 configuration file that provides the logging configuration for Confluent Platform components.

Here is an excerpt of an updated log4j2.yaml file that shows a redactor configuration using MDS redactor rules:

Redactor:
  name: redactor
  appenderRefs: stdout,file
  policy: io.confluent.log4j.redactor.RedactorPolicy
  policyRules: path/to/metadata-log-redactor-rules.json

Loggers:
  Root:
    level: ERROR
    AppenderRef:
      ref: redactor