.. _github-source-connector:

GitHub Source Connector for |cp|
================================

The |kconnect-long| GitHub Source Connector writes metadata from GitHub to |ak-tm| topics, either by detecting changes in real time or by consuming the history. This connector polls data from GitHub through `GitHub APIs `__, converts data into |ak| records, and then pushes the records into a |ak| topic. Each record from GitHub is converted into exactly one |ak| record.

Features
--------

The GitHub Source connector offers the following features:

* **At Least Once Delivery**: The connector guarantees that records from GitHub are delivered at least once to the |ak| topic.
* **API Rate Limit Awareness**: The connector stops fetching records from GitHub when the API rate limit is exceeded. Once the API rate limit resets, the connector resumes fetching records.
* **Supports HTTPS Proxy**: The connector can connect to GitHub using an HTTPS proxy server. To configure the proxy, set ``http.proxy.host``, ``http.proxy.port``, ``http.proxy.user``, and ``http.proxy.password`` in the configuration file. The connector has been tested with an HTTPS proxy using basic authentication.

Limitations
-----------

* For resources that do not support fetching records by datetime, new records are fetched at an interval specified by the ``request.interval.ms`` configuration. Records for these resources might be duplicated every time the connector restarts.
* The connector is not able to detect the deletion of data on GitHub.
* When the connector restarts, the |ak| topic might end up having records that are out of order.
* **GitHub has a defined API request limit. This limit is `5,000 requests `__ per hour.
  Once this rate limit is exceeded, the connector waits until the API request limit resets.**

GitHub Resources
----------------

The GitHub connector supports fetching records from the following resources:

* **assignees:** Available assignees for the specified repositories. Refer to the following `schema `__.
* **collaborators:** Collaborators for the specified repositories. Refer to the following `schema `__.
* **issues:** Issues in all GitHub states. Refer to the following `schema `__.
* **comments:** Issue comments. Refer to the following `schema `__.
* **commits:** Master branch commits only. Refer to the following `schema `__.
* **pull_requests:** Pull Requests in all GitHub states. Refer to the following `schema `__.
* **releases:** Releases for the specified repositories. Refer to the following `schema `__.
* **reviews:** Reviews on pull requests. Reviews can only be fetched with Pull Requests. Refer to the following `schema `__.
* **review_comments:** Review comments on pull requests. Refer to the following `schema `__.
* **stargazers:** Stargazers for the specified repositories. Refer to the following `schema `__.

Prerequisites
-------------

The following are required to run the |kconnect-long| GitHub Source Connector:

* |ak| Broker: |cp| 3.3.0 or above.
* |kconnect|: |cp| 4.1.0 or above.
* Java 1.8
* No additional setup is required on the GitHub account for this connector to work, other than an access token with repository and user privileges. See `Creating a personal access token for the command line `__.

Install the GitHub Source Connector
-----------------------------------

.. include:: ../includes/connector-install.rst

.. include:: ../includes/connector-install-hub.rst

.. codewithvars:: bash

   confluent-hub install confluentinc/kafka-connect-github:latest

.. include:: ../includes/connector-install-version.rst

.. codewithvars:: bash

   confluent-hub install confluentinc/kafka-connect-github:1.0.0-preview

------------------------------
Install the connector manually
------------------------------

`Download and extract the ZIP file `__ for your connector and then follow the manual connector installation :ref:`instructions `.

License
-------

.. include:: ../includes/enterprise-license.rst

See :ref:`github-source-connector-license-config` for license properties and :ref:`github-source-license-topic-configuration` for information about the license topic.

Configuration Properties
------------------------

For a complete list of configuration properties for this connector, see :ref:`configuration_options`.

Quick Start
-----------

In this quick start, you configure the GitHub Source connector to fetch the GitHub users who have starred the `Apache Kafka repository `_ since 2019-01-01 and write them to a |ak| topic called ``github-stargazers``.

---------------
Start Confluent
---------------

Start the Confluent services using the following :ref:`cli` command:

.. sourcecode:: bash

   confluent local start

.. important:: Do not use the :ref:`cli` in production environments.

------------------------
Properties-based example
------------------------

Create a file called ``github-source-quickstart.properties`` with the following properties:

.. sourcecode:: bash

   name=MyGithubConnector
   confluent.topic.bootstrap.servers=localhost:9092
   confluent.topic.replication.factor=1
   tasks.max=1
   connector.class=io.confluent.connect.github.GithubSourceConnector
   github.service.url=https://api.github.com
   github.access.token=
   github.repositories=apache/kafka
   github.tables=stargazers
   github.since=2019-01-01
   topic.name.pattern=github-${entityName}
   key.converter=io.confluent.connect.avro.AvroConverter
   key.converter.schema.registry.url=http://localhost:8081
   value.converter=io.confluent.connect.avro.AvroConverter
   value.converter.schema.registry.url=http://localhost:8081

Next, load the Source Connector.

.. include:: ../../includes/confluent-local-consume-limit.rst

.. codewithvars:: bash

   |confluent_load| MyGithubConnector|dash| -d github-source-quickstart.properties

Your output should resemble the following:

.. sourcecode:: bash

   {
     "name": "MyGithubConnector",
     "config": {
       "connector.class": "io.confluent.connect.github.GithubSourceConnector",
       "tasks.max": "1",
       "confluent.topic.bootstrap.servers": "localhost:9092",
       "confluent.topic.replication.factor": "1",
       "github.service.url": "https://api.github.com",
       "github.repositories": "apache/kafka",
       "github.tables": "stargazers",
       "github.since": "2019-01-01",
       "github.access.token": "",
       "topic.name.pattern": "github-${entityName}",
       "key.converter": "io.confluent.connect.avro.AvroConverter",
       "key.converter.schema.registry.url": "http://localhost:8081",
       "value.converter": "io.confluent.connect.avro.AvroConverter",
       "value.converter.schema.registry.url": "http://localhost:8081"
     },
     "tasks": [],
     "type": null
   }

Enter the following command to confirm that the connector is in a ``RUNNING`` state:

.. codewithvars:: bash

   |confluent_status| MyGithubConnector

The output should resemble:

.. code-block:: bash

   {
     "name": "MyGithubConnector",
     "connector": {
       "state": "RUNNING",
       "worker_id": "127.0.1.1:8083"
     },
     "tasks": [
       {
         "id": 0,
         "state": "RUNNING",
         "worker_id": "127.0.1.1:8083"
       }
     ],
     "type": "source"
   }

------------------
REST-based example
------------------

Use this setting with :ref:`distributed workers `. Write the following JSON to ``config.json``, configure all of the required values, and use the following command to post the configuration to one of the distributed |kconnect| workers. For more information, see the |kconnect-long| :ref:`REST API `.

.. code-block:: json

   {
     "name": "MyGithubConnector",
     "config": {
       "connector.class": "io.confluent.connect.github.GithubSourceConnector",
       "confluent.topic.bootstrap.servers": "localhost:9092",
       "confluent.topic.replication.factor": "1",
       "tasks.max": "1",
       "github.service.url": "https://api.github.com",
       "github.access.token": "< Github-Access-Token >",
       "github.repositories": "apache/kafka",
       "github.tables": "stargazers",
       "github.since": "2019-01-01",
       "topic.name.pattern": "github-${entityName}",
       "key.converter": "io.confluent.connect.avro.AvroConverter",
       "key.converter.schema.registry.url": "http://localhost:8081",
       "value.converter": "io.confluent.connect.avro.AvroConverter",
       "value.converter.schema.registry.url": "http://localhost:8081"
     }
   }

.. note:: For staging or production use:

   * Change the ``confluent.topic.bootstrap.servers`` property to include your broker address(es).
   * Change the ``confluent.topic.replication.factor`` to ``3``.
   * Change ``http://localhost:8083/`` to the endpoint of one of your |kconnect| worker(s).

Use curl to post a configuration to one of the |kconnect| workers.

.. code-block:: bash

   curl -sS -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors

Confirm that the connector is in a ``RUNNING`` state by running the following command:

.. codewithvars:: bash

   curl http://localhost:8083/connectors/MyGithubConnector/status

The output should resemble the example below:

.. code-block:: bash

   {
     "name": "MyGithubConnector",
     "connector": {
       "state": "RUNNING",
       "worker_id": "127.0.1.1:8083"
     },
     "tasks": [
       {
         "id": 0,
         "state": "RUNNING",
         "worker_id": "127.0.1.1:8083"
       }
     ],
     "type": "source"
   }

Enter the following command to consume records written by the connector to the |ak| topic:

.. sourcecode:: bash

   ./kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic github-stargazers --from-beginning

The output should resemble the example below:

.. sourcecode:: bash

   {
     "type": { "string": "STARGAZERS" },
     "createdAt": null,
     "data": {
       "data": {
         "login": { "string": "User.Name" },
         "id": { "int": 1234 },
         "node_id": { "string": "MDQ6VXNlcjM0OTE3MTE=" },
         "avatar_url": { "string": "https://avatars2.githubusercontent.com/u/1234?v=4" },
         "gravatar_id": { "string": "" },
         "url": { "string": "https://api.github.com/users/User.Name" },
         "html_url": { "string": "https://github.com/User.Name" },
         "followers_url": { "string": "https://api.github.com/users/User.Name/followers" },
         "following_url": { "string": "https://api.github.com/users/User.Name/following{/other_user}" },
         "gists_url": { "string": "https://api.github.com/users/User.Name/gists{/gist_id}" },
         "starred_url": { "string": "https://api.github.com/users/User.Name/starred{/owner}{/repo}" },
         "subscriptions_url": { "string": "https://api.github.com/users/User.Name/subscriptions" },
         "organizations_url": { "string": "https://api.github.com/users/User.Name/orgs" },
         "repos_url": { "string": "https://api.github.com/users/User.Name/repos" },
         "events_url": { "string": "https://api.github.com/users/User.Name/events{/privacy}" },
         "received_events_url": { "string": "https://api.github.com/users/User.Name/received_events" },
         "type": { "string": "User" },
         "site_admin": { "boolean": false }
       }
     },
     "id": { "string": "1234" }
   }

------------------
Clean up resources
------------------

Delete the connector:

.. codewithvars:: bash

   |confluent_unload| MyGithubConnector

Stop |cp|:

.. codewithvars:: bash

   |confluent_stop|

Additional Documentation
------------------------

.. toctree::
   :maxdepth: 1

   configuration_options
   changelog