.. _github-source-connector:
GitHub Source Connector for |cp|
================================
The |kconnect-long| GitHub Source Connector writes metadata from GitHub to |ak-tm| topics. It can detect changes in real time or consume the existing history.
The connector polls data from GitHub through the `GitHub APIs `__, converts the data into |ak| records, and then
pushes the records into a |ak| topic. Each record from GitHub is converted
into exactly one |ak| record.
Features
--------
The GitHub Source connector offers the following features:
* **At Least Once Delivery**: The connector guarantees that records from GitHub are delivered at least once to the |ak| topic.
* **API Rate Limit Awareness**: The connector stops fetching records from GitHub when the API rate limit is exceeded. Once the rate limit resets, the connector resumes fetching records.
* **HTTPS Proxy Support**: The connector can connect to GitHub through an HTTPS proxy server. To configure the proxy, set ``http.proxy.host``, ``http.proxy.port``, ``http.proxy.user``, and ``http.proxy.password`` in the connector configuration. The connector has been tested with an HTTPS proxy that uses basic authentication.
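For example, the following sketch appends the four proxy properties to a connector properties file, such as the ``github-source-quickstart.properties`` file created in the Quick Start. The host, port, user, and password values are hypothetical placeholders; substitute the details of your own proxy.

.. code-block:: bash

   # Connector properties file to extend (the Quick Start file name is assumed).
   PROPS_FILE=github-source-quickstart.properties

   # Hypothetical proxy details; replace them with your proxy's values.
   printf '%s\n' \
     'http.proxy.host=proxy.example.com' \
     'http.proxy.port=3128' \
     'http.proxy.user=proxy-user' \
     'http.proxy.password=proxy-password' >> "$PROPS_FILE"
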
Limitations
-----------
* For resources that do not support fetching records by datetime, new records are fetched at the interval specified by the ``request.interval.ms`` configuration property. Records for these resources might be duplicated each time the connector restarts.
* The connector cannot detect data that is deleted on GitHub.
* If the connector restarts, the |ak| topic might contain records that are out of order.
* GitHub enforces an API request limit of `5,000 requests `__ per hour. Once this limit is exceeded, the connector waits until the limit resets.
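The connector performs this wait on its own. If you want to estimate how long a rate-limited connector is likely to pause, GitHub reports the reset time of the current window in the ``X-RateLimit-Reset`` response header, as seconds since the Unix epoch. The following minimal sketch computes the remaining wait from that value; the function name ``seconds_until_reset`` is hypothetical and not part of the connector.

.. code-block:: bash

   # seconds_until_reset RESET_EPOCH [NOW_EPOCH]
   # Prints how many seconds remain until the rate-limit window resets.
   # RESET_EPOCH is the X-RateLimit-Reset header value (UTC epoch seconds);
   # NOW_EPOCH defaults to the current time from `date +%s`.
   seconds_until_reset() {
     reset_epoch=$1
     now_epoch=${2:-$(date +%s)}
     wait_s=$((reset_epoch - now_epoch))
     # A reset time in the past means the quota is already available again.
     if [ "$wait_s" -lt 0 ]; then
       wait_s=0
     fi
     echo "$wait_s"
   }

   # Example: a window that resets at epoch 1700003600, checked at 1700000000.
   seconds_until_reset 1700003600 1700000000   # prints 3600

To read the header for your own token, you could send a request such as ``curl -sI -H "Authorization: token <your-token>" https://api.github.com/rate_limit`` and pass the ``X-RateLimit-Reset`` value to the function.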
GitHub Resources
----------------
The GitHub connector supports fetching records from the following resources:
* **assignees:** Available assignees for the specified repositories. See the `schema `__.
* **collaborators:** Collaborators for the specified repositories. See the `schema `__.
* **issues:** Issues in all GitHub states. See the `schema `__.
* **comments:** Issue comments. See the `schema `__.
* **commits:** Commits on the master branch only. See the `schema `__.
* **pull_requests:** Pull requests in all GitHub states. See the `schema `__.
* **releases:** Releases for the specified repositories. See the `schema `__.
* **reviews:** Reviews on pull requests. Reviews can only be fetched together with ``pull_requests``. See the `schema `__.
* **review_comments:** Review comments on pull requests. See the `schema `__.
* **stargazers:** Stargazers for the specified repositories. See the `schema `__.
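When you choose resources for the ``github.tables`` property (used in the Quick Start), the following sketch checks a value against the list above, including the rule that ``reviews`` requires ``pull_requests``. It assumes that ``github.tables`` takes a comma-separated list of resource names; ``check_tables`` is a hypothetical helper, not part of the connector.

.. code-block:: bash

   # Resource names listed above.
   SUPPORTED_TABLES="assignees collaborators issues comments commits pull_requests releases reviews review_comments stargazers"

   # check_tables VALUE
   # Validates a comma-separated github.tables value (assumed format).
   # Returns 0 if valid; otherwise prints a reason on stderr and returns 1.
   check_tables() {
     has_reviews=no
     has_prs=no
     count=0
     for t in $(printf '%s' "$1" | tr ',' ' '); do
       case " $SUPPORTED_TABLES " in
         *" $t "*) ;;
         *) echo "unsupported resource: $t" >&2; return 1 ;;
       esac
       if [ "$t" = reviews ]; then has_reviews=yes; fi
       if [ "$t" = pull_requests ]; then has_prs=yes; fi
       count=$((count + 1))
     done
     if [ "$count" -eq 0 ]; then
       echo "github.tables is empty" >&2
       return 1
     fi
     # Reviews can only be fetched together with pull requests.
     if [ "$has_reviews" = yes ] && [ "$has_prs" = no ]; then
       echo "reviews requires pull_requests in github.tables" >&2
       return 1
     fi
     return 0
   }

For example, ``check_tables "pull_requests,reviews"`` succeeds, while ``check_tables "reviews"`` fails with a message.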
Prerequisites
-------------
The following are required to run the |kconnect-long| GitHub Source Connector:
* |ak| Broker: |cp| 3.3.0 or above.
* |kconnect|: |cp| 4.1.0 or above.
* Java 1.8
* A GitHub personal access token with repository and user privileges. No other setup is required on the GitHub account. See `Creating a personal access token for the command line `__.
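After you create the token, you can write it into a connector properties file (for example, ``github-source-quickstart.properties`` from the Quick Start) instead of editing the file by hand. The following sketch is one way to do this; ``set_access_token`` and the ``GITHUB_TOKEN`` environment variable are hypothetical names.

.. code-block:: bash

   # set_access_token FILE TOKEN
   # Replaces the github.access.token line in FILE with TOKEN. Writing through
   # a temporary file avoids differences between GNU and BSD `sed -i`.
   # Assumes TOKEN contains no |, & or \ characters, which sed would interpret.
   set_access_token() {
     props_file=$1
     token=$2
     tmp_file=$(mktemp) || return 1
     # mktemp creates the file readable only by its owner, and mv keeps that
     # mode, so the rewritten properties file is not world-readable.
     if sed "s|^github\.access\.token=.*|github.access.token=${token}|" "$props_file" > "$tmp_file"; then
       mv "$tmp_file" "$props_file"
     else
       rm -f "$tmp_file"
       return 1
     fi
   }

   # Usage, with GITHUB_TOKEN as a hypothetical variable that holds your token:
   #   set_access_token github-source-quickstart.properties "$GITHUB_TOKEN"
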
Install the GitHub Source Connector
-----------------------------------
.. include:: ../includes/connector-install.rst
.. include:: ../includes/connector-install-hub.rst
.. codewithvars:: bash
confluent-hub install confluentinc/kafka-connect-github:latest
.. include:: ../includes/connector-install-version.rst
.. codewithvars:: bash
confluent-hub install confluentinc/kafka-connect-github:1.0.0-preview
------------------------------
Install the connector manually
------------------------------
`Download and extract the ZIP file `__ for your connector and then follow the manual connector installation :ref:`instructions `.
License
-------
.. include:: ../includes/enterprise-license.rst
See :ref:`github-source-connector-license-config` for license properties and :ref:`github-source-license-topic-configuration` for information about the license topic.
Configuration Properties
------------------------
For a complete list of configuration properties for this connector, see :ref:`configuration_options`.
Quick Start
-----------
In this quick start, you configure the GitHub Source connector to fetch the GitHub users who have starred the `Apache Kafka repository `_ since 2019-01-01 and write them to a |ak| topic named ``github-stargazers``.
---------------
Start Confluent
---------------
Start the Confluent services using the following :ref:`cli` command:
.. sourcecode:: bash
confluent local start
.. important::
Do not use the :ref:`cli` in production environments.
-------------------------
Properties-based example
-------------------------
Create a file named ``github-source-quickstart.properties`` with the following properties, and set ``github.access.token`` to your GitHub personal access token:
.. sourcecode:: properties
name=MyGithubConnector
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
tasks.max=1
connector.class=io.confluent.connect.github.GithubSourceConnector
github.service.url=https://api.github.com
github.access.token=
github.repositories=apache/kafka
github.tables=stargazers
github.since=2019-01-01
topic.name.pattern=github-${entityName}
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
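With ``topic.name.pattern=github-${entityName}``, records for the ``stargazers`` resource are written to ``github-stargazers``. Assuming that ``${entityName}`` is replaced by the resource name in the same way for other resources, the following sketch predicts the topic name for a given resource and pattern. ``topic_for`` is a hypothetical helper.

.. code-block:: bash

   # topic_for RESOURCE [PATTERN]
   # Substitutes RESOURCE for the ${entityName} placeholder in PATTERN.
   # PATTERN defaults to the Quick Start's github-${entityName}.
   topic_for() {
     resource=$1
     if [ -n "$2" ]; then
       pattern=$2
     else
       pattern='github-${entityName}'
     fi
     # [$] matches a literal dollar sign; { and } are literal in a sed BRE.
     printf '%s\n' "$pattern" | sed 's/[$]{entityName}/'"$resource"'/g'
   }

   topic_for stargazers   # prints github-stargazers

If the connector were also configured to fetch ``issues``, you could pass ``--topic "$(topic_for issues)"`` to the console consumer used later in this quick start.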
Next, load the source connector.
.. include:: ../../includes/confluent-local-consume-limit.rst
.. codewithvars:: bash
|confluent_load| MyGithubConnector|dash| -d github-source-quickstart.properties
Your output should resemble the following:
.. sourcecode:: json
{
"name": "MyGithubConnector",
"config": {
"connector.class": "io.confluent.connect.github.GithubSourceConnector",
"tasks.max": "1",
"confluent.topic.bootstrap.servers":"localhost:9092",
"confluent.topic.replication.factor":"1",
"github.service.url":"https://api.github.com",
"github.repositories":"apache/kafka",
"github.tables":"stargazers",
"github.since":"2019-01-01",
"github.access.token":"",
"topic.name.pattern":"github-${entityName}",
"key.converter":"io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url":"http://localhost:8081",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://localhost:8081"
},
"tasks": [],
"type": null
}
Enter the following command to confirm that the connector is in a ``RUNNING`` state:
.. codewithvars:: bash
|confluent_status| MyGithubConnector
The output should resemble:
.. code-block:: json
{
"name":"MyGithubConnector",
"connector":
{
"state":"RUNNING",
"worker_id":"127.0.1.1:8083"
},
"tasks":
[
{
"id":0,
"state":"RUNNING",
"worker_id":"127.0.1.1:8083"
}
],
"type":"source"
}
------------------
REST-based example
------------------
Use this setting with :ref:`distributed workers `. Write the following JSON to ``config.json``, configure all of the required values (including ``github.access.token``), and then post the configuration to one of the distributed |kconnect| workers. For more information, see the |kconnect-long| :ref:`REST API `.
.. code-block:: json
{
"name" : "MyGithubConnector",
"config" :
{
"connector.class" : "io.confluent.connect.github.GithubSourceConnector",
"confluent.topic.bootstrap.servers": "localhost:9092",
"confluent.topic.replication.factor": "1",
"tasks.max" : "1",
"github.service.url":"https://api.github.com",
"github.access.token":"< Github-Access-Token >",
"github.repositories":"apache/kafka",
"github.tables":"stargazers",
"github.since":"2019-01-01",
"topic.name.pattern":"github-${entityName}",
"key.converter":"io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url":"http://localhost:8081",
"value.converter":"io.confluent.connect.avro.AvroConverter",
"value.converter.schema.registry.url":"http://localhost:8081"
}
}
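If you already created ``github-source-quickstart.properties`` in the properties-based example, you could generate ``config.json`` from it instead of writing the JSON by hand. The following is a minimal sketch using ``awk``; the function name ``properties_to_connect_json`` is hypothetical, and it handles only simple ``key=value`` lines.

.. code-block:: bash

   # properties_to_connect_json FILE
   # Prints a Kafka Connect REST payload built from a simple key=value
   # properties file: "name" becomes the connector name and every other
   # property goes into "config" as a string. Blank lines and lines starting
   # with # or ! are skipped; line continuations and escapes are not handled.
   properties_to_connect_json() {
     awk '
       # Escape backslashes and double quotes for a JSON string.
       function esc(s,    out, i, c) {
         out = ""
         for (i = 1; i <= length(s); i++) {
           c = substr(s, i, 1)
           if (c == "\\" || c == "\"") out = out "\\"
           out = out c
         }
         return out
       }
       /^[ \t]*$/ { next }
       /^[ \t]*[#!]/ { next }
       {
         i = index($0, "=")
         if (i == 0) next
         key = substr($0, 1, i - 1)
         val = substr($0, i + 1)
         sub(/^[ \t]+/, "", key); sub(/[ \t]+$/, "", key); sub(/^[ \t]+/, "", val)
         if (key == "name") { name = val; next }
         cfg = cfg sep "    \"" esc(key) "\": \"" esc(val) "\""
         sep = ",\n"
       }
       END {
         printf "{\n  \"name\": \"%s\",\n  \"config\": {\n%s\n  }\n}\n", esc(name), cfg
       }
     ' "$1"
   }

   # Usage:
   #   properties_to_connect_json github-source-quickstart.properties > config.json

The generated file uses the same ``name``/``config`` layout as the JSON above, with every value as a string.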
.. note::
For staging or production use:
* Change the ``confluent.topic.bootstrap.servers`` property to include your broker address(es).
* Change ``confluent.topic.replication.factor`` to ``3``.
* Change ``http://localhost:8083/`` to the endpoint of one of your |kconnect| workers.
Use ``curl`` to post the configuration to one of the |kconnect| workers:
.. code-block:: bash
curl -sS -X POST -H 'Content-Type: application/json' --data @config.json http://localhost:8083/connectors
Confirm that the connector is in a ``RUNNING`` state by running the following command:
.. codewithvars:: bash
curl http://localhost:8083/connectors/MyGithubConnector/status
The output should resemble the example below:
.. code-block:: json
{
"name":"MyGithubConnector",
"connector":{
"state":"RUNNING",
"worker_id":"127.0.1.1:8083"
},
"tasks":[
{
"id":0,
"state":"RUNNING",
"worker_id":"127.0.1.1:8083"
}
],
"type":"source"
}
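Instead of rerunning the status command by hand, you could poll it until the connector and its tasks report ``RUNNING``. The following sketch extracts the ``state`` values with ``grep`` and ``sed`` rather than a JSON parser, so it assumes the response layout shown above; ``status_states``, ``all_running``, and ``wait_for_running`` are hypothetical helper names.

.. code-block:: bash

   # status_states: read a /connectors/<name>/status response on stdin and
   # print each "state" value on its own line (the connector's state first,
   # then one line per task, in the response layout shown above).
   status_states() {
     grep -o '"state"[[:space:]]*:[[:space:]]*"[A-Z_]*"' | sed 's/.*"\([A-Z_]*\)"$/\1/'
   }

   # all_running: succeed only if the response on stdin contains at least one
   # state and every state is RUNNING (a FAILED or UNASSIGNED task fails it).
   all_running() {
     states=$(status_states)
     [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qv '^RUNNING$'
   }

   # wait_for_running URL [TRIES]: poll the status endpoint every 2 seconds
   # until all_running succeeds, giving up after TRIES attempts (default 30).
   wait_for_running() {
     url=$1
     tries=${2:-30}
     while [ "$tries" -gt 0 ]; do
       if curl -sS "$url" | all_running; then
         return 0
       fi
       tries=$((tries - 1))
       sleep 2
     done
     echo "connector or tasks not RUNNING after polling $url" >&2
     return 1
   }

   # Usage:
   #   wait_for_running http://localhost:8083/connectors/MyGithubConnector/status
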
Enter the following command to consume records written by the connector to the |ak| topic:
.. sourcecode:: bash
./kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic github-stargazers --from-beginning
The output should resemble the example below:
.. sourcecode:: json
{
"type": {
"string": "STARGAZERS"
},
"createdAt": null,
"data": {
"data": {
"login": {
"string": "User.Name"
},
"id": {
"int": 1234
},
"node_id": {
"string": "MDQ6VXNlcjM0OTE3MTE="
},
"avatar_url": {
"string": "https://avatars2.githubusercontent.com/u/1234?v=4"
},
"gravatar_id": {
"string": ""
},
"url": {
"string": "https://api.github.com/users/User.Name"
},
"html_url": {
"string": "https://github.com/User.Name"
},
"followers_url": {
"string": "https://api.github.com/users/User.Name/followers"
},
"following_url": {
"string": "https://api.github.com/users/User.Name/following{/other_user}"
},
"gists_url": {
"string": "https://api.github.com/users/User.Name/gists{/gist_id}"
},
"starred_url": {
"string": "https://api.github.com/users/User.Name/starred{/owner}{/repo}"
},
"subscriptions_url": {
"string": "https://api.github.com/users/User.Name/subscriptions"
},
"organizations_url": {
"string": "https://api.github.com/users/User.Name/orgs"
},
"repos_url": {
"string": "https://api.github.com/users/User.Name/repos"
},
"events_url": {
"string": "https://api.github.com/users/User.Name/events{/privacy}"
},
"received_events_url": {
"string": "https://api.github.com/users/User.Name/received_events"
},
"type": {
"string": "User"
},
"site_admin": {
"boolean": false
}
}
},
"id": {
"string": "1234"
}
}
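The example above is formatted for readability; the console consumer prints each record as a single line of JSON. Because Avro string unions appear as ``{"string": "..."}``, the stargazer's login is nested one level deep. The following sketch pulls the logins out of the consumer output; ``stargazer_logins`` is a hypothetical helper that matches text rather than parsing JSON.

.. code-block:: bash

   # stargazer_logins: read kafka-avro-console-consumer output on stdin (one
   # JSON record per line) and print each stargazer's login, taken from the
   # nested {"string": "..."} union value.
   stargazer_logins() {
     grep -o '"login"[[:space:]]*:[[:space:]]*{[[:space:]]*"string"[[:space:]]*:[[:space:]]*"[^"]*"' |
       sed 's/.*"\([^"]*\)"$/\1/'
   }

   # Usage, reading the first 10 records from the Quick Start topic:
   #   ./kafka-avro-console-consumer --bootstrap-server localhost:9092 \
   #     --topic github-stargazers --from-beginning --max-messages 10 | stargazer_logins
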
------------------
Clean up resources
------------------
Delete the connector:
.. codewithvars:: bash
|confluent_unload| MyGithubConnector
Stop |cp|:
.. codewithvars:: bash
|confluent_stop|
Additional Documentation
------------------------
.. toctree::
:maxdepth: 1
configuration_options
changelog