Official

Premium

GitHub source integration documentation

Publisher: cloudquery
Latest version: v14.1.0
Type: Source

Overview #

GitHub Source Plugin

The CloudQuery GitHub plugin extracts data from the GitHub API and loads it into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, or Snowflake).

Authentication #

The GitHub source plugin supports two authentication methods: Personal Access Token and App authentication. Which one you use is up to you and the security requirements of your organization.

Keep in mind that rate limits for GitHub Apps are higher than for personal access tokens. Review the GitHub rate limits documentation for details.

CloudQuery requires only read permissions (we will never make any changes to your GitHub account or organizations), so, following the principle of least privilege, it's recommended to grant it read-only permissions to all the resources you wish to sync.

Personal Access Token #

Follow this guide on how to create a personal access token for CloudQuery.

App authentication #

For App authentication, you need to create a GitHub App and install it into your organization(s); follow this guide to do so. Grant the App only the permissions you need (read-only is recommended).

Every organization will have a unique installation ID. You can find it by going to the organization's settings page, and clicking on the "Installed GitHub Apps" tab. The installation ID is the number in the URL of the page.
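
If you manage several organizations, copying the ID out of each URL by hand can be error-prone. The following is a minimal sketch of pulling the number out of the URL. It assumes the installation page URL ends in `/installations/<number>`, which is an assumption about GitHub's URL layout rather than something stated above. The function name `installation_id_from_url` and the example URL are hypothetical.

```python
import re
from urllib.parse import urlparse


def installation_id_from_url(url: str) -> int:
    """Return the trailing numeric ID from an installation settings URL."""
    # Assumption: the URL path ends with ".../installations/<digits>".
    path = urlparse(url).path.rstrip("/")
    match = re.search(r"/installations/(\d+)$", path)
    if match is None:
        raise ValueError(f"no installation ID found in {url!r}")
    return int(match.group(1))


# Hypothetical URL, for illustration only.
example_url = "https://github.com/organizations/example-org/settings/installations/12345678"
installation_id = installation_id_from_url(example_url)
print(installation_id)
```

The resulting number is the value to use for `installation_id` in the `app_auth` entry for that organization.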

Passing `private_key` as plaintext #

You can use the YAML literal block indicator (|) to pass the multi-line private key as plaintext.

For example:

- org: cloudquery
  private_key: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEpQIBAAKCAQEA3eVv6PCn9P8zO+EP8K7pLMfxcA2uVrSZ2f+H3GgYIavDxWtO
    vM9tE3jAA8mOjZpdLaG5yy4QfV1LQ3R7kO49JCB6VbClwN2lNvd8Iw49JCBDid7D
    ...
    -----END RSA PRIVATE KEY-----
  app_id: your_app_id

Referencing `private_key` as environment variable #

When referencing private_key as a string from an environment variable, replace every newline in your PEM file with \n. Otherwise, the newlines and indentation will prevent CloudQuery from reading the variable correctly.

For example:

- org: cloudquery
  private_key: "${GITHUB_PRI_KEY}"
  app_id: your_app_id
  ...

where

GITHUB_PRI_KEY="-----BEGIN RSA PRIVATE KEY-----\nMIIEpQIBAAKCAQEA3eVv6PCn9P8zO+EP8K7pLMfxcA2uVrSZ2f+H3GgYIavDxWtO\n...vM9tE3jAA8mOjZpdLaG5yy4QfV1LQ3R7kO49JCB6VbClwN2lNvd8Iw==\n-----END RSA PRIVATE KEY-----"
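
The newline replacement described above can be scripted rather than done by hand. The following is a minimal sketch, not part of CloudQuery: the function name `pem_to_single_line` is hypothetical, and the key body (`AAAA`, `BBBB`) is a placeholder rather than a real key. It also strips surrounding whitespace and drops blank lines, which is an assumption about what a PEM file may contain.

```python
def pem_to_single_line(pem_text: str) -> str:
    # Join the PEM lines with a literal backslash followed by "n",
    # so the result fits in a single-line environment variable.
    # splitlines() handles both LF and CRLF line endings.
    lines = [line.strip() for line in pem_text.splitlines() if line.strip()]
    return "\\n".join(lines)


# Placeholder content, not a real private key.
example_pem = """-----BEGIN RSA PRIVATE KEY-----
AAAA
BBBB
-----END RSA PRIVATE KEY-----
"""

single_line = pem_to_single_line(example_pem)
print(single_line)
# prints: -----BEGIN RSA PRIVATE KEY-----\nAAAA\nBBBB\n-----END RSA PRIVATE KEY-----
```

In practice you would read the PEM text from your key file and export the printed value as the environment variable referenced by private_key.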

Configuration #

To configure CloudQuery to extract from GitHub, create a .yml file in your CloudQuery configuration directory.

The following configuration will extract all issues from the cloudquery/cloudquery repository:

kind: source
spec:
  # Source spec section
  name: github
  path: cloudquery/github
  registry: cloudquery
  version: "v14.1.0"
  tables: ["github_issues"]
  destinations: ["postgresql"]
  # Learn more about the configuration options at https://cql.ink/github_source
  spec:
    access_token: "${GITHUB_PERSONAL_ACCESS_TOKEN}" # Personal Access Token, required if not using App Authentication.
    # # App Authentication (one per org):
    # app_auth:
    # - org: cloudquery
    #   private_key: <PRIVATE_KEY> # Private key as a string
    #   private_key_path: <PATH_TO_PRIVATE_KEY> # Path to private key file
    #   app_id: <YOUR_APP_ID> # App ID, required for App Authentication.
    #   installation_id: <ORG_INSTALLATION_ID> # Installation ID for this org
    # # List of organizations to sync from. You must specify either orgs or repos in the configuration.
    # orgs: []
    # # List of repositories to sync from. The format is `owner/repo` (e.g. `cloudquery/cloudquery`). You must specify either `orgs` or `repos` in the configuration.
    # repos: ["cloudquery/cloudquery"]
    # # GitHub Enterprise
    # # In order to enable GHE you have to provide two urls, the base url of the server and the upload url.
    # # Quote from GitHub's client:
    # #   If the base URL does not have the suffix "/api/v3/", it will be added automatically. If the upload URL does not have the suffix "/api/uploads", it will be added automatically.
    # #   Another important thing is that by default, the GitHub Enterprise URL format should be http(s)://[hostname]/api/v3/ or you will always receive the 406 status code. The upload URL format should be http(s)://[hostname]/api/uploads/"
    # # If you are not configuring against an enterprise server, please omit the enterprise configuration below
    # enterprise:
    #   base_url: "http(s)://[your-ghe-hostname]/api/v3/"
    #   upload_url: "http(s)://[your-ghe-hostname]/api/uploads/"
    # # Optional parameters
    # concurrency: 1500 # Optional. The best effort maximum number of Go routines to use. Lower this number to reduce memory usage or to avoid hitting GitHub API rate limits. Default 1500.
    # discovery_concurrency: 1 # Optional. Number of concurrent requests to GitHub API during discovery phase. Default 1.
    # include_archived_repos: false # Optional. Include archived repositories in the sync. Default false.
    # local_cache_path: "" # Optional. Path to a local directory that will hold the cache. If set, the plugin will cache the GitHub API responses in this directory. Defaults to an empty string (no cache).
    # table_options:
    #   github_workflow_runs:
    #     created_since: "" # e.g. "7 days ago", defaults to all workflow runs
    #   github_issues:
    #     state: "" # e.g. "open, all, closed", defaults to `all`

See tables for a full list of available tables.

You must specify either orgs or repos in the configuration. If a repository is specified in both orgs and repos, it will be extracted only once, and other repositories from that organization will be ignored.

You can define either private_key or private_key_path in the configuration, but not both.
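
The two rules above can be checked before running a sync. The following is a rough pre-flight sketch that operates on the nested spec once it has been loaded into a Python dict. It is not the plugin's own validation, and the function name `check_github_spec` is hypothetical. It only checks the two constraints stated here: at least one of orgs or repos is set, and no app_auth entry sets both private_key and private_key_path.

```python
def check_github_spec(spec: dict) -> list:
    """Return a list of human-readable problems found in the nested spec."""
    problems = []

    # Rule 1: at least one of orgs or repos must be non-empty.
    if not spec.get("orgs") and not spec.get("repos"):
        problems.append("specify at least one of orgs or repos")

    # Rule 2: each app_auth entry may set private_key or private_key_path, not both.
    for entry in spec.get("app_auth") or []:
        if entry.get("private_key") and entry.get("private_key_path"):
            org = entry.get("org")
            problems.append(
                f"app_auth for org {org!r}: set private_key or private_key_path, not both"
            )

    return problems


# Example spec mirroring the option names used in this document.
example_spec = {
    "repos": ["cloudquery/cloudquery"],
    "app_auth": [
        {"org": "cloudquery", "private_key": "k", "private_key_path": "key.pem"},
    ],
}
for problem in check_github_spec(example_spec):
    print(problem)
```

Returning a list instead of raising on the first problem lets you report every issue in one pass.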

It is recommended that you use environment variable expansion for the access token in production. For example, if the access token is stored in an environment variable called GITHUB_ACCESS_TOKEN:

spec:
  access_token: ${GITHUB_ACCESS_TOKEN}

GitHub Spec #

This is the (nested) spec used by the GitHub source plugin.

repos ([]string, optional, default: empty)
List of repositories to sync from. The format is owner/repo (e.g. cloudquery/cloudquery). You must specify either orgs or repos in the configuration.
orgs ([]string, optional, default: empty)
List of organizations to sync from. You must specify either orgs or repos in the configuration.
concurrency (integer, optional, default: 1500)
A best-effort maximum number of goroutines to use. Lower this number to reduce memory usage or to avoid hitting GitHub API rate limits.
discovery_concurrency (integer, optional, default: 1)
During initialization, the GitHub source plugin discovers all repositories under the organizations configured in orgs, to be used later during the sync process. By default, the plugin discovers repositories one organization at a time. You can increase discovery_concurrency to discover multiple organizations in parallel, or use a negative value to discover all organizations in parallel. Note that a high value for discovery_concurrency can cause you to hit GitHub API rate limits.
scheduler (string, optional, default: dfs)
The scheduler to use when determining the priority of resources to sync. Supported values are dfs (depth-first search), round-robin, shuffle and shuffle-queue. For more information, see performance tuning.
include_archived_repos (bool, optional, default: false)
By default, archived repositories are not included in the sync. To include them, set include_archived_repos to true.
local_cache_path (string, optional, default: empty)
Path to a local directory that will hold the cache. If set, the plugin caches GitHub API responses in this directory. An empty string means no cache. With a cache, the plugin can use conditional requests when appropriate, which helps avoid hitting GitHub API rate limits.
table_options (Table Options spec, optional)
Options to apply to specific tables. See GitHub Table Options Spec for more information.

GitHub Table Options Spec #

github_workflow_runs
- created_since (string in natural date format) (optional)
  Sync only workflow runs created after this date (inclusive). Defaults to all workflow runs. Examples of valid formats are: 7 days ago, last month (see more here)
github_issues
- state (string) (optional)
  Sync only issues with the specified state. Possible values are open, closed, all. Defaults to all.
- labels ([]string) (optional)
  Sync only issues with the specified labels. Defaults to all labels.
- since (string in natural date format) (optional)
  Sync only issues updated after this date (inclusive). Defaults to all issues. If you are using incremental sync, this option is ignored after the first sync completes. In all future syncs, only issues updated after the last sync are fetched. Examples of valid formats are: 7 days ago, last month (see more here)
- assignee (string) (optional)
  Sync only issues with the specified assignee. Possible values are a user name, "none" for issues that are not assigned, "*" for issues with any assigned user.
- creator (string) (optional)
  Sync only issues with the specified creator.
- mentioned (string) (optional)
  Sync only issues with the specified user mentioned.
github_repositories
- updated_after (Time) (optional)
  Sync only repositories updated after this date.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.
github_copilot_org_metrics
- since (Time) (optional)
  Sync only metrics created after this date.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.
github_organization_dependabot_alerts
- updated_after (Time) (optional)
  Sync only alerts created after this date.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.
github_repository_dependabot_alerts
- updated_after (Time) (optional)
  Sync only alerts created after this date.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.
github_code_scanning_alerts
- updated_after (Time) (optional)
  Sync only code scanning alerts updated after this date. Format: a relative or absolute timestamp.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.
github_secret_scanning_alerts
- updated_after (Time) (optional)
  Sync only secret scanning alerts updated after this date. Format: a relative or absolute timestamp.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.
github_pull_requests
- updated_after (Time) (optional)
  Sync only pull requests updated after this date. Format: a relative or absolute timestamp.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.
github_pull_request_closing_issues_references
- updated_after (Time) (optional)
  Sync only closing issues references for pull requests updated after this date. Format: a relative or absolute timestamp.
  If you use incremental syncing, the more recent of the state value and this spec option will be used.

Table Options Time Format #

This is a special time format that allows either absolute or relative timestamps.

Absolute timestamps must be formatted per RFC 3339, for example 2025-01-01T12:00:00+00:00. Relative timestamps can take any of these forms:

now
x seconds [ago|from now]
x minutes [ago|from now]
x hours [ago|from now]
x days [ago|from now]
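
To make the format concrete, here is a small sketch of how such a value could be resolved to a point in time. It is an illustration only, not the plugin's parser: the function name `resolve_time` is hypothetical, and accepting singular units (such as "1 day ago") and a trailing "Z" on absolute timestamps are assumptions that go beyond the forms listed above.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches "<number> <unit>[s] ago" or "<number> <unit>[s] from now".
_RELATIVE = re.compile(r"^(\d+) (second|minute|hour|day)s? (ago|from now)$")


def resolve_time(value: str, now: datetime) -> datetime:
    """Resolve an absolute or relative timestamp against a reference time."""
    value = value.strip()
    if value == "now":
        return now

    match = _RELATIVE.match(value)
    if match:
        amount, unit, direction = match.groups()
        # timedelta accepts seconds=, minutes=, hours= and days= keywords.
        delta = timedelta(**{unit + "s": int(amount)})
        return now - delta if direction == "ago" else now + delta

    # Otherwise treat the value as an absolute timestamp.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


reference = datetime(2025, 1, 8, 12, 0, 0, tzinfo=timezone.utc)
print(resolve_time("7 days ago", reference))
print(resolve_time("2025-01-01T12:00:00+00:00", reference))
```

With the reference time above, both calls resolve to the same instant, 2025-01-01 12:00 UTC.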

FIPS #

A FIPS-compliant version of this plugin is available if your environment requires it. You may enable it by updating the version string in the configuration like this:

kind: source
spec:
  name: github
  path: cloudquery/github
  registry: cloudquery
  version: "v14.1.0-fips"
  ...

Licenses #

The following tools / packages are used in this plugin:

Name	License
github.com/adrg/xdg	MIT
github.com/apache/arrow-go/v18	Apache-2.0
github.com/apache/arrow/go/v13	Apache-2.0
github.com/apapsch/go-jsonmerge/v2	MIT
github.com/aws/aws-sdk-go-v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/config	Apache-2.0
github.com/aws/aws-sdk-go-v2/credentials	Apache-2.0
github.com/aws/aws-sdk-go-v2/feature/ec2/imds	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/configsources	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/ini	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/sync/singleflight	BSD-3-Clause
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/licensemanager	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/marketplacemetering	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sso	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/ssooidc	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sts	Apache-2.0
github.com/aws/smithy-go	Apache-2.0
github.com/aws/smithy-go/internal/sync/singleflight	BSD-3-Clause
github.com/bahlo/generic-list-go	BSD-3-Clause
github.com/bradleyfalzon/ghinstallation/v2	Apache-2.0
github.com/buger/jsonparser	MIT
github.com/cenkalti/backoff/v4	MIT
github.com/cloudquery/cloudquery-api-go	MPL-2.0
github.com/cloudquery/codegen/jsonschema/docs	MPL-2.0
github.com/cloudquery/httpcache	MIT
github.com/cloudquery/plugin-pb-go	MPL-2.0
github.com/cloudquery/plugin-sdk/v2/internal/glob	MIT
github.com/cloudquery/plugin-sdk/v2/schema	MIT
github.com/cloudquery/plugin-sdk/v2/types	MPL-2.0
github.com/cloudquery/plugin-sdk/v4	MPL-2.0
github.com/cloudquery/plugin-sdk/v4/glob	MIT
github.com/cloudquery/plugin-sdk/v4/scalar	MIT
github.com/davecgh/go-spew/spew	ISC
github.com/ghodss/yaml	MIT
github.com/go-logr/logr	Apache-2.0
github.com/go-logr/stdr	Apache-2.0
github.com/goccy/go-json	MIT
github.com/gofri/go-github-ratelimit/github_ratelimit	MIT
github.com/golang-jwt/jwt/v4	MIT
github.com/golang/mock/gomock	Apache-2.0
github.com/google/btree	Apache-2.0
github.com/google/flatbuffers/go	Apache-2.0
github.com/google/go-github/v57/github	BSD-3-Clause
github.com/google/go-github/v69/github	BSD-3-Clause
github.com/google/go-querystring/query	BSD-3-Clause
github.com/google/uuid	BSD-3-Clause
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors	Apache-2.0
github.com/grpc-ecosystem/grpc-gateway/v2	BSD-3-Clause
github.com/hashicorp/go-cleanhttp	MPL-2.0
github.com/hashicorp/go-retryablehttp	MPL-2.0
github.com/invopop/jsonschema	MIT
github.com/klauspost/compress	Apache-2.0
github.com/klauspost/compress/internal/snapref	BSD-3-Clause
github.com/klauspost/compress/zstd/internal/xxhash	MIT
github.com/mailru/easyjson	MIT
github.com/mattn/go-colorable	MIT
github.com/mattn/go-isatty	MIT
github.com/oapi-codegen/runtime	Apache-2.0
github.com/peterbourgon/diskv	MIT
github.com/pierrec/lz4/v4	BSD-3-Clause
github.com/pmezard/go-difflib/difflib	BSD-3-Clause
github.com/rs/zerolog	MIT
github.com/samber/lo	MIT
github.com/santhosh-tekuri/jsonschema/v6	Apache-2.0
github.com/shurcooL/githubv4	MIT
github.com/shurcooL/graphql	MIT
github.com/spf13/cobra	Apache-2.0
github.com/spf13/pflag	BSD-3-Clause
github.com/stretchr/testify	MIT
github.com/thoas/go-funk	MIT
github.com/tj/go-naturaldate	MIT
github.com/wk8/go-ordered-map/v2	Apache-2.0
github.com/zeebo/xxh3	BSD-2-Clause
go.opentelemetry.io/auto/sdk	Apache-2.0
go.opentelemetry.io/otel	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp	Apache-2.0
go.opentelemetry.io/otel/log	Apache-2.0
go.opentelemetry.io/otel/metric	Apache-2.0
go.opentelemetry.io/otel/sdk	Apache-2.0
go.opentelemetry.io/otel/sdk/log	Apache-2.0
go.opentelemetry.io/otel/sdk/metric	Apache-2.0
go.opentelemetry.io/otel/trace	Apache-2.0
go.opentelemetry.io/proto/otlp	Apache-2.0
golang.org/x/exp	BSD-3-Clause
golang.org/x/net	BSD-3-Clause
golang.org/x/oauth2	BSD-3-Clause
golang.org/x/sync	BSD-3-Clause
golang.org/x/sys	BSD-3-Clause
golang.org/x/text	BSD-3-Clause
golang.org/x/xerrors	BSD-3-Clause
google.golang.org/genproto/googleapis/api/httpbody	Apache-2.0
google.golang.org/genproto/googleapis/rpc/status	Apache-2.0
google.golang.org/grpc	Apache-2.0
google.golang.org/protobuf	BSD-3-Clause
gopkg.in/yaml.v2	Apache-2.0
gopkg.in/yaml.v3	MIT
