Sync data from GitHub to S3

CloudQuery is the simple, fast data integration platform that can fetch your data from GitHub APIs and load it into S3.

Trusted by Zendesk, Palo Alto Networks, Instructure, and Ridgeline

Enterprise Ready

Non-invasive account access for better security and efficiency.

Customize & Extend

Import data with CloudQuery SDKs and build your own plugins.

Query Assets with SQL

Query cloud assets and security with a simple SQL-based UI.

Step-by-step guide to exporting data from GitHub to S3


MacOS Setup

Step 1: Install CloudQuery

To install CloudQuery, run the following command in your terminal:

brew install cloudquery/tap/cloudquery

Step 2: Log in to the CloudQuery CLI

Next, log in to the CloudQuery CLI. If you haven't already, you can sign up for a free account as part of this step:

cloudquery login

Step 3: Create a Configuration File

Next, run the following command to initialize a sync configuration file for GitHub to S3:

cloudquery init --source=github --destination=s3

This will generate a config file named github_to_s3.yaml. Follow the instructions to fill out the necessary fields to authenticate against your own environment.
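As a rough illustration of what a filled-in configuration can look like, the sketch below writes a sample to a separate file so that the generated github_to_s3.yaml is not overwritten. The field names follow the GitHub source and S3 destination specs as they are commonly documented; the organization, bucket, region, table name, version strings, and the GITHUB_TOKEN variable name are all placeholders or assumptions, so treat the file produced by cloudquery init as authoritative.

```shell
#!/bin/sh
# Hypothetical example file name, chosen so github_to_s3.yaml is left untouched.
EXAMPLE_CONFIG="github_to_s3.example.yaml"

# The quoted 'EOF' keeps ${GITHUB_TOKEN} literal in the file, so the token is
# read from the environment at sync time instead of being expanded by the shell now.
cat > "$EXAMPLE_CONFIG" <<'EOF'
kind: source
spec:
  name: github
  path: cloudquery/github
  registry: cloudquery
  version: "vX.Y.Z"                 # placeholder: keep the version from the generated file
  tables: ["github_repositories"]   # assumed table name; check the supported tables list
  destinations: ["s3"]
  spec:
    access_token: "${GITHUB_TOKEN}" # personal access token taken from the environment
    orgs: ["my-org"]                # placeholder organization
---
kind: destination
spec:
  name: s3
  path: cloudquery/s3
  registry: cloudquery
  version: "vX.Y.Z"                 # placeholder: keep the version from the generated file
  spec:
    bucket: "my-example-bucket"     # placeholder bucket
    region: "us-east-1"             # placeholder region
    path: "github/{{TABLE}}/{{UUID}}.parquet"
    format: "parquet"               # csv, parquet or json
EOF

echo "Wrote $EXAMPLE_CONFIG"
```

Keeping the token in an environment variable rather than in the file means the configuration can be committed to version control without exposing credentials.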

Step 4: Run a Sync

cloudquery sync github_to_s3.yaml

This will start syncing data from the GitHub API to your S3 environment! 🚀

See the CloudQuery documentation portal for more deployment guides, options and further tips.

FAQs

What is CloudQuery?

CloudQuery is an open-source tool that helps you extract, transform, and load cloud asset data from various sources into databases for security, compliance, and visibility.

Why does CloudQuery require login?

Logging in allows CloudQuery to authenticate your access to the CloudQuery Hub and monitor usage for billing purposes. Data synced with CloudQuery remains private to your environment and is not shared with our servers or any third parties.

What data does CloudQuery have access to?

CloudQuery accesses only the metadata and configurations of the cloud resources you specify, without touching sensitive data or workloads.

How is CloudQuery priced?

CloudQuery offers flexible pricing based on the number of cloud accounts and usage. Visit our pricing page for detailed plans.

Is there a free version of CloudQuery?

Yes, CloudQuery offers a free plan that includes basic features, perfect for smaller teams or personal use. More details can be found on our pricing page.

What authentication methods does the CloudQuery GitHub integration support?

The CloudQuery GitHub integration supports two authentication methods: personal access tokens and app authentication. The best option for your sync to S3 depends on your personal preferences and your organizational security policy.

What is the difference between personal access tokens and app authentication?

The main difference between the two authentication methods is the rate at which CloudQuery can read and sync data from GitHub to S3. Personal access tokens have a lower rate limit than app authentication, so if you need to move a particularly large amount of data quickly, we recommend using app authentication.
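To see the rate limit that applies to a given credential, one option is to query GitHub's REST API rate_limit endpoint directly. The helper below is a minimal sketch, not part of CloudQuery: the function name github_rate_limit and the GITHUB_TOKEN variable are assumptions made for illustration, and curl must be installed.

```shell
#!/bin/sh
# Print the current GitHub API rate-limit status as JSON.
# Uses GITHUB_TOKEN when it is set; otherwise the request is unauthenticated.
github_rate_limit() {
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    curl -sS \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer ${GITHUB_TOKEN}" \
      https://api.github.com/rate_limit
  else
    curl -sS \
      -H "Accept: application/vnd.github+json" \
      https://api.github.com/rate_limit
  fi
}
```

Calling github_rate_limit with a personal access token exported as GITHUB_TOKEN returns a JSON document whose resources.core section reports the limit, the remaining requests, and the reset time for that credential.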

Which tables can I sync from GitHub to S3?

A full list of supported tables is available in the tables tab on our integration information page.

Will archived repos be included in the sync from GitHub to S3?

Archived repos are skipped by default. To include them in the sync, set include_archived_repos to true.
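The sketch below shows where this setting would typically sit, assuming it belongs in the nested spec block of the GitHub source alongside the authentication settings. The output file name is hypothetical, and the fragment is meant to be merged by hand into the source section of your own configuration file.

```shell
#!/bin/sh
# Hypothetical file name for the fragment; it is not a complete configuration.
FRAGMENT="github_source_archived.example.yaml"

cat > "$FRAGMENT" <<'EOF'
kind: source
spec:
  name: github
  path: cloudquery/github
  # ... other source fields unchanged ...
  spec:
    # ... authentication and orgs/repos settings unchanged ...
    include_archived_repos: true   # archived repos are synced only when this is true
EOF

cat "$FRAGMENT"
```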

Which permissions do I need to grant in S3 in order to sync from GitHub?

The CloudQuery integration requires only PutObject permissions to sync to the S3 destination. We recommend granting only these permissions to keep your setup as secure as possible. CloudQuery does not require access to your S3 configuration settings, only permission to add objects to your S3 buckets.
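A least-privilege IAM policy along these lines might look like the following sketch, which writes a policy document that allows s3:PutObject on a single bucket. The bucket name, the policy file name, and the Sid value are placeholders chosen for illustration.

```shell
#!/bin/sh
# Placeholder bucket name; replace with the bucket used in the destination config.
BUCKET="my-example-bucket"
# Hypothetical output file name.
POLICY_FILE="cloudquery-s3-putobject-policy.json"

# Unquoted EOF so that ${BUCKET} is expanded into the resource ARN.
cat > "$POLICY_FILE" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudQueryPutObjectOnly",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${BUCKET}/*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"
```

The resulting file can then be attached to the role or user that runs the sync, for example by passing it to aws iam create-policy with --policy-document file://cloudquery-s3-putobject-policy.json.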

Can CloudQuery read from my AWS credentials and config files?

Yes. For CloudQuery to use these credentials, the files must be located in the .aws directory of your home folder. The two files are almost identical in format, but if there is a conflict, CloudQuery will prioritise the credential information that it reads from the credentials file over that found in config.
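For reference, the two files usually look like the samples below. This sketch writes them to a scratch directory so that nothing in your real .aws directory is touched; the key values and region are placeholders.

```shell
#!/bin/sh
# Write demo files to a scratch directory instead of ~/.aws.
DEMO_DIR="$(mktemp -d)"

# credentials: access keys, one [profile-name] section per profile.
cat > "$DEMO_DIR/credentials" <<'EOF'
[default]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
EOF

# config: settings such as region; non-default profiles use "[profile name]" headers.
cat > "$DEMO_DIR/config" <<'EOF'
[default]
region = us-east-1
output = json
EOF

echo "Demo files written to $DEMO_DIR"
```

Keeping access keys only in credentials and general settings such as the region in config avoids the conflict case altogether.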

What formats can CloudQuery load from GitHub to an S3 destination?

CloudQuery can load information in your choice of csv, parquet or json. This is specified in the format field.