# S3
This destination plugin lets you sync data from a CloudQuery source to remote S3 storage in various formats such as CSV, JSON and Parquet.
## Example

The following example configures the plugin to write Parquet files to `s3://bucket_name/path/to/files`, with each table placed in its own directory.

```yaml
kind: destination
spec:
  name: "s3"
  path: "cloudquery/s3"
  registry: "cloudquery"
  version: "v7.9.3"
  write_mode: "append"
  # Learn more about the configuration options at https://cql.ink/s3_destination
  spec:
    bucket: "bucket_name"
    region: "region-name" # Example: us-east-1
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "parquet" # options: parquet, json, csv
    format_spec:
      # CSV-specific parameters:
      # delimiter: ","
      # skip_header: false
      # Parquet-specific parameters:
      # version: "v2Latest"
      # root_repetition: "repeated"
      # max_row_group_length: 134217728 # 128 * 1024 * 1024

    # Optional parameters
    # compression: "" # options: gzip
    # no_rotate: false
    # athena: false # <- set this to true for Athena compatibility
    # write_empty_objects_for_empty_tables: false # <- set this to true if using with the CloudQuery Compliance policies
    # test_write: true # tests the ability to write to the bucket before processing the data
    # endpoint: "" # Endpoint to use for S3 API calls.
    # endpoint_skip_tls_verify: false # Disable TLS verification if using an untrusted certificate
    # use_path_style: false
    # batch_size: 10000 # 10K entries
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s # 30 seconds
    # max_retries: 3 # 3 retries
    # max_backoff: 30 # 30 seconds
    # part_size: 5242880 # 5 MiB
    # aws_debug: true
    # credentials: # <- Use this to specify non-default role assumption parameters
    #   local_profile: "s3-profile" # Use a local profile instead of the default one
    #   role_arn: "arn:aws:iam::123456789012:role/role_name" # Specify the role to assume
    #   external_id: "external_id" # Used when assuming a role
    #   role_session_name: "session_name" # Used when assuming a role
```
It is possible to use `{{YEAR}}`, `{{MONTH}}`, `{{DAY}}` and `{{HOUR}}` in the path to create a directory structure based on the current time. For example:

```yaml
path: "path/to/files/{{TABLE}}/dt={{YEAR}}-{{MONTH}}-{{DAY}}/{{UUID}}.parquet"
```
Besides `parquet`, the supported output formats are `json` and `csv`.

Note that the S3 plugin only supports the `append` `write_mode`. The (top level) spec section is described in the Destination Spec Reference.

The S3 destination utilizes batching, and supports the `batch_size`, `batch_size_bytes` and `batch_timeout` options (see below).

## S3 spec

- `bucket` (`string`) (required)

  Bucket where to sync the files.

- `region` (`string`) (required)

  Region where the bucket is located.

- `credentials` (credentials) (optional)

  Credentials and role assumption parameters (see credentials below).

- `path` (`string`) (required)

  Path to where the files will be uploaded in the above bucket, for example `path/to/files/{{TABLE}}/{{UUID}}.parquet`. The path supports the following placeholder variables:

  - `{{TABLE}}` will be replaced with the table name
  - `{{TABLE_HYPHEN}}` will be replaced with the table name with hyphens instead of underscores
  - `{{SYNC_ID}}` will be replaced with the unique identifier of the sync. This value is a UUID and is randomly generated for each sync.
  - `{{FORMAT}}` will be replaced with the file format, such as `csv`, `json` or `parquet`. If compression is enabled, the format will be `csv.gz`, `json.gz` etc.
  - `{{UUID}}` will be replaced with a random UUID to uniquely identify each file
  - `{{YEAR}}` will be replaced with the current year in `YYYY` format
  - `{{MONTH}}` will be replaced with the current month in `MM` format
  - `{{DAY}}` will be replaced with the current day in `DD` format
  - `{{HOUR}}` will be replaced with the current hour in `HH` format
  - `{{MINUTE}}` will be replaced with the current minute in `mm` format

  Note that timestamps are in `UTC` and will be the current time at the time the file is written, not when the sync started.
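
  For example, with a path of `path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}` and `format: "parquet"`, objects are written to keys such as the following (the table name and UUID shown here are hypothetical):

  ```text
  path/to/files/aws_s3_buckets/4a9f9f5e-0c2a-4f8b-9c3e-2c1d6a9c1b7d.parquet
  ```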
- `format` (`string`) (required)

  Format of the output file. Supported values are `csv`, `json` and `parquet`.

- `format_spec` (format_spec) (optional)

  Optional parameters to change the format of the file (see format_spec below).

- `server_side_encryption_configuration` (server_side_encryption_configuration) (optional)

  Optional parameters to enable server-side encryption (see server_side_encryption_configuration below).

- `compression` (`string`) (optional) (default: `""`)

  Compression algorithm to use. Supported values are `""` and `gzip`. Not supported for the `parquet` format.

- `no_rotate` (`boolean`) (optional) (default: `false`)

  If set to `true`, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different `.<UUID>` suffix.

- `athena` (`boolean`) (optional) (default: `false`)

  When `athena` is set to `true`, the S3 plugin will sanitize keys in JSON columns to be compatible with the Hive Metastore / Athena. This allows tables to be created with a Glue Crawler and then queried via Athena, without changes to the table schema.

- `write_empty_objects_for_empty_tables` (`boolean`) (optional) (default: `false`)

  If set to `true`, the plugin will also write empty objects for empty tables. This is useful when using the CloudQuery Compliance policies.

- `test_write` (`boolean`) (optional) (default: `true`)

  Tests the ability to write to the bucket before processing the data. Set this to `false` to skip the test.
- `endpoint` (`string`) (optional) (default: `""`)

  Endpoint to use for S3 API calls. If the endpoint is of the form `https://s3.amazonaws.com/BUCKET/KEY`, `use_path_style` should be enabled, too.

- `acl` (`string`) (optional) (default: `""`)

  Canned ACL to apply to written objects. Supported values are `private`, `public-read`, `public-read-write`, `authenticated-read`, `aws-exec-read`, `bucket-owner-read` and `bucket-owner-full-control`.

- `endpoint_skip_tls_verify` (`boolean`) (optional) (default: `false`)

  Disable TLS verification if the endpoint uses an untrusted certificate. Intended for use together with the `endpoint` option.

- `use_path_style` (`boolean`) (optional) (default: `false`)

  Use path-style addressing when making requests against the `endpoint` option, i.e., `https://s3.amazonaws.com/BUCKET/KEY`. By default, the S3 client will use virtual hosted bucket addressing when possible (`https://BUCKET.s3.amazonaws.com/KEY`).

- `batch_size` (`integer`) (optional) (default: `10000`)

  Maximum number of records to write to an object before rotating to a new one.

- `batch_size_bytes` (`integer`) (optional) (default: `52428800` (= 50 MiB))

  Maximum size, in bytes, of an object before rotating to a new one.

- `batch_timeout` (`duration`) (optional) (default: `30s` (30 seconds))

  Maximum interval between writes before the current batch is flushed.
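
For example, to produce smaller objects more frequently, the batch options can be lowered (a sketch of the nested plugin spec only; the values are illustrative, and a new object is started when any of the limits is reached):

```yaml
spec:
  bucket: "bucket_name"
  region: "us-east-1"
  path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
  format: "csv"
  batch_size: 1000           # rotate after 1,000 records...
  batch_size_bytes: 5242880  # ...or after 5 MiB...
  batch_timeout: 10s         # ...or after 10 seconds
```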
### format_spec

#### CSV

- `delimiter` (`string`) (optional) (default: `,`)

  Character that will be used as the field delimiter.

- `skip_header` (`boolean`) (optional) (default: `false`)

  If set to `true`, the CSV file will not contain a header row as the first row.
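
For example, to write tab-delimited files without a header row (a sketch of the `format` and `format_spec` fields only):

```yaml
format: "csv"
format_spec:
  delimiter: "\t"    # tab-separated values
  skip_header: true  # omit the header row
```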
#### Parquet

- `version` (`string`) (optional) (default: `v2Latest`)

  Parquet format version to use. Supported values are `v1.0`, `v2.4`, `v2.6` and `v2Latest`. `v2Latest` is an alias for the latest version available in the Parquet library, which is currently `v2.6`.

- `root_repetition` (`string`) (optional) (default: `repeated`)

  Repetition option to use for the root node. Supported values are `undefined`, `required`, `optional` and `repeated`. Some tools that read the output require this to be set to `undefined`.

- `max_row_group_length` (`integer`) (optional) (default: `134217728` (= 128 * 1024 * 1024))

  Maximum number of rows in a single row group.
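
For example, to pin the Parquet version and use an `undefined` root repetition for compatibility with such tools (a sketch of the `format` and `format_spec` fields only):

```yaml
format: "parquet"
format_spec:
  version: "v2.6"
  root_repetition: "undefined"
```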
### server_side_encryption_configuration

- `sse_kms_key_id` (`string`) (required)

  KMS key ID appended to the S3 API call headers. Used in conjunction with `server_side_encryption`.

- `server_side_encryption` (`string`) (required)

  Server-side encryption algorithm to use. Supported values are `AES256`, `aws:kms` and `aws:kms:dsse`.
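
For example, to encrypt written objects with a KMS key (a sketch of the nested block only; the key ID is a placeholder):

```yaml
server_side_encryption_configuration:
  server_side_encryption: "aws:kms"
  sse_kms_key_id: "1234abcd-12ab-34cd-56ef-1234567890ab"
```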
### credentials

- `local_profile` (`string`) (optional) (default: will use current credentials)

  Local profile to use for authentication. Note that this should be set to the name of a profile. For example, with the following credentials file:

  ```ini
  [default]
  aws_access_key_id=xxxx
  aws_secret_access_key=xxxx

  [user1]
  aws_access_key_id=xxxx
  aws_secret_access_key=xxxx
  ```

  `local_profile` should be set to either `default` or `user1`.

- `role_arn` (`string`)

  ARN of the role to assume.

- `role_session_name` (`string`)

  Session name to use when assuming the role defined in `role_arn`.

- `external_id` (`string`)

  External ID to use when assuming the role defined in `role_arn`.
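
For example, to sync into a bucket owned by another account by assuming a role there (a sketch of the nested `credentials` block only; the ARN, external ID and session name are placeholders, mirroring the commented example config above):

```yaml
credentials:
  role_arn: "arn:aws:iam::123456789012:role/role_name"
  external_id: "external_id"
  role_session_name: "session_name"
```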
## Authentication

The plugin only requires `PutObject` permissions (we will never make any changes to your cloud setup), so, following the principle of least privilege, it's recommended to grant it `PutObject` permissions only.

When authenticating with AWS, CloudQuery checks credential sources in the following order:

- The `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_SESSION_TOKEN` environment variables.
- The `credentials` and `config` files in `~/.aws` (the `credentials` file takes priority).
- You can also use `aws sso` to authenticate CloudQuery; see the AWS SSO documentation for details.
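
As a sketch of the least-privilege setup described above, the identity used by the plugin could be granted a policy like the following (IAM policies are JSON documents; the bucket name and prefix are placeholders, and this is an illustration rather than an official policy):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudQueryS3DestinationWrite",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::bucket_name/path/to/files/*"
    }
  ]
}
```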
### Environment variables

CloudQuery can use the credentials from the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` environment variables (`AWS_SESSION_TOKEN` can be optional for some accounts). For information on obtaining credentials, see the AWS guide.

```bash
export AWS_ACCESS_KEY_ID='{Your AWS Access Key ID}'
export AWS_SECRET_ACCESS_KEY='{Your AWS secret access key}'
export AWS_SESSION_TOKEN='{Your AWS session token}'
```
### Shared configuration files

The plugin can use credentials from your `credentials` and `config` files in the `.aws` directory in your home folder. The contents of these files are practically interchangeable, but CloudQuery will prioritize credentials in the `credentials` file.

Here is an example of a `credentials` file:

```ini
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

[myprofile]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```
To specify a different profile, use the `AWS_PROFILE` environment variable (on Linux/Mac, similar for Windows):

```bash
export AWS_PROFILE=myprofile
```

Alternatively, you can specify the profile via the `local_profile` field in the plugin configuration (this can be helpful for syncing between different accounts).

### MFA

To use IAM user credentials with MFA, obtain temporary credentials with `aws sts get-session-token`:

```bash
aws sts get-session-token --serial-number <YOUR_MFA_SERIAL_NUMBER> --token-code <YOUR_MFA_TOKEN_CODE> --duration-seconds 3600
```

Then export the returned temporary credentials:

```bash
export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
export AWS_SESSION_TOKEN=<YOUR_SESSION_TOKEN>
```
### Custom endpoints

If you are using a custom S3-compatible endpoint, you can specify it using the `endpoint` spec option. If you're using authentication, the `region` option in the spec determines the signing region used.
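
For example, to write to a self-hosted S3-compatible store (a sketch of the nested plugin spec only; the endpoint shown is a hypothetical MinIO deployment, not from the official docs):

```yaml
spec:
  bucket: "bucket_name"
  region: "us-east-1"  # used as the signing region
  path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
  format: "csv"
  endpoint: "https://minio.example.com:9000"
  use_path_style: true  # many S3-compatible stores expect path-style addressing
  # endpoint_skip_tls_verify: true  # only if the endpoint presents an untrusted certificate
```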