File
Official

Publisher: cloudquery
Repository: github.com
Latest version: v5.3.1
Type: Destination
Price: Free

Overview #

File Destination Plugin

This destination plugin lets you sync data from a CloudQuery source to local files in various formats. It currently supports CSV, line-delimited JSON and Parquet.
This plugin is useful in local environments, but also in production environments where scalability, performance and cost are requirements. For example, this plugin can be used as part of a system that syncs sources across multiple virtual machines, uploads Parquet files to remote storage (such as S3 or GCS), and finally loads them into data lakes such as BigQuery or Athena in batch mode. If this is your end goal, you may also want to look at the more specific cloud storage destination plugins such as S3, GCS or Azure Blob Storage.

Example #

This example configures the file destination to write Parquet files using the path template path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}. You can also choose csv or json as the output format.
kind: destination
spec:
  name: "file"
  path: "cloudquery/file"
  registry: "cloudquery"
  version: "v5.3.1"
  write_mode: "append"
  # Learn more about the configuration options at https://cql.ink/file_destination
  spec:
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "parquet" # options: parquet, json, csv
    # Optional parameters
    # format_spec:
      # CSV specific parameters:
      # delimiter: ","
      # skip_header: false
      # Parquet specific parameters:
      # version: "v2Latest"
      # root_repetition: "repeated"
      # max_row_group_length: 134217728 # 128 * 1024 * 1024
    # compression: "" # options: gzip
    # no_rotate: false
    # batch_size: 10000
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s
Note that the file plugin only supports the append write_mode. The (top-level) spec section is described in the Destination Spec Reference.
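
Assuming this destination spec is saved alongside a matching source spec (the file name config.yml below is just a placeholder), the sync can then be run with the CloudQuery CLI:
cloudquery sync config.yml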

File Spec #

This is the (nested) spec used by the file destination plugin.
  • path (string) (required)
    Path template string that determines where files will be written, for example path/to/files/{{TABLE}}/{{UUID}}.parquet.
    The path supports the following placeholder variables:
    • {{TABLE}} will be replaced with the table name
    • {{FORMAT}} will be replaced with the file format, such as csv, json or parquet. If compression is enabled, the format will be csv.gz, json.gz etc.
    • {{UUID}} will be replaced with a random UUID to uniquely identify each file
    • {{YEAR}} will be replaced with the current year in YYYY format
    • {{MONTH}} will be replaced with the current month in MM format
    • {{DAY}} will be replaced with the current day in DD format
    • {{HOUR}} will be replaced with the current hour in HH format
    • {{MINUTE}} will be replaced with the current minute in mm format
    Note that timestamps are in UTC and reflect the time each file is written, not when the sync started (see the example after this list).
  • format (string) (required)
    Format of the output file. Supported values are csv, json and parquet.
  • format_spec (format_spec) (optional)
    Optional parameters to change the format of the file.
  • no_rotate (boolean) (optional) (default: false)
    If set to true, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different .<UUID> suffix.
  • compression (string) (optional) (default: "")
    Compression algorithm to use. Supported values are "" and gzip. Not supported for parquet format.
  • batch_size (integer) (optional) (default: 10000)
    Number of records to write before starting a new file.
  • batch_size_bytes (integer) (optional) (default: 52428800 (50 MiB))
    Number of bytes (as Arrow buffer size) to write before starting a new file.
  • batch_timeout (duration) (optional) (default: 30s (30 seconds))
    Maximum interval between batch writes.
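
As an illustrative sketch of the placeholder and batching options above, the following spec partitions output by date and rotates files by record count or time. The table name cq_example and the expanded path in the comment are hypothetical:
kind: destination
spec:
  name: "file"
  path: "cloudquery/file"
  registry: "cloudquery"
  version: "v5.3.1"
  write_mode: "append"
  spec:
    # For a hypothetical table cq_example synced on 2024-01-02, one file might
    # be written to data/cq_example/2024/01/02/<uuid>.json.gz
    path: "data/{{TABLE}}/{{YEAR}}/{{MONTH}}/{{DAY}}/{{UUID}}.{{FORMAT}}"
    format: "json"
    compression: "gzip" # {{FORMAT}} expands to json.gz
    batch_size: 50000 # start a new file after 50,000 records...
    batch_timeout: 30s # ...or after 30 seconds, whichever comes first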

format_spec #

CSV
  • delimiter (string) (optional) (default: ,)
    Delimiter to use in the CSV file.
  • skip_header (boolean) (optional) (default: false)
    If set to true, the CSV file will not contain a header row as the first row.
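For instance, a tab-delimited file without a header row could be produced with the sketch below (only the nested spec is shown; the non-default values are deliberate):
  spec:
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "csv"
    format_spec:
      delimiter: "\t" # use a tab instead of the default comma
      skip_header: true # do not write a header row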
JSON
Reserved for future use.
Parquet
  • version (string) (optional) (default: v2Latest)
    Parquet format version to use. Supported values are v1.0, v2.4, v2.6 and v2Latest. v2Latest is an alias for the latest version available in the Parquet library which is currently v2.6.
    Useful when the reader consuming the Parquet files does not support the latest version.
  • root_repetition (string) (optional) (default: repeated)
    Repetition option to use for the root node. Supported values are undefined, required, optional and repeated.
    Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into Snowflake requires the root repetition to be undefined.
  • max_row_group_length (integer) (optional) (default: 134217728 (= 128 * 1024 * 1024))
    The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.
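As a sketch, a Parquet configuration aimed at a reader that needs an older format version and an undefined root repetition (for example Snowflake, as noted above) might use (only the nested spec is shown):
  spec:
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "parquet"
    format_spec:
      version: "v2.4" # for readers without v2.6 support
      root_repetition: "undefined" # required by some readers, such as Snowflake
      max_row_group_length: 1048576 # 1024 * 1024; smaller row groups reduce reader memory use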

