Google Cloud Storage
Overview #
GCS (Google Cloud Storage) Destination Plugin
This destination plugin lets you sync data from a CloudQuery source to remote GCS (Google Cloud Storage) storage in various formats such as CSV, JSON and Parquet.
This is useful in various use cases, especially in data lakes, where you can query the data directly from Athena or load it into data warehouses such as BigQuery, Redshift, Snowflake and others.
Example #
This example configures a GCS destination to create Parquet files in gcs://bucket_name/path/to/files:
kind: destination
spec:
  name: "gcs"
  path: "cloudquery/gcs"
  registry: "cloudquery"
  version: "v5.3.1"
  write_mode: "append"
  spec:
    bucket: "bucket_name"
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "parquet" # options: parquet, json, csv
    format_spec:
      # CSV specific parameters:
      # delimiter: ","
      # skip_header: false
      # Parquet specific parameters:
      # version: "v2Latest"
      # root_repetition: "repeated"
      # max_row_group_length: 134217728 # 128 * 1024 * 1024

    # Optional parameters
    # compression: "" # options: gzip
    # no_rotate: false
    # batch_size: 10000
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s
Note that the GCS plugin only supports the append write_mode. The (top level) spec section is described in the Destination Spec Reference. The GCS destination utilizes batching, and supports batch_size, batch_size_bytes and batch_timeout options (see below).
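As an illustration, the batching options are set in the nested spec; the values below are arbitrary examples (not recommended defaults) chosen to flush smaller objects more frequently:
kind: destination
spec:
  name: "gcs"
  path: "cloudquery/gcs"
  registry: "cloudquery"
  version: "v5.3.1"
  write_mode: "append" # the only supported write mode
  spec:
    bucket: "bucket_name"
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "json"
    # example values only: flush smaller objects more often than the defaults
    batch_size: 1000
    batch_size_bytes: 5242880 # 5 MiB
    batch_timeout: 10s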
GCS Spec #
This is the (nested) spec used by the GCS destination plugin.
bucket (string) (required)
Bucket where to sync the files.

path (string) (required)
Path to where the files will be uploaded in the above bucket, for example path/to/files/{{TABLE}}/{{UUID}}.parquet. If no path variables are present, the path will be appended with TABLE, FORMAT and Compression extension by default.
The path supports the following placeholder variables (see the example after this list):
- {{TABLE}} will be replaced with the table name
- {{SYNC_ID}} will be replaced with the unique identifier of the sync. This value is a UUID and is randomly generated for each sync.
- {{FORMAT}} will be replaced with the file format, such as csv, json or parquet. If compression is enabled, the format will be csv.gz, json.gz etc.
- {{UUID}} will be replaced with a random UUID to uniquely identify each file
- {{YEAR}} will be replaced with the current year in YYYY format
- {{MONTH}} will be replaced with the current month in MM format
- {{DAY}} will be replaced with the current day in DD format
- {{HOUR}} will be replaced with the current hour in HH format
- {{MINUTE}} will be replaced with the current minute in mm format

format (string) (required)
Format of the output file. Supported values are csv, json and parquet.

format_spec (format_spec) (optional)
Optional parameters to change the format of the file.

compression (string) (optional) (default: empty)
Compression algorithm to use. Supported values are empty or gzip. Not supported for parquet format.

no_rotate (boolean) (optional) (default: false)
If set to true, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different .<UUID> suffix.

batch_size (integer) (optional) (default: 10000)
Number of records to write before starting a new object.

batch_size_bytes (integer) (optional) (default: 52428800 (50 MiB))
Number of bytes (as Arrow buffer size) to write before starting a new object.

batch_timeout (duration) (optional) (default: 30s (30 seconds))
Maximum interval between batch writes.
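As an example of the path placeholders, the nested spec below (only the nested portion is shown; the bucket name and prefix are placeholders you would replace) sketches a date-partitioned layout with gzip-compressed CSV output:
spec:
  bucket: "bucket_name"
  # one folder per table, partitioned by the date of the sync
  path: "path/to/files/{{TABLE}}/{{YEAR}}/{{MONTH}}/{{DAY}}/{{UUID}}.{{FORMAT}}"
  format: "csv"
  compression: "gzip" # {{FORMAT}} resolves to csv.gz when compression is enabled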
format_spec #
CSV
delimiter (string) (optional) (default: ,)
Delimiter to use in the CSV file.

skip_header (boolean) (optional) (default: false)
If set to true, the CSV file will not contain a header row as the first row.
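As a sketch, a CSV format_spec that changes both options might look like this (the semicolon delimiter assumes the plugin accepts any single-character delimiter string):
format: "csv"
format_spec:
  delimiter: ";"     # assumption: any single-character delimiter string is accepted
  skip_header: true  # omit the header row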
JSON
Reserved for future use.
Parquet
version (string) (optional) (default: v2Latest)
Parquet format version to use. Supported values are v1.0, v2.4, v2.6 and v2Latest. v2Latest is an alias for the latest version available in the Parquet library, which is currently v2.6. Useful when the reader consuming the Parquet files does not support the latest version.

root_repetition (string) (optional) (default: repeated)
Repetition option to use for the root node. Supported values are undefined, required, optional and repeated. Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into Snowflake requires the root repetition to be undefined.

max_row_group_length (integer) (optional) (default: 134217728 (= 128 * 1024 * 1024))
The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.
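For instance, a Parquet format_spec aimed at older or stricter readers might look like the sketch below; the values are illustrative, so pick whatever your reader actually requires:
format: "parquet"
format_spec:
  version: "v2.4"              # for readers that do not support v2.6
  root_repetition: "undefined" # e.g. required when importing into Snowflake
  max_row_group_length: 16777216 # 16 * 1024 * 1024; smaller row groups reduce reader memory use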
Authentication #
The GCS plugin authenticates using your Application Default Credentials. The available options are the same ones described in detail in Google's Application Default Credentials documentation:
Local environment:
- Run gcloud auth application-default login (recommended when running locally).
Google Cloud-based development environment:
- When you run on Cloud Shell or Cloud Code, credentials are already available.
Google Cloud containerized environment:
- When running on GKE, use Workload Identity.
- Services such as Compute Engine, App Engine and Cloud Functions support attaching a user-managed service account, which CloudQuery will be able to utilize.
On-premises or another cloud provider:
- The suggested way is to use Workload Identity Federation.
- If that is not available, you can always use service account keys and export the location of the key via GOOGLE_APPLICATION_CREDENTIALS. (Not recommended, as long-lived keys are a security risk.)