Official

Azure Blob Storage destination integration documentation

This destination plugin lets you sync data from a CloudQuery source to remote Azure Blob Storage storage in various formats such as CSV, JSON and Parquet

Publisher

cloudquery

Repository

github.com

Latest version

v4.4.31

Type

Destination

Platforms

Date Published

Start Syncing

Documentation Changelog

Overview Licenses

Overview #

Azure Blob Storage Destination Plugin

This destination plugin lets you sync data from a CloudQuery source to remote Azure Blob Storage storage in various formats such as CSV, JSON and Parquet.

Authentication #

The plugin needs to be authenticated with your Azure account in order to fetch information about your cloud setup.

You can either authenticate with az login (when running locally), or by using a "service principal" and exporting environment variables (appropriate for automated deployments).

You can find out more about authentication with Azure at Azure's documentation for the Go SDK.

Example #

This example configures an Azure blob storage destination, to create CSV files in https://cqdestinationazblob.blob.core.windows.net/test/path/to/files.

The (top level) spec section is described in the Destination Spec Reference.

kind: destination
spec:
  name: "azblob"
  path: "cloudquery/azblob"
  registry: "cloudquery"
  version: "v4.4.31"
  send_sync_summary: true
  spec:
    storage_account: "cqdestinationazblob"
    container: "test"
    path: "path/to/files"

    format: "csv" # options: parquet, json, csv
    format_spec:
      # CSV specific parameters:
      # delimiter: ","
      # skip_header: false
      # Parquet specific parameters:
      # version: "v2Latest"
      # root_repetition: "repeated"
      # max_row_group_length: 134217728 # 128 * 1024 * 1024

    # Optional parameters
    # compression: "" # options: gzip
    # no_rotate: false
    # batch_size: 10000
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s

The Azure Blob destination utilizes batching, and supports batch_size, batch_size_bytes and batch_timeout options (see below).

Azure Blob Spec #

This is the (nested) spec used by the Azure blob destination Plugin.

storage_account (string) (required)
Storage account where to sync the files.
container (string) (required)
Storage container inside the storage account where to sync the files.
path (string) (required)
Path to where the files will be uploaded in the above bucket.
no_rotate (boolean) (optional) (default: false)
If set to true, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different .<UUID> suffix.
format (string) (required)
Format of the output file. Supported values are csv, json and parquet.
format_spec (format_spec) (optional)
Optional parameters to change the format of the file.
compression (string) (optional) (default: empty)
Compression algorithm to use. Supported values are empty or gzip. Not supported for parquet format.
batch_size (integer) (optional) (default: 10000)
Number of records to write before starting a new object.
batch_size_bytes (integer) (optional) (default: 52428800 (50 MiB))
Number of bytes (as Arrow buffer size) to write before starting a new object.
batch_timeout (duration) (optional) (default: 30s (30 seconds))
Maximum interval between batch writes.

format_spec #

CSV

delimiter (string) (optional) (default: ,)
Delimiter to use in the CSV file.
skip_header (boolean) (optional) (default: false)
If set to true, the CSV file will not contain a header row as the first row.

JSON

Reserved for future use.

Parquet

version (string) (optional) (default: v2Latest)
Parquet format version to use. Supported values are v1.0, v2.4, v2.6 and v2Latest. v2Latest is an alias for the latest version available in the Parquet library which is currently v2.6.
Useful when the reader consuming the Parquet files does not support the latest version.
root_repetition (string) (optional) (default: repeated)
Repetition option to use for the root node. Supported values are undefined, required, optional and repeated.
Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into Snowflake requires the root repetition to be undefined.
max_row_group_length (integer) (optional) (default: 134217728 (= 128 * 1024 * 1024))
The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.

Licenses #

The following tools / packages are used in this plugin:

Name	License
github.com/Azure/azure-sdk-for-go/sdk/azcore	MIT
github.com/Azure/azure-sdk-for-go/sdk/azidentity	MIT
github.com/Azure/azure-sdk-for-go/sdk/internal	MIT
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob	MIT
github.com/AzureAD/microsoft-authentication-library-for-go/apps	MIT
github.com/JohnCGriffin/overflow	MIT
github.com/adrg/xdg	MIT
github.com/andybalholm/brotli	MIT
github.com/apache/arrow/go/v13	Apache-2.0
github.com/apache/arrow-go/v18	Apache-2.0
github.com/apache/thrift/lib/go/thrift	Apache-2.0
github.com/apapsch/go-jsonmerge/v2	MIT
github.com/aws/aws-sdk-go-v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/config	Apache-2.0
github.com/aws/aws-sdk-go-v2/credentials	Apache-2.0
github.com/aws/aws-sdk-go-v2/feature/ec2/imds	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/configsources	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/ini	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/sync/singleflight	BSD-3-Clause
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/licensemanager	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/marketplacemetering	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sso	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/ssooidc	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sts	Apache-2.0
github.com/aws/smithy-go	Apache-2.0
github.com/aws/smithy-go/internal/sync/singleflight	BSD-3-Clause
github.com/bahlo/generic-list-go	BSD-3-Clause
github.com/buger/jsonparser	MIT
github.com/cenkalti/backoff/v4	MIT
github.com/cloudquery/cloudquery-api-go	MPL-2.0
github.com/cloudquery/codegen/jsonschema	MPL-2.0
github.com/cloudquery/plugin-pb-go	MPL-2.0
github.com/cloudquery/plugin-sdk/v2/internal/glob	MIT
github.com/cloudquery/plugin-sdk/v2/schema	MIT
github.com/cloudquery/plugin-sdk/v2/types	MPL-2.0
github.com/cloudquery/plugin-sdk/v4	MPL-2.0
github.com/cloudquery/plugin-sdk/v4/glob	MIT
github.com/cloudquery/plugin-sdk/v4/scalar	MIT
github.com/davecgh/go-spew/spew	ISC
github.com/ghodss/yaml	MIT
github.com/go-logr/logr	Apache-2.0
github.com/go-logr/stdr	Apache-2.0
github.com/goccy/go-json	MIT
github.com/golang-jwt/jwt/v5	MIT
github.com/golang/snappy	BSD-3-Clause
github.com/google/flatbuffers/go	Apache-2.0
github.com/google/uuid	BSD-3-Clause
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors	Apache-2.0
github.com/grpc-ecosystem/grpc-gateway/v2	BSD-3-Clause
github.com/hashicorp/go-cleanhttp	MPL-2.0
github.com/hashicorp/go-retryablehttp	MPL-2.0
github.com/huandu/xstrings	MIT
github.com/invopop/jsonschema	MIT
github.com/klauspost/compress	Apache-2.0
github.com/klauspost/compress/internal/snapref	BSD-3-Clause
github.com/klauspost/compress/zstd/internal/xxhash	MIT
github.com/klauspost/cpuid/v2	MIT
github.com/kylelemons/godebug	Apache-2.0
github.com/mailru/easyjson	MIT
github.com/mattn/go-colorable	MIT
github.com/mattn/go-isatty	MIT
github.com/oapi-codegen/runtime	Apache-2.0
github.com/pierrec/lz4/v4	BSD-3-Clause
github.com/pkg/browser	BSD-2-Clause
github.com/pmezard/go-difflib/difflib	BSD-3-Clause
github.com/rs/zerolog	MIT
github.com/santhosh-tekuri/jsonschema/v6	Apache-2.0
github.com/spf13/cobra	Apache-2.0
github.com/spf13/pflag	BSD-3-Clause
github.com/stretchr/testify	MIT
github.com/thoas/go-funk	MIT
github.com/wk8/go-ordered-map/v2	Apache-2.0
github.com/zeebo/xxh3	BSD-2-Clause
go.opentelemetry.io/otel	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp	Apache-2.0
go.opentelemetry.io/otel/log	Apache-2.0
go.opentelemetry.io/otel/metric	Apache-2.0
go.opentelemetry.io/otel/sdk	Apache-2.0
go.opentelemetry.io/otel/sdk/log	Apache-2.0
go.opentelemetry.io/otel/sdk/metric	Apache-2.0
go.opentelemetry.io/otel/trace	Apache-2.0
go.opentelemetry.io/proto/otlp	Apache-2.0
golang.org/x/crypto/pkcs12	BSD-3-Clause
golang.org/x/exp	BSD-3-Clause
golang.org/x/net	BSD-3-Clause
golang.org/x/sync/errgroup	BSD-3-Clause
golang.org/x/sys	BSD-3-Clause
golang.org/x/text	BSD-3-Clause
golang.org/x/xerrors	BSD-3-Clause
google.golang.org/genproto/googleapis/api/httpbody	Apache-2.0
google.golang.org/genproto/googleapis/rpc/status	Apache-2.0
google.golang.org/grpc	Apache-2.0
google.golang.org/protobuf	BSD-3-Clause
gopkg.in/yaml.v2	Apache-2.0
gopkg.in/yaml.v3	MIT

Loading plugin documentation