azblob
Official

Azure Blob Storage

This destination plugin lets you sync data from a CloudQuery source to remote Azure Blob Storage in various formats, such as CSV, JSON and Parquet.

Publisher: cloudquery
Repository: github.com
Latest version: v4.3.1
Type: Destination
Price: Free

Overview #

Azure Blob Storage Destination Plugin

This destination plugin lets you sync data from a CloudQuery source to remote Azure Blob Storage in various formats, such as CSV, JSON and Parquet.

Authentication #

The plugin needs to be authenticated with your Azure account in order to write files to your storage account.
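You can authenticate either with az login (when running locally) or by using a "service principal" and exporting environment variables (appropriate for automated deployments). For a service principal, the Azure SDK for Go reads AZURE_TENANT_ID, AZURE_CLIENT_ID and AZURE_CLIENT_SECRET.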
You can find out more about authentication with Azure at Azure's documentation for the Go SDK.

Example #

This example configures an Azure Blob Storage destination that writes CSV files to https://cqdestinationazblob.blob.core.windows.net/test/path/to/files.
The (top level) spec section is described in the Destination Spec Reference.

kind: destination
spec:
  name: "azblob"
  path: "cloudquery/azblob"
  registry: "cloudquery"
  version: "v4.3.1"
  spec:
    storage_account: "cqdestinationazblob"
    container: "test"
    path: "path/to/files"

    format: "csv" # options: parquet, json, csv
    format_spec:
      # CSV specific parameters:
      # delimiter: ","
      # skip_header: false
      # Parquet specific parameters:
      # version: "v2Latest"
      # root_repetition: "repeated"
      # max_row_group_length: 134217728 # 128 * 1024 * 1024

    # Optional parameters
    # compression: "" # options: gzip
    # no_rotate: false
    # batch_size: 10000
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s

The Azure Blob destination writes data in batches and supports the batch_size, batch_size_bytes and batch_timeout options (see below).

Azure Blob Spec #

This is the (nested) spec used by the Azure Blob destination plugin.
  • storage_account (string) (required)
    Storage account to sync the files to.
  • container (string) (required)
    Storage container inside the storage account to sync the files to.
  • path (string) (required)
    Path within the above container where the files will be uploaded.
  • no_rotate (boolean) (optional) (default: false)
    If set to true, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different .<UUID> suffix.
  • format (string) (required)
    Format of the output file. Supported values are csv, json and parquet.
  • format_spec (format_spec) (optional)
    Optional parameters to change the format of the file.
  • compression (string) (optional) (default: empty)
    Compression algorithm to use. Supported values are empty or gzip. Not supported for parquet format.
  • batch_size (integer) (optional) (default: 10000)
    Number of records to write before starting a new object.
  • batch_size_bytes (integer) (optional) (default: 52428800 (50 MiB))
    Number of bytes (as Arrow buffer size) to write before starting a new object.
  • batch_timeout (duration) (optional) (default: 30s (30 seconds))
    Maximum interval between batch writes.
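
As a rough sketch of how the options above fit together, the following destination config writes gzip-compressed JSON and lowers the batching thresholds so that new objects are started more often. The storage account, container and path are reused from the example earlier on this page; the batch values are illustrative assumptions, not recommendations.

```yaml
kind: destination
spec:
  name: "azblob"
  path: "cloudquery/azblob"
  registry: "cloudquery"
  version: "v4.3.1"
  spec:
    storage_account: "cqdestinationazblob"
    container: "test"
    path: "path/to/files"
    format: "json"
    compression: "gzip" # not supported for parquet
    no_rotate: false # a new file with a .<UUID> suffix per batch
    # Illustrative thresholds (assumed values):
    batch_size: 1000 # records
    batch_size_bytes: 10485760 # 10 MiB
    batch_timeout: 10s
```

With no_rotate left at false, smaller thresholds generally mean more, smaller files per table.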

format_spec #

CSV
  • delimiter (string) (optional) (default: ,)
    Delimiter to use in the CSV file.
  • skip_header (boolean) (optional) (default: false)
    If set to true, the CSV file will not contain a header row as the first row.
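
A minimal sketch of the CSV options, shown as a fragment of the nested spec only. The semicolon delimiter and the skipped header are illustrative choices, not defaults.

```yaml
# Fragment of the nested (inner) spec; values are illustrative.
format: "csv"
format_spec:
  delimiter: ";" # default is ","
  skip_header: true # omit the header row
```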
JSON
Reserved for future use.
Parquet
  • version (string) (optional) (default: v2Latest)
    Parquet format version to use. Supported values are v1.0, v2.4, v2.6 and v2Latest. v2Latest is an alias for the latest version available in the Parquet library which is currently v2.6.
    Useful when the reader consuming the Parquet files does not support the latest version.
  • root_repetition (string) (optional) (default: repeated)
    Repetition option to use for the root node. Supported values are undefined, required, optional and repeated.
    Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into Snowflake requires the root repetition to be undefined.
  • max_row_group_length (integer) (optional) (default: 134217728 (= 128 * 1024 * 1024))
    The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.
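
As an illustration of the Parquet options, the fragment below sets the root repetition to undefined, as described above for Snowflake imports, and lowers the row group length. The row group value is an assumed example, not a recommendation.

```yaml
# Fragment of the nested (inner) spec; values are illustrative.
format: "parquet"
format_spec:
  version: "v2Latest"
  root_repetition: "undefined" # e.g. for importing into Snowflake
  max_row_group_length: 1048576 # 1024 * 1024 rows (assumed example value)
# compression is left unset because it is not supported for parquet
```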



© 2024 CloudQuery, Inc. All rights reserved.