Databricks

Sync your data from any supported CloudQuery source into the Databricks Data Intelligence Platform.

Publisher: cloudquery (official)
Latest version: v1.3.4
Type: Destination
Price: Free

Overview

Databricks destination plugin

This destination plugin lets you sync data from a CloudQuery source to Databricks.
Supported Databricks versions: >= 12

Configuration

Example

kind: destination
spec:
  name: "databricks"
  path: "cloudquery/databricks"
  registry: "cloudquery"
  version: "v1.3.4"
  write_mode: "append"
  spec:
    hostname: ${DATABRICKS_HOSTNAME} # optionally it can include protocol like https://abc.cloud.databricks.com
    http_path: ${DATABRICKS_HTTP_PATH} # HTTP path for SQL compute
    staging_path: ${DATABRICKS_STAGING_PATH} # Databricks FileStore or Unity volume path to store temporary files for staging
    auth:
      access_token: ${DATABRICKS_ACCESS_TOKEN}
    # Optional parameters
    # protocol: https
    # port: 443
    # catalog: ""
    # schema: "default"
    # migration_concurrency: 10
    # timeout: 1m
    # batch:
    #   size: 10000
    #   bytes: 5242880 # 5 MiB
    #   timeout: 20s
The (top level) spec section is described in the Destination Spec Reference.
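
For context, a destination only receives data once a source lists it by name. The fragment below is a minimal sketch of such a source entry. The choice of the `aws` source plugin, the table name, and the `vX.Y.Z` version placeholder are illustrative assumptions and are not taken from this page; only the `destinations` value is meant to match the `name` given in the destination example above.

```yaml
kind: source
spec:
  name: "aws"                    # hypothetical source plugin, chosen only for illustration
  path: "cloudquery/aws"
  registry: "cloudquery"
  version: "vX.Y.Z"              # placeholder: pin a real released version of the source plugin
  tables: ["aws_s3_buckets"]     # illustrative table selection
  destinations: ["databricks"]   # must match the destination's top-level `name`
```

Both documents can live in the same file, separated by `---`, or in separate files passed to the same sync.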

Databricks spec

This is the (nested) spec used by the Databricks destination plugin.
  • hostname (string) (required)
    SQL compute hostname. May optionally include the protocol as well (for example, https://server.databricks.com).
  • http_path (string) (required)
    SQL compute HTTP path.
  • staging_path (string) (required)
    Databricks FileStore or Unity volume path where temporary (staging) files are uploaded.
  • auth (Auth spec) (required)
    Authentication options.
  • catalog (string) (optional) (default: "")
    Catalog to be used.
  • protocol (string) (optional) (default: https)
    Protocol for connecting to Databricks. Can also be specified in the hostname.
  • port (integer) (optional) (default: 443)
    Port for connecting to Databricks.
  • schema (string) (optional) (default: default)
    Schema to be used. If it doesn't exist, it will be created.
  • batch (Batching spec) (optional)
    Batching options.
  • migration_concurrency (integer) (optional) (default: 10)
    How many table operations will be performed in parallel during migration.
  • timeout (duration) (optional) (default: 1m (= 1 minute))
    Timeout for the queries.
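
As an illustration of how these options combine, the following sketch targets a specific catalog and schema and passes the protocol as part of `hostname` instead of setting `protocol` separately. The workspace URL, the catalog name `main`, the schema name `cloudquery_assets`, and the tuning values are hypothetical and should be replaced with values from your own workspace.

```yaml
kind: destination
spec:
  name: "databricks"
  path: "cloudquery/databricks"
  registry: "cloudquery"
  version: "v1.3.4"
  write_mode: "append"
  spec:
    hostname: "https://abc.cloud.databricks.com" # protocol given here, so `protocol` is omitted
    http_path: ${DATABRICKS_HTTP_PATH}
    staging_path: ${DATABRICKS_STAGING_PATH}
    auth:
      access_token: ${DATABRICKS_ACCESS_TOKEN}
    catalog: "main"               # hypothetical catalog name
    schema: "cloudquery_assets"   # hypothetical schema name; created if it doesn't exist
    migration_concurrency: 4      # fewer parallel table operations than the default of 10
    timeout: 5m                   # longer query timeout than the default of 1m
```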
Databricks authentication spec
This section specifies the authentication method used to connect to Databricks. Currently, only personal access tokens are supported.
  • access_token (string) (required)
    Personal access token.
Batching spec
This section controls how data is batched for writing.
  • size (integer) (optional) (default: 10000)
    Maximum number of items that may be grouped together to be written in a single write.
  • bytes (integer) (optional) (default: 5242880 (= 5 MiB))
    Maximum total size, in bytes, of items that may be grouped together to be written in a single write.
  • timeout (duration) (optional) (default: 20s (= 20 seconds))
    Maximum interval between batch writes.
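
A sketch of a tuned batching block follows. The numbers are arbitrary examples, not recommendations from this page. Reading the three options as upper bounds, a batch would presumably be written as soon as any one of them is reached; that interpretation is an assumption based on the descriptions above.

```yaml
kind: destination
spec:
  name: "databricks"
  path: "cloudquery/databricks"
  registry: "cloudquery"
  version: "v1.3.4"
  write_mode: "append"
  spec:
    hostname: ${DATABRICKS_HOSTNAME}
    http_path: ${DATABRICKS_HTTP_PATH}
    staging_path: ${DATABRICKS_STAGING_PATH}
    auth:
      access_token: ${DATABRICKS_ACCESS_TOKEN}
    batch:
      size: 5000        # at most 5000 items per write (example value)
      bytes: 10485760   # at most 10 MiB per write (10 * 1024 * 1024)
      timeout: 30s      # at most 30 seconds between batch writes (example value)
```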


Types

Apache Arrow type conversion

The Databricks destination plugin supports most Apache Arrow types. The following table shows the supported types and how they are mapped to Databricks data types.

| Arrow Column Type | Databricks Type |
|---|---|
| Binary | BINARY |
| Binary View | BINARY |
| Boolean | BOOLEAN |
| Date32 | DATE |
| Date64 | DATE |
| Decimal128 (Decimal) | DECIMAL |
| Decimal256 | DECIMAL |
| Fixed Size Binary | BINARY |
| Fixed Size List | ARRAY |
| Float16 | FLOAT |
| Float32 | FLOAT |
| Float64 | DOUBLE |
| Int8 | TINYINT |
| Int16 | SMALLINT |
| Int32 | INTEGER |
| Int64 | BIGINT |
| Large Binary | BINARY |
| Large List | ARRAY |
| Large String | STRING |
| List | ARRAY |
| Null | VOID |
| Map | MAP |
| String | STRING |
| String View | STRING |
| Struct | STRUCT |
| Time32 | TIMESTAMP |
| Time64 | TIMESTAMP |
| Timestamp | TIMESTAMP |
| UUID (CloudQuery extension) | STRING |
| Uint8 | SMALLINT |
| Uint16 | INTEGER |
| Uint32 | BIGINT |
| Uint64 | BIGINT |


Licenses

The following tools / packages are used in this plugin:

| Name | License |
|---|---|
| github.com/JohnCGriffin/overflow | MIT |
| github.com/adrg/xdg | MIT |
| github.com/andybalholm/brotli | MIT |
| github.com/apache/arrow/go/v13 | Apache-2.0 |
| github.com/apache/arrow-go/v18 | Apache-2.0 |
| github.com/apache/thrift/lib/go/thrift | Apache-2.0 |
| github.com/apapsch/go-jsonmerge/v2 | MIT |
| github.com/aws/aws-sdk-go-v2 | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/config | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/credentials | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/feature/ec2/imds | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/internal/configsources | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/internal/ini | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/internal/sync/singleflight | BSD-3-Clause |
| github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/service/internal/presigned-url | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/service/licensemanager | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/service/marketplacemetering | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/service/sso | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/service/ssooidc | Apache-2.0 |
| github.com/aws/aws-sdk-go-v2/service/sts | Apache-2.0 |
| github.com/aws/smithy-go | Apache-2.0 |
| github.com/aws/smithy-go/internal/sync/singleflight | BSD-3-Clause |
| github.com/bahlo/generic-list-go | BSD-3-Clause |
| github.com/buger/jsonparser | MIT |
| github.com/cenkalti/backoff/v4 | MIT |
| github.com/cloudquery/cloudquery-api-go | MPL-2.0 |
| github.com/cloudquery/plugin-pb-go | MPL-2.0 |
| github.com/cloudquery/plugin-sdk/v2/internal/glob | MIT |
| github.com/cloudquery/plugin-sdk/v2/schema | MIT |
| github.com/cloudquery/plugin-sdk/v2/types | MPL-2.0 |
| github.com/cloudquery/plugin-sdk/v4 | MPL-2.0 |
| github.com/cloudquery/plugin-sdk/v4/glob | MIT |
| github.com/cloudquery/plugin-sdk/v4/scalar | MIT |
| github.com/coreos/go-oidc/v3/oidc | Apache-2.0 |
| github.com/databricks/databricks-sql-go | Apache-2.0 |
| github.com/davecgh/go-spew/spew | ISC |
| github.com/ghodss/yaml | MIT |
| github.com/go-jose/go-jose/v4 | Apache-2.0 |
| github.com/go-jose/go-jose/v4/json | BSD-3-Clause |
| github.com/go-logr/logr | Apache-2.0 |
| github.com/go-logr/stdr | Apache-2.0 |
| github.com/goccy/go-json | MIT |
| github.com/golang/snappy | BSD-3-Clause |
| github.com/google/flatbuffers/go | Apache-2.0 |
| github.com/google/uuid | BSD-3-Clause |
| github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors | Apache-2.0 |
| github.com/grpc-ecosystem/grpc-gateway/v2 | BSD-3-Clause |
| github.com/hashicorp/go-cleanhttp | MPL-2.0 |
| github.com/hashicorp/go-retryablehttp | MPL-2.0 |
| github.com/huandu/xstrings | MIT |
| github.com/invopop/jsonschema | MIT |
| github.com/klauspost/compress | Apache-2.0 |
| github.com/klauspost/compress/internal/snapref | BSD-3-Clause |
| github.com/klauspost/compress/zstd/internal/xxhash | MIT |
| github.com/klauspost/cpuid/v2 | MIT |
| github.com/mailru/easyjson | MIT |
| github.com/mattn/go-colorable | MIT |
| github.com/mattn/go-isatty | MIT |
| github.com/oapi-codegen/runtime | Apache-2.0 |
| github.com/pierrec/lz4/v4 | BSD-3-Clause |
| github.com/pkg/browser | BSD-2-Clause |
| github.com/pkg/errors | BSD-2-Clause |
| github.com/pmezard/go-difflib/difflib | BSD-3-Clause |
| github.com/rs/zerolog | MIT |
| github.com/santhosh-tekuri/jsonschema/v6 | Apache-2.0 |
| github.com/shopspring/decimal | MIT |
| github.com/spf13/cobra | Apache-2.0 |
| github.com/spf13/pflag | BSD-3-Clause |
| github.com/stretchr/testify | MIT |
| github.com/thoas/go-funk | MIT |
| github.com/wk8/go-ordered-map/v2 | Apache-2.0 |
| github.com/zeebo/xxh3 | BSD-2-Clause |
| go.opentelemetry.io/otel | Apache-2.0 |
| go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp | Apache-2.0 |
| go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp | Apache-2.0 |
| go.opentelemetry.io/otel/exporters/otlp/otlptrace | Apache-2.0 |
| go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp | Apache-2.0 |
| go.opentelemetry.io/otel/log | Apache-2.0 |
| go.opentelemetry.io/otel/metric | Apache-2.0 |
| go.opentelemetry.io/otel/sdk | Apache-2.0 |
| go.opentelemetry.io/otel/sdk/log | Apache-2.0 |
| go.opentelemetry.io/otel/sdk/metric | Apache-2.0 |
| go.opentelemetry.io/otel/trace | Apache-2.0 |
| go.opentelemetry.io/proto/otlp | Apache-2.0 |
| golang.org/x/crypto/pbkdf2 | BSD-3-Clause |
| golang.org/x/exp | BSD-3-Clause |
| golang.org/x/net | BSD-3-Clause |
| golang.org/x/oauth2 | BSD-3-Clause |
| golang.org/x/sync/errgroup | BSD-3-Clause |
| golang.org/x/sys | BSD-3-Clause |
| golang.org/x/text | BSD-3-Clause |
| golang.org/x/xerrors | BSD-3-Clause |
| google.golang.org/genproto/googleapis/api/httpbody | Apache-2.0 |
| google.golang.org/genproto/googleapis/rpc/status | Apache-2.0 |
| google.golang.org/grpc | Apache-2.0 |
| google.golang.org/protobuf | BSD-3-Clause |
| gopkg.in/yaml.v2 | Apache-2.0 |
| gopkg.in/yaml.v3 | MIT |



© 2024 CloudQuery, Inc. All rights reserved.
