Databricks
Sync your data from any supported CloudQuery source into the Databricks Data Intelligence Platform.
Publisher: cloudquery
Latest version: v1.3.8
Type: Destination
Overview #
Databricks destination plugin
This destination plugin lets you sync data from a CloudQuery source to Databricks.
Supported Databricks versions: `>= 12`
Configuration #
Example #
```yaml
kind: destination
spec:
  name: "databricks"
  path: "cloudquery/databricks"
  registry: "cloudquery"
  version: "v1.3.8"
  write_mode: "append"
  spec:
    hostname: ${DATABRICKS_HOSTNAME} # optionally it can include protocol, like https://abc.cloud.databricks.com
    http_path: ${DATABRICKS_HTTP_PATH} # HTTP path for SQL compute
    staging_path: ${DATABRICKS_STAGING_PATH} # Databricks FileStore or Unity volume path to store temporary files for staging
    auth:
      access_token: ${DATABRICKS_ACCESS_TOKEN}
    # Optional parameters
    # protocol: https
    # port: 443
    # catalog: ""
    # schema: "default"
    # migration_concurrency: 10
    # timeout: 1m
    # batch:
    #   size: 10000
    #   bytes: 5242880 # 5 MiB
    #   timeout: 20s
```
The (top level) spec section is described in the Destination Spec Reference.
Databricks spec #
This is the (nested) spec used by the Databricks destination plugin.
- `hostname` (`string`) (required)
  SQL compute hostname. May optionally include the `protocol` value as well (like `https://server.databricks.com`).
- `http_path` (`string`) (required)
  SQL compute HTTP path.
- `staging_path` (`string`) (required)
  Unity volume path where temporary (staging) files should be uploaded.
- `auth` (Auth spec) (required)
  Authentication options.
- `catalog` (`string`) (required)
  Catalog to be used.
- `protocol` (`string`) (optional) (default: `https`)
  Protocol for connecting to Databricks. Can also be specified as part of the `hostname`.
- `port` (`integer`) (optional) (default: `443`)
  Port for connecting to Databricks.
- `schema` (`string`) (optional) (default: `cloudquery`)
  Schema to be used. If it doesn't exist, it will be created.
- `batch` (Batching spec) (optional)
  Batching options.
- `migration_concurrency` (`integer`) (optional) (default: `10`)
  How many table operations will be performed in parallel during migration.
- `timeout` (`duration`) (optional) (default: `1m` = 1 minute)
  Timeout for the queries.
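Because the protocol may be embedded in `hostname` or given separately, the two nested-spec snippets below are equivalent (the hostname is illustrative):

```yaml
# Protocol embedded in the hostname:
hostname: https://abc.cloud.databricks.com
---
# Equivalent: hostname and protocol given separately
hostname: abc.cloud.databricks.com
protocol: https
```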
Databricks authentication spec #
This section specifies the authentication method used to connect to Databricks.
Currently only personal access tokens are supported.
- `access_token` (`string`) (required)
  Personal access token.
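A minimal `auth` block, reading the token from an environment variable as in the configuration example above:

```yaml
auth:
  access_token: ${DATABRICKS_ACCESS_TOKEN}
```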
Batching spec #
This section controls how data is batched for writing.
- `size` (`integer`) (optional) (default: `10000`)
  Maximum number of items that may be grouped together to be written in a single write.
- `bytes` (`integer`) (optional) (default: `5242880` = 5 MiB)
  Maximum size of items that may be grouped together to be written in a single write.
- `timeout` (`duration`) (optional) (default: `1m` = 1 minute)
  Maximum interval between batch writes.
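As a sketch of how these options combine, a nested spec that flushes smaller batches more frequently might set (values illustrative; a batch is written when any one of the limits is reached):

```yaml
batch:
  size: 1000      # flush after 1000 items
  bytes: 1048576  # or after 1 MiB of data
  timeout: 10s    # or after 10 seconds, whichever comes first
```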
Types #
Apache Arrow type conversion #
The Databricks destination plugin supports most Apache Arrow types.
The following table shows the supported types and how they are mapped
to Databricks data types.