Overview
File Destination Plugin
This destination plugin lets you sync data from a CloudQuery source to local files in various formats. It currently supports CSV, line-delimited JSON and Parquet.
This plugin is useful in local environments, but also in production environments where scalability, performance and cost are requirements. For example, this plugin can be used as part of a system that syncs sources across multiple virtual machines, uploads Parquet files to remote storage (such as S3 or GCS), and finally loads them into data lakes such as BigQuery or Athena in batch mode. If this is your end goal, you may also want to look at more specific cloud storage destination plugins such as S3, GCS or Azure Blob Storage.
Example
This example configures the file destination to write Parquet files to `path/to/files/{{TABLE}}/{{UUID}}.parquet`. You can also choose `csv` or `json` as the output format.

```yaml
kind: destination
spec:
  name: "file"
  path: "cloudquery/file"
  registry: "cloudquery"
  version: "v5.3.1"
  write_mode: "append"
  # Learn more about the configuration options at https://cql.ink/file_destination
  spec:
    path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
    format: "parquet" # options: parquet, json, csv
    # Optional parameters
    # format_spec:
    #   # CSV specific parameters:
    #   delimiter: ","
    #   skip_header: false
    #   # Parquet specific parameters:
    #   version: "v2Latest"
    #   root_repetition: "repeated"
    #   max_row_group_length: 134217728 # 128 * 1024 * 1024
    # compression: "" # options: gzip
    # no_rotate: false
    # batch_size: 10000
    # batch_size_bytes: 52428800 # 50 MiB
    # batch_timeout: 30s
```
Note that the file plugin only supports the `append` write_mode. The (top-level) spec section is described in the Destination Spec Reference.

File Spec
This is the (nested) spec used by the file destination plugin.

`path` (`string`) (required)

Path template string that determines where files will be written, for example `path/to/files/{{TABLE}}/{{UUID}}.parquet`.

The path supports the following placeholder variables:

- `{{TABLE}}` will be replaced with the table name
- `{{FORMAT}}` will be replaced with the file format, such as `csv`, `json` or `parquet`. If compression is enabled, the format will be `csv.gz`, `json.gz` etc.
- `{{UUID}}` will be replaced with a random UUID to uniquely identify each file
- `{{YEAR}}` will be replaced with the current year in `YYYY` format
- `{{MONTH}}` will be replaced with the current month in `MM` format
- `{{DAY}}` will be replaced with the current day in `DD` format
- `{{HOUR}}` will be replaced with the current hour in `HH` format
- `{{MINUTE}}` will be replaced with the current minute in `mm` format

Note that timestamps are in UTC and will be the current time at the time the file is written, not when the sync started.
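As an illustration (a sketch, not from the plugin's own examples; the directory layout and table name are hypothetical), the date placeholders can be combined to partition output by sync date:

```yaml
spec:
  # A file written on 2024-05-07 would land at, e.g.:
  # path/to/files/aws_ec2_instances/2024/05/07/<random-uuid>.csv.gz
  path: "path/to/files/{{TABLE}}/{{YEAR}}/{{MONTH}}/{{DAY}}/{{UUID}}.{{FORMAT}}"
  format: "csv"
  compression: "gzip" # with compression enabled, {{FORMAT}} resolves to csv.gz
```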
`format` (`string`) (required)

Format of the output file. Supported values are `csv`, `json` and `parquet`.

`format_spec` (`format_spec`) (optional)

Optional parameters to change the format of the file.

`no_rotate` (`boolean`) (optional) (default: `false`)

If set to `true`, the plugin will write to one file per table. Otherwise, for every batch a new file will be created with a different `.<UUID>` suffix.

`compression` (`string`) (optional) (default: `""`)

Compression algorithm to use. Supported values are `""` and `gzip`. Not supported for the `parquet` format.

`batch_size` (`integer`) (optional) (default: `10000`)

Number of records to write before starting a new file.

`batch_size_bytes` (`integer`) (optional) (default: `52428800` (50 MiB))

Number of bytes (as Arrow buffer size) to write before starting a new file.

`batch_timeout` (`duration`) (optional) (default: `30s` (30 seconds))

Maximum interval between batch writes.
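Taken together, the batching options control how often files rotate. As a sketch (the values are illustrative, not recommendations, and this assumes rotation happens when any one of the limits is reached), a spec that keeps individual files small could look like:

```yaml
spec:
  path: "path/to/files/{{TABLE}}/{{UUID}}.{{FORMAT}}"
  format: "json"
  batch_size: 1000          # start a new file after 1,000 records...
  batch_size_bytes: 5242880 # ...or after 5 MiB of buffered data...
  batch_timeout: 10s        # ...or after 10 seconds between writes
```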
format_spec

CSV

`delimiter` (`string`) (optional) (default: `,`)

Delimiter to use in the CSV file.

`skip_header` (`boolean`) (optional) (default: `false`)

If set to `true`, the CSV file will not contain a header row as the first row.
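For example, a headerless, semicolon-delimited output could be configured like this (a sketch using only the two options above):

```yaml
spec:
  format: "csv"
  format_spec:
    delimiter: ";"    # use semicolons instead of the default comma
    skip_header: true # omit the header row
```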
JSON
Reserved for future use.
Parquet
`version` (`string`) (optional) (default: `v2Latest`)

Parquet format version to use. Supported values are `v1.0`, `v2.4`, `v2.6` and `v2Latest`. `v2Latest` is an alias for the latest version available in the Parquet library, which is currently `v2.6`. Useful when the reader consuming the Parquet files does not support the latest version.

`root_repetition` (`string`) (optional) (default: `repeated`)

Repetition option to use for the root node. Supported values are `undefined`, `required`, `optional` and `repeated`. Some Parquet readers require a specific root repetition option to be able to read the file. For example, importing Parquet files into Snowflake requires the root repetition to be `undefined`.

`max_row_group_length` (`integer`) (optional) (default: `134217728` (= 128 * 1024 * 1024))

The maximum number of rows in a single row group. Use a lower number to reduce memory usage when reading the Parquet files, and a higher number to increase the efficiency of reading the Parquet files.
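Putting these together, a spec aimed at a stricter reader, such as the Snowflake case mentioned above, might look like this (a sketch; the row group length is an illustrative value, not a recommendation):

```yaml
spec:
  format: "parquet"
  format_spec:
    version: "v2.4"               # downgrade for readers without v2.6 support
    root_repetition: "undefined"  # required by e.g. Snowflake imports
    max_row_group_length: 1048576 # smaller row groups reduce reader memory use
```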