File (Official, Premium)

The CloudQuery File plugin syncs Parquet files to any of the supported CloudQuery destinations.

Publisher: cloudquery
Latest version: v1.5.0
Type: Source
Price per 1M rows: starting from $15
Monthly free quota: 1M rows
Set up process

1. Download the CLI and log in. With Homebrew:

   ```shell
   brew install cloudquery/tap/cloudquery
   ```

2. Create source and destination configs.
Plugin configuration

Overview
The CloudQuery File plugin reads Parquet files and loads them into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, Snowflake, and more). The following configuration syncs the files in a local directory to PostgreSQL:
```yaml
kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.5.0"
  tables: ["*"]
  destinations: ["postgresql"]

  # Learn more about the configuration options at https://cql.ink/file_source
  spec:
    files_dir: "/path/to/files-to-sync" # required. Path to the directory with files to sync
    # Optional parameters
    # rows_per_record: 500
    # concurrency: 50
```
File spec
This is the (nested) spec used by the File source plugin.
files_dir (string) (required)
Path to the directory with files to sync. Only files with the .parquet extension will be synced.

rows_per_record (integer) (optional) (default: 500)
Number of rows to pack into a single Apache Arrow record sent over the wire during sync.

concurrency (integer) (optional) (default: 50)
Number of files to sync in parallel. Negative values mean no limit.
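To make the three options concrete, here is a hypothetical Python sketch of how they could be interpreted. It is not the plugin's source code: FileSpec, is_synced, select_parquet_files, batch_rows, worker_count and sync_all are illustrative names. It assumes that only the top level of files_dir is listed (this page does not say whether subdirectories are scanned), it chunks plain Python rows where the real plugin packs rows into Apache Arrow records, and it models "no limit" for a negative concurrency as one worker per file.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


@dataclass
class FileSpec:
    """Hypothetical mirror of the documented nested spec (illustration only)."""

    files_dir: str
    rows_per_record: int = 500  # documented default
    concurrency: int = 50  # documented default; negative means "no limit"

    @classmethod
    def from_dict(cls, raw: dict) -> "FileSpec":
        # files_dir is the only required option
        if not raw.get("files_dir"):
            raise ValueError("files_dir is required")
        return cls(
            files_dir=raw["files_dir"],
            rows_per_record=raw.get("rows_per_record", 500),
            concurrency=raw.get("concurrency", 50),
        )


def is_synced(filename: str) -> bool:
    # Only files with the .parquet extension are picked up
    return filename.endswith(".parquet")


def select_parquet_files(files_dir: str) -> List[str]:
    # Assumption: only the top level of files_dir is listed (no recursion)
    return sorted(
        os.path.join(files_dir, name)
        for name in os.listdir(files_dir)
        if is_synced(name) and os.path.isfile(os.path.join(files_dir, name))
    )


def batch_rows(rows: Iterable[T], rows_per_record: int) -> Iterator[List[T]]:
    # Group rows into chunks of at most rows_per_record; the last chunk may be smaller
    batch: List[T] = []
    for row in rows:
        batch.append(row)
        if len(batch) == rows_per_record:
            yield batch
            batch = []
    if batch:
        yield batch


def worker_count(spec: FileSpec, n_files: int) -> int:
    # Negative concurrency: no cap, modelled here as one worker per file
    if spec.concurrency < 0:
        return max(1, n_files)
    # Assumption: 0 is not documented; it is treated as 1 so the pool can start
    return max(1, min(spec.concurrency, n_files))


def sync_all(paths: List[str], spec: FileSpec, sync_one: Callable[[str], R]) -> List[R]:
    # Run sync_one on up to worker_count files at a time; results keep input order
    with ThreadPoolExecutor(max_workers=worker_count(spec, len(paths))) as pool:
        return list(pool.map(sync_one, paths))
```

For example, a file with 1,200 rows and the default rows_per_record of 500 would be split into three records of 500, 500 and 200 rows, and with the default concurrency of 50 a directory of 200 files would be processed at most 50 files at a time.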
Example with AWS Cost and Usage Reports

AWS Cost and Usage Reports can be delivered to S3 as Parquet files. The following example shows how to sync these reports, together with AWS infrastructure data, to a PostgreSQL database. Because files_dir points to a local directory, copy the report files from S3 to the machine running the sync first.

To learn more about visualizing AWS Cost and Usage Reports, visit our dashboards page.
```yaml
kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.5.0"
  destinations: [postgresql]
  tables: ["*"]
  spec:
    files_dir: "/path/to/cost_and_usage_reports" # Update this value to the local directory with your AWS Cost and Usage Reports
---
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v28.1.0"
  destinations: [postgresql]
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_iam_group_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameter_groups
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "v8.6.4"
  spec:
    connection_string: postgresql://postgres:pass@localhost:5432/postgres
```
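Before running the sync, it can help to confirm that the local copy of your reports actually contains .parquet files where the plugin will look. The Python sketch below is a hypothetical pre-flight check, not part of CloudQuery: count_parquet_files is an illustrative name. It counts .parquet files directly inside the directory and in its subdirectories separately, because this page does not say whether files_dir is scanned recursively and reports copied from S3 may end up spread across nested folders.

```python
import os
from typing import Dict


def count_parquet_files(files_dir: str) -> Dict[str, int]:
    """Count .parquet files directly in files_dir and in its subdirectories."""
    if not os.path.isdir(files_dir):
        raise FileNotFoundError(f"not a directory: {files_dir}")
    top = os.path.abspath(files_dir)
    counts = {"top_level": 0, "nested": 0}
    # os.walk visits files_dir itself first, then every subdirectory
    for root, _dirs, files in os.walk(files_dir):
        found = sum(1 for name in files if name.endswith(".parquet"))
        key = "top_level" if os.path.abspath(root) == top else "nested"
        counts[key] += found
    return counts
```

Call it with the value you set as files_dir, for example count_parquet_files("/path/to/cost_and_usage_reports"). If every file is counted under "nested", check the configuration reference at https://cql.ink/file_source to confirm how subdirectories are handled before pointing files_dir at that directory.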