Official
Premium
File
The CloudQuery File plugin syncs Parquet files to any of the supported CloudQuery destinations.
Publisher: cloudquery
Latest version: v1.6.7
Type: Source
Overview #
The CloudQuery File plugin reads Parquet files and loads them into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, Snowflake, and more).
kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.6.7"
  tables: ["*"]
  destinations: ["postgresql"]
  # Learn more about the configuration options at https://cql.ink/file_source
  spec:
    files_dir: "/path/to/files-to-sync" # required. Path to the directory with files to sync
    # Optional parameters
    # rows_per_record: 500
    # concurrency: 50
File spec #
This is the (nested) spec used by the File source plugin.
files_dir (string) (required)
  Path to the directory with files to sync. Only files with the .parquet extension will be synced.

rows_per_record (integer) (optional) (default: 500)
  Number of rows to be packed into a single Apache Arrow record to be sent over the wire during sync.

concurrency (integer) (optional) (default: 50)
  Number of files to sync in parallel. Negative values mean no limit.
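As an illustration of the options above, the following sketch sets both optional parameters explicitly. The values 1000 and -1 and the directory path are placeholders chosen for this example, not recommendations; everything else mirrors the configuration shown in the overview.

```yaml
kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.6.7"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    # Required: directory containing the .parquet files to sync (placeholder path)
    files_dir: "/path/to/files-to-sync"
    # Illustrative value: pack 1000 rows into each Apache Arrow record (default: 500)
    rows_per_record: 1000
    # Illustrative value: a negative number removes the limit on parallel file syncs (default: 50)
    concurrency: -1
```

Omitting either optional key leaves it at its documented default.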
Example with AWS Cost and Usage Reports #
AWS Cost and Usage Reports can be delivered to S3 as Parquet files. The following example shows how to sync these files, once they are available in a local directory, together with AWS infrastructure data to a PostgreSQL database.
To learn more about visualizing AWS Cost and Usage Reports, visit our dashboards page.
kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.6.7"
  destinations: [postgresql]
  tables: ["*"]
  spec:
    files_dir: "/path/to/cost_and_usage_reports" # Update this value to the local directory with your AWS Cost and Usage Reports
---
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v32.6.0"
  destinations: [postgresql]
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_iam_group_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameter_groups
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "v8.7.10"
  spec:
    connection_string: postgresql://postgres:pass@localhost:5432/postgres