File

The CloudQuery File plugin syncs Parquet files to any of the supported CloudQuery destinations.

Publisher: cloudquery
Latest version: v1.4.7
Type: Source
Price per 1M rows: Starting from $15
Monthly free quota: 1M rows

Set up process

1. Download the CLI and log in

brew install cloudquery/tap/cloudquery

See installation options

2. Create source and destination configs

Plugin configuration

3. Run the sync

cloudquery sync file.yml postgresql.yml

CloudQuery sync

Overview

The CloudQuery File plugin reads Parquet files and loads them into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, Snowflake, and more). A source configuration looks like this:

kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.4.7"
  tables: ["*"]
  destinations: ["postgresql"]

  spec:
    files_dir: "/path/to/files-to-sync" # required. Path to the directory with files to sync
    # Optional parameters
    # rows_per_record: 500
    # concurrency: 50

File spec

This is the (nested) spec used by the File source plugin.
  • files_dir (string) (required)
    Path to the directory with files to sync. Only files with the .parquet extension are synced.
  • rows_per_record (integer) (optional) (default: 500)
    Number of rows to pack into a single Apache Arrow record sent over the wire during sync.
  • concurrency (integer) (optional) (default: 50)
    Number of files to sync in parallel. A negative value means no limit.
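
The options above can be sanity-checked before a sync. The following is a minimal sketch, not part of the plugin: `parquet_files` and `records_needed` are hypothetical helper names. It assumes the extension match is case-sensitive and that only the top level of `files_dir` is scanned; the spec above does not say either way. The record count also assumes rows are batched per file, with a smaller final record.

```python
import math
import tempfile
from pathlib import Path


def parquet_files(files_dir):
    """Return files directly inside files_dir whose name ends in .parquet, sorted by path.

    Assumption: non-recursive, case-sensitive matching.
    """
    return sorted(
        p for p in Path(files_dir).iterdir()
        if p.is_file() and p.suffix == ".parquet"
    )


def records_needed(row_count, rows_per_record=500):
    """Number of Arrow records needed to carry row_count rows at the given batch size."""
    return math.ceil(row_count / rows_per_record)


if __name__ == "__main__":
    # Build a throwaway directory so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("a.parquet", "b.parquet", "notes.csv"):
            (Path(tmp) / name).touch()
        # notes.csv is skipped because it lacks the .parquet extension.
        print([p.name for p in parquet_files(tmp)])

    # 1200 rows at the default of 500 rows per record.
    print(records_needed(1200))
```

Listing the directory this way is a quick check that the files you expect to sync actually carry the `.parquet` extension.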

Example with AWS Cost and Usage Reports

AWS Cost and Usage Reports can be delivered to S3 as Parquet files. The following example syncs these files, together with AWS infrastructure data, to a PostgreSQL database. Because files_dir points to a local directory, download the reports from S3 before running the sync. To learn more about visualizing AWS Cost and Usage Reports, visit our dashboards page.

kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.4.7"
  destinations: [postgresql]
  tables: ["*"]
  spec:
    files_dir: "/path/to/cost_and_usage_reports" # Update this value to the local directory with your AWS Cost and Usage Reports
---
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v27.18.0"
  destinations: [postgresql]
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_iam_group_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameter_groups
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "v8.5.4"
  spec:
    connection_string: postgresql://postgres:pass@localhost:5432/postgres
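
Since the File plugin reads from a local directory, the reports have to be copied out of S3 before the sync runs. The sketch below shows one way to script both steps. It only builds and prints the commands, so it is safe to run as-is. The bucket name, prefix, and config file name (`aws_cost.yml`) are hypothetical placeholders, and the helper functions are illustrative rather than part of CloudQuery.

```python
import shlex


def s3_download_command(s3_uri, files_dir):
    """Build an `aws s3 sync` command that copies only .parquet objects to files_dir."""
    return [
        "aws", "s3", "sync", s3_uri, files_dir,
        "--exclude", "*",          # start by excluding everything...
        "--include", "*.parquet",  # ...then re-include Parquet files only
    ]


def cloudquery_sync_command(*config_paths):
    """Build a `cloudquery sync` command for one or more config files."""
    return ["cloudquery", "sync", *config_paths]


if __name__ == "__main__":
    # Hypothetical bucket, prefix, and config file name; replace with your own.
    steps = [
        s3_download_command(
            "s3://my-cur-bucket/reports/",
            "/path/to/cost_and_usage_reports",
        ),
        cloudquery_sync_command("aws_cost.yml"),
    ]
    for cmd in steps:
        # Print for review; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems with the `*` patterns if you later run them through `subprocess.run`.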