Official
Premium

File

The CloudQuery File plugin syncs Parquet files to any supported CloudQuery destination.

Publisher: cloudquery
Latest version: v1.2.1
Type: Source
Date published: Feb 20, 2024
Price per 1M rows: $100
Free quota: 1M rows

Set up process

1. Download CLI and login

   brew install cloudquery/tap/cloudquery

   See installation options

2. Create source and destination configs

   See Plugin configuration

3. Run the sync

   cloudquery sync file.yml postgresql.yml

   See CloudQuery sync

Overview

The CloudQuery File plugin reads Parquet files and loads them into any supported CloudQuery destination (for example PostgreSQL, BigQuery, or Snowflake).

kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.2.1"
  tables: ["*"]
  destinations: ["postgresql"]

  spec:
    files_dir: "/path/to/files-to-sync" # required. Path to the directory with files to sync
    # concurrency: 50 # optional. Number of files to sync in parallel. Default: 50

File spec

This is the (nested) spec used by the File source plugin.
  • files_dir (string) (required)
    Path to the directory containing the files to sync. Only files with the .parquet extension are synced.
  • concurrency (integer) (optional) (default: 50)
    Number of files to sync in parallel. A negative value removes the limit.
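
To check ahead of a sync which files the .parquet rule above would pick up, a small script can list them. This is a minimal sketch in Python using only the standard library, not part of the plugin. The function name `list_parquet_files` is hypothetical, and the sketch assumes a case-sensitive match on the `.parquet` suffix at the top level of the directory only. Whether the plugin descends into subdirectories or accepts upper-case extensions is not stated on this page.

```python
from pathlib import Path


def list_parquet_files(files_dir: str) -> list[str]:
    """Return the sorted names of regular files in files_dir ending in .parquet.

    Assumption: top-level entries only, case-sensitive suffix match.
    The plugin's own handling of subdirectories and case may differ.
    """
    root = Path(files_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"files_dir is not a directory: {files_dir}")
    return sorted(
        entry.name
        for entry in root.iterdir()
        # Skip directories, even ones whose name ends in .parquet.
        if entry.is_file() and entry.suffix == ".parquet"
    )
```

Calling `list_parquet_files("/path/to/files-to-sync")` with the same path used for `files_dir` gives a quick preview; an empty list suggests the path or the file extensions need a second look before running the sync.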

Example with AWS Cost and Usage Reports

AWS Cost and Usage Reports can be delivered to S3 as Parquet files. Because files_dir must point to a local directory, copy the report files from S3 to local disk before syncing. The following example syncs these files, together with AWS infrastructure data, to a PostgreSQL database. To learn more about visualizing AWS Cost and Usage Reports, visit our dashboards page.

kind: source
spec:
  name: file
  path: cloudquery/file
  registry: cloudquery
  version: "v1.2.1"
  destinations: [postgresql]
  tables: ["*"]
  spec:
    files_dir: "/path/to/cost_and_usage_reports" # Update this value to the local directory with your AWS Cost and Usage Reports
---
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v24.3.3"
  destinations: [postgresql]
  tables: ["*"]
  skip_tables:
    - aws_ec2_vpc_endpoint_services
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_iam_group_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameter_groups
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
---
kind: destination
spec:
  name: postgresql
  path: cloudquery/postgresql
  registry: cloudquery
  version: "v7.3.6"
  spec:
    connection_string: "postgresql://postgres:pass@localhost:5432/postgres" # Replace with the connection string for your database
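
The Cost and Usage Reports configuration above reads from a local directory, so the Parquet objects have to be copied out of S3 first. One possible way to do that is sketched below in Python with boto3 (a third-party package); the AWS CLI's `aws s3 sync` command is a common alternative. The bucket name, the prefix, and the `download_parquet_reports` helper are hypothetical placeholders. The sketch also assumes the plugin can read files kept in the same subdirectory layout as the S3 keys, which this page does not confirm.

```python
import os


def download_parquet_reports(s3_client, bucket: str, prefix: str, dest_dir: str) -> list[str]:
    """Download every .parquet object under prefix into dest_dir.

    s3_client is expected to behave like boto3.client("s3").
    Returns the local paths written, in listing order.
    """
    written = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches the prefix.
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".parquet"):
                continue  # skip manifests and other non-Parquet objects
            # Keep the key's layout below the prefix (an assumption, see above);
            # fall back to the bare file name if nothing is left after the prefix.
            relative = key[len(prefix):].lstrip("/") or key.rsplit("/", 1)[-1]
            local_path = os.path.join(dest_dir, *relative.split("/"))
            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            s3_client.download_file(bucket, key, local_path)
            written.append(local_path)
    return written


def main() -> None:
    import boto3  # imported here so the helper above has no hard dependency

    paths = download_parquet_reports(
        boto3.client("s3"),
        bucket="my-cur-bucket",  # hypothetical bucket name
        prefix="reports/",  # hypothetical report prefix
        dest_dir="/path/to/cost_and_usage_reports",  # same path as files_dir
    )
    print(f"downloaded {len(paths)} parquet files")
```

Running `main()` requires AWS credentials that allow listing and reading the bucket. Passing the client in as a parameter keeps the helper easy to exercise against a stub before pointing it at a real bucket.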

