Official

Premium

BigQuery source integration documentation

Sync from BigQuery to any destination

Publisher: cloudquery
Latest version: v1.11.6
Type: Source

Overview #

The CloudQuery BigQuery source plugin syncs tables from BigQuery to any of the supported CloudQuery destinations (e.g. PostgreSQL, BigQuery, Snowflake, and more).

Example Configuration #

This example syncs from BigQuery to a Postgres destination. The (top level) source spec section is described in the Source Spec Reference.

kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.11.6"
  tables: ["*"]
  destinations: ["postgresql"]
  # Learn more about the configuration options at https://cql.ink/bigquery_source
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
    # Optional parameters
    # dataset_location: ""
    # service_account_key_json: ""
    # endpoint: ""

The example above expects the following environment variables to be set:

PROJECT_ID - The Google Cloud Project ID
DATASET_ID - The Google Cloud BigQuery Dataset ID

Configuration Reference #

This is the (nested) spec used by the BigQuery source plugin.

project_id (string) (required)
The ID of the project where the source BigQuery dataset resides.
dataset_id (string) (required)
The id of the BigQuery dataset within the project, e.g. my_dataset.
dataset_location (string) (optional)
The data location of the BigQuery dataset. If set, will be used as the default location for job operations. Pro-tip: this can solve "dataset not found" issues for newly created datasets.
service_account_key_json (string) (optional) (default: empty)
GCP service account key content. Setting it lets the BigQuery source use a different service account from other GCP plugins in the same sync (for example, a BigQuery destination). If you use service account keys, it is best to supply them through environment or file variable substitution.
endpoint (string) (optional)
The BigQuery API endpoint to use. This is useful for testing against a local emulator.
table_options (map of Table Options Spec) (optional)
Table options to set for specific tables. Untemplated table names are the keys. Only valid if queries is empty.
queries (list of queries Spec) (optional)
(Preview feature) List of queries to run instead of reading tables directly. The tables list in the top-level spec should be left as * or set to a subset of the table names these queries define.
destination_table_name (string) (optional) (default: {{TABLE}})
The destination table name to write the data to. (Preview Feature)
Supports the following placeholder variables:
- {{TABLE}} will be replaced with the table name
- {{DATASET}} will be replaced with the dataset name
- {{UUID}} will be replaced with a random UUID in raw format to uniquely identify each table
- {{YEAR}} will be replaced with the current year in YYYY format
- {{MONTH}} will be replaced with the current month in MM format
- {{DAY}} will be replaced with the current day in DD format
- {{HOUR}} will be replaced with the current hour in HH format
- {{MINUTE}} will be replaced with the current minute in mm format
At least one of {{TABLE}} or {{UUID}} is required.
Note that timestamps are in UTC and reflect the time the sync started.
concurrency (integer) (optional) (default: 100)
Number of tables to sync concurrently. Decrease or increase this number based on your database size and available resources.
discovery_concurrency (integer) (optional) (default: 100)
Number of goroutines to use for discovering table schemas.
scheduler (string) (optional) (default: dfs)
The scheduler to use when determining the priority of resources to sync. Supported values are dfs (depth-first search), round-robin, shuffle and shuffle-queue.
For more information about this, see performance tuning.
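
As a rough illustration of how some of the optional settings above fit together, the sketch below extends the earlier example configuration. The specific values (the "US" location, a concurrency of 50, the round-robin scheduler, and the table name template) are assumptions chosen for illustration, not recommendations.

```yaml
kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.11.6"
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    project_id: ${PROJECT_ID}
    dataset_id: ${DATASET_ID}
    # Assumed example value; use the location your dataset was created in
    dataset_location: "US"
    # Preview feature. Quoted so YAML does not read the braces as a mapping.
    # Contains {{TABLE}}, which satisfies the "TABLE or UUID" requirement.
    destination_table_name: "{{DATASET}}_{{TABLE}}_{{YEAR}}{{MONTH}}{{DAY}}"
    # Assumed example value, lower than the default of 100
    concurrency: 50
    # One of the documented values: dfs, round-robin, shuffle, shuffle-queue
    scheduler: round-robin
```

Going by the placeholder descriptions above, a table named events in a dataset named analytics (both hypothetical), synced in a run that started on 2024-03-05 UTC, would be written to analytics_events_20240305.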

Table Options Spec (Preview) #

Enables setting table options for specific tables. The map key is the name of the table in untemplated form.

incremental_column (string) (optional)

Name of the incremental column in the table. If empty, no incremental column will be used.

Queries Spec (Preview) #

Allows running arbitrary queries instead of fetching existing tables. Each query will be run as a separate table.

name (string) (required)
Name of the table to be generated from the query.
query (string) (required)
SQL query to run. Should have a {incremental_column} > @cursor or similar in the WHERE clause if incremental_column is set.
incremental_column (string) (optional)
Name of the incremental column in the query result set. The query must have a reference to @cursor in its WHERE clause, and a column with this name in the result set. If empty, no incremental column will be used.
incremental_column_type (enum) (required if incremental_column is set)
Type of the incremental column: one of TIMESTAMP, INTEGER or STRING.
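
The fields above can be sketched in their simplest form as a query without an incremental column, which is run in full on each sync. The table name order_counts_by_status, the source table orders, and its status column are hypothetical, and the unqualified table reference assumes the query resolves against the configured dataset, as in the incremental query example elsewhere in this document.

```yaml
kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.11.6"
  # Left as "*" as the queries option describes
  tables: ["*"]
  destinations: ["postgresql"]
  spec:
    project_id: my-project
    dataset_id: my_dataset
    queries:
      # Each query is synced as its own table, named by "name"
      - name: order_counts_by_status
        # Hypothetical query; no @cursor is needed because
        # incremental_column is not set
        query: "SELECT status, COUNT(*) AS order_count FROM orders GROUP BY status"
```

Because table_options is only valid when queries is empty, this sketch leaves table_options out.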

Authentication #

The BigQuery plugin authenticates using your Application Default Credentials. All the options described in Google's Application Default Credentials documentation are available:

Local Environment:

gcloud auth application-default login (recommended when running locally)

Google Cloud cloud-based development environment:

When you run on Cloud Shell or Cloud Code, credentials are already available.

Google Cloud containerized environment:

When running on GKE, use workload identity.

Google Cloud services that support attaching a service account:

Services such as Compute Engine, App Engine and Cloud Functions support attaching a user-managed service account, which CloudQuery will be able to use.

On-premises or another cloud provider:

The suggested way is to use Workload Identity Federation.
If that is not available, you can use service account keys and export the location of the key via GOOGLE_APPLICATION_CREDENTIALS. This is not recommended, as long-lived keys are a security risk.

Underlying library #

We use the official cloud.google.com/go/bigquery package for database connection.

Incremental-Examples #

Configuration Examples for Incremental Sync

To sync tables incrementally, each table needs to have an incremental_column specified in the table_options section of the configuration. Tables without an incremental_column will be synced fully.

Sync a Table Incrementally #

To sync a table incrementally, you need to specify the incremental_column in the table_options section of the configuration. Here's an example source spec to sync a list of tables incrementally:

kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.11.6"
  tables:
    - "my_table"
    - "another_table"
    - "yet_another_table"
  destinations: ["postgresql"]
  backend_options:
    table_name: "cq_state_bq"
    connection: "@@plugins.postgresql.connection"
  spec:
    project_id: my-project
    dataset_id: my_dataset
    table_options:
      my_table:
        incremental_column: updated_at
      another_table:
        incremental_column: id

In the example above, the my_table table will be synced incrementally based on the updated_at column. another_table will be synced incrementally based on the id column. yet_another_table will be synced fully.

The incremental column will be used to fetch only the new or updated rows from the source table and the state will be stored in the cq_state_bq table in the PostgreSQL destination.

Sync a Custom Query Incrementally #

To sync a custom query incrementally, you need to specify the incremental_column as well as the incremental_column_type in the queries spec. Here's an example source spec to sync a custom query incrementally:

kind: source
spec:
  name: bigquery
  path: cloudquery/bigquery
  registry: cloudquery
  version: "v1.11.6"
  tables:
    - "my_query_result"
  destinations: ["postgresql"]
  backend_options:
    table_name: "cq_state_bq"
    connection: "@@plugins.postgresql.connection"
  spec:
    project_id: my-project
    dataset_id: my_dataset
    queries:
      - name: my_query_result
        query: "SELECT * FROM my_table WHERE updated_at > @cursor"
        incremental_column: updated_at
        incremental_column_type: timestamp

In the example above, the my_query_result table will be generated from the custom query SELECT * FROM my_table WHERE updated_at > @cursor and will be synced incrementally based on the updated_at column. @cursor is a placeholder that will be replaced with the cursor value stored by the previous sync (here, the last synced updated_at timestamp). If no stored value is available, it will be replaced with the positive minimum value of that type.

For more information about managing state for incremental tables, see Managing Incremental Tables.

Licenses #

The following tools / packages are used in this plugin:

Name	License
cloud.google.com/go	Apache-2.0
cloud.google.com/go/auth	Apache-2.0
cloud.google.com/go/auth/oauth2adapt	Apache-2.0
cloud.google.com/go/bigquery	Apache-2.0
cloud.google.com/go/compute/metadata	Apache-2.0
cloud.google.com/go/iam	Apache-2.0
github.com/adrg/xdg	MIT
github.com/apache/arrow/go/v13	Apache-2.0
github.com/apache/arrow/go/v15	Apache-2.0
github.com/apache/arrow-go/v18	Apache-2.0
github.com/apapsch/go-jsonmerge/v2	MIT
github.com/aws/aws-sdk-go-v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/config	Apache-2.0
github.com/aws/aws-sdk-go-v2/credentials	Apache-2.0
github.com/aws/aws-sdk-go-v2/feature/ec2/imds	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/configsources	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/ini	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/sync/singleflight	BSD-3-Clause
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/licensemanager	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/marketplacemetering	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sso	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/ssooidc	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sts	Apache-2.0
github.com/aws/smithy-go	Apache-2.0
github.com/aws/smithy-go/internal/sync/singleflight	BSD-3-Clause
github.com/bahlo/generic-list-go	BSD-3-Clause
github.com/buger/jsonparser	MIT
github.com/cenkalti/backoff/v4	MIT
github.com/cloudquery/cloudquery-api-go	MPL-2.0
github.com/cloudquery/plugin-pb-go	MPL-2.0
github.com/cloudquery/plugin-sdk/v2/internal/glob	MIT
github.com/cloudquery/plugin-sdk/v2/schema	MIT
github.com/cloudquery/plugin-sdk/v2/types	MPL-2.0
github.com/cloudquery/plugin-sdk/v4	MPL-2.0
github.com/cloudquery/plugin-sdk/v4/glob	MIT
github.com/cloudquery/plugin-sdk/v4/scalar	MIT
github.com/davecgh/go-spew/spew	ISC
github.com/felixge/httpsnoop	MIT
github.com/ghodss/yaml	MIT
github.com/go-logr/logr	Apache-2.0
github.com/go-logr/stdr	Apache-2.0
github.com/goccy/go-json	MIT
github.com/golang/groupcache/lru	Apache-2.0
github.com/google/flatbuffers/go	Apache-2.0
github.com/google/s2a-go	Apache-2.0
github.com/google/uuid	BSD-3-Clause
github.com/googleapis/enterprise-certificate-proxy/client	Apache-2.0
github.com/googleapis/gax-go/v2	BSD-3-Clause
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors	Apache-2.0
github.com/grpc-ecosystem/grpc-gateway/v2	BSD-3-Clause
github.com/hashicorp/go-cleanhttp	MPL-2.0
github.com/hashicorp/go-retryablehttp	MPL-2.0
github.com/invopop/jsonschema	MIT
github.com/klauspost/compress	Apache-2.0
github.com/klauspost/compress/internal/snapref	BSD-3-Clause
github.com/klauspost/compress/zstd/internal/xxhash	MIT
github.com/mailru/easyjson	MIT
github.com/mattn/go-colorable	MIT
github.com/mattn/go-isatty	MIT
github.com/oapi-codegen/runtime	Apache-2.0
github.com/pierrec/lz4/v4	BSD-3-Clause
github.com/pmezard/go-difflib/difflib	BSD-3-Clause
github.com/rs/zerolog	MIT
github.com/samber/lo	MIT
github.com/santhosh-tekuri/jsonschema/v6	Apache-2.0
github.com/spf13/cobra	Apache-2.0
github.com/spf13/pflag	BSD-3-Clause
github.com/stretchr/testify	MIT
github.com/thoas/go-funk	MIT
github.com/wk8/go-ordered-map/v2	Apache-2.0
github.com/zeebo/xxh3	BSD-2-Clause
go.opencensus.io	Apache-2.0
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc	Apache-2.0
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp	Apache-2.0
go.opentelemetry.io/otel	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp	Apache-2.0
go.opentelemetry.io/otel/log	Apache-2.0
go.opentelemetry.io/otel/metric	Apache-2.0
go.opentelemetry.io/otel/sdk	Apache-2.0
go.opentelemetry.io/otel/sdk/log	Apache-2.0
go.opentelemetry.io/otel/sdk/metric	Apache-2.0
go.opentelemetry.io/otel/trace	Apache-2.0
go.opentelemetry.io/proto/otlp	Apache-2.0
golang.org/x/crypto	BSD-3-Clause
golang.org/x/exp	BSD-3-Clause
golang.org/x/net	BSD-3-Clause
golang.org/x/oauth2	BSD-3-Clause
golang.org/x/sync	BSD-3-Clause
golang.org/x/sys	BSD-3-Clause
golang.org/x/text	BSD-3-Clause
golang.org/x/time/rate	BSD-3-Clause
golang.org/x/xerrors	BSD-3-Clause
google.golang.org/api	BSD-3-Clause
google.golang.org/api/internal/third_party/uritemplates	BSD-3-Clause
google.golang.org/genproto/googleapis/api	Apache-2.0
google.golang.org/genproto/googleapis/rpc	Apache-2.0
google.golang.org/genproto/googleapis/type/expr	Apache-2.0
google.golang.org/grpc	Apache-2.0
google.golang.org/protobuf	BSD-3-Clause
gopkg.in/yaml.v2	Apache-2.0
gopkg.in/yaml.v3	MIT
