Back to plugin list
Official
Premium
PostgreSQL
The CloudQuery PostgreSQL plugin syncs your PostgreSQL database to any of the supported CloudQuery destinations
Publisher
cloudquery
Latest version
v6.7.2
Type
Source
Platforms
Date Published
Price per 1M rows
Starting from $15
monthly free quota
1M rows
Set up process #
brew install cloudquery/tap/cloudquery
1. Download CLI and login
2. Create source and destination configs
Plugin configurationOverview #
The CloudQuery PostgreSQL plugin syncs your PostgreSQL database to any of the supported CloudQuery destinations (e.g. PostgreSQL, BigQuery, Snowflake, and more).
This plugin also supports CDC via PostgreSQL logical replication, which enables keeping your PostgreSQL up to date with any destination by subscribing to changes.
Authentication #
In order to fetch information from PostgreSQL,
cloudquery
needs to be authenticated. Credentials can be inserted in the connection_string
field.Configuration #
This example configures a PostgreSQL source, located at
localhost:5432
. The (top level) spec section is described in the Source Spec Reference.kind: source
spec:
name: "postgresql"
registry: "cloudquery"
path: "cloudquery/postgresql"
version: "v6.7.2"
tables: ["*"]
destinations: ["postgresql"]
# Learn more about the configuration options at https://cql.ink/postgresql_source
spec:
connection_string: "${POSTGRESQL_CONNECTION_STRING}" # set the environment variable in a format like postgres://postgres:pass@localhost:5432/postgres?sslmode=disable
# Optional parameters:
# cdc_id: "postgresql" # Set to a unique string per source to enable Change Data Capture mode (logical replication, or CDC)
# pgx_log_level: error
# rows_per_record: 500
PostgreSQL Spec #
This is the (nested) spec used by the PostgreSQL source Plugin.
connection_string
(string
) (required)Connection string to connect to the database. This can be a URL or a DSN, as perpgxpool
"postgres://jack:secret@localhost:5432/mydb?sslmode=prefer"
connect with tcp and prefer TLS"postgres://jack:secret@localhost:5432/mydb?sslmode=disable&application_name=pgxtest&search_path=myschema&connect_timeout=5"
be explicit with all options"postgres://localhost:5432/mydb?sslmode=disable"
connect with os username cloudquery is being run as"postgres:///mydb?host=/tmp"
connect over unix socket"dbname=mydb"
unix domain socket, just specifying the db name - useful if you want to use peer authentication"user=jack password=jack\\'ssooper\\\\secret host=localhost port=5432 dbname=mydb sslmode=disable"
DSN with escaped backslash and single quote
pgx_log_level
(string
) (optional) (default:error
)cdc_id
(string
) (optional)If set to a non-empty string the source plugin will start syncing CDC via PostgreSQL logical replication in real-time. The value should be unique across all sources.rows_per_record
(integer
) (optional) (default:500
)Amount of rows to be packed into a single Apache Arrow record to be sent over the wire during sync (or initial sync in the CDC mode).destination_table_name
(string
) (optional) (default:{{TABLE}}
)The destination table name to write the data to. (Preview Feature)Supports the following placeholder variables:{{TABLE}}
will be replaced with the table name{{UUID}}
will be replaced with a random UUID in raw format to uniquely identify each table (*){{YEAR}}
will be replaced with the current year inYYYY
format (*){{MONTH}}
will be replaced with the current month inMM
format (*){{DAY}}
will be replaced with the current day inDD
format (*){{HOUR}}
will be replaced with the current hour inHH
format (*){{MINUTE}}
will be replaced with the current minute inmm
format (*)
At least one of{{TABLE}}
or{{UUID}}
is required. Dynamic placeholders (denoted with*
) are not allowed in CDC mode (whencdc_id
is set)Note that timestamps are inUTC
and will be the time sync started.
CDC #
PostgreSQL CDC (Logical Replication) Configuration
Overview #
Change Data Capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data.
In PostgreSQL CDC can be implemented using Logical Replication.
Logical Replication is a feature of PostgreSQL that allows you to stream changes from a database to another database, file or custom handler. It is also used to keep a copy of a database up to date with the original database in an efficient manner (Internally it is implemented by the so called Write-Ahead Log).
In this document we won't go into details of how Logical Replication works internally, but we will show you how to enable it in number of environments and how to configure CloudQuery PostgreSQL source plugin that can stream the changes to any of CloudQuery supported destinations.
The PostgreSQL source plugin streams changes directly to any CloudQuery destination without the need for additional infrastructure (e.g. Kafka, RabbitMQ, etc). This means the setup is much easier.
PSQL Test #
To test that the current PostgreSQL instance supports Logical Replication we can run the following command in PSQL:
SHOW wal_level;
or the following SQL query:
SELECT setting FROM pg_settings WHERE name='wal_level'
The default is
replication
but for CDC to work we need to set it to logical
(This can only be done on the database startup).Docker Setup #
If you are running it locally for testing purposes you can use the following command to docker to enable logical replication:
docker run -p 5432:5432 -e POSTGRES_PASSWORD=pass -d postgres:11 -c "wal_level=logical"
Self-hosted #
Change the
wal_level
to logical
in the postgresql.conf
file and restart the database.RDS #
You will need to create a custom RDS parameter group, associate it with your RDS instance, set
rds.logical_replication
to 1
and restart the database.See full resolution on AWS documentation
AWS Aurora serverless (V2) #
AWS Aurora serverless V1 doesn't support Logical Replication, but V2 does.
Similar to RDS, you will need to create a custom parameter group, associate it with your Aurora serverless instance, set
rds.logical_replication
to 1
and restart the database.See full resolution on AWS documentation
GCP Cloud SQL (PostgreSQL) #
You will need to set the
cloudsql.logical_deconging
configuration to on. See full documentation on GCP documentationAzure DB (PostgreSQL) #
Please follow the Official Azure documentation on how to enable logical replication via the CLI or Console (UI).
Current Limitations #
The following event types are not supported:
- DDL changes (e.g.
CREATE TABLE
,ALTER TABLE
, etc) TRUNCATE
statements (e.g.TRUNCATE TABLE table_name
)DELETE
statements (e.g.DELETE FROM table_name
)