Official

Basic

This plugin is in preview.

The CloudQuery transformer plugin provides basic transformation capabilities, such as removing columns, adding literal string columns, obfuscating string columns, and renaming tables.

Publisher: cloudquery
Repository: github.com
Latest version: v2.6.2
Type: Transformer

Overview

This CloudQuery transformer plugin provides basic transformation capabilities:

Removing columns
Adding literal string columns
Adding a column with the timestamp at which the transformer processed the record
Obfuscating string columns
Renaming tables using a name template (use {{.OldName}} to refer to the original name, see example below)
Normalizing column values to all uppercase or all lowercase
Dropping rows based on column values

Configuration

First, add the transformer to your destination. For example, this will add a basic transformer to a PostgreSQL destination:

kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v8.0.7"
  write_mode: "overwrite-delete-stale"
  migrate_mode: forced # optional
  transformers:
    - "basic"

  spec:
    connection_string: "postgresql://your.user:your.password@localhost:5432/db_name"

Consider setting migrate_mode: forced if your transformations change the schema produced by a previous sync.

Then, add your transformer spec. Here's an example that transforms the XKCD source table:

kind: transformer
spec:
  name: "basic"
  path: "cloudquery/basic"
  version: v2.6.2
  spec:
    transformations:
      - kind: obfuscate_columns
        tables: ["xkcd_comics"]
        columns: ["safe_title", "title"]
      - kind: obfuscate_sensitive_columns
      - kind: remove_columns
        tables: ["xkcd_comics"]
        columns: ["transcript", "news"]
      - kind: add_column
        tables: ["xkcd_comics"]
        name: "source"
        value: "xkcd"
      - kind: add_primary_keys
        tables: ["xkcd_comics"]
        columns: ["_cq_source_name"]
      - kind: add_current_timestamp_column
        tables: ["xkcd_comics"]
        name: "_record_processed_at"
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "cq_sync_{{.OldName}}"
      - kind: rename_column
        tables: ["xkcd_comics"]
        name: img
        value: img_url
      - kind: uppercase
        tables: ["xkcd_comics"]
        columns: ["title"]
      - kind: lowercase
        tables: ["xkcd_comics"]
        columns: ["title"]
      - kind: drop_rows
        tables: ["xkcd_comics"]
        columns: ["year"]
        value: "2023"

JSON columns are supported for removing paths and obfuscating string values, as well as for lowercasing or uppercasing field values. Array indexes are supported for both removing and obfuscating. For example, with a JSON column named tags:

{"foo":{"bar":["a","b","c"]},"hello":"world","kubectl.kubernetes.io/last-applied-configuration":"secrets"}

You can obfuscate "a" and remove "b", "world", and "secrets" with:

kind: transformer
spec:
  name: "basic"
  path: "cloudquery/basic"
  registry: "cloudquery"
  spec:
    transformations:
      - kind: obfuscate_columns
        tables: ["example"]
        columns: ["tags.foo.bar.0"]
      - kind: remove_columns
        tables: ["example"]
        columns: ["tags.hello", "tags.foo.bar.1", "tags.kubectl\\.kubernetes\\.io\\/last-applied-configuration"]
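Since JSON field values can also be lowercased or uppercased, the same path notation can be sketched for those transformations. This is a minimal sketch that assumes the tags column from the example above and assumes that uppercase accepts the same dotted paths as obfuscate_columns; check it against your plugin version before relying on it:

```yaml
kind: transformer
spec:
  name: "basic"
  path: "cloudquery/basic"
  registry: "cloudquery"
  spec:
    transformations:
      # Assumption: dotted JSON paths work here as they do for obfuscate_columns.
      - kind: uppercase
        tables: ["example"]
        columns: ["tags.hello"] # intended effect: "world" becomes "WORLD"
```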

To obfuscate values inside nested JSON arrays, for example a column named example_column with the value {"top_foo":[{"foo": "baz0"},{"foo": "baz1"},{"foo": "baz2"}]}, use the # path syntax:

kind: transformer
spec:
  name: "basic"
  path: "cloudquery/basic"
  registry: "cloudquery"
  spec:
    transformations:
      - kind: obfuscate_columns
        tables: ["example"]
        columns: ["example_column.top_foo.#.foo"]

Note: obfuscating JSON arrays with the #.foo syntax replaces every foo value with the same obfuscated value:

{"top_foo":[{"foo": "Redacted by CloudQuery | XXX"},{"foo": "Redacted by CloudQuery | XXX"},{"foo": "Redacted by CloudQuery | XXX"}]}

You can also use the obfuscate_sensitive_columns transformation to automatically obfuscate all columns marked by the source plugin as sensitive and possibly containing secret information.

Note: transformations are applied sequentially. If you rename tables, the table matchers of all subsequent transformations must use the new names.

Note: the escape syntax used in column paths is SJSON syntax.
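To make the ordering rule concrete, here is a minimal sketch that reuses the XKCD table and the cq_sync_ prefix from the earlier example. Because change_table_names runs first, the later remove_columns step has to match the renamed table:

```yaml
kind: transformer
spec:
  name: "basic"
  path: "cloudquery/basic"
  registry: "cloudquery"
  spec:
    transformations:
      # Runs first: every table gets the cq_sync_ prefix.
      - kind: change_table_names
        tables: ["*"]
        new_table_name_template: "cq_sync_{{.OldName}}"
      # Runs second: the matcher must use the new table name,
      # because "xkcd_comics" no longer exists at this point.
      - kind: remove_columns
        tables: ["cq_sync_xkcd_comics"]
        columns: ["transcript"]
```

Placing change_table_names last instead lets every other transformation keep matching on the original table names.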

Edge cases and limitations of the drop_rows transformation:

Only non-list columns are supported.
To drop rows with nil values, configure value: null or value: ~, or omit the value key altogether.
To drop rows based on a JSON value, use the compacted form of the JSON. For example, to drop rows where a JSON column tags has the value {"foo": "bar"}, specify the value as {"foo":"bar"}, without any whitespace.
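The drop_rows cases above could be written as in the following sketch. The table name example and the column deleted_at are hypothetical. Quoting the JSON value is an assumption made here so that YAML reads it as a string and not as a flow mapping:

```yaml
kind: transformer
spec:
  name: "basic"
  path: "cloudquery/basic"
  registry: "cloudquery"
  spec:
    transformations:
      # Drop rows whose JSON column tags equals {"foo": "bar"}.
      # The value is the compacted JSON; the single quotes keep it a YAML string.
      - kind: drop_rows
        tables: ["example"]
        columns: ["tags"]
        value: '{"foo":"bar"}'
      # Drop rows where the (hypothetical) column deleted_at is nil.
      - kind: drop_rows
        tables: ["example"]
        columns: ["deleted_at"]
        value: null
```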

Licenses

The following tools / packages are used in this plugin:

Name	License
github.com/adrg/xdg	MIT
github.com/apache/arrow/go/v13	Apache-2.0
github.com/apache/arrow-go/v18	Apache-2.0
github.com/apapsch/go-jsonmerge/v2	MIT
github.com/aws/aws-sdk-go-v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/config	Apache-2.0
github.com/aws/aws-sdk-go-v2/credentials	Apache-2.0
github.com/aws/aws-sdk-go-v2/feature/ec2/imds	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/configsources	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/ini	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/sync/singleflight	BSD-3-Clause
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/licensemanager	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/marketplacemetering	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sso	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/ssooidc	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sts	Apache-2.0
github.com/aws/smithy-go	Apache-2.0
github.com/aws/smithy-go/internal/sync/singleflight	BSD-3-Clause
github.com/cenkalti/backoff/v4	MIT
github.com/cloudquery/cloudquery-api-go	MPL-2.0
github.com/cloudquery/plugin-pb-go	MPL-2.0
github.com/cloudquery/plugin-sdk/v2/internal/glob	MIT
github.com/cloudquery/plugin-sdk/v2/schema	MIT
github.com/cloudquery/plugin-sdk/v2/types	MPL-2.0
github.com/cloudquery/plugin-sdk/v4	MPL-2.0
github.com/cloudquery/plugin-sdk/v4/glob	MIT
github.com/cloudquery/plugin-sdk/v4/scalar	MIT
github.com/davecgh/go-spew/spew	ISC
github.com/ghodss/yaml	MIT
github.com/go-logr/logr	Apache-2.0
github.com/go-logr/stdr	Apache-2.0
github.com/goccy/go-json	MIT
github.com/google/flatbuffers/go	Apache-2.0
github.com/google/uuid	BSD-3-Clause
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors	Apache-2.0
github.com/grpc-ecosystem/grpc-gateway/v2	BSD-3-Clause
github.com/hashicorp/go-cleanhttp	MPL-2.0
github.com/hashicorp/go-retryablehttp	MPL-2.0
github.com/klauspost/compress	Apache-2.0
github.com/klauspost/compress/internal/snapref	BSD-3-Clause
github.com/klauspost/compress/zstd/internal/xxhash	MIT
github.com/mattn/go-colorable	MIT
github.com/mattn/go-isatty	MIT
github.com/oapi-codegen/runtime	Apache-2.0
github.com/pierrec/lz4/v4	BSD-3-Clause
github.com/pmezard/go-difflib/difflib	BSD-3-Clause
github.com/rs/zerolog	MIT
github.com/santhosh-tekuri/jsonschema/v6	Apache-2.0
github.com/spf13/cobra	Apache-2.0
github.com/spf13/pflag	BSD-3-Clause
github.com/stretchr/testify	MIT
github.com/thoas/go-funk	MIT
github.com/tidwall/gjson	MIT
github.com/tidwall/match	MIT
github.com/tidwall/pretty	MIT
github.com/tidwall/sjson	MIT
github.com/zeebo/xxh3	BSD-2-Clause
go.opentelemetry.io/otel	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp	Apache-2.0
go.opentelemetry.io/otel/log	Apache-2.0
go.opentelemetry.io/otel/metric	Apache-2.0
go.opentelemetry.io/otel/sdk	Apache-2.0
go.opentelemetry.io/otel/sdk/log	Apache-2.0
go.opentelemetry.io/otel/sdk/metric	Apache-2.0
go.opentelemetry.io/otel/trace	Apache-2.0
go.opentelemetry.io/proto/otlp	Apache-2.0
golang.org/x/exp	BSD-3-Clause
golang.org/x/net	BSD-3-Clause
golang.org/x/sync/errgroup	BSD-3-Clause
golang.org/x/sys	BSD-3-Clause
golang.org/x/text	BSD-3-Clause
golang.org/x/xerrors	BSD-3-Clause
google.golang.org/genproto/googleapis/api/httpbody	Apache-2.0
google.golang.org/genproto/googleapis/rpc/status	Apache-2.0
google.golang.org/grpc	Apache-2.0
google.golang.org/protobuf	BSD-3-Clause
gopkg.in/yaml.v2	Apache-2.0
gopkg.in/yaml.v3	MIT