Official

OpenSearch destination integration documentation

This destination plugin lets you sync data from a CloudQuery source to an OpenSearch-compatible service.

Publisher: cloudquery

Latest version: v1.2.35

Type: Destination


Overview #

Opensearch Destination Plugin

The OpenSearch plugin syncs data from any CloudQuery source plugin to an OpenSearch cluster.

Example config #

The following config will sync data to an Opensearch cluster running on localhost:9200:

kind: destination
spec:
  name: opensearch
  path: cloudquery/opensearch
  registry: cloudquery
  version: "v1.2.35"
  write_mode: "overwrite-delete-stale"
  spec:
    # Optional parameters
    # addresses: ["http://localhost:9200"]
    # username: ""
    # password: ""
    # ca_cert: ""
    # concurrency: 5 # default: number of CPUs
    # batch_size: 1000
    # batch_size_bytes: 5242880 # 5 MiB
    # aws_signing:
    #  region: "us-west-2"

The OpenSearch destination writes data in batches. Use the batch_size and batch_size_bytes options to control how large each batch may grow.
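To picture how the two batching limits interact, here is a minimal Python sketch. It is not the plugin's implementation; `batch_items` is a hypothetical helper, and it assumes a batch is flushed as soon as adding another item would exceed either limit.

```python
from typing import Iterable, Iterator, List


def batch_items(
    items: Iterable[bytes],
    batch_size: int = 1000,
    batch_size_bytes: int = 5 * 1024 * 1024,
) -> Iterator[List[bytes]]:
    """Group serialized items so that a batch respects both limits.

    Hypothetical helper for illustration only. A single item larger than
    batch_size_bytes still forms a batch of its own.
    """
    batch: List[bytes] = []
    size = 0
    for item in items:
        # Flush first if this item would push the batch past a limit.
        if batch and (len(batch) >= batch_size or size + len(item) > batch_size_bytes):
            yield batch
            batch, size = [], 0
        batch.append(item)
        size += len(item)
    if batch:
        yield batch


docs = [b"x" * 400] * 5
print([len(b) for b in batch_items(docs, batch_size=2, batch_size_bytes=1000)])
# prints [2, 2, 1]
```

Whichever limit is reached first ends the batch, so lowering either value produces smaller, more frequent writes.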

It supports append, overwrite and overwrite-delete-stale write modes. The default write mode is overwrite-delete-stale.

Opensearch Spec #

This is the spec used by the Opensearch destination plugin.

addresses ([]string) (optional) (default: ["http://localhost:9200"])
A list of Opensearch nodes to use.
username (string) (optional)
Username for HTTP Basic Authentication.
password (string) (optional)
Password for HTTP Basic Authentication.
ca_cert (string) (optional)
PEM-encoded certificate authorities. When set, an empty certificate pool will be created, and the certificates will be appended to it. See file variable substitution for how to read this value from a file.
concurrency (integer) (optional) (default: number of CPUs)
Number of concurrent worker goroutines to use for indexing.
batch_size (integer) (optional) (default: 1000)
Maximum number of items that may be grouped together to be written in a single write.
batch_size_bytes (integer) (optional) (default: 5242880 (5 MiB))
Maximum size of items that may be grouped together to be written in a single write.
aws_signing (aws_signing_spec) (optional)
AWS signing configuration used to enable AWS request signing for requests to the AWS Opensearch Service.

aws_signing_spec #

region (string) (required)
AWS region to use for signing.
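The defaults and the one required field listed above can be summarized in a short Python sketch. This is an illustration of the documented rules, not the plugin's own validation code; `with_defaults` is a hypothetical helper and the domain address in the usage example is made up.

```python
import os
from typing import Any, Dict

DEFAULT_BATCH_SIZE = 1000
DEFAULT_BATCH_SIZE_BYTES = 5 * 1024 * 1024  # 5242880 bytes (5 MiB)


def with_defaults(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a plugin spec with the documented defaults filled in."""
    out = dict(spec)
    out.setdefault("addresses", ["http://localhost:9200"])
    out.setdefault("concurrency", os.cpu_count() or 1)  # default: number of CPUs
    out.setdefault("batch_size", DEFAULT_BATCH_SIZE)
    out.setdefault("batch_size_bytes", DEFAULT_BATCH_SIZE_BYTES)
    signing = out.get("aws_signing")
    # region is the only required field, and only when aws_signing is present.
    if signing is not None and not signing.get("region"):
        raise ValueError("aws_signing.region is required when aws_signing is set")
    return out


# Hypothetical spec for a domain on the AWS OpenSearch Service.
spec = with_defaults({
    "addresses": ["https://search-my-domain.us-west-2.es.amazonaws.com"],
    "aws_signing": {"region": "us-west-2"},
})
print(spec["batch_size"], spec["batch_size_bytes"])
# prints 1000 5242880
```

Leaving aws_signing out entirely is valid. The region check only applies once the block is present.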

Index Template Creation #

The OpenSearch destination creates an index template for every table during the migration step. We recommend using the generated index templates, because they ensure indexes are created automatically with the correct mappings for each table. To skip index template creation (or to use your own templates), pass the --no-migrate option when running cloudquery sync.

Index Naming #

Index names will be formatted according to the selected write mode:

append: indexes will be named using the format <table_name>-<YYYY-MM-DD>. In other words, a new index will be created every day the table is synced. Entries will never be overwritten.
overwrite: indexes will be named using the format <table_name>. Objects with duplicate primary keys will be overwritten.
overwrite-delete-stale: indexes will be named using the format <table_name>. Objects with duplicate primary keys will be overwritten, and any objects that are not present in the current sync will be deleted.

Index templates will also be created such that they match the index names generated by the selected write mode.
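The naming rules above can be written as a small Python function. This is a sketch of the documented formats only; `index_name` is a hypothetical helper, and which clock or time zone the plugin uses for the date suffix is an assumption left to the caller.

```python
from datetime import date


def index_name(table_name: str, write_mode: str, sync_date: date) -> str:
    """Return the index name the documented rules imply for a table."""
    if write_mode == "append":
        # One index per table per day: <table_name>-<YYYY-MM-DD>
        return f"{table_name}-{sync_date.isoformat()}"
    if write_mode in ("overwrite", "overwrite-delete-stale"):
        # A single index per table; rows are replaced by primary key.
        return table_name
    raise ValueError(f"unknown write mode: {write_mode!r}")


print(index_name("aws_ec2_instances", "append", date(2024, 1, 15)))
# prints aws_ec2_instances-2024-01-15
print(index_name("aws_ec2_instances", "overwrite", date(2024, 1, 15)))
# prints aws_ec2_instances
```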

Querying From Opensearch Dashboard #

To query data from the OpenSearch Dashboard, you need to configure index patterns. To query a specific table, use an index pattern in the format <table_name>-*. For example, for a table named aws_ec2_instances, create an index pattern named aws_ec2_instances-*.

OpenSearch can also query across many indexes at once. For the aws source plugin, for example, an index pattern named aws_* allows queries across all tables synced by that plugin.
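To check which indexes a pattern would cover before creating it, the wildcard matching can be approximated in Python. The sketch below uses shell-style globbing from the standard library as a stand-in for OpenSearch's own wildcard matching, which is an assumption; the index names are made-up examples in the append-mode format.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical index names as they would look in append mode.
indexes = [
    "aws_ec2_instances-2024-01-15",
    "aws_ec2_instances-2024-01-16",
    "aws_s3_buckets-2024-01-16",
    "gcp_compute_instances-2024-01-16",
]


def matching(pattern: str, names: List[str]) -> List[str]:
    """Return the names matched by a glob-style index pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching("aws_ec2_instances-*", indexes))
# prints ['aws_ec2_instances-2024-01-15', 'aws_ec2_instances-2024-01-16']
print(len(matching("aws_*", indexes)))
# prints 3
```

Under this approximation, a pattern ending in -* does not match an index named exactly aws_ec2_instances, which is the name used by the overwrite modes. A pattern such as aws_ec2_instances* would cover both forms.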

Underlying library #

We use the official opensearch-go package. It is tested against Opensearch 8.6.0. Please open an issue if you encounter any problems with this (or another) version.

Authentication #

The plugin authenticates with your account(s) using AWS request signing.

There are multiple ways to authenticate with AWS, and the plugin respects the AWS credential provider chain. This means that CloudQuery tries the following sources, in order, when attempting to authenticate:

The AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN environment variables.
The credentials and config files in ~/.aws (the credentials file takes priority).
You can also use aws sso to authenticate cloudquery - you can read more about it here.
IAM roles for AWS compute resources (including EC2 instances, Fargate and ECS containers).

You can read more about AWS authentication here and here.
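The priority order described above can be sketched as a simple decision function. This is a simplified model for illustration, not the AWS SDK's actual provider chain; `credential_source` and its boolean inputs are hypothetical.

```python
from typing import Mapping


def credential_source(
    env: Mapping[str, str],
    shared_files_present: bool,
    compute_role_present: bool,
) -> str:
    """Return which documented source would supply credentials first."""
    # 1. Environment variables (AWS_SESSION_TOKEN is optional for some accounts).
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment"
    # 2. Shared credentials and config files in ~/.aws.
    if shared_files_present:
        return "shared-files"
    # 3. IAM role attached to the compute resource (EC2, Fargate, ECS).
    if compute_role_present:
        return "iam-role"
    raise LookupError("no AWS credentials found")


print(credential_source({"AWS_ACCESS_KEY_ID": "a", "AWS_SECRET_ACCESS_KEY": "b"}, True, True))
# prints environment
```

A practical consequence is that stale environment variables take precedence over a correctly configured profile or role, so it is worth unsetting them when switching methods.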

Environment Variables #

CloudQuery can use the credentials from the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables (AWS_SESSION_TOKEN can be optional for some accounts). For information on obtaining credentials, see the AWS guide.

To export the environment variables (on Linux/macOS; Windows is similar):

export AWS_ACCESS_KEY_ID={Your AWS Access Key ID}
export AWS_SECRET_ACCESS_KEY={Your AWS secret access key}
export AWS_SESSION_TOKEN={Your AWS session token}

Shared Configuration files #

The plugin can use credentials from your credentials and config files in the .aws directory in your home folder. The contents of these files are practically interchangeable, but CloudQuery will prioritize credentials in the credentials file.

For information about obtaining credentials, see the AWS guide.

Here are example contents for a credentials file:

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

You can also specify credentials for a different profile, and instruct CloudQuery to use the credentials from this profile instead of the default one.

For example:

[myprofile]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

Then export the AWS_PROFILE environment variable (on Linux/macOS; Windows is similar):

export AWS_PROFILE=myprofile

IAM Roles for AWS Compute Resources #

The plugin can use IAM roles for AWS compute resources (including EC2 instances, Fargate and ECS containers). If you configured your AWS compute resources with IAM, the plugin will use these roles automatically. For more information on configuring IAM, see the AWS docs here and here.

User Credentials with MFA #

In order to leverage IAM User credentials with MFA, the STS "get-session-token" command may be used with the IAM User's long-term security credentials (Access Key and Secret Access Key). For more information, see here.

aws sts get-session-token --serial-number <YOUR_MFA_SERIAL_NUMBER> --token-code <YOUR_MFA_TOKEN_CODE> --duration-seconds 3600

Then export the temporary credentials to your environment variables.

export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
export AWS_SESSION_TOKEN=<YOUR_SESSION_TOKEN>
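Copying the three values by hand is error-prone, so the step can be scripted. The following Python sketch turns the JSON printed by the get-session-token command into export lines. It assumes the command's default JSON output with a top-level Credentials object; `export_lines` is a hypothetical helper and the sample values are placeholders.

```python
import json
import shlex
from typing import List

# Maps environment variable names to keys in the "Credentials" object.
ENV_TO_KEY = {
    "AWS_ACCESS_KEY_ID": "AccessKeyId",
    "AWS_SECRET_ACCESS_KEY": "SecretAccessKey",
    "AWS_SESSION_TOKEN": "SessionToken",
}


def export_lines(sts_output: str) -> List[str]:
    """Build shell export statements from get-session-token JSON output."""
    creds = json.loads(sts_output)["Credentials"]
    # shlex.quote keeps values safe to paste into a POSIX shell.
    return [f"export {var}={shlex.quote(creds[key])}" for var, key in ENV_TO_KEY.items()]


sample = json.dumps({
    "Credentials": {
        "AccessKeyId": "AKIAEXAMPLE",
        "SecretAccessKey": "secretEXAMPLE",
        "SessionToken": "tokenEXAMPLE",
        "Expiration": "2024-01-15T12:00:00Z",
    }
})
print("\n".join(export_lines(sample)))
```

The temporary credentials expire after the requested duration (3600 seconds in the command above), after which the command has to be run again.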

Mapping AWS IAM Roles to OpenSearch Roles #

The AWS OpenSearch Service offers fine-grained access control using IAM roles. You can map these IAM roles to OpenSearch roles using the OpenSearch Dashboard. This allows you to control access to indices and documents in OpenSearch based on the IAM roles of the user making the request.

In the following example, we will map the IAM role arn:aws:iam::123456789012:role/CloudquerySyncRole to the OpenSearch role cloudquery-sync-role, configured with the required permissions for a CloudQuery sync.

OpenSearch Domain Access Policy

The OpenSearch domain access policy must allow the IAM role arn:aws:iam::123456789012:role/CloudquerySyncRole to access the domain. The following is an example of an OpenSearch domain access policy that allows the IAM role arn:aws:iam::123456789012:role/CloudquerySyncRole to access the domain:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/CloudquerySyncRole"
      },
      "Action": "es:*",
      "Resource": "arn:aws:es:us-west-2:123456789012:domain/my-domain/*"
    }
  ]
}

It is also possible to only use fine-grained access control, and not use the domain access policy. In this case, no access policy is required.

Create an OpenSearch Role

1. Open the OpenSearch Dashboard and navigate to the "Security" section.
2. Click "Roles", then "Create Role", and enter the role name cloudquery-sync-role.
3. Add the following to the Cluster Permissions section:
- cluster_monitor
- cluster_composite_ops
- indices:admin/index_template/put
4. Add the following to the Index Permissions section for an Index Pattern of *:
- indices_all
- crud
5. Click "Create".
6. Click "Mapped Users", then "Manage Mappings".
7. Add the IAM role arn:aws:iam::123456789012:role/CloudquerySyncRole to the backend roles and click "Map".

Requests made by instances with the IAM role arn:aws:iam::123456789012:role/CloudquerySyncRole will now be authorized by the OpenSearch role cloudquery-sync-role.

Types #

Opensearch Types

The Opensearch destination supports most Apache Arrow types. The following table shows the supported types and how they are mapped to Opensearch field data types.

Arrow Column Type	Supported?	Opensearch Type
Binary	✅ Yes	`binary`
Boolean	✅ Yes	`boolean`
Date32	✅ Yes	`date` with format `yyyy-MM-dd`
Date64	✅ Yes	`date` with format `yyyy-MM-dd`
Decimal	✅ Yes	`text`
Dense Union	✅ Yes	`text`
Dictionary	✅ Yes	`text`
Duration[ms]	✅ Yes	`text`
Duration[ns]	✅ Yes	`text`
Duration[s]	✅ Yes	`text`
Duration[us]	✅ Yes	`text`
Fixed Size List	✅ Yes	Uses type from list elements
Float16	✅ Yes	`half_float`
Float32	✅ Yes	`float`
Float64	✅ Yes	`double`
Inet	✅ Yes	`text`
Int8	✅ Yes	`byte`
Int16	✅ Yes	`short`
Int32	✅ Yes	`integer`
Int64	✅ Yes	`long`
Interval[DayTime]	✅ Yes	`object`
Interval[MonthDayNano]	✅ Yes	`object`
Interval[Month]	✅ Yes	`object`
JSON	✅ Yes	`text`
Large Binary	✅ Yes	`byte`
Large List	✅ Yes	Uses type from list elements
Large String	✅ Yes	`text`
List	✅ Yes	Uses type from list elements
MAC	✅ Yes	`text`
Map	✅ Yes	`object` with `key` and `value` fields
String	✅ Yes	`text`
Struct	✅ Yes	`object`
Time32[s]	✅ Yes	`date` with format `HH:mm:ss`
Time32[ms]	✅ Yes	`date` with format `HH:mm:ss.SSS`
Time64[us]	✅ Yes	`text`
Time64[ns]	✅ Yes	`text`
Timestamp[s]	✅ Yes	`date` with format `2006-01-02T15:04:05Z`
Timestamp[ms]	✅ Yes	`date` with format `2006-01-02T15:04:05.999Z`
Timestamp[us]	✅ Yes	`date` with format `2006-01-02T15:04:05.999999Z`
Timestamp[ns]	✅ Yes	`date_nanos` with format `2006-01-02T15:04:05.999999999Z`
UUID	✅ Yes	`text`
Uint8	✅ Yes	`unsigned_long`
Uint16	✅ Yes	`unsigned_long`
Uint32	✅ Yes	`unsigned_long`
Uint64	✅ Yes	`unsigned_long`
Union	✅ Yes	`text`

Licenses #

The following tools / packages are used in this plugin:

Name	License
github.com/adrg/xdg	MIT
github.com/apache/arrow/go/v13	Apache-2.0
github.com/apache/arrow-go/v18	Apache-2.0
github.com/apapsch/go-jsonmerge/v2	MIT
github.com/aws/aws-sdk-go-v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/config	Apache-2.0
github.com/aws/aws-sdk-go-v2/credentials	Apache-2.0
github.com/aws/aws-sdk-go-v2/feature/ec2/imds	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/configsources	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/ini	Apache-2.0
github.com/aws/aws-sdk-go-v2/internal/sync/singleflight	BSD-3-Clause
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/licensemanager	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/marketplacemetering	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sso	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/ssooidc	Apache-2.0
github.com/aws/aws-sdk-go-v2/service/sts	Apache-2.0
github.com/aws/smithy-go	Apache-2.0
github.com/aws/smithy-go/internal/sync/singleflight	BSD-3-Clause
github.com/cenkalti/backoff/v4	MIT
github.com/cloudquery/cloudquery-api-go	MPL-2.0
github.com/cloudquery/plugin-pb-go	MPL-2.0
github.com/cloudquery/plugin-sdk/v2/internal/glob	MIT
github.com/cloudquery/plugin-sdk/v2/schema	MIT
github.com/cloudquery/plugin-sdk/v2/types	MPL-2.0
github.com/cloudquery/plugin-sdk/v4	MPL-2.0
github.com/cloudquery/plugin-sdk/v4/glob	MIT
github.com/cloudquery/plugin-sdk/v4/scalar	MIT
github.com/davecgh/go-spew/spew	ISC
github.com/elastic/elastic-transport-go/v8/elastictransport	Apache-2.0
github.com/elastic/go-elasticsearch/v8/typedapi/core/deletebyquery	Apache-2.0
github.com/elastic/go-elasticsearch/v8/typedapi/types	Apache-2.0
github.com/elastic/go-elasticsearch/v8/typedapi/types/enums/licensestatus	Apache-2.0
github.com/elastic/go-elasticsearch/v8/typedapi/types/enums/licensetype	Apache-2.0
github.com/ghodss/yaml	MIT
github.com/go-logr/logr	Apache-2.0
github.com/go-logr/stdr	Apache-2.0
github.com/goccy/go-json	MIT
github.com/google/flatbuffers/go	Apache-2.0
github.com/google/uuid	BSD-3-Clause
github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors	Apache-2.0
github.com/grpc-ecosystem/grpc-gateway/v2	BSD-3-Clause
github.com/hashicorp/go-cleanhttp	MPL-2.0
github.com/hashicorp/go-retryablehttp	MPL-2.0
github.com/huandu/xstrings	MIT
github.com/klauspost/compress	Apache-2.0
github.com/klauspost/compress/internal/snapref	BSD-3-Clause
github.com/klauspost/compress/zstd/internal/xxhash	MIT
github.com/mattn/go-colorable	MIT
github.com/mattn/go-isatty	MIT
github.com/oapi-codegen/runtime	Apache-2.0
github.com/opensearch-project/opensearch-go/v3	Apache-2.0
github.com/pierrec/lz4/v4	BSD-3-Clause
github.com/pmezard/go-difflib/difflib	BSD-3-Clause
github.com/rs/zerolog	MIT
github.com/samber/lo	MIT
github.com/santhosh-tekuri/jsonschema/v6	Apache-2.0
github.com/segmentio/fasthash/fnv1a	MIT
github.com/spf13/cobra	Apache-2.0
github.com/spf13/pflag	BSD-3-Clause
github.com/stretchr/testify	MIT
github.com/thoas/go-funk	MIT
github.com/zeebo/xxh3	BSD-2-Clause
go.opentelemetry.io/otel	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlpmetric/otlpmetrichttp	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace	Apache-2.0
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp	Apache-2.0
go.opentelemetry.io/otel/log	Apache-2.0
go.opentelemetry.io/otel/metric	Apache-2.0
go.opentelemetry.io/otel/sdk	Apache-2.0
go.opentelemetry.io/otel/sdk/log	Apache-2.0
go.opentelemetry.io/otel/sdk/metric	Apache-2.0
go.opentelemetry.io/otel/trace	Apache-2.0
go.opentelemetry.io/proto/otlp	Apache-2.0
golang.org/x/exp	BSD-3-Clause
golang.org/x/net	BSD-3-Clause
golang.org/x/sync/errgroup	BSD-3-Clause
golang.org/x/sys	BSD-3-Clause
golang.org/x/text	BSD-3-Clause
golang.org/x/xerrors	BSD-3-Clause
google.golang.org/genproto/googleapis/api/httpbody	Apache-2.0
google.golang.org/genproto/googleapis/rpc/status	Apache-2.0
google.golang.org/grpc	Apache-2.0
google.golang.org/protobuf	BSD-3-Clause
gopkg.in/yaml.v2	Apache-2.0
gopkg.in/yaml.v3	MIT
