AWS (Official · Premium)
Publisher: cloudquery
Latest version: v32.21.0
Type: Source
Overview #
AWS Source Plugin
The AWS Source plugin extracts information from many of the services supported by Amazon Web Services (AWS) and loads it into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, Snowflake, and more).
Authentication #
The plugin needs to be authenticated with your account(s) in order to sync information from your cloud setup.
The plugin requires only read permissions (it will never make any changes to your cloud setup), so, following the principle of least privilege, it's recommended to grant it read-only permissions.
There are multiple ways to authenticate with AWS, and the plugin respects the AWS credential provider chain. This means CloudQuery will attempt to authenticate in the following order:
- The `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` environment variables.
- The `credentials` and `config` files in `~/.aws` (the `credentials` file takes priority).
- `aws sso` can also be used to authenticate CloudQuery - you can read more about it here.
- IAM roles for AWS compute resources (including EC2 instances, Fargate and ECS containers).
Environment Variables #
CloudQuery can use the credentials from the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` environment variables (`AWS_SESSION_TOKEN` may be optional for some accounts). For information on obtaining credentials, see the AWS guide.
To export the environment variables (on Linux/Mac; similar for Windows):
export AWS_ACCESS_KEY_ID={Your AWS Access Key ID}
export AWS_SECRET_ACCESS_KEY={Your AWS secret access key}
export AWS_SESSION_TOKEN={Your AWS session token}
Shared Configuration Files #
The plugin can use credentials from your `credentials` and `config` files in the `.aws` directory in your home folder. The contents of these files are practically interchangeable, but CloudQuery will prioritize credentials in the `credentials` file. For information about obtaining credentials, see the AWS guide.
Here are example contents for a `credentials` file:
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
You can also specify credentials for a different profile, and instruct CloudQuery to use the credentials from this profile instead of the default one.
For example:
[myprofile]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
Then, you can either export the `AWS_PROFILE` environment variable (on Linux/Mac; similar for Windows):
export AWS_PROFILE=myprofile
or configure your desired profile in the `local_profile` field:
accounts:
  - id: <account_alias>
    local_profile: myprofile
IAM Roles for AWS Compute Resources #
The plugin can use IAM roles for AWS compute resources (including EC2 instances, Fargate and ECS containers).
If you configured your AWS compute resources with IAM, the plugin will use these roles automatically.
For more information on configuring IAM, see the AWS docs here and here.
User Credentials with MFA #
In order to leverage IAM User credentials with MFA, the STS "get-session-token" command may be used with the IAM User's long-term security credentials (Access Key and Secret Access Key). For more information, see here.
aws sts get-session-token --serial-number <YOUR_MFA_SERIAL_NUMBER> --token-code <YOUR_MFA_TOKEN_CODE> --duration-seconds 3600
Then export the temporary credentials to your environment variables.
export AWS_ACCESS_KEY_ID=<YOUR_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<YOUR_SECRET_ACCESS_KEY>
export AWS_SESSION_TOKEN=<YOUR_SESSION_TOKEN>
Query Examples #
Find all public-facing load balancers #
SELECT * FROM aws_elbv2_load_balancers WHERE scheme = 'internet-facing';
Find all unencrypted RDS clusters #
SELECT * FROM aws_rds_clusters WHERE storage_encrypted IS FALSE;
Find all S3 buckets that are permitted to be public #
SELECT arn, region
FROM aws_s3_buckets
WHERE block_public_acls IS NOT TRUE
OR block_public_policy IS NOT TRUE
OR ignore_public_acls IS NOT TRUE
OR restrict_public_buckets IS NOT TRUE;
Configuration #
AWS Source Plugin Configuration Reference
Examples #
Single Account Example #
This example connects a single AWS account in one region to a Postgres destination. The (top level) source spec section is described in the Source Spec Reference.
kind: source
spec:
  # Source spec section
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v22.19.2"
  tables: ["aws_ec2_instances"]
  destinations: ["postgresql"]
  spec:
    # AWS Spec section described below
    regions:
      - us-east-1
    accounts:
      - id: "account1"
        local_profile: "account1"
    aws_debug: false
See tables for a list of all available tables.
AWS Organization Example #
CloudQuery supports discovery of AWS Accounts via AWS Organizations. This means that as accounts get added to or removed from your organization, CloudQuery will be able to handle new or removed accounts without any configuration changes.
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v22.19.2"
  tables: ['aws_s3_buckets']
  destinations: ["postgresql"]
  spec:
    aws_debug: false
    org:
      admin_account:
        local_profile: "<NAMED_PROFILE>"
      member_role_name: OrganizationAccountAccessRole
    regions:
      - '*'
For full details, see the Multi Account Configuration Tutorial.
AWS Spec #
This is the (nested) spec used by the AWS source plugin.
- `regions` (`[]string`) (default: `[]`, which will use all enabled regions) Regions to use.
- `accounts` ([]account) (default: current account) List of all accounts to fetch information from.
- `org` (org) (default: not used) In AWS organization mode, CloudQuery will source all accounts underneath automatically.
- `concurrency` (`int`) (default: `50000`) The best-effort maximum number of Go routines to use. Lower this number to reduce memory usage.
- `initialization_concurrency` (`int`) (default: `4`) During initialization the AWS source plugin fetches information about each account and region. This setting controls how many accounts can be initialized concurrently. Only configurations with many accounts (either hardcoded or discovered via Organizations) should require modifying this setting, either to lower it to avoid rate limit errors, or to increase it to speed up the initialization process.
- `scheduler` (string) (default: `dfs`) The scheduler to use when determining the priority of resources to sync. Currently, the only supported values are `dfs` (depth-first search), `round-robin` and `shuffle`. For more information about this, see performance tuning.
- `aws_debug` (`bool`) (default: `false`) If true, will log AWS debug logs, including retries and other request/response metadata.
- `max_retries` (`int`) (default: `10`) Defines the maximum number of times an API request will be retried.
- `max_backoff` (`int`) (default: `30`) Defines the duration between retry attempts.
- `custom_endpoint_url` (string) (default: not used) The base URL endpoint the SDK API clients will use to make API calls to. The SDK will suffix URI path and query elements to this endpoint.
- `custom_endpoint_hostname_immutable` (`bool`) (default: not used) Specifies if the endpoint's hostname can be modified by the SDK's API client. When using something like LocalStack, make sure to set it to `true`.
- `custom_endpoint_partition_id` (string) (default: not used) The AWS partition the endpoint belongs to.
- `custom_endpoint_signing_region` (string) (default: not used) The region that should be used for signing the request to the endpoint.
- `use_paid_apis` (`bool`) (default: `false`) When set to `true`, the plugin will sync data from APIs that incur a fee. Currently only `aws_costexplorer*` and `aws_alpha_cloudwatch_metric*` tables require this flag to be set to `true`.
- `table_options` (`map`) (default: not used) (preview) This is a preview feature (for more information about `preview` features look at plugin versioning) that enables users to override the default options for specific tables. The root of the object takes a table name, and the next level takes an API method name. The final level is the actual input object as defined by the API. The format of the `table_options` object is as follows:
  table_options:
    <table_name>:
      <api_method_name>:
        - <input_object>
  A list of `<input_object>` objects should be provided. CloudQuery will iterate through these to make multiple API calls. This is useful for APIs like CloudTrail's `LookupEvents` that only support a single event type per call. For example:
  table_options:
    aws_cloudtrail_events:
      lookup_events:
        - start_time: 2023-05-01T20:20:52Z
          end_time: 2023-05-03T20:20:52Z
          lookup_attributes:
            - attribute_key: EventName
              attribute_value: RunInstances
        - start_time: 2023-05-01T20:20:52Z
          end_time: 2023-05-03T20:20:52Z
          lookup_attributes:
            - attribute_key: EventName
              attribute_value: StartInstances
        - start_time: 2023-05-01T20:20:52Z
          end_time: 2023-05-03T20:20:52Z
          lookup_attributes:
            - attribute_key: EventName
              attribute_value: StopInstances
  The naming for all the fields is the same as the AWS API but in snake case. For example `EndTime` is represented as `end_time`. As of `v18.4.0` the following tables and APIs are supported:
  table_options:
    aws_accessanalyzer_analyzer_findings:
      list_findings:
        - <AccessAnalyzer.ListFindings> # NextToken & AnalyzerArn are prohibited
    aws_alpha_cloudwatch_metrics:
      - list_metrics: <CloudWatch.ListMetrics> # NextToken is prohibited
        get_metric_statistics:
          - <CloudWatch.GetMetricStatistics> # Namespace, MetricName and Dimensions are prohibited
    aws_cloudtrail_events:
      lookup_events:
        - <CloudTrail.LookupEvents> # NextToken is prohibited
    aws_inspector2_findings:
      list_findings:
        - <InspectorV2.ListFindings> # NextToken is prohibited
    aws_securityhub_findings:
      get_findings:
        - <SecurityHub.GetFindings> # NextToken is prohibited. MaxResults should be in range [1-100].
    aws_ecs_cluster_tasks:
      list_tasks:
        - <ECS.ListTasks> # Cluster and NextToken are prohibited. MaxResults should be in range [1-100].
- `event_based_sync` (array) (default: empty) (enterprise version only)
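To illustrate how these options fit together, here is a minimal sketch of a nested spec; the values shown are arbitrary examples, not recommendations:

```yaml
spec:
  # AWS Spec section
  regions:
    - us-east-1
    - eu-west-1
  concurrency: 20000            # lowered from the 50000 default to reduce memory usage
  initialization_concurrency: 4
  scheduler: shuffle            # one of: dfs, round-robin, shuffle
  max_retries: 10
  max_backoff: 30
  aws_debug: false
```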
account #
This is used to specify one or more accounts to extract information from. Note that it should be an array of objects, each with the following fields:
- `id` (string) (required) Will be used as an alias in the source plugin and in the logs.
- `local_profile` (string) (default: will use current credentials) Local profile to use to authenticate this account with. Please note this should be set to the name of the profile. For example, with the following credentials file:
  [default]
  aws_access_key_id=xxxx
  aws_secret_access_key=xxxx
  [user1]
  aws_access_key_id=xxxx
  aws_secret_access_key=xxxx
  `local_profile` should be set to either `default` or `user1`.
- `role_arn` (string) If specified, will use this to assume a role.
- `role_session_name` (string) If specified, will use this session name when assuming the role in `role_arn`.
- `external_id` (string) If specified, will use this when assuming the role in `role_arn`.
- `default_region` (string) (default: `us-east-1`) If specified, this region will be used as the default region for the account.
- `regions` ([]string) Regions to use for this account. Defaults to the global `regions` setting.
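As an illustration, an `accounts` entry combining several of these fields might look like the sketch below; the alias, profile name, role ARN and account ID are placeholders:

```yaml
accounts:
  - id: "prod"                       # alias used in the source plugin and in the logs
    local_profile: "prod-readonly"   # named profile from ~/.aws/credentials or ~/.aws/config
    role_arn: "arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>"
    default_region: "us-east-1"
    regions:                         # overrides the global regions setting for this account
      - us-east-1
      - us-west-2
```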
org #
- `admin_account` (Account) Configuration for how to grab credentials from an Admin account.
- `member_trusted_principal` (Account) Configuration for how to specify the principal to use in order to assume a role in the member accounts.
- `member_role_name` (string) (required) Role name that CloudQuery should use to assume a role in the member account from the admin account. Note: this is not a full ARN, it is just the name.
- `member_role_session_name` (string) Overrides the default session name.
- `member_external_id` (string) Specify an external ID for use in the trust policy.
- `member_regions` ([]string) Limit fetching resources within this specific account to only these regions. This will override any regions specified in the provider block. You can specify all regions by using the `*` character as the only argument in the array.
- `organization_units` ([]string) List of Organizational Units that CloudQuery should use to source accounts from. If you specify an OU, CloudQuery will also traverse nested OUs.
- `skip_organization_units` ([]string) List of Organizational Units to skip. This is useful in conjunction with `organization_units` if there are child OUs that should be ignored.
- `skip_member_accounts` ([]string) List of OU member accounts to skip. This is useful if there are accounts under the selected OUs that should be ignored.
event_based_sync #
- `account` (Account) Configuration for the credentials that will be used to grab records from the specified Kinesis Stream. If this is not specified, the default credentials will be used.
- `kinesis_stream_arn` (string) (required) ARN for the Kinesis stream that will hold all of the CloudTrail records.
- `start_time` (timestamp) Defines the place in the stream where record processing should begin. By default, the time at which the sync began will be used. The value should follow the RFC 3339 format, for example `2023-09-04T19:24:14Z`.
- `full_sync` (`bool`) (default: `true`) By default, CloudQuery will do a full sync on the specified tables before starting to consume the events in the stream. Set this to `false` to skip the full pull-based sync and go straight to the event-based sync.
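Putting these fields together, a hedged sketch of an `event_based_sync` block; the stream ARN and profile name are placeholders:

```yaml
spec:
  event_based_sync:
    # account:                       # optional; defaults to the default credentials
    #   local_profile: "<NAMED_PROFILE>"
    kinesis_stream_arn: "arn:aws:kinesis:<REGION>:<ACCOUNT_ID>:stream/<STREAM_NAME>"
    start_time: "2023-09-04T19:24:14Z"  # RFC 3339; defaults to the time the sync began
    full_sync: false                    # skip the initial pull-based sync
```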
Advanced Configuration #
Incremental Tables #
Some tables, like `aws_cloudtrail_events`, support incremental syncs. When incremental syncing is enabled, CloudQuery will only fetch new data since the last sync. This is useful for tables that have a lot of data and are updated frequently. To enable incremental syncs, add a `backend_options` section to the source config:
kind: source
spec:
  # Source spec section
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v22.19.2"
  tables: ["aws_cloudtrail_events"]
  destinations: ["postgresql"]
  backend_options:
    table_name: "cq_state_aws"
    connection: "@@plugins.postgresql.connection"
  spec:
---
kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: cloudquery
  version: "v8.8.8"
  write_mode: "overwrite-delete-stale"
  spec:
    connection_string: "${CONNECTION_STRING}"
The `connection` string can reference any destination that supports `overwrite` mode; in the example above it will use the same `postgresql` destination that the `aws_cloudtrail_events` table is written to. The `table_name` is the name of the table that will be used to store state. This table will be created automatically if it does not exist. For more information about managing state for incremental tables, see Managing Incremental Tables.
Skip Tables #
AWS has tables that may contain many resources, nested information, and AWS-provided data. These tables may cause certain syncs to be slow due to the amount of AWS-provided data, and may not be needed. We recommend only syncing the tables you need. If `*` is necessary for `tables`, use `skip_tables` to exclude the tables you don't need. Below is a reference configuration where certain tables are skipped:
kind: source
spec:
  # Source spec section
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v22.19.2"
  tables: ["*"]
  skip_tables:
    - aws_cloudtrail_events
    - aws_docdb_cluster_parameter_groups
    - aws_docdb_engine_versions
    - aws_ec2_instance_types
    - aws_ec2_vpc_endpoint_services
    - aws_elasticache_engine_versions
    - aws_elasticache_parameter_groups
    - aws_elasticache_reserved_cache_nodes_offerings
    - aws_elasticache_service_updates
    - aws_iam_group_last_accessed_details
    - aws_iam_policy_last_accessed_details
    - aws_iam_role_last_accessed_details
    - aws_iam_user_last_accessed_details
    - aws_neptune_cluster_parameter_groups
    - aws_neptune_db_parameter_groups
    - aws_rds_cluster_parameter_groups
    - aws_rds_db_parameter_groups
    - aws_rds_engine_versions
    - aws_servicequotas_services
    - aws_stepfunctions_map_run_executions
    - aws_stepfunctions_map_runs
  destinations: ["postgresql"]
  spec:
    # AWS Spec section described below
Event-Based Sync #
AWS CloudTrail enables users to get an audit log of events occurring within their account. By subscribing to a stream of AWS CloudTrail events in a Kinesis Data Stream, CloudQuery can trigger selective syncs to update just the singular resource that had a configuration change.
Each table in the supported list is a top-level table. When an event is received for a table, all child tables are re-synced too by default. To skip some child tables, you can use `skip_tables`.
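For example, assuming a hypothetical child table name purely for illustration, skipping a child table during an event-based sync could be sketched as:

```yaml
tables:
  - aws_ec2_instances
skip_tables:
  # hypothetical child table name, shown only to illustrate the skip_tables mechanism
  - aws_ec2_instance_child_table
```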
Supported Services and Events #
Configuration #
1. Configure an AWS CloudTrail Trail to send management events to a Kinesis Data Stream via CloudWatch Logs. The most straightforward way to do this is to use the CloudFormation template provided by CloudQuery:
aws cloudformation deploy --template-file ./streaming-deployment.yml --stack-name <STACK-NAME> --capabilities CAPABILITY_IAM --disable-rollback --region <DESIRED-REGION>
2. Copy the ARN of the Kinesis stream. If you used the CloudFormation template, you can run the following command:
aws cloudformation describe-stacks --stack-name <STACK-NAME> --query "Stacks[].Outputs" --region <DESIRED-REGION>
3. Define a `config.yml` file like the one below:
kind: source
spec:
  name: "aws-event-based"
  registry: "local"
  path: <PATH/TO/BINARY>
  tables:
    - aws_ec2_instances
    - aws_ec2_internet_gateways
    - aws_ec2_security_groups
    - aws_ec2_subnets
    - aws_ec2_vpcs
    - aws_ecs_cluster_tasks
    - aws_iam_groups
    - aws_iam_roles
    - aws_iam_users
    - aws_rds_instances
  destinations: ["postgresql"]
  spec:
    event_based_sync:
      # account:
      #   local_profile: "<ROLE-NAME>"
      kinesis_stream_arn: <OUTPUT-FROM-CLOUDFORMATION-STACK>
4. Sync the data:
cloudquery sync config.yml
This will start a long-lived process that will only stop when there is an error or you stop the process.
Limitations #
- The Kinesis Stream can only have a single shard. (This is a limitation that we expect to remove in the future.)
- Stale records will only be deleted if the plugin stops consuming the Kinesis Stream, which can only occur if there is an error.
Multi-Account #
Multi Account Configuration Tutorial
AWS Organizations #
The plugin supports discovery of AWS Accounts via AWS Organizations. This means that as Accounts get added or removed from your organization, it will be able to handle new or removed accounts without any configuration changes.
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v22.19.2"
  tables: ['aws_s3_buckets']
  destinations: ["postgresql"]
  spec:
    aws_debug: false
    org:
      admin_account:
        local_profile: "<NAMED_PROFILE>"
      member_role_name: cloudquery-ro
    regions:
      - '*'
Prerequisites for using AWS Org functionality:
- Have a role (or user) in an Admin account with the following access:
  - organizations:ListAccounts
  - organizations:ListAccountsForParent
  - organizations:ListChildren
- Have a role in each member account that has a trust policy with a single principal. The default role name is `OrganizationAccountAccessRole`, which is created by default in AWS Accounts created as part of an AWS Organization. We do not recommend using the `OrganizationAccountAccessRole` due to the level of permissions typically granted to the role; instead, we recommend that CloudQuery users create their own IAM roles in each member account with the appropriate read-only permissions. We also recommend ensuring that the IAM roles and policies used for CloudQuery adhere to company security standards.
- Reference IAM assets and the CloudFormation templates for deployment in an AWS Organization for CloudQuery can be found here.
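As a rough illustration of the "trust policy with a single principal" requirement (the admin account ID is a placeholder; adapt the principal and any conditions to your own security standards), a member-account role's trust policy typically looks something like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<ADMIN_ACCOUNT_ID>:root" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```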
Configuring AWS Organization:
- It is always necessary to specify a member role name:
org:
  member_role_name: cloudquery-ro
- Sourcing credentials that have the necessary `organizations` permissions can be done in any of the following ways:
  - Source credentials from the default credential tool chain:
org:
  member_role_name: cloudquery-ro
  - Source credentials from a named profile in the shared configuration or credentials file:
org:
  member_role_name: cloudquery-ro
  admin_account:
    local_profile: <Named-Profile>
  - Assume a role in the admin account using credentials in the shared configuration or credentials file:
org:
  member_role_name: cloudquery-ro
  admin_account:
    local_profile: <Named-Profile>
    role_arn: arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>
    # Optional. Specify the name of the session
    # role_session_name: ""
    # Optional. Specify the ExternalID if required for trust policy
    # external_id: ""
- Optional. If the trust policy configured for the member accounts requires different credentials than you configured in the previous step, then you can specify the credentials to use in the `member_trusted_principal` block:
org:
  member_role_name: cloudquery-ro
  member_trusted_principal:
    local_profile: <Named-Profile-Member>
- Optional. If you want to specify specific Organizational Units to fetch from, you can add them to the `organization_units` list:
org:
  member_role_name: cloudquery-ro
  organization_units:
    - ou-<ID-1>
    - ou-<ID-2>
Child OUs will also be included. To skip a child OU or account, use the `skip_organization_units` or `skip_member_accounts` options respectively:
org:
  member_role_name: cloudquery-ro
  organization_units:
    - ou-<ID-1>
    - ou-<ID-2>
  skip_organization_units:
    - ou-<ID-3>
  skip_member_accounts:
    - <ACCOUNT_ID>
Arguments for Org block #
See AWS org configuration for more information on all the arguments in the `org` block.
Multi Account: Specific Accounts #
CloudQuery can fetch from multiple accounts in parallel by using AssumeRole (You will need to use credentials that can AssumeRole to all other specified accounts). Below is an example configuration:
accounts:
  - id: <AccountID_Alias_1>
    role_arn: <YOUR_ROLE_ARN_1>
    # Optional. Local Profile is the named profile in your shared configuration file (usually `~/.aws/config`) that you want to use for this specific account
    local_profile: <NAMED_PROFILE>
    # Optional. Specify the Role Session name
    role_session_name: ""
  - id: <AccountID_Alias_2>
    local_profile: provider
    # Optional. Role ARN we want to assume when accessing this account
    role_arn: <YOUR_ROLE_ARN_2>
Arguments for Accounts block #
See AWS accounts configuration for more information on all the arguments in the `accounts` block.
Versioning #
Plugin Versioning #
Changes to schema, configurations and required user permissions are all factors that go into the versioning of the AWS plugin. Any release that requires manual changes to an existing deployment of the AWS plugin in order to retain the same functionality will be indicated by an increase to the major version. When support for additional resources is added, it will result in a minor version bump. This is important to be aware of because if you are using `tables: ["*"]` to specify the set of tables to sync, then minor versions may add new resources that require additional IAM permissions, which might result in errors being raised.
Breaking changes #
The following examples are some of the most common examples of reasons for a major version change:
- Changing a primary key for a table
- Changing the name of a table
- Changing the permissions required to sync a resource
All releases contain a release log that indicates all of the changes (and highlights the breaking changes); all changelogs are available here. If you are ever unsure about a change that is included, feel free to reach out to the CloudQuery team on Discord to find out more.
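One practical way to stay ahead of breaking changes is to pin an exact plugin version in the source spec, as the examples in this document do, and bump it only after reviewing the changelog; a minimal sketch:

```yaml
kind: source
spec:
  name: aws
  path: cloudquery/aws
  registry: cloudquery
  version: "v22.19.2"  # pinned; review the changelog before bumping
  tables: ["aws_ec2_instances"]  # explicit tables avoid surprises that "*" can pick up in minor versions
  destinations: ["postgresql"]
```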
Preview features #
Sometimes features or tables will be released and marked as `alpha`. This indicates that future minor versions might change, break or remove functionality. This enables the CloudQuery team to release functionality prior to it being fully stable so that the community can give feedback. Once a feature is released as Generally Available, all of the above rules for semantic versioning will apply.
Current Preview features #
The following features are currently in Preview:
- the `table_options` parameter in the AWS plugin spec
- All tables that are prefixed with `aws_alpha_`, including:
  - aws_alpha_cloudwatch_metrics
  - aws_alpha_cloudwatch_metric_statistics
  - aws_alpha_costexplorer_cost_custom