Official
Premium
Azure
The CloudQuery Azure source plugin extracts information from many of the services supported by Microsoft Azure and loads it into any supported CloudQuery destination.
Publisher: cloudquery
Latest version: v15.3.0
Type: Source
Platforms
Date Published
Price per 1M rows: Starting from $15
Monthly free quota: 1M rows
Set up process #
1. Download CLI and login
brew install cloudquery/tap/cloudquery
2. Create source and destination configs
Plugin configuration
Overview #
The CloudQuery Azure source plugin extracts information from many of the services supported by Microsoft Azure and loads it into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, Snowflake, and more).
Authentication #
The Azure plugin uses DefaultAzureCredential to authenticate. DefaultAzureCredential will attempt to authenticate via different mechanisms in order, stopping when one succeeds. The order is described in detail in the Azure SDK documentation.
For getting started quickly with the Azure plugin, we recommend using a service principal and exporting environment variables, or using az login. The latter is highly discouraged for production use, as it requires spawning a new Azure CLI process each time an authentication token is needed, which causes memory and performance issues.
Authentication with Environment Variables #
You will need to create a service principal for the plugin to use:
Creating a service principal
First, install the Azure CLI (az).
Then, log in with the Azure CLI:
az login
Then, create the service principal the plugin will use to access your cloud deployment. WARNING: The output of az ad sp create-for-rbac contains credentials that you must protect. Make sure to handle it with appropriate care.
This example uses bash. The commands for CMD and PowerShell are similar.
export SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
az account set --subscription $SUBSCRIPTION_ID
az provider register --namespace 'Microsoft.Security'
# Create a service-principal for the plugin
az ad sp create-for-rbac --name cloudquery-sp --scopes /subscriptions/$SUBSCRIPTION_ID --role Reader
You can choose any name you'd like for your service principal; cloudquery-sp is an example. If the service principal doesn't exist, the command creates a new one; otherwise it updates the existing one.
The output of az ad sp create-for-rbac should look like this:
{
  "appId": "YOUR AZURE_CLIENT_ID",
  "displayName": "cloudquery-sp",
  "password": "YOUR AZURE_CLIENT_SECRET",
  "tenant": "YOUR AZURE_TENANT_ID"
}
Exporting environment variables
Next, you need to export the environment variables that the plugin will use to sync your cloud configuration.
Copy them from the output of az ad sp create-for-rbac.
The example shows how to export environment variables on Linux; exporting for CMD and PowerShell is similar.
AZURE_TENANT_ID is tenant in the JSON.
AZURE_CLIENT_ID is appId in the JSON.
AZURE_CLIENT_SECRET is password in the JSON.
export AZURE_TENANT_ID=<YOUR AZURE_TENANT_ID>
export AZURE_CLIENT_ID=<YOUR AZURE_CLIENT_ID>
export AZURE_CLIENT_SECRET=<YOUR AZURE_CLIENT_SECRET>
export AZURE_SUBSCRIPTION_ID=$SUBSCRIPTION_ID
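Before starting a sync, it can help to confirm that the variables exported above are actually set in the current shell. The following is a minimal sketch, not part of CloudQuery: check_azure_env is a hypothetical helper name, and it assumes bash (it relies on indirect expansion).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CloudQuery or the Azure CLI):
# report which of the variables exported above are unset or empty.
check_azure_env() {
  local missing=0 name
  for name in AZURE_TENANT_ID AZURE_CLIENT_ID AZURE_CLIENT_SECRET AZURE_SUBSCRIPTION_ID; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Returns 0 when every variable is set, 1 otherwise.
  return "$missing"
}
```

For example, running check_azure_env && cloudquery sync azure.yml postgresql.yml starts the sync only when all four variables are present; the two file names are placeholders for your own source and destination configs.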
Authentication with az login #
First, install the Azure CLI (az). Then, log in with the Azure CLI:
az login
You are now authenticated!
Query Examples #
Find all MySQL servers #
SELECT * FROM azure_mysql_servers;
Find storage accounts that are allowing non-HTTPS traffic #
SELECT * FROM azure_storage_accounts WHERE enable_https_traffic_only = false;
Find all expired Key Vault keys #
SELECT * FROM azure_keyvault_vault_keys WHERE attributes_expires <= extract(epoch from now()) * 1000;
List the Memory and vCPUs of all available Azure Compute VM types #
SELECT
  distinct(vm.name),
  vcpus.capability_value AS "vCPUs",
  memory.capability_value AS "Memory"
FROM
  azure_compute_skus vm
  CROSS JOIN LATERAL (
    SELECT (caps ->> 'value') AS capability_value
    FROM jsonb_array_elements(vm.capabilities) caps
    WHERE (caps ->> 'name') = 'vCPUs'
  ) vcpus
  CROSS JOIN LATERAL (
    SELECT (caps ->> 'value') AS capability_value
    FROM jsonb_array_elements(vm.capabilities) caps
    WHERE (caps ->> 'name') = 'MemoryGB'
  ) memory
WHERE
  vm.resource_type = 'virtualMachines'
ORDER BY name;
Results:
+---------------------------+-------+--------+
| name                      | vCPUs | Memory |
|---------------------------+-------+--------|
| Basic_A0                  | 1     | 0.75   |
| Basic_A1                  | 1     | 1.75   |
| Basic_A2                  | 2     | 3.5    |
| Basic_A3                  | 4     | 7      |
| Basic_A4                  | 8     | 14     |
| Standard_A0               | 1     | 0.75   |
| Standard_A1               | 1     | 1.75   |
| Standard_A1_v2            | 1     | 2      |
... (truncated)
Configuration #
CloudQuery Azure Source Plugin Configuration Reference
Example #
This example connects a single Azure subscription to a single destination. The (top level) source spec section is described in the Source Spec Reference.
kind: source
spec:
  # Source spec section
  name: "azure"
  path: "cloudquery/azure"
  registry: "cloudquery"
  version: "v15.3.0"
  destinations: ["postgresql"]
  tables: ["azure_compute_virtual_machines"]
  # Learn more about the configuration options at https://cql.ink/azure_source
  spec:
    # Optional parameters
    # subscriptions: []
    # cloud_name: ""
    # concurrency: 50000
    # discovery_concurrency: 400
    # skip_subscriptions: []
    # normalize_ids: false
    # oidc_token: ""
    # retry_options:
    #   max_retries: 3
    #   try_timeout_seconds: 0
    #   retry_delay_seconds: 4
    #   max_retry_delay_seconds: 60
Azure Spec #
This is the (nested) spec used by the Azure source plugin.
subscriptions ([]string) (default: empty. Will use all visible subscriptions)
Specify which subscriptions to sync data from.

cloud_name (string) (default: empty)
The name of the cloud environment to use. Possible values are AzureCloud, AzureChinaCloud, AzureGovernment.

concurrency (integer) (default: 50000)
The best-effort maximum number of goroutines to use. Lower this number to reduce memory usage.

discovery_concurrency (integer) (default: 400)
During initialization, the Azure source plugin discovers all resource groups and enabled resource providers per subscription, to be used later during the sync process. The plugin runs the discovery process in parallel. This setting controls the maximum number of concurrent requests to the Azure API during discovery. Only accounts with many subscriptions should require modifying this setting, either lowering it to avoid network errors or raising it to speed up the discovery process.

scheduler (string) (default: dfs)
The scheduler to use when determining the priority of resources to sync. Supported values are dfs (depth-first search), round-robin, shuffle and shuffle-queue. For more information about this, see performance tuning.

skip_subscriptions ([]string) (default: empty)
A list of subscription IDs that CloudQuery will skip syncing. This is useful if CloudQuery is discovering the list of subscription IDs and there are some subscriptions that you do not want it to even attempt syncing.

normalize_ids (bool) (default: false)
Enabling this setting will force all id column values to be lowercase. This is useful to avoid case sensitivity and uniqueness issues around the id primary keys.

oidc_token (string) (default: empty)
An OIDC token can be used to authenticate with Azure instead of AZURE_CLIENT_SECRET. This is useful for Azure AD workload identity federation. When using this option, the AZURE_CLIENT_ID and AZURE_TENANT_ID environment variables must be set.

retry_options (RetryOptions) (default: empty)
Retry options to pass to the Azure Go SDK; see the retry_options section for details.
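To make the nesting of these settings concrete, here is a minimal sketch that writes a source config with a few of the optional parameters filled in. The file name azure.yml, the subscription ID, and every chosen value are illustrative placeholders rather than recommendations; only the option names and the top-level fields come from this page.

```shell
#!/usr/bin/env bash
# Write an example source config; the quoted 'EOF' keeps the YAML literal.
# All values are placeholders chosen for illustration only.
cat > azure.yml <<'EOF'
kind: source
spec:
  name: "azure"
  path: "cloudquery/azure"
  registry: "cloudquery"
  version: "v15.3.0"
  destinations: ["postgresql"]
  tables: ["azure_compute_virtual_machines"]
  spec:
    # Sync one subscription instead of every visible one (placeholder ID).
    subscriptions: ["00000000-0000-0000-0000-000000000000"]
    cloud_name: "AzureCloud"
    # Lowered from the default of 50000 to reduce memory usage.
    concurrency: 10000
    # Force all id column values to lowercase.
    normalize_ids: true
    retry_options:
      max_retries: 5
      retry_delay_seconds: 4
      max_retry_delay_seconds: 60
EOF
```

The plugin-specific options sit under the inner spec key, while name, path, version, destinations and tables belong to the outer source spec.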
retry_options #
max_retries (integer) (default: 10)
Described in the Azure Go SDK. The plugin overrides the Azure SDK default of 3 to 10.

try_timeout_seconds (integer) (default: 0)
Disabled by default. Described in the Azure Go SDK.

retry_delay_seconds (integer) (default: 4)
Described in the Azure Go SDK.

max_retry_delay_seconds (integer) (default: 60)
Described in the Azure Go SDK.

status_codes ([]integer) (default: null)
Described in the Azure Go SDK. The default of null uses the default status codes. An empty value disables retries for HTTP status codes.

Table options #
This feature allows users to override the default options for specific tables. The object structure begins with the table name at the root level. The next level represents the API service name, and the final level contains the input object as defined by the API.
The format of the table_options object is as follows:
table_options:
  <table_name>:
    <service_name>:
      - <input_object>
A list of <input_object> objects should be provided. The plugin will iterate through these to make multiple API calls.
For example:
table_options:
  azure_compute_virtual_machines:
    virtual_machines_options:
      - status_only: "true"
      - status_only: "false"
        expand: "instanceView"
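The fragment above does not show where table_options lives in a full config. The sketch below assumes it belongs in the nested plugin spec, next to the other plugin settings; that placement is an assumption, as is the file name azure_table_options.yml, so check it against the configuration reference before relying on it.

```shell
#!/usr/bin/env bash
# Write a source config that includes table_options.
# Assumption: table_options is a key of the nested (plugin) spec.
cat > azure_table_options.yml <<'EOF'
kind: source
spec:
  name: "azure"
  path: "cloudquery/azure"
  registry: "cloudquery"
  version: "v15.3.0"
  destinations: ["postgresql"]
  tables: ["azure_compute_virtual_machines"]
  spec:
    table_options:
      azure_compute_virtual_machines:
        virtual_machines_options:
          # Each list item is one input object, i.e. one API call.
          - status_only: "true"
          - status_only: "false"
            expand: "instanceView"
EOF
```

With two input objects in the list, the plugin would make two calls for this table, one per object.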
The field names follow the same naming conventions as the Azure API, but are converted to snake case. For example, StatusOnly is represented as status_only.
The following tables and APIs are supported:
table_options:
  azure_compute_virtual_machines:
    virtual_machines_options:
      - <VirtualMachines.ListAllOptions>
The full list of supported options is documented under the Table Options section of each table in the Azure plugin tables documentation.
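When translating an Azure API option name into its table_options key, the snake-case rule mentioned earlier can be approximated in the shell. This is a rough sketch for simple names such as StatusOnly; to_snake_case is a hypothetical helper, and the plugin's own conversion may treat acronyms or digits differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate PascalCase -> snake_case for simple names.
to_snake_case() {
  # Insert "_" between a lowercase letter or digit and the uppercase
  # letter that follows it, then lowercase the whole string.
  printf '%s\n' "$1" | sed -E 's/([a-z0-9])([A-Z])/\1_\2/g' | tr '[:upper:]' '[:lower:]'
}

to_snake_case StatusOnly   # prints: status_only
```

Always confirm the resulting key against the Table Options section of the relevant table, since that list is the authoritative source for supported names.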