
Azure

The CloudQuery Azure source plugin extracts information from many services supported by Microsoft Azure and loads it into any supported CloudQuery destination.

Publisher: cloudquery
Latest version: v15.3.0
Type: Source
Price per 1M rows: Starting from $15
Monthly free quota: 1M rows

Set up process #


1. Download the CLI and log in (see installation options):

brew install cloudquery/tap/cloudquery

2. Create source and destination configs (see Plugin configuration).

3. Run the sync (see CloudQuery sync):

cloudquery sync azure.yml postgresql.yml

Overview #

The CloudQuery Azure source plugin extracts information from many services supported by Microsoft Azure and loads it into any supported CloudQuery destination (e.g. PostgreSQL, BigQuery, Snowflake, and more).

Authentication #

The Azure plugin uses DefaultAzureCredential to authenticate.
DefaultAzureCredential will attempt to authenticate via different mechanisms in order, stopping when one succeeds. The order is described in detail in the Azure SDK documentation.
To get started quickly with the Azure plugin, we recommend either using a service principal and exporting environment variables, or using az login. Using az login is strongly discouraged in production because it spawns a new Azure CLI process each time an authentication token is needed, which causes memory and performance issues.
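The "try each mechanism in order, stop at the first success" behaviour can be sketched in a few lines of Python. This is a conceptual illustration only, not the Azure SDK: the provider functions, their names, and the placeholder tokens are hypothetical, and the real chain contains more mechanisms than the two shown here.

```python
from typing import Callable, Optional, Sequence, Tuple

# A hypothetical provider: a name plus a function returning a token, or None on failure.
Provider = Tuple[str, Callable[[], Optional[str]]]


def first_successful(providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order and return (name, token) for the first that succeeds."""
    for name, get_token in providers:
        token = get_token()
        if token is not None:
            return name, token  # stop here; later providers are never called
    raise RuntimeError("no credential provider could authenticate")


def from_environment(env: dict) -> Optional[str]:
    """Succeeds only when all three service principal variables are set."""
    required = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
    if all(env.get(key) for key in required):
        return "token-from-service-principal"  # placeholder, not a real token
    return None


def from_cli(logged_in: bool) -> Optional[str]:
    """Stands in for 'az login'; succeeds when a CLI session exists."""
    return "token-from-az-login" if logged_in else None  # placeholder
```

With the three environment variables set, the environment provider wins and the CLI provider is never consulted; with none set, the chain falls through to the CLI session.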

Authentication with Environment Variables #

You will need to create a service principal for the plugin to use:
Creating a service principal
First, install the Azure CLI (az).
Then, log in with the Azure CLI:
az login
Then, create the service principal the plugin will use to access your cloud deployment. WARNING: The output of az ad sp create-for-rbac contains credentials that you must protect, so handle it with appropriate care. This example uses bash; the commands for CMD and PowerShell are similar.
export SUBSCRIPTION_ID=<YOUR_SUBSCRIPTION_ID>
az account set --subscription $SUBSCRIPTION_ID
az provider register --namespace 'Microsoft.Security'

# Create a service-principal for the plugin
az ad sp create-for-rbac --name cloudquery-sp --scopes /subscriptions/$SUBSCRIPTION_ID --role Reader
You can choose any name you'd like for your service principal; cloudquery-sp is an example. If the service principal doesn't exist, the command creates a new one; otherwise it updates the existing one.
The output of az ad sp create-for-rbac should look like this:
{
  "appId": "YOUR AZURE_CLIENT_ID",
  "displayName": "cloudquery-sp",
  "password": "YOUR AZURE_CLIENT_SECRET",
  "tenant": "YOUR AZURE_TENANT_ID"
}
Exporting environment variables
Next, you need to export the environment variables that the plugin will use to sync your cloud configuration. Copy them from the output of az ad sp create-for-rbac. The example shows how to export environment variables for Linux - exporting for CMD and PowerShell is similar.
  • AZURE_TENANT_ID is tenant in the JSON.
  • AZURE_CLIENT_ID is appId in the JSON.
  • AZURE_CLIENT_SECRET is password in the JSON.
export AZURE_TENANT_ID=<YOUR AZURE_TENANT_ID>
export AZURE_CLIENT_ID=<YOUR AZURE_CLIENT_ID>
export AZURE_CLIENT_SECRET=<YOUR AZURE_CLIENT_SECRET>
export AZURE_SUBSCRIPTION_ID=$SUBSCRIPTION_ID
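The mapping from the JSON fields to the environment variables can also be scripted. The following is a minimal sketch, assuming the JSON has the shape shown above; the helper name export_lines and the sample values are hypothetical. It prints secrets, so treat its output with the same care as the original command output.

```python
import json
import shlex
from typing import List

# Environment variable name -> key in the az ad sp create-for-rbac output
FIELD_MAP = {
    "AZURE_TENANT_ID": "tenant",
    "AZURE_CLIENT_ID": "appId",
    "AZURE_CLIENT_SECRET": "password",
}


def export_lines(sp_json: str, subscription_id: str) -> List[str]:
    """Build shell 'export' lines from the service principal JSON (hypothetical helper)."""
    sp = json.loads(sp_json)
    # shlex.quote keeps values with spaces or shell metacharacters safe to paste.
    lines = [f"export {var}={shlex.quote(sp[key])}" for var, key in FIELD_MAP.items()]
    lines.append(f"export AZURE_SUBSCRIPTION_ID={shlex.quote(subscription_id)}")
    return lines


if __name__ == "__main__":
    # Made-up sample values, not real credentials.
    sample = json.dumps({
        "appId": "example-client-id",
        "displayName": "cloudquery-sp",
        "password": "example-secret",
        "tenant": "example-tenant-id",
    })
    print("\n".join(export_lines(sample, "example-subscription-id")))
```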

Authentication with az login #

First, install the Azure CLI (az). Then, log in with the Azure CLI:
az login
You are now authenticated!

Query Examples #

Find all MySQL servers #

SELECT * FROM azure_mysql_servers;

Find storage accounts that are allowing non-HTTPS traffic #

SELECT * FROM azure_storage_accounts WHERE enable_https_traffic_only = false;

Find all expired Key Vault keys #

SELECT * FROM azure_keyvault_vault_keys WHERE attributes_expires <= extract(epoch from now()) * 1000;

List the Memory and vCPUs of all available Azure Compute VM types #

SELECT DISTINCT
 vm.name,
 vcpus.capability_value AS "vCPUs",
 memory.capability_value AS "Memory"
FROM
 azure_compute_skus vm
 CROSS JOIN LATERAL (
   SELECT (caps ->> 'value') AS capability_value
   FROM jsonb_array_elements(vm.capabilities) caps
   WHERE (caps ->> 'name') = 'vCPUs'
 ) vcpus
 CROSS JOIN LATERAL (
   SELECT (caps ->> 'value') AS capability_value
   FROM jsonb_array_elements(vm.capabilities) caps
   WHERE (caps ->> 'name') = 'MemoryGB'
 ) memory
WHERE
 vm.resource_type = 'virtualMachines'
ORDER BY vm.name;
Results:
+---------------------------+-------+--------+
| name                      | vCPUs | Memory |
|---------------------------+-------+--------|
| Basic_A0                  | 1     | 0.75   |
| Basic_A1                  | 1     | 1.75   |
| Basic_A2                  | 2     | 3.5    |
| Basic_A3                  | 4     | 7      |
| Basic_A4                  | 8     | 14     |
| Standard_A0               | 1     | 0.75   |
| Standard_A1               | 1     | 1.75   |
| Standard_A1_v2            | 1     | 2      |
... (truncated)
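To make the lateral joins easier to follow, here is a rough Python equivalent of what they do for a single row: scan the capabilities array for the entry with a given name and return its value. It assumes each capability is an object with name and value keys, as the query implies; the sample row reuses the Standard_A1_v2 values from the results above, and the helper name capability is hypothetical.

```python
from typing import Dict, List, Optional


def capability(capabilities: List[Dict[str, str]], name: str) -> Optional[str]:
    """Return the value of the first capability with the given name, or None."""
    for cap in capabilities:
        if cap.get("name") == name:
            return cap.get("value")
    return None


# One row shaped the way the query assumes azure_compute_skus rows look.
sku = {
    "name": "Standard_A1_v2",
    "resource_type": "virtualMachines",
    "capabilities": [
        {"name": "vCPUs", "value": "1"},
        {"name": "MemoryGB", "value": "2"},
    ],
}

row = (
    sku["name"],
    capability(sku["capabilities"], "vCPUs"),
    capability(sku["capabilities"], "MemoryGB"),
)
print(row)
```

One difference worth noting: the SQL drops a row entirely when either capability is missing, because a CROSS JOIN LATERAL against an empty subquery yields no rows, whereas this sketch returns None for the missing value.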


Configuration #

CloudQuery Azure Source Plugin Configuration Reference

Example #

This example connects a single Azure subscription to a single destination. The (top level) source spec section is described in the Source Spec Reference.
kind: source
spec:
  # Source spec section
  name: "azure"
  path: "cloudquery/azure"
  registry: "cloudquery"
  version: "v15.3.0"
  destinations: ["postgresql"]
  tables: ["azure_compute_virtual_machines"]
  # Learn more about the configuration options at https://cql.ink/azure_source
  spec:
    # Optional parameters
    # subscriptions: []
    # cloud_name: ""
    # concurrency: 50000
    # discovery_concurrency: 400
    # skip_subscriptions: []
    # normalize_ids: false
    # oidc_token: ""
    # retry_options:
    #   max_retries: 3
    #   try_timeout_seconds: 0
    #   retry_delay_seconds: 4
    #   max_retry_delay_seconds: 60

Azure Spec #

This is the (nested) spec used by the Azure source plugin.
  • subscriptions ([]string) (default: empty, which uses all visible subscriptions)
    Specify which subscriptions to sync data from.
  • cloud_name (string) (default: empty)
    The name of the cloud environment to use. Possible values are AzureCloud, AzureChinaCloud, AzureGovernment.
  • concurrency (integer) (default: 50000)
    The best-effort maximum number of goroutines to use. Lower this number to reduce memory usage.
  • discovery_concurrency (integer) (default: 400)
    During initialization, the Azure source plugin discovers all resource groups and enabled resource providers per subscription, for use later in the sync process. The plugin runs this discovery in parallel, and this setting controls the maximum number of concurrent requests to the Azure API during discovery. Only accounts with many subscriptions should need to change it, either lowering it to avoid network errors or raising it to speed up discovery.
  • scheduler (string) (default: dfs)
    The scheduler to use when determining the priority of resources to sync. Supported values are dfs (depth-first search), round-robin, shuffle and shuffle-queue.
    For more information about this, see performance tuning.
  • skip_subscriptions ([]string) (default: empty)
    A list of subscription IDs that CloudQuery will skip syncing. This is useful when CloudQuery discovers the list of subscription IDs and there are some subscriptions you do not want it to attempt to sync at all.
  • normalize_ids (bool) (default: false)
    Enabling this setting forces all id column values to be lowercase. This is useful to avoid case sensitivity and uniqueness issues around the id primary keys.
  • oidc_token (string) (default: empty)
    An OIDC token can be used to authenticate with Azure instead of AZURE_CLIENT_SECRET. This is useful for Azure AD workload identity federation. When using this option, the AZURE_CLIENT_ID and AZURE_TENANT_ID environment variables must be set.
  • retry_options (RetryOptions) (default: empty)
    Retry options to pass to the Azure Go SDK. See the retry_options section for details.
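The interaction between subscriptions, skip_subscriptions and normalize_ids can be pictured with a short sketch. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, and it assumes that the skip list is applied to whichever set of subscriptions is in play and that IDs are compared exactly.

```python
from typing import Iterable, List


def effective_subscriptions(
    configured: Iterable[str],
    discovered: Iterable[str],
    skipped: Iterable[str],
) -> List[str]:
    """Pick the subscriptions to sync (hypothetical helper).

    An empty 'configured' list means "use everything that was discovered".
    Assumption: the skip list is applied in both cases.
    """
    configured = list(configured)
    candidates = configured if configured else list(discovered)
    skip = set(skipped)
    return [sub for sub in candidates if sub not in skip]


def normalize_id(resource_id: str, normalize_ids: bool) -> str:
    """Lowercase an id value when normalize_ids is enabled."""
    return resource_id.lower() if normalize_ids else resource_id


if __name__ == "__main__":
    # Made-up subscription IDs for illustration.
    print(effective_subscriptions([], ["sub-a", "sub-b", "sub-c"], ["sub-b"]))
    print(normalize_id("/subscriptions/ABC/resourceGroups/My-RG", True))
```

The second call shows why normalize_ids helps with uniqueness: two ids that differ only in letter case collapse to the same lowercase value.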

retry_options #

  • max_retries (integer) (default: 10)
    Described in the Azure Go SDK. The plugin raises the Azure SDK default of 3 to 10.
  • try_timeout_seconds (integer) (default: 0)
    Disabled by default. Described in the Azure Go SDK.
  • retry_delay_seconds (integer) (default: 4)
    Described in the Azure Go SDK.
  • max_retry_delay_seconds (integer) (default: 60)
    Described in the Azure Go SDK.
  • status_codes ([]integer) (default: null)
    Described in the Azure Go SDK. The default of null uses the SDK's default status codes. An empty value disables retries for HTTP status codes.
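As a rough picture of how these options interact, the sketch below computes a capped exponential backoff schedule and applies the null versus empty rule for status_codes. It is a simplification and not the Azure Go SDK's exact algorithm (which, for example, can also honour a Retry-After header). The function names are hypothetical, and the SDK default status codes are passed in by the caller rather than asserted here.

```python
from typing import Collection, List, Optional


def retry_delays(
    max_retries: int = 10,
    retry_delay_seconds: float = 4.0,
    max_retry_delay_seconds: float = 60.0,
) -> List[float]:
    """Delay before each retry: doubles each time, capped at the maximum."""
    delays = []
    for attempt in range(max_retries):
        delay = retry_delay_seconds * (2 ** attempt)
        delays.append(min(delay, max_retry_delay_seconds))
    return delays


def retryable(
    status: int,
    status_codes: Optional[Collection[int]],
    sdk_defaults: Collection[int],
) -> bool:
    """None (null) falls back to the SDK defaults; an empty list retries no status code."""
    codes = sdk_defaults if status_codes is None else status_codes
    return status in codes


if __name__ == "__main__":
    print(retry_delays(max_retries=5))
```

With the defaults listed above (4 second initial delay, 60 second cap), the schedule under this simplified model reaches the cap on the fifth retry and stays there.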


Table options #

This feature allows users to override the default options for specific tables. The object structure begins with the table name at the root level. The next level represents the API service name, and the final level contains the input object as defined by the API.
The format of the table_options object is as follows:
table_options:
  <table_name>:
    <service_name>:
      - <input_object>
A list of <input_object> objects should be provided. The plugin will iterate through these to make multiple API calls.
For example,
table_options:
  azure_compute_virtual_machines:
    virtual_machines_options:
      - status_only: "true"
      - status_only: "false"
        expand: "instanceView"
The field names follow the same naming conventions as the Azure API, but are converted to snake case. For example StatusOnly is represented as status_only.
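Both ideas, the snake case naming and the "one API call per input object" iteration, can be sketched as follows. This is an illustration, not the plugin's code: the helper names are hypothetical, the regex-based conversion is naive (it would split acronyms letter by letter), and the calls are only listed, not executed against Azure.

```python
import re
from typing import Any, Dict, List, Tuple


def to_snake_case(name: str) -> str:
    """Naive PascalCase -> snake_case: insert '_' before each inner capital."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def planned_calls(table_options: Dict[str, Dict[str, List[Dict[str, Any]]]]) -> List[Tuple[str, str, Dict[str, Any]]]:
    """Flatten table_options into one (table, service, input_object) entry per API call."""
    calls = []
    for table, services in table_options.items():
        for service, input_objects in services.items():
            for input_object in input_objects:
                calls.append((table, service, input_object))
    return calls


# The example configuration from above, written as a Python dict.
example = {
    "azure_compute_virtual_machines": {
        "virtual_machines_options": [
            {"status_only": "true"},
            {"status_only": "false", "expand": "instanceView"},
        ]
    }
}

if __name__ == "__main__":
    print(to_snake_case("StatusOnly"))
    for call in planned_calls(example):
        print(call)
```

For the example configuration this yields two planned calls for azure_compute_virtual_machines, one per entry in the list.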
The following tables and APIs are supported:
table_options:
  azure_compute_virtual_machines:
    virtual_machines_options:
      - <VirtualMachines.ListAllOptions>
The full list of supported options is documented under the Table Options section of each table in the Azure plugin tables documentation.


© 2024 CloudQuery, Inc. All rights reserved.
