Google BigQuery

Tree Schema integrates with BigQuery to extract metadata from your tables, collect sample values for your fields, and sync your schema descriptions from Tree Schema back to BigQuery.

Connecting to BigQuery

The following fields are required for BigQuery:

  • Project ID

  • JSON Key File: paste the contents of the JSON key file for your account or service account into this field

For details on how to create a service account, see the GCP Documentation.

../../_images/bigquery_connection.png
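Before pasting a key, it can help to confirm that the file is a complete service account key. The following is a minimal Python sketch, assuming a service account key; user account credentials use a different layout and are not handled here. `check_key_file` is a hypothetical helper, not part of Tree Schema, and the fields it checks are ones found in standard Google service account key files.

```python
import json

# Fields present in a standard Google service account key file.
# This check is an illustrative assumption, not Tree Schema's own validation.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(key_json: str) -> list:
    """Return a list of problems found in a pasted JSON key (empty if none)."""
    try:
        key = json.loads(key_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["key file must be a JSON object"]

    # Report every expected field that is absent, in a stable order.
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if name not in key]

    # This sketch only understands service account keys.
    if "type" in key and key["type"] != "service_account":
        problems.append(f"unexpected key type: {key['type']}")
    return problems


if __name__ == "__main__":
    print(check_key_file('{"type": "service_account"}'))
```

An empty list means the pasted text at least has the shape of a service account key; it does not prove the key is valid or has the permissions listed below.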

Note

Tree Schema connects to BigQuery directly over HTTPS using the API provided by Google, and it cannot connect through a jump server.


IAM Permissions

The following permissions are required for Tree Schema to integrate with BigQuery.

  • bigquery.datasets.get: allows access to list and retrieve metadata about BigQuery datasets

  • bigquery.tables.list: allows listing all of the tables within a dataset

  • bigquery.tables.get: allows access to retrieve metadata about BigQuery tables

In addition, these permissions are optional:

  • bigquery.tables.getData: Required to extract sample values from each field. Capturing sample values can also be turned off at the organization level.

  • bigquery.tables.update: Required to sync table descriptions made in Tree Schema back to BigQuery. Syncing descriptions to BigQuery can also be turned off at the data store level.
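The split between required and optional permissions can be expressed as a small check. This Python sketch is illustrative only: `review_permissions` and the feature labels are hypothetical, and it assumes you already have the set of permission strings granted to the account (for example, copied from the role definition).

```python
# Permission names come from the lists above; the feature labels are
# illustrative and not part of any Tree Schema API.
REQUIRED = {
    "bigquery.datasets.get",
    "bigquery.tables.list",
    "bigquery.tables.get",
}
OPTIONAL = {
    "bigquery.tables.getData": "sample values",
    "bigquery.tables.update": "description sync",
}


def review_permissions(granted: set) -> dict:
    """Summarize what an account with the given permissions can do."""
    missing = sorted(REQUIRED - granted)
    enabled = sorted(feature for perm, feature in OPTIONAL.items() if perm in granted)
    disabled = sorted(feature for perm, feature in OPTIONAL.items() if perm not in granted)
    return {
        "can_connect": not missing,  # all required permissions are present
        "missing_required": missing,
        "enabled": enabled,
        "disabled": disabled,
    }
```

For example, an account holding the three required permissions plus `bigquery.tables.getData` can connect and capture sample values, but description sync stays disabled.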

As an example, the following role can be created in Google Cloud IAM to grant all of the required access:

../../_images/bigquery_tree_schema_role.png
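If you prefer the command line to the console, a custom role with the same permissions can be created with `gcloud iam roles create`. The sketch below only builds the command string; the role ID `treeSchemaIntegration` and the title are hypothetical placeholders, and you would run the resulting command yourself with an account that is allowed to create roles in the project.

```python
import shlex

REQUIRED_PERMISSIONS = [
    "bigquery.datasets.get",
    "bigquery.tables.list",
    "bigquery.tables.get",
]
OPTIONAL_PERMISSIONS = [
    "bigquery.tables.getData",  # sample values
    "bigquery.tables.update",   # description sync
]


def role_create_command(
    project_id: str,
    role_id: str = "treeSchemaIntegration",  # hypothetical role ID
    include_optional: bool = True,
) -> str:
    """Build (but do not run) a gcloud command that creates a custom role."""
    permissions = REQUIRED_PERMISSIONS + (OPTIONAL_PERMISSIONS if include_optional else [])
    args = [
        "gcloud", "iam", "roles", "create", role_id,
        f"--project={project_id}",
        "--title=Tree Schema Integration",  # hypothetical title
        f"--permissions={','.join(permissions)}",
        "--stage=GA",
    ]
    # shlex.join quotes arguments that contain spaces, such as the title.
    return shlex.join(args)


if __name__ == "__main__":
    print(role_create_command("my-project"))
```

Passing `include_optional=False` produces a role that can read metadata but cannot capture sample values or sync descriptions.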

BigQuery Limitations

Tree Schema uses only the Google APIs, both to read your metadata and to collect sample values from your data. It does not run any queries against BigQuery, because queries are billed by the amount of data scanned, and this includes queries against INFORMATION_SCHEMA. Unfortunately, Google does not provide access to views through the API, so metadata about views is not captured in Tree Schema.
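To illustrate reading sample values without running a query: the BigQuery REST API's `tabledata.list` method returns rows from a table directly, without creating a query job, and requires the `bigquery.tables.getData` permission listed above. This sketch only builds the request URL; `sample_rows_url` is a hypothetical helper, and it is an assumption that this is comparable to what Tree Schema does, not a description of its internals. Authenticating the request (an OAuth access token derived from the key file) is left out.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://bigquery.googleapis.com/bigquery/v2"


def sample_rows_url(project_id: str, dataset_id: str, table_id: str, max_results: int = 10) -> str:
    """Build a tabledata.list URL that returns up to max_results rows."""
    # Percent-encode each path segment (e.g. ':' in domain-scoped project IDs).
    path = "/".join([
        "projects", quote(project_id, safe=""),
        "datasets", quote(dataset_id, safe=""),
        "tables", quote(table_id, safe=""),
        "data",
    ])
    return f"{API_ROOT}/{path}?{urlencode({'maxResults': max_results})}"


if __name__ == "__main__":
    print(sample_rows_url("my-project", "sales", "orders", 5))
```

Capping `maxResults` keeps each request small, which fits the goal of collecting a handful of sample values per field rather than reading whole tables.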


Synchronizing Metadata

Tree Schema checks that your metadata is kept in sync between BigQuery and Tree Schema. This includes validating that the description of each table in BigQuery matches the description of the corresponding schema in Tree Schema. When your data is in sync, you will see the successful synchronization icon on the README:

../../_images/readme_success_sync.png

If Tree Schema does not have the appropriate permissions to update the descriptions in BigQuery, you will see an error similar to the following:

../../_images/bigquery_update_no_permission.png
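The description check can be pictured as a comparison between the two systems. The following Python sketch assumes descriptions keyed by a `dataset.table` string; `plan_description_sync` is a hypothetical helper, and ignoring leading and trailing whitespace is an illustrative choice rather than Tree Schema's documented behavior. Each resulting body has the shape accepted by the BigQuery `tables.patch` method, which needs `bigquery.tables.update`; without that permission the patch fails with an error like the one above.

```python
from typing import Dict, Optional


def plan_description_sync(
    tree_schema: Dict[str, str],
    bigquery: Dict[str, Optional[str]],
) -> Dict[str, dict]:
    """Return tables.patch request bodies for tables whose descriptions differ."""
    patches = {}
    for table_id, wanted in tree_schema.items():
        if table_id not in bigquery:
            continue  # table not found in BigQuery; nothing to patch
        current = bigquery[table_id] or ""  # BigQuery tables may have no description
        if current.strip() != wanted.strip():
            # Only the description field is sent, leaving other table properties untouched.
            patches[table_id] = {"description": wanted}
    return patches


if __name__ == "__main__":
    print(plan_description_sync(
        {"sales.orders": "One row per order"},
        {"sales.orders": None},
    ))
```

An empty result corresponds to the in-sync state; any entry represents a table whose BigQuery description would be overwritten with the one from Tree Schema.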