dbt

dbt & Tree Schema Overview

Tree Schema can process your dbt manifest.json output to ingest the metadata that dbt creates. The following information can be extracted from the manifest.json file:

  • Schemas and fields: this includes the names, structure and types of schemas and fields

  • Descriptions: available for both schemas and fields

  • Tags: available for both schemas and fields

  • Data Lineage: available only for schemas

Note

All field-level information must be defined in the dbt schema.yml file.
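To make the list above concrete, here is a minimal sketch of reading those four kinds of metadata out of a parsed manifest. It assumes the layout used by recent dbt manifest versions (a top-level `nodes` mapping whose entries carry `resource_type`, `description`, `tags`, `columns` and `depends_on`). The function name `summarize_manifest` and the sample data are hypothetical, and this is not necessarily how Tree Schema parses the file.

```python
from typing import Any


def summarize_manifest(manifest: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect schema-level and field-level metadata for each dbt model."""
    summaries = []
    for unique_id, node in manifest.get("nodes", {}).items():
        # Only models are treated as schemas in this sketch.
        if node.get("resource_type") != "model":
            continue
        columns = node.get("columns", {})
        summaries.append(
            {
                "unique_id": unique_id,
                "name": node.get("name"),
                "description": node.get("description", ""),
                "tags": list(node.get("tags", [])),
                # Field details are present only if declared in schema.yml.
                "fields": [
                    {
                        "name": col.get("name", col_name),
                        "type": col.get("data_type"),
                        "description": col.get("description", ""),
                        "tags": list(col.get("tags", [])),
                    }
                    for col_name, col in columns.items()
                ],
                # Schema-level lineage: the nodes this model depends on.
                "upstream": list(node.get("depends_on", {}).get("nodes", [])),
            }
        )
    return summaries


# Hypothetical, heavily trimmed manifest used only for illustration.
example_manifest = {
    "nodes": {
        "model.shop.customer_facts": {
            "resource_type": "model",
            "name": "customer_facts",
            "description": "One row per customer",
            "tags": ["core"],
            "columns": {
                "customer_id": {
                    "name": "customer_id",
                    "data_type": "integer",
                    "description": "Primary key",
                    "tags": ["pii"],
                }
            },
            "depends_on": {"nodes": ["model.shop.customers"]},
        },
        "test.shop.not_null_customer_id": {"resource_type": "test"},
    }
}

example_summary = summarize_manifest(example_manifest)
```

In practice the dictionary would come from `json.load` on the manifest.json file that dbt writes to its target directory.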

Uploading the Manifest to Tree Schema

The dbt manifest.json can be uploaded to Tree Schema through the GUI or via the API.

Upload via the GUI

To upload a manifest file via the GUI, the data store must be eligible for dbt. Eligibility is determined by dbt, and Tree Schema simply reflects it.

If a data store is eligible for dbt then the dbt tab will be displayed. Within this tab, there are two views:

  1. Upload a new manifest file

  2. View manifest file history

../_images/dbt_eligible_tab.png

To upload a new file, select the Upload dbt Manifest button and choose your file. Once you have selected a file, it will be parsed. This generally takes only a few moments, and once parsing completes you will see the summary screen:

../_images/dbt_manifest_summary.png

Once the file is parsed the output is shown for you to review.

At the very top is the Submit button. Once you select it, your dbt manifest will be saved to Tree Schema using the options specified below. There is also a link back to this page in case you need a refresher on what these items mean.

../_images/dbt_manifest_summary_submit.png

In the second section are the processing options. There are 4 total options:

../_images/dbt_manifest_summary_options.png

  1. Add Schemas & Fields: Whether or not to add the schemas and fields within the manifest file to Tree Schema. We highly suggest that you do not use this option unless you are not connecting Tree Schema directly to your database. It is better practice to connect Tree Schema to the same database that your dbt processes run on and allow Tree Schema to extract the schemas and fields before you upload a dbt manifest. This is because Tree Schema is able to extract the case-sensitive, standardized names from the database, whereas the fields and schemas in your manifest file may have manually defined values that would make them incompatible with future automation within Tree Schema.

  2. Update Descriptions: Whether or not to update descriptions for all of the schemas and fields found in your manifest file. We suggest only doing this if you have existing dbt documentation and want to bootstrap your Tree Schema documentation. If you have previously added or updated documentation within Tree Schema and you use this option to update descriptions from a dbt manifest, it will overwrite that documentation in Tree Schema.

  3. Add Tags: Whether or not to add tags to the schema and field assets that have been tagged in the manifest file. Note: if you have removed a tag from Tree Schema and the same tag is then added via the dbt manifest, it will be re-added.

  4. Add Lineage: Whether or not to add data lineage to Tree Schema from the manifest file. This will only add data lineage and will not deprecate any data lineage that already exists but is no longer found in the manifest file. See the [data lineage details](#) below for more information.
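The four options differ in whether they overwrite or only add. The toy model below illustrates that difference with plain in-memory sets and dictionaries. It is a sketch under stated assumptions, not Tree Schema's implementation or API: `ManifestOptions`, `apply_manifest` and the catalog layout are all hypothetical names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ManifestOptions:
    """Hypothetical container mirroring the four processing options."""

    add_schemas_and_fields: bool
    update_descriptions: bool
    add_tags: bool
    add_lineage: bool


def apply_manifest(catalog: dict, manifest: dict, options: ManifestOptions) -> dict:
    """Apply manifest metadata to a toy catalog, modifying it in place."""
    if options.add_schemas_and_fields:
        catalog["schemas"] |= manifest["schemas"]
    if options.update_descriptions:
        # Overwrites: existing descriptions are replaced by the dbt ones.
        catalog["descriptions"].update(manifest["descriptions"])
    if options.add_tags:
        # Additive: tags are only added, so a removed tag comes back.
        for asset, tags in manifest["tags"].items():
            catalog["tags"].setdefault(asset, set()).update(tags)
    if options.add_lineage:
        # Additive: lineage missing from the manifest is not deprecated.
        catalog["lineage"] |= manifest["lineage"]
    return catalog


# Hypothetical data for illustration only.
catalog = {
    "schemas": {"customers"},
    "descriptions": {"customers": "Edited in Tree Schema"},
    "tags": {"customers": {"pii"}},
    "lineage": {("raw_customers", "customers")},
}
manifest = {
    "schemas": {"customers", "orders"},
    "descriptions": {"customers": "From dbt"},
    "tags": {"customers": {"core"}},
    "lineage": {("customers", "orders")},
}
options = ManifestOptions(
    add_schemas_and_fields=False,
    update_descriptions=True,
    add_tags=True,
    add_lineage=True,
)
result = apply_manifest(catalog, manifest, options)
```

With these settings the description written in Tree Schema is replaced by the dbt description, while the existing tag and the existing lineage pair are both kept alongside the new ones.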

The final section displays the schemas and the lineage found in the manifest. The schema table lists each schema's name, its type and whether or not it already exists within Tree Schema. The lineage table contains the source and target schemas found in the manifest.

../_images/dbt_manifest_summary_assets.png

Upload via the API / Python Client

The same functions described above can be executed via the REST API or Python Client. Links to the corresponding functions can be found here:

Viewing Historical Manifest Processes

You can view the history of the processes that Tree Schema has executed to save your manifest files. For each process, the manifest file, the timestamp at which it was processed and the processing options described above are displayed:

../_images/dbt_historical_processes.png

dbt Data Lineage in Tree Schema

The data lineage created by dbt captures lineage at the schema level. In Tree Schema, however, all data lineage is captured at the field level. When Tree Schema creates data lineage from the dbt manifest, it attempts to determine the lineage between fields for the given schemas, but the result is unlikely to be exhaustive. There may also be times when Tree Schema is not able to infer the lineage for any fields between two schemas.

Tree Schema uses the following rules, in this order, to create field-level data lineage:

  1. Create a link between all fields that have the same name in both the source and target schemas (excluding the field “id”)

  2. Create a link between fields that share a table name & foreign ID. Tree Schema will attempt to match a given table name (e.g. customers) to an ID in another field (e.g. cust_id, customer_id, etc.)

  3. If there are no matches for the first 2 rules and both schemas have an “id” field, Tree Schema will create the link between the two ID fields and generate a governance action for someone to review and fix

  4. If there are no matches for the first 3 rules and at least one of the schemas does not have an “id” field, Tree Schema will link the first two fields (sorted alphabetically) and generate a governance action for someone to review and fix

The governance actions generated will be captured in the transformation where the data lineage is created. The governance actions will look like this:

../_images/dbt_lineage_no_match_gov_action.png
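The four rules above can be sketched as a single function. This is only an approximation under stated assumptions, not Tree Schema's actual matching logic: rule 2 is interpreted here as linking a table's "id" field to a field in the other schema named `<prefix>_id`, where the prefix is at least three characters long and is a prefix of the table name, and rule 4 is interpreted as linking the alphabetically first field of each schema. The names `infer_field_links` and `_foreign_id_links` are hypothetical.

```python
def _foreign_id_links(source_name, source_fields, target_name, target_fields):
    """Rule 2 (approximate): pair a table's "id" with a "<table prefix>_id" field."""

    def matches(table, field):
        # "cust_id" and "customer_id" both match the table "customers".
        prefix = field[:-3] if field.endswith("_id") else ""
        return len(prefix) >= 3 and table.lower().startswith(prefix.lower())

    links = []
    if "id" in source_fields:
        links += [("id", f) for f in sorted(target_fields) if matches(source_name, f)]
    if "id" in target_fields:
        links += [(f, "id") for f in sorted(source_fields) if matches(target_name, f)]
    return links


def infer_field_links(source_name, source_fields, target_name, target_fields):
    """Return (links, needs_review) for one source/target schema pair.

    links is a list of (source_field, target_field) tuples; needs_review is
    True when a fallback rule was used and a governance action would be raised.
    """
    src, tgt = set(source_fields), set(target_fields)

    # Rule 1: same field name in both schemas, excluding "id".
    links = [(f, f) for f in sorted(src & tgt) if f != "id"]
    # Rule 2: table name matched to a foreign ID field.
    links += _foreign_id_links(source_name, src, target_name, tgt)
    if links:
        return links, False

    # Rule 3: nothing matched, but both schemas have an "id" field.
    if "id" in src and "id" in tgt:
        return [("id", "id")], True

    # Rule 4: link the alphabetically first field of each schema.
    if src and tgt:
        return [(sorted(src)[0], sorted(tgt)[0])], True
    return [], True
```

For example, calling `infer_field_links("customers", ["id", "name", "email"], "orders", ["id", "customer_id", "name"])` links `name` to `name` under rule 1 and `id` to `customer_id` under rule 2, with no review required.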

Automated Transformation Creation

When Tree Schema creates data lineage from a dbt manifest file, it creates one Transformation for each target schema. All of the data flows into that schema are captured as part of the transformation. In this example, two upstream tables lead into one target:

../_images/dbt_lineage_transform_two_targets.png

And in this example there is only one upstream table:

../_images/dbt_lineage_transform_one_target.png

Transformations are created with the following naming convention:

  • DBT Model: {Model Name}

For example, if your model name is customer_facts, your transformation will be named DBT Model: customer_facts.
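The grouping and naming described in this section can be sketched as follows. This is a minimal illustration, assuming that lineage is available as (source, target) schema pairs and that each target schema name is the same as its dbt model name. The function name `build_transformations` is hypothetical and the code is not Tree Schema's implementation.

```python
from collections import defaultdict


def build_transformations(lineage):
    """Group (source, target) schema pairs into one transformation per target.

    Returns a dict mapping each transformation name to its upstream schemas.
    """
    sources_by_target = defaultdict(list)
    for source, target in lineage:
        # Keep each upstream schema once, in first-seen order.
        if source not in sources_by_target[target]:
            sources_by_target[target].append(source)
    # Apply the "DBT Model: {Model Name}" naming convention.
    return {
        f"DBT Model: {target}": sources
        for target, sources in sources_by_target.items()
    }


# Hypothetical lineage pairs for illustration only.
example_lineage = [
    ("customers", "customer_facts"),
    ("orders", "customer_facts"),
    ("customers", "customer_dim"),
]
example_transformations = build_transformations(example_lineage)
```

Here customer_facts has two upstream schemas captured in a single transformation, while customer_dim has one.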