Data Catalog

We want you to unlock more value from your data by giving you the tools and capabilities required for both self-service research and collaboration centered around your data.

Here we’ll give a brief introduction to the key terms and entities in Tree Schema, along with how you can use them to extract value from your data.

Important Entities

Here is a brief primer on the important entities that exist within the Tree Schema data catalog. This overview provides high-level context before we dive into each item in detail in its respective section below.

  • Data Store: These are the physical containers for your data; they can be anything from a database to a file store to an Excel sheet

  • Data Schema: Represents the shape, structure and format of your data; these may be tables, JSON files, Parquet files, Excel sheets, etc.

  • Fields: The actual data itself; each field likely has a data type and a data format associated with it

  • Field Value: An example value from a field; if the Field is “status”, a Field Value may be “01”, which may stand for “new”

  • Transformation: Describes how data moves between schemas; these could be ETL jobs, APIs, streaming data processes or even manual file movement
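To make the relationships between these entities concrete, here is a minimal Python sketch. It is not the Tree Schema API: the class names, attribute names (such as `nullable`, `source_schema` and `target_schema`) and the sample values other than “status” / “01” / “new” are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FieldValue:
    """An example value from a field, plus what it stands for."""
    value: str     # e.g. "01"
    meaning: str   # e.g. "new"


@dataclass
class Field:
    """A single field within a schema (hypothetical attributes)."""
    name: str
    data_type: str
    nullable: bool = True
    values: List[FieldValue] = field(default_factory=list)


@dataclass
class DataSchema:
    """The shape of the data: a table, a JSON file, a Parquet file, etc."""
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class DataStore:
    """The physical container that holds one or more schemas."""
    name: str
    schemas: List[DataSchema] = field(default_factory=list)


@dataclass
class Transformation:
    """Describes how data moves from one schema to another."""
    name: str
    source_schema: str
    target_schema: str


# Illustrative usage: all names below are made up for this sketch.
status = Field(
    name="status",
    data_type="string",
    nullable=False,
    values=[FieldValue(value="01", meaning="new")],
)
orders = DataSchema(name="orders", fields=[status])
warehouse = DataStore(name="warehouse_db", schemas=[orders])
load_orders = Transformation(
    name="load_orders", source_schema="orders_csv", target_schema="orders"
)
```

The nesting mirrors the list above: a data store contains schemas, a schema contains fields, a field may carry example values, and a transformation links one schema to another.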

Creating Value

When you create data stores, add schemas, populate the schemas with fields and link the fields together with transformations in Tree Schema, you are creating a lineage that enables you and your team to answer rich questions across different roles, such as:

  • Developer:

    • What processes do I need to update if I remove this table?

    • Which data scientist(s) should I notify that a change is going to occur?

  • Data Scientist / Business Analyst:

    • What data types are these fields and can they be null?

    • How do the values in these two fields differ?

    • Who should I contact if I have more questions about how this data is created?

  • Data Steward:

    • Which files contain NPI / PII data?

    • Is our data well documented?

Tree Schema aims to streamline the answers to these questions and to give your team a centralized place to document and discuss your data.
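As a rough illustration of why lineage makes the developer question “What processes do I need to update if I remove this table?” answerable, the sketch below walks a small, made-up lineage graph. It does not use the Tree Schema API; the `TRANSFORMATIONS` list, the schema names and the `affected_by_removal` function are all hypothetical.

```python
from collections import deque

# Hypothetical lineage: (transformation name, source schema, target schema).
TRANSFORMATIONS = [
    ("load_orders", "orders_csv", "orders_raw"),
    ("clean_orders", "orders_raw", "orders_clean"),
    ("build_report", "orders_clean", "sales_report"),
    ("load_users", "users_api", "users_raw"),
]


def affected_by_removal(schema, transformations):
    """List transformations touched by removing `schema`.

    Includes transformations that write into the schema, plus every
    transformation that reads from it or from anything downstream of it.
    """
    # Transformations that write directly into the removed schema.
    affected = [name for name, _, target in transformations if target == schema]

    # Breadth-first walk over everything downstream of the schema.
    seen = {schema}
    queue = deque([schema])
    while queue:
        current = queue.popleft()
        for name, source, target in transformations:
            if source != current:
                continue
            if name not in affected:
                affected.append(name)
            if target not in seen:  # guards against cycles in the lineage
                seen.add(target)
                queue.append(target)
    return affected


print(affected_by_removal("orders_raw", TRANSFORMATIONS))
# prints ['load_orders', 'clean_orders', 'build_report']
```

The same traversal idea extends to the other roles: attaching an owner or a sensitivity flag to each schema would let a similar walk answer who to notify or which files contain PII.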

Now that you’ve got the introduction, jump right in and become a pro with help from these walkthroughs: