Data Schemas

Data Schemas are the heart and soul of a Data Catalog. They describe the shape, structure, and format of your data. A schema is typically represented as a table, a JSON or Parquet file, or an Excel sheet, but a Data Schema is really just a reference to a structured set of fields.
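To make "a structured set of fields" concrete, here is a minimal sketch in Python. The `Field` and `DataSchema` classes and their attributes are hypothetical names chosen for illustration; they are not the catalog's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Field:
    """One field in a schema (hypothetical shape, for illustration only)."""
    name: str
    data_type: str
    nullable: bool = True


@dataclass
class DataSchema:
    """A named, ordered collection of fields."""
    name: str
    fields: List[Field] = field(default_factory=list)

    def field_names(self) -> List[str]:
        # Return the field names in the order they were defined.
        return [f.name for f in self.fields]


# The same schema could back a database table, a Parquet file, or a sheet.
orders = DataSchema(
    name="orders",
    fields=[
        Field("order_id", "integer", nullable=False),
        Field("customer_email", "string"),
        Field("total", "decimal"),
    ],
)
```

Whatever the underlying storage format, the schema reduces to the same information: a name plus an ordered list of typed fields.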

We provide two ways to populate your Data Catalog with Schemas:

  1. Manually defining the Schemas, either by entering each field individually or by uploading a sample file that represents your schema

  2. Automatically generating them from a Data Store

We strongly encourage the latter approach for several reasons:

  • It’s simply less effort to let us read the data and metadata from your Data Store and to infer the Schema

  • As your schema changes over time we can detect those changes and automatically update the catalog
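The two ideas above, inferring a schema from data and detecting changes over time, can be sketched roughly as follows. This is an illustrative approximation only: `infer_schema` and `diff_schemas` are hypothetical helpers, the sample records are made up, and the product's actual inference and change detection may work quite differently.

```python
from typing import Any, Dict, List


def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each field name to the Python type name(s) seen in the sample."""
    seen: Dict[str, set] = {}
    for record in records:
        for name, value in record.items():
            seen.setdefault(name, set()).add(type(value).__name__)
    # Join multiple observed types so mixed-type fields remain visible.
    return {name: "|".join(sorted(types)) for name, types in seen.items()}


def diff_schemas(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report fields that were added, removed, or changed type."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(n for n in set(old) & set(new) if old[n] != new[n]),
    }


# Hypothetical samples read from the same Data Store at two points in time.
before = infer_schema([{"id": 1, "email": "a@example.com"}])
after = infer_schema([{"id": "1", "email": "b@example.com", "total": 9.5}])
changes = diff_schemas(before, after)
```

Here `changes` reports `total` as an added field and `id` as a changed field (its type went from `int` to `str`), which is the kind of drift an automated update would need to pick up.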

That said, there are valid reasons why you might need to create a Data Schema manually, so we’ve tried to make that as painless as possible.

The next few pages walk through how to: