Data Schemas
Data Schemas are the heart and soul of a Data Catalog. They describe the shape, structure, and format of your data. A Data Schema is typically represented as a table, a JSON or Parquet file, or an Excel sheet, but it is really just a reference to a structured set of fields.
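To make "a structured set of fields" concrete, here is a minimal sketch in Python. The `Field` and `DataSchema` classes, their attributes, and the `orders` example are hypothetical names chosen for illustration; they are not the catalog's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Field:
    """One named, typed column or attribute (hypothetical model)."""
    name: str
    dtype: str
    nullable: bool = True


@dataclass
class DataSchema:
    """A named, ordered collection of fields (hypothetical model)."""
    name: str
    fields: List[Field] = field(default_factory=list)

    def field_names(self) -> List[str]:
        # Return field names in the order they were declared
        return [f.name for f in self.fields]


# The same schema could describe a table, a Parquet file, or a spreadsheet
orders = DataSchema(
    name="orders",
    fields=[
        Field("order_id", "integer", nullable=False),
        Field("customer_email", "string"),
        Field("total", "float"),
    ],
)
```

Whatever the storage format, the schema reduces to the same thing: a name plus an ordered list of typed fields.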
We provide two ways to populate your Data Catalog with Schemas:
Manually defining the Schemas, either by entering each field individually or by uploading a sample file that represents your schema
Automatically generating them from a Data Store
We highly encourage the latter approach for several reasons:
It’s simply less effort to let us read the data and metadata from your Data Store and infer the Schema
As your schema changes over time, we can detect those changes and automatically update the catalog
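As a rough illustration of what inference and change detection involve, the sketch below infers field types from sample records and compares two inferred schemas. It is an assumption-laden toy, not how the catalog actually reads a Data Store: `infer_fields`, `diff_fields`, and the sample records are all hypothetical.

```python
def infer_fields(records):
    """Map each field name to the Python type name seen in the sample records."""
    inferred = {}
    for record in records:
        for name, value in record.items():
            type_name = type(value).__name__
            # A field seen with more than one type is labeled "mixed"
            if inferred.setdefault(name, type_name) != type_name:
                inferred[name] = "mixed"
    return inferred


def diff_fields(old, new):
    """Report fields that were added, removed, or changed type."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }


# Hypothetical samples taken from the same source at two points in time
yesterday = infer_fields([{"id": 1, "email": "a@example.com"}])
today = infer_fields([
    {"id": 2, "email": "b@example.com", "total": 9.5},
    {"id": "3", "email": "c@example.com", "total": 1.0},
])

changes = diff_fields(yesterday, today)
print(changes)
```

In this sample, `total` is reported as added and `id` as changed, because it appears as both an integer and a string. Tracking this kind of drift by hand is the effort that automatic generation saves.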
That being said, there are valid reasons why you might need to create a Data Schema manually, so we’ve tried to make that as painless as possible.
The next few pages walk through how to: