Manage Transformation Data Flow

Creating and updating a transformation both follow the same basic steps:

  1. Add source field(s) to the transformation

  2. Add target field(s) to the transformation

  3. Link source field(s) to target(s)

Note

If you are editing an existing transformation, its source and target fields will already be available, but you can add or remove fields as needed.
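The three steps above can be pictured as a small data model. The following is only an illustrative sketch in plain Python, not Tree Schema's API or internal representation: the `SchemaField` and `Transformation` classes, their methods, and the data store and schema names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class SchemaField:
    """Hypothetical stand-in for one field, identified by store, schema and name."""
    data_store: str
    schema: str
    name: str


@dataclass
class Transformation:
    """Hypothetical model: source fields, target fields, and the links between them."""
    name: str
    sources: List[SchemaField] = field(default_factory=list)
    targets: List[SchemaField] = field(default_factory=list)
    links: List[Tuple[SchemaField, SchemaField]] = field(default_factory=list)

    def add_source(self, item: SchemaField) -> None:
        # Step 1: add a source field (adding the same field twice has no effect)
        if item not in self.sources:
            self.sources.append(item)

    def add_target(self, item: SchemaField) -> None:
        # Step 2: add a target field
        if item not in self.targets:
            self.targets.append(item)

    def link(self, source: SchemaField, target: SchemaField) -> None:
        # Step 3: a link is only valid between fields already in the transformation
        if source not in self.sources:
            raise ValueError(f"{source.name} has not been added as a source")
        if target not in self.targets:
            raise ValueError(f"{target.name} has not been added as a target")
        if (source, target) not in self.links:
            self.links.append((source, target))


# Example with made-up store and schema names
src = SchemaField("invitations_db", "invitations", "email")
tgt = SchemaField("customers_db", "customers", "email")

flow = Transformation("invitation")
flow.add_source(src)
flow.add_target(tgt)
flow.link(src, tgt)
```

In this sketch, linking is rejected unless both fields were added first, which mirrors the order of the steps in the form.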


Add Source and Target Field(s)

The top of the form will display two tabs:

  • Select source schema(s)

  • Select target schema(s)

Both tabs will contain the same set of inputs:

../../_images/transformation_field_inputs.png

When the form first loads, the Add to transformation diagram button is disabled. To add a field to the transformation, you must first select the Data Store that contains the source schema(s). This populates the list of schemas that exist within that Data Store.

../../_images/tform_data_store_selected.png

Once the data store is selected and the eligible schemas are available, select the schema that contains the fields to include in the transformation.

Two things will happen here:

  1. All of the fields in the schema will be added to the select fields list

  2. The Add to transformation diagram button will become enabled and you will be able to add any selected fields to the transformation

../../_images/tform_data_schema_selected.png

By default, all fields for a schema are selected, but you can instead select a single field or any subset of fields.
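The selection behaviour described here (everything selected by default, optionally narrowed to a subset) can be sketched in a few lines of Python. This is an assumption-laden illustration rather than how the form is implemented: `select_fields` and the field names are hypothetical.

```python
from typing import List, Optional, Sequence


def select_fields(
    schema_fields: Sequence[str],
    chosen: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the fields to add: all of them by default, or only the chosen subset."""
    if chosen is None:
        # Default: every field in the schema is selected
        return list(schema_fields)
    unknown = [name for name in chosen if name not in schema_fields]
    if unknown:
        raise ValueError(f"fields not in schema: {unknown}")
    wanted = set(chosen)
    # Keep the schema's own ordering for the selected subset
    return [name for name in schema_fields if name in wanted]


# Made-up field names for illustration
all_fields = ["id", "email", "created_at"]
everything = select_fields(all_fields)
subset = select_fields(all_fields, ["created_at", "id"])
```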

Pressing the Add to transformation diagram button now adds these fields to the Build the transformation section:

../../_images/tform_data_field_added.png

Now that a schema with one or more fields has been added to your transformation, you can add more source fields from the same schema or from different schemas, including schemas in different data stores.

When you add fields under the Select source schema(s) tab, a triangle appears on the right side of the row for each field added to the transformation. This triangle is called a connector. When you add a field under the Select target schema(s) tab, the connector appears on the left side instead. We will use these connectors shortly to connect a source to a target.

To continue, add fields to the transformation under the Select target schema(s) tab so that it includes one or more target schemas:

../../_images/tform_data_target_added.png



Suggestions

You may be wondering: what is the best way to create transformations? Is it better to create a few transformations that each have many steps built in, such as:

../../_images/transform_many_steps.png

Or does it make more sense to create more, smaller transformations, such as:

../../_images/transform_few_steps.png

You should do what works best for you and your team. Here at Tree Schema, we find that there is a balance in the middle. We generally define our own transformations according to the underlying use case, which can cause the size and scope of each transformation to vary widely. For example, we have:

  • One “lift and shift” transformation that covers all of the data movement from 20 tables in Postgres, each of which is dumped directly into S3

  • One “page view” transformation for the event stream that crosses several schemas between Postgres, Kafka and Redshift

  • One “invitation” transformation that represents data movement from our invitations database to our customer database