[Catalog] TIP - Using Slices with the Creation Date Metadata Filter to Auto-Sync Data Rows for a Given Month

ML teams often need to re-run or re-train models on newer data at a regular cadence (weekly, monthly, etc.) for a variety of reasons (e.g. distributional data drift).

A common way to organize data, therefore, is to bucket data rows by the day or month they were created in. Daily buckets suit telemetry data, for example, where a new data row can arrive every minute or every five minutes.

Using Catalog’s built-in slices feature together with the ‘Data Row Created At’ filter, a practitioner can save a filter so that any new records created within a given day, week, or month are automatically added to the slice.

The example below shows how to construct a slice such that new records created between June and July 2022 are automatically added to the slice.
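For anyone who wants to reason about the date window outside the UI, here is a rough plain-Python sketch of the bucketing logic that a created-at filter expresses. It does not use the Catalog UI or the Labelbox SDK. The helper names (`month_window`, `in_window`), the sample rows, and the reading of "between June and July 2022" as the half-open interval from 1 June 2022 up to (but not including) 1 July 2022 in UTC are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Tuple


def month_window(year: int, month: int) -> Tuple[datetime, datetime]:
    """Return the half-open UTC interval [start, end) covering one calendar month."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # For December, roll over to January of the following year.
    end = datetime(year + month // 12, month % 12 + 1, 1, tzinfo=timezone.utc)
    return start, end


def in_window(created_at: datetime, start: datetime, end: datetime) -> bool:
    """True if the creation timestamp falls inside [start, end)."""
    return start <= created_at < end


# Hypothetical data rows; real rows would come from your own dataset.
rows = [
    {"id": "row-a", "created_at": datetime(2022, 5, 31, 23, 59, tzinfo=timezone.utc)},
    {"id": "row-b", "created_at": datetime(2022, 6, 1, 0, 0, tzinfo=timezone.utc)},
    {"id": "row-c", "created_at": datetime(2022, 6, 30, 23, 59, tzinfo=timezone.utc)},
    {"id": "row-d", "created_at": datetime(2022, 7, 1, 0, 0, tzinfo=timezone.utc)},
]

# Assumed window: 1 June 2022 (inclusive) to 1 July 2022 (exclusive).
start, end = month_window(2022, 6)
june_rows = [r["id"] for r in rows if in_window(r["created_at"], start, end)]
print(june_rows)
```

The half-open interval keeps adjacent monthly buckets from overlapping: a row created exactly at midnight on 1 July lands in the July bucket only. Whether the saved filter in Catalog treats its end date as inclusive or exclusive is worth checking in the UI before relying on this.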
