[Model] TIP - How to configure custom TRAIN/TEST/VALIDATE data splits

Development teams often want to control how the data rows in their dataset are assigned to the TRAINING/VALIDATION/TEST splits.

Here’s how this can be accomplished in Labelbox programmatically (Model Run) and through the UI.

Python SDK

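import labelbox as lb

# Enable experimental SDK features when creating the client
client = lb.Client(api_key="<YOUR_API_KEY>", enable_experimental=True)

dataset = client.get_dataset("<Dataset_id>")  # Your training dataset
model_run = client.get_model_run("<Model_run_id>")  # The model run whose splits you are assigning

# Collect the identifiers that are sliced below
data_row_ids = [data_row.uid for data_row in dataset.data_rows()]
global_keys = [data_row.global_key for data_row in dataset.data_rows()]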

# Option 1: assign by data row ID (the slices assume at least 200 data rows)
model_run.assign_data_rows_to_split(
  data_row_ids=data_row_ids[:100],
  split="TRAINING",
)
model_run.assign_data_rows_to_split(
  data_row_ids=data_row_ids[100:150],
  split="VALIDATION",
)
model_run.assign_data_rows_to_split(
  data_row_ids=data_row_ids[150:200],
  split="TEST",
)

# Option 2: assign by global key instead of data row ID
model_run.assign_data_rows_to_split(
  global_keys=global_keys[:100],
  split="TRAINING",
)
model_run.assign_data_rows_to_split(
  global_keys=global_keys[100:150],
  split="VALIDATION",
)
model_run.assign_data_rows_to_split(
  global_keys=global_keys[150:200],
  split="TEST",
)
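
The fixed slices above depend on the order of the list. If you would rather assign rows at random but reproducibly, here is a minimal sketch. Note that `split_ids` is a hypothetical helper, not part of the Labelbox SDK, and the 70/15/15 ratios and the seed are assumptions you should adjust.

```python
import random
from typing import Dict, List, Sequence


def split_ids(
    ids: Sequence[str],
    train: float = 0.7,        # assumed ratio, adjust as needed
    validation: float = 0.15,  # assumed ratio, the remainder goes to TEST
    seed: int = 42,            # fixed seed so the split is reproducible
) -> Dict[str, List[str]]:
    """Shuffle ids and partition them into TRAINING/VALIDATION/TEST lists."""
    if train < 0 or validation < 0 or train + validation > 1:
        raise ValueError("train and validation must be >= 0 and sum to at most 1")

    shuffled = list(ids)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)

    n_train = round(len(shuffled) * train)
    # Cap the validation size so the two counts never exceed the total
    n_validation = min(round(len(shuffled) * validation), len(shuffled) - n_train)

    return {
        "TRAINING": shuffled[:n_train],
        "VALIDATION": shuffled[n_train:n_train + n_validation],
        "TEST": shuffled[n_train + n_validation:],
    }


# Example with placeholder ids standing in for real data row ids
example_ids = [f"id-{i}" for i in range(200)]
example_splits = split_ids(example_ids)
print({name: len(rows) for name, rows in example_splits.items()})
```

Each resulting list can then be passed to `model_run.assign_data_rows_to_split(data_row_ids=rows, split=name)`, or as `global_keys=rows` if you split global keys instead.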

UI
[Animated GIF: assigning data rows to splits in the Labelbox UI]
