Development teams often want to control how the data rows in their dataset are assigned to the TRAINING, VALIDATION, and TEST splits.
Here's how to do this in Labelbox, both programmatically with the Python SDK (on a model run) and through the UI.
Python SDK
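import labelbox as lb

client = lb.Client(api_key="<YOUR_API_KEY>")
client.enable_experimental = True

# The model run whose data rows you want to assign to splits
model_run = client.get_model_run("<Model_run_id>")

# The data rows must already be part of this model run.
# Replace these placeholders with your own data row IDs or global keys.
data_row_ids = ["<data_row_id_1>", "<data_row_id_2>"]
global_keys = ["<global_key_1>", "<global_key_2>"]

# Using data row IDs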
model_run.assign_data_rows_to_split(
data_row_ids=data_row_ids[:100],
split="TRAINING",
)
model_run.assign_data_rows_to_split(
data_row_ids=data_row_ids[100:150],
split="VALIDATION",
)
model_run.assign_data_rows_to_split(
data_row_ids=data_row_ids[150:200],
split="TEST",
)
# using global keys
model_run.assign_data_rows_to_split(
global_keys=global_keys[:100],
split="TRAINING",
)
model_run.assign_data_rows_to_split(
global_keys=global_keys[100:150],
split="VALIDATION",
)
model_run.assign_data_rows_to_split(
global_keys=global_keys[150:200],
split="TEST",
)
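Hard-coded slice boundaries such as [:100] only suit a list of known size. A minimal sketch for splitting an arbitrary list of global keys by fraction could look like the following. `partition_ids` and `assign_splits` are hypothetical helpers, not part of the Labelbox SDK, and the 80/10/10 ratios and the seed are illustrative assumptions. The only SDK call used is `assign_data_rows_to_split(global_keys=..., split=...)`, as in the example above.

```python
import random
from typing import Dict, List, Sequence

# Split names used with assign_data_rows_to_split in the SDK example above
SPLITS = ("TRAINING", "VALIDATION", "TEST")


def partition_ids(
    ids: Sequence[str],
    fractions: Sequence[float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Hypothetical helper: shuffle ids with a fixed seed and cut them into
    TRAINING/VALIDATION/TEST lists according to `fractions`."""
    if len(fractions) != len(SPLITS) or abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must have three entries summing to 1.0")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # same seed -> same assignment
    n_train = int(len(shuffled) * fractions[0])
    n_val = int(len(shuffled) * fractions[1])
    return {
        "TRAINING": shuffled[:n_train],
        "VALIDATION": shuffled[n_train:n_train + n_val],
        "TEST": shuffled[n_train + n_val:],  # any rounding remainder lands in TEST
    }


def assign_splits(model_run, global_keys: Sequence[str], **kwargs) -> Dict[str, List[str]]:
    """Hypothetical helper: make one assign_data_rows_to_split call per split."""
    partitions = partition_ids(global_keys, **kwargs)
    for split, keys in partitions.items():
        if keys:  # skip empty partitions instead of sending an empty list
            model_run.assign_data_rows_to_split(global_keys=keys, split=split)
    return partitions
```

With the `model_run` and `global_keys` from the SDK example, this would be called as `assign_splits(model_run, global_keys)`. Fixing the seed keeps the assignment reproducible if the script is rerun on the same list.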
UI