I have users who created a project and a purpose-built batch for running labeler agreement testing. They have consensus labeling turned on (x/3 required) for the images in this batch. One of the labelers dropped out of the project. We would like to remove their labels from the batch without deleting the labels from other users on the same data row. How can I delete labels filtered to a specific user on a specific batch?
Hello Samuel! There is no option to filter and delete labels per labeler per batch. You can use this doc to delete the labels; setting set_labels_as_template=True can help you keep the other labels intact and start labeling again.
Thanks for the response, Naman. Unfortunately, this doesn’t do what I’m looking for, for several reasons:
- When deleting all labels and saving as template, only one of the labels is preserved
- I don’t seem to have a choice in which label is preserved (although it seems to save the most recent)
This is a problem because the purpose of this batch is specifically to test labeler agreement. We do not want each labeler to see a single template label, as that would bias them when they create a new label. Only the author of the original template (which we have no choice over) is unbiased by seeing the template. Our images take a significant amount of time to label due to the fine detail work required, so it is extremely undesirable to ask the labelers to restart labeling from scratch.
I think there are several gaps in the SDK that, if patched, would let me hack around this. The primary one is the lack of an association between Batch objects and DataRow objects. TBH this one is really confounding to me, as this association should be the fundamental thing Batches record. If this association existed, I could:
- get a list of all data rows in the batch of interest → datarows_in_batch
- fetch all labels from the project using Project.labels()
- filter to only the labels with label.created_by() == <user> → labels_by_user
- use the Label.bulk_delete() method on the labels that are in labels_by_user and whose label.data_row() is in datarows_in_batch
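Once the batch's data row IDs are known, the last two steps reduce to an author filter plus a set-membership check. Here is a minimal pure-Python sketch of that selection. LabelRecord is a hypothetical stand-in (not a Labelbox class) holding a label ID, the author's email, and the data row ID; the returned IDs are the labels you would then hand to the bulk-delete step.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LabelRecord:
    """Hypothetical minimal view of a label: its ID, author, and data row."""
    label_id: str
    created_by: str
    data_row_id: str


def labels_to_delete(
    labels: Iterable[LabelRecord], datarows_in_batch: set[str], user: str
) -> list[str]:
    # Keep only the labels authored by the target user (labels_by_user)
    labels_by_user = [lbl for lbl in labels if lbl.created_by == user]
    # Of those, keep only labels whose data row belongs to the batch
    return [
        lbl.label_id for lbl in labels_by_user if lbl.data_row_id in datarows_in_batch
    ]


labels = [
    LabelRecord("l1", "alice@example.com", "dr1"),  # in batch, by target user
    LabelRecord("l2", "bob@example.com", "dr1"),    # in batch, other user
    LabelRecord("l3", "alice@example.com", "dr9"),  # target user, outside batch
]
print(labels_to_delete(labels, {"dr1", "dr2"}, "alice@example.com"))  # ['l1']
```

Using a set for datarows_in_batch keeps each membership check constant-time, which matters when a project has many more labels than the batch has data rows.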
Otherwise, I will need to go into the UI, identify the label UID for each label by hand, and manually build my list of labels to bulk delete. This is very tedious for something that seems like a clear use case for the SDK. Let me know if an association between Batches and DataRows exists; this would immediately clear up my issues.
Hello Samuel!
Here is a script that lists all the batches in a project, exports the project's data rows, and filters those data rows by batch. In the export, each data row's project_details key contains a batch_id key, which you can match against a batch UID retrieved from the list of batches. For example: 'project_details': {'task_name': 'Initial labeling task', 'batch_id': '<id>'}
```python
# In a notebook, install the SDK first: %pip install labelbox
import labelbox as lb

API_KEY = ""           # replace with your API key
PROJECT_ID = ""        # replace with your project ID
TARGET_BATCH_UID = ""  # replace with the UID of the batch you want to filter on

client = lb.Client(API_KEY)
project = client.get_project(PROJECT_ID)

# List every batch in the project so you can pick the target batch UID
for batch in project.batches():
    print(batch.uid, batch.name)

# Export parameters: project_details includes the batch_id of each data row
export_params = {
    "data_row_details": True,
    "project_details": True,
}

export_task = project.export(params=export_params)
export_task.wait_till_done()

# List to store filtered results
filtered_data_rows = []

# Streaming export handler: keep only data rows that belong to the target batch
def json_stream_handler(output: lb.BufferedJsonConverterOutput):
    data_row = output.json
    project_details = (
        data_row.get("projects", {}).get(PROJECT_ID, {}).get("project_details", {})
    )
    if project_details.get("batch_id") == TARGET_BATCH_UID:
        filtered_data_rows.append(data_row)

export_task.get_buffered_stream(stream_type=lb.StreamType.RESULT).start(
    stream_handler=json_stream_handler
)

print(len(filtered_data_rows))
```
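To go from the batch's data rows to the dropped labeler's labels without the UI, the exported rows can be scanned directly. This is a sketch under assumptions: it presumes "label_details": True was added to export_params, and that each exported label has an "id" and a "label_details.created_by" field holding the labeler's email. Verify that shape against one of your own exported rows before relying on it. label_ids_by_creator is a hypothetical helper; you would call it with filtered_data_rows and your project ID, then feed the resulting IDs into the bulk-delete step described in the question.

```python
from typing import Any


def label_ids_by_creator(
    export_rows: list[dict[str, Any]], project_id: str, creator_email: str
) -> list[str]:
    """Collect IDs of labels authored by creator_email in the given project.

    Assumed export shape (verify against a real export):
    row["projects"][project_id]["labels"][i]["id"] and
    row["projects"][project_id]["labels"][i]["label_details"]["created_by"]
    """
    label_ids: list[str] = []
    for row in export_rows:
        project_entry = row.get("projects", {}).get(project_id, {})
        for label in project_entry.get("labels", []):
            # Assumed field: label_details.created_by holds the labeler's email
            author = label.get("label_details", {}).get("created_by")
            if author == creator_email:
                label_ids.append(label["id"])
    return label_ids


# Hypothetical sample mimicking two exported data rows from the target batch
sample_rows = [
    {"projects": {"proj-1": {"labels": [
        {"id": "lab-a", "label_details": {"created_by": "dropped@example.com"}},
        {"id": "lab-b", "label_details": {"created_by": "stays@example.com"}},
    ]}}},
    {"projects": {"proj-1": {"labels": [
        {"id": "lab-c", "label_details": {"created_by": "dropped@example.com"}},
    ]}}},
]

print(label_ids_by_creator(sample_rows, "proj-1", "dropped@example.com"))
# ['lab-a', 'lab-c']
```

Because the rows were already filtered to the target batch, only the author check is needed here. Labels from other consensus labelers on the same data rows are left untouched.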
Hope that helps!
This worked for me with some minor modifications. Thank you!