I was wondering if it was possible to only export labels from a project that have a particular status. On the web UI, labels have a Workflow Status attached to them, such as “Done”.
I’ve looked over the documentation for exporting, and the following is mentioned:
“When you export data rows from a project, you can narrow down your data rows by label status, metadata, batch, annotations, and workflow history”
I’m not sure if I’m missing something obvious, but how can I only export labels with a certain status, for example “Done”? Whether this is via the export call itself, or by processing the JSON returned by the call.
I’ve been using project.export_v2(params=export_params, filters=filters) to do the export. Looking at the documentation on params and filters, I didn’t see mentions of filtering by status. I’m also unable to find any status related data in the JSON returned.
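For context, my export call looks roughly like this (the parameter and filter names are the ones I found in the export docs; the date values are placeholders):

```python
# Params select which fields appear in each exported data row
export_params = {
    "data_row_details": True,
    "label_details": True,
    "project_details": True,
}

# Filters narrow down which data rows are exported; I couldn't find
# any status-related key documented here
filters = {
    "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
}

# Then:
# export_job = project.export_v2(params=export_params, filters=filters)
```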
Apologies if this is covered somewhere in the docs and I’ve missed it.
Apologies for the late reply.
Currently, you can’t filter by workflow task (status) in the export call itself.
Instead, I’d suggest the following, which groups the exported data rows by task queue so you can pick out the ones in “Done”.
Let me know if that helps.
Best,
Paul N.
Labelbox Support
```python
import labelbox as lb


def get_data_rows_per_task_queue(project: lb.Project, export_results: list) -> dict[str, list]:
    """Group the results of export_v2 (run with project details) by task queue.

    Input:
        project: the lb.Project the export was run against
        export_results: results of export_v2(params={"project_details": True})
    Output:
        Dictionary where each key is a task queue name and each value is a
        list of JSON entries, one per data row
    """
    project_id = project.uid
    # Prepare the output dictionary with the different task queues
    data_rows = {task_queue.name: [] for task_queue in project.task_queues()}
    data_rows["Done"] = []
    for data_row in export_results:
        workflow_history = data_row["projects"][project_id]["project_details"]["workflow_history"]
        # Data rows still to label don't have a workflow history yet
        if not workflow_history:
            data_rows["Initial labeling task"].append(data_row)
        else:
            # "Done" is the final task, so if there is no "next_task_name"
            # then the data row is done
            next_task_name = workflow_history[0].get("next_task_name", "")
            if next_task_name:
                data_rows[next_task_name].append(data_row)
            else:
                data_rows["Done"].append(data_row)
    return data_rows


if __name__ == "__main__":
    API_KEY = "<YOUR API KEY>"
    PROJECT_ID = "<YOUR PROJECT ID>"

    client = lb.Client(api_key=API_KEY)
    project = client.get_project(PROJECT_ID)

    export_job = project.export_v2(params={"project_details": True})
    export_job.wait_till_done()
    results = export_job.result

    # Group data rows by task queue
    data_rows = get_data_rows_per_task_queue(project, results)

    # Display the number of data rows per task queue
    for queue_name, rows in data_rows.items():
        print(queue_name, len(rows), sep="\t")
```
It’s reassuring to see that we came up with a similar solution to the one you’ve proposed.
The initial confusion on my part came from a project where we had programmatically moved labels into “Done” when uploading them. This meant there were labels in “Done” with empty workflow histories.
Thanks again for your help, this issue is resolved for us.