Exporting labels from a Project that have a particular status (Workflow Status)

I was wondering if it was possible to only export labels from a project that have a particular status. On the web UI, labels have a Workflow Status attached to them, such as “Done”.

I’ve looked over the documentation for exporting, and the following is mentioned:

“When you export data rows from a project, you can narrow down your data rows by label status, metadata, batch, annotations, and workflow history”

I’m not sure if I’m missing something obvious, but how can I export only the labels with a certain status, for example “Done”? Either approach would work for me: filtering in the export call itself, or post-processing the JSON that the call returns.

I’ve been using project.export_v2(params=export_params, filters=filters) to do the export. Looking at the documentation on params and filters, I didn’t see any mention of filtering by status. I’m also unable to find any status-related data in the JSON returned.

Apologies if this is covered somewhere in the docs and I’ve missed it.

Thanks

Hi @t.evans ,

Apologies for the late reply.
Currently, you can’t filter by workflow task (status) in the export call itself.

Instead, I suggest exporting with the project details and then grouping the exported data rows by task queue, as in the script below.

Let me know if that helps.

Best,

Paul N.
Labelbox Support

import labelbox as lb


def get_data_rows_per_task_queue(export_results: list) -> dict[str, list]:
    """Group the result of export_v2 (with project details) by task queue.

    Relies on the module-level `project` and `PROJECT_ID` defined in the main block below.

    Input:
        List of dicts: Results of export_v2(params={"project_details": True}), one dict per data row
    Output:
        Dictionary(string: list): Dictionary where each key is a task queue name and each
            value is a list of export entries (dicts), one per data row
    """
    
    # Prepare the output dictionary with the different task queues
    data_rows = {task_queue.name: [] for task_queue in project.task_queues()}
    data_rows["Done"] = []
    
    for data_row in export_results:
        
        # Data rows still waiting to be labeled don't have a workflow history yet
        workflow_history = data_row["projects"][PROJECT_ID]["project_details"]["workflow_history"]
        if not workflow_history:
            data_rows.setdefault("Initial labeling task", []).append(data_row)
        else:
            # "Done" is the final task, so if there is no "next_task_name" the data row is done
            next_task_name = workflow_history[0].get("next_task_name", "")

            if next_task_name:
                # setdefault avoids a KeyError if the name is not among the current task queues
                data_rows.setdefault(next_task_name, []).append(data_row)
            else:
                data_rows["Done"].append(data_row)

    return data_rows


if __name__ == "__main__":

    API_KEY = "<YOUR API KEY>"
    PROJECT_ID = "<YOUR PROJECT ID>"

    client = lb.Client(API_KEY)
    project = client.get_project(PROJECT_ID)

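    export_job = project.export_v2(params={"project_details": True})
    export_job.wait_till_done()
    # Stop early if the export reported errors
    if export_job.errors:
        raise RuntimeError(f"Export failed: {export_job.errors}")
    results = export_job.result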

    # Retrieve data rows
    data_rows = get_data_rows_per_task_queue(results)

    # Display the number of data rows per task queue
    for task_queue_name, rows in data_rows.items():
        print(task_queue_name, len(rows), sep="\t")
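
If only one status is needed (the original question asked for “Done”), the same idea can be reduced to a filter. The sketch below is a minimal illustration, not an official Labelbox helper: `rows_in_status` is a hypothetical function name, the sample rows are hand-written and contain only the fields the script above reads, and it makes the same assumptions as that script (the first workflow history entry is the relevant one, and a missing "next_task_name" means the data row is done). It runs without an API key so the logic can be checked offline.

```python
from typing import Any


def rows_in_status(
    export_rows: list[dict[str, Any]], project_id: str, status: str = "Done"
) -> list[dict[str, Any]]:
    """Return the exported data rows whose inferred task queue equals `status`."""
    selected = []
    for row in export_rows:
        details = row["projects"][project_id]["project_details"]
        history = details.get("workflow_history") or []
        if not history:
            # Assumption: no workflow history means the row is still to be labeled
            current = "Initial labeling task"
        else:
            # Assumption: no "next_task_name" on the first entry means "Done"
            current = history[0].get("next_task_name") or "Done"
        if current == status:
            selected.append(row)
    return selected


# Hypothetical sample data that mimics only the fields used above
PROJECT_ID = "proj-123"
sample_rows = [
    {"data_row": {"id": "a"},
     "projects": {PROJECT_ID: {"project_details": {"workflow_history": []}}}},
    {"data_row": {"id": "b"},
     "projects": {PROJECT_ID: {"project_details": {
         "workflow_history": [{"action": "Approve"}]}}}},
    {"data_row": {"id": "c"},
     "projects": {PROJECT_ID: {"project_details": {
         "workflow_history": [{"action": "Move", "next_task_name": "Initial review task"}]}}}},
]

done_rows = rows_in_status(sample_rows, PROJECT_ID)
print([row["data_row"]["id"] for row in done_rows])  # prints ['b']
```

With a real export, `sample_rows` would be replaced by `export_job.result`. One caveat of this heuristic: a data row with an empty workflow history is always treated as waiting for labeling, so rows that were moved to “Done” programmatically without any workflow history would not be picked up.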

Hey Paul,

Thanks for your reply.

It’s reassuring to see that we came up with a similar solution to the one you’ve proposed.

The initial confusion on my part had come from a project where we had programmatically moved labels into “Done” when uploading the labels. This meant there were labels in “Done” with empty workflows.

Thanks again for your help, this issue is resolved for us.
