Is there a way to pull the number of images currently in each workflow for a particular project?

Hi!

Is there a way to pull the number of images currently in each workflow stage for a particular project? Instead of using export_v2 to iterate through each data row in a project and count the workflow status of each data row, I was wondering if there is a workaround or a method I am not aware of that retrieves only this information.

Thank you.

Hi @EC123456789,

Happy New Year!

You may be interested in the following:

import labelbox as lb

API_KEY = "<YOUR API KEY>"
PROJECT_ID = "<YOUR PROJECT ID>"

client = lb.Client(API_KEY)
project = client.get_project(PROJECT_ID)

for t in project.task_queues():
  print(f"{t.name}: {t.data_row_count}")

print(f"To label: {len(project.export_queued_data_rows())}")

Best regards,

Paul N.
Labelbox Support

Happy New Year! This was very helpful, thank you. I will mark this as the solution, but I have one more question. Is there an easy way to get the count of images in the 'Done' workflow?

Hello,

You might need to run an export for this, setting the filter "workflow_status": "Done".

export_params= {
  "attachments": True,
  "metadata_fields": True,
  "data_row_details": True,
  "project_details": True,
  "label_details": True,
  "performance_details": True,
  "interpolated_frames": True
}

# Note: Filters follow AND logic, so typically using one filter is sufficient.
filters= {
  "last_activity_at": ["2000-01-01 00:00:00", "2050-01-01 00:00:00"],
  "workflow_status": "Done"
}

export_task = project.export_v2(params=export_params, filters=filters)
export_task.wait_till_done()

if export_task.errors:
  print(export_task.errors)

export_json = export_task.result
print("results: ", len(export_json))
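
If you drop the "workflow_status" filter, the same export can be tallied per status in a single pass. Below is a minimal sketch of that counting step only. It assumes each exported row stores its status under projects -> <project id> -> project_details -> workflow_status, which should be checked against a real export; count_workflow_statuses, sample_rows and the "proj-1" ID are hypothetical names used for illustration.

```python
from collections import Counter


def count_workflow_statuses(export_rows, project_id):
    """Tally rows per workflow status from export-style dictionaries.

    Assumption: the status sits at
    row["projects"][project_id]["project_details"]["workflow_status"].
    Rows missing that path are counted under "UNKNOWN".
    """
    counts = Counter()
    for row in export_rows:
        project_info = row.get("projects", {}).get(project_id, {})
        details = project_info.get("project_details", {})
        counts[details.get("workflow_status", "UNKNOWN")] += 1
    return counts


# Hypothetical rows standing in for export_task.result
sample_rows = [
    {"projects": {"proj-1": {"project_details": {"workflow_status": "DONE"}}}},
    {"projects": {"proj-1": {"project_details": {"workflow_status": "DONE"}}}},
    {"projects": {"proj-1": {"project_details": {"workflow_status": "IN_REVIEW"}}}},
]

status_counts = count_workflow_statuses(sample_rows, "proj-1")
for status, count in status_counts.items():
    print(f"{status}: {count}")
```

With a real export you would pass export_json and your PROJECT_ID instead of the sample data.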

Thanks!

Hi @EC123456789,

I can't think of a direct way to retrieve the number of data rows with the "Done" status. But, as mentioned by my colleague, you could use the filter "workflow_status": "Done" with project.export_v2().

If you don't want to use export, you can calculate the difference between the total number of data rows attached to the project and the sum of the individual task queue counters:

import labelbox as lb

API_KEY = "<YOUR API KEY>"
PROJECT_ID = "<YOUR PROJECT ID>"

# Connect to the Labelbox platform
client = lb.Client(API_KEY)
project = client.get_project(PROJECT_ID)

# Display individual task queues and the number of data rows
# Keep track of the total number of data rows
dr_counter = 0

for t in project.task_queues():
  print(f"{t.name}: {t.data_row_count}")
  dr_counter += t.data_row_count

# To Label and Done are not task queues per se and need a different logic

print(f"To label: {len(project.export_queued_data_rows())}")

# Calculate the number of data rows attached to a project
nb_data_rows = sum(1
                   for b in project.batches()
                   for _ in b.export_data_rows())

print(f"Done: {nb_data_rows - dr_counter}")
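
The subtraction can be isolated in a small helper so it is easy to sanity-check. This is only a sketch with made-up numbers: remaining_count and the queue names and counts are hypothetical. Whether the "To label" count must also be subtracted depends on whether project.task_queues() already includes the labeling queue for your project, so compare the result with the counts shown in the UI.

```python
def remaining_count(total_rows, stage_counts):
    """Return the rows left after subtracting every non-Done stage count."""
    remaining = total_rows - sum(stage_counts)
    if remaining < 0:
        # A negative result means a stage was counted twice or the total is stale
        raise ValueError("stage counts exceed the total number of data rows")
    return remaining


# Hypothetical counts, standing in for the values read from project.task_queues()
queue_counts = {
    "Initial review task": 12,
    "Rework (all rejected)": 3,
}
total_data_rows = 100  # hypothetical total across all batches

done = remaining_count(total_data_rows, queue_counts.values())
print(f"Done: {done}")
```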

Thank you so much, both of you.
