Best way to stage invalid data rows for deletion?

I have an annotation project where I’m marking up CT scan images. Some of the images added to the project are unsuitable for annotation. When that happens, we need to identify those images, delete them from Labelbox, reprocess the CT data upstream to produce a better image, and re-upload it to Labelbox.

My current process for this is really inefficient. When I come across an invalid image, either during initial labeling or, more often, during review after a labeler has attempted to annotate it, I copy the data row ID into a CSV file. Periodically, I run a script that bulk deletes the data rows listed in the CSV.
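For reference, a CSV-driven cleanup script like the one described might look roughly like the sketch below. It reads IDs from a column assumed to be named `data_row_id`, drops blanks and duplicates, and hands the IDs to a deletion callable in batches. The column name, the batch size of 500, and the `delete_batch` hook are assumptions for illustration, not anything Labelbox prescribes.

```python
import csv
from typing import Callable, Iterable, List


def read_data_row_ids(lines: Iterable[str], column: str = "data_row_id") -> List[str]:
    """Return unique, non-empty IDs from CSV lines, keeping first-seen order."""
    seen = set()
    ids: List[str] = []
    for row in csv.DictReader(lines):
        value = (row.get(column) or "").strip()
        if value and value not in seen:  # skip blank cells and repeated IDs
            seen.add(value)
            ids.append(value)
    return ids


def delete_in_batches(
    ids: List[str],
    delete_batch: Callable[[List[str]], None],
    batch_size: int = 500,  # assumed batch size; tune to your API limits
) -> int:
    """Call delete_batch on consecutive slices of ids; return how many IDs were sent."""
    for start in range(0, len(ids), batch_size):
        delete_batch(ids[start:start + batch_size])
    return len(ids)


def bulk_delete_from_csv(
    path: str,
    delete_batch: Callable[[List[str]], None],
    batch_size: int = 500,
) -> int:
    """Read IDs from the CSV at path and delete them in batches."""
    with open(path, newline="") as f:
        ids = read_data_row_ids(f)
    return delete_in_batches(ids, delete_batch, batch_size)
```

With the Labelbox Python SDK, `delete_batch` might fetch each row with `client.get_data_row(...)` and pass the resulting list to `DataRow.bulk_delete(...)`; check the SDK documentation for your version before relying on that. Keeping the deletion behind a callable also makes it easy to do a dry run that only prints the IDs.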

Can anyone recommend a more efficient process? For instance, is it possible to create a workflow step where a labeler or reviewer who encounters an invalid image can move it into a “to delete” status or similar? Then I could pull all data rows in that status via the SDK and bulk delete them.

I think I have found a solution.

  • After the review step, we create a new Workflow task called something like “Invalid Images”.
  • We add a filter to the new task that matches data rows with an issue in the “Invalid Image” category.
  • If an invalid image is encountered, the labeler is instructed to mark it with the appropriate issue type and “Skip” the image.
  • During review, approving an image that is marked with an issue moves it to the new invalid image step; an approved image without an issue moves to Done. The reviewer can also add an “Invalid Image” issue if the labeler missed it.
  • We can then build an automated process that finds every data row in the “Invalid Image” task, logs those images for reprocessing, and bulk deletes them from Labelbox.
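The automated step in the last bullet could be sketched as follows, assuming you have already exported the project's data rows and flattened them into simple dicts with `data_row_id`, `external_id`, and `workflow_task` keys. Those keys, the `Invalid Images` task name, and the `delete_batch` callable are hypothetical stand-ins; how you actually read task membership (for example from a project export) depends on your Labelbox SDK version.

```python
import csv
from typing import Callable, Dict, Iterable, List, TextIO

INVALID_TASK = "Invalid Images"  # hypothetical task name; match your Workflow


def select_invalid(
    records: Iterable[Dict[str, str]], task_name: str = INVALID_TASK
) -> List[Dict[str, str]]:
    """Keep only records whose (assumed) workflow_task field equals task_name."""
    return [r for r in records if r.get("workflow_task") == task_name]


def log_for_reprocessing(rows: List[Dict[str, str]], out: TextIO) -> None:
    """Write the rows to be reprocessed as CSV, ignoring any extra keys."""
    writer = csv.DictWriter(
        out, fieldnames=["data_row_id", "external_id"], extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


def process_invalid(
    records: Iterable[Dict[str, str]],
    out: TextIO,
    delete_batch: Callable[[List[str]], None],
) -> List[str]:
    """Log invalid rows for reprocessing, then delete them; return the deleted IDs."""
    invalid = select_invalid(records)
    log_for_reprocessing(invalid, out)  # log first so nothing is deleted unrecorded
    ids = [r["data_row_id"] for r in invalid]
    if ids:
        delete_batch(ids)
    return ids
```

Writing the log before calling `delete_batch` means that if the deletion fails partway, you still have a record of which scans need to go back through the upstream CT pipeline.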

@mbelltitan Smart idea. The intent is to do as much curation as possible in Catalog, but if you are automating the process via the SDK, I understand that might not be practical.
Thank you for sharing your solution!