We have a group of 10 graders, and we would like all of them to classify our image datasets, which have ground truths associated, so we can get a sense of their performance.
How can we do this without uploading the datasets multiple times, i.e., once for each grader?
To be clear, we do not want to establish ‘consensus’ - we already have ground truths for our images. We just want to evaluate the performance of our graders.
By default, projects are set to the Benchmark quality setting, so every asset is reserved by one labeler.
Consensus would be the way to go here (even though you mentioned you would rather not use it) if you want all 10 of your labelers to be evaluated on the same asset.
Since you have already created labels (ground truths), I would do a prediction import instead, so every labeler starts from the same predictions and is evaluated under the same conditions.
import uuid

import labelbox as lb

# Assumes `client` is an authenticated lb.Client, `project` is an existing
# project, and `label` is a list of prediction payloads for its data rows.

# Upload MAL (model-assisted labeling) predictions to the project
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="mal_job" + str(uuid.uuid4()),
    predictions=label
)
upload_job.wait_until_done()
print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")
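For the `predictions=label` argument, one option is to pass NDJSON-style dictionaries built from your existing ground truths. The sketch below shows a minimal helper for a radio classification; the data row ID, question name, and answer name are placeholders you would replace with values from your own ontology, and the exact payload shape should be checked against the import format your Labelbox version expects.

```python
import uuid

def build_radio_prediction(data_row_id: str, question_name: str, answer_name: str) -> dict:
    """Build one NDJSON-style radio-classification prediction payload.

    All argument values are assumptions for illustration; substitute the
    real data row ID and the feature names from your project's ontology.
    """
    return {
        "uuid": str(uuid.uuid4()),          # unique ID for this annotation
        "dataRow": {"id": data_row_id},     # which image this applies to
        "name": question_name,              # classification name in the ontology
        "answer": {"name": answer_name},    # the ground-truth answer
    }

# Example: one payload per (image, ground-truth) pair from your records.
label = [
    build_radio_prediction("your-data-row-id", "animal_type", "cat"),
]
```

Since the same `label` list is imported once for the whole project, every labeler sees identical pre-labels regardless of who reserves the asset, which is what keeps the evaluation conditions equal.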