"Benchmarks" or "Consensus"? And how to add gold standard labels using Python programmatically

Hi, I have a dataset with gold standard labels. Now I want to annotate it with at least two annotators, calculate the agreement 1) among the annotators and 2) between the annotators (where they agree) and the gold standard labels, and export the data. A few questions here:

  1. First of all, should I pick the "Benchmarks" or "Consensus" project type for this? Technically I could probably do it with either, but I'm wondering which project type is the better choice for my scenario.

  2. If I pick the "Benchmarks" project, how can I mark the gold standard in the project? I know how to do this in the UI, but with thousands of data rows we cannot click "Add to benchmark" one by one. So how can I do it programmatically with the Python SDK?

  3. Let's say I have already specified the gold standard data rows in the "Benchmarks" project (they move to the Done section now). How can I assign at least two annotators to these gold standard data rows for annotation? For "Consensus," when we set up the project, I see there is an option to choose the # of annotators, but I wonder how we can do the same for Benchmarks after specifying the gold standard data rows.

Hi @pedram,

Both quality methods would be valid in your case:

  • Consensus → import the ground truth via the SDK; Consensus will calculate IoU from there. You would not have a visual "winner", but you can filter by creator_id.

  • Benchmark → do the same as for Consensus, but this time you can choose a winner label programmatically with create_benchmark() (ref: Labelbox Python API reference — Python SDK reference 3.59.0 documentation); see the sketch below this list.
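For question 2, here is a minimal sketch of promoting existing labels to benchmarks via the SDK instead of clicking "Add to benchmark" one by one. It assumes SDK 3.x, that the gold-standard labels are already imported into the project, and that they were all created by a single known account; the API key, project ID, and GOLD_LABELER_EMAIL are placeholders you would replace:

```python
import labelbox as lb

client = lb.Client(api_key="YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_ID")

# Placeholder: the account that created the gold-standard labels.
GOLD_LABELER_EMAIL = "gold@example.com"

# Walk every label in the project and mark the gold-standard ones
# as benchmarks for their data rows.
for label in project.labels():
    if label.created_by().email == GOLD_LABELER_EMAIL:
        label.create_benchmark()  # this label becomes the benchmark for its data row
```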

Now, if you need a specific number of labels per data row (2 in your case), Consensus would be the better fit here.
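And for computing agreement against the gold standard outside the UI, here is a rough sketch built on the legacy project.export_labels() JSON export (which in SDK 3.x includes "DataRow ID", "Created By", and "Label" fields). The exact-match comparison is only an illustration; for bounding boxes or masks you would swap in an IoU-based metric:

```python
from collections import defaultdict

import labelbox as lb

client = lb.Client(api_key="YOUR_API_KEY")
project = client.get_project("YOUR_PROJECT_ID")

# Placeholder: the account that created the gold-standard labels.
GOLD_LABELER_EMAIL = "gold@example.com"

# Group exported labels by data row so each gold label can be compared
# against the annotator labels on the same data row.
labels_by_data_row = defaultdict(list)
for row in project.export_labels(download=True):
    labels_by_data_row[row["DataRow ID"]].append(row)

for data_row_id, rows in labels_by_data_row.items():
    gold = [r["Label"] for r in rows if r["Created By"] == GOLD_LABELER_EMAIL]
    others = [r["Label"] for r in rows if r["Created By"] != GOLD_LABELER_EMAIL]
    if gold and others:
        # Exact-match agreement; replace with IoU or another metric as needed.
        matches = sum(label == gold[0] for label in others)
        print(f"{data_row_id}: {matches}/{len(others)} annotators match the gold label")
```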

Hope this helps.

Many thanks,
PT
