How to: Import Annotations as Pre-labels

janny · May 21, 2024, 5:20am

Hello Labelbox Community!

This tutorial walks you through importing annotations as pre-labels, which can significantly simplify your labeling tasks.

Set up environment

Before diving into the specifics, ensure your development environment is ready. You’ll need certain Python libraries installed:

import uuid
from PIL import Image
import requests
import base64
import labelbox as lb
import labelbox.types as lb_types
from io import BytesIO

Next, initialize your Labelbox client with your API key:

api_key = "API_KEY"
client = lb.Client(api_key)

Crafting Supported Annotations

Let’s start by defining the types of annotations you’ll be working with. We’ll cover radio buttons, bounding boxes (BBoxes), and polygons. Each type requires careful attention to naming conventions, especially ensuring they align with your ontology features.

Here’s how to create a radio button annotation, both in Python and in NDJSON format:

# Python annotation
radio_annotation = lb_types.ClassificationAnnotation(
    name="Is it daytime or nighttime?",
    value=lb_types.Radio(answer=lb_types.ClassificationAnswer(
        name="Daytime")))

# NDJSON
radio_annotation_ndjson = {
    "name": "Is it daytime or nighttime?",
    "answer": {
        "name": "Daytime"
    }
}

For bounding box annotations, remember to match the annotation name with your ontology feature’s name:

# Python annotation
bbox_annotation = lb_types.ObjectAnnotation(
    name="Human",  # must match your ontology feature's name
    value=lb_types.Rectangle(
        start=lb_types.Point(x=55.0, y=670.0),
        end=lb_types.Point(x=151.0, y=956.0)
    ))

# NDJSON
bbox_annotation_ndjson = {
    "name": "Human",
    "bbox": {
        "top": 670.0,
        "left": 55.0,
        "height": 286.0,
        "width": 96.0
    }
}
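The NDJSON box above is the same rectangle as the Python one, expressed as an origin plus a size instead of two corners. A minimal sketch of that conversion, assuming the corners are plain (x, y) tuples; `corners_to_ndjson_bbox` is a hypothetical helper for illustration, not part of the Labelbox SDK:

```python
def corners_to_ndjson_bbox(name, start, end):
    """Convert two opposite corners (x, y) into an NDJSON-style bbox dict.

    Hypothetical helper for illustration; not part of the Labelbox SDK.
    """
    (x0, y0), (x1, y1) = start, end
    # Use min/abs so the result is the same whichever corner comes first.
    left, top = min(x0, x1), min(y0, y1)
    return {
        "name": name,  # must match your ontology feature's name
        "bbox": {
            "top": top,
            "left": left,
            "height": abs(y1 - y0),
            "width": abs(x1 - x0),
        },
    }


bbox_from_corners = corners_to_ndjson_bbox("Human", (55.0, 670.0), (151.0, 956.0))
```

With the corner values from the Python annotation, this yields the same top, left, height and width as the hand-written NDJSON.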

Polygon annotations require specifying multiple vertices:

# Python annotation
polygon_annotation = lb_types.ObjectAnnotation(
    name="Shopping store",  # must match your ontology feature's name
    value=lb_types.Polygon(  # Coordinates for the vertices of your polygon
        points=[
            lb_types.Point(x=1.2, y=1.2),
            lb_types.Point(x=380.4, y=0.0),
            lb_types.Point(x=568.8, y=426.0),
            lb_types.Point(x=562.8, y=692.4),
            lb_types.Point(x=0.0, y=889.2),
            lb_types.Point(x=1.2, y=1.2)
        ]))

# NDJSON
polygon_annotation_ndjson = {
  "name": "Shopping store",
  "polygon": [
    {"x": 1.2, "y": 1.2},
    {"x": 380.4, "y": 0.0},
    {"x": 568.8, "y": 426.0},
    {"x": 562.8, "y": 692.4},
    {"x": 0.0, "y": 889.2},
    {"x": 1.2, "y": 1.2}
  ]
}
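If your vertices come from another tool as plain coordinate pairs, the NDJSON polygon can be built in a few lines. This is only a sketch: `points_to_ndjson_polygon` is a hypothetical helper, and repeating the first vertex at the end simply mirrors the example above (this post does not say whether the closing vertex is required).

```python
def points_to_ndjson_polygon(name, points):
    """Build an NDJSON-style polygon dict from (x, y) tuples.

    Hypothetical helper; appends the first vertex at the end if it is
    missing, mirroring the hand-written example.
    """
    if len(points) < 3:
        raise ValueError("a polygon needs at least three vertices")
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    return {"name": name, "polygon": [{"x": x, "y": y} for x, y in ring]}


store_polygon = points_to_ndjson_polygon(
    "Shopping store",
    [(1.2, 1.2), (380.4, 0.0), (568.8, 426.0), (562.8, 692.4), (0.0, 889.2)],
)
```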

Importing Data Rows into Catalog

Now, let’s move on to importing data rows into your catalog. This involves creating a dataset, uploading an image, and handling potential errors:

# create a data row for a sample image; it is sent to the project as a batch later
global_key = "stanford-test-image"

test_img_url = {
    "row_data":
        "https://labelbox-jannybucket.s3.us-west-2.amazonaws.com/stanford-shopping-center-06.jpg",
    "global_key":
        global_key
}

dataset = client.create_dataset(name="stanford-demo-dataset")
task = dataset.create_data_rows([test_img_url])
task.wait_till_done()

print(f"Failed data rows: {task.failed_data_rows}")
print(f"Errors: {task.errors}")

if task.errors:
    for error in task.errors:
        if 'Duplicate global key' in error['message'] and dataset.row_count == 0:
            # If the global key already exists in the workspace, the dataset is created empty, so we can delete it.
            print(f"Deleting empty dataset: {dataset}")
            dataset.delete()
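The duplicate-key check can be pulled into a small function so it is easy to test without touching your workspace. A sketch, assuming each error is a dict with a "message" string as in the snippet above; the helper name and the sample message are made up for illustration:

```python
def has_duplicate_global_key(errors):
    """Return True if any error message reports a duplicate global key.

    Assumes each error is a dict with a "message" string; entries
    without a message are ignored.
    """
    return any(
        "Duplicate global key" in (error.get("message") or "")
        for error in errors or []
    )


# Made-up sample data, shaped like the errors handled above.
sample_errors = [{"message": "Duplicate global key: 'stanford-test-image'"}]
duplicate_found = has_duplicate_global_key(sample_errors)
```

You could then delete the dataset only when this returns True and the dataset has no rows.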

Create or select an ontology. Ensure your project has the correct ontology set up, matching the tool names and classification instructions with your annotations:

ontology_builder = lb.OntologyBuilder(
    classifications=[  # List of Classification objects
        lb.Classification(class_type=lb.Classification.Type.RADIO,
                          name="Is it daytime or nighttime?",
                          options=[
                              lb.Option(value="Daytime"),
                              lb.Option(value="Nighttime")
                          ]),
    ],
    tools=[  # List of Tool objects
        lb.Tool(tool=lb.Tool.Type.BBOX, name="Human"),
        lb.Tool(tool=lb.Tool.Type.POLYGON, name="Shopping store"),
    ])

ontology = client.create_ontology("stanford-test-ontology",
                                  ontology_builder.asdict(),
                                  media_type=lb.MediaType.Image
                                  )
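Since annotation names have to line up with the ontology, a quick check before uploading can catch mismatches. A minimal sketch using plain strings; `unmatched_annotation_names` is a hypothetical helper, and it assumes an exact, case-sensitive comparison, which is the strictest reading:

```python
def unmatched_annotation_names(annotations, ontology_names):
    """Return the sorted annotation names that are absent from the ontology."""
    known = set(ontology_names)
    return sorted({annotation["name"] for annotation in annotations} - known)


ontology_names = ["Is it daytime or nighttime?", "Human", "Shopping store"]

# Made-up annotations; note the capital "S" in "Shopping Store".
sample_annotations = [
    {"name": "Is it daytime or nighttime?"},
    {"name": "Human"},
    {"name": "Shopping Store"},
]

missing = unmatched_annotation_names(sample_annotations, ontology_names)
```

Here `missing` contains only the misspelled "Shopping Store", which is worth fixing before the import.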

Create a labeling project.

# Without extra arguments, the project defaults to batch mode with benchmark quality settings
project = client.create_project(name="stanford demo",
                                media_type=lb.MediaType.Image)

project.setup_editor(ontology)

Send the data rows to the project as a batch:

batch = project.create_batch(
    "stanford-demo-batch",  # each batch in a project must have a unique name
    global_keys=[
        global_key
    ],  # paginated collection of data row objects, list of data row ids or global keys
    priority=1  # priority between 1(highest) - 5(lowest)
)

print(f"Batch: {batch}")

Creating Annotations Payload

Both Python and NDJSON formats are supported for annotations:

Python annotations

label = []
annotations = [
    radio_annotation,
    bbox_annotation,
    polygon_annotation,
]

label.append(
    lb_types.Label(data={"global_key" : global_key},
                   annotations=annotations))

NDJSON annotations

label_ndjson = []
annotations = [
    radio_annotation_ndjson,
    bbox_annotation_ndjson,
    polygon_annotation_ndjson,
]
for annotation in annotations:
    annotation.update({
        "dataRow": {
            "globalKey": global_key
        }
    })
    label_ndjson.append(annotation)
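Note that `annotation.update(...)` above modifies the original dictionaries in place. If you want to reuse the same annotation dicts for several data rows, a non-mutating variant may be safer. A sketch with hypothetical helper names, which also shows what "NDJSON" means on disk (one JSON object per line):

```python
import json


def attach_data_row(annotations, global_key):
    """Return copies of the annotations with a dataRow reference added."""
    return [
        {**annotation, "dataRow": {"globalKey": global_key}}
        for annotation in annotations
    ]


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)


source = [
    {
        "name": "Human",
        "bbox": {"top": 670.0, "left": 55.0, "height": 286.0, "width": 96.0},
    }
]
payload = attach_data_row(source, "stanford-test-image")
ndjson_text = to_ndjson(payload)
```

The `source` list is left untouched, so it can be passed to `attach_data_row` again with a different global key.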

Uploading Annotations as Prelabels

Finally, upload your annotations to the project:

# upload MAL labels for this data row in project
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name="mal_job" + str(uuid.uuid4()),
    predictions=label
)
upload_job.wait_until_done()

print(f"Errors: {upload_job.errors}")
print(f"Status of uploads: {upload_job.statuses}")
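For larger imports, printing every status gets noisy, and a count per status is easier to read. A sketch under the assumption that each entry in `upload_job.statuses` is a dict with a "status" key; the sample data below is made up, so check the shape of your own output first:

```python
from collections import Counter


def summarize_statuses(statuses):
    """Count entries per status value; entries without one count as UNKNOWN."""
    return Counter(entry.get("status", "UNKNOWN") for entry in statuses)


# Made-up sample data; the real entries may carry more fields.
sample_statuses = [
    {"status": "SUCCESS"},
    {"status": "SUCCESS"},
    {"status": "FAILURE"},
]
summary = summarize_statuses(sample_statuses)
```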

Good to note:

Pre-label (aka model-assisted labeling, MAL): for assets (data rows) that do not yet have a label (ground truth).
Ground truth (GT): creates a label on a given data row.

With this guide, you’re well on your way to enhancing your labeling process with pre-labeled annotations on Labelbox. Happy annotating!

Try it out yourself!

Import Image Annotations
Colab Notebook

b.combs · May 21, 2024, 11:27am

@janny love this tutorial. I’ve been practicing my Python skills and working in my terminal in general so that I can get better at this type of work in Labelbox.

Quick question, does this pre-labeling also work for text (named entity recognition)?

JT_V · May 21, 2024, 3:27pm

Yes!

Here is a guide that can walk you through NER MAL!

b.combs · May 21, 2024, 4:08pm

Sweet, looking forward to giving this a shot this week.

ncotoni · May 22, 2024, 11:40am

Hello!
I have a fairly specific question that I couldn’t find an answer to in the documentation.
Janny shared a code snippet showing how to create annotations with the API based on a global_key.

janny:

Creating Annotations Payload

Both Python and NDJSON formats are supported for annotations:

Python annotations
label = []
annotations = [
    radio_annotation,
    bbox_annotation,
    polygon_annotation,
]

label.append(
    lb_types.Label(data=lb_types.ImageData(global_key=global_key),
                   annotations=annotations))

I wonder if it is possible to be more specific about which data row the new annotation applies to.
In my Labelbox project, some images have two annotations.
For example:

How can I add an additional annotation of any feature to an already existing annotation?

ptancre · May 22, 2024, 1:02pm

Hey @ncotoni ,

You could use the data row ID if you prefer:

    # The data row has a label: the collection of all the bbox objects
    labels_list.append(
        lb_types.Label(
            data=lb_types.ImageData(uid=dr.uid),
            annotations=obj_list
        )
    )

Once created, pre-labels cannot be amended programmatically. However, you can amend them in the labeling editor.

ncotoni · May 22, 2024, 1:20pm

Hi @ptancre, thank you for your answer.
I’m currently working with uid rather than the global_key I quoted, and it doesn’t let me choose which annotations to modify while uploading ground truth.
Pre-labels can’t be the solution I am looking for, because they would force annotators to review the entire dataset to accept those pre-labels as labels.

ptancre · May 22, 2024, 2:14pm

I think there might be some confusion that we want to address here:

Pre-label (aka model-assisted labeling, MAL): for assets (data rows) that do not yet have a label (ground truth).
Ground truth (GT): creates a label on a given data row.

It seems you are importing GT here and, as mentioned before, if you send an import again you are going to create duplicate labels.

I’m curious what your current use case is. You mention annotators: are they expected to review the GT you are sending?

ncotoni · May 22, 2024, 2:51pm

I didn’t know that importing GT would create duplicate labels.

With my annotators, we need to add a marker to the ontology, when a combination of points is present or another is absent.

I wanted to do it directly with the Labelbox API by uploading GTs depending on the presence or absence of a certain point.
But, as you pointed out, this will duplicate the labels instead of modifying them by adding only this marker.

ptancre · May 22, 2024, 3:34pm

Interesting. You could use GT if you don’t want your annotators to go through the whole project, but a review would still be needed since the import will be biased.

You still gain time because no annotations are made from scratch, but this would be error-prone (if a review is missed) and can alter the quality of your model d
A small erratum: the previous code block I provided contains a method that will be deprecated. We have actually simplified the format:

labels.append(
    lb_types.Label(
        data={"uid": data_row_ids},
        annotations=annotations))

You no longer need to specify the data type.

ncotoni · May 23, 2024, 2:44pm

Uploading as GT was the solution; it worked. I now have duplicate labels, but since I only save the latest one in my workflow, everything is running smoothly.
Thank you.
