How to: Clone a Project and Copy Data Rows and Labels in Labelbox

smutta · June 14, 2024, 6:08am

Hello Labelbox Community!

In this post, we’ll guide you through the process of cloning an existing project and copying data rows along with their labels to a new project. This can be very useful when you need to duplicate a project setup and its data for different tasks or teams. Let’s get started!

Prerequisites

Before you start, make sure you have:

A Labelbox API key.
The Labelbox Python SDK installed. If it is not installed, install it using:

%pip install -q --upgrade "labelbox[data]"

Step-by-Step Guide

1. Set Up Your Labelbox Client

First, initialize your Labelbox client with your API key.

import labelbox as lb

API_KEY = 'YOUR_API_KEY_HERE'
client = lb.Client(api_key=API_KEY)

2. Clone the Source Project

Use the clone method to duplicate your existing project.

# Get the source project using its ID
source_project = client.get_project("YOUR_SOURCE_PROJECT_ID")

# Clone the source project
destination_project = source_project.clone()
print(f"Cloned project ID: {destination_project.uid}")

3. Copy Data Rows and Labels

To copy data rows and labels from the source project to the destination project, call the client.send_to_annotate_from_catalog method on your Labelbox client.

Note: Send to Annotate does not currently support consensus projects.

Parameters: The options for sending data rows with labels to the destination project are passed in a Python dictionary, and most of them are optional. At a minimum, source_project_id must be provided:

annotations_ontology_mapping: A dictionary that maps the source project’s ontology feature schema IDs to the destination project’s ontology feature schema IDs. If left empty, only the data rows are sent to the destination project, without their labels.

{"<source_feature_schema_id>": "<destination_feature_schema_id>"}

override_existing_annotations_rule: The strategy for handling conflicts in classifications between data rows that already exist in the destination project and incoming labels from the source project.

Defaults to ConflictResolutionStrategy.KeepExisting. Options include:

ConflictResolutionStrategy.KeepExisting
ConflictResolutionStrategy.OverrideWithPredictions
ConflictResolutionStrategy.OverrideWithAnnotations

from labelbox.schema.conflict_resolution_strategy import ConflictResolutionStrategy

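send_to_annotate_params = {
    "source_project_id": source_project.uid,
    # annotation_ontology_mapping must be defined beforehand as
    # {"<source_feature_schema_id>": "<destination_feature_schema_id>"}
    "annotations_ontology_mapping": annotation_ontology_mapping,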
    "exclude_data_rows_in_project": False,
    "override_existing_annotations_rule": ConflictResolutionStrategy.OverrideWithPredictions,
    "batch_priority": 5,
}

# Get task queue ID for manual review
queue_id = [queue.uid for queue in destination_project.task_queues() if queue.queue_type == "MANUAL_REVIEW_QUEUE"][0]

task = client.send_to_annotate_from_catalog(
    destination_project_id=destination_project.uid,
    task_queue_id=queue_id,  # ID of workflow task, set ID to None if you want to send data rows with labels to the Done queue.
    batch_name="Clone Demo Batch",
    data_rows=lb.GlobalKeys(global_keys),  # global_keys: a list of global keys from the source project, defined beforehand
    params=send_to_annotate_params
)

task.wait_till_done()

print(f"Errors: {task.errors}")
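
The call above assumes a global_keys list already exists. One way to build it is from a project export. The following is a minimal sketch rather than an official method: it assumes each exported row is a dictionary with a data_row entry holding id and global_key, which should be checked against your own export. The helper name collect_identifiers and the sample rows are hypothetical.

```python
from typing import Iterable, List, Tuple


def collect_identifiers(export_rows: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """Return (data_row_ids, global_keys) found in exported rows.

    Assumes each row looks like {"data_row": {"id": ..., "global_key": ...}}.
    Rows without a global key still contribute their data row ID.
    """
    data_row_ids: List[str] = []
    global_keys: List[str] = []
    for row in export_rows:
        data_row = row.get("data_row") or {}
        if data_row.get("id"):
            data_row_ids.append(data_row["id"])
        if data_row.get("global_key"):
            global_keys.append(data_row["global_key"])
    return data_row_ids, global_keys


# Hypothetical rows standing in for a real export.
sample_rows = [
    {"data_row": {"id": "dr_1", "global_key": "image-001"}},
    {"data_row": {"id": "dr_2", "global_key": None}},
]
data_row_ids, global_keys = collect_identifiers(sample_rows)
print(data_row_ids)
print(global_keys)
```

The resulting lists can then be wrapped as lb.GlobalKeys(global_keys) or lb.DataRowIds(data_row_ids) when calling send_to_annotate_from_catalog.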

Conclusion

You have successfully cloned a project and copied data rows along with their labels to a new project. This approach ensures that you can easily replicate project setups and data for various purposes. If you have any questions or run into any issues, feel free to reach out to the community for support.

Happy labeling!

PT · September 11, 2024, 1:32pm

Here is an example that uses data row IDs as the data row identifier and maps each feature schema ID in the source project to the same feature schema ID in the destination project.

You can retrieve the feature schema IDs from onto_normalized, which holds the ontology as JSON.
You can retrieve data row IDs (or global keys) from an export.

import labelbox as lb
from labelbox.schema.conflict_resolution_strategy import ConflictResolutionStrategy

API_KEY = None  # Replace with your API key
PROJECT_ID = 'clzscun5p07oh07338qo05497'
client = lb.Client(api_key=API_KEY)

project = client.get_project(PROJECT_ID)
clone_project = project.clone()

project_ontology = project.ontology()
onto_normalized = client.get_ontology(project_ontology.uid).normalized
# Feature schema ID of the first tool in the ontology
print(onto_normalized['tools'][0]['featureSchemaId'])

# For a 1:1 copy, map each feature schema ID in the source project to the same feature schema ID in the destination project

annotation_ontology_mapping = {"clk8ru1f8099u07yoejo913vi" : "clk8ru1f8099u07yoejo913vi"}
data_row_ids = ['clz1cvi7w02i00734bx07tdui']

send_to_annotate_params = {
    "source_project_id": project.uid,
    "annotations_ontology_mapping": annotation_ontology_mapping,
    "exclude_data_rows_in_project": False,
    "override_existing_annotations_rule": ConflictResolutionStrategy.OverrideWithPredictions,
    "batch_priority": 5,
}

# Get the ID of the workflow task queue you want to send data rows to. If they are sent to the initial labeling queue, the labels will be pre-labels.
#queue_id = [queue.uid for queue in clone_project.task_queues() if queue.queue_type == "MANUAL_REVIEW_QUEUE" ][0]

task = client.send_to_annotate_from_catalog(
    destination_project_id=clone_project.uid,
    task_queue_id=None, # ID of workflow task, set ID to None if you want to send data rows with labels to the Done queue.
    batch_name="Prediction Import Demo Batch",
    data_rows=lb.DataRowIds(
        data_row_ids # Provide a list of data row IDs from the source project
    ),
    params=send_to_annotate_params
)

task.wait_till_done()

print(f"Errors: {task.errors}")
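
Writing the mapping by hand gets tedious for larger ontologies. Because the mapping above pairs each feature schema ID with itself, it can be generated from the normalized ontology. The sketch below is an illustration under assumptions, not an SDK feature: it assumes the normalized ontology is a nested dictionary in which tools, classifications, and options each carry a featureSchemaId key (only onto_normalized['tools'][0]['featureSchemaId'] is confirmed by the code above). Whether nested option IDs need to appear in the mapping is not stated in this thread, so including them is also an assumption. The helper name identity_mapping and the sample ontology are hypothetical.

```python
from typing import Any, Dict


def identity_mapping(node: Any) -> Dict[str, str]:
    """Map every featureSchemaId found anywhere under `node` to itself."""
    mapping: Dict[str, str] = {}
    if isinstance(node, dict):
        schema_id = node.get("featureSchemaId")
        if isinstance(schema_id, str) and schema_id:
            mapping[schema_id] = schema_id
        # Recurse into nested classifications, options, and so on.
        for value in node.values():
            mapping.update(identity_mapping(value))
    elif isinstance(node, list):
        for item in node:
            mapping.update(identity_mapping(item))
    return mapping


# Hypothetical normalized ontology; a real one would come from `onto_normalized`.
sample_ontology = {
    "tools": [
        {
            "name": "box",
            "featureSchemaId": "tool_1",
            "classifications": [
                {
                    "name": "occluded",
                    "featureSchemaId": "cls_1",
                    "options": [{"value": "yes", "featureSchemaId": "opt_1"}],
                },
            ],
        },
    ],
    "classifications": [
        {"name": "quality", "featureSchemaId": "cls_2", "options": []},
    ],
}
annotation_ontology_mapping = identity_mapping(sample_ontology)
print(annotation_ontology_mapping)
```

With a real ontology, identity_mapping(onto_normalized) would take the place of the hand-written dictionary passed as annotations_ontology_mapping.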
