How to: Clone a Project and Copy Data Rows and Labels in Labelbox

:books: Hello Labelbox Community! :tada:

In this post, we’ll guide you through the process of cloning an existing project and copying data rows along with their labels to a new project. This can be very useful when you need to duplicate a project setup and its data for different tasks or teams. Let’s get started! :rocket:

Prerequisites :clipboard:

Before you start, make sure you have:

  • Labelbox API key.
  • Labelbox Python SDK installed. If not, install it using:
%pip install -q --upgrade "labelbox[data]"

Step-by-Step Guide :memo:

1. Set Up Your Labelbox Client :key:

First, initialize your Labelbox client with your API key.

import labelbox as lb

API_KEY = 'YOUR_API_KEY_HERE'
client = lb.Client(api_key=API_KEY)
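
If you would rather not paste the key into a notebook, one option is to read it from an environment variable. This is only a minimal sketch: load_api_key is a hypothetical helper, and the variable name LABELBOX_API_KEY is my assumption, so use whatever name fits your setup.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None,
                 var_name: str = "LABELBOX_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    `var_name` is an assumed name, not something the original post prescribes.
    """
    env = os.environ if env is None else env
    api_key = env.get(var_name, "").strip()
    if not api_key:
        # Fail early with a readable message instead of a late authentication error.
        raise RuntimeError(f"Set the {var_name} environment variable before creating the client.")
    return api_key


# Usage (needs the labelbox package and a real key):
# client = lb.Client(api_key=load_api_key())
```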

2. Clone the Source Project :arrows_counterclockwise:

Use the clone method to duplicate your existing project.

# Get the source project using its ID
source_project = client.get_project("YOUR_SOURCE_PROJECT_ID")

# Clone the source project
destination_project = source_project.clone()
print(f"Cloned project ID: {destination_project.uid}")

3. Copy Data Rows and Labels :page_facing_up::arrow_right::page_facing_up:

To copy data rows and their labels from the source project into the destination project, call the client.send_to_annotate_from_catalog method on your Labelbox client.

:no_entry_sign: Note: Send to Annotate does not currently support consensus projects.

Parameters: The options for sending data rows with labels to the destination project are passed as a Python dictionary. Only source_project_id is required; the other keys are optional:

  • annotations_ontology_mapping: A dictionary that maps the source project’s ontology feature schema IDs to the destination project’s ontology feature schema IDs. If left empty, the data rows are sent to the destination project without labels.
{"<source_feature_schema_id>": "<destination_feature_schema_id>"}
  • override_existing_annotations_rule: The strategy for handling classification conflicts between data rows that already exist in the destination project and incoming labels from the source project.

Defaults to ConflictResolutionStrategy.KeepExisting. Options include:

  • ConflictResolutionStrategy.KeepExisting
  • ConflictResolutionStrategy.OverrideWithPredictions
  • ConflictResolutionStrategy.OverrideWithAnnotations
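
For a cloned project, the ontology mapping usually pairs each feature schema ID with itself, so it can be generated rather than typed by hand. The sketch below walks a normalized ontology dictionary (the structure returned by an ontology's normalized property) and collects every featureSchemaId it finds. The helper names are hypothetical, and I have not confirmed whether nested classification and option IDs must appear in the mapping; trim the result if you only want top-level features.

```python
from typing import Any, Dict, Iterator


def iter_feature_schema_ids(node: Any) -> Iterator[str]:
    """Yield every 'featureSchemaId' string found anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        schema_id = node.get("featureSchemaId")
        if isinstance(schema_id, str):
            yield schema_id
        for value in node.values():
            yield from iter_feature_schema_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_feature_schema_ids(item)


def identity_ontology_mapping(normalized: Dict[str, Any]) -> Dict[str, str]:
    """Map each feature schema ID to itself (source ID -> identical destination ID)."""
    return {schema_id: schema_id for schema_id in iter_feature_schema_ids(normalized)}


# Usage (assumes `source_project` from the earlier step):
# annotation_ontology_mapping = identity_ontology_mapping(source_project.ontology().normalized)
```

The snippet that follows shows where the mapping is passed in.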
from labelbox.schema.conflict_resolution_strategy import ConflictResolutionStrategy

send_to_annotate_params = {
    "source_project_id": source_project.uid,
    "annotations_ontology_mapping": annotation_ontology_mapping,  # dict of source -> destination feature schema IDs, defined by you
    "exclude_data_rows_in_project": False,
    "override_existing_annotations_rule": ConflictResolutionStrategy.OverrideWithPredictions,
    "batch_priority": 5,
}

# Get task queue ID for manual review
queue_id = [queue.uid for queue in destination_project.task_queues() if queue.queue_type == "MANUAL_REVIEW_QUEUE"][0]

task = client.send_to_annotate_from_catalog(
    destination_project_id=destination_project.uid,
    task_queue_id=queue_id,  # ID of workflow task, set ID to None if you want to send data rows with labels to the Done queue.
    batch_name="Clone Demo Batch",
    data_rows=lb.GlobalKeys(global_keys),  # global_keys: list of global keys from the source project, defined by you
    params=send_to_annotate_params
)

task.wait_till_done()

print(f"Errors: {task.errors}")
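
One caveat about the queue lookup above: indexing with [0] raises a bare IndexError when the project has no queue of the requested type. A slightly more defensive version could look like the sketch below. find_queue_id is a hypothetical helper, and it only relies on the uid and queue_type attributes already used in the snippet.

```python
from typing import Any, Iterable


def find_queue_id(queues: Iterable[Any], queue_type: str) -> str:
    """Return the uid of the first queue whose queue_type matches.

    Raises LookupError rather than returning None, because None has a
    meaning of its own for task_queue_id (send straight to the Done queue).
    """
    queues = list(queues)
    for queue in queues:
        if queue.queue_type == queue_type:
            return queue.uid
    available = sorted({queue.queue_type for queue in queues})
    raise LookupError(f"No task queue of type {queue_type!r}; available types: {available}")


# Usage (assumes `destination_project` from the earlier step):
# queue_id = find_queue_id(destination_project.task_queues(), "MANUAL_REVIEW_QUEUE")
```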

Conclusion :dart:

You have successfully cloned a project and copied data rows along with their labels to a new project. This approach ensures that you can easily replicate project setups and data for various purposes. If you have any questions or run into any issues, feel free to reach out to the community for support.

Happy labeling! :tada:

Here is an example that identifies data rows by their data row IDs and maps each feature schema ID in the source project to the same feature schema ID in the destination project.

  • You can retrieve the feature schema IDs from onto_normalized (the ontology as a JSON-style dictionary)
  • You can retrieve data row IDs (or global keys) from an export
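
To turn export output into the identifier lists used below, something like this sketch may help. It assumes each exported row is a dictionary with a "data_row" entry holding "id" and, optionally, "global_key". That layout is my assumption about the export JSON, so check it against your own export before relying on it. collect_identifiers is a hypothetical helper name.

```python
from typing import Any, Dict, Iterable, List, Tuple


def collect_identifiers(rows: Iterable[Dict[str, Any]]) -> Tuple[List[str], List[str]]:
    """Split exported rows into (data_row_ids, global_keys).

    Assumes each row looks like {"data_row": {"id": ..., "global_key": ...}}.
    Rows that lack a field are skipped for that list.
    """
    data_row_ids: List[str] = []
    global_keys: List[str] = []
    for row in rows:
        data_row = row.get("data_row") or {}
        if data_row.get("id"):
            data_row_ids.append(data_row["id"])
        if data_row.get("global_key"):
            global_keys.append(data_row["global_key"])
    return data_row_ids, global_keys


# Usage: pass the parsed rows of an export, then wrap the result in
# lb.DataRowIds(data_row_ids) or lb.GlobalKeys(global_keys).
```

The full example follows.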
import labelbox as lb
from labelbox.schema.conflict_resolution_strategy import ConflictResolutionStrategy

API_KEY = None
PROJECT_ID = 'clzscun5p07oh07338qo05497'
client = lb.Client(api_key=API_KEY)

project = client.get_project(PROJECT_ID)
clone_project = project.clone()

project_ontology = project.ontology()
onto_normalized = client.get_ontology(project_ontology.uid).normalized
print(onto_normalized['tools'][0]['featureSchemaId'])  # feature schema ID of the first tool

# For a 1:1 copy, map each source feature schema ID to the same feature schema ID in the destination project

annotation_ontology_mapping = {"clk8ru1f8099u07yoejo913vi" : "clk8ru1f8099u07yoejo913vi"}
data_row_ids = ['clz1cvi7w02i00734bx07tdui']

send_to_annotate_params = {
    "source_project_id": project.uid,
    "annotations_ontology_mapping": annotation_ontology_mapping,
    "exclude_data_rows_in_project": False,
    "override_existing_annotations_rule": ConflictResolutionStrategy.OverrideWithPredictions,
    "batch_priority": 5,
}

# Get the ID of the workflow task queue you want to send data rows to. If they are sent to the initial labeling queue, the labels become pre-labels.
# queue_id = [queue.uid for queue in clone_project.task_queues() if queue.queue_type == "MANUAL_REVIEW_QUEUE"][0]

task = client.send_to_annotate_from_catalog(
    destination_project_id=clone_project.uid,
    task_queue_id=None,  # ID of workflow task; None sends data rows with labels to the Done queue
    batch_name="Prediction Import Demo Batch",
    data_rows=lb.DataRowIds(
        data_row_ids  # Provide a list of data row IDs from the source project
    ),
    params=send_to_annotate_params
)

task.wait_till_done()

print(f"Errors: {task.errors}")
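
Printing the errors is fine in a notebook, but in a script you may want the run to stop when something went wrong. A minimal sketch, assuming only the task.errors attribute used above (ensure_task_succeeded is a hypothetical helper):

```python
from typing import Any


def ensure_task_succeeded(task: Any) -> None:
    """Raise if the finished task reports any errors; do nothing otherwise."""
    errors = getattr(task, "errors", None)
    if errors:
        raise RuntimeError(f"Send to Annotate reported errors: {errors}")


# Usage, after task.wait_till_done():
# ensure_task_succeeded(task)
```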