How to: Update Datasets Integration

This guide walks you through updating all datasets that use a particular IAM integration so that they use a different one. This is useful when an IAM integration is going to be modified and every dataset connected to it needs to be switched over.

Steps Overview

  1. Retrieve the Source IAM integration object by its name or known ID.
  2. Retrieve the Target IAM integration object by its name or known ID.
  3. Retrieve the datasets connected to the Source IAM integration from step 1.
  4. Update the IAM integration of all the datasets from step 3 using add_iam_integration and the Target IAM integration from step 2.
  5. Combine the steps above into a full script.

Prerequisites

  • Ensure you have the Labelbox Python SDK installed (3.72.0 or above).
  • Obtain your API key from your Labelbox account settings (see the Labelbox documentation).
  • Have the name or ID of both the Source and Target IAM integrations you intend to update.

Step 1: Retrieve the IAM Integration Object Source

First, we need to retrieve the IAM integration object using its name or ID. Here’s how you can do it:

import labelbox as lb

# Initialize the Labelbox client
client = lb.Client(api_key="YOUR_API_KEY")

# Retrieve the organization object
organization = client.get_organization()

# Retrieve IAM integration Source by name (only if name is unique)
iam_integrations = organization.get_iam_integrations()
source_iam_integration = next(iam for iam in iam_integrations if iam.name == "SOURCE_IAM_NAME")

# Alternatively, retrieve IAM integration Source by ID
# source_iam_integration = next(iam for iam in iam_integrations if iam.uid == "SOURCE_IAM_ID")

print(f"Retrieved Source IAM Integration: {source_iam_integration}")
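
The `next(...)` lookup above raises a bare `StopIteration` when nothing matches and silently picks the first match when a name is shared. A minimal sketch of a stricter lookup follows. `find_integration` is a hypothetical helper, not part of the Labelbox SDK, and it only assumes each object exposes `.name` and `.uid`, so it runs here against stand-in objects:

```python
from types import SimpleNamespace


def find_integration(integrations, name=None, uid=None):
    """Return the single integration matching `uid` or `name`.

    Hypothetical helper: it only relies on `.name` and `.uid` attributes.
    """
    if (name is None) == (uid is None):
        raise ValueError("Pass exactly one of name or uid")
    if uid is not None:
        matches = [iam for iam in integrations if iam.uid == uid]
    else:
        matches = [iam for iam in integrations if iam.name == name]
    if not matches:
        raise LookupError(f"No IAM integration matches name={name!r}, uid={uid!r}")
    if len(matches) > 1:
        raise LookupError(
            f"{len(matches)} IAM integrations share name={name!r}; use the ID instead"
        )
    return matches[0]


# Stand-in objects so the sketch runs without an API key.
sample_integrations = [
    SimpleNamespace(uid="iam-1", name="aws-prod"),
    SimpleNamespace(uid="iam-2", name="aws-staging"),
    SimpleNamespace(uid="iam-3", name="aws-staging"),
]
found = find_integration(sample_integrations, name="aws-prod")
print(found.uid)
```

With the real client, the same helper could presumably be called as `find_integration(organization.get_iam_integrations(), name="SOURCE_IAM_NAME")`.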

Step 2: Retrieve the IAM Integration Object Target

Similarly, retrieve the IAM integration object for the target IAM integration using its name or ID:

# Retrieve IAM integration Target by name (only if name is unique)
target_iam_integration = next(iam for iam in iam_integrations if iam.name == "TARGET_IAM_NAME")

# Alternatively, retrieve IAM integration Target by ID
# target_iam_integration = next(iam for iam in iam_integrations if iam.uid == "TARGET_IAM_ID")

print(f"Retrieved Target IAM Integration: {target_iam_integration}")

Step 3: Retrieve the Datasets Connected to the IAM Integration Source

Next, we need to find all datasets that are connected to the retrieved Source IAM integration. Note that this step may take a significant amount of time, because it checks the IAM integration of every dataset returned by client.get_datasets():

# Fetch all datasets
datasets = client.get_datasets()

# Filter datasets connected to the Source IAM integration
connected_datasets = [
    dataset for dataset in datasets 
    if dataset.iam_integration() and dataset.iam_integration().uid == source_iam_integration.uid
]

# Optional - List all connected datasets
# for dataset in connected_datasets:
#     print(f"Dataset ID: {dataset.uid}, Dataset Name: {dataset.name}")
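
The filter above calls `dataset.iam_integration()` twice per dataset. Assuming each call triggers its own lookup (an assumption about SDK behavior, not something stated in this guide), fetching the value once per dataset may shorten this step. The sketch below shows that pattern; `datasets_using` and `FakeDataset` are hypothetical names, and the stand-in class mimics only the pieces the filter touches:

```python
from types import SimpleNamespace


def datasets_using(datasets, integration_uid):
    """Return datasets whose current IAM integration has the given uid."""
    connected = []
    for dataset in datasets:
        current = dataset.iam_integration()  # one lookup per dataset
        if current is not None and current.uid == integration_uid:
            connected.append(dataset)
    return connected


class FakeDataset:
    """Stand-in that mimics only what this sketch uses."""

    def __init__(self, uid, integration_uid):
        self.uid = uid
        self._integration_uid = integration_uid
        self.lookups = 0  # counts how often iam_integration() was called

    def iam_integration(self):
        self.lookups += 1
        if self._integration_uid is None:
            return None
        return SimpleNamespace(uid=self._integration_uid)


sample = [
    FakeDataset("ds-1", "iam-1"),
    FakeDataset("ds-2", None),
    FakeDataset("ds-3", "iam-2"),
    FakeDataset("ds-4", "iam-1"),
]
connected = datasets_using(sample, "iam-1")
print([d.uid for d in connected])
```

With the real client, the call would presumably look like `datasets_using(client.get_datasets(), source_iam_integration.uid)`.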

Step 4: Update the IAM Integration for All Datasets

Finally, we will update the IAM integration for all datasets retrieved in step 3 using the add_iam_integration method with the Target IAM integration. Note that this step may take a significant amount of time when many datasets are connected:

# Update the IAM integration for each dataset
for dataset in connected_datasets:
    try:
        dataset.add_iam_integration(target_iam_integration)
    except lb.exceptions.LabelboxError as e:
        # Report the failure and continue with the remaining datasets
        print(f"Can't update integration for Dataset ID {dataset.uid}: {e}")
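
Because this loop changes every connected dataset, it can help to preview the changes first and to keep a list of failures to retry. The sketch below is one way to do that, under stated assumptions: `switch_integration` and `FakeDataset` are hypothetical names, the stand-in only mimics `uid` and `add_iam_integration`, and the broad `except Exception` is a simplification that real code would narrow to the SDK's own error type.

```python
def switch_integration(datasets, target_integration, dry_run=True):
    """Point each dataset at target_integration; return (updated, failed)."""
    updated, failed = [], []
    for dataset in datasets:
        if dry_run:
            print(f"[dry run] would update dataset {dataset.uid}")
            continue
        try:
            dataset.add_iam_integration(target_integration)
            updated.append(dataset.uid)
        except Exception as exc:  # broad on purpose in this sketch
            failed.append((dataset.uid, str(exc)))
    return updated, failed


class FakeDataset:
    """Stand-in that mimics only what this sketch uses."""

    def __init__(self, uid, fail=False):
        self.uid = uid
        self.fail = fail
        self.integration = None

    def add_iam_integration(self, integration):
        if self.fail:
            raise RuntimeError("simulated failure")
        self.integration = integration


sample = [FakeDataset("ds-1"), FakeDataset("ds-2", fail=True)]
preview = switch_integration(sample, "iam-target")  # dry run changes nothing
updated, failed = switch_integration(sample, "iam-target", dry_run=False)
print(updated, failed)
```

Running with `dry_run=True` first lists the datasets that would change; the `failed` list from the real run can then be inspected or retried.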

Full Script

Here is the complete script that combines all the steps mentioned above:

import labelbox as lb

# Initialize the Labelbox client
client = lb.Client(api_key="YOUR_API_KEY")

# Step 1: Retrieve IAM Integration Source
organization = client.get_organization()
iam_integrations = organization.get_iam_integrations()
source_iam_integration = next(iam for iam in iam_integrations if iam.name == "SOURCE_IAM_NAME")

# Alternative way to retrieve by ID
# source_iam_integration = next(iam for iam in iam_integrations if iam.uid == "SOURCE_IAM_ID")

print(f"Retrieved Source IAM Integration: {source_iam_integration}")

# Step 2: Retrieve IAM Integration Target
target_iam_integration = next(iam for iam in iam_integrations if iam.name == "TARGET_IAM_NAME")

# Alternative way to retrieve by ID
# target_iam_integration = next(iam for iam in iam_integrations if iam.uid == "TARGET_IAM_ID")

print(f"Retrieved Target IAM Integration: {target_iam_integration}")

# Step 3: Retrieve Datasets Connected to IAM Integration Source
datasets = client.get_datasets()
connected_datasets = [
    dataset for dataset in datasets 
    if dataset.iam_integration() and dataset.iam_integration().uid == source_iam_integration.uid
]

# List all connected datasets
for dataset in connected_datasets:
    print(f"Dataset ID: {dataset.uid}, Dataset Name: {dataset.name}")

# Step 4: Update IAM Integration for All Datasets
for dataset in connected_datasets:
    try:
        dataset.add_iam_integration(target_iam_integration)
    except lb.exceptions.LabelboxError as e:
        # Report the failure and continue with the remaining datasets
        print(f"Can't update integration for Dataset ID {dataset.uid}: {e}")

Conclusion

By following these steps, you can update the IAM integration for all datasets connected to a particular IAM integration. This ensures that your data rows remain accessible using up-to-date IAM configurations.

For more detailed information, refer to the Labelbox documentation.
