How to: Update Datasets Integration
In this guide, we will walk you through the steps to switch every dataset that uses a particular IAM integration (the Source) to a different IAM integration (the Target). This is useful when the Source integration is being modified or retired and all of its connected datasets need to move to the Target.
Steps Overview
- Retrieve the Source IAM integration object by its name or ID.
- Retrieve the Target IAM integration object by its name or ID.
- Retrieve the datasets connected to the Source IAM integration from step 1.
- Update the IAM integration of every dataset from step 3 using add_iam_integration and the Target IAM integration from step 2.
- Full script
Prerequisites
- Ensure you have the Labelbox Python SDK installed (3.72.0 or above).
- Obtain your API key from your Labelbox account settings (see the Labelbox documentation).
- The name or ID of both the Source IAM integration (currently used by the datasets) and the Target IAM integration (the one they should use instead).
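Because this guide needs SDK 3.72.0 or later, a quick version check before running anything can save you from an AttributeError halfway through. The sketch below is an illustration, not part of the Labelbox SDK: `parse_version` and `meets_minimum` are hypothetical helpers, the installed version is read with the standard-library `importlib.metadata`, and the distribution is assumed to be named `labelbox`. Only the numeric major.minor.patch parts are compared, since a plain string comparison would wrongly rank 3.9.0 above 3.72.0.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

MINIMUM = "3.72.0"  # minimum SDK version stated in this guide


def parse_version(text):
    """Turn '3.72.0' into (3, 72, 0); trailing tags such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad '4.0' to (4, 0, 0) so tuples compare correctly
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM):
    """True if `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        found = version("labelbox")
    except PackageNotFoundError:
        found = None
    if found is None:
        print("labelbox is not installed")
    elif meets_minimum(found):
        print(f"labelbox {found} is recent enough")
    else:
        print(f"labelbox {found} is older than {MINIMUM}; please upgrade")
```

For anything beyond simple release numbers, a full PEP 440 parser such as the one in the `packaging` library is the more robust choice.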
Step 1: Retrieve the IAM Integration Object Source
First, we retrieve the Source IAM integration object using its name or ID. Here’s how you can do it:
import labelbox as lb
# Initialize the Labelbox client
client = lb.Client(api_key="YOUR_API_KEY")
# Retrieve the organization object
organization = client.get_organization()
# Retrieve IAM integration Source by name (only if name is unique)
iam_integrations = organization.get_iam_integrations()
source_iam_integration = next(iam for iam in iam_integrations if iam.name == "SOURCE_IAM_NAME")
# Alternatively, retrieve IAM integration Source by ID
# source_iam_integration = next(iam for iam in iam_integrations if iam.uid == "SOURCE_IAM_ID")
print(f"Retrieved Source IAM Integration: {source_iam_integration}")
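The `next(...)` lookups above raise a bare `StopIteration` when nothing matches, and silently pick the first hit when two integrations share a name. If you prefer a clearer failure, the hypothetical `find_integration` helper below (not part of the Labelbox SDK) requires exactly one match. It assumes only that each integration exposes the `name` and `uid` attributes the code above already uses, so it is demonstrated here with stand-in objects.

```python
from types import SimpleNamespace


def find_integration(integrations, name=None, uid=None):
    """Return the one integration matching `uid` or `name`.

    Raises ValueError if nothing matches, or if a name matches several
    integrations, instead of next()'s bare StopIteration.
    """
    if (name is None) == (uid is None):
        raise ValueError("pass exactly one of name= or uid=")
    if uid is not None:
        matches = [iam for iam in integrations if iam.uid == uid]
        label = f"uid {uid!r}"
    else:
        matches = [iam for iam in integrations if iam.name == name]
        label = f"name {name!r}"
    if not matches:
        raise ValueError(f"no IAM integration with {label}")
    if len(matches) > 1:
        raise ValueError(f"{len(matches)} IAM integrations share {label}; look it up by uid")
    return matches[0]


if __name__ == "__main__":
    # Stand-in objects for illustration; with the SDK, pass
    # organization.get_iam_integrations() instead.
    integrations = [
        SimpleNamespace(name="aws-old", uid="a1"),
        SimpleNamespace(name="aws-new", uid="b2"),
    ]
    print(find_integration(integrations, name="aws-old").uid)  # a1
```

With the SDK, the call would look like `find_integration(organization.get_iam_integrations(), name="SOURCE_IAM_NAME")`.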
Step 2: Retrieve the IAM Integration Object Target
Similarly, retrieve the IAM integration object for the target IAM integration using its name or ID:
# Retrieve IAM integration Target by name (only if name is unique)
target_iam_integration = next(iam for iam in iam_integrations if iam.name == "TARGET_IAM_NAME")
# Alternatively, retrieve IAM integration Target by ID
# target_iam_integration = next(iam for iam in iam_integrations if iam.uid == "TARGET_IAM_ID")
print(f"Retrieved Target IAM Integration: {target_iam_integration}")
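Before touching any dataset, it may be worth a quick sanity check on the two integrations you just retrieved, for example to catch a copy-paste error that resolved Source and Target to the same integration. The hypothetical `preflight_problems` helper below returns a list of human-readable problems. It assumes integration objects carry `uid` and `name`, and treats a boolean `valid` attribute as optional, on the assumption that your SDK version may expose one; `getattr` keeps the check harmless if it does not.

```python
from types import SimpleNamespace


def preflight_problems(source, target):
    """Return a list of reasons not to proceed (an empty list means OK)."""
    problems = []
    if source.uid == target.uid:
        problems.append(
            f"Source and Target are the same IAM integration (uid {source.uid!r})"
        )
    # Assumption: a `valid` flag may or may not exist; only an explicit False counts.
    if getattr(target, "valid", None) is False:
        problems.append(f"Target {target.name!r} is marked as not valid")
    return problems


if __name__ == "__main__":
    # Stand-in objects for illustration only.
    src = SimpleNamespace(name="aws-old", uid="a1")
    tgt = SimpleNamespace(name="aws-new", uid="b2", valid=True)
    print(preflight_problems(src, tgt))  # []
```

In the script, you could stop early with `if problems: raise SystemExit("\n".join(problems))`.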
Step 3: Retrieve the Datasets Connected to the IAM Integration Source
Next, we find all datasets connected to the Source IAM integration. Note that this step may take a significant amount of time, because it iterates over every dataset in the organization and looks up the IAM integration of each one:
# Fetch all datasets
datasets = client.get_datasets()
# Filter datasets connected to the Source IAM integration
connected_datasets = [
dataset for dataset in datasets
if dataset.iam_integration() and dataset.iam_integration().uid == source_iam_integration.uid
]
# Optional: list all connected datasets
# for dataset in connected_datasets:
#     print(f"Dataset ID: {dataset.uid}, Dataset Name: {dataset.name}")
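The comprehension above calls `dataset.iam_integration()` twice for every dataset that has an integration. As far as we know, that relationship is fetched from the Labelbox API rather than cached on the object (worth checking for your SDK version), so each call may be a separate request and adds to the running time. A variant that looks the integration up once per dataset could look like this; `datasets_using` and `make_stub` are hypothetical names, and the demo uses stand-in objects that record how often the lookup runs.

```python
from types import SimpleNamespace


def datasets_using(datasets, source_uid):
    """Return the datasets whose IAM integration has uid == source_uid.

    dataset.iam_integration() is called exactly once per dataset and reused.
    """
    connected = []
    for dataset in datasets:
        integration = dataset.iam_integration()
        if integration is not None and integration.uid == source_uid:
            connected.append(dataset)
    return connected


def make_stub(uid, iam_uid, calls):
    """Stand-in for an SDK Dataset; records every iam_integration() call in `calls`."""
    integration = SimpleNamespace(uid=iam_uid) if iam_uid else None

    def iam_integration():
        calls.append(uid)
        return integration

    return SimpleNamespace(uid=uid, iam_integration=iam_integration)


if __name__ == "__main__":
    calls = []
    stubs = [make_stub("d1", "a1", calls), make_stub("d2", None, calls), make_stub("d3", "b2", calls)]
    print([d.uid for d in datasets_using(stubs, "a1")])  # ['d1']
    print(len(calls))  # 3, one lookup per dataset
```

With the SDK, this would be `connected_datasets = datasets_using(client.get_datasets(), source_iam_integration.uid)`.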
Step 4: Update the IAM Integration for All Datasets
Finally, we update the IAM integration of every dataset retrieved in step 3 by calling the add_iam_integration method with the Target IAM integration. Note that this step may take a significant amount of time, since it makes one update call per dataset:
# Update the IAM integration for each dataset
for dataset in connected_datasets:
    try:
        dataset.add_iam_integration(target_iam_integration)
    except lb.exceptions.LabelboxError as e:
        # Report the failure and continue with the remaining datasets
        print(f"Can't update integration for Dataset ID {dataset.uid}: {e}")
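For large organizations you may want a dry run first, and a list of failures you can retry afterwards rather than only printed messages. The hypothetical `migrate` helper below wraps the same `add_iam_integration` call; `dry_run` and `errors` are parameters introduced for this sketch. It catches any `Exception` by default only so that it runs without the SDK; with Labelbox you would pass `errors=(lb.exceptions.LabelboxError,)` to match the loop above. `FakeDataset` is a stand-in for demonstration.

```python
from types import SimpleNamespace


def migrate(datasets, target, dry_run=True, errors=(Exception,)):
    """Call dataset.add_iam_integration(target) on each dataset.

    With dry_run=True nothing is changed and every dataset is reported as
    "would be updated". Returns (updated, failed); failed holds
    (dataset, exception) pairs for exceptions whose type is in `errors`.
    """
    updated, failed = [], []
    for dataset in datasets:
        if dry_run:
            updated.append(dataset)
            continue
        try:
            dataset.add_iam_integration(target)
        except errors as exc:
            failed.append((dataset, exc))
        else:
            updated.append(dataset)
    return updated, failed


class FakeDataset:
    """Stand-in for an SDK Dataset, for illustration only."""

    def __init__(self, uid, broken=False):
        self.uid = uid
        self.broken = broken
        self.integration = None

    def add_iam_integration(self, integration):
        if self.broken:
            raise RuntimeError("permission denied")
        self.integration = integration


if __name__ == "__main__":
    target = SimpleNamespace(uid="b2")
    datasets = [FakeDataset("d1"), FakeDataset("d2", broken=True)]
    print(len(migrate(datasets, target)[0]))  # 2 would be updated (dry run)
    updated, failed = migrate(datasets, target, dry_run=False, errors=(RuntimeError,))
    for dataset, exc in failed:
        print(f"Can't update integration for Dataset ID {dataset.uid}: {exc}")
```

Keeping the failed datasets in a list means a second pass can retry just those, for example `migrate([d for d, _ in failed], target, dry_run=False, ...)`.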
Full Script
Here is the complete script that combines all the steps mentioned above:
import labelbox as lb
# Initialize the Labelbox client
client = lb.Client(api_key="YOUR_API_KEY")
# Step 1: Retrieve IAM Integration Source
organization = client.get_organization()
iam_integrations = organization.get_iam_integrations()
source_iam_integration = next(iam for iam in iam_integrations if iam.name == "SOURCE_IAM_NAME")
# Alternative way to retrieve by ID
# source_iam_integration = next(iam for iam in iam_integrations if iam.uid == "SOURCE_IAM_ID")
print(f"Retrieved Source IAM Integration: {source_iam_integration}")
# Step 2: Retrieve IAM Integration Target
target_iam_integration = next(iam for iam in iam_integrations if iam.name == "TARGET_IAM_NAME")
# Alternative way to retrieve by ID
# target_iam_integration = next(iam for iam in iam_integrations if iam.uid == "TARGET_IAM_ID")
print(f"Retrieved Target IAM Integration: {target_iam_integration}")
# Step 3: Retrieve Datasets Connected to IAM Integration Source
datasets = client.get_datasets()
connected_datasets = [
dataset for dataset in datasets
if dataset.iam_integration() and dataset.iam_integration().uid == source_iam_integration.uid
]
# List all connected datasets
for dataset in connected_datasets:
    print(f"Dataset ID: {dataset.uid}, Dataset Name: {dataset.name}")
# Step 4: Update IAM Integration for All Datasets
for dataset in connected_datasets:
    try:
        dataset.add_iam_integration(target_iam_integration)
    except lb.exceptions.LabelboxError as e:
        # Report the failure and continue with the remaining datasets
        print(f"Can't update integration for Dataset ID {dataset.uid}: {e}")
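After the script finishes, you might re-read each migrated dataset's integration to confirm that the change took effect. This assumes that calling `dataset.iam_integration()` again returns the current server-side value rather than a cached one, which is worth checking for your SDK version. `not_yet_on_target` and `stub` are hypothetical helpers shown with stand-in objects; with the SDK you would pass `connected_datasets` and `target_iam_integration.uid`.

```python
from types import SimpleNamespace


def not_yet_on_target(datasets, target_uid):
    """Return (dataset, current_uid) pairs for datasets not using target_uid.

    current_uid is None when a dataset has no IAM integration at all.
    """
    leftovers = []
    for dataset in datasets:
        integration = dataset.iam_integration()  # one lookup per dataset
        current_uid = integration.uid if integration is not None else None
        if current_uid != target_uid:
            leftovers.append((dataset, current_uid))
    return leftovers


def stub(uid, iam_uid):
    """Stand-in for an SDK Dataset whose iam_integration() returns a fixed value."""
    integration = SimpleNamespace(uid=iam_uid) if iam_uid else None
    return SimpleNamespace(uid=uid, iam_integration=lambda: integration)


if __name__ == "__main__":
    checked = [stub("d1", "b2"), stub("d2", "a1"), stub("d3", None)]
    for dataset, current in not_yet_on_target(checked, "b2"):
        print(f"Dataset ID {dataset.uid} still uses {current}")
```

An empty result means every dataset that was connected to the Source now points at the Target.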
Conclusion
By following these steps, you can move every dataset that uses one IAM integration (the Source) to another (the Target). This ensures that your data rows remain accessible under up-to-date IAM configurations.
For more detailed information, refer to the Labelbox documentation.