Data upload fails silently on Enum and Embeddings

blake · September 2, 2022, 7:26pm

Hi, I’m trialing the free version of the tool as my team explores potential labeling options. I’ve integrated my account with AWS and successfully created a small dataset of images using the Python SDK.

I’m interested in using your data explorer, specifically the embedding/similarity search capabilities. I’ve embedded my images into a 128-dimensional space per https://github.com/Labelbox/labelbox-python/blob/develop/examples/basics/data_row_metadata.ipynb

I’ve also created a metadata schema for an enum, “label_id”, in my account.

I have a create_metadata function that builds the metadata for a row. When I omit the ‘label_id’ and ‘embedding’ metadata fields, the dataset uploads correctly and I can see it in the catalog view. When I add either of the ‘label_id’/‘embedding’ metadata fields (together or separately), the code completes without raising an error, but the created dataset in the catalog is empty. When I dig into the code, the task has the status ‘FAILED’ after wait_till_done(), and that status is returned from the Labelbox server.
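
A minimal sketch of how the silent failure described above could be made visible in the calling code. It assumes the task object exposes `status` and `errors` attributes; `summarize_task` and the `fake_task` stub are hypothetical names used only for illustration, and the stub stands in for the real task returned by the upload.

```python
from types import SimpleNamespace


def summarize_task(task):
    """Return a one-line summary of a finished upload task.

    Assumption: the task exposes `status` and `errors` attributes.
    Missing attributes are tolerated so the helper never raises.
    """
    status = getattr(task, "status", "UNKNOWN")
    errors = getattr(task, "errors", None)
    if status == "FAILED":
        detail = errors if errors else "no error details returned"
        return f"upload failed: {detail}"
    return f"upload finished with status {status}"


# Hypothetical stand-in for the task object returned by the upload call.
fake_task = SimpleNamespace(status="FAILED", errors=None)
print(summarize_task(fake_task))
```

Calling a helper like this right after wait_till_done() would at least turn the ‘FAILED’ status into a visible message rather than letting the script finish as if nothing went wrong.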

Can you please explain what I am doing wrong?

In addition, it would be nice if the server sent back a more descriptive error status.

Below I’ve put the relevant parts of my code.

Thank you!


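from typing import List
from uuid import uuid4

import numpy as np
from labelbox.schema.data_row_metadata import DataRowMetadataField


def create_row(lb_client, t):
    # Build one data row payload: image URL, a random external id, and its metadata
    metadata_fields: List[DataRowMetadataField] = create_metadata(lb_client, t)
    row = {
        "row_data": t.s3_web_url,
        "external_id": str(uuid4()),
        "metadata_fields": metadata_fields,
    }
    return row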

def create_metadata(lb_client, t) -> List[DataRowMetadataField]:
    ## Fetch metadata schema ontology. A Labelbox workspace has a single metadata ontology.
    metadata_ontology = lb_client.get_data_row_metadata_ontology()

    # List all available fields
    metadata_ontology.fields
    metadata_fields = []

    # Construct a metadata field of Enums options
    train_schema = metadata_ontology.reserved_by_name["split"]["train"]
    split_metadata_field = DataRowMetadataField(
        schema_id=train_schema.parent,  # specify the schema id
        value=train_schema.uid,  # typed inputs
    )
    metadata_fields.append(split_metadata_field)

    # Look up the custom enum option whose name matches the normalized class label
    label_name = ClassLabels.from_guid(t.label).name.lower().replace(' ', '').replace('-', '')
    label_schema = metadata_ontology.custom_by_name["label"][label_name]
    label_metadata_field = DataRowMetadataField(
        schema_id=label_schema.parent,
        value=label_schema.uid,
    )
    metadata_fields.append(label_metadata_field)

    # Load the precomputed embedding and attach it under the reserved "embedding" schema
    embedding: np.ndarray = np.load(t.embedding_path)
    embedding_metadata_field = DataRowMetadataField(
        schema_id=metadata_ontology.reserved_by_name["embedding"].uid,
        value=embedding.tolist(),  # convert from numpy to list
    )
    metadata_fields.append(embedding_metadata_field)
    return metadata_fields

rows = [create_row(lb, t) for t in list(df.itertuples())[:10]][:1]
dataset = lb.create_dataset(name="Test")
task = dataset.create_data_rows(rows)
task.wait_till_done()
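
Since the server only reports ‘FAILED’, a local sanity check on the embedding before upload may help narrow things down. The sketch below is not a confirmed cause of the failure: the 128 dimension comes from the post, while the idea that a nested list (for example from an array of shape (1, 128)) or NaN values would be rejected is an assumption. `check_embedding` is a hypothetical helper that operates on the plain list produced by `embedding.tolist()`.

```python
import math


def check_embedding(values, expected_dim=128):
    """Return a list of problems found in an embedding given as a Python list.

    An empty list means no problem was detected by these local checks.
    """
    if not isinstance(values, list):
        return [f"expected a list, got {type(values).__name__}"]
    # A nested list usually means the source array was not 1-D.
    if any(isinstance(v, list) for v in values):
        return ["nested list: the array was probably not 1-D"]
    problems = []
    if len(values) != expected_dim:
        problems.append(f"expected {expected_dim} values, got {len(values)}")
    if any(not isinstance(v, (int, float)) or not math.isfinite(v) for v in values):
        problems.append("contains non-numeric, NaN, or infinite values")
    return problems


print(check_embedding([0.1] * 128))    # well-formed vector
print(check_embedding([[0.1] * 128]))  # nested, as from shape (1, 128)
```

Running a check like this on each row before create_data_rows() would separate problems in the local data from problems on the server side.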
