Hi, I’m trialing the free version of the tool while my team explores labeling options. I’ve integrated my account with AWS and successfully created a small dataset of images using the Python SDK.
I’m interested in using your data explorer, specifically its embedding/similarity search capabilities. I’ve embedded my images into a 128-dimensional space, following https://github.com/Labelbox/labelbox-python/blob/develop/examples/basics/data_row_metadata.ipynb
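Because the upload only fails once metadata is attached, one option is to validate the embedding locally before it is sent. The sketch below assumes the embedding field expects a flat list of exactly 128 finite floats, matching the 128-dim setup in the notebook linked above; `to_embedding_value` is a hypothetical helper, not part of the SDK. Among other things it catches a shape mismatch: if a `.npy` file stores shape `(1, 128)`, `embedding.tolist()` returns a nested list instead of a flat one.

```python
import math
from typing import Iterable, List

EXPECTED_DIM = 128  # dimension used in this post (assumed to be what the server expects)


def to_embedding_value(vector: Iterable[float], dim: int = EXPECTED_DIM) -> List[float]:
    """Convert an embedding (list, tuple, or 1-D NumPy array) to plain Python floats,
    raising a readable ValueError if it is not a flat vector of `dim` finite numbers."""
    try:
        values = [float(x) for x in vector]
    except TypeError as exc:
        # Raised when an element is itself a sequence, e.g. a (1, 128) array or a nested list.
        raise ValueError("expected a flat 1-D vector of numbers") from exc
    if len(values) != dim:
        raise ValueError(f"expected {dim} values, got {len(values)}")
    bad = [i for i, x in enumerate(values) if not math.isfinite(x)]
    if bad:
        raise ValueError(f"non-finite values (NaN/inf) at indices {bad[:5]}")
    return values
```

In `create_metadata`, `value=to_embedding_value(np.load(t.embedding_path))` would take the place of the `.tolist()` call, so a malformed vector fails locally with a message instead of producing a `FAILED` task.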
I’ve also created a custom enum metadata schema, “label_id”, in my account.
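Each row’s enum option is looked up by a key derived from its class label, the way `create_metadata` does with `.lower().replace(' ', '').replace('-', '')`. A minimal sketch that applies the same normalization and, on a miss, reports the valid option names. It assumes, as the indexing in the code suggests, that an enum entry in `custom_by_name` behaves like a dict from option name to option schema; `option_key` and `lookup_option` are hypothetical helpers.

```python
from typing import Dict, TypeVar

T = TypeVar("T")


def option_key(class_name: str) -> str:
    """Normalize a class name like create_metadata does: lowercase, drop spaces and hyphens."""
    return class_name.lower().replace(" ", "").replace("-", "")


def lookup_option(options: Dict[str, T], class_name: str) -> T:
    """Return the enum option for a class name, listing the valid keys if it is missing."""
    key = option_key(class_name)
    if key not in options:
        raise KeyError(f"{class_name!r} -> {key!r} not in {sorted(options)}")
    return options[key]
```

Usage would look like `label_schema = lookup_option(metadata_ontology.custom_by_name["label"], ClassLabels.from_guid(t.label).name)`.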
I have a `create_metadata` function that builds the metadata for each row. When I leave out the ‘label_id’ and ‘embedding’ metadata fields, the dataset uploads correctly and I can see it in the Catalog view. When I add either of those fields (together or separately), the code completes without an error, but the created dataset in the Catalog is empty. Digging into it, the task I wait on with `wait_till_done()` has the status ‘FAILED’, as returned by the Labelbox server.
Can you please explain what I am doing wrong?
In addition, it would be nice if the server sent back a more descriptive error status.
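Until the server reports something more specific than `FAILED`, a small helper can at least log whatever the task object exposes after `wait_till_done()`. A minimal sketch: `describe_task` is a hypothetical helper; `status` is the attribute the post already relies on, while an `errors` attribute on the task is an assumption that depends on the installed labelbox-python version, so it is read defensively with `getattr`.

```python
from typing import Any


def describe_task(task: Any) -> str:
    """Summarize a finished task for logging.

    Always reports `status`; appends `errors` only if the object has a
    non-empty attribute by that name (availability depends on SDK version).
    """
    status = getattr(task, "status", "UNKNOWN")
    parts = [f"status={status}"]
    errors = getattr(task, "errors", None)  # missing attribute -> None
    if errors:
        parts.append(f"errors={errors}")
    return "; ".join(parts)
```

Called as `task.wait_till_done(); print(describe_task(task))`, it prints just the status on older SDKs and any server-side error details where they are available.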
Below I’ve put the relevant parts of my code.
Thank you!
```python
from typing import List
from uuid import uuid4

import numpy as np
from labelbox.schema.data_row_metadata import DataRowMetadataField

# Not shown: `lb` is my labelbox.Client, `df` is a pandas DataFrame whose rows
# carry s3_web_url, label and embedding_path, and ClassLabels is my own label helper.


def create_row(lb_client, t):
    """Build one data-row payload for Dataset.create_data_rows()."""
    metadata_fields: List[DataRowMetadataField] = create_metadata(lb_client, t)
    row = {
        "row_data": t.s3_web_url,
        "external_id": str(uuid4()),
        "metadata_fields": metadata_fields,
    }
    return row


def create_metadata(lb_client, t) -> List[DataRowMetadataField]:
    # Fetch the metadata schema ontology. A Labelbox workspace has a single metadata ontology.
    metadata_ontology = lb_client.get_data_row_metadata_ontology()
    # List all available fields (only displays output in a notebook)
    metadata_ontology.fields
    metadata_fields = []

    # Reserved "split" enum: put every row in the "train" split
    train_schema = metadata_ontology.reserved_by_name["split"]["train"]
    split_metadata_field = DataRowMetadataField(
        schema_id=train_schema.parent,  # enum field: schema id of the parent enum
        value=train_schema.uid,  # value: id of the chosen option
    )
    metadata_fields.append(split_metadata_field)

    # Custom "label" enum: option name derived from the row's class label
    label_schema = metadata_ontology.custom_by_name["label"][
        ClassLabels.from_guid(t.label).name.lower().replace(" ", "").replace("-", "")
    ]
    label_metadata_field = DataRowMetadataField(
        schema_id=label_schema.parent,
        value=label_schema.uid,
    )
    metadata_fields.append(label_metadata_field)

    # Reserved "embedding" field: 128-dim vector loaded from disk
    embedding: np.ndarray = np.load(t.embedding_path)
    embedding_metadata_field = DataRowMetadataField(
        schema_id=metadata_ontology.reserved_by_name["embedding"].uid,
        value=embedding.tolist(),  # convert from NumPy array to a Python list
    )
    metadata_fields.append(embedding_metadata_field)
    return metadata_fields


# Upload a single row while testing
rows = [create_row(lb, t) for t in df.head(1).itertuples()]
dataset = lb.create_dataset(name="Test")
task = dataset.create_data_rows(rows)
task.wait_till_done()
print(task.status)  # FAILED whenever the label or embedding field is included
```
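To pin down which metadata field the server rejects, the same row can be uploaded several times with a single field each. A minimal sketch; `single_field_variants` is a hypothetical helper that only manipulates the plain row dicts built by `create_row`, and the upload loop in the trailing comments uses only the calls already present in the code above.

```python
from typing import Any, Dict, List


def single_field_variants(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Split one data-row payload into copies that each carry one metadata field.

    Uploading each copy as its own task shows which field makes the task fail.
    """
    variants = []
    for i, field in enumerate(row.get("metadata_fields", [])):
        variant = dict(row)  # shallow copy; the original row is left unchanged
        variant["external_id"] = f"{row['external_id']}-field{i}"  # keep ids distinct
        variant["metadata_fields"] = [field]
        variants.append(variant)
    return variants


# Usage with the objects from the code above (not run here):
# for variant in single_field_variants(rows[0]):
#     task = dataset.create_data_rows([variant])
#     task.wait_till_done()
#     print(variant["external_id"], task.status)
```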