Data latency of dataset.export_data_rows()

We have some code that syncs new images to Labelbox. It gets the files from the source storage, fetches the files already in Labelbox via dataset.export_data_rows(), and adds the missing files.

We sometimes get duplicates in the Labelbox dataset, which seems to come from the API not returning the “real-time” situation but lagging behind.

When we create the data rows in the dataset, we set each file's original URL (in Azure Blob Storage) as the External ID field. This lets us compare the source blob storage to what is already in Labelbox.

Is the update latency of the Labelbox API known, and is there a way to make it shorter, preferably real-time? Or is there a better way to accomplish this?

Hello!

I’m Ramy from Labelbox Support, and I would like to help with the issue here. I see that the list of data rows you pull is not up to date with the recently appended data rows. I have a couple of suggestions for you:

  1. I would suggest using the two lines of code below to create and upload data; the second line ensures the upload has fully completed before you pull the data rows.

     task = dataset.create_data_rows(assets)
     task.wait_till_done()

  2. For processing data in the SDK, I would highly recommend dataset.data_rows() instead of dataset.export_data_rows(). There should be no latency between uploading and the pulled list showing the new data rows. Was there a particular reason why you chose dataset.export_data_rows()?
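Putting both suggestions together, a sync pass along these lines should avoid duplicates. This is only a sketch: `list_source_blobs()`, `client`, and `DATASET_ID` are placeholders for your own storage listing and Labelbox setup, and the Labelbox calls shown in the comments follow the SDK methods discussed in this thread.

```python
def find_missing(source_urls, existing_external_ids):
    """Return source URLs not yet present in Labelbox, preserving order.

    Comparison uses the blob URL stored in the External ID field,
    as described in the question above.
    """
    existing = set(existing_external_ids)
    return [url for url in source_urls if url not in existing]


# How this slots into the sync (requires a configured Labelbox client):
#
#   dataset = client.get_dataset(DATASET_ID)          # DATASET_ID is a placeholder
#   existing = [dr.external_id for dr in dataset.data_rows()]
#   missing = find_missing(list_source_blobs(), existing)
#   task = dataset.create_data_rows(
#       [{"row_data": url, "external_id": url} for url in missing]
#   )
#   task.wait_till_done()  # block until the upload is fully registered
```

The key points are pulling the current state with `data_rows()` rather than `export_data_rows()`, and calling `task.wait_till_done()` so the next sync pass sees the rows that were just appended.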

Thank you!

I can confirm that data_rows() is working. Not sure why I used export_data_rows().