Are there any plans for adding webhooks for when batches are created in the Labelbox app?
We want to automate ‘image layer’ creation / upload when a data row is queued in a batch. This is easy enough to do when batching is done from the SDK, but we plan to use the app for manual batching so we can take advantage of visual selection and similarity searches. However, in the app, nothing will trigger the external computation to create the image overlays and attach them to their respective data row for the Annotator tool.
Creating and uploading image layers could happen when data is first uploaded to Labelbox into ‘datasets’, as we plan to do this via the SDK, but we don’t expect to batch/queue all our uploaded data for labeling, so a lot of computation and memory would be wasted generating extra image layers that are never used. Some of our image layers require simulation and rendering, which would best be done on an as-needed basis.
If webhooks were implemented for batch creation, our automation tools would know exactly when to create & upload the image layers, and for which data rows, as opposed to constantly polling via the SDK or GraphQL.
Hi @ceubel, given that a batch is created at the time of submission to a labeling project, is there any concern that the processing may still be running after labeling has begun?
Adding to this: knowing which data rows are queued in a given batch could also be useful for running inference and submitting ‘Model-Assisted Labels’.
Hey @manu, good point. This is a fair concern and I will need to think more about the trade-offs, especially for those processes that may take a while.
However, I think this would only be an issue if the labeling queue is empty or if we select a high priority for the newest batch and the labelers are actively labeling. Otherwise, there would be enough time to compute & update the information for most, if not all, data rows batched into the queue.
Of course, the tradeoff comes down to each user’s computational costs vs computational time for each process (image layer, MAL, etc.).
I think doing this processing at dataset upload would be feasible for us, for now. But I think batch webhooks could still be a useful feature for added flexibility, in the future.
@ceubel We are about to release a new feature that may solve this problem, albeit in a different way than you proposed.
Upcoming features and workflows:
1. Create and save a slice in Catalog. A slice is essentially a saved search filter. The results of the slice are computed in real time.
3. Export data rows from a slice via the Python SDK.
4. Apply metadata to any data rows within the UI: select a slice, select all data rows, and apply the metadata.
Instead of creating a batch first, you could first apply metadata to the data rows that you want to submit as a batch. Use the slice feature to dynamically filter for the data rows that carry that metadata. Use the SDK to set up a cron job that looks for any new data rows that do not have image layers…
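A rough sketch of the selection step such a cron job might run, assuming the slice export gives you one dict per data row. The field names (`id`, `metadata`, `attachments`), the `batches` tag field, and the `IMAGE_OVERLAY` attachment type are illustrative assumptions rather than the actual export schema, and the export call itself is left out.

```python
from typing import Dict, Iterable, List

# Hypothetical shape of one exported data row; the real export schema may differ.
DataRow = Dict[str, object]


def rows_needing_layers(
    rows: Iterable[DataRow],
    tag_field: str = "batches",         # assumed metadata field used as the 'batch' tag
    layer_type: str = "IMAGE_OVERLAY",  # assumed attachment type for image layers
) -> List[str]:
    """Return the IDs of tagged data rows that have no image layer attached yet."""
    pending = []
    for row in rows:
        metadata = row.get("metadata") or {}
        attachments = row.get("attachments") or []
        tagged = tag_field in metadata
        has_layer = any(a.get("type") == layer_type for a in attachments)
        if tagged and not has_layer:
            pending.append(row["id"])
    return pending
```

The job would then generate and upload layers only for the returned IDs, so untagged data rows never trigger any rendering.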
Ah interesting, I think this would work well! Thank you
Will ‘slice’ become a new Labelbox object, something that I can query from the SDK to see all created slices and detect new ones?
Or would I need to rely on an ad-hoc metadata tag to mark my ‘batch’? Say, a metadata field called ‘batches’ whose value is a string with the date I created the slice plus some ID that I know how to decode on the SDK end. I could even update the metadata field from the SDK to show that the ‘batch’ had been processed.
Speaking of metadata: I know there is a limit to how many fields you can create, but is there a limit to how many values you can have for a single field?
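To make the tag idea concrete, here is one way the string could be built and decoded. The `date_id_status` layout, the `new`/`done` status words, and the helper names are all made up for illustration; nothing here touches the Labelbox SDK.

```python
from datetime import date
from typing import NamedTuple


class BatchTag(NamedTuple):
    created: date
    batch_id: str
    processed: bool


def encode_tag(created: date, batch_id: str, processed: bool = False) -> str:
    """Build the metadata value: ISO date, custom ID, and a processed marker."""
    status = "done" if processed else "new"
    return f"{created.isoformat()}_{batch_id}_{status}"


def decode_tag(value: str) -> BatchTag:
    """Parse a value produced by encode_tag; the ID itself may contain underscores."""
    created, rest = value.split("_", 1)
    batch_id, status = rest.rsplit("_", 1)
    return BatchTag(date.fromisoformat(created), batch_id, status == "done")
```

Marking a ‘batch’ as processed would then just mean writing back `encode_tag(tag.created, tag.batch_id, processed=True)` as the new field value.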
Many great points in this thread!
We don’t expect to batch/queue all our uploaded data for labeling
You’re 100% right! Labelbox is all about helping you prioritize the right data to label, so we’re building the product so that you can stream all your data to Labelbox and then decide which subset to label first.
I think batch webhooks could still be a useful feature for added flexibility, in the future
Yes! Would you say your need is to
- access the list of data row ids in a batch, given a batch ID
- and/or a webhook to detect when a batch is created
In any case, what exactly do you need to access? The data row IDs? Anything else?
Will ‘slice’ become a new Labelbox object
Yes! A Slice will be a Labelbox object that you can access and edit in the Catalog UI. It will have a Slice ID. You will be able to export the Slice, based on its Slice ID, through the SDK.
Would I need to rely on an ad-hoc metadata tag to mark my ‘batch’
The solution you will be able to use in 6 weeks is:
- mark your batch with a metadata tag
- create a Slice that keeps/filters only the data with this tag
- export this Slice through the SDK
The solution you will be able to use later is:
- access the batches through the SDK
- webhooks to notify you when a new batch is created
Thank you for your thorough reply! We are looking forward to all the features to come. It looks like we will have several options to implement what we want.
To answer your question: yes, we would need to know the data row IDs and the batch name/ID they’re attached to. It would also be helpful to have metadata like batch creation datetime, batch priority, maybe who created it, and the project that the batch is sent to, in case we are doing any project-specific operations on the data rows. With the project ID and data row IDs, it seems like we would have all the info needed to do whatever we want for that batch.
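As a sketch of what we would hope to receive, the fields above could map onto something like the structure below. This is purely a wish-list shape: the `BatchCreatedEvent` name, the JSON keys, and the ISO 8601 timestamp format are assumptions, not an existing Labelbox payload.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class BatchCreatedEvent:
    """Hypothetical 'batch created' notification; not an actual Labelbox schema."""
    batch_id: str
    batch_name: str
    project_id: str
    data_row_ids: List[str]
    created_at: datetime
    priority: Optional[int] = None
    created_by: Optional[str] = None


def parse_event(body: str) -> BatchCreatedEvent:
    """Parse a JSON request body into the event structure above."""
    raw = json.loads(body)
    return BatchCreatedEvent(
        batch_id=raw["batch_id"],
        batch_name=raw["batch_name"],
        project_id=raw["project_id"],
        data_row_ids=list(raw["data_row_ids"]),
        created_at=datetime.fromisoformat(raw["created_at"]),
        priority=raw.get("priority"),      # optional in this sketch
        created_by=raw.get("created_by"),  # optional in this sketch
    )
```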
Thanks for the video!
I think this is a very feasible workaround and we hope to implement it soon!
For now, we have it set up to do everything at dataset upload and will just have to bear a bit of extra computation.