How to use the Python SDK to filter and export uncertain predictions?

Hello

I am currently working with the Labelbox Python SDK & trying to streamline the process of identifying & exporting samples where my model seems unsure. :upside_down_face: I would like to use prediction confidence / metadata to tag and group these assets for further annotation cycles, but I am unsure of the best way to structure this in Labelbox via the SDK. :innocent:

Has anyone here implemented a pipeline that programmatically filters out low-confidence predictions (perhaps below a set threshold) & then exports those as a new dataset or project? :thinking: I am particularly interested in how the SDK handles metadata updates or tagging in bulk, and whether there are any best practices for this. I checked the Label Studio documentation (Export Annotations guide) for reference. :slightly_smiling_face:

I came across the idea of using model uncertainty techniques, which led me to explore concepts like what Perplexity AI is, as it relates to understanding ambiguous / complex model decisions. :thinking:

Wondering if others here are applying similar ideas to guide labeling strategies in Labelbox? :thinking:

:slightly_smiling_face:
Thank you !!

Hey @ricaxa ,

You could attach metadata to each data row and insert or update it whenever you do a subsequent prediction import.
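
As a rough (untested) sketch of that step with the SDK, something like the following could work. It assumes you have already created a custom number metadata field such as "Confidence mean avg" in your Catalog metadata schema; the API key, schema_id and data row IDs below are placeholders:

import labelbox as lb
from labelbox.schema.data_row_metadata import DataRowMetadata, DataRowMetadataField

client = lb.Client(api_key="YOUR_API_KEY")  # placeholder key

# schema_id of your custom number field (e.g. "Confidence mean avg"),
# visible in the Catalog metadata schema or via client.get_data_row_metadata_ontology()
CONFIDENCE_SCHEMA_ID = "clv25mgnq00f007yz9630fe5j"  # placeholder

# Hypothetical mapping: data row ID -> mean confidence of your latest predictions
confidences = {
    "clv28zgl61ndb07694b9ws3op": 0.42,
    "clv28zgl61ndc076949s2x1ab": 0.91,
}

payload = [
    DataRowMetadata(
        data_row_id=data_row_id,
        fields=[DataRowMetadataField(schema_id=CONFIDENCE_SCHEMA_ID, value=score)],
    )
    for data_row_id, score in confidences.items()
]

# Upsert in bulk; re-run this after each prediction import to keep the values fresh
mdo = client.get_data_row_metadata_ontology()
mdo.bulk_upsert(payload)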

Once you have established this, you can retrieve the metadata via an export:

{
  "data_row": {
    "id": "clv28zgl61ndb07694b9ws3op",
    "row_data": "mirror3@mirrortd3bot"
  },
  "media_attributes": {
    "asset_type": "text",
    "mime_type": "text/plain"
  },
  "metadata_fields": [
    {
      "schema_id": "clv25mgnq00f007yz9630fe5j",
      "schema_name": "Confidence mean avg",
      "schema_kind": "number",
      "value": [
        {
          "schema_id": "clv25mgnr00f207yz7icseiue",
          "schema_name": "0.9",
          "schema_kind": "number"
        }
      ]
    }
  ]
}

Retrieve the value and send the matching data rows to a project if you want to. One thing I need to clarify: predictions can only be imported to either a project or a model run.
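
To retrieve and filter those values programmatically, a sketch could look like this (again untested; the dataset ID and the 0.5 threshold are placeholders, and the small helper handles both a plain number and the nested shape shown in the sample above):

import labelbox as lb

client = lb.Client(api_key="YOUR_API_KEY")  # placeholder key
dataset = client.get_dataset("YOUR_DATASET_ID")  # placeholder dataset ID

# Export the data rows together with their metadata fields
export_task = dataset.export_v2(params={"metadata_fields": True})
export_task.wait_till_done()
if export_task.errors:
    raise RuntimeError(export_task.errors)

def extract_number(field):
    # The exported shape can vary: sometimes a plain number, sometimes nested
    # as in the sample above (value[0]["schema_name"] holds the number as text)
    value = field["value"]
    if isinstance(value, list):
        value = value[0]["schema_name"]
    return float(value)

THRESHOLD = 0.5  # arbitrary cut-off for "uncertain"; tune it to your model
low_confidence_ids = [
    row["data_row"]["id"]
    for row in export_task.result
    for field in row.get("metadata_fields", [])
    if field["schema_name"] == "Confidence mean avg" and extract_number(field) < THRESHOLD
]
print(f"{len(low_confidence_ids)} data rows below the confidence threshold")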
So, in order for your pipeline to work, it would be something like:

→ Import data rows with a set of metadata (I would first determine whether the data actually needs to be sent, based on that metadata, to avoid sending too much data).

→ Export and parse the confidence metadata

→ Create a project with those data rows + send the predictions as pre-annotations (see the sketch after this list)

→ Have some users correct the predictions
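
For the last two arrows, a sketch continuing from the low_confidence_ids list above could look like this (the ontology ID and the radio classification are purely illustrative, adapt them to your own ontology and prediction format):

import uuid
import labelbox as lb
import labelbox.types as lb_types

client = lb.Client(api_key="YOUR_API_KEY")  # placeholder key

# Create a review project and attach the low-confidence data rows to it
project = client.create_project(name="Low-confidence review", media_type=lb.MediaType.Text)
project.setup_editor(client.get_ontology("YOUR_ONTOLOGY_ID"))  # placeholder ontology ID
project.create_batch(
    name=f"low-confidence-{uuid.uuid4()}",
    data_rows=low_confidence_ids,  # the IDs filtered in the previous step
    priority=1,
)

# Build label objects from your model's predictions (a radio classification here,
# purely as an illustration)
predictions = [
    lb_types.Label(
        data=lb_types.TextData(uid=data_row_id),
        annotations=[
            lb_types.ClassificationAnnotation(
                name="sentiment",  # hypothetical classification name
                value=lb_types.Radio(answer=lb_types.ClassificationAnswer(name="positive")),
            )
        ],
    )
    for data_row_id in low_confidence_ids
]

# Import as MAL pre-annotations so annotators only have to correct them
upload_job = lb.MALPredictionImport.create_from_objects(
    client=client,
    project_id=project.uid,
    name=f"mal-import-{uuid.uuid4()}",
    predictions=predictions,
)
upload_job.wait_until_done()
print("Errors:", upload_job.errors)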

Hope this helps!

Many thanks,
PT
