Export v2 Format - Video Keyframes Information Incomplete!

I’m unable to generate valid training data from labeled video data because of incomplete/missing keyframe information in the export v2 JSON.

Our ontology has an optional radio category that we use to describe an activity occurring in a classroom context. For example, when a student walks out of frame, we choose NOT to apply a label from that radio category.

The keyframe data in the export JSON does not explicitly describe when a radio option is toggled on or off; it only records that a radio option was clicked. I have no choice but to assume that each repeated keyframe with the same category value represents the user toggling that value on or off. That assumption holds in most cases, but not all.

I captured this Loom video to demonstrate how the issue arises: Loom | Free Screen & Video Recording Software | Loom

And here’s a snippet of the problematic JSON. There should be two segments in which radio classification “A” applies: frames 1-9 and frames 15-240. Because the export v2 JSON format does not include the state of the radio classification button, it lacks the information I need to reconstruct these segments:

"annotations": {
  "frames": {
      "1": {
          "objects": {},
          "classifications": [
              {
                  "feature_id": "cloywq54x00053b6n7lpmmdpx",
                  "feature_schema_id": "cloyvg6200e3u072qfo24esaj",
                  "name": "Keyframe Bug Demo Radio",
                  "radio_answer": {
                      "feature_id": "cloywq54x00063b6nrw2b1mi0",
                      "feature_schema_id": "cloyvg6200e3v072qhxyk2wpc",
                      "name": "A",
                      "classifications": []
                  }
              }
          ]
      },
      "9": {
          "objects": {},
          "classifications": [
              {
                  "feature_id": "cloywq54x00053b6n7lpmmdpx",
                  "feature_schema_id": "cloyvg6200e3u072qfo24esaj",
                  "name": "Keyframe Bug Demo Radio",
                  "radio_answer": {
                      "feature_id": "cloywq54x00063b6nrw2b1mi0",
                      "feature_schema_id": "cloyvg6200e3v072qhxyk2wpc",
                      "name": "A",
                      "classifications": []
                  }
              }
          ]
      },
      "15": {
          "objects": {},
          "classifications": [
              {
                  "feature_id": "cloywq54x00053b6n7lpmmdpx",
                  "feature_schema_id": "cloyvg6200e3u072qfo24esaj",
                  "name": "Keyframe Bug Demo Radio",
                  "radio_answer": {
                      "feature_id": "cloywq54x00063b6nrw2b1mi0",
                      "feature_schema_id": "cloyvg6200e3v072qhxyk2wpc",
                      "name": "A",
                      "classifications": []
                  }
              }
          ]
      },
      "20": {
          "objects": {},
          "classifications": [
              {
                  "feature_id": "cloywq54x00053b6n7lpmmdpx",
                  "feature_schema_id": "cloyvg6200e3u072qfo24esaj",
                  "name": "Keyframe Bug Demo Radio",
                  "radio_answer": {
                      "feature_id": "cloywq54x00063b6nrw2b1mi0",
                      "feature_schema_id": "cloyvg6200e3v072qhxyk2wpc",
                      "name": "A",
                      "classifications": []
                  }
              }
          ]
      },
      "240": {
          "objects": {},
          "classifications": [
              {
                  "feature_id": "cloywq54x00053b6n7lpmmdpx",
                  "feature_schema_id": "cloyvg6200e3u072qfo24esaj",
                  "name": "Keyframe Bug Demo Radio",
                  "radio_answer": {
                      "feature_id": "cloywq54x00063b6nrw2b1mi0",
                      "feature_schema_id": "cloyvg6200e3v072qhxyk2wpc",
                      "name": "A",
                      "classifications": []
                  }
              }
          ]
      }
  },
  "segments": {
      "cloywq54x00053b6n7lpmmdpx": [
          [
              1,
              1
          ],
          [
              9,
              9
          ],
          [
              15,
              15
          ],
          [
              20,
              20
          ],
          [
              240,
              240
          ]
      ]
  },
  "key_frame_feature_map": {
      "cloywq54x00053b6n7lpmmdpx": [
          1,
          9,
          15,
          20,
          240
      ]
  },
  "classifications": []
}
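To make the ambiguity concrete, here is a minimal sketch of the toggle assumption applied to the keyframes from "key_frame_feature_map" above. The helper name pair_keyframes_as_toggles is my own and not part of any SDK; it simply treats keyframes as alternating on/off clicks.

```python
def pair_keyframes_as_toggles(keyframes):
    """Pair sorted keyframes as (on, off) clicks under the toggle assumption.

    Hypothetical helper for illustration only. Returns the paired
    segments and any trailing keyframe that has no partner.
    """
    frames = sorted(keyframes)
    # Take keyframes two at a time: first is "on", second is "off".
    segments = [(frames[i], frames[i + 1]) for i in range(0, len(frames) - 1, 2)]
    # An odd number of keyframes leaves the last one unpaired.
    unpaired = frames[-1] if len(frames) % 2 else None
    return segments, unpaired


# Keyframes copied from the export snippet above.
keyframes = [1, 9, 15, 20, 240]
segments, unpaired = pair_keyframes_as_toggles(keyframes)
print(segments)  # [(1, 9), (15, 20)]
print(unpaired)  # 240
```

The labeler intended frames 1-9 and 15-240, but the toggle assumption yields 1-9 and 15-20 and leaves keyframe 240 dangling. Nothing in the export tells me that keyframe 20 was a click inside an already active segment.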

The solution to this bug would be to include one of the following options in the export v2 JSON:

  1. Include an attribute that describes the state of the radio category/button, i.e. whether a keyframe represents a category option being selected or deselected (toggled on or off)

  2. Provide a keyframe grouping for keyframes that have a repeated category option and occur in sequence
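
As a rough sketch of how option 1 would help, suppose each keyframe carried a boolean state. The "selected" flag below is purely hypothetical (it does not exist in the export v2 JSON today), and segments_from_states is my own illustrative helper.

```python
def segments_from_states(keyframes):
    """Build (start, end) segments from (frame, selected) pairs.

    `selected` is a HYPOTHETICAL per-keyframe flag: True when the radio
    option is on at that keyframe, False when it was switched off.
    """
    segments = []
    start = None
    for frame, selected in sorted(keyframes):
        if selected and start is None:
            # Option switched on: open a new segment.
            start = frame
        elif not selected and start is not None:
            # Option switched off: close the open segment.
            segments.append((start, frame))
            start = None
        # A repeated "on" keyframe inside an open segment changes nothing.
    if start is not None:
        # Segment still open at the last keyframe; end is unknown.
        segments.append((start, None))
    return segments


# Same frame numbers as the snippet above, with assumed states attached.
keyframes = [(1, True), (9, False), (15, True), (20, True), (240, False)]
print(segments_from_states(keyframes))  # [(1, 9), (15, 240)]
```

With the state attached, the repeated keyframe at frame 20 is harmless and the two intended segments come out directly.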

In the meantime, I feel I have no option but to execute a custom GraphQL query that I’m able to scrape from your frontend UI.

(NOTE: I struggled with the same problem here: Checkbox JSON output format)

FYI, I realize I can use the Python SDK and its ORM to:

  1. Load the project
  2. Loop through all of the project’s batches
  3. Loop through each batch’s data rows
  4. Loop through every data row’s labels
  5. Download each label’s “frames” jsonl
  6. Loop through the frames JSON to build an accurate reflection of the labels our labelers applied

This will work, but it is more complex, and I imagine it will require many hundreds of API calls, as opposed to the export v2 approach, which only requires a couple of calls.

Here’s a bit of code I’ve come up with to get me started:

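import json
import os

import labelbox as lb
import requests

# Assumes the API key and project name are supplied via environment variables
client = lb.Client(api_key=os.environ["LABELBOX_API_KEY"])
labelbox_project_name = os.environ["LABELBOX_PROJECT_NAME"]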

lb_project = client.get_projects(
    where=lb.Project.name == labelbox_project_name
).get_one()

for lb_batch in lb_project.batches():
    for lb_data_row in lb_batch.export_data_rows(include_metadata=True):
        for lb_label in lb_data_row.labels():
            label = json.loads(lb_label.label)

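            # Download the label's per-frame JSONL file
            response = requests.get(
                url=label['frames'],
                headers=client.headers
            )
            # Stop early if the download failed
            response.raise_for_status()

            # Parse one JSON object per non-empty line
            frames = []
            for json_l_line in response.iter_lines(decode_unicode=True):
                if json_l_line:
                    frames.append(json.loads(json_l_line))
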
            # Work with labeled data, frame by frame
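
For the last step, a minimal sketch of how the per-frame data could be collapsed back into segments. I'm assuming each downloaded frame record has already been reduced to a frame number and the selected radio answer name; the field names inside the frames JSONL are not shown here, so that extraction is left out. runs_from_frames is a hypothetical helper.

```python
def runs_from_frames(frame_answers):
    """Collapse per-frame answers into (answer, first_frame, last_frame) runs.

    `frame_answers` maps frame number -> radio answer name. Frames where
    no answer was applied are simply absent (an assumption about how the
    per-frame data has been pre-processed).
    """
    segments = []
    current = start = prev = None
    for frame in sorted(frame_answers):
        answer = frame_answers[frame]
        # Extend the current run while the answer repeats on adjacent frames.
        if current is not None and answer == current and frame == prev + 1:
            prev = frame
            continue
        # Otherwise close the previous run (if any) and start a new one.
        if current is not None:
            segments.append((current, start, prev))
        current, start, prev = answer, frame, frame
    if current is not None:
        segments.append((current, start, prev))
    return segments


# Answer "A" applied on frames 1-9 and 15-240, as in the example above.
frame_answers = {f: "A" for f in range(1, 10)}
frame_answers.update({f: "A" for f in range(15, 241)})
print(runs_from_frames(frame_answers))  # [('A', 1, 9), ('A', 15, 240)]
```

Because this works from every labeled frame rather than from keyframes, it does not depend on knowing which clicks were toggles.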
 