Invalid after data rows are pulled directly from a project (MAL)

Hello, I am using the MAL pipeline to upload labels from an ML model. My pipeline is:

  • Find project with queued data rows and pull them
  • download any extra images
  • create annotation payload for each image
  • upload each label

When I upload (setting the value for the annotation to the id value pulled from the project) I get an invalid Error. When I go into labelbox and search for the id corresponding to the project however, I find it. Any advice on what the correct id format should be?

Here is the code:

import os
import requests
import queries as q
import psycopg2
import pandas as pd
from import storage
import torch
from PIL import Image
import cv2
import labelbox as lb
import as lb_types
import uuid
import numpy as np

storage_client = storage.Client()
API_KEY = os.environ['LABELBOX_API_KEY']
client = lb.Client(API_KEY)

os.makedirs('images', exist_ok=True)
out_paths = []

# Load the pre-trained YOLOv5 model
project = client.get_project('project id')
data_rows = project.export_queued_data_rows()

for image_dict in data_rows:
    image_url = image_dict['rowData']
    image_name = image_dict['externalId']

    if not os.path.exists('images/' + image_name):
        response = requests.get(image_url)
        if response.status_code == 200:
            with open('images/' + image_name, 'wb') as f:
                print(f"Successfully downloaded image {image_name}")
            print(f"Failed to download image {image_name}")

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True, force_reload=True)
label_export = []
datarows = []
i = 0

# build dataset to upload to labelbox
for file in os.listdir('images'):

    idval = False
    for a in data_rows:
        if a['externalId'] == file:
            idval = a['id']
    if not idval:

    # Run object detection on an image
    image ='images/' + file)
    results = model(image)

    # annotation labels
    annotations = []
    boxes = results.xyxy[0].cpu().numpy()

    for box in boxes:
        x1, y1, x2, y2 = box[:4].astype(float)
            'name': 'Human',
            'classifications': [],
            'bbox': {'top': x1, 'left': y1, 'height': y2-y1, 'width': x2-x1},
            'confidence': 0.5,
            'dataRow': {'id': idval}

# Upload MAL label for this data row in project
upload_job = lb.MALPredictionImport.create_from_objects(
    project_id='project id',
    name="mal_job" + str(uuid.uuid4()),


print("Errors:", upload_job.errors)
print("Status of uploads: ", upload_job.statuses)

and here is the error:

Errors: [{'uuid': 'f45d6b84-622c-4051-a487-a7e463903398', 'dataRow': {'id': 'clex13pci049207b52ejg4mkm', 'globalKey': None}, 'status': 'FAILURE', 'errors': [{'name': 'DataRowNotFound', 'message': ' clex13pci049207b52ejg4mkm invalid', 'additionalInfo': None}]},

I’m using the nd_json method. Confused as to why the id i pull from the project itself would not be found/invalid – unless its just not the same id.

After a lot of testing I’m thinking the dataRow id might be broken in some way

Hmm, thanks for reporting. While we investigate this and try to reproduce, I recommend using global keys as a way to identify data rows. You can create dataset, submit batch, upload MAL labels, Import GT labels using global keys. Global keys

Also, you could try using the new exporter and see if you get same error: Export from projects v2 (beta)

unfortunately I initially tried the global_key value. I think our global keys may not have been set on the initial data upload because when searching on the UI or pulling the data from queued there is no global_key available

same error from this method

Nevermind. I accidentally used a different project name for the download and upload. My apologies. Please feel free to close this thread!