Access Denied when creating data rows via SDK; rows successfully imported after reprocessing

ts · August 15, 2024, 1:40am

I am consistently unable to import Document data rows via create_data_rows in the Python SDK. All of the rows show the following error:
Access Denied: The data row could not be fetched because the user does not have access to it.

When I manually reprocess the rows, the access error goes away and the documents are successfully imported.

In my workspace settings, the associated bucket roles are indeed connected. I have double checked the associated policies, and they are also correct, and should theoretically allow access to all objects contained within the aforementioned bucket.

I have also confirmed that the URL is correct and does not contain any spaces or invalid characters.

Any thoughts? This is having an impact on our ability to automate labeling on a large scale.

PT · August 15, 2024, 8:30am

Hi @ts
Is the CORS policy set up correctly too? There is a retry mechanism in case of failure. Who is your cloud provider, and what type of asset are you using?

ts · August 15, 2024, 4:39pm

I have confirmed that the CORS policy is in place. I am using AWS and the assets are PDF documents. The PDFs are relatively small, so we’re not bumping up against size or page limits.

ts · August 15, 2024, 4:48pm

Can you also clarify what you mean by the retry mechanism? I can’t seem to find anything in the docs or the labelbox-python github repository.

PT · August 15, 2024, 4:53pm

Every time you import data, Labelbox will try to reach the data 3 times. This is a backend process.
If you could leave one data row in an error state and reply here with the data row id I can take a look.
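
For readers unfamiliar with the pattern, a three-attempt retry can be sketched roughly as follows. This is only an illustration of the general idea, not Labelbox's backend code, which is not shown in this thread. The name `fetch_with_retries`, its parameters, and the use of `PermissionError` to stand in for an access error are all assumptions made for the example.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call `fetch` up to `attempts` times and return its first successful result.

    Hypothetical helper: PermissionError stands in for an "Access Denied" failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error: Optional[PermissionError] = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except PermissionError as error:
            last_error = error
            # Wait before the next attempt, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_seconds)

    # Every attempt failed, so surface the last error to the caller.
    assert last_error is not None
    raise last_error
```

A retry like this only helps when the failure is transient. If every attempt fails for the same underlying reason, the error is still reported after the last attempt.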

ts · August 15, 2024, 4:58pm

Thanks for the prompt response! I’ve got a data row right here for you:

id=clzviso9k3cvf0769dcgjq7ix
global_key=b6bf5ae5-eaa3-4595-bdc6-e99d8c60fa76

Is there any other information that you need in order to identify it?

PT · August 15, 2024, 6:25pm

I found the error. We generate a layer for annotating PDFs, and there seems to be an issue with that step. I’m taking this for internal review. In the meantime, we can probably stop generating the layer if that helps. Let me know.

ts · August 15, 2024, 6:27pm

Interesting, so it’s the text layer that’s causing trouble? How can I disable generation?

PT · August 15, 2024, 6:28pm

You can’t but we can if you want.

ts · August 15, 2024, 6:29pm

If you disable it, will I still be able to provide my own custom text layer? I had been planning on doing that eventually once my internal OCR pipeline is working. If so, I’m fine with you disabling it for the purposes of unblocking me.

PT · August 15, 2024, 6:43pm

Yep, you can still provide your own layer! I have disabled the auto generation.
Give it a try to see if this works better.
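
For anyone landing here later, this is a minimal sketch of a document data row that carries its own text layer. It assumes the row format described in the Labelbox documentation for documents, where `row_data` is an object with `pdf_url` and an optional `text_layer_url`; please check the current docs before relying on it. The helper names `build_pdf_row` and `import_rows`, and any URLs you pass in, are hypothetical.

```python
import uuid
from typing import Dict, List, Optional


def build_pdf_row(
    pdf_url: str,
    text_layer_url: Optional[str] = None,
    global_key: Optional[str] = None,
) -> Dict:
    """Build one data row payload for a PDF (hypothetical helper).

    Assumes the documented shape: row_data = {"pdf_url": ..., "text_layer_url": ...}.
    """
    row_data = {"pdf_url": pdf_url}
    if text_layer_url is not None:
        # Only attach a text layer when a custom one is available.
        row_data["text_layer_url"] = text_layer_url
    return {
        "row_data": row_data,
        # Generate a global key if the caller did not supply one.
        "global_key": global_key or str(uuid.uuid4()),
    }


def import_rows(api_key: str, dataset_id: str, rows: List[Dict]):
    """Submit rows and return the errors reported by the import task, if any."""
    import labelbox as lb  # imported lazily so the payload helper works without the SDK

    client = lb.Client(api_key=api_key)
    dataset = client.get_dataset(dataset_id)
    task = dataset.create_data_rows(rows)
    task.wait_till_done()
    return task.errors
```

Checking `task.errors` after `wait_till_done()` is a convenient way to catch rows that failed without opening the UI.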

ts · August 15, 2024, 6:46pm

Fantastic, it worked! Thank you very much!

ts · August 15, 2024, 6:50pm

Once the issue is resolved internally, would it be possible to re-enable text layer generation and notify me? Another question, actually: is this setting at the account level? My coworkers will be using the workflow I am developing, and I’d like them to avoid running into the same error state.

PT · August 15, 2024, 9:29pm

Yes and yes, we will look into it. This setting is at the workspace level, so it covers everyone.
