I am consistently unable to import Document data rows via create_data_rows in the Python SDK. All of the rows show the following error: Access Denied: The data row could not be fetched because the user does not have access to it.
When I manually reprocess the rows, the access error goes away and the documents are successfully imported.
In my workspace settings, the associated bucket roles are indeed connected. I have double-checked the associated policies; they are also correct and should, in theory, allow access to all objects in the bucket.
I have also confirmed that the URL is correct and does not contain any spaces or invalid characters.
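For anyone wanting to script the same URL check, here is a minimal sketch using only the Python standard library. The function name `find_url_problems` and the example URL are hypothetical, and it only checks URL syntax, not whether the bucket policy actually grants access.

```python
from urllib.parse import urlsplit

# Characters that should be percent-encoded in a URL (assumed set, based on RFC 3986).
_UNSAFE = set(' "<>\\^`{|}')


def find_url_problems(url: str) -> list[str]:
    """Return a list of human-readable problems found in the URL (empty if none)."""
    problems = []
    try:
        parts = urlsplit(url)
    except ValueError as exc:  # e.g. malformed IPv6 host
        return [f"unparseable URL: {exc}"]
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.netloc:
        problems.append("missing host")
    # Check the raw string, since urlsplit may silently strip some whitespace.
    bad = sorted({ch for ch in url if ch in _UNSAFE or ch.isspace() or not ch.isascii()})
    if bad:
        problems.append(f"characters that need percent-encoding: {bad}")
    return problems


# Hypothetical example URL with a space in the object key.
print(find_url_problems("https://example-bucket.s3.amazonaws.com/docs/my report.pdf"))
```

An empty list means the URL passed these syntactic checks; it says nothing about IAM roles or CORS.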
Any thoughts? This is having an impact on our ability to automate labeling on a large scale.
I have confirmed that the CORS policy is in place. I am using AWS and the assets are PDF documents. The PDFs are relatively small, so we’re not bumping up against size or page limits.
Every time you import data, Labelbox will try to reach the data 3 times. This is a backend process.
If you could leave one data row in an error state and reply here with the data row ID, I can take a look.
I found the error. We generate a layer to annotate PDFs, and it seems there is an issue there. I'm taking this for internal review. We can probably stop generating the layer if that helps. Let me know.
If you disable it, will I still be able to provide my own custom text layer? I had been planning on doing that eventually once my internal OCR pipeline is working. If so, I’m fine with you disabling it for the purposes of unblocking me.
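As a rough illustration of what supplying a custom text layer might look like at import time, here is a sketch. It assumes the document row format with `pdf_url` and `text_layer_url` keys inside `row_data`; the helper name `build_pdf_row`, the URLs, and the global key are hypothetical. The SDK calls are left commented out because they need a real API key and dataset ID.

```python
def build_pdf_row(pdf_url, global_key, text_layer_url=None):
    """Build one data row payload for a PDF, optionally with a custom text layer."""
    row_data = {"pdf_url": pdf_url}
    if text_layer_url is not None:
        # Assumed key name for a user-supplied text layer JSON file.
        row_data["text_layer_url"] = text_layer_url
    return {"row_data": row_data, "global_key": global_key}


rows = [
    build_pdf_row(
        "https://example-bucket.s3.amazonaws.com/docs/report.pdf",  # hypothetical
        global_key="report-001",  # hypothetical
        text_layer_url="https://example-bucket.s3.amazonaws.com/docs/report-layer.json",
    )
]

# Uploading (requires credentials, so not executed here):
# import labelbox as lb
# client = lb.Client(api_key="<API key>")
# dataset = client.get_dataset("<dataset ID>")
# task = dataset.create_data_rows(rows)
# task.wait_till_done()
# print(task.errors)
```

Checking `task.errors` after the upload shows per-row failures at upload time; whether a later processing error like the access error above would appear there is not something this thread confirms.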
Once the issue is resolved internally, would it be possible to re-enable text layer generation and notify me? Another question, actually: is this setting at the account level? My coworkers will be using the workflow I am developing, and I'd like them to avoid running into the same error state.