Forbidden error on AWS integration even though Role and External ID are configured

Hi,

I am using the Labelbox integration tool to create a dataset from an AWS S3 bucket. I set up the integration successfully by following this documentation, and the connection check shows both "Role successfully assumed" and "External ID configured securely".

However, when I try to add the dataset using the JSON file I created (see below), it seems to generate the data IDs (the hashes), but every one of them gets a ‘Forbidden’ error. I have also tested with a simpler JSON entry, with the same result.

Here is the sample JSON file:

[
    {
        "externalId": "v11261171",
        "videoUrl": "https://my-dataset.s3.amazonaws.com/v_11261171_A+B.mp4"
    },
    {
        "externalId": "v115483513",
        "videoUrl": "https://my-dataset.s3.amazonaws.com/v_115483513_C+D.mp4"
    }
]

I have followed the documentation for the Policy, Role, and CORS. I also simulated access using PolicySim, and it appears to work (see the image below):

As additional information, here is the JSON for each element:

The Policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::nxxxxx-dataset/*"
        }
    ]
}
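One quick local sanity check is to confirm that the bucket in each videoUrl is the same bucket the policy's Resource ARN covers. The following is a minimal Python sketch of that idea. The helper names `bucket_and_key` and `policy_allows_get` are hypothetical, the bucket name is a placeholder (the real names are redacted in this thread), and the wildcard matching only approximates IAM's behaviour; it is not the actual IAM evaluation logic.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def bucket_and_key(url: str) -> tuple[str, str]:
    """Split a virtual-hosted-style S3 URL into (bucket, key)."""
    parsed = urlparse(url)
    bucket = parsed.netloc.split(".s3.", 1)[0]  # text before ".s3." is the bucket
    key = parsed.path.lstrip("/")
    return bucket, key


def policy_allows_get(policy: dict, bucket: str, key: str) -> bool:
    """Rough check: does any Allow statement grant s3:GetObject on this object?"""
    arn = f"arn:aws:s3:::{bucket}/{key}"
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        if isinstance(actions, str):
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if "s3:GetObject" in actions and any(fnmatchcase(arn, r) for r in resources):
            return True
    return False


# Placeholder policy mirroring the structure above.
example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-dataset/*",
        }
    ],
}

bucket, key = bucket_and_key("https://example-dataset.s3.amazonaws.com/video.mp4")
print(policy_allows_get(example_policy, bucket, key))
```

If this prints False for a real URL and policy pair, the bucket name in the URL and the one in the Resource ARN are worth comparing character by character.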

The Role:


{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::3406xxxxxxx2:root"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "cxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxh"
                }
            }
        }
    ]
}

And the CORS is copied from the documentation as follows:

[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "https://app.labelbox.com",
            "https://editor.labelbox.com"
        ],
        "ExposeHeaders": []
    }
]

The only point I’m suspicious of is the region where the S3 bucket lives. My free root account is on the Global region, and STS is enabled. However, as far as I understand, the S3 bucket is in us-east-1, while the documentation only requires the us-east-2 region to be enabled. I have already checked that us-east-2 is activated in STS, but I’m not sure whether I also need to migrate the current S3 bucket to us-east-2.

Hi @Didas,

Thanks for posting to the Community page!

In the sample JSON file you provided, there are unsupported characters after the / that follows the amazonaws.com portion of the URL.

Characters such as /, +, and spaces cannot appear after the / that follows amazonaws.com, as they will throw off the decoding of the URL. In your example, there is a + character in an unacceptable location.
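To scan a larger JSON file for this problem before uploading, something like the following Python sketch could help. The function name `problem_chars_in_key` is hypothetical, and the character set contains only the characters named in this reply; it is not an exhaustive or official list.

```python
from urllib.parse import urlparse

# Characters named in this reply; assumed for illustration, not an official list.
PROBLEM_CHARS = {"/", "+", " "}


def problem_chars_in_key(url: str) -> set[str]:
    """Return the flagged characters found in the object key of an S3 URL."""
    key = urlparse(url).path[1:]  # drop the leading "/" that follows the host
    return {ch for ch in key if ch in PROBLEM_CHARS}


print(problem_chars_in_key("https://my-dataset.s3.amazonaws.com/v_11261171_A+B.mp4"))
# prints {'+'}
```

An empty set means none of the flagged characters were found in that key.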

Hope this helps!

Thanks @Zeke,

I addressed that issue. Indeed, there were spaces and other characters in the names (the videos were collected in the wild).

However, even when I reduce the dataset to a single sample (one video) as follows:

[
    {
        "externalId": "vimeo_11261171.mp4",
        "videoUrl": "https://the-dataset.s3.amazonaws.com/vimeo_11261171.mp4"
    }
]

I still cannot access the video, and the dataset connection check reports the following issue:

My Amazon S3 role/policy has been selected and seems to work correctly. The integration connection check reports both of the following:

  • Role successfully assumed
  • External ID configured securely

I have no special ACL permissions (other than my own account access) on the S3 bucket. I don’t know whether the Policy/Role defined according to the documentation is enough. I’m even willing to make the dataset public if that makes access easier for Labelbox.

Another thing is the URL in the JSON. The template in the docs includes a region part as well, while my link has no region (it is based on the S3 object URL). My S3 bucket region is US East (N. Virginia) us-east-1, but I’m not sure how to put it in the URL, or whether that would help. According to the S3 object information, the region is not included in the URL (as reflected in the JSON sample above).
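For reference, S3's virtual-hosted-style URLs can carry the region between `s3` and `amazonaws.com`. A minimal Python sketch of building that form is below; the helper name `regional_s3_url` is hypothetical, and whether Labelbox requires the regional form is not established in this thread.

```python
def regional_s3_url(bucket: str, region: str, key: str) -> str:
    """Build a region-specific virtual-hosted-style S3 object URL."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


print(regional_s3_url("the-dataset", "us-east-1", "vimeo_11261171.mp4"))
# prints https://the-dataset.s3.us-east-1.amazonaws.com/vimeo_11261171.mp4
```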

@Zeke Thanks for your feedback! Indeed, the spaces and some special characters were problematic (such as ! and -). This is a video dataset collected in the wild, and the filenames are the video titles. I have created a test dataset, and it now connects, but I get a Failed Fetch error even though the connection check shows OK. I think that’s a separate issue, and I will follow up on it there.