Named Entity labels are not exported from Document, only the Bounding Boxes

jerome.massot.78 · November 23, 2022, 12:20am

The payload contains only the bounding box information and a link to the temporary location of the PDF used for labeling.

The entity objects are missing.

And I cannot see an easy way to extract the labels from the character indices of the text entities, even once I have them, if all I have is the PDF file.

The platform should provide the labeled entities in the classic span format, with at least the following information:

text
category
start
end
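To make the request concrete, here is a minimal sketch of what one such span record could look like. The four field names come from the list above; everything else (the `Span` class, the `make_span` helper, the sample sentence, the "ORG" category, and the choice of an exclusive `end` offset) is an assumption for illustration only, not the platform's actual export format.

```python
from dataclasses import dataclass, asdict


@dataclass
class Span:
    """Hypothetical span record with the four fields requested above."""
    text: str      # the labeled substring
    category: str  # the entity label
    start: int     # index of the first character (inclusive)
    end: int       # index after the last character (exclusive, by assumption)


def make_span(document_text: str, start: int, end: int, category: str) -> Span:
    """Build a span from character offsets into the document's plain text."""
    if not 0 <= start < end <= len(document_text):
        raise ValueError("offsets fall outside the document text")
    return Span(document_text[start:end], category, start, end)


# Made-up sample text; a real export would reference the document's text layer.
doc = "Invoice issued by Acme Corp on 2022-11-23."
start = doc.index("Acme Corp")
span = make_span(doc, start, start + len("Acme Corp"), "ORG")
print(asdict(span))
```

With offsets defined against the extracted text, `document_text[start:end]` always returns the labeled string, which is what makes the export usable for NER training without going back to the PDF.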

By the way, the documentation page about the bounding box payload format seems out of date compared with the JSON file I get from the API call.

Thanks

Jerome
