Rich text PDFs - custom text layer

Hello,
I am trying to use the platform to annotate PDF files that are already true (rich text) PDFs. However, when I upload them as documents, Labelbox runs OCR on them, which is not ideal, and there is also a 15-page limit. One alternative is to extract the text and import it as text data rows, but then I lose all the formatting. I have been trying to generate a custom text layer in Python but can't get it to match the Labelbox format: it always says the layer is invalid, even though the metadata has “valid = True”. The sample scripts they provide only accept OCR outputs as inputs, not PDFs with rich text. Any suggestions?

Thanks

Hello there!

Currently, we only support OCR-generated forms. Your custom textLayer should conform to the schema outlined in our documentation; see the JSON schema reference.
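
In case it helps, here is a rough Python sketch of turning word boxes taken from a rich text PDF into a pages / groups / tokens structure. Please treat it as an illustration only: the field names (number, width, height, units, groups, tokens, geometry with left/top/width/height) and the "POINTS" unit value are assumptions, and the helper names are hypothetical, so check every key against the JSON schema reference before uploading. The input tuples follow the shape that PyMuPDF's page.get_text("words") returns, (x0, y0, x1, y1, text, block_no, line_no, word_no), but the sketch itself has no dependency on PyMuPDF.

```python
import json
import uuid
from collections import defaultdict


def _box(x0, y0, x1, y1):
    # Convert corner coordinates to a left/top/width/height box (assumed layout).
    return {"left": x0, "top": y0, "width": x1 - x0, "height": y1 - y0}


def words_to_text_layer(pages):
    """Build a text layer from pre-extracted words.

    pages: list of dicts with 'width', 'height' and 'words', where each word
    is a tuple (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    layer = []
    for number, page in enumerate(pages, start=1):
        # Collect the words of each text block, keeping their input order.
        blocks = defaultdict(list)
        for x0, y0, x1, y1, text, block_no, *_ in page["words"]:
            blocks[block_no].append((x0, y0, x1, y1, text))

        groups = []
        for block_no in sorted(blocks):
            words = blocks[block_no]
            tokens = [
                {
                    "id": str(uuid.uuid4()),
                    "content": text,
                    "geometry": _box(x0, y0, x1, y1),
                }
                for x0, y0, x1, y1, text in words
            ]
            # The group box is the union of its word boxes.
            group_box = _box(
                min(w[0] for w in words),
                min(w[1] for w in words),
                max(w[2] for w in words),
                max(w[3] for w in words),
            )
            groups.append(
                {
                    "id": str(uuid.uuid4()),
                    "content": " ".join(w[4] for w in words),
                    "geometry": group_box,
                    "tokens": tokens,
                }
            )

        layer.append(
            {
                "number": number,  # 1-based page number (assumption)
                "width": page["width"],
                "height": page["height"],
                "units": "POINTS",  # assumed unit label
                "groups": groups,
            }
        )
    return layer


if __name__ == "__main__":
    sample = [
        {
            "width": 612,
            "height": 792,
            "words": [
                (10, 20, 40, 30, "Hello", 0, 0, 0),
                (45, 20, 80, 30, "world", 0, 0, 1),
            ],
        }
    ]
    print(json.dumps(words_to_text_layer(sample), indent=2))
```

With PyMuPDF installed, the words for each page could come from page.get_text("words") and the page size from page.rect.width and page.rect.height. If the layer is still rejected, comparing the generated JSON key by key against a layer produced by the OCR sample scripts may show which field differs.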

Thanks!