Using markdown to set up a text project to create a comparison data annotation task

pedram · January 23, 2024, 8:54pm

I’m setting up a project to create a comparison dataset that will eventually be used for RLHF, evaluation (no model training), or similar scenarios. The task is a standard pairwise comparison: there is a question and two responses to it (generated either by humans or by different models). I want to show the question and responses in markdown, with the question on top and a two-column table below it, one column per response. More generally, I want to use my own markdown styling to display the text data for annotation.

The question is: can I do this in a Text Annotate project type? And more generally, how can I import markdown data into a text project so that it is rendered for annotation? I was thinking about HTML as an alternative, but my understanding is that we can’t import a raw HTML string directly as a data row?

PT · January 25, 2024, 12:07pm

Hey @pedram ,

Correct, markdown would need to be passed as HTML; our text-based editor would not render it natively.
You are also correct about the file: a raw HTML string cannot be imported directly as a data row.
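
For anyone looking for a starting point, here is a minimal sketch in Python (standard library only) of an HTML page with the question on top and the two responses in a two-column table. The function name `build_comparison_html` and the sample strings are made up for illustration, and the sketch assumes the question and responses are plain text. It is not a Labelbox API.

```python
from html import escape


def build_comparison_html(question: str, response_a: str, response_b: str) -> str:
    """Return a standalone HTML page: question on top, two responses side by side."""
    # Escape the raw text so characters such as < and & are displayed literally.
    q, a, b = escape(question), escape(response_a), escape(response_b)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head><body>\n'
        f"<h3>Question</h3>\n<p>{q}</p>\n"
        '<table border="1" width="100%">\n'
        "<tr><th>Response A</th><th>Response B</th></tr>\n"
        f'<tr><td valign="top">{a}</td><td valign="top">{b}</td></tr>\n'
        "</table>\n"
        "</body></html>\n"
    )


# Hypothetical sample values, only to show the call.
page = build_comparison_html("What is 2 < 3?", "It is true.", "It is false.")
```

If the responses themselves contain markdown, they would need to go through a markdown-to-HTML converter first; that step is not shown here.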

If you have a sample you would like us to take a look at, feel free to share more of your use case!

Many thanks,
PT

pedram · January 25, 2024, 11:47pm

Thanks. Following your suggestion, I converted my markdown content to standard HTML. Now I’m trying to import the HTML data into Catalog. I set up an Azure integration, since I’m reading the HTML files from my storage account. The integration validation succeeds (it says: This integration is configured to read [url to my Azure container]), so the connection to Azure appears to be set up. Then I uploaded my *.html files to the Azure container and created assets in the following format:

data_row = {
    "row_data": blob_url_to_html_file,
    "global_key": a_global_key,
}

where my blob URLs follow this pattern:

https://<account-name>.blob.core.windows.net/<container-name>/<blob-name>
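
As a rough sketch of how such assets could be assembled in Python from a list of blob names: the helper names, the sample account and container, and the choice of the blob name as the global key are all assumptions made for illustration.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    # Percent-encode the blob name but keep "/" so virtual folders survive.
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


def build_data_rows(account: str, container: str, blob_names: list) -> list:
    # One dict per HTML blob; the blob name doubles as the global key (an assumption).
    return [
        {"row_data": blob_url(account, container, name), "global_key": name}
        for name in blob_names
    ]


# Hypothetical account, container, and blob names.
rows = build_data_rows(
    "myaccount", "comparisons", ["pair-001.html", "set a/pair-002.html"]
)
```

With the Labelbox Python SDK, a list like `rows` would typically be passed to `dataset.create_data_rows(rows)` on a dataset obtained from `client.create_dataset(...)`.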

Then I created a dataset on Labelbox using the assets. The problem is that the dataset gets created but the data is not shown in Labelbox (it looks like it can’t read the content of the HTML files uploaded to Azure). I tested this with the sample URLs to HTML files on your website, and for those URLs I do see the HTML content. So I guess this might be a permission issue with Azure? I’ve followed the steps here exactly, so I wonder what I’m missing.

[UPDATE-1] Never mind, there was an issue with my HTML format, Azure integration is working fine.

[UPDATE-Final!] Issue solved, but I’m writing this up in case anyone else runs into the same problem: it had nothing to do with my HTML format or with Labelbox. When uploading my HTML files to my container with the Azure Python SDK, I apparently should have specified the content type as well. For example:

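from azure.storage.blob import ContentSettings

# container_client is an existing azure.storage.blob.ContainerClient
blob_client = container_client.get_blob_client(blob_name)
# Set the Content-Type explicitly so the blob is served as HTML
content_type = "text/html"
blob_client.upload_blob(
    blob_content,
    blob_type="BlockBlob",
    overwrite=True,
    content_settings=ContentSettings(content_type=content_type),
)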
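
If the container holds more than one kind of file, the content type could be derived from the blob name instead of being hard-coded. Here is a small sketch using Python’s standard `mimetypes` module. The helper name `content_type_for` and the fallback value are illustrative choices, not part of the Azure SDK.

```python
import mimetypes


def content_type_for(blob_name: str, default: str = "application/octet-stream") -> str:
    # guess_type returns (type, encoding); fall back to a generic binary type
    # when the extension is missing or unknown.
    guessed, _ = mimetypes.guess_type(blob_name)
    return guessed or default


html_type = content_type_for("pair-001.html")
```

The result could then be used as `ContentSettings(content_type=content_type_for(blob_name))` in the upload call above.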

PT · January 26, 2024, 9:26am

Hey @pedram ,

Good to hear your effort paid off! I’m guessing your blobs had a Content-Type header that differed from what the file extension suggests.

Labelbox uses the file extension to set the type, but sometimes the header prevents us from enforcing that.
Thanks for mentioning how to override this programmatically for Azure blobs!
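
To check what header a blob is actually served with, something along these lines might help. It is a sketch using only the Python standard library; the function names are made up here, and `served_content_type` only works for URLs that can be read without extra authentication (for example a URL carrying a SAS token).

```python
from urllib.request import Request, urlopen


def is_html(content_type_header: str) -> bool:
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == "text/html"


def served_content_type(url: str, timeout: float = 10.0) -> str:
    # HEAD request: fetch the response headers only, not the body.
    request = Request(url, method="HEAD")
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type", "")
```

A call such as `is_html(served_content_type(blob_url))` returning `False` would point to the kind of header mismatch described above.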

Many thanks,
PT
