Using Markdown to set up a text project for a comparison data annotation task

I’m setting up a project to create a comparison dataset that will eventually be used for RLHF, for evaluation (no model training), or for similar scenarios. The task is a standard pairwise comparison: there’s a question and two responses to it (written either by humans or by different models). I want to show the question and responses in Markdown, with the question on top and a two-column table below it, one column per response. More generally, I want to use my own Markdown styling to display the text data for annotation.

My question is: can I do this with the Text annotation project type? And more generally, how can I import Markdown data into a text project so that it is rendered for annotation? I was thinking about HTML as an alternative, but my understanding is that we can’t import a raw HTML string as a data row?

Hey @pedram,

Correct: Markdown would need to be passed as HTML, since our text-based editor does not render it natively.
And you’re also correct about the file: the HTML can’t be imported as a raw string.
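For the layout described in the question (the question on top, the two responses side by side in a two-column table), here is a minimal sketch of generating one HTML document per comparison item using only the Python standard library. The function name `build_comparison_html` and the exact markup are illustrative assumptions, not a Labelbox-defined format. Response text is HTML-escaped and its line breaks are preserved, so any Markdown syntax inside a response is shown literally rather than rendered.

```python
import html


def build_comparison_html(question: str, response_a: str, response_b: str) -> str:
    """Hypothetical helper: question on top, both responses in a two-column table."""
    # Escape user text so characters like < and & are displayed, not parsed as markup
    q = html.escape(question)
    a = html.escape(response_a)
    b = html.escape(response_b)
    # pre-wrap keeps each response's original line breaks; equal column widths
    cell_style = 'style="vertical-align: top; white-space: pre-wrap; width: 50%;"'
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head><body>\n'
        f"<h3>Question</h3>\n<p>{q}</p>\n"
        '<table border="1" style="border-collapse: collapse; width: 100%;">\n'
        "<tr><th>Response A</th><th>Response B</th></tr>\n"
        f"<tr><td {cell_style}>{a}</td><td {cell_style}>{b}</td></tr>\n"
        "</table>\n</body></html>\n"
    )
```

Each returned string can be written to its own `.html` file and uploaded to storage. If you want the Markdown inside each response rendered (lists, emphasis, code), one option is to convert it with a Markdown library instead of escaping it; only do that for content you trust, since the result is raw HTML.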

If you have a sample you would like us to take a look at, feel free to share more about your use case!

Many thanks,
PT


Thanks. Following up on what you mentioned, I converted my Markdown content to standard HTML, and now I’m trying to import the HTML data into Catalog. Since I’m reading the HTML files from my storage account, I set up an Azure integration. It reports “This integration is configured to read [url to my Azure container]” and the integration validation succeeds, so the connection to Azure looks set up. I then uploaded my *.html files to the Azure container and created assets in the following format:

data_row = {
    "row_data": blob_url_to_html_file,  # URL of the HTML blob in the Azure container
    "global_key": a_global_key,  # unique identifier for this data row
}

where my blob URLs follow this pattern:

https://<account-name>.blob.core.windows.net/<container-name>/<blob-name>
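Building these data rows for many files can be scripted. Below is a minimal sketch, assuming the blob names are already known (for example from your upload script) and that using the blob name without its extension as the global key is acceptable. That key convention and the helper names `blob_url` and `make_data_rows` are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def blob_url(account_name: str, container_name: str, blob_name: str) -> str:
    """Build https://<account-name>.blob.core.windows.net/<container-name>/<blob-name>."""
    # Percent-encode the blob name (spaces etc.) but keep "/" for virtual folders
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{container_name}/{quote(blob_name, safe='/')}"
    )


def make_data_rows(account_name: str, container_name: str, blob_names: list) -> list:
    """One data row dict per HTML blob."""
    rows = []
    for name in blob_names:
        rows.append(
            {
                "row_data": blob_url(account_name, container_name, name),
                # Hypothetical convention: blob name minus extension, so keys stay
                # unique across virtual folders ("batch1/q.html" -> "batch1/q")
                "global_key": str(PurePosixPath(name).with_suffix("")),
            }
        )
    return rows
```

The resulting list of dicts could then be passed to the Labelbox Python SDK, for example with `dataset.create_data_rows(rows)`; check the SDK documentation for the version you are using.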

I then created a dataset in Labelbox from these assets. The problem is that the dataset gets created, but the data is not shown in Labelbox; it looks like it can’t read the content of the HTML files uploaded to Azure. I tested the same thing with the sample URLs to HTML files on your website, and for those URLs I do see the HTML content. So I guess this might be a permission issue with Azure? I followed the steps here exactly, so I wonder what I’m missing.

[UPDATE-1] Never mind: there was an issue with my HTML format, and the Azure integration is working fine.

[UPDATE-Final!] Issue solved, but I’m writing this up in case anyone else runs into the same problem: it had nothing to do with my HTML format or with Labelbox. When I uploaded my HTML files to my container with the Azure Python SDK, I should also have set the blob’s content type explicitly. For example:

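from azure.storage.blob import ContentSettings

# container_client: an existing ContainerClient for the target container
blob_client = container_client.get_blob_client(blob_name)
# Set Content-Type explicitly; otherwise the blob may be stored as application/octet-stream
content_type = "text/html"
blob_client.upload_blob(
    blob_content,
    blob_type="BlockBlob",
    overwrite=True,
    content_settings=ContentSettings(content_type=content_type),
)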
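Generalizing the snippet above to a whole folder, here is a sketch that infers each file’s Content-Type from its extension with Python’s `mimetypes` module, so `.html` files are uploaded as `text/html`. It assumes an already authenticated `ContainerClient` (`container_client`) and a local folder of files; `guess_content_type` and `upload_folder` are hypothetical helper names, and blobs are named after the local file name.

```python
import mimetypes
from pathlib import Path


def guess_content_type(path: str) -> str:
    """Pick a Content-Type from the file extension, e.g. .html -> text/html."""
    content_type, _encoding = mimetypes.guess_type(path)
    # Fall back to a generic binary type when the extension is unknown
    return content_type or "application/octet-stream"


def upload_folder(container_client, folder: str, pattern: str = "*.html") -> list:
    """Upload matching files, setting Content-Type explicitly on each blob."""
    # Imported here so guess_content_type() works without the Azure SDK installed
    from azure.storage.blob import ContentSettings

    uploaded = []
    for path in sorted(Path(folder).glob(pattern)):
        blob_client = container_client.get_blob_client(path.name)
        with open(path, "rb") as data:
            blob_client.upload_blob(
                data,
                overwrite=True,  # replace any earlier upload of the same blob
                content_settings=ContentSettings(
                    content_type=guess_content_type(path.name)
                ),
            )
        uploaded.append(path.name)
    return uploaded
```

Deriving the type from the extension keeps the stored header consistent with the file name, which is what the import relies on.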

Hey @pedram,

Good to hear your effort paid off! My guess is that your blobs had a Content-Type header that differed from their file extension:

Labelbox uses the file extension to set the type, but a conflicting header can prevent us from enforcing that.
Good call mentioning how to override this programmatically for Azure blobs!
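If HTML blobs were already uploaded with a mismatched header, they can likely be fixed in place instead of being re-uploaded. Below is a minimal sketch, assuming an authenticated `ContainerClient` and that the extension-based type from Python’s `mimetypes` is the one you want. It lists the blobs, compares each stored Content-Type (ignoring parameters such as `charset`) with the type implied by the extension, and rewrites mismatches with `BlobClient.set_http_headers`. Because that call replaces all of a blob’s content settings, the sketch edits the existing settings object rather than creating a new one. `needs_fix` and `fix_content_types` are hypothetical names.

```python
import mimetypes
from typing import Optional


def needs_fix(blob_name: str, current_content_type: Optional[str]) -> bool:
    """True when the stored Content-Type disagrees with the extension-implied type."""
    expected, _ = mimetypes.guess_type(blob_name)
    if expected is None:
        return False  # unknown extension: nothing to compare against
    # Compare only the media type, e.g. "text/html; charset=utf-8" -> "text/html"
    current = (current_content_type or "").split(";")[0].strip().lower()
    return current != expected


def fix_content_types(container_client, suffix: str = ".html") -> list:
    """Rewrite Content-Type in place for blobs whose header doesn't match their extension."""
    fixed = []
    for props in container_client.list_blobs():
        if not props.name.endswith(suffix):
            continue
        settings = props.content_settings
        if needs_fix(props.name, settings.content_type):
            # set_http_headers overwrites ALL content settings, so reuse the existing ones
            settings.content_type = mimetypes.guess_type(props.name)[0]
            blob_client = container_client.get_blob_client(props.name)
            blob_client.set_http_headers(content_settings=settings)
            fixed.append(props.name)
    return fixed
```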

Many thanks,
PT
