Hello,
I would like to generate the same image embeddings as the built-in ones in Labelbox. I know you are using a CLIP model, but when I used the model from the Hugging Face website, the embeddings were slightly different. I cannot find any notes in the docs about the process for generating embeddings. Could you share the exact steps you take? I mean the pre/post-processing, the exact model version, and any additional steps?
We made a guide about embeddings some time ago: AI foundations: Understanding embeddings
Labelbox generates embeddings based on the thumbnail we create on asset upload.
The embeddings are computed with the original CLIP model from OpenAI: GitHub - openai/CLIP: CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
512 dimensions means that the embedding vectors have 512 components. The input resolution of the images is not relevant for this, but fwiw, CLIP has an input resolution of 224x224 pixels.
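If you want to reproduce the raw vectors, a minimal sketch using the openai/CLIP package could look like the one below. The ViT-B/32 variant and the file name are assumptions on my part for illustration; ViT-B/32 is the repo's default model and outputs 512-dimensional vectors.

```python
# Minimal sketch (not an official Labelbox snippet): compute a 512-dim image
# embedding with the original openai/CLIP package.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# ViT-B/32 is an assumption here; it is the repo default and yields 512 components.
model, preprocess = clip.load("ViT-B/32", device=device)

# `preprocess` resizes/center-crops to the model's 224x224 input resolution
# and applies CLIP's normalization.
image = preprocess(Image.open("thumbnail.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    embedding = model.encode_image(image)  # shape: (1, 512)
```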
About recreating the values: the model above will give you the exact same embeddings we use, but we apply some quantization and scaling to optimize the vector comparisons that happen during similarity search.
Depending on what you are comparing against, this may be why you are not getting the same values.
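To be clear, the exact quantization and scaling we use is not documented, so the following is only a generic illustration of why post-processed vectors can differ numerically from the raw CLIP output while still preserving similarity rankings. The int8 scheme and function names are hypothetical.

```python
# Illustration only: a generic L2-normalize + int8 quantization scheme.
# This is NOT the exact Labelbox pipeline; it just shows how quantization
# changes the stored values without changing similarity rankings much.
import numpy as np

def quantize_int8(vec: np.ndarray) -> tuple[np.ndarray, float]:
    """L2-normalize, then scale into the int8 range. Returns (int8 vector, scale)."""
    vec = vec / np.linalg.norm(vec)
    scale = 127.0 / np.max(np.abs(vec))
    return np.round(vec * scale).astype(np.int8), scale

raw = np.random.randn(512).astype(np.float32)   # stand-in for a CLIP embedding
quantized, scale = quantize_int8(raw)

# Dequantize and compare: cosine similarity with the original stays ~1.0.
recovered = quantized.astype(np.float32) / scale
cos = np.dot(raw / np.linalg.norm(raw), recovered / np.linalg.norm(recovered))
print(cos)
```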