NanoVLM

Text ↔ Image Retrieval Playground

Synthetic shape world with ONNX encoders running fully in the browser.

This demo retrieves the closest caption for an image using similarity search; it does not generate images. Create an image in the left panel, then run caption search.
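As a rough illustration of what "closest caption by similarity search" can mean, here is a minimal sketch that ranks captions by cosine similarity to an image embedding. It assumes the image and each caption have already been encoded into vectors in the same space. The function names, the 3-dimensional toy vectors, and the two captions other than "red square left" are hypothetical and not taken from the demo's code; the demo itself may score and rank differently.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity: dot product of the two L2-normalized vectors."""
    a_unit, b_unit = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_unit, b_unit))


def rank_captions(image_embedding, caption_embeddings, captions):
    """Return (score, caption) pairs sorted from most to least similar."""
    scored = [
        (cosine_similarity(image_embedding, embedding), caption)
        for embedding, caption in zip(caption_embeddings, captions)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy embeddings (hypothetical values, not produced by the demo's encoders).
captions = ["red square left", "blue circle right", "green triangle top"]
caption_embeddings = [
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.6],
    [0.1, 0.0, 1.0],
]
image_embedding = [1.0, 0.2, 0.1]

ranking = rank_captions(image_embedding, caption_embeddings, captions)
best_caption = ranking[0][1]
print(best_caption)
```

Nothing is generated in this sketch: the output is always one of the candidate captions, chosen by comparing vectors.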

Left Panel

Synthetic
red square left

Right Panel

embed_dim 32 | heads 4 | R@1 i2t 0.0% | R@5 i2t 0.0%
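The R@1 and R@5 figures are read here as image-to-text Recall@K: the fraction of images whose correct caption appears among the K highest-scoring captions. The sketch below shows one common way to compute it. It assumes a square similarity matrix in which the correct caption for image i sits at index i; that pairing convention, the function name, and the toy scores are illustrative assumptions, not details taken from the demo.

```python
def recall_at_k(similarity, k):
    """Image-to-text Recall@K.

    similarity[i][j] is the score between image i and caption j.
    Assumption: the ground-truth caption for image i is caption i.
    """
    if not similarity:
        return 0.0
    hits = 0
    for image_index, row in enumerate(similarity):
        # Caption indices sorted by descending score, truncated to the top K.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if image_index in top_k:
            hits += 1
    return hits / len(similarity)


# Toy scores (hypothetical): 3 images x 3 captions.
similarity = [
    [0.9, 0.1, 0.2],  # image 0: correct caption ranked first
    [0.3, 0.2, 0.8],  # image 1: correct caption ranked third
    [0.1, 0.7, 0.6],  # image 2: correct caption ranked second
]

r_at_1 = recall_at_k(similarity, 1)
r_at_2 = recall_at_k(similarity, 2)
r_at_3 = recall_at_k(similarity, 3)
print(f"R@1 {r_at_1:.1%}  R@2 {r_at_2:.1%}  R@3 {r_at_3:.1%}")
```

Under this definition, a reading of 0.0% means no image has had its correct caption ranked within the top K.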