🕹️ Try and Play with VAR! We provide a demo website for you to play with VAR models and generate images interactively. Enjoy the fun of visual autoregressive modeling!
QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-Free Visual Document Understanding
Abstract: In Visual Document Understanding (VDU) tasks, finetuning a pre-trained Vision-Language Model (VLM) with new datasets often falls short in optimizing the vision encoder to identify ...
Abstract: The remote sensing visual grounding (RSVG) task focuses on accurately identifying and localizing specific targets in remote sensing (RS) images using descriptive query expressions. Existing ...