Abstract: To realize adaptive and robust manipulation, a robot should have several sensing modalities and coordinate their outputs to achieve the given task based on underlying constraint in the real ...
Abstract: Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and ...