A vision encoder is a necessary component for many leading LLMs to work with images uploaded by users.
The LLM is typically pre-trained. For instance, LLaVA uses CLIP ViT-L/14 as its image encoder and Vicuna as its LLM decoder. Vicuna is itself LLaMA fine-tuned on conversations from ShareGPT. Both the ViT ...
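As a rough illustration of how an image encoder and an LLM decoder can be joined, the sketch below uses NumPy arrays in place of real models. Everything in it is a stand-in: `encode_image_stub`, `project_to_llm_space`, and `build_decoder_input` are hypothetical names, the dimensions are illustrative, and the linear projection between the two models is an assumption that the text above does not state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the text above).
NUM_PATCHES = 256   # 16 x 16 patches for a 224 x 224 image with patch size 14
VISION_DIM = 1024   # assumed width of the vision encoder's patch features
LLM_DIM = 4096      # assumed width of the LLM decoder's token embeddings


def encode_image_stub(image):
    """Hypothetical stand-in for a frozen, pre-trained vision encoder.

    Ignores the pixel values and returns one random feature vector per patch.
    """
    return rng.standard_normal((NUM_PATCHES, VISION_DIM))


def project_to_llm_space(patch_features, weight, bias):
    """Map patch features to the decoder's embedding width (assumed linear layer)."""
    return patch_features @ weight + bias


def build_decoder_input(image_tokens, text_embeddings):
    """Place the projected image tokens in front of the text token embeddings."""
    return np.concatenate([image_tokens, text_embeddings], axis=0)


image = np.zeros((224, 224, 3))                        # placeholder image
W = rng.standard_normal((VISION_DIM, LLM_DIM)) * 0.02  # projection weights
b = np.zeros(LLM_DIM)                                  # projection bias
text_embeddings = rng.standard_normal((12, LLM_DIM))   # 12 placeholder text tokens

image_tokens = project_to_llm_space(encode_image_stub(image), W, b)
decoder_input = build_decoder_input(image_tokens, text_embeddings)
print(decoder_input.shape)  # (268, 4096)
```

In this sketch the decoder sees the image as 256 extra "tokens" ahead of the 12 text tokens, so a pre-trained LLM can attend to them like any other part of the sequence.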