News

On Monday, a group of AI researchers from Google and the Technical University of Berlin unveiled PaLM-E, a multimodal embodied visual-language model (VLM) with 562 billion parameters that can reportedly analyze images for content, solve visual puzzles, perform visual text recognition, pass visual IQ tests, and understand natural language instructions.
Google just combined DeepMind and Google Brain into one big AI team, and on Wednesday, the new Google DeepMind shared details on how one of its visual language models (VLMs) is being used to ...
Figure: Benchmark results comparing NVIDIA's NVLM-D model to AI giants like GPT-4, Claude 3.5, and Llama 3-V, showing NVLM-D's competitive performance across various visual and language tasks.