Elon Musk's AI company, xAI, recently released Grok-1.5, a large language model (LLM) that was expected to rival OpenAI's GPT-4, the model that powers ChatGPT. However, Grok-1.5 could process only text, not images, leaving it well behind OpenAI's multimodal capabilities.
Now, Musk has launched Grok-1.5 Vision, or Grok-1.5V, an AI chatbot that can process both text and visual information.
Citing results from the new RealWorldQA benchmark, which measures real-world understanding, Musk has revealed that Grok-1.5V outperformed its competitors, including OpenAI's GPT-4, Anthropic's Claude 3, and Google's Gemini Pro, in its ability to process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.
Grok-1.5V can reason through complex text, scientific diagrams, charts, screenshots, and photographs. It can perform tasks such as translating a diagram into code, generating bedtime stories from hand-drawn pictures, pinpointing the largest object in a group of objects, and, in the future, potentially even writing tweets for Premium users of X (formerly known as Twitter). That is Musk's end goal, anyway, much to the discomfort of Grok-1.5 developers, who are reportedly struggling with the slow xAI API, and of X employees, who worry that Grok, which has previously created and promoted dangerous fake stories, could do so again, damaging the brand and harming its users.
Although Grok-1.5V isn't currently available, it is "coming soon" to a handful of early testers and existing users as a preview. Musk remains tight-lipped about an official launch date, stating that xAI will make "significant improvements in both capabilities, across various modalities such as images, audio, and video" before the model becomes publicly available.