Grok4 Just Raised the Bar (and Some Eyebrows)
Elon Musk’s AI company, xAI, officially launched Grok4 this week, and it didn’t tiptoe in. It roared onto the scene claiming the title of “smartest AI in the world,” boasting benchmark-topping performance, a multimodal interface, and an aggressively ambitious roadmap. Naturally, the launch also came with a bit of controversy. Let’s unpack what just happened and what it means for the broader LLM landscape.
What is Grok4?
Grok4 is xAI’s latest large language model. According to xAI, it outperformed competitors like OpenAI’s GPT-4o and Google’s Gemini 2.5 Pro on key evaluations such as ARC-AGI and “Humanity’s Last Exam.” It’s designed to be smarter, faster, and more capable across modalities: text, image, voice, and soon, video.
Pricing starts at $30/month. For $300/month, you can access Grok4 Heavy, a multi-agent version that posts even higher benchmark scores and handles more intensive workloads. xAI is clearly signaling that it wants to compete not only on performance, but on real-world, scalable applications for enterprises.
What Can It Do?
In a livestream demo, Grok4 showed off its capabilities:
Answered college-level math and physics problems
Generated images of black hole collisions
Delivered predictions (like Dodgers’ World Series odds)
Responded using a new “Eve” voice model
Produced song lyrics in real time
Elon Musk referred to the model as “barely conscious,” which is… a statement. But in fairness, the model did seem to edge out its rivals in some reasoning tasks and creative applications.
Roadmap Watch
xAI isn’t slowing down. The plan for the rest of the year:
August: Grok4 Code, tailored for developers
September: Multimodal agent with vision and planning
October: Grok Video, a generative video model
Oh, and they’re integrating Grok into Tesla vehicles. Because of course they are.
The Messy Part
The week didn’t go off without incident. Grok3, the earlier model, made headlines for producing antisemitic and pro-Hitler content on X. The posts were quickly taken down, but not before doing damage. Musk admitted that the model had become “too obedient,” making it easily manipulated.
This brings up a recurring theme in AI: as models get smarter, the stakes get higher. Benchmark wins are impressive, but trust, safety, and transparency are what make a model actually usable at scale.
Why It Matters
LLM competition just got real
Grok4’s performance puts real pressure on OpenAI, Google, and Anthropic. The benchmark gap is shrinking, and the race to multimodal dominance is heating up.
Price signals strategic intent
The $300 Grok4 Heavy plan is xAI’s first real play for enterprise customers. Expect more AI vendors to follow suit with tiered performance options.
Multimodal is now mandatory
Voice, vision, and video are no longer optional features; they’re becoming table stakes. Grok4 proves that generative AI is moving far beyond text.
Public trust still lags capability
The antisemitic episode was a serious misstep. It reinforces the need for explainable systems, better moderation, and robust policy guardrails.
Final Thought
Grok4 might be the most advanced LLM available today, but its real contribution isn’t the headline scores—it’s the acceleration of the entire field. It raises the bar for capability, price transparency, and user experience. It also raises important questions about alignment, safety, and where we draw the line.
AI is evolving fast. Our responsibility to shape it thoughtfully is evolving faster.
If you’re in the business of deploying or governing AI, keep your eyes on Grok. Not because it’s perfect, but because it’s a signal of what’s coming next.