Just a few months ago, DeepSeek shook the AI world with its V3, R1, and Janus models. Now Elon Musk’s Grok 3 has joined the party! With the claim of being the smartest AI on Earth, expectations for Grok 3 are quite high. As per the official benchmarks shared by the xAI team at the launch event, it seems to be a monster of a model, outperforming all its competitors on several benchmarks. But is it actually good enough for the regular tasks you and I do in our daily lives? Is it worth spending $40 on an X Premium+ account to access the model? You will know the answers by the end of this article. Let’s begin our head-to-head, testing Grok 3 vs DeepSeek R1 on five diverse prompts!

Task 1: Python Code Generation
Prompt: “Write a Python function to generate a random maze using depth-first search and visualize it using Matplotlib”
Grok 3:
import random
import numpy as np
import matplotlib.pyplot as plt

def generate_maze(width, height):
    """
    Generate a random maze using Depth-First Search and visualize it with Matplotlib.

    Parameters:
        width (int): Width of the maze (number of columns)
        height (int): Height of the maze (number of rows)
    """
    # Initialize the maze grid with walls (1) and paths (0)
    maze = np.ones((height * 2 + 1, width * 2 + 1), dtype=int)
    maze[1::2, 1::2] = 0  # Set paths (odd rows/columns) to 0

    visited = set()

    def carve_path(x, y):
        """Recursively carve paths in the maze using DFS."""
        visited.add((x, y))
        directions = [(0, 2), (2, 0), (-2, 0), (0, -2)]  # Down, Right, Up, Left
        random.shuffle(directions)
        for dx, dy in directions:
            new_x, new_y = x + dx, y + dy
            if (0 < new_x < width * 2 and 0 < new_y < height * 2
                    and (new_x, new_y) not in visited):
                # Knock down the wall between the current cell and the neighbour
                maze[(y + new_y) // 2, (x + new_x) // 2] = 0
                carve_path(new_x, new_y)

    carve_path(1, 1)

    # Display the maze: walls in black, paths in white
    plt.figure(figsize=(8, 8))
    plt.imshow(maze, cmap="binary")
    plt.axis("off")
    plt.show()

generate_maze(10, 10)
DeepSeek R1:
import random
import matplotlib.pyplot as plt

def generate_maze(width, height):
    # Initialize the grid with walls (1)
    grid = [[1 for _ in range(width)] for _ in range(height)]
    # Starting point
    start_x, start_y = 1, 1
    grid[start_y][start_x] = 0
    # Stack for DFS
    stack = [(start_x, start_y)]
    # Directions: up, right, down, left
    directions = [(0, -2), (2, 0), (0, 2), (-2, 0)]
    while stack:
        x, y = stack[-1]
        random.shuffle(directions)
        for dx, dy in directions:
            nx, ny = x + dx, y + dy
            if 0 < nx < width - 1 and 0 < ny < height - 1 and grid[ny][nx] == 1:
                # Open the wall between the current cell and the neighbour
                grid[y + dy // 2][x + dx // 2] = 0
                grid[ny][nx] = 0
                stack.append((nx, ny))
                break
        else:
            # Dead end: backtrack
            stack.pop()
    return grid

# Generate and display a 21 x 21 maze (odd dimensions keep the border walls intact)
maze = generate_maze(21, 21)
plt.figure(figsize=(8, 8))
plt.imshow(maze, cmap="binary")
plt.axis("off")
plt.show()
Output:

Observation:
The DeepSeek R1 output presents a clear, well-structured maze that appears professionally generated. The sharp and consistent walls create a properly enclosed labyrinth, ensuring a challenging yet visually clean pathfinding experience. It adheres to good maze-generation principles, making it both functional and aesthetically pleasing.
In contrast, the Grok 3 output appears highly pixelated, with less distinguishable paths due to the thick black-and-white grid structure. The overall design feels more chaotic, with pathways that are less defined compared to DeepSeek R1. While it does utilize Depth-First Search (DFS) generation, the visual clarity and usability are significantly inferior, making it a less effective maze representation.
Verdict:
Grok 3 ❌ | DeepSeek R1 ✅
Task 2: Web Search
As DeepSeek R1 doesn’t allow web search, I have used DeepSeek’s Search option for this task!
Prompt: “What are the latest advancements in nuclear fusion technology as of 2025? Format of output – list of advancements + source”
Observation:
Between DeepSeek R1 and Grok 3, DeepSeek R1 is the better model for answering research-heavy queries like advancements in nuclear fusion technology. The biggest advantage of DeepSeek R1 is that it provides direct, clickable source links for every claim, ensuring transparency and credibility. It also covers a wider range of advancements, including fusion fuel technology, private-sector investments, regulatory changes, and government initiatives, making its response more comprehensive. Additionally, its sources come from authoritative platforms such as ITER, UKAEA, and the U.S. DOE, which significantly improves trustworthiness.
On the other hand, while Grok 3 delivers better explanatory depth, it falls short in credibility because it does not include direct source links. Instead, it vaguely references news outlets like BBC News or MIT News without proper citations. A major weakness is its reliance on posts from X (Twitter), which often contain unverified claims. While Grok 3 does acknowledge these limitations, the lack of concrete sources makes it less reliable for factual research.
Verdict:
Grok 3 ❌ | DeepSeek R1 ✅
Task 3: Basic Animation Using HTML
Prompt: “Create HTML + CSS code for a red ball rotating inside a square continuously.”
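For reference, here is a minimal hand-written HTML + CSS sketch of one way to satisfy this prompt; the class names, sizes, and animation timing are illustrative assumptions of my own, not output from either model. It moves the ball along the inside edges of the square in a continuous loop:

```html
<!DOCTYPE html>
<html>
<head>
<style>
  .square {
    width: 200px;
    height: 200px;
    border: 3px solid #333;
    position: relative;
    margin: 50px auto;
  }
  .ball {
    width: 30px;
    height: 30px;
    background: red;
    border-radius: 50%;
    position: absolute;
    top: 0;
    left: 0;
    animation: move 4s linear infinite;
  }
  /* Offsets are square size minus ball size (200 - 30 = 170px) */
  @keyframes move {
    0%   { top: 0;     left: 0; }
    25%  { top: 0;     left: 170px; }
    50%  { top: 170px; left: 170px; }
    75%  { top: 170px; left: 0; }
    100% { top: 0;     left: 0; }
  }
</style>
</head>
<body>
  <div class="square"><div class="ball"></div></div>
</body>
</html>
```

Save it as an .html file and open it in any browser; if you change the square or ball dimensions, adjust the @keyframes offsets to match.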
Grok 3:
Rotating Red Ball in Square
DeepSeek R1:
Output:
Observation:
Clearly, Grok 3 struggled to interpret the prompt and failed to generate a correct response. DeepSeek R1 took about 8–10 minutes to reply, but its output was more precise and accurate.
Verdict:
Grok 3 ❌ | DeepSeek R1 ✅
Task 4: Image Analysis
Prompt: “Analyze this chessboard position. Suggest the best move for the current player (white) to checkmate black and explain the reasoning”

Grok 3:

DeepSeek R1:

Observation:
Both Grok 3 and DeepSeek R1 got the answer wrong. Grok 3 suggested e4-e5, which does not deliver checkmate or threaten the king. DeepSeek R1 suggested Qe1#, but this move is impossible in the given position, showing it misinterpreted the board. The correct move was Qf7#, where the queen delivers checkmate by trapping the Black king. Grok 3 failed to recognize an immediate checkmate, while DeepSeek R1 assumed the wrong board setup instead of analyzing the actual position.
Verdict:
Grok 3 ❌ | DeepSeek R1 ❌
Task 5: Logical Reasoning
Prompt: “Solve this zebra puzzle. Give me output in a table”

Grok 3:

Putting the generated response in the puzzle:

DeepSeek R1:

Putting the generated response in the puzzle:

Observation:
DeepSeek R1 again took longer to respond but gave the correct answer. Grok 3 failed to understand the puzzle image and gave an incorrect output.
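For readers curious how such constraint puzzles can be checked programmatically, here is a brute-force sketch in Python. It solves a toy three-house puzzle of my own invention (not the puzzle from the image above) by testing every permutation of colors and pets against the clues:

```python
from itertools import permutations

# Toy puzzle: three houses in a row (positions 0, 1, 2).
# Clue 1: the red house is immediately left of the green house.
# Clue 2: the dog lives in the blue house.
# Clue 3: the cat lives in the first house.
solutions = []
for colors in permutations(["red", "green", "blue"]):
    for pets in permutations(["cat", "dog", "fish"]):
        if colors.index("red") + 1 != colors.index("green"):
            continue  # violates clue 1
        if pets.index("dog") != colors.index("blue"):
            continue  # violates clue 2
        if pets.index("cat") != 0:
            continue  # violates clue 3
        solutions.append((colors, pets))

print(solutions)  # → [(('red', 'green', 'blue'), ('cat', 'fish', 'dog'))]
```

The full zebra puzzle works the same way, just with five houses and more attribute categories; real solvers prune the search instead of enumerating all permutations.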
Verdict:
Grok 3 ❌ | DeepSeek R1 ✅
Grok 3 vs DeepSeek R1: Result
| Task | Winner |
|---|---|
| Python Code Generation | ✅ DeepSeek R1 |
| Web Search | ✅ DeepSeek R1 |
| Basic Animation (HTML + CSS) | ✅ DeepSeek R1 |
| Image Analysis (Chessboard Checkmate) | ❌ Both Failed |
| Logical Reasoning (Zebra Puzzle) | ✅ DeepSeek R1 |
End Note
Elon Musk’s Grok 3 was hyped as a game-changer in AI, claiming to be the smartest model on Earth. However, in real-world testing, it failed to live up to expectations. Across multiple tasks, Grok 3 struggled with accuracy, logical reasoning, and complex problem-solving, often producing incorrect or poorly structured responses. Meanwhile, DeepSeek R1 consistently outperformed it, delivering more accurate, structured, and verifiable answers in key areas like code generation, web search, and logical reasoning.
Despite the bold marketing claims, Grok 3 still has a long way to go before it can compete with top AI models. The fact that it failed basic reasoning tasks suggests that xAI needs major improvements in its training approach. However, given Musk’s track record of rapid iteration and improvements, it will be interesting to see if future updates can bridge this gap. Will Grok 3 evolve into the AI powerhouse it claims to be, or will it remain an overhyped experiment? Time will tell.
Stay tuned to Analytics Vidhya Blog to follow Grok 3 updates regularly!