
China is rapidly advancing in AI, releasing models like DeepSeek and Qwen to rival global giants. DeepSeek has gained widespread recognition, comparable to ChatGPT, while Qwen is making strides with its versatile chatbot, offering vision, reasoning, and coding capabilities in one interface. QwQ-32B is Qwen’s latest reasoning model. Though only medium-sized, it competes with top-tier reasoning models like DeepSeek-R1 and o1-mini, showcasing China’s impressive progress in AI innovation.
What is Qwen’s QwQ 32B?
QwQ-32B is a 32-billion-parameter AI model from the Qwen series. It uses Reinforcement Learning (RL) to improve reasoning and problem-solving skills, performing as well as larger models like DeepSeek-R1. It can adapt its reasoning based on feedback and use tools effectively. The model is open-weight, available on Hugging Face and ModelScope under the Apache 2.0 license, and can be accessed through Qwen Chat. It highlights how RL can boost AI capabilities in meaningful ways.
Also Read: How to Run Qwen2.5 Models Locally in 3 Minutes?
Performance
QwQ-32B has been tested across various benchmarks to evaluate its mathematical reasoning, coding skills, and problem-solving abilities. The results below compare its performance with other top models, such as DeepSeek-R1-Distilled-Qwen-32B, DeepSeek-R1-Distilled-Llama-70B, o1-mini, and the original DeepSeek-R1.

The LiveBench scores, which evaluate reasoning models across a broad range of tasks, show QwQ-32B performing between R1 and o3-mini – but at roughly a tenth of the cost. The pricing estimates are based on API and OpenRouter data, with QwQ-Preview priced at $0.18 per million output tokens on DeepInfra. This makes QwQ-32B a highly efficient and cost-effective option compared to other leading models.
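To put that price in perspective, here is a quick back-of-envelope calculation. It uses the $0.18-per-million-output-tokens figure quoted above (actual provider pricing varies, so treat the result as an estimate):

```python
PRICE_PER_MILLION_OUTPUT_TOKENS = 0.18  # USD; figure quoted above, varies by provider

def output_cost_usd(tokens: int) -> float:
    """Estimated cost of generating `tokens` output tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_OUTPUT_TOKENS

print(f"${output_cost_usd(2_000):.5f}")      # a typical long answer: ~$0.00036
print(f"${output_cost_usd(1_000_000):.2f}")  # one million output tokens: $0.18
```

Even a long, reasoning-heavy answer costs a small fraction of a cent, which is what makes the model attractive for high-volume use.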
QwQ-32B by Alibaba scores 59% on GPQA Diamond for scientific reasoning and 86% on AIME 2024 for math. It excels in math but lags in scientific reasoning compared to top models.
It is also trending at #1 on Hugging Face.

Also Read: Qwen Chat: The AI Chatbot that’s Better than ChatGPT
How to Access QwQ 32B?
To access the QwQ-32B model, you have several options depending on your needs – whether you want to try it casually, run it locally, or integrate it into your projects.
Via Qwen Chat (Easiest Option)
- Go to https://chat.qwen.ai/.
- Create an account if you don’t already have one.
- Once logged in, look for the model picker menu (usually a dropdown or selection list).
- Select “QwQ-32B” from the list of available models.
- Start typing your prompts to test its reasoning, math, or coding capabilities.
Download and Run Locally via Hugging Face
Requirements:
- Hardware: A high-end GPU with at least 24GB VRAM (e.g., NVIDIA RTX 3090 or better). For unquantized FP16, you’d need around 80GB VRAM (e.g., NVIDIA A100 or H100). Quantized versions (like 4-bit) can run on less, around 20GB VRAM.
- Software: Python 3.8+, Git, and a package manager like pip or conda. You’ll also need the latest version of the Hugging Face transformers library (4.37.0 or higher).
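The VRAM figures above follow directly from the parameter count. A rough sizing helper (an illustration, not an official calculator) shows why FP16 weights alone need ~60GB while 4-bit quantization drops that to ~15GB, with KV cache and overhead making up the rest:

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate VRAM for the model weights alone. KV cache, activations,
    and framework overhead add a sizable margin on top of this."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1024**3

print(f"FP16 : {weight_memory_gb(32, 16):.1f} GB")  # ~59.6 GB of weights
print(f"4-bit: {weight_memory_gb(32, 4):.1f} GB")   # ~14.9 GB of weights
```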
Install dependencies:
pip install transformers torch accelerate
Download the model and tokenizer from Hugging Face:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "Qwen/QwQ-32B"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
Run a simple inference:
prompt = "How many r's are in the word 'strawberry'?"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the echoed prompt tokens so only the model's reply is decoded
generated_ids = [out[len(inp):] for inp, out in zip(model_inputs.input_ids, generated_ids)]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
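Under the hood, apply_chat_template wraps the messages in Qwen’s ChatML-style markers. The hand-rolled sketch below shows the general shape of the resulting prompt string; it is illustrative only (the real template can also add a system turn and other details), so always use the tokenizer’s own template in practice:

```python
def chatml_prompt(messages):
    """Sketch of a ChatML-style prompt, as used by Qwen-family models.
    Illustrative only; prefer tokenizer.apply_chat_template in real code."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # add_generation_prompt=True appends an open assistant turn to cue the reply
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(chatml_prompt([{"role": "user", "content": "Hi"}]))
```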
Using Ollama for a Simpler Local Setup
- Download and install Ollama from ollama.com for your OS (Windows, macOS, or Linux).
- Open a terminal and pull the QwQ-32B model:
ollama pull qwq:32b
ollama run qwq:32b
- Type your prompts directly in the terminal to interact with it.
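Beyond the terminal, a running Ollama instance also exposes a local REST API on port 11434, which is handy for scripting. The sketch below builds a request for the /api/generate endpoint; actually sending it requires the Ollama server to be running (the live call is shown commented out):

```python
import json
import urllib.request

def build_ollama_request(prompt: str, model: str = "qwq:32b") -> urllib.request.Request:
    """Build a POST request for Ollama's local /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the Ollama server up, send it like this:
# with urllib.request.urlopen(build_ollama_request("Hello")) as r:
#     print(json.loads(r.read())["response"])
```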
If you want to try running it yourself, check out my Colab notebook here.
Let’s Try QwQ 32B
Prompt: Create a static webpage with an illuminated candle, with sparks around the flame
Prompt: Develop a shooting game where you can fire missiles in all directions. At first, the enemy’s speed is very slow, but after defeating three enemies, the speed gradually increases. Implement it in p5.js
Prompt: Write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically.
Also Read: QwQ-32B vs DeepSeek-R1: Can a 32B Model Challenge a 671B Parameter Model?
End Note
QwQ-32B represents a significant leap in AI reasoning models, delivering performance comparable to top-tier models like R1 and o3-mini at a fraction of the cost. Its impressive LiveBench scores and cost-efficiency, priced at just $0.18 per million output tokens, make it a practical and accessible choice for a wide range of applications. This advancement highlights the potential for high-performance AI to become more affordable and scalable, paving the way for broader adoption and innovation in the field.