Will Baidu’s ERNIE 4.5 & X1 Replace GPT-4.5 and DeepSeek-R1?


China has done it again with its AI models and this time the blow is bigger and better! Baidu – a Chinese AI company, recently released two large language models (LLMs) – ERNIE 4.5 & X1. Claiming to perform better than OpenAI’s latest & greatest model to date – GPT-4.5, these models are more cost-efficient than DeepSeek-R1! The models seem too good to be true – offering high quality at a fraction of the price. In this blog, we’ll explore the ERNIE 4.5 & X1 models, evaluate their benchmark results, and see how they perform in real-world applications. So, let’s begin.

What are ERNIE 4.5 & X1?

ERNIE 4.5 & X1 are the two latest multimodal LLMs developed by the leading Chinese tech company Baidu, specializing in internet services, artificial intelligence, and autonomous driving. It is best known for its dominant search engine in China and advancements in AI-driven innovations. Baidu launched its first LLM, ERNIE 3.0 Titan, back in December 2021. After that, it has released a few more models, while working simultaneously to build more robust LLMs. The result of all the research and continuous efforts is ERNIE 4.5 & X1.

ERNIE 4.5

ERNIE 4.5 is a multimodal foundation model capable of understanding and integrating various data types, including text, images, audio, and video. This diverse modeling approach enhances its ability to understand and generate different kinds of content.

Here are some of the key features of ERNIE 4.5:

  • ERNIE 4.5 shows comprehensive improvements in understanding, generation, reasoning, and memory over its predecessor, ERNIE 4.0.
  • It shows great abilities in hallucination prevention, logical reasoning, and coding, making it adept at handling complex tasks with higher accuracy. ​
  • The model even performs better than OpenAI’s GPT-4.5 in multiple benchmarks, while it only costs 1% of what it costs to use GPT-4.5!

ERNIE X1

ERNIE X1 is designed as a deep-thinking reasoning model with multimodal capabilities. It’s a first of its kind deep thinking model released by Baidu. Here are some of its key features:

  • ERNIE X1 excels in understanding context, planning its thought process, reflecting on its response, and evolving over time.
  • It is capable of autonomously utilizing various tools for tasks such as advanced search, image understanding, and complex calculations.
  • The model delivers performance on par with DeepSeek-R1 but at half the price, offering a cost-effective solution for enterprises seeking advanced AI capabilities.

How to Access ERNIE 4.5 & X1?

You can access ERNIE 4.5 & X1 either through their AI Chatbot – ERNIE Bot, or through APIs.

Access via Bot:

Both the models are freely accessible to individual users on Baidu’s ERNIE Bot platform. However, the registration for ERNIE Bot is currently limited to Chinese nationals.

Access via API:

  • Head to Baidu AI Cloud’s MaaS platform, Qianfan
  • Create your account on the platform to get started.

Currently, the platform can’t be accessed by all users. Also, only ERNIE 4.5 is available via API, while ERNIE X1 will soon be made available on the platform.

ERNIE 4.5 & X1 Performance Check

In this section, we’ll find out how these models perform at tasks involving multimedia, reasoning, document analysis, and more. Since the model interface only supports Chinese language, and the account creation is limited to Chinese nationals, we will look at some examples of how people are using the two models, and the outputs they have received. We will be covering some of the most common use cases of ERNIE 4.5 & X1 we have found online, including:

  1. Reasoning with Image Analysis
  2. Document Analysis and Summarization
  3. Audio Analysis
  4. Creativity and Image Generation

Task 1: Reasoning + Image Analysis

In this task, the model was asked to solve a mathematical problem which was given to it in the form of an image.

Model used: ERNIE 4.5

Output:

Just like most other multimodal LLMs, ERNIE 4.5 quickly analyses the video and solves the problem in the image. It takes all the questions in the image one by one, and finally summarizes them all. The speed and accuracy of its performance make it a useful tool for students, educators, researchers, and professionals who require fast and accurate problem-solving.

Task 2: Document Analysis + Summarization

Here, the model was given a document and it had to summarize the information on a particular topic from that document.

Model used: ERNIE 4.5

Output:

The model allows you to upload multiple files of various types, all at once. It is capable of processing files of different types, including docs, PDFs, PPTs, Excel sheets, and more. From the uploaded files, you can select the one (or more) that you wish to query the chatbot about and the model quickly summarizes the topic. Its quick processing of multiple files can be very useful for tasks like research analysis, legal document review, financial data extraction, and corporate reporting.

Task 3: Audio Analysis

For this task, the model had to analyze the given audio and find its source.

Model used: ERNIE 4.5

Output:

Audio analysis is a feature that none of the popular AI chatbots have incorporated within their interface, making ERNIE 4.5, the first of its kind. The model quickly analyzes the clip, determines its source, and then even goes on to describe the significance of the clip. Its quick analysis and the detailed description, make it a valuable tool for tasks like real-time transcription, voice-based search, deepfake detection, and sentiment analysis across media, customer service, education, and law enforcement.

Task 4: Creativity + Image Generation

For this task, the model had to analyze a room and suggest possible decorations that can enhance its overall appeal. It then had to generate an updated image of the room.

Model used: ERNIE X1

Output:

The model quickly processes the image. It then suggests the possible improvements to the room’s decor to enhance the overall appeal. Finally, it generates the image of the room with all the suggested enhancements. This feature is a great addition for tasks like interior designing, home renovation planning, real estate staging, and virtual decor visualization.

Note: We have taken the examples from this post on X.

Baidu’s ERNIE 4.5 & X1: Pricing

Both ERNIE 4.5 & X1 have all the features, and even more, compared to the top models by OpenAI, DeepSeek, Grok, Claude, etc. Here is a pricing breakdown of the two models:

Model Input Price(per million tokens) Output Price(per million tokens) Availability
ERNIE 4.5 $0.55 $2.20 Available
ERNIE X1 $0.28 $1.10 Not yet available

Compared to other top models, ERNIE 4.5 & X1 are significantly cheaper, making them a valuable asset in the advancement of generative AI.

Baidu's ERNIE 4.5 & X1 Pricing

ERNIE 4.5 & X1: Standard Benchmark Results

We have already seen the features, capabilities, and the pricing of the latest ERNIE models. Now let’s look at some performance numbers of these models against top models like GPT-4.5, GPT-4o, DeepSeek-R1, and more.

The graph below compares ERNIE 4.5 and GPT-4o across multiple benchmarks that test multimodal AI performance.

Baidu's ERNIE 4.5 & X1 Benchmark Performance

The graph shows that:

  • ERNIE 4.5 outperforms GPT-4o in most multimodal tasks.
  • The average score for ERNIE 4.5 is 77.77, which is higher than GPT-4o’s 73.92.
  • ERNIE 4.5 has a significant edge in MathVista and DocVQA, showing better math reasoning and document-based question-answering skills.
  • Both models perform similarly in OCRBench and MMMU, but ERNIE 4.5 still has a slight advantage.

The next graph compares ERNIE 4.5, DeepSeek V3 – Chat, GPT-4o, and GPT-4.5 across multiple benchmarks for text-based reasoning and problem-solving.

Baidu's ERNIE 4.5 & X1 Models Outperform OpenAI’s GPT-4.5 and DeepSeek-R1

Here are some key takeaways from the graph:

  • ERNIE 4.5 leads the pack with an average score of 79.6, narrowly surpassing DeepSeek V3 – Chat at 79.14.
  • It performs well across general knowledge, reasoning, and programming benchmarks such as MMLU-Pro, GSM8K, and HumanEval+.
  • GPT-4o and DeepSeek V3 also demonstrate strong results, with DeepSeek V3 performing competitively in Chinese benchmarks like CMMLU.
  • ERNIE 4.5 excels in GSM8K (math) and C-Eval (general reasoning), although DeepSeek V3 is very close in performance.

Future Impact

The race to be the top LLM is heating up and Baidu’s ERNIE 4.5 & X1 introduce serious competition for OpenAI, DeepSeek, Anthropic, and Meta. With Chinese AI labs delivering models that rival or surpass Western AI at a fraction of the cost, companies will be forced to innovate faster and lower their costs to stay competitive.

All these advancements will finally lead to:

  • Faster AI advancements across all major AI research centers.
  • More affordable AI for businesses and developers.
  • A new era of multimodal AI applications, expanding beyond traditional text-based AI.

Conclusion

Baidu’s ERNIE 4.5 & X1 models are not just another set of AI models – they are industry disruptors. Their superior multimodal and reasoning capabilities, low pricing, and deep integration into China’s digital ecosystem, signal a power shift in the global AI market.

If this trend continues, we would see a larger scale AI democratisation and outreach across various industries. This would also push many western companies to release cheaper models. Not only would this add to competitiveness in the market, but would also ensure that the users get the most value for their money.

Frequently Asked Questions

Q1. What are ERNIE 4.5 & X1?

A. ERNIE 4.5 & X1 are the latest large language models (LLMs) developed by Baidu, designed to rival top AI models like OpenAI’s GPT-4.5 and DeepSeek-R1. ERNIE 4.5 is a multimodal foundation model, while ERNIE X1 is a deep-thinking reasoning model with advanced capabilities.

Q2. How is Baidu’s ERNIE 4.5 different from ERNIE X1?

A. ERNIE 4.5 is optimized for multimodal understanding, capable of processing text, images, audio, and video with high accuracy. ERNIE X1, on the other hand, is designed for deep-thinking reasoning, excelling in context understanding, planning, and problem-solving with self-reflection.

Q3. How do ERNIE 4.5 & X1 compare to OpenAI’s GPT-4.5?

A. Baidu ERNIE 4.5 outperforms GPT-4.5 in multiple benchmarks, particularly in reasoning, multimodal understanding, and hallucination prevention, while costing only 1% of GPT-4.5’s price. ERNIE X1 delivers DeepSeek-R1 level performance at half the cost, making them highly competitive AI solutions.

Q4. What are the pricing details for ERNIE 4.5 & X1?

A. ERNIE 4.5: Input cost $0.55 per 1M tokens, output cost $2.20 per 1M tokens.
ERNIE X1: Input cost $0.28 per 1M tokens, output cost $1.10 per 1M tokens.
The ERNIE X1 model is not yet available via API but will be soon.

Q5. How can I access ERNIE 4.5 & X1?

A. You can access these models through:
1. ERNIE Bot (AI Chatbot) at yiyan.baidu.com (Only available for Chinese users).
2. Baidu AI Cloud’s MaaS platform, Qianfan, for API access (currently only ERNIE 4.5 is available).

Anu Madan is an expert in instructional design, content writing, and B2B marketing, with a talent for transforming complex ideas into impactful narratives. With her focus on Generative AI, she crafts insightful, innovative content that educates, inspires, and drives meaningful engagement.

Login to continue reading and enjoy expert-curated content.

We will be happy to hear your thoughts

Leave a reply

Som2ny Network
Logo
Compare items
  • Total (0)
Compare
0