Overview
Patsnap PatentBench is a benchmark designed specifically for novelty search tasks in real-world patent scenarios.
It evaluates the performance of three AI tools: Patsnap’s Novelty Search AI Agent, ChatGPT-o3 (with web search), and DeepSeek-R1 (with web search).
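The benchmark is built on a set of test samples, each pairing a “test question” with a “standard answer”: a curated set of X documents from various patent offices that closely represent the ideal references used in actual novelty searches. In patent search reports, an X document is a citation that, taken alone, is particularly relevant to the novelty or inventive step of the claimed invention.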
Understanding Novelty Search
Novelty search is a key patent task that involves systematically identifying prior art worldwide to determine whether a technical solution is new and inventive under patent law.
It plays a critical role throughout the innovation process, including:
R&D planning: guiding the direction and feasibility of new developments
Pre-filing: verifying that an invention is patentable before submission
Patent examination: helping examiners assess the novelty of applications
Key Findings
The evaluation dataset is evenly distributed across IPC classifications, covering both mainstream technologies and niche domains. By language, 68% of the data is in English and 32% is in Chinese, so the benchmark tests how well each tool handles multilingual patent content. By receiving office, applications filed in the United States (US) and China (CN) each make up about 32%, while those filed with the European Patent Office (EP) and WIPO (WO) each account for roughly 18%. This balanced mix reflects the different examination styles of the major patent jurisdictions and makes the evaluation more realistic and globally representative.
Language distribution of patent texts in 340 test samples
Distribution of IPC samples across 340 test samples
Distribution of receiving offices for 340 test samples
Benchmark results show that Patsnap’s Novelty Search AI Agent achieved an 81% X Hit Rate and a 36% X Recall Rate within the top 100 results, significantly outperforming the two leading general-purpose AI tools tested.
1) X Hit Rate
Patsnap’s Novelty Search AI Agent successfully identified at least one relevant X document in 81% of test cases—an essential capability for speeding up decision-making in patent examination and early-stage R&D.
X Hit Rate
Percentage of test samples with at least one X document from the standard answer in the top 100 results
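As a rough illustration of this metric, the X Hit Rate can be computed as the fraction of test samples whose top 100 results contain at least one X document from the standard answer. The sketch below is not Patsnap’s evaluation code: the function name, the input shapes, and the document identifiers are all assumptions made for illustration.

```python
from typing import Sequence, Set


def x_hit_rate(
    ranked_results: Sequence[Sequence[str]],
    x_documents: Sequence[Set[str]],
    k: int = 100,
) -> float:
    """Fraction of samples with at least one X document in the top-k results.

    ranked_results[i] is the ranked list a tool returned for sample i, and
    x_documents[i] is that sample's standard answer (hypothetical shapes).
    """
    if len(ranked_results) != len(x_documents):
        raise ValueError("one result list is needed per test sample")
    if not x_documents:
        return 0.0
    hits = sum(
        1
        for results, answers in zip(ranked_results, x_documents)
        if answers.intersection(results[:k])  # non-empty overlap counts as a hit
    )
    return hits / len(x_documents)


# Toy data with made-up identifiers: two of the three samples contain a hit.
results = [["D1", "D7"], ["D9"], ["D4", "D2"]]
answers = [{"D1", "D3"}, {"D5"}, {"D2"}]
print(x_hit_rate(results, answers))
```

Because a single match is enough, this metric says nothing about how many of a sample's X documents were found.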
2) X Recall Rate
Patsnap’s Novelty Search AI Agent retrieved 36% of all relevant X documents, enabling more thorough analysis and more informed patent claim drafting.
A high X Recall Rate is key during R&D planning and before filing a patent. Patsnap’s Novelty Search AI Agent helps teams—whether in-house researchers, patent professionals, or external agents—find more relevant X documents. This supports better technical decisions and stronger patent claims, increasing the chances of patent approval.
X Recall Rate
Share of X documents found in the top 100 results
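A minimal sketch of how such a recall figure could be computed is shown below. It assumes the rate is pooled over the whole dataset (X documents found divided by X documents in all standard answers), which is one reading of "all relevant X documents"; averaging per-sample recall would be an alternative and can give a different number. The function name and identifiers are hypothetical.

```python
from typing import Sequence, Set


def x_recall_rate(
    ranked_results: Sequence[Sequence[str]],
    x_documents: Sequence[Set[str]],
    k: int = 100,
) -> float:
    """Share of all X documents that appear in the top-k results (pooled)."""
    if len(ranked_results) != len(x_documents):
        raise ValueError("one result list is needed per test sample")
    total = sum(len(answers) for answers in x_documents)
    if total == 0:
        return 0.0
    found = sum(
        len(answers.intersection(results[:k]))  # X documents retrieved for this sample
        for results, answers in zip(ranked_results, x_documents)
    )
    return found / total


# Toy data with made-up identifiers: 2 of the 4 X documents are retrieved.
results = [["D1", "D7", "D3"], ["D9"]]
answers = [{"D1", "D3"}, {"D5", "D6"}]
print(x_recall_rate(results, answers))
```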
3) Typical Test Result Sample
In this test, the patent specification (the “test question”) was submitted to each AI tool. The results were then evaluated against a predefined set of X documents (the “standard answer”).
Patsnap’s Novelty Search AI Agent successfully identified all four relevant patent families within the top 100 results, achieving an X Hit Rate of 100% and an X Recall Rate of 100%.
By comparison, ChatGPT-o3 also achieved a 100% X Hit Rate but retrieved only one of the four relevant patent families, giving a much lower X Recall Rate of 25%. DeepSeek-R1 failed to retrieve any of them, resulting in an X Recall Rate of 0% and, since a hit requires at least one relevant X document, no hit on this sample.
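The recall figures for this sample follow directly from the counts: the standard answer contains four patent families, so retrieving four, one, or none of them gives 100%, 25%, and 0%. The sketch below reproduces that arithmetic. The family identifiers (F1 to F9) and the retrieved sets are placeholders chosen only to match the reported counts, not the actual documents from the test.

```python
# Placeholder identifiers; the real patent families are not listed in the text.
standard_answer = {"F1", "F2", "F3", "F4"}

# Hypothetical retrieved families per tool, chosen to match the reported counts.
retrieved = {
    "Patsnap Novelty Search AI Agent": {"F1", "F2", "F3", "F4", "F9"},
    "ChatGPT-o3": {"F2", "F8"},
    "DeepSeek-R1": {"F7"},
}


def sample_recall(found: set, answer: set) -> float:
    """Share of the standard-answer families present in the retrieved set."""
    return len(found & answer) / len(answer)


recalls = {tool: sample_recall(found, standard_answer) for tool, found in retrieved.items()}
for tool, recall in recalls.items():
    print(f"{tool}: {recall:.0%}")
```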
These findings highlight that while general-purpose LLMs excel in reasoning, they struggle with highly specialized tasks like patent novelty search. In comparison, domain-specific AI tools like Patsnap’s Novelty Search AI Agent offer superior accuracy and relevance, underscoring their essential role in patent-focused workflows.
A single-sample benchmark test
Future Research
Future benchmarks will further expand this dataset and refine evaluation methods for greater accuracy and coverage.