AI detector accuracy claims: what should a buyer ask?
Last reviewed May 24, 2026
AI detector accuracy claims often compress a test result into one confident number. This page lists the questions a buyer should ask when a vendor claims high detector accuracy across AI-generated and human writing.
Sources this guide draws from
- April 28, 2025. Source for the 98 percent detector accuracy claim and general-purpose content issue.
- August 28, 2025. Case timeline and final source status.
- Published June 25, 2025. Official research source for text-generation and discriminator evaluation design, benchmark limits, and detector-technology measurement context.
Public claims with documented evidence gaps
"98 percent accurate"
Accuracy / Performance
- Source and date: FTC Workado proposed order release · April 28, 2025
- Evidence signal: Single accuracy number presented for a broad detection task.
- Evidence gap: A buyer needs the test corpus, whether marketing copy was included, model list, language mix, and error breakdown.
- Buyer question: For the 98 percent accurate detector claim, what false positive rate applies to human-written marketing copy?
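To illustrate why the error breakdown matters, here is a minimal sketch in Python. The counts are invented for illustration and are not figures from the FTC release or from any vendor; `ConfusionCounts` and `summarize` are hypothetical names. It shows how a test set dominated by AI-generated samples can report 98 percent accuracy while wrongly flagging one in five human-written documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a labeled test set (all values here are hypothetical)."""
    true_positive: int   # AI text flagged as AI
    false_negative: int  # AI text passed as human
    false_positive: int  # human text flagged as AI
    true_negative: int   # human text passed as human


def summarize(c: ConfusionCounts) -> dict[str, float]:
    """Return headline accuracy alongside the two error rates."""
    total = c.true_positive + c.false_negative + c.false_positive + c.true_negative
    return {
        "accuracy": (c.true_positive + c.true_negative) / total,
        "false_positive_rate": c.false_positive / (c.false_positive + c.true_negative),
        "false_negative_rate": c.false_negative / (c.false_negative + c.true_positive),
    }


# Assumed test set: 950 AI-generated samples, only 50 human-written ones.
counts = ConfusionCounts(true_positive=940, false_negative=10,
                         false_positive=10, true_negative=40)
report = summarize(counts)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.980
# false_positive_rate: 0.200
# false_negative_rate: 0.011
```

The headline number is arithmetically correct in this sketch, yet it says little about human-written copy because so few human samples were in the assumed test set.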
"more accurate for the average user"
Accuracy / Performance
- Source and date: FTC Workado proposed order release · April 28, 2025
- Evidence signal: Average-user framing without the exact user task or content mix.
- Evidence gap: A buyer needs the intended user profile, document categories, and whether the detector was tested on non-academic content.
- Buyer question: For the average-user claim, were blog posts, support content, product pages, and AI-edited drafts part of the evaluation?
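One way to picture the content-mix problem is to break a pooled false positive rate down by document category. The sketch below uses invented counts and hypothetical category names; it assumes the vendor can supply, per category, how many human-written documents were tested and how many were wrongly flagged.

```python
# Hypothetical results on human-written documents only.
# category -> (human documents tested, human documents wrongly flagged as AI)
human_results = {
    "academic essay": (400, 4),
    "blog post": (50, 6),
    "product page": (30, 6),
    "support article": (20, 5),
}


def false_positive_rates(results):
    """Return the false positive rate for each category."""
    return {cat: flagged / tested for cat, (tested, flagged) in results.items()}


def pooled_false_positive_rate(results):
    """Return the single rate obtained by lumping all categories together."""
    total_tested = sum(tested for tested, _ in results.values())
    total_flagged = sum(flagged for _, flagged in results.values())
    return total_flagged / total_tested


per_category = false_positive_rates(human_results)
pooled = pooled_false_positive_rate(human_results)

print(f"pooled: {pooled:.3f}")
for category, rate in per_category.items():
    print(f"{category}: {rate:.3f}")
# pooled: 0.042
# academic essay: 0.010
# blog post: 0.120
# product page: 0.200
# support article: 0.250
```

In this made-up mix, the pooled rate looks low because academic essays dominate the sample, while the categories a marketing or support team cares about fare much worse.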
"reliable indicator of AI-generated versus human-written content"
Accuracy / Performance
- Source and date: FTC Content at Scale AI case page · August 28, 2025
- Evidence signal: Detector-output wording that can be read as a dependable content-origin judgment.
- Evidence gap: A buyer needs the benchmark corpus, generator models, human-writing comparison set, threshold, false positive rate, and false negative rate.
- Buyer question: For the reliable-indicator claim, what evidence shows the detector performs on the same writing category and AI model outputs we need to review?
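The threshold belongs in that evidence list because the same detector scores yield different error rates depending on where the cutoff sits. A minimal sketch, assuming a detector that outputs a score between 0 and 1 where higher means "more likely AI"; the scores and the `error_rates` helper are invented for illustration.

```python
# Hypothetical detector scores for documents of known origin.
human_scores = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.72]
ai_scores = [0.45, 0.58, 0.66, 0.70, 0.81, 0.88, 0.90, 0.93, 0.97, 0.99]


def error_rates(threshold):
    """Return (false positive rate, false negative rate) at a given cutoff.

    A document is flagged as AI when its score is at or above the threshold.
    """
    false_positives = sum(score >= threshold for score in human_scores)
    false_negatives = sum(score < threshold for score in ai_scores)
    return false_positives / len(human_scores), false_negatives / len(ai_scores)


for threshold in (0.5, 0.7, 0.9):
    fpr, fnr = error_rates(threshold)
    print(f"threshold {threshold}: FPR {fpr:.1f}, FNR {fnr:.1f}")
# threshold 0.5: FPR 0.3, FNR 0.1
# threshold 0.7: FPR 0.1, FNR 0.3
# threshold 0.9: FPR 0.0, FNR 0.6
```

Raising the cutoff trades false positives for false negatives, so an error rate quoted without its threshold cannot be compared against the setting a buyer would actually use.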
Match each claim pattern to the evidence buyers need
| Claim pattern | Evidence needed | Buyer question |
|---|---|---|
| 99% accurate or near-perfect detector claim | Dataset source, sample size, model list, human baseline, confidence interval, and error rates. | What is the detector's false positive rate on human writing in your content category? |
| Works across ChatGPT, Claude, Gemini, and human writing | Model versions, prompt diversity, editing level, language coverage, and update cadence. | How quickly does the benchmark update when model outputs or writing styles change? |
| Score indicates whether text is AI-generated | Score interpretation, threshold setting, review workflow, and human override process. | Is the score an auxiliary signal or a final determination in your workflow? |
| Detector benchmark or discriminator-task result | Benchmark corpus, generator models, human-writing comparison set, scoring metric, threshold, test rounds, and content-type limits. | Does the benchmark match the document type, model outputs, editing level, and review decision we plan to use? |
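The first row asks for sample size and a confidence interval together because the same observed error rate carries very different uncertainty at different sample sizes. The sketch below uses the Wilson score interval, which is one common choice for a proportion and not necessarily what any given vendor reports; the counts are hypothetical.

```python
import math


def wilson_interval(errors: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = errors / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Same observed 2% false positive rate, two assumed sample sizes.
small = wilson_interval(errors=2, trials=100)
large = wilson_interval(errors=20, trials=1000)

print(f"2 of 100 flagged:    {small[0]:.3f} to {small[1]:.3f}")
print(f"20 of 1000 flagged:  {large[0]:.3f} to {large[1]:.3f}")
```

With 100 human documents, an observed 2 percent false positive rate is compatible with a true rate anywhere from roughly 0.6 percent to 7 percent; with 1,000 documents the range narrows to roughly 1.3 to 3.1 percent.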
Evidence to request
- A benchmark that matches the buyer's content category, not only a convenient test set.
- False positive and false negative rates, not only one headline accuracy number.
- Disclosure of which AI models, languages, document lengths, and editing levels were tested.
- A statement of how the detector should and should not be used in review decisions.
Questions to put in front of the vendor
- For this detector accuracy claim, what content type was tested: academic, marketing, support, reviews, or mixed text?
- What happens when AI text has been lightly edited by a human?
- What false positive rate should we expect for human-written copy in our use case?
- Does the vendor present the detector score as an auxiliary signal or as a final determination?
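The last two questions connect through base rates: even a low false positive rate can produce many wrong flags when most reviewed documents are human-written. A rough sketch using Bayes' rule, with every input an assumption chosen for illustration and not a measured value.

```python
def positive_predictive_value(prevalence, true_positive_rate, false_positive_rate):
    """Share of flagged documents that are actually AI-generated.

    prevalence: assumed fraction of reviewed documents that are AI-generated.
    """
    flagged_ai = prevalence * true_positive_rate
    flagged_human = (1 - prevalence) * false_positive_rate
    if flagged_ai + flagged_human == 0:
        raise ValueError("no documents would be flagged with these inputs")
    return flagged_ai / (flagged_ai + flagged_human)


# Assumed use case: 5% of submissions are AI-generated,
# the detector catches 98% of them and wrongly flags 2% of human writing.
ppv = positive_predictive_value(prevalence=0.05,
                                true_positive_rate=0.98,
                                false_positive_rate=0.02)
print(f"share of flags that are truly AI: {ppv:.1%}")
# share of flags that are truly AI: 72.1%
```

Under these assumptions more than a quarter of flagged documents would be human-written, which is one reason to ask whether the score is meant as an auxiliary signal for human review.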
Wording boundaries to compare against
- Reported X% accuracy on a named benchmark covering specified models and document types.
- Provides a detector score to support human review, not a final determination.
- Evaluated on a named benchmark for specified discriminator tasks; do not treat the score as a standalone decision.
- Performance may vary for edited, short, non-English, or out-of-sample text.