AI detector accuracy claims: what should a buyer ask?

Last reviewed May 24, 2026

AI detector accuracy claims often compress a test result into one confident number. This page shows what a buyer should ask when a detector claims high accuracy across AI-generated and human writing.

Sources this guide draws from

  1. FTC Workado proposed order release · April 28, 2025

    Source for the 98 percent detector accuracy claim and general-purpose content issue.

  2. FTC Content at Scale AI case page · August 28, 2025

    Case timeline and final source status.

  3. · Published June 25, 2025

    Official research source for text-generation and discriminator evaluation design, benchmark limits, and detector-technology measurement context.

Public claims with documented evidence gaps

"98 percent accurate"

Accuracy / Performance
Source and date
FTC Workado proposed order release · April 28, 2025
Evidence signal
Single accuracy number presented for a broad detection task.
Evidence gap
A buyer needs the test corpus, whether marketing copy was included, model list, language mix, and error breakdown.
Buyer question
For the 98 percent accurate detector claim, what false positive rate applies to human-written marketing copy?

"more accurate for the average user"

Accuracy / Performance
Source and date
FTC Workado proposed order release · April 28, 2025
Evidence signal
Average-user framing without the exact user task or content mix.
Evidence gap
A buyer needs the intended user profile, document categories, and whether the detector was tested on non-academic content.
Buyer question
For the average-user claim, were blog posts, support content, product pages, and AI-edited drafts part of the evaluation?

"reliable indicator of AI-generated versus human-written content"

Accuracy / Performance
Source and date
FTC Content at Scale AI case page · August 28, 2025
Evidence signal
Detector-output wording that can be read as a dependable content-origin judgment.
Evidence gap
A buyer needs the benchmark corpus, generator models, human-writing comparison set, threshold, false positive rate, and false negative rate.
Buyer question
For the reliable-indicator claim, what evidence shows the detector performs on the same writing category and AI model outputs we need to review?

Match each claim pattern to the evidence buyers need

99% accurate or near-perfect detector claim

Evidence needed
Dataset source, sample size, model list, human baseline, confidence interval, and error rates.
Buyer question
What is the detector's false positive rate on human writing in your content category?

Works across ChatGPT, Claude, Gemini, and human writing

Evidence needed
Model versions, prompt diversity, editing level, language coverage, and update cadence.
Buyer question
How quickly does the benchmark update when model outputs or writing styles change?

Score indicates whether text is AI-generated

Evidence needed
Score interpretation, threshold setting, review workflow, and human override process.
Buyer question
Is the score an auxiliary signal or a final determination in your workflow?

Detector benchmark or discriminator-task result

Evidence needed
Benchmark corpus, generator models, human-writing comparison set, scoring metric, threshold, test rounds, and content-type limits.
Buyer question
Does the benchmark match the document type, model outputs, editing level, and review decision we plan to use?
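
Sample size and confidence interval matter because the same headline percentage can rest on very different amounts of evidence. The sketch below is one way a buyer might check this, assuming the vendor discloses how many test documents were classified correctly out of how many. It uses the Wilson score interval, a standard choice for proportions; the function name and the sample counts are hypothetical and do not come from any vendor or from the sources cited here.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for an accuracy of correct/n.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need 0 <= correct <= n and n > 0")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical disclosures: both vendors report "98% accurate".
small_low, small_high = wilson_interval(98, 100)       # 100 test documents
large_low, large_high = wilson_interval(9800, 10000)   # 10,000 test documents

print(f"n=100:    {small_low:.3f} to {small_high:.3f}")
print(f"n=10000:  {large_low:.3f} to {large_high:.3f}")
```

With 100 test documents the interval runs from roughly 93% to 99%, so a "98 percent" claim on a small sample is compatible with noticeably lower real-world accuracy. The interval also says nothing about whether the test documents resemble the buyer's content.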

Evidence to request

  • A benchmark that matches the buyer's content category, not only a convenient test set.
  • False positive and false negative rates, not only one headline accuracy number.
  • Disclosure of which AI models, languages, document lengths, and editing levels were tested.
  • A statement of how the detector should and should not be used in review decisions.
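
To see why the error breakdown matters more than one headline number, the following is a minimal sketch with made-up counts. The `Confusion` class, the `rates` function, and the 900/100 split between AI and human documents are illustrative assumptions, not figures from any vendor or from the FTC materials.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # AI-generated text flagged as AI
    fn: int  # AI-generated text passed as human
    fp: int  # human-written text flagged as AI
    tn: int  # human-written text passed as human


def rates(c: Confusion) -> dict[str, float]:
    """Return headline accuracy alongside the two error rates."""
    total = c.tp + c.fn + c.fp + c.tn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "false_positive_rate": c.fp / (c.fp + c.tn),  # share of human text wrongly flagged
        "false_negative_rate": c.fn / (c.fn + c.tp),  # share of AI text missed
    }


# Hypothetical test set: 900 AI documents and only 100 human documents.
example = Confusion(tp=895, fn=5, fp=15, tn=85)
result = rates(example)

for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

In this invented test set the detector is 98% accurate overall, yet it wrongly flags 15% of the human-written documents. A test set dominated by AI text can hide a false positive rate that would be unacceptable in a workflow that mostly reviews human writing.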

Questions to put in front of the vendor

  • For this detector accuracy claim, what content type was tested: academic, marketing, support, reviews, or mixed text?
  • What happens when AI text has been lightly edited by a human?
  • What false positive rate should we expect for human-written copy in our use case?
  • Does the vendor present the detector score as an auxiliary signal or as a final determination?
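
If a vendor shares per-document test results, the content-type question can be checked directly. The sketch below assumes a hypothetical record layout of (content category, whether the text was human-written, whether the detector flagged it as AI); real vendor exports will differ, and the counts are invented for illustration.

```python
from collections import defaultdict


def false_positive_rate_by_category(records):
    """Compute the false positive rate on human-written text per content category.

    records: iterable of (category, is_human, flagged_as_ai) tuples.
    """
    human_total = defaultdict(int)
    human_flagged = defaultdict(int)
    for category, is_human, flagged_as_ai in records:
        if not is_human:
            continue  # only human-written text can produce a false positive
        human_total[category] += 1
        if flagged_as_ai:
            human_flagged[category] += 1
    return {cat: human_flagged[cat] / human_total[cat] for cat in human_total}


# Hypothetical results: 50 human-written documents per category.
records = (
    [("academic", True, False)] * 49 + [("academic", True, True)] * 1
    + [("marketing", True, False)] * 40 + [("marketing", True, True)] * 10
    + [("marketing", False, True)] * 30  # AI text, ignored for this metric
)

by_category = false_positive_rate_by_category(records)
for category, fpr in sorted(by_category.items()):
    print(f"{category}: {fpr:.2f}")
```

Here the invented detector flags 2% of human academic writing but 20% of human marketing copy, which is the kind of gap a single blended figure would not reveal.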

Wording boundaries to compare against

  • Reported X% accuracy on a named benchmark covering specified models and document types.
  • Provides a detector score to support human review, not a final determination.
  • Evaluated on a named benchmark for specified discriminator tasks; do not treat the score as a standalone decision.
  • Performance may vary for edited, short, non-English, or out-of-sample text.
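
One way to picture a score used as an auxiliary signal is a routing rule that never issues a verdict on its own. The sketch below is illustrative only: the function name, the 0.8 threshold, the 150-word minimum, and the outcome labels are assumptions a buyer would replace with values agreed with the vendor and validated on their own content.

```python
def route_document(score: float, word_count: int,
                   threshold: float = 0.8, min_words: int = 150) -> str:
    """Route a document based on a detector score in the range 0.0 to 1.0.

    The outcome is a next step for a human reviewer, not a determination
    that the text is AI-generated.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    if word_count < min_words:
        # Short text is one of the cases where performance may vary,
        # so the score is set aside rather than acted on.
        return "score_not_reliable"
    if score >= threshold:
        return "human_review"
    return "no_action"


print(route_document(0.95, word_count=500))  # human_review
print(route_document(0.95, word_count=40))   # score_not_reliable
print(route_document(0.30, word_count=500))  # no_action
```

Even the highest score only sends the document to a person, which matches the "support human review, not a final determination" wording above.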
