AI claim substantiation: what evidence should support the wording?
Last reviewed May 24, 2026
AI claim substantiation starts with one specific statement: the accuracy number, compliance promise, bias claim, automation claim, or first-of-kind wording a vendor publishes. This guide maps those claim types to the evidence a buyer should ask for before relying on the wording.
Sources this guide draws from
- FTC Operation AI Comply announcement · September 25, 2024
Source for AI-powered business-opportunity and income claim evidence.
- FTC accessiBe · April 22, 2025
Source for AI-powered accessibility compliance claim evidence.
- FTC IntelliVision · December 3, 2024
Source for facial recognition accuracy, bias, training-data, and anti-spoofing claim evidence.
Public claims with documented evidence gaps
"stores producing five-figure monthly income by the second year"
ROI / Outcome
- Source and date: FTC Operation AI Comply announcement · September 25, 2024
- Evidence signal: Projected monthly income wording tied to AI-powered ecommerce tools without visible customer result distribution.
- Evidence gap: The buyer needs the customer sample, total cost basis, median outcome, loss rate, time period, and evidence that the AI component caused the result.
- Buyer question: For the five-figure monthly income claim, what median customer result and loss-rate data support the outcome after all costs?
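The median outcome and loss rate named in the evidence gap can be computed from a customer sample in a few lines. This is a minimal sketch: the function name `summarize_outcomes`, its fields, and the five-customer sample are hypothetical and made up for illustration, not data from the FTC matter.

```python
from statistics import median


def summarize_outcomes(monthly_revenue, monthly_cost):
    """Summarize net monthly results for a customer sample (one entry per customer)."""
    if len(monthly_revenue) != len(monthly_cost) or not monthly_revenue:
        raise ValueError("need equal-length, non-empty samples")
    # Net result after all costs, which is what the buyer question asks about.
    net = [r - c for r, c in zip(monthly_revenue, monthly_cost)]
    return {
        "customers": len(net),
        "median_net": median(net),
        # Share of customers who lost money after costs.
        "loss_rate": sum(1 for n in net if n < 0) / len(net),
        # Share of customers who actually reached a five-figure net month.
        "share_five_figure": sum(1 for n in net if n >= 10_000) / len(net),
    }


# Hypothetical sample of five customers (illustrative numbers only).
sample = summarize_outcomes(
    monthly_revenue=[12_000, 800, 0, 2_500, 400],
    monthly_cost=[1_500, 1_200, 900, 1_000, 1_100],
)
print(sample)
```

In this made-up sample one customer clears five figures while the median customer loses money, which is the kind of distribution a headline income claim can hide.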
"make any website compliant with WCAG"
Compliance / Safety
- Source and date: FTC accessiBe · April 22, 2025
- Evidence signal: Broad automated compliance result with no visible scope limit.
- Evidence gap: The buyer needs the WCAG version, issue categories tested, audit method, manual remediation boundary, and maintenance responsibility.
- Buyer question: For the "make any website compliant" claim, which WCAG criteria are automated and which still require manual review?
"system can't be tricked by a photo or video image"
Accuracy / Performance
- Source and date: FTC IntelliVision · December 3, 2024
- Evidence signal: Anti-spoofing wording with no visible attack-type or test-method boundary.
- Evidence gap: The buyer needs spoofing attack types tested, field conditions, success rate, model version, update cadence, and known limitations.
- Buyer question: For the can't-be-tricked claim, which photo, video, mask, replay, or presentation attacks were tested?
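One way to read a vendor's anti-spoofing test report against the buyer question is to check which attack types were tested at all and how often each one succeeded. The sketch below assumes a hypothetical `spoof_coverage` helper and a hypothetical result format; the attack-type names come from the buyer question above, and the counts are invented for illustration.

```python
# Attack types named in the buyer question; other presentation attacks could be added.
ATTACK_TYPES = ("photo", "video", "mask", "replay")


def spoof_coverage(results):
    """results maps attack type -> (attempts, successful_spoofs)."""
    # An attack type with no recorded attempts counts as untested.
    untested = [a for a in ATTACK_TYPES if results.get(a, (0, 0))[0] == 0]
    # Fraction of attempts that fooled the system, per tested attack type.
    rates = {
        attack: spoofs / attempts
        for attack, (attempts, spoofs) in results.items()
        if attempts > 0
    }
    return {"untested": untested, "spoof_success_rate": rates}


# Hypothetical vendor report covering only photo and video attacks.
report = spoof_coverage({"photo": (200, 0), "video": (100, 5)})
print(report)
```

A report like this one would not support "can't be tricked" wording: two attack types are untested and one tested type succeeds some of the time.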
"no bias in recognition across all demographics"
Accuracy / Performance
- Source and date: FTC IntelliVision press release · December 3, 2024
- Evidence signal: Zero-bias claim applied to all demographic groups without visible subgroup error rates, test population, or independent audit.
- Evidence gap: The FTC alleged that IntelliVision made demographic accuracy claims without the subgroup-level data needed to support them. A buyer needs false acceptance and false rejection rates by demographic group, the dataset used for bias testing, the performance gap between the best- and worst-performing subgroups, and whether testing was conducted internally or by an independent lab.
- Buyer question: For the no-bias claim, which demographic groups were tested, what were the false acceptance and false rejection rates by group, and who conducted the bias evaluation?
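The subgroup figures requested above follow the standard definitions: false acceptance rate is impostor attempts accepted divided by impostor attempts, and false rejection rate is genuine attempts rejected divided by genuine attempts. A minimal sketch, assuming a hypothetical trial format of `(group, is_genuine, accepted)` tuples and invented group names and counts:

```python
from collections import defaultdict


def subgroup_error_rates(trials):
    """trials: iterable of (group, is_genuine, accepted) tuples."""
    counts = defaultdict(lambda: {"genuine": 0, "impostor": 0, "fa": 0, "fr": 0})
    for group, is_genuine, accepted in trials:
        c = counts[group]
        if is_genuine:
            c["genuine"] += 1
            c["fr"] += 0 if accepted else 1  # genuine user wrongly rejected
        else:
            c["impostor"] += 1
            c["fa"] += 1 if accepted else 0  # impostor wrongly accepted
    return {
        group: {
            "far": c["fa"] / c["impostor"] if c["impostor"] else None,
            "frr": c["fr"] / c["genuine"] if c["genuine"] else None,
        }
        for group, c in counts.items()
    }


def worst_best_gap(rates, metric):
    """Difference between the worst- and best-performing subgroup for one metric."""
    values = [r[metric] for r in rates.values() if r[metric] is not None]
    return max(values) - min(values) if values else None


# Hypothetical trials for two groups (illustrative counts only).
trials = (
    [("group_a", True, True)] * 3 + [("group_a", True, False)] * 1
    + [("group_a", False, False)] * 4
    + [("group_b", True, True)] * 2 + [("group_b", True, False)] * 2
    + [("group_b", False, False)] * 3 + [("group_b", False, True)] * 1
)
rates = subgroup_error_rates(trials)
print(rates, worst_best_gap(rates, "frr"))
```

A nonzero gap on either metric is the kind of result a "no bias across all demographics" claim would need to account for.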
Match each claim pattern to the evidence buyers need
| Claim pattern | Evidence needed | Buyer question |
|---|---|---|
| Accuracy or performance number | Benchmark source, sample size, input categories, model versions, error rates, and date tested. | Does the test match the same content, user, and workflow where we would rely on the product? |
| Compliance or safety result | Standard version, audit scope, issue coverage, exclusions, remediation steps, and maintenance boundary. | Which part of the claimed compliance result is produced by AI and which part requires human review? |
| Bias-free, fair, or safe output claim | Subgroup metrics, test population, failure examples, review process, and update cadence. | What failure rate appears for each subgroup or high-risk input category? |
| Automation or replacement claim | Task boundary, escalation path, human review point, unsupported use cases, and quality checks. | Where does the automated workflow stop before a qualified person reviews or acts? |
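A buyer who handles many vendor claims could keep the table above as a lookup and generate the evidence request from it. This is only a sketch: the dictionary keys and the `build_request` function are hypothetical names, while the evidence items and questions are copied from the table.

```python
EVIDENCE_MAP = {
    "accuracy_or_performance": {
        "evidence": ["benchmark source", "sample size", "input categories",
                     "model versions", "error rates", "date tested"],
        "question": "Does the test match the same content, user, and workflow "
                    "where we would rely on the product?",
    },
    "compliance_or_safety": {
        "evidence": ["standard version", "audit scope", "issue coverage",
                     "exclusions", "remediation steps", "maintenance boundary"],
        "question": "Which part of the claimed compliance result is produced by AI "
                    "and which part requires human review?",
    },
    "bias_free_fair_or_safe": {
        "evidence": ["subgroup metrics", "test population", "failure examples",
                     "review process", "update cadence"],
        "question": "What failure rate appears for each subgroup or high-risk "
                    "input category?",
    },
    "automation_or_replacement": {
        "evidence": ["task boundary", "escalation path", "human review point",
                     "unsupported use cases", "quality checks"],
        "question": "Where does the automated workflow stop before a qualified "
                    "person reviews or acts?",
    },
}


def build_request(pattern):
    """Return a plain-text evidence request for one claim pattern."""
    if pattern not in EVIDENCE_MAP:
        raise ValueError(f"unknown claim pattern: {pattern!r}")
    entry = EVIDENCE_MAP[pattern]
    lines = ["Please provide:"]
    lines += [f"- {item}" for item in entry["evidence"]]
    lines.append(f"Question: {entry['question']}")
    return "\n".join(lines)


print(build_request("compliance_or_safety"))
```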
Evidence to request
- The exact claim text and the date it appeared.
- The evidence type that matches the claim type: benchmark, audit, subgroup test, customer sample, or workflow documentation.
- Scope limits: content type, language, model version, jurisdiction, user type, or deployment setting.
- Known exclusions, failure states, and human review boundaries.
- A bounded rewording that names the tested scope in place of the broad claim.
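The request list above can be tracked per claim as a simple record that reports what is still missing. The `ClaimRecord` class and its field names are hypothetical, chosen to mirror the five list items; this is a sketch of one possible structure, not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class ClaimRecord:
    claim_text: str                                        # exact vendor wording
    date_seen: Optional[date] = None                       # date the claim appeared
    evidence_type: Optional[str] = None                    # benchmark, audit, subgroup test, ...
    scope_limits: List[str] = field(default_factory=list)  # model version, language, ...
    exclusions: List[str] = field(default_factory=list)    # failure states, human review points
    bounded_wording: Optional[str] = None                  # rewording that names the tested scope

    def missing_items(self) -> List[str]:
        """Return labels for every requested item the vendor has not yet supplied."""
        checks = [
            ("date the claim appeared", self.date_seen),
            ("matching evidence type", self.evidence_type),
            ("scope limits", self.scope_limits),
            ("exclusions and human review boundaries", self.exclusions),
            ("bounded wording", self.bounded_wording),
        ]
        return [label for label, value in checks if not value]


# Example: only the claim text and evidence type are known so far.
record = ClaimRecord(
    claim_text="make any website compliant with WCAG",
    evidence_type="audit",
)
print(record.missing_items())
```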
Questions to put in front of the vendor
- For this AI claim, what evidence existed before the wording was published?
- Does the evidence test the same use case, customer group, and input type named or implied by the claim?
- Which part of the claim carries the evidence burden: the accuracy rate, bias result, compliance promise, automation scope, or first-of-kind wording?
- Can the vendor provide the source, date, method, and limitations behind the claim in writing?
Wording boundaries to compare against
- Reported X% accuracy on a named benchmark covering specified content types and model versions.
- Supports selected WCAG remediation tasks; manual review and ongoing maintenance remain necessary.
- Tested across named demographic groups in a specified setting, with subgroup results available on request.
- Automates defined workflow steps and routes exceptions to qualified human review.