What evidence should a vendor provide to support an AI accuracy claim?

The FTC standard requires competent and reliable evidence to exist before the claim is made. For an AI accuracy claim, that means: an explicit definition of what accuracy means for this output type, the task type and inputs tested, the sample size, the error rate, the failure categories, and whether the test conditions match typical user deployment. A percentage figure without this context cannot be independently evaluated.
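To illustrate why sample size belongs next to the percentage, the sketch below computes a Wilson score interval for a reported accuracy figure. It is an illustration only, not an FTC method: the function name `wilson_interval` and the sample counts are hypothetical, and the interval assumes independent test cases drawn from the population of interest.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 is roughly 95% confidence)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    p = correct / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts: the same 95% headline figure on two different sample sizes.
for correct, total in [(95, 100), (9500, 10000)]:
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total} correct: plausible accuracy range {low:.3f} to {high:.3f}")
```

The smaller test leaves a much wider range of plausible accuracy values, which is one reason a bare percentage is hard to evaluate.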

Is a high accuracy rate in a vendor's own testing reliable?

Not on its own. Vendor-conducted accuracy tests are usually not independently verified and may use curated inputs, favorable conditions, or a narrow task scope. Before relying on a vendor's accuracy figure, ask for the test conditions, the input set used, who conducted the test, and whether the results have been replicated on a broader population. In its enforcement actions against IntelliVision (December 2024) and Evolv Technologies (November 2024), the FTC alleged that the vendors' accuracy claims were not supported by adequate evidence of performance under real deployment conditions.

Does AI accuracy stay consistent after deployment?

Not automatically. Accuracy can degrade when real inputs (users, environments, formats, or language) differ from the training and test population. Ask the vendor for post-deployment performance data, the update cadence for retraining, and the error rate on inputs outside the original test scope.
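One way a buyer might compare a vendor's test results with post-deployment data is a pooled two-proportion z statistic on the error rates. This is a minimal sketch under stated assumptions: the function name `error_rate_shift` and all counts are hypothetical, and the statistic assumes independent samples that are large enough for a normal approximation.

```python
import math


def error_rate_shift(
    test_errors: int, test_total: int, field_errors: int, field_total: int
) -> tuple[float, float, float]:
    """Return (test error rate, field error rate, pooled two-proportion z statistic)."""
    if test_total <= 0 or field_total <= 0:
        raise ValueError("sample sizes must be positive")
    p_test = test_errors / test_total
    p_field = field_errors / field_total
    pooled = (test_errors + field_errors) / (test_total + field_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / test_total + 1 / field_total))
    z = (p_field - p_test) / se if se > 0 else 0.0
    return p_test, p_field, z


# Hypothetical counts: 20 errors in 1,000 benchmark cases, 45 errors in 1,000 field cases.
p_test, p_field, z = error_rate_shift(20, 1000, 45, 1000)
print(f"test error rate {p_test:.3f}, field error rate {p_field:.3f}, z = {z:.2f}")
```

A large positive z suggests the field error rate is higher than sampling noise alone would explain, which is a prompt to ask the vendor what changed between the test and the deployment.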

AI accuracy claims: what field evidence should buyers ask for?

Last reviewed June 5, 2026

Broad AI accuracy claims describe recognition, prediction, classification, screening, or detection performance in products other than AI text detectors. This page focuses on the field evidence a buyer should request before relying on accuracy wording in a safety-sensitive or operational workflow.

Fastest path: copy one exact vendor sentence that matches this pattern, then open the checker. Add the public URL only if you want readable page context recorded alongside the wording. The result is an evidence-burden note you can reuse in vendor follow-up or internal review, not a verdict. Not sure what a result looks like? See a sample receipt.

Sources behind AI accuracy claims

FTC IntelliVision press release · December 3, 2024
Source for facial recognition accuracy, bias, training-data, and anti-spoofing claim evidence.
FTC Evolv Technologies press release · November 26, 2024
Source for AI-powered screening claims about detection, speed, false alarms, and comparison to metal detectors.

Documented AI accuracy claims examples

"one of the highest accuracy rates on the market"

Accuracy / Performance

Source and date: FTC IntelliVision press release · December 3, 2024
Evidence signal: Comparative accuracy wording without the comparison set visible to the buyer.
Evidence gap: A buyer needs the benchmark, market definition, tested population, sample size, and date of comparison.
Buyer question: For the highest accuracy claim, which products and test conditions were included in the market comparison?

"detect all weapons"

Accuracy / Performance

Source and date: FTC Evolv Technologies press release · November 26, 2024
Evidence signal: Absolute "all" wording in a safety-sensitive detection task.
Evidence gap: A buyer needs detection rates by item type, field conditions, sensitivity setting, and missed-item analysis.
Buyer question: For the detect-all-weapons claim, what item types, concealment methods, and environments were tested?

"reduce false alarm rates"

Accuracy / Performance

Source and date: FTC Evolv Technologies press release · November 26, 2024
Evidence signal: Improvement claim without the tradeoff between missed detections and extra alarms.
Evidence gap: A buyer needs false alarm rates, missed-detection rates, staffing impact, and comparison to the baseline system.
Buyer question: For the false-alarm claim, what sensitivity setting was used and how did it affect missed detections?
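The tradeoff behind this buyer question can be sketched with a small threshold sweep. The example is hypothetical throughout: the function name `rates_at_threshold`, the scores, and the labels are invented for illustration and do not come from any vendor's system. It assumes the system raises an alarm whenever a score meets or exceeds the threshold.

```python
def rates_at_threshold(
    scores: list[float], labels: list[bool], threshold: float
) -> tuple[float, float]:
    """Return (false alarm rate, missed-detection rate) when score >= threshold alarms."""
    pairs = list(zip(scores, labels))
    negatives = sum(1 for _, is_threat in pairs if not is_threat)
    positives = sum(1 for _, is_threat in pairs if is_threat)
    false_alarms = sum(1 for s, is_threat in pairs if s >= threshold and not is_threat)
    misses = sum(1 for s, is_threat in pairs if s < threshold and is_threat)
    false_alarm_rate = false_alarms / negatives if negatives else float("nan")
    missed_rate = misses / positives if positives else float("nan")
    return false_alarm_rate, missed_rate


# Hypothetical screening scores: six benign items followed by four real threats.
scores = [0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.35, 0.55, 0.80, 0.90]
labels = [False] * 6 + [True] * 4

for threshold in (0.3, 0.5, 0.75):
    far, mdr = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f} false_alarm_rate={far:.2f} missed_detection_rate={mdr:.2f}")
```

In this toy data, raising the threshold lowers the false alarm rate and raises the missed-detection rate, which is why a claim to reduce false alarms should be read together with the sensitivity setting and the miss rate.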

Evidence map for AI accuracy claims

Claim pattern	Evidence needed	Buyer question
Highest accuracy, best accuracy, or market-leading performance	Benchmark design, comparison set, test date, sample size, confidence interval, and model version.	What exactly was compared, and would that comparison still hold in our environment?
Field accuracy in a safety-sensitive workflow	Deployment setting, threshold setting, missed-event rate, false alarm rate, staffing impact, and update process.	What happened when the model was used in conditions that match our workflow, not only a controlled test?
Zero bias or performs equally across groups	Subgroup metrics, error-rate spread, demographic coverage, and post-deployment monitoring.	Which groups were tested, and where did the largest performance gap appear?
Detects all targeted objects, behaviors, or events	Target taxonomy, field test results, false negatives, false positives, and edge-case examples.	What target types were missed during testing or deployment?
Faster or more accurate than an existing process	Baseline process, side-by-side test, throughput, error tradeoffs, and staffing assumptions.	Did speed improve by changing the threshold in a way that increased errors or manual work?
Cannot be tricked, spoofed, or bypassed	Adversarial test method, attack types, success rate, update cadence, and known limitations.	Which spoofing or bypass methods were tested, and which were not tested?
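For the subgroup row in the table above, the sketch below shows one way to compute per-group error rates and the spread between the best and worst group. The function names `subgroup_error_rates` and `error_rate_spread`, the group labels, and the counts are all hypothetical; real subgroup analysis would also need adequate sample sizes per group.

```python
from collections import defaultdict


def subgroup_error_rates(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Compute the error rate per group from (group, is_correct) pairs."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for group, is_correct in records:
        totals[group] += 1
        if not is_correct:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def error_rate_spread(rates: dict[str, float]) -> tuple[str, str, float]:
    """Return (worst group, best group, gap between their error rates)."""
    worst = max(rates, key=rates.get)
    best = min(rates, key=rates.get)
    return worst, best, rates[worst] - rates[best]


# Hypothetical results: 100 cases per group, with 2 errors in group_a and 10 in group_b.
records = (
    [("group_a", True)] * 98 + [("group_a", False)] * 2
    + [("group_b", True)] * 90 + [("group_b", False)] * 10
)

rates = subgroup_error_rates(records)
worst, best, gap = error_rate_spread(rates)
overall_accuracy = sum(1 for _, ok in records if ok) / len(records)
print(f"overall accuracy {overall_accuracy:.2%}")
print(f"largest gap: {worst} vs {best}, {gap:.2%} difference in error rate")
```

A single overall accuracy figure can hide a gap like this one, so a buyer reviewing an equal-performance claim would want the per-group numbers, not only the aggregate.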

Evidence buyers need for AI accuracy claims

A benchmark or field test that matches the buyer's actual environment, not only a controlled demo.
False positive and false negative rates, with the threshold or sensitivity setting used to produce them.
Subgroup and edge-case performance where the claim mentions bias, fairness, safety, or broad population coverage.
A comparison baseline that names the existing process, traditional tool, or competing product being compared.
A model version, test date, and update process so the buyer can tell whether the evidence is current.

Buyer questions for AI accuracy claims

For this AI accuracy claim, what was the exact task: recognition, screening, classification, prediction, or detection?
Was the claim tested in field conditions that match our workflow, or only in a controlled benchmark?
What are the false positive and false negative rates at the threshold we would use?
What changed between the benchmark setting and live deployment: lighting, users, language, sensor type, staffing, or threshold?
If the claim mentions bias or equal performance, what subgroup results can we review?
What baseline process or competing tool is the accuracy claim being compared against?

Safer wording for AI accuracy claims

Reported X% accuracy on a named test set under stated operating conditions.
Performance varies by population, environment, threshold, and item or event type.
Reduces selected false alarms in tested settings; buyers should review missed-detection rates separately.
Includes anti-spoofing tests for named attack types, with limitations stated.
