FTC Operation AI Comply and enforcement actions: what evidence was missing?
Last reviewed May 30, 2026
FTC enforcement actions are public buyer intelligence: each case names the claim that was made, the evidence that was missing, and what the consent order required. That includes Operation AI Comply, a 2024 enforcement sweep that grouped AI lawyer, earnings, and review-generation claims. This guide turns four FTC enforcement patterns into evidence requests buyers can use in vendor review.
Evidence buyers verify
- A dated source for the exact claim wording as it appeared on a public page.
- Benchmark, test, or audit details that match the scope implied by the public copy.
- A record the vendor can produce: test set description, scope, model version, error rates, and date.
Sources this guide draws from
- FTC Operation AI Comply announcement · September 25, 2024
Source for AI legal-replacement, professional-service substitution, and outcome-tied claims in the coordinated enforcement action.
- FTC Evolv Technologies press release · November 26, 2024
Source for AI security-screening detection, false-alarm, and comparison-to-metal-detector claims in a safety-critical deployment.
- FTC accessiBe final order release · April 22, 2025
Source for automated WCAG compliance and continued-compliance claims that resulted in a $1 million payment and claim prohibition.
- FTC Content at Scale AI case page · August 28, 2025
Source for AI detector accuracy claims and the record-retention consent requirement that followed enforcement.
Public claims with documented evidence gaps
"sue for assault without a lawyer"
Automation / Replacement
- Source and date: FTC Operation AI Comply announcement · September 25, 2024
- Evidence signal: No-professional-needed wording for a high-stakes legal task. The FTC alleged the company lacked evidence that the workflow was tested and limited to appropriate use cases.
- Evidence gap: A buyer needs evidence that the exact task was tested, reviewed by a qualified professional, limited to stated use cases, and accompanied by escalation and error-handling steps for situations where a real user relies on AI output.
- Buyer question: For the without-a-lawyer claim, what qualified review step or task boundary prevents a user from relying on AI output in a legal situation where errors carry real consequences?
"detect all weapons"
Accuracy / Performance
- Source and date: FTC Evolv Technologies press release · November 26, 2024
- Evidence signal: Detect-everything wording in a safety-critical screening context. The FTC alleged the product did not detect all weapons in field testing.
- Evidence gap: A buyer needs detection rates by item category and concealment method, missed-detection rate at the sensitivity setting used, field conditions matched to the actual deployment, comparison to the prior screening baseline, and staffing impact when alerts require follow-up.
- Buyer question: For the detect-all-weapons claim, what item categories and field conditions were tested, and what missed-detection rate applies at the sensitivity setting we would use?
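The request for detection rates by item category can be made concrete with a small calculation. The sketch below is illustrative only: the trial records, category names, and the `missed_detection_rates` helper are hypothetical and do not come from any FTC filing or vendor report. It shows why an overall detection figure can hide a weak category.

```python
from collections import defaultdict

# Hypothetical field-trial records: (item category, was it detected?).
# These values are invented for illustration only.
trials = [
    ("handgun", True), ("handgun", True), ("handgun", True), ("handgun", False),
    ("knife", True), ("knife", False), ("knife", False), ("knife", False),
]

def missed_detection_rates(records):
    """Return {category: missed / total} for each item category."""
    totals = defaultdict(int)
    missed = defaultdict(int)
    for category, detected in records:
        totals[category] += 1
        if not detected:
            missed[category] += 1
    return {category: missed[category] / totals[category] for category in totals}

rates = missed_detection_rates(trials)

# Overall detection rate across every trial, ignoring category.
overall_detected = sum(1 for _, detected in trials if detected) / len(trials)

for category, rate in rates.items():
    print(f"{category}: missed-detection rate {rate:.0%}")
print(f"overall detection rate: {overall_detected:.0%}")
```

In this made-up data the overall detection rate is 50 percent, but the per-category view shows that knives are missed three times as often as handguns. That is the kind of breakdown a buyer would want at the sensitivity setting they plan to use.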
"can make any website compliant with Web Content Accessibility Guidelines (WCAG)"
Compliance / Safety
- Source and date: FTC accessiBe final order release · April 22, 2025
- Evidence signal: Broad automated compliance promise with continued-compliance language. FTC enforcement resulted in a $1 million payment and a prohibition on making the same claim without adequate support.
- Evidence gap: The consent order required naming which WCAG issue categories are automated and which require manual remediation. A buyer needs the same scope disclosure: covered criteria, excluded criteria, ongoing monitoring responsibility, and maintenance boundary.
- Buyer question: For the any-website WCAG-compliant claim, which success criteria are automated, which still require manual remediation, and what ongoing review keeps the compliance claim current?
"98 percent accurate"
Accuracy / Performance
- Source and date: FTC Content at Scale AI case page · August 28, 2025
- Evidence signal: Headline accuracy number without visible test scope. FTC enforcement required the vendor to create and retain benchmark records supporting any future accuracy claim.
- Evidence gap: The consent order specifically required record-keeping for benchmark evidence. A buyer needs the equivalent: test corpus, content categories, model versions, sample size, false positive and false negative rates, and a dated record the vendor can produce if the claim is questioned.
- Buyer question: For the 98 percent accurate claim, what benchmark record, content category description, and error-rate data does the vendor retain that would support the number if questioned?
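A short worked example shows why false positive and false negative rates are requested alongside a headline accuracy number. The counts below are hypothetical and chosen only for illustration. "Positive" is assumed to mean "flagged as AI-generated", and the `error_rates` helper is a name introduced for this sketch.

```python
def error_rates(tp, fp, tn, fn):
    """Compute accuracy and error rates from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of truly negative items that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn),
        # Share of truly positive items that were missed.
        "false_negative_rate": fn / (fn + tp),
    }

# Hypothetical benchmark: 900 human-written and 100 AI-generated documents.
result = error_rates(tp=90, fp=10, tn=890, fn=10)

for name, value in result.items():
    print(f"{name}: {value:.1%}")
```

With these invented counts the detector is 98 percent accurate overall, yet it misses 10 percent of the AI-generated documents. The headline number depends heavily on the mix of content in the test corpus, which is why the sample size and content categories belong in the record too.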
Match each claim pattern to the evidence buyers need
| Claim pattern | Evidence needed | Buyer question |
|---|---|---|
| AI replaces a professional, legal service, or expert role | Task scope tested, involvement of qualified professional review, known failure cases, user-warning language, escalation path, and stated non-use cases. | Which professional tasks were tested against qualified review before this claim appeared in public marketing copy? |
| AI detects, screens, or protects in a safety-sensitive environment | Detection rate by item or threat category, field conditions, missed-detection rate, sensitivity threshold, comparison baseline, and staffing impact. | What missed-detection rate applies at the sensitivity setting we would use, and how does it compare to the prior screening process? |
| Automated compliance or safety result covering any case | Standard version, automated issue categories, excluded criteria, manual review boundary, ongoing monitoring cadence, and maintenance responsibility. | Which criteria are automated and which still require human review or ongoing manual remediation after deployment? |
| Accuracy number in headline or feature copy | Benchmark design, content or input categories, model versions, sample size, error rates, and a dated evidence record the vendor can produce. | If questioned, what dated benchmark record can the vendor produce showing the test set, scope, and error rates behind this accuracy number? |
Evidence to request
- A dated source for the exact claim wording as it appeared on a public page.
- Benchmark, test, or audit details that match the scope implied by the public copy.
- A record the vendor can produce: test set description, scope, model version, error rates, and date.
- Known exclusions, unsupported use cases, and where human review or manual remediation is still required.
- Scope limits that narrow the task, audience, or evidence base to what was actually tested.
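One way to track these requests during vendor review is a simple record with one field per evidence item. This is a minimal sketch, not a prescribed format: the `EvidenceRecord` class, its field names, and the `missing_items` helper are assumptions introduced here to mirror the list above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

@dataclass
class EvidenceRecord:
    """Hypothetical checklist record; each field mirrors one requested item."""
    claim_wording: Optional[str] = None         # exact public wording
    claim_source_date: Optional[str] = None     # date the wording appeared
    test_set_description: Optional[str] = None
    scope: Optional[str] = None
    model_version: Optional[str] = None
    error_rates: Optional[str] = None
    record_date: Optional[str] = None
    known_exclusions: Optional[str] = None      # unsupported uses, manual steps

def missing_items(record: EvidenceRecord) -> List[str]:
    """List the field names the vendor has not yet supplied."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]

# Example with placeholder values, not real vendor data.
record = EvidenceRecord(
    claim_wording="98 percent accurate",
    model_version="v2 (hypothetical)",
    record_date="2025-01-15",
)
gaps = missing_items(record)
print("Still to request:", ", ".join(gaps))
```

Free-text fields are used on purpose, since what a vendor supplies will vary. The point is only to make the unanswered items visible.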
Questions to put in front of the vendor
- For this FTC-style AI claim, what dated benchmark or test record existed when the claim was published?
- If an FTC-style review asked for supporting records, what documentation could the vendor produce today?
- Does the evidence scope match the same workflow, customer type, and environment described in the public copy?
- Which words in the claim carry the highest evidence burden: all, any, fully, first, without a professional, or a specific percentage?
- What narrower wording would be accurate if the evidence only covers a specific input type, sensitivity setting, or deployment context?
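The high-burden words named in the questions above can be flagged with a rough pattern match before a claim goes to the vendor. The sketch below is a deliberately crude keyword screen, not a classifier. The pattern set and the `flag_high_burden_terms` function are assumptions for illustration, and the "without a professional" pattern simply matches "without a" or "without an" followed by any word.

```python
import re

# Hypothetical patterns for the wording listed above: all, any, fully, first,
# without a professional, or a specific percentage.
HIGH_BURDEN_PATTERNS = {
    "all": r"\ball\b",
    "any": r"\bany\b",
    "fully": r"\bfully\b",
    "first": r"\bfirst\b",
    "without a professional": r"\bwithout an? \w+",
    "percentage": r"\b\d+(?:\.\d+)?\s*(?:%|percent\b)",
}

def flag_high_burden_terms(claim):
    """Return the labels of every pattern found in the claim wording."""
    return [
        label
        for label, pattern in HIGH_BURDEN_PATTERNS.items()
        if re.search(pattern, claim, flags=re.IGNORECASE)
    ]

# The four claim wordings quoted earlier in this guide.
claims = [
    "sue for assault without a lawyer",
    "detect all weapons",
    "can make any website compliant with Web Content Accessibility Guidelines (WCAG)",
    "98 percent accurate",
]
flags = {claim: flag_high_burden_terms(claim) for claim in claims}
for claim, labels in flags.items():
    print(f"{claim!r}: {labels}")
```

A match only says that a word deserves an evidence question. It says nothing about whether the claim is supported.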
Wording boundaries to compare against
- Reported accuracy on a named benchmark covering specified content types and model versions, with error rates available on request.
- Tested for named threat categories in specified field conditions; buyers should review missed-detection rates separately.
- Automates selected accessibility remediation tasks; WCAG conformance for excluded criteria requires ongoing manual review.
- Drafts first-pass text for qualified professional review; not a substitute for legal, medical, or expert advice.