AI vendor due diligence: claims to verify before you buy
Last reviewed May 30, 2026
Use this checklist before you sign a contract or approve a purchase. It helps procurement teams, technical leads, and due-diligence reviewers compare AI vendor claims across sales decks, demos, and product pages against field evidence—not controlled demo results alone. To walk one product page sentence by sentence, use the vendor claim evidence checklist instead.
Sources this guide draws from
- FTC v. accessiBe Inc.: Deceptive AI Accessibility Claims · FTC enforcement · March 2024
FTC action on claims that an AI overlay automatically makes websites fully ADA and WCAG 2.1 compliant. Compliance certification claims for AI tools require audit scope, tested standards version, and named certification body.
- FTC v. IntelliVision Technologies · FTC enforcement · December 3, 2024
FTC action on accuracy claims for an AI facial recognition product used in security and access control. Accuracy figures require benchmark scope, tested populations, and subgroup bias results.
- FTC v. DoNotPay Inc.: AI Professional Replacement Claims · FTC enforcement · February 2025
FTC action on claims that AI could replace qualified legal professionals. Automation and replacement claims require scope, failure handling, and disclosure of when professional review is still needed.
Public claims with documented evidence gaps
"automatically makes any website fully ADA and WCAG 2.1 compliant"
Compliance / Safety
- Source and date: FTC v. accessiBe Inc. · March 2024
- Evidence signal: Absolute compliance certification claim without named certification body, tested standards version, or audit scope.
- Evidence gap: A buyer needs the standards version tested, what a passing audit covers, which elements the AI handles versus what still requires manual developer remediation, and whether the certification is verifiable by an independent auditor.
- Buyer question: Which WCAG criteria does the overlay satisfy, which require manual work by our development team, and can an independent accessibility auditor verify our site's compliance using this product?
"no bias in recognition across all demographics"
Accuracy / Performance
- Source and date: FTC v. IntelliVision Technologies · December 3, 2024
- Evidence signal: Zero-bias claim without named demographic groups, error rates by group, test dataset, or external audit.
- Evidence gap: A buyer needs false acceptance and false rejection rates by demographic group, the dataset used for bias testing, and whether testing was conducted internally or by an independent lab.
- Buyer question: What demographic groups were included in the bias test, and what were the false acceptance and false rejection rates for each group?
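When a vendor does share per-group test results, the rates asked for above can be recomputed from the raw outcomes. The following is a minimal Python sketch, assuming each test record carries a group label, whether the attempt was a genuine match, and whether the system accepted it. The function name `rates_by_group`, the record layout, and the sample records are hypothetical and do not come from any FTC filing or vendor dataset.

```python
from collections import defaultdict


def rates_by_group(records):
    """Return {group: (false_acceptance_rate, false_rejection_rate)}.

    Each record is a tuple (group, is_genuine, accepted):
      - a false acceptance is an impostor attempt that was accepted
      - a false rejection is a genuine attempt that was rejected
    A rate is None when the group has no attempts of that kind.
    """
    counts = defaultdict(lambda: {"fa": 0, "impostor": 0, "fr": 0, "genuine": 0})
    for group, is_genuine, accepted in records:
        c = counts[group]
        if is_genuine:
            c["genuine"] += 1
            if not accepted:
                c["fr"] += 1
        else:
            c["impostor"] += 1
            if accepted:
                c["fa"] += 1

    result = {}
    for group, c in counts.items():
        far = c["fa"] / c["impostor"] if c["impostor"] else None
        frr = c["fr"] / c["genuine"] if c["genuine"] else None
        result[group] = (far, frr)
    return result


# Made-up records for illustration only.
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", False, False), ("group_a", False, False),
    ("group_b", True, False), ("group_b", True, True),
    ("group_b", False, True), ("group_b", False, False),
]

for group, (far, frr) in sorted(rates_by_group(records).items()):
    print(group, "false acceptance:", far, "false rejection:", frr)
```

In this toy data the two groups differ on both rates, which is the kind of spread a "no bias" claim would need to rule out with named groups and a stated dataset.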
"the world's first robot lawyer that can handle any legal matter"
Automation / Replacement
- Source and date: FTC v. DoNotPay Inc. · February 2025
- Evidence signal: Professional replacement claim without scope statement, failure-handling disclosure, or guidance on when human professional review remains necessary.
- Evidence gap: A buyer needs a clear statement of which legal tasks the product performs, which it cannot perform, when its output must be reviewed by a licensed attorney, and what happens when the product makes an error.
- Buyer question: Which specific legal tasks does the product complete accurately, and for which tasks does the output need to be reviewed by a licensed attorney before relying on it?
Claim review checklist
1. Inventory every AI-specific claim (Claim inventory)
   List all claims on the vendor's product page, demo, and sales deck that involve accuracy, compliance, replacement, ROI, or uniqueness. Do not rely on any claim that does not have supporting evidence.
2. Ask for benchmark conditions that match your environment (Accuracy / Performance)
   Request the benchmark dataset, task scope, sample size, threshold setting, and deployment conditions. If the vendor's test environment differs from yours, ask for evidence from conditions that match your workflow.
3. Verify compliance and certification claims independently (Compliance / Safety)
   Ask for the certification body name, audit period, scope boundaries, and whether your configuration and data environment are covered. Self-attestation without an independent auditor is not the same as certification.
4. Request ROI evidence from deployments similar to yours (ROI / Outcome)
   Ask for the customer case study conditions: company size, workflow, baseline process, and measurement period. A cost-saving figure from a different industry or deployment type may not apply to your environment.
5. Clarify what the AI automates and what still requires human review (Automation / Replacement)
   Ask the vendor to state explicitly which decisions or tasks the AI handles autonomously, which require human review before acting, and what the process is when the AI makes an error or is uncertain.
6. Test vague AI-powered wording (Vague AI-powered)
   Ask the vendor to explain what 'AI-powered', 'AI native', or 'built on AI' means in terms of the specific model, training data source, output type, and update process. Rules-based automation is not the same as a trained model.
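The inventory in step 1 can be kept as a small structured list so that claims with no evidence attached stand out. The sketch below is one hypothetical way to do that in Python. The `Claim` class, its field names, and the `unsupported` helper are illustrative assumptions, not part of any vendor tool or of the checker this guide refers to. The category labels come from the checklist steps; the second claim's wording and its evidence note are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Category labels taken from the checklist steps in this guide.
CATEGORIES = {
    "Accuracy / Performance",
    "Compliance / Safety",
    "ROI / Outcome",
    "Automation / Replacement",
    "Vague AI-powered",
}


@dataclass
class Claim:
    """One vendor claim, copied verbatim from its source (hypothetical structure)."""
    wording: str
    category: str
    source: str  # e.g. "product page", "demo", "sales deck"
    evidence: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject categories that are not in the checklist.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


def unsupported(claims):
    """Return the claims that have no supporting evidence recorded yet."""
    return [c for c in claims if not c.evidence]


# Illustrative entries only; the second claim and its evidence note are made up.
inventory = [
    Claim("no bias in recognition across all demographics",
          "Accuracy / Performance", "product page"),
    Claim("SOC 2 compliant", "Compliance / Safety", "sales deck",
          evidence=["audit report naming the auditor, period, and scope"]),
]

for claim in unsupported(inventory):
    print(f"Needs evidence: {claim.wording!r} ({claim.category})")
```

Keeping the exact wording and the source together makes it easier to put the same sentence in front of the vendor when asking the follow-up questions.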
Match each claim pattern to the evidence buyers need
| Claim pattern | Evidence needed | Buyer question |
|---|---|---|
| ADA, WCAG, HIPAA, SOC 2, ISO 27001, or similar compliance claim | Certification body name, audit period, scope statement, which system components and configurations are covered, and whether the buyer's configuration is included. | Who conducted the audit, what was the scope, and does our configuration fall within it? |
| Percentage accuracy, lowest error rate, or best-in-class performance | Benchmark dataset, task definition, sample size, threshold setting, false positive rate, false negative rate, and whether the test environment matches the buyer's deployment. | What threshold was used to produce this accuracy figure, and what was the false positive rate at that threshold? |
| Replaces or eliminates the need for a human professional or role | Explicit scope of what the AI does and does not do, documented failure-handling process, and clear statement of when professional review is still required. | Which decisions must still go to a human professional, and what is the workflow when the AI output is incorrect? |
| Saves X% cost, reduces hours by Y, or generates Z ROI | Deployment conditions of the reference customer, measurement method, time period, baseline process, and whether those conditions match the buyer's environment. | Can you share a case study from a company similar to ours, including the baseline process and measurement method used to calculate this figure? |
| No bias or performs equally across groups | Named demographic groups tested, error-rate spread by group, dataset coverage, independent or internal test, and post-deployment monitoring plan. | What were the false positive and false negative rates for each demographic group tested? |
| AI-powered, AI native, built on AI, or uses AI | Description of the specific model or technique used, training data source, how the AI output is used in the product, and how often the model is updated. | What exactly does the AI component do in this product, and how is it different from rules-based automation or simple pattern matching? |
Evidence to request
- Benchmark or field test conditions that match your actual deployment environment, not a controlled demo or lab setting.
- Third-party audit documentation for any compliance or certification claim, with scope boundaries and the auditor's name.
- False positive and false negative rates at the threshold or sensitivity setting you plan to use.
- Subgroup performance data for any bias, fairness, or equal-performance claim, covering the populations relevant to your use case.
- A clear description of what the AI handles autonomously versus what requires human review, and the escalation path when the AI is uncertain.
- Customer case study conditions — company size, baseline process, measurement period — for any ROI or cost-saving claim.
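The request for error rates at your own threshold matters because both rates move when the threshold moves. A single accuracy figure quoted at the vendor's preferred setting can therefore hide the trade-off. The sketch below illustrates this with standard definitions (false positive rate = false positives / actual negatives, false negative rate = false negatives / actual positives). The function name `error_rates` and the scores and labels are made up for illustration; in practice they would come from a labeled sample of your own data.

```python
def error_rates(scores, labels, threshold):
    """Return (false_positive_rate, false_negative_rate) at a threshold.

    scores: model scores, higher means "more likely positive"
    labels: True for an actual positive, False for an actual negative
    An item is flagged positive when its score is >= threshold.
    A rate is None when there are no items of that class.
    """
    pairs = list(zip(scores, labels))
    false_pos = sum(1 for s, y in pairs if s >= threshold and not y)
    false_neg = sum(1 for s, y in pairs if s < threshold and y)
    negatives = sum(1 for _, y in pairs if not y)
    positives = sum(1 for _, y in pairs if y)
    fpr = false_pos / negatives if negatives else None
    fnr = false_neg / positives if positives else None
    return fpr, fnr


# Made-up evaluation sample: four actual positives, four actual negatives.
scores = [0.95, 0.80, 0.65, 0.40, 0.70, 0.30, 0.20, 0.10]
labels = [True, True, True, True, False, False, False, False]

for threshold in (0.25, 0.50, 0.75):
    fpr, fnr = error_rates(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

On this toy sample, raising the threshold from 0.25 to 0.75 drops the false positive rate from 0.50 to 0.00 while the false negative rate rises from 0.00 to 0.50, which is why the threshold belongs in the evidence request.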
Questions to put in front of the vendor
- For each accuracy or performance claim, what were the exact benchmark conditions, and how closely do they match our deployment environment?
- For each compliance or certification claim, who conducted the audit, what was the scope, and does our configuration fall within it?
- For each automation or replacement claim, which tasks still require human review, and what is the process when the AI makes an error?
- For each ROI or cost-saving claim, what was the baseline process and the deployment conditions of the reference customer?
- For any bias or fairness claim, what demographic groups were included in testing, and what were the error rates by group?
- For any 'AI-powered' or 'AI native' wording, what specific model or technique is used, and how is it trained and updated?
Wording boundaries to compare against
- Reported X% accuracy on a named benchmark under stated conditions; buyers should verify performance in their deployment environment.
- Automates [named tasks] for [named workflow]; human review required for [named decision types] and when output confidence is below [threshold].
- Meets [named standard] requirements for [named scope]; audit conducted by [named body] covering [named systems] as of [date]; buyer's configuration may require separate assessment.
- Reduced processing time by X% in a deployment with [company type] using [named baseline process]; results vary by workflow.
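The automation wording above implies a concrete routing rule: some decision types always go to a person, and everything else goes to a person when confidence falls below a stated threshold. A minimal sketch of that rule follows, under stated assumptions. The function name `route`, the decision type names, and the 0.90 threshold are hypothetical placeholders for the bracketed values a vendor would need to supply.

```python
def route(decision_type, confidence, review_types, min_confidence):
    """Decide whether an AI output may proceed without human review.

    decision_type: label for the kind of decision being made
    confidence: the model's reported confidence for this output (0.0 to 1.0)
    review_types: decision types that always require human review
    min_confidence: outputs below this confidence go to human review
    """
    if decision_type in review_types:
        return "human_review"
    if confidence < min_confidence:
        return "human_review"
    return "automated"


# Hypothetical values standing in for [named decision types] and [threshold].
ALWAYS_REVIEW = {"contract_termination"}
MIN_CONFIDENCE = 0.90

print(route("contract_termination", 0.99, ALWAYS_REVIEW, MIN_CONFIDENCE))
print(route("invoice_matching", 0.85, ALWAYS_REVIEW, MIN_CONFIDENCE))
print(route("invoice_matching", 0.95, ALWAYS_REVIEW, MIN_CONFIDENCE))
```

A vendor that cannot fill in the two parameters, the list of review-only decision types and the confidence threshold, has not yet stated the boundary this wording describes.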