Steal My Prompt Vol. 31: The AI Output Validator

A reusable prompt that audits any AI-generated output for unsupported claims, weak reasoning, and hidden assumptions before you ship it. Built for leadership updates, client reports, and any high-stakes content where confident-sounding writing is not the same as accurate writing.

AI users have quietly adopted a bad habit. They ask AI a question, read the response, and move on. The output sounds confident. It looks polished. So they trust it. Then they paste it into an email, a report, or a slide deck without a second read.

That is how small errors become big mistakes. A mistaken statistic in a leadership update. An outdated best practice in a team memo. A wrong dosage, a wrong citation, a wrong conclusion buried in paragraph three where nobody thinks to look.

Confidence in language is not the same as confidence in accuracy. AI is trained to sound authoritative, not to flag its own weak spots. The biggest gap in professional AI use right now is not prompt writing. It is output checking. Tools like Word and Copilot generate content for you. They do not critique what the AI produced. Which means the burden falls on you, and most people skip that step because they do not have a process for it.

This prompt gives you the process. You paste in any AI-generated output (from any tool, any conversation, even output a colleague sent you) and the model runs a structured audit. It flags weak reasoning, unsupported claims, missing context, and hidden assumptions. It returns a short report with the three things you should verify or rework before you ship anything.

I built this after noticing that the AI outputs I trusted the least were often the ones that sounded the most authoritative.

What You Can Use This For

  • Auditing AI-generated output before a meeting where you plan to present it
  • Checking an AI-written summary of a meeting, document, or research study for missed context
  • Pressure-testing an AI-drafted strategy recommendation before sending it to leadership
  • Reviewing AI output from a colleague or vendor before acting on it
  • Fact-checking an AI response that includes statistics, citations, or specific claims
  • Catching weak reasoning in AI-generated content you plan to publish or act on in healthcare, finance, legal, or any regulated field

How to Use It

  1. Open Claude, ChatGPT, Microsoft Copilot, or Gemini. All work on the free tier.
  2. For high-stakes output (regulatory, clinical, financial, legal, or leadership-facing), turn on the reasoning model option in your tool. "Think Deeper" in Copilot, "Extended thinking" in Claude, the reasoning model in ChatGPT, or "Deep Think" in Gemini. The extra reasoning step catches more of what your first read missed.
  3. Copy the full prompt below and paste it in. Fill in the bracketed fields with the AI-generated output you want to audit and what you plan to use it for.
  4. Read the audit carefully. The model will return a scored report with specific findings. Do not argue with the findings. Verify them. If the model flags a claim as unsupported, the fix is to check the source, not to defend the claim.
  5. Use the audit as a checklist before shipping. Fix what it flagged. Then run the corrected version back through the prompt one more time if the stakes are high.
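If you run this audit often, the paste-and-fill step can be scripted. The sketch below assembles the prompt from a template so the two bracketed fields are filled consistently every time; the `AUDIT_PROMPT` constant here is an abbreviated stand-in for the full prompt further down, and `build_audit_prompt` is an illustrative helper name, not part of any tool's API.

```python
# Abbreviated stand-in for the full audit prompt below; in practice you
# would paste the complete prompt text into this template.
AUDIT_PROMPT = """You are a senior reviewer with a reputation for catching \
what other people miss. Audit the AI-generated output below before I ship it. \
Review it for tone/confidence mismatch, unsupported claims, weak reasoning, \
missing context, and hidden assumptions. Calibrate strictness to the stated \
use case. Do not rewrite the output; report findings only.

Here is the output I want audited:
{output}

Here is what I plan to use it for:
{use_case}
"""

def build_audit_prompt(output: str, use_case: str) -> str:
    """Fill the two bracketed fields and return the complete prompt text."""
    return AUDIT_PROMPT.format(output=output.strip(), use_case=use_case.strip())

# Example: audit a claim headed for a leadership email.
prompt = build_audit_prompt(
    "Our churn dropped 40% after the Q3 redesign.",
    "an email to leadership",
)
```

From there, paste the assembled string into whichever tool you are using, or pass it to that tool's API if you have access.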

Pro tip: Run this prompt on outputs from a different AI tool than the one that generated them. Asking Claude to audit ChatGPT output, or Copilot to audit Gemini output, produces sharper critiques because each model has different blind spots.
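The cross-tool rule above can be captured as a simple lookup if you are scripting the workflow: never audit with the tool that generated the output. The `AUDITORS` table and its orderings are an illustrative assumption, not a fixed recommendation.

```python
# For each generating tool, a preference-ordered list of auditors from
# other providers. Orderings here are illustrative, not prescriptive.
AUDITORS = {
    "chatgpt": ["claude", "copilot", "gemini"],
    "claude": ["chatgpt", "copilot", "gemini"],
    "copilot": ["gemini", "claude", "chatgpt"],
    "gemini": ["claude", "chatgpt", "copilot"],
}

def pick_auditor(generator: str) -> str:
    """Return a reviewing tool other than the one that wrote the output."""
    candidates = AUDITORS.get(generator.lower())
    if candidates is None:
        raise ValueError(f"unknown tool: {generator}")
    return candidates[0]
```

The point of the mapping is the constraint it encodes: the auditor is always a different model than the generator, so one model's blind spots do not review themselves.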

The Prompt

You are a senior reviewer with a reputation for catching what other people miss. Your job is to audit AI-generated output before I ship it. You are not here to rewrite or improve it. You are here to tell me what is wrong with it.

Calibrate your review based on how I plan to use this output. Higher-stakes use cases (leadership, clinical, client-facing, regulatory) require stricter standards. Lower-stakes use cases (internal notes, first drafts) allow more tolerance. Weight your findings accordingly.

Review the output against these five categories:

1. TONE AND CONFIDENCE MISMATCH

Anywhere the output sounds more confident than the evidence allows. Flag authoritative language paired with weak support. This is the most common problem and the hardest to catch on a first read.

2. UNSUPPORTED CLAIMS

Any statement presented as fact that is not backed by a source, reasoning, or evidence. Flag statistics, historical claims, attributions, and specific numbers that could be wrong.

3. WEAK REASONING

Any logical jump, conclusion that does not follow from the premise, or argument that relies on hidden assumptions. Flag where the output moves too fast or connects dots it did not earn.

4. MISSING CONTEXT

Anything important the output left out that would change how someone reads it. Consider the intended audience, the stakes, and the domain.

5. HIDDEN ASSUMPTIONS

Anything the output treats as obvious or given that is actually debatable. Flag framing choices, definitional shortcuts, and value judgments dressed up as facts.

Deliver your findings in this format:

TOP 3 ISSUES TO FIX FIRST

[Rank the three most important problems. Be specific about where in the output they appear and what needs to change.]

FULL FINDINGS BY CATEGORY

[List every issue you found under the five categories above. If a category has no issues, say "none found" and move on.]

VERIFY BEFORE SHIPPING

[List the specific claims, numbers, or statements I should personally verify before using this output. Be concrete. "Verify the 2026 BCG study exists and check the actual percentages" not "verify statistics."]

Do not rewrite the output. Do not soften your findings. Be direct. If the output is solid, say so and explain why. Do not invent problems to look thorough.

Here is the output I want audited: [PASTE THE AI-GENERATED OUTPUT HERE]

Here is what I plan to use it for: [BRIEF DESCRIPTION: AN EMAIL TO LEADERSHIP, A CLIENT REPORT, A SOCIAL MEDIA POST, A CLINICAL HANDOFF, ETC.]

Transparency and Notes

  • Built and tested in Claude with extended thinking enabled. Works in ChatGPT, Microsoft Copilot, and Gemini on the free tier.
  • Model-agnostic. No paid features or file uploads required.
  • Cross-tool auditing tends to produce the sharpest results. If your output came from ChatGPT, run the audit in Claude or Copilot. If it came from Copilot, run it in Gemini or Claude. Different models catch different blind spots.

Enjoy this? Get it in your inbox every Tuesday.

Practical AI workflows. No hype. No spam. Just receipts.

Subscribe Free
