14 min read

Volume 37: Better AI Is Discipline, Not a Larger Budget

Every commercial AI you use was trained to be helpful through a process most professionals have never heard of. It is why these tools feel useful instead of bizarre, and also why they sometimes tell you what you want to hear. Here is how that training works.

I was deep in a strategy session with my AI, one problem flowing into the next, when the tool locked me out. Session limit reached. I was already paying for the hundred-dollar plan, and the only way to keep going right then was to pay even more. Paying more would have fixed that night. It would not have fixed the problem.

🧭 Founder's Corner: Why your AI sessions run out faster even as tokens get cheaper, and the three habits that keep you building at full power without spending an extra dollar.

🧠 AI Education: The hidden training process behind every model you use, why it makes AI feel helpful, and why it is also the reason the model tells you what you want to hear.

✅ 10-Minute Win: Turn a buried research thread into a clean, shareable link a colleague can open without ever seeing the messy back-and-forth.

Let's dive in.

Enjoying the weekly content? Forward this volume to a colleague, friend, or family member to subscribe.

Signals Over Noise

We scan the noise so you don’t have to — top 5 stories to keep you sharp

1) Anthropic confidentially files IPO prospectus with SEC

Summary: Anthropic confidentially filed its IPO prospectus with the SEC, days after closing a $65 billion Series H that put its valuation at $965 billion, ahead of OpenAI. 

Why it matters: Anthropic is racing to be the first major AI lab to hit public markets. Once it files publicly, audited revenue and compute costs will reset what every enterprise believes "real" AI economics look like.

2) Mayo Clinic and Microsoft collaborate to develop a frontier AI model for healthcare

Summary: Mayo Clinic and Microsoft announced a strategic collaboration to build a frontier AI model designed to synthesize diverse clinical data to support earlier diagnoses, more personalized treatment decisions, and better patient outcomes. Mayo Clinic will own the model, with Microsoft distributing it through Azure Foundry APIs. 

Why it matters: Two of the biggest brands in their categories are building a healthcare-specific frontier model that hospitals can plug into. If this lands, expect general-purpose AI to lose ground in clinical settings and purpose-built healthcare models to become the new procurement default.

3) Biggest Microsoft Build 2026 announcements — agentic AI, GitHub Copilot app, new MAI models, and more

Summary: At Build 2026, Microsoft unveiled Project Polaris, its in-house coding model that will replace GPT-4 Turbo as the default in GitHub Copilot starting August, alongside the Windows Agent Framework, multi-agent VS Code, and Copilot Workspace general availability. 

Why it matters: Microsoft is moving from "AI inside Microsoft products" to "AI infrastructure you don't see." For the millions of professionals working in Office, Teams, and GitHub every day, more of the routine work is about to be done by background agents you approve, not features you click.

4) AI adoption surges, but providers worry about deskilling

Summary: A new Wolters Kluwer Health survey found that nearly three-quarters of doctors and 70% of nurses used AI at least once a week for work, while 74% of clinicians said losing critical thinking or decision-making skills would be one of the greatest risks of adopting AI.

Why it matters: Adoption has officially outpaced governance, and clinicians are flagging the right risk. For any leader rolling out AI, the question is not whether your team will use it, but whether you are preserving the judgment muscle that has to override it when it is wrong.

5) AI CEOs from OpenAI, Anthropic, and Microsoft set aside their rivalry to warn Congress AI is making it too easy to design and create bioweapons

Summary: Dario Amodei, Sam Altman, and Mustafa Suleyman signed a public letter to Congress urging mandatory screening for purchases of synthetic DNA and RNA. Some manufacturers, including Twist Bioscience and Ansa Biotechnologies, also signed the letter, signaling industry support for the regulation. 

Why it matters: When direct competitors agree publicly that their own technology is dangerous, regulators move. Anyone working in life sciences, biotech, or specialty pharma should expect new screening requirements on synthetic biology supply chains, and the compliance window to be short.

Missed a previous newsletter? No worries, you can find them on the Archive page.

Founder's Corner

The Cost of Using AI Is Moving From Dollars to Discipline

For a few weeks I had been building a roadmap for AI agents, some for my personal portfolio, some to automate the growth work behind Neural Gains Weekly. This was strategic, reasoning-heavy work, all of it in the regular chat app, the same place most people sit down to solve problems of their own. I was working through one problem after another, and the work just flowed. Plan a piece, get a response, sharpen it, move to the next. The rhythm of solving problems with AI was pure bliss.

Then I hit the session limit. The tool locked me out, the screen told me to wait, and the work stopped cold. The ground came out from underneath me, mid-thought, with nowhere to put the momentum. I was already paying for the $100-a-month plan, and the only way to keep going right then was to pay even more.

Why Your Sessions Run Out Faster Even as Tokens Get Cheaper

The session limit is a time problem, not a money problem. When you have ample time, these limits are easy to dismiss. You hit the ceiling, you shrug, you pick the work back up tomorrow. But the last six weeks for me have been work, travel, and presentations stacked back to back, and time was the one thing I did not have. When time is compressed, the wait stops being a footnote and becomes the obstacle, and you need another game plan.

The strange part is that this squeeze is happening while the raw cost of AI keeps falling. Epoch AI, a research group that tracks these prices, found that in recent years the cost to run a model at a given level of capability has been falling by a median of about 50 times a year, and for some tasks by as much as 900 times. If you have been following the token mechanics, that trend is no surprise. On a per-token basis, AI has never been cheaper.

So why does the same work exhaust my session faster than it did six months ago? Because the cheap part was never the constraint. The era of effectively subsidized, all-you-can-use access for at-home power users is quietly ending, and the limits are where you feel it first.

You could start to see this trend take shape in May, when the squeeze stopped being a feeling and showed up in the open. Anthropic doubled Claude Code's five-hour rate limits for its paid plans, removed the peak-hour slowdowns there, and raised Opus rate limits on the API, all of it announced alongside a SpaceX compute deal of more than 300 megawatts and over 220,000 GPUs. Those increases went to Claude Code and the Opus API, while the regular chat session limit was not on the list. The relief went where the revenue is.

These companies are not villains. They are businesses making historic bets on compute, and bets that size have to be paid back. The new capacity flows to the products and customers funding the buildout, which is exactly what you would expect any business to do. Seeing it that clearly is what turns the frustration into a plan.

Do Not Burn Your Best Model on Your Smallest Tasks

Once I started treating the limit as a budget rather than something to fight, the first move was obvious. Stop running my most powerful model on work that does not need it.

Most of what I do in a week is light work, like drafting a newsletter intro, running a meta-tag pass, or cleaning up formatting. It runs fine on a faster, lighter model and leaves my premium capacity for the problems that actually need deep reasoning. For a long time I left the strongest model selected for everything, the way you leave a light on in a room you have already walked out of, and that default drains a session faster than anything else.

The makers of these models say the same thing. Anthropic's own guidance notes that the heaviest model consumes far more of your usage allowance per turn, and advises switching up to it only when a task needs it.

Opus uses meaningfully more of your quota, so switch to it when you need it rather than leaving it on by default.

The fix takes about three seconds. Before I start anything, I ask whether it really needs the top model. If not, I drop a tier and save the heavy lifting for later.
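If you script any of your AI work, that three-second check can be written down as a rule. This is a minimal sketch, not any vendor's API: the tier names, the task labels, and the `pick_tier` function are all hypothetical and only illustrate the habit of defaulting to the lighter model.

```python
# Hypothetical task labels for routine work that rarely needs deep reasoning.
LIGHT_TASKS = {"draft_intro", "meta_tags", "format_cleanup"}


def pick_tier(task: str, needs_deep_reasoning: bool = False) -> str:
    """Default to the lighter tier; step up only for flagged, non-routine work."""
    if needs_deep_reasoning and task not in LIGHT_TASKS:
        return "premium"  # hypothetical name for the heaviest model
    return "light"        # hypothetical name for the faster, cheaper model


# Routine work stays on the light tier; the roadmap session gets the top model.
routine = pick_tier("meta_tags")
strategy = pick_tier("agent_roadmap", needs_deep_reasoning=True)
```

The design choice is that "light" is the default and the premium tier has to be asked for, which is the reverse of leaving the strongest model selected for everything.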

Walk Into Your Hardest Work With a Full Tank

My second habit is about how much usage I have left when the hard work starts.

Despite the name, the five-hour window is not really five hours of work. Your usage is capped by tokens, and a token-heavy session can burn through that cap in an hour, which means your time at the keyboard can run out long before you are done thinking. So when I know a session will be intensive, I do not walk into it having already spent half my budget on small stuff.

I start it with a full tank and give my heaviest sessions a clean start.

That one change has done more for my output than any prompt trick. The deep work gets the room it needs, and I rarely hit the wall in the middle of the problem I care most about solving.
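A rough back-of-the-envelope sketch shows why a token cap and a clock are different things. Every number here is made up for illustration: the cap, the tokens per turn, and the minutes per turn are assumptions, and `turns_until_cap` and `minutes_until_cap` are hypothetical helpers, not anything a provider publishes.

```python
def turns_until_cap(cap_tokens: int, tokens_per_turn: int) -> int:
    """Whole turns that fit before the token cap is reached."""
    return cap_tokens // tokens_per_turn


def minutes_until_cap(cap_tokens: int, tokens_per_turn: int, minutes_per_turn: int) -> int:
    """Working time before the cap, assuming a steady pace per turn."""
    return turns_until_cap(cap_tokens, tokens_per_turn) * minutes_per_turn


CAP = 1_000_000  # assumed token allowance for one window (illustrative only)

light_session = minutes_until_cap(CAP, 4_000, 2)        # 250 turns -> 500 minutes
heavy_session = minutes_until_cap(CAP, 40_000, 2)       # 25 turns  -> 50 minutes
half_tank_heavy = minutes_until_cap(CAP // 2, 40_000, 2)  # 12 turns -> 24 minutes
```

Under these assumptions the same window lasts more than eight hours of light work, under an hour of heavy work, and about twenty-four minutes if half the allowance is already gone when the heavy session starts.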

When You Hit the Wall, Route Around It

Prevention only takes you so far. Some nights you do everything right and still run out, which brings me back to that night with the agent plan.

Rather than pay to push through, I copied my context (the plan so far and the open questions) into ChatGPT and kept building there.

It was not seamless. I had to reorient the new model and rebuild a little of where I was, but within minutes the momentum came back. I finished that night without spending an extra dollar.

You almost certainly have access to more than one capable model already, between the free tiers of ChatGPT, Claude, and Gemini and whatever plan you pay for. What you build in one can move to another in a couple of minutes, so when one tool cuts you off, take the work elsewhere instead of paying to break back in.

Stay Deliberate, Keep Building

Through some of the busiest weeks I have had, three habits kept me building at full capability from home, without paying a cent more. The cost of using AI well is shifting from dollars to discipline, and discipline is the one part of this you fully control.

None of these habits are clever, which is the point. Anyone can pick them up. If you are feeling the same squeeze, start with the one that fits your week.

Discipline beats spend, and from a home setup that is most of the game. The math changes when the money is a company's instead of your own, which is exactly where the next Founder's Corner is headed. For now, keep experimenting and keep building.

Share Neural Gains Weekly with your network to help grow our community of ‘AI doers’. You can also contact me directly at admin@mindovermoney.ai or connect with me on LinkedIn.

AI Education for You

RLHF: The Hidden Process That Shaped Every Model You Use

What Is Actually Going On Here

Right now, when a professional opens ChatGPT or Claude or Gemini and asks for help writing a meeting summary, something invisible is shaping the system's output before the first word lands on the screen. The model is not just predicting the most statistically likely next word given its training data. It has been trained to favor responses that match what a human evaluator, somewhere, sometime, said was a good answer. That preference has been baked into the model's weights through a process most professionals have never heard of. It is the reason commercial AI feels useful instead of bizarre, and the reason the model sometimes tells you what you want to hear instead of what is true. The process is called RLHF (Reinforcement Learning from Human Feedback), and it is arguably the most consequential fine-tuning method developed so far.

The Problem That Made This Necessary

In 2017, a team at OpenAI and DeepMind published a paper called "Deep Reinforcement Learning from Human Preferences." The lead author was Paul Christiano. Co-authors included Jan Leike, Shane Legg, and Dario Amodei. Amodei would later co-found Anthropic.

The problem was not language. It was reward functions. In reinforcement learning, you train a system to maximize a numerical reward. For chess, the reward is obvious: did you win? For most real-world tasks, no such number exists. The team showed that humans could compare pairs of outputs and pick the better one, a separate model could learn the pattern in those choices, and that learned pattern could then provide a reward signal for the main system to optimize against.

Five years later, a different OpenAI team led by Long Ouyang applied this idea to language models. The result was InstructGPT. The same training method would soon power ChatGPT. The headline finding still shocks people: a 1.3 billion parameter InstructGPT model produced responses that human evaluators preferred over those from the 175 billion parameter GPT-3, despite having one hundred times fewer parameters. Alignment with human intent mattered more than raw size.

How It Actually Works

RLHF works in three stages, applied after the base model has finished pre-training.

Stage one is supervised fine-tuning. Human contractors write high-quality example responses to a set of prompts, demonstrating the kind of answer the lab wants the model to produce. The model trains on these prompt-and-response pairs using standard supervised learning, learning the basic shape of a helpful response.

Stage two builds something called a reward model. Human evaluators look at multiple responses to the same prompt, generated by the model from stage one, and rank them from best to worst. A separate machine learning system trains on those rankings and learns to predict, for any new response, what score a human evaluator would probably give it.

Stage three is the reinforcement learning step. The model from stage one generates responses, and the reward model from stage two scores them. The model is then updated to produce responses that the reward model rates highly. This loop repeats over many training rounds, gradually shaping the model to behave in ways that match the aggregate preferences of the human evaluators who created the ranking data.

The whole pipeline is essentially a way of compressing the judgment of a small group of humans into a process that can be applied to hundreds of millions of responses. The labs hire dedicated teams of contractors to do the ranking. The contractors' preferences become the reward model's predictions. The reward model's predictions become the model's behavior.
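For readers who like to see the mechanics, here is a toy version of stages two and three. It is a sketch under loud assumptions, not how any lab does it: the four "responses", their two made-up features, and the preference pairs are all invented; the reward model is a simple linear score fit with a Bradley-Terry (logistic) preference model; and the policy update is plain policy-gradient ascent, standing in for the more elaborate reinforcement learning used in practice. Stage one is skipped, so the policy starts with no preference at all.

```python
import math

# Hypothetical responses described by two invented features:
# (is_polite, answers_the_question). Real systems score free text.
RESPONSES = {
    "rude_and_off_topic": (0.0, 0.0),
    "polite_but_off_topic": (1.0, 0.0),
    "blunt_but_on_topic": (0.0, 1.0),
    "polite_and_on_topic": (1.0, 1.0),
}

# Stage two input: (preferred, rejected) pairs from imaginary human rankers.
PREFERENCES = [
    ("polite_and_on_topic", "blunt_but_on_topic"),
    ("polite_and_on_topic", "polite_but_off_topic"),
    ("blunt_but_on_topic", "polite_but_off_topic"),
    ("blunt_but_on_topic", "rude_and_off_topic"),
    ("polite_but_off_topic", "rude_and_off_topic"),
]


def reward(weights, name):
    """Linear reward model: weighted sum of a response's features."""
    return sum(w * x for w, x in zip(weights, RESPONSES[name]))


def fit_reward_model(prefs, steps=2000, lr=0.1):
    """Gradient ascent on the Bradley-Terry log-likelihood of the rankings."""
    weights = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for win, lose in prefs:
            margin = reward(weights, win) - reward(weights, lose)
            p_win = 1.0 / (1.0 + math.exp(-margin))  # predicted chance the human picks `win`
            for i in range(2):
                grad[i] += (1.0 - p_win) * (RESPONSES[win][i] - RESPONSES[lose][i])
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights


def softmax(logits):
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def improve_policy(weights, steps=500, lr=0.5):
    """Stage three stand-in: shift probability toward responses the reward model likes."""
    names = list(RESPONSES)
    logits = [0.0] * len(names)  # uniform start: no stage-one preferences
    rewards = [reward(weights, n) for n in names]
    for _ in range(steps):
        probs = softmax(logits)
        baseline = sum(p * r for p, r in zip(probs, rewards))
        # Exact gradient of expected reward for a softmax policy.
        logits = [x + lr * p * (r - baseline) for x, p, r in zip(logits, probs, rewards)]
    return dict(zip(names, softmax(logits)))


weights = fit_reward_model(PREFERENCES)  # rankers' judgment -> reward model
policy = improve_policy(weights)         # reward model -> model behavior
```

In this toy run the reward model learns that the imaginary rankers valued answering the question more than politeness, and the policy ends up favoring the response that does both. The chain is the one described above: a handful of rankings became a scoring rule, and the scoring rule became behavior.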

The tradeoffs are real. RLHF sometimes hurts performance on certain technical benchmarks compared to the raw pre-trained model, a cost the Ouyang paper called an "alignment tax." Labs accept this cost because the resulting model is dramatically more useful in conversation. Some of the cost can be engineered down, but it cannot be eliminated entirely.

Where It Still Breaks

In 2023, a team at Anthropic led by Mrinank Sharma published a paper called "Towards Understanding Sycophancy in Language Models." It demonstrated that five state-of-the-art AI assistants from Anthropic, OpenAI, and Meta all exhibited sycophancy, a tendency to give responses that match user beliefs over truthful ones, across four different free-form text tasks.

The paper traced the behavior, at least in part, back to the human preference data behind RLHF. Human evaluators, on average, preferred responses that agreed with their stated views. The reward models learned this preference. The language models then optimized for it. The model is good at giving you what humans wanted in the training data. It is not good at recognizing when what humans wanted was wrong.
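A small simulation makes that chain concrete. It is a sketch with invented data, not a reproduction of the paper: each response is reduced to two hypothetical features, (is_correct, agrees_with_user), the vote counts are made up, and the reward model is a simple linear score fit with a Bradley-Terry (logistic) preference model.

```python
import math

# Each entry is (preferred_features, rejected_features), with features
# (is_correct, agrees_with_user). All counts are invented for illustration.
LABELS = (
    [((1, 1), (1, 0))] * 8 + [((1, 0), (1, 1))] * 2    # both correct: agreeable usually wins
    + [((1, 0), (0, 1))] * 6 + [((0, 1), (1, 0))] * 4  # correct vs. wrong-but-agreeable
    + [((1, 0), (0, 0))] * 9 + [((0, 0), (1, 0))] * 1  # correct vs. wrong, no agreement
)


def fit_weights(pairs, steps=2000, lr=0.05):
    """Gradient ascent on the Bradley-Terry log-likelihood of the labels."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for win, lose in pairs:
            diff = [a - b for a, b in zip(win, lose)]
            margin = sum(wi * d for wi, d in zip(w, diff))
            p_win = 1.0 / (1.0 + math.exp(-margin))  # predicted chance `win` is chosen
            for i in range(2):
                grad[i] += (1.0 - p_win) * diff[i]
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w


def score(w, features):
    """Reward the fitted model assigns to a response."""
    return sum(wi * x for wi, x in zip(w, features))


w_correct, w_agree = fit_weights(LABELS)
```

Nobody told this reward model to value agreement. The weight on `agrees_with_user` comes out positive simply because the invented labelers leaned that way, so between two equally correct answers the agreeable one scores higher, and any model optimized against that score inherits the lean.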

What This Means for How You Work With It

A few real shifts at work. First, treat smooth agreement from any commercial AI as a signal to push harder, not a sign you got the answer right. If the model echoes your framing instantly, ask it to argue the opposing case. Second, recognize that personality differences between Claude, ChatGPT, and Gemini are largely shaped by how each lab ran its human feedback process. Different evaluator pools, different instructions, different outcomes. Third, use this when evaluating vendors. When a vendor pitches a model as "more helpful" or "more accurate," ask what the post-training process looked like and whose preferences are baked into the model. The answer reveals more than the marketing.

How This Connects

Vol 3 introduced the role of labeled data in training, and the human ranking data behind RLHF is one of the highest-value forms of labeling in the industry. Vol 10 walked through how large language models are built from the ground up, and RLHF is the last major training stage before a model is deployed. Vol 28 already covered AI sycophancy through the lens of how it shows up in your work, and this volume explains why that behavior exists in the first place. Last week's Vol 36 introduced the framing that fine-tuning shapes behavior rather than adds knowledge. RLHF is the canonical example of that principle. Next week, Vol 38 closes the fine-tuning series with the decision framework: when do you reach for fine-tuning, when for retrieval, when for prompting, and how do you tell them apart at work.

Part 2 of 3 in the Fine-Tuning series.

Your 10-Minute Win

A step-by-step workflow you can use immediately

Research You Can Send

You spend twenty minutes researching something inside Claude or ChatGPT. A vendor comparison, a market scan, a summary of a new regulation. The answer is good. Then a colleague asks the same question next week, and you have nothing clean to hand them. The research is trapped in a private thread. You cannot easily share it, and you cannot find it again yourself.

The fix is to stop treating the chat as the finished product. The chat is the workspace. The thing you send is a separate, formatted artifact you publish from that workspace. Claude and ChatGPT both do this on their free tiers.

Why this matters: a shared thread is your whole messy back-and-forth. A published artifact is a clean, standalone page someone can read without seeing how the sausage was made.

The Workflow

1. Frame the research and pick your tool (1 minute). Decide what you are researching and who will read the result. Open Claude or ChatGPT. Both work; the only difference is the final share step.

2. Run the research as a structured brief (3 minutes). Do not ask a vague question. Ask for a shareable output from the start. Paste this:

Copy/Paste Prompt: "I am researching [YOUR TOPIC] so I can share a clear summary with [WHO WILL READ IT]. Produce a structured brief with: a one-paragraph summary, four to six key points, each with a short explanation, any caveats or open questions, and the sources you drew from. Write it so someone who was not part of this research can understand it on its own."

3. Format it as a standalone document (2 minutes). Turn the brief into something that reads cleanly outside the chat. Paste:

Copy/Paste Prompt: "Format that brief as a clean, standalone document with a clear title and section headers, suitable for someone outside this conversation to read."

In Claude, add "put it in an artifact" and the document opens in the side panel. In ChatGPT, the formatted brief appears in the chat.

4. Publish or share to get your link (2 minutes). In Claude, click Publish at the bottom of the artifact panel and copy the public link. Anyone can open it without a Claude account. In ChatGPT, click Share to generate a read-only link. The ChatGPT link shows the thread, so make your formatted brief the last message before sharing.

5. Title it, sanity-check it, and send it (2 minutes). Give it a clear title. Open the link in a private browser window to confirm it reads on its own and shows nothing you would not want a stranger to see. Then paste the link into your email, Slack, or message.

The Payoff

You walk away with a shareable link to a clean research artifact, not a buried chat thread. The same move works for a market scan you send your team, a regulation summary you send a colleague, or trip research you send your family. Research becomes something you can hand off instead of something that dies in your history.

The AI Concept You Just Used

Artifacts versus threads. A thread is the conversation. An artifact is a durable output the AI produces that can stand on its own. Claude makes this explicit with a publishable artifact panel; ChatGPT blends it into the chat with a shareable snapshot. Once you see the distinction, you stop screenshotting AI answers and start publishing them.

Transparency & Notes

  • Claude: publishing artifacts is available on the free plan, and recipients do not need a Claude account to view a published document.
  • ChatGPT: sharing a conversation creates a free, read-only link. Canvas availability on the free tier has shifted over time, so this workflow does not depend on it.
  • Published links are public. Anyone with the link can open it, and published Claude artifacts may be indexed by search engines. Do not publish anything containing PHI, confidential metrics, or NDA-protected material.
  • Open every share link in a private browser window before sending, to confirm what a recipient will actually see.

Enjoy this? Get it in your inbox every Tuesday.

Practical AI workflows. No hype. No spam. Just receipts.

Subscribe Free
