26 May 2026 13 min read

Volume 35: Build the System Before You Need It

Frontier AI models now ship with million-token context windows as the floor, not the ceiling. Independent testing shows the number on the spec sheet is rarely the number you can rely on. Here is how context actually works in 2026 and what to use instead of raw size.

For the last four weeks, my travel and presentation schedule erased the time I normally use to build the newsletter. The newsletter has still shipped on time. Not because I worked harder, but because the AI systems I built when I had room to build them are doing the work I no longer have time to do manually.

🧭 Founder's Corner: Why the AI systems you build during calm stretches are the only thing that keeps the work shipping when life takes your calendar back.

🧠 AI Education: Why the million-token context race is mostly a vanity metric, and the three patterns of context that actually shape how enterprise AI works today.

✅ 10-Minute Win: Turn your unread pile of articles, PDFs, and saved transcripts into a queryable research notebook and a 15-minute audio summary you can listen to today.

Let's jump in.

Enjoying the weekly content? Forward this volume to a colleague, friend, or family member to subscribe.

Signals Over Noise

We scan the noise so you don’t have to — top 5 stories to keep you sharp

1) Anthropic set to hit $10.9 billion in revenue during second quarter, source says

Summary: Anthropic is on track to generate $10.9 billion in revenue during the second quarter and post its first profitable quarter. That would more than double the $4.8 billion it generated in the first quarter.

Why it matters: For two years the AI industry has been defined by "burn cash now, profits later." Anthropic crossing into operating profit, even briefly, changes the conversation with skeptical leadership. It shows that at enterprise scale, AI is starting to pay for itself, not just promise to.

2) 100 things we announced at Google I/O 2026

Summary: At I/O 2026, Google launched Gemini 3.5 Flash, which combines frontier intelligence with action and is available across Antigravity, the Gemini API, and Android Studio. It also announced Gemini Omni for video creation, Universal Cart for agentic shopping, and a redesigned Gemini app. Gemini Spark, a 24/7 personal agent, rolls out next week to AI Ultra subscribers in the US.

Why it matters: If you live in Gmail, Docs, Search, or Android, an agentic version of Gemini is about to start doing tasks for you, not just answering questions. The way work happens inside Google products is shifting from "search and click" to "ask and approve."

3) Bristol Myers taps Anthropic's Claude for enterprise-wide AI adoption to speed R&D, global workflows

Summary: Bristol Myers Squibb signed a strategic agreement to deploy Claude as a "shared intelligence platform" across the drugmaker's global operations, putting Claude's advanced reasoning and agentic capabilities in the hands of more than 30,000 BMS employees. One application already in use involves Claude writing regulatory reports based on the company's clinical trial data.

Why it matters: A top-tier pharma is moving past AI pilots and putting agentic AI inside the regulated workflows where drugs actually get developed, submitted, and approved. For anyone in life sciences or healthcare-adjacent work, this resets the bar for what an enterprise AI strategy is supposed to look like in 2026.

4) AI-enabled medical devices may fail on real-world patients, report cautions

Summary: A new Paragon Health Institute report warns that AI medical devices can perform well during testing yet still fail when used on real-world patients whose medical images differ from the data used to train the underlying models. The report recommends a voluntary "Digital Similarity Analysis" that compares a patient's image against the device's training data before use.

Why it matters: AI tools that look strong in pilot studies can quietly underperform once they reach a different patient population. For anyone working in clinical operations, the lesson is that vendor demos and FDA clearance are not the same thing as proven performance on the patients walking through your door.

5) OpenAI prepares confidential IPO filing

Summary: OpenAI is preparing to confidentially file a draft IPO prospectus with the SEC, with Goldman Sachs and Morgan Stanley leading the process and a public market debut targeted for September 2026 at a current private valuation around $852 billion. Confidential IPO filings are typically submitted a couple of months before publicly available S-1 filings.

Why it matters: A public OpenAI will eventually have to disclose audited revenue, compute costs, and its economic split with Microsoft, the data that has been the AI industry's biggest blind spot. For anyone budgeting AI spend or making the case for AI to leadership, those numbers are about to reshape what "reasonable" looks like.

Missed a previous newsletter? No worries, you can find them on the Archive page.

Founder's Corner

Time Is the Most Important Currency

I usually create the newsletter on weeknights, between 7 and 10 PM, after my day winds down. For the last four weeks, that window has been mostly gone. My travel and presentation schedule has been hectic, which means the content I would have written on a Tuesday night is now getting written on a Saturday. That kills my weekend flexibility and the time I had been using to build with AI. And with major life changes coming in November, I have been thinking more and more about the importance of time.

Time is the most important currency, and I had read that line for years without ever really feeling it. This stretch made me feel it. Every hour over the last four weeks has been spoken for, and the newsletter has still kept shipping. That outcome is not because I have been working harder. It is because the AI systems I built earlier, when I had room to build them, are doing the work I no longer have time to do manually. They are how you bank time when the rest of your calendar is taken.

Lessons From 30,000 Feet

I was on a Friday morning flight home to Orlando from Las Vegas, where I had presented the day before. I had not touched the newsletter all week because of the travel, and that weekend was the only window I had left to do the work. The long flight was my last shot at protecting my time.

So I opened the laptop and started working through the Claude skills I had built. By the time we descended into Orlando, the next issue was 90 percent done.

Two concepts landed during the execution, both of them things I had said for months without feeling them at this depth. The first was that AI is best used as a system that solves a problem, not a prompt that automates a task. I had been writing and saying that in various ways, but watching the workflow run on a tray table while the rest of my week was already spent was the first time it felt real for me. It was the first time I noticed AI giving me time back during a period of personal chaos.

The second was time. The hours the system was buying back compounded into a weekend with flexibility, and the sleep I had not been getting all week. Productivity advice never seems to capture this. Time is the only currency none of us get more of, and my AI-powered content creation system is the only thing in my stack that has helped me bank it without sacrificing anything.

When Life Gets Busy

This is not just my story. Every professional eventually hits a stretch where the time to think and the time to build both shrink at the same moment. The trigger looks different for everyone, whether that is travel, illness, a major project, or a child entering a more demanding phase. The pattern is the same. Real life arrives all at once and takes the room you used to have.

What I felt on the flight is showing up in the data. Microsoft's 2026 Work Trend Index reports that 66 percent of AI users surveyed say AI has given them more time for high-value work. That figure came from a Microsoft survey of 20,000 AI-using knowledge workers across ten markets. Professionals who have wrapped systems around AI are getting their hours back.

AI returns time when it runs as a system. Treating it as a chatbot or a smarter search engine sharpens a task but does not bank an hour. Real life is what happens when there is no surplus to optimize, only the existing output to preserve. The system you build when you have time is what keeps you productive and shipping when you do not.

Getting Your Time Back

Start with one question: where is your time leaking right now? Asking what task AI can do for you stops at the task, and many workers never get even that far. Per Gallup's most recent workplace survey, 49 percent of U.S. workers report they never use AI in their role. The standard AI pitch did not meet those workers where they were, because it never asked where they were losing time in the first place.

Your first system starts with a problem you already need help with. It could be the recurring client email, the weekly team summary, or the slide deck you rebuild from the same data each month. Write down your problem statement, the inputs, what good looks like, and hand that context over to the AI tool you already use. Run it once. Adjust it. Run it again. Iterate until you are comfortable with the output, then build a system that can replicate that output on an ongoing basis.

This is when compounding starts and you can feel the difference in your AI usage. A system you have used ten times has paid you back ten times over, with no extra labor on your part. Compounding is the entire case for building an AI system.

If you have not started yet, that is fine. No one is an AI expert, including the people who have been building with it for months. We learn by experimenting and thinking differently. Finding ways to get time back is a grounding concept for all of us, and the perfect entry point to your first AI system. The only thing that matters is starting.

Share Neural Gains Weekly with your network to help grow our community of ‘AI doers’. You can also contact me directly at admin@mindovermoney.ai or connect with me on LinkedIn.

AI Education for You

Tokens and Context Windows in 2026: What Has Changed and Why It Matters Now

The Number That Tells the Story

When Vol 6 introduced tokens and context windows in November 2025, the mainstream chat experience capped around 128,000 to 200,000 tokens. Gemini had pushed to 1 million as the outlier. Six months later, in May 2026, 1 million is the floor for frontier models, not the ceiling. Claude Opus 4.7, Gemini 3.1 Pro, Qwen 3.6 Plus, and Llama 4 Maverick all ship with 1M tokens as standard. GPT-5.4 supports up to 1 million experimentally in Codex, with a surcharge above 272,000 tokens. Gemini 1.5 Pro supports 2 million. Meta's open-weight Llama 4 Scout has held 10 million since April 2025. The mainstream frontier roughly quintupled in half a year.

If that were the whole story, the industry would have solved the context problem. Most users believe it did. The reality is more complicated, and the part that matters for how you actually use AI at work has almost nothing to do with the number on the spec sheet.

What the Race Did Not Deliver

Marketing departments wrote a clean narrative: more context equals better outputs. Upload your entire codebase. Feed in the full contract. Watch the model synthesize everything in one shot.

Independent testing tells a different story. Every major model shows meaningful accuracy degradation when the answer to a question sits in the middle of a long context, compared to the same answer placed at the beginning or end. The pattern, documented by Stanford researchers in 2023 and named "Lost in the Middle," has survived three years of aggressive scaling. Larger windows make it worse, because a larger window means more middle to get lost in. On OpenAI's MRCR benchmark, which tests how well models locate specific information across long contexts, Anthropic's published results for Claude Opus 4.6 show about 93 percent recall at 256,000 tokens and roughly 76 percent at 1 million. That was the most reliable model on the benchmark at the time of publication, and even it drops 17 points over the range.

The number on the product page is rarely the number you can rely on. Effective context (what the model can reason over reliably) lags advertised context, and the gap widens as the window grows.
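To make "effective context" concrete, here is a minimal Python sketch. It uses only the two Claude Opus 4.6 recall figures quoted above. The function names and the reliability thresholds are hypothetical, chosen for illustration, and the sketch assumes recall only falls as the context grows.

```python
# Recall figures quoted above for Claude Opus 4.6 on MRCR (context length -> recall).
REPORTED_RECALL = {256_000: 0.93, 1_000_000: 0.76}


def effective_context(recall_by_length: dict[int, float], min_recall: float) -> int:
    """Return the largest measured context length whose recall meets min_recall.

    Returns 0 when no measured length clears the bar.
    """
    reliable = [length for length, recall in recall_by_length.items() if recall >= min_recall]
    return max(reliable, default=0)


def recall_drop_points(recall_by_length: dict[int, float]) -> int:
    """Percentage-point drop between the shortest and longest measured lengths."""
    shortest, longest = min(recall_by_length), max(recall_by_length)
    return round((recall_by_length[shortest] - recall_by_length[longest]) * 100)


if __name__ == "__main__":
    print(effective_context(REPORTED_RECALL, min_recall=0.90))  # the window you can rely on
    print(recall_drop_points(REPORTED_RECALL))                  # points lost across the range
```

If your task needs at least 90 percent recall, the usable window in this sketch is 256,000 tokens, not the 1 million on the spec sheet.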

How Context Actually Works Now

Three patterns of context consumption matter in 2026. Only one resembles what Vol 6 described.

Chat pattern. You paste documents into a chat window, the model loads everything at once, you ask a question. This is the pattern most consumer users default to. It is also where lost-in-the-middle hits hardest, where latency at maximum context can stretch into the minutes, and where a 900,000-token input on Claude Opus 4.7 costs about $4.50 before any output.
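The cost figure above is simple arithmetic, sketched here in Python. The $5 per million input tokens rate is an assumption: it is the rate implied by the $4.50 figure, not a price taken from a rate card, so check your provider's current pricing before relying on it.

```python
# Assumption: the input rate implied by "900,000 tokens costs about $4.50".
ASSUMED_INPUT_USD_PER_MILLION = 5.00


def input_cost_usd(input_tokens: int, usd_per_million: float = ASSUMED_INPUT_USD_PER_MILLION) -> float:
    """Cost of sending input_tokens once, before any output tokens are billed."""
    return input_tokens / 1_000_000 * usd_per_million


if __name__ == "__main__":
    one_question = input_cost_usd(900_000)
    print(f"One question over a 900,000-token paste: ${one_question:.2f}")
    # If the full paste is resent with every question and nothing is cached,
    # the input bill scales with the number of questions.
    print(f"Ten questions, no caching: ${10 * one_question:.2f}")
```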

Agent pattern. Covered in the AI Agents series (Vol 24-27). An agent does not load everything at once. It builds context iteratively across a sequence of actions: search, read, plan, retrieve more, decide. Each step consumes a fraction of the window. Prompt caching (now standard on Claude, Gemini, and GPT) lets the agent reuse stable portions of context across steps at up to 90 percent less cost. The bottleneck is not how big the window is. It is how well the agent decides what to load next.
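Here is a rough sketch of why caching matters for the agent pattern. It is a simplified cost model, not any provider's billing formula: the $5 per million rate is illustrative, the 90 percent discount is the "up to" figure from the paragraph above, and cache-write surcharges and cache expiry are ignored.

```python
def agent_input_cost_usd(
    stable_tokens: int,
    fresh_tokens_per_step: int,
    steps: int,
    usd_per_million: float = 5.00,  # assumption: illustrative input rate
    cached_discount: float = 0.90,  # "up to 90 percent less cost" from the text
) -> tuple[float, float]:
    """Return (cost_without_cache, cost_with_cache) for a multi-step agent run.

    Simplification: the stable prefix is billed in full on step 1 and at the
    discounted rate on every later step.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    rate = usd_per_million / 1_000_000
    fresh = fresh_tokens_per_step * steps * rate
    uncached = stable_tokens * steps * rate + fresh
    cached = stable_tokens * rate * (1 + (steps - 1) * (1 - cached_discount)) + fresh
    return uncached, cached


if __name__ == "__main__":
    # Hypothetical run: 50,000 stable tokens (instructions, tools) reused over 10 steps,
    # plus 2,000 new tokens per step.
    without_cache, with_cache = agent_input_cost_usd(50_000, 2_000, steps=10)
    print(f"Without caching: ${without_cache:.2f}  With caching: ${with_cache:.2f}")
```

Under these assumptions the ten-step run costs about $2.60 in input without caching and under $0.60 with it. Both runs read the same context, but only the cached one avoids paying full price for the same stable tokens ten times.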

Grounded retrieval pattern. Covered in Vol 30 with Copilot's architecture. The system retrieves only the relevant pieces from a larger index, then runs the prompt against a tight context. The window stays small by design. The intelligence is in what gets retrieved, not in what gets stuffed in.
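The retrieve-then-prompt idea can be sketched in a few lines of Python. This is a toy: it ranks chunks by shared words, where production systems typically use search indexes and embeddings, and it is not how Copilot is implemented. All function names and the sample chunks are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a stand-in for real search or embedding-based matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k chunks that share the most words with the question."""
    q_words = tokenize(question)
    scored = [(len(q_words & tokenize(chunk)), chunk) for chunk in chunks]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked if score > 0][:top_k]


def build_grounded_prompt(question: str, chunks: list[str], top_k: int = 2) -> str:
    """Assemble a small prompt from the retrieved chunks only, not the whole index."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, top_k))
    return f"Answer using only these excerpts:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    index = [
        "The travel policy allows economy flights only.",
        "Expense reports are due within 30 days.",
        "The cafeteria opens at 8 AM.",
    ]
    print(build_grounded_prompt("When are expense reports due?", index))
```

The prompt that reaches the model contains only the matching chunk, however large the index behind it grows.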

The chat pattern is what most marketing campaigns optimize for. The other two are what most enterprise AI actually runs on. That makes raw window size a vanity metric for the use cases that matter most at work.

What This Means for How You Work With AI Now

Stop choosing models by advertised context size. If the task involves real reasoning over a long document, prefer the model with the strongest recall at the length you actually need.

Structure long prompts for retrieval. Put the question and the most critical context at the beginning or the end. Avoid burying the key fact in the middle of a 50,000-word document and asking the model to find it. If you must work with large material, ask the model to extract the relevant sections first, then run your analysis on the extract.
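Both habits can be sketched as small Python helpers. Treat this as an illustration, not a prescribed template: the prompt wording and function names are hypothetical, and ask_model stands in for whatever AI tool you already use (any function that takes a prompt and returns text).

```python
from typing import Callable


def build_long_prompt(question: str, document: str) -> str:
    """Place the question before and after the document so it never sits mid-context."""
    return (
        f"Question: {question}\n\n"
        f"<document>\n{document}\n</document>\n\n"
        f"Using only the document above, answer this question: {question}"
    )


def extract_then_analyze(question: str, document: str, ask_model: Callable[[str], str]) -> str:
    """Two-pass pattern: pull the relevant passages first, then reason over the extract.

    ask_model is a hypothetical callable (prompt -> text) wrapping your AI tool.
    """
    extract = ask_model(
        build_long_prompt(f"Quote verbatim every passage relevant to: {question}", document)
    )
    # The second call sees only the short extract, not the full document.
    return ask_model(build_long_prompt(question, extract))
```

The second pass runs against a much smaller context, which keeps the key facts away from the middle of a long window.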

When an enterprise tool feels surprisingly capable on a task it should not be able to handle, the explanation is usually retrieval, not raw context. Copilot does not load your entire SharePoint. It searches the index, finds what matches, and answers from that subset. The architecture is doing the heavy lifting, not the window.

How This Connects

Vol 6 introduced tokens and context windows as foundational concepts. Vol 8 showed how context shaped daily use at work. The AI Agents series (Vol 24-27) showed how agents consume context across multi-step actions. Vol 30 explained how Copilot uses Microsoft Graph to ground responses in retrieved data. This Flashback closes the loop. The fundamentals from Vol 6 and 8 still apply exactly as they did. The arms race that followed turned out to be less important than the architectural patterns built on top.

Vol 36 begins the Fine-Tuning series. Fine-tuning, RAG, and prompting are three different levers for output quality. Context window size is not one of them, because how you use context matters more than how big the window is.

Flashback to Vol 6 and Vol 8.

Your 10-Minute Win

A step-by-step workflow you can use immediately

Your Personal Research Assistant

You save articles you mean to read, download PDFs from conferences, and stack up transcripts from podcasts and YouTube. The pile grows. The reading does not. The information might be useful someday, but someday never comes because you cannot remember what is in any of it.

NotebookLM solves this problem. You may remember NotebookLM from Steal My Prompt Vol 8, where we used it to organize research for a long-form piece. Today we go structural and build a personal knowledge base from documents you already own. We treat NotebookLM as a research assistant that only knows what you have given it.

Why this matters: open chatbots search the public web and synthesize whatever they find. NotebookLM does the opposite. It refuses to answer beyond your uploaded sources, and it cites the paragraph it pulled the answer from. The result is research you can actually verify.

The Workflow

1. Open NotebookLM and create a new notebook (1 minute). Go to notebooklm.google.com. Sign in with your Google account. Click "Create new notebook." The free tier covers everything in this workflow.

2. Upload three to five sources (3 minutes). Click "Add source." NotebookLM accepts PDFs, Google Docs, Google Slides, websites, YouTube videos with transcripts, and pasted text. Pick documents from a single topic you care about right now: a clinical area you are following, a project you are scoping, a book you want to internalize.

3. Ask your first three questions (2 minutes). Each answer cites the exact paragraph from your sources, so verification takes one click.

Copy/Paste Prompt: "What are the three most important points across all of my uploaded sources, and where in each document did you find them?"

4. Generate an Audio Overview (2 minutes). Click "Audio Overview" in the right panel. NotebookLM creates a 10- to 15-minute conversational podcast where two AI hosts discuss your sources. Useful for commute listening, gym time, or sharing highlights with someone who will not read the documents themselves.

5. Pin a Note Card and save the notebook (2 minutes). Click any helpful answer and select "Save as note." Notebooks persist across sessions. Come back next month, add new sources, and the assistant now knows the combined collection.

The Payoff

You walk away with a notebook you can return to, a 15-minute audio summary you can listen to today, and a clear sense of how source-grounded AI differs from open chatbot use. The unread pile becomes a queryable resource.

The AI Concept You Just Used

Source-grounded AI, sometimes called Retrieval-Augmented Generation (RAG). The AI does not synthesize from training data or the open web. It retrieves from a source set you control and answers only from there. Every answer cites its source paragraph. The same concept underpins enterprise AI: legal AI grounded in case law, healthcare AI grounded in clinical guidelines, internal AI grounded in company policy. NotebookLM is the consumer-grade version of that pattern.
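The two behaviors that define source grounding, citing the paragraph and refusing when the sources are silent, can be sketched in Python. This is a toy illustration and not how NotebookLM works internally: matching is by shared words, and the names, the min_overlap threshold, and the sample source are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    source: str     # name of the uploaded document
    paragraph: int  # 1-based paragraph number inside that document
    text: str       # the paragraph the answer is drawn from


def words(text: str) -> set[str]:
    """Lowercase word set used for naive matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounded_lookup(
    question: str, sources: dict[str, list[str]], min_overlap: int = 2
) -> Optional[Citation]:
    """Return the best-matching paragraph with its citation, or None to signal a refusal."""
    q = words(question)
    best, best_score = None, 0
    for name, paragraphs in sources.items():
        for number, paragraph in enumerate(paragraphs, start=1):
            score = len(q & words(paragraph))
            if score > best_score:
                best, best_score = Citation(name, number, paragraph), score
    return best if best_score >= min_overlap else None


if __name__ == "__main__":
    notebook = {
        "guidelines.pdf": [
            "Intro paragraph about scope.",
            "Adults should get 150 minutes of exercise weekly.",
        ]
    }
    print(grounded_lookup("How many minutes of exercise should adults get?", notebook))
    print(grounded_lookup("What is the capital of France?", notebook))  # no support in sources
```

A question the sources cannot support returns nothing rather than a guess. That is the opposite of what an open chatbot would do with the same question.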

Transparency & Notes

NotebookLM is free with a Google account. Audio Overview is also free.
Source caps: 50 sources per notebook, 100 notebooks per account. More than enough for personal use.
Privacy guidance: Do not upload PHI, confidential business documents, or NDA-protected material. Treat NotebookLM as a tool for public sources, conference notes, and content you would not mind a colleague seeing.
Audio Overview voices are AI-generated. Use it as a discovery aid, not as a citation source itself.
Source grounding is strong but not perfect. If an answer matters, click through the citation and read the original.

From User to Builder