<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Another Coding Blog]]></title><description><![CDATA[Bringing you insights and education from 13 years of experience across AI, Data and Analytics ]]></description><link>https://www.anothercodingblog.com</link><image><url>https://substackcdn.com/image/fetch/$s_!2kzg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F615044d0-cdfb-47ac-9a1b-3883974114e7_1024x1024.png</url><title>Another Coding Blog</title><link>https://www.anothercodingblog.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 10 Jun 2026 00:45:08 GMT</lastBuildDate><atom:link href="https://www.anothercodingblog.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Taylor Ortiz]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[ortizt@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[ortizt@substack.com]]></itunes:email><itunes:name><![CDATA[Taylor Ortiz]]></itunes:name></itunes:owner><itunes:author><![CDATA[Taylor Ortiz]]></itunes:author><googleplay:owner><![CDATA[ortizt@substack.com]]></googleplay:owner><googleplay:email><![CDATA[ortizt@substack.com]]></googleplay:email><googleplay:author><![CDATA[Taylor Ortiz]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Cache-Aware Skill Design]]></title><description><![CDATA[How prompt caching, KV cache, and stable instruction modules can change the cost of agent workflows]]></description><link>https://www.anothercodingblog.com/p/cache-aware-skill-design</link><guid 
isPermaLink="false">https://www.anothercodingblog.com/p/cache-aware-skill-design</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Mon, 08 Jun 2026 15:30:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5896a14e-2fe6-4782-84d6-67a6b27f94a1_1200x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Prompt caching is often described as a cost optimization.</p><p>If a model provider sees the same input tokens again, those repeated tokens may be processed at a lower cost. That description is accurate, but incomplete.</p><p><a href="https://developers.openai.com/api/docs/guides/prompt-caching">OpenAI&#8217;s prompt caching docs</a> describe cache hits as exact prefix reuse and recommend putting static content at the beginning of the prompt, with variable content near the end. A cache hit means the model server has recently processed the same prompt prefix during inference, so it can reuse stored model state for that matching portion of the input.</p><p>That detail matters for agent design.</p><p>Agents routinely send repeated context across turns and tasks: tool definitions, system prompts, Skill instructions, output contracts, examples, source-handling rules, conversation history, retrieved documents, and tool results.</p><p>Some of that context is stable. Some of it changes on every run.</p><p>Prompt caching can reward systems that separate those two categories cleanly.</p><p>A Skill with stable instructions, examples, and output rules can become a reusable prompt prefix. A Skill that places timestamps, run IDs, retrieved documents, or task-specific state before its stable instructions may reduce the opportunity for cache reuse.</p><p>The practical implication is straightforward:</p><p>A well-designed Skill does more than tell the model what to do. It also gives the model server a stable structure it can reuse.</p><p>This is why prompt caching should not be treated only as a pricing feature. 
For agents and Skills, prompt structure can become part of system architecture.</p><h2>Cached Tokens Mean Reused Computation</h2><p>The phrase &#8220;cached tokens&#8221; can make prompt caching sound like text storage.</p><p>That framing misses the mechanism.</p><p>The model server is not caching a response. It is checking whether the new request begins with a prefix it has already processed. When the prefix matches, the server can reuse stored model state for that matching portion of the input.</p><p>The same OpenAI docs also recommend placing static content at the beginning of the prompt and variable content near the end.</p><p>That recommendation is the first design rule.</p><p>Stable material belongs early:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;ae27a93e-2062-430c-aa30-113236d46f72&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[system instructions]
[tool definitions]
[Skill instructions]
[output contract]
[examples] </code></pre></div><p>Variable material belongs later:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d1fb2f4c-bc20-4ede-a594-a0fcb7da8930&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[current task]
[retrieved documents]
[tool results]
[timestamps]
[run IDs] </code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xwJz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xwJz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 424w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 848w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1272w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xwJz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png" width="1456" height="760" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:760,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Prompt Caching visualization&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Prompt Caching visualization" title="Prompt Caching visualization" srcset="https://substackcdn.com/image/fetch/$s_!xwJz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 424w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 848w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1272w, https://substackcdn.com/image/fetch/$s_!xwJz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c2dff19-41f8-4abe-94a4-2d35f1f37951_1681x878.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Prompt caching starts with prefix alignment. If the new request begins with the same token pattern, the serving layer can reuse cached state. If the beginning changes, the reusable prefix can collapse, even when later parts of the prompt look familiar.</figcaption></figure></div><p>The important word is prefix.</p><p>Prompt caching does not usually search the whole prompt for similar meaning. It does not see that two prompts both mention the same document, paragraph, or phrase and automatically reuse that work wherever it appears. 
The cached state depends on the exact token sequence, its order, and where that sequence appears in the prompt.</p><p>That makes small layout choices matter.</p><p>A timestamp at the top of the prompt can change the prefix.</p><p>A random run ID can change the prefix.</p><p>A retrieval system that inserts source chunks before the stable Skill body can change the prefix.</p><p>A tool description that includes dynamic runtime state can change the prefix.</p><p>Each of those choices may be reasonable in isolation. Together, they make the beginning of the prompt less stable. That reduces the amount of work the serving layer can reuse.</p><p>For agent systems, this is the practical consequence:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;bc4e20a9-8437-4b4d-b4ab-7824d63807b3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Prompt caching rewards stable beginnings. </code></pre></div><p>The stable part of the agent should be early. 
The variable part should be later.</p><h2>What Is Actually Being Cached?</h2><p>Prompt caching is easier to understand if we separate four layers:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5b8887fa-24d2-4fef-969f-f1f3df6ce0f5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">tokens
attention
KV cache
prompt cache </code></pre></div><p><strong>Tokens</strong> are the units the model processes. The prompt is not handled as raw prose; it is broken into tokens first.</p><p><strong>Attention</strong> is the mechanism the model uses to relate those tokens to one another.</p><p><strong>KV cache</strong> is the stored attention state (the key and value tensors) created while the model processes tokens.</p><p><strong>Prompt cache</strong> is the serving-layer feature that can reuse that stored state when a later request starts with the same prefix.</p><p>The confusing part is the word &#8220;key.&#8221;</p><p>In normal software, a cache usually has a key and a value:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d5004d9f-966d-4b15-b2a5-c091f4fa9df7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">cache[key] = value </code></pre></div><p>Prompt caching has something like that too:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;12bae24c-654c-47ec-b5d3-cdca02d38f0e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">prompt_cache[hash(exact_token_prefix)] = stored_model_state </code></pre></div><p>But the &#8220;K&#8221; in KV cache is not the hash used to look up a cached prompt prefix.</p><p>The <a href="https://arxiv.org/abs/1706.03762">original Transformer paper</a> defines attention over queries, keys, and values. That is where the terminology comes from. In the KV cache, the K is an attention key and the V is an attention value. 
They are internal tensors created by the model during inference, not the lookup key and value of a normal software cache.</p><p>That distinction matters.</p><p>A simplified version looks like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c0377c9c-8ce8-4be6-9b09-62c43598b620&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Cache lookup key: 
hash(exact token prefix)  

Cached value: 
attention key tensors + attention value tensors </code></pre></div><p>When we say prompt caching reuses KV cache, we are not saying the model is doing a database lookup where prompt text maps to an answer.</p><p>We are saying the serving layer can find a matching prompt prefix and reuse the key/value attention state the model already computed for that prefix.</p><p><a href="https://magazine.sebastianraschka.com/p/coding-the-kv-cache-in-llms">Sebastian Raschka&#8217;s KV cache walkthrough</a> gives a concrete inference example: as a model generates one token at a time, it can reuse previously computed key and value vectors instead of recomputing them at each step.</p><p><a href="https://docs.vllm.ai/en/stable/design/prefix_caching/">vLLM&#8217;s prefix-caching docs</a> describe the cross-request version: processed requests leave behind KV-cache blocks, and later requests with the same prefix can reuse those blocks instead of recomputing them.</p><p>That is the bridge between the API feature and the model internals.</p><p>The API exposes the result as cached tokens. The serving system manages the cache. The model state being reused is tied to attention.</p><h2>Why KV Cache Exists</h2><p>KV cache exists because generation is sequential.</p><p>A model does not write an entire response at once. It generates one token, appends that token to the context, then generates the next token.</p><p>A simplified sequence looks like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a9f520e0-ec10-484d-9908-e2ffb3dc238b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Prompt:
Time 

Step 1:
Time &#8594; flies  

Step 2:
Time flies &#8594; fast  

Step 3: 
Time flies fast &#8594; . </code></pre></div><p>At each step, the model needs access to the tokens that came before.</p><p>Without a KV cache, the model would repeatedly recompute the attention keys and values for tokens it had already processed.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;28adac1e-5c31-4d19-b750-a711fd4541a3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Step 1: 
compute K/V for &#8220;Time&#8221;  

Step 2: 
compute K/V for &#8220;Time&#8221; again 
compute K/V for &#8220;flies&#8221;  

Step 3: 
compute K/V for &#8220;Time&#8221; again 
compute K/V for &#8220;flies&#8221; again 
compute K/V for &#8220;fast&#8221; </code></pre></div><p>That is wasted work.</p><p>With KV cache, the earlier tokens do not need to be recomputed every time.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c274194e-5d19-41b4-936f-9e4193d9238a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Step 1: 
compute K/V for &#8220;Time&#8221;
store it  

Step 2: 
reuse K/V for &#8220;Time&#8221;
compute K/V for &#8220;flies&#8221; 
store it  

Step 3: 
reuse K/V for &#8220;Time&#8221; 
reuse K/V for &#8220;flies&#8221; 
compute K/V for &#8220;fast&#8221; 
store it </code></pre></div><p>This is the basic inference-time benefit of KV cache.</p><p>It does not make the model smarter. It does not change the answer. It reduces repeated computation.</p><p>Sebastian Raschka&#8217;s <a href="https://magazine.sebastianraschka.com/p/coding-the-kv-cache-in-llms">KV cache walkthrough</a> gives the clean version of this example: during autoregressive generation, the model would otherwise recompute key and value vectors for earlier tokens at each step.</p><p>Prompt caching extends this idea across requests.</p><p>Within one response, KV cache lets the model reuse state from earlier tokens in the same generation.</p><p>Across requests, prompt caching lets the serving layer reuse state from a previous request when a new request starts with the same prefix.</p><p>That is the bridge we need for agents.</p><h2>Prompt Caching Extends KV Reuse Across Requests</h2><p>KV cache usually starts inside a single generation.</p><p>The model processes a prompt, creates key/value attention state, and reuses that state as it generates the next token, then the next token, then the next.</p><p>Prompt caching moves the reuse boundary.</p><p>Instead of reusing prior token state only inside one response, the serving layer can reuse state from a previous request when a new request starts with the same prefix.</p><p><strong>Request 1:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;85d5e85f-4c59-4ca3-a128-210a35dbfbdb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable Skill instructions] 
[stable output contract] 
[stable examples] 
[current task A] </code></pre></div><p><strong>Request 2:</strong></p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5f22073e-f2bf-4d4b-a254-d28db1b425e4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable Skill instructions] 
[stable output contract] 
[stable examples] 
[current task B] </code></pre></div><p>The beginning is the same.</p><p>The model server does not need to process that shared prefix as if it were new every time. It can reuse the state computed when it processed the earlier request, then continue from the new suffix.</p><p>That is the practical bridge between KV cache and prompt caching.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;9f6db18f-bfdf-41a0-9e68-2d9c5ab94ec4&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Within one response:
reuse prior token state from the same generation

Across requests:
reuse prior prefix state from an earlier request </code></pre></div><p>The API usually hides the details. You see the result as cached input tokens, lower cached-token pricing, or lower latency when the cache hit affects the prefill path.</p><p>The implementation underneath is still about model state.</p><p><a href="https://docs.vllm.ai/en/stable/design/prefix_caching/">vLLM&#8217;s prefix-caching docs</a> describe this directly: the system caches KV-cache blocks from processed requests and reuses those blocks when a new request has the same prefix.</p><p>That is why prompt caching is not only a billing abstraction. It is an inference-serving optimization exposed through the API.</p><h2>Why Prefix Order Matters</h2><p>Prompt caching is strict about order.</p><p>The cache does not look for familiar words scattered throughout the prompt. It looks for a matching beginning.</p><p>That means the layout of a request decides how much of it can be reused. Start with two requests that both begin with the same Skill instructions:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;85754524-8b09-4022-9f68-b1c3b19ea11d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Request 1: 
[stable Skill instructions]
[dynamic source documents] 
[current task]  

Request 2: 
[stable Skill instructions] 
[different source documents] 
[different task] </code></pre></div><p>In both requests, the stable Skill instructions come first. The shared prefix is intact.</p><p>Now compare that with this layout:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6e027784-d7f8-4dd7-9171-e517b45dac43&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Request 1: 
[timestamp A] 
[dynamic source documents A] 
[current task A] 
[stable Skill instructions]  

Request 2: 
[timestamp B] 
[dynamic source documents B] 
[current task B] 
[stable Skill instructions] </code></pre></div><p>The stable Skill instructions are still present, but they are no longer the beginning of the prompt.</p><p>The prefix changed before the reusable material appeared.</p><p>That is the failure mode.</p><p>This is why a small amount of dynamic text at the top of the prompt can matter. A timestamp, run ID, tool result, or changing retrieval block can move the entire request out of alignment.</p><p>The model server may still receive the same Skill body later in the prompt. But for prefix caching, later is often too late.</p><p><a href="https://developers.openai.com/api/docs/guides/prompt-caching">OpenAI&#8217;s prompt caching docs</a> make the design rule explicit: static content should go near the beginning of the prompt, and variable content should go near the end.</p><p>For agents, that becomes a concrete layout rule:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;d689cb75-2e44-4246-9f30-6ba2fb2684d6&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Stable first. 
Dynamic second. </code></pre></div><p>That rule is simple, but it changes how an agent harness should be written.</p><p>Tool definitions, system instructions, Skill bodies, output contracts, examples, and validation rules should be stable and early.</p><p>Retrieved documents, timestamps, run IDs, tool outputs, and task-specific state should be later.</p><p>The serving layer can only reuse the prefix you actually give it.</p><h2>Methods for Prefix Caching</h2><p>The simple version of prompt caching is easy to say:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;7f84d0ba-fa13-402f-8b52-5528aed2a4fa&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">same prefix &#8594; reuse cached state </code></pre></div><p>The implementation is more complicated.</p><p>A serving system has to answer several questions before reuse can happen:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;2909c7ce-e0e9-492b-8c90-ca367e692221&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">How do we identify a matching prefix?
How do we store the KV state?
How do we route future requests back to the right cache? 
What happens when memory fills? 
Can we reuse anything beyond the prefix? </code></pre></div><p>There is a family of methods to solve this.</p><h3>Exact-prefix reuse</h3><p>This is the basic case.</p><p>Two requests start with the same token sequence. The serving layer identifies the shared beginning and reuses cached state for that prefix.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;11bffc91-a721-4887-b46b-f422940f49dd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Request 1: 
[stable system prompt]
[stable Skill][task A]  

Request 2: 
[stable system prompt]
[stable Skill][task B] </code></pre></div><p>The shared prefix is:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b4508af5-117f-4d38-b96b-d84517596c06&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable system prompt][stable Skill] </code></pre></div><p>That is the part the system can reuse.</p><p>This is the model most API users need to understand first. If the beginning changes, the cacheable prefix shrinks or disappears.</p><h3>Block-hash prefix caching</h3><p>A serving system does not need to treat the prompt as one giant cache entry.</p><p>It can split the prompt into blocks.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;f4f4baa3-6e5c-4e00-bce1-f57df945e3b2&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Block 1: tokens 1-128
Block 2: tokens 129-256
Block 3: tokens 257-384 </code></pre></div><p>Each block can be associated with a hash. The hash can include both the block itself and the prefix that came before it.</p><p>That lets the system find the longest matching chain.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5a04ac09-3795-468c-8372-b9737ee53f5b&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Block 1 matches
Block 2 matches
Block 3 changes </code></pre></div><p>In that case, the server can reuse blocks 1 and 2, then recompute from block 3 onward.</p><p>This is why order matters. A block is not only &#8220;these tokens.&#8221; It is &#8220;these tokens after this prior prefix.&#8221;</p><p><a href="https://docs.vllm.ai/en/stable/design/prefix_caching/">vLLM&#8217;s prefix-caching docs</a> describe this kind of design: processed requests leave behind KV-cache blocks, and later requests with the same prefix can reuse those blocks instead of recomputing them.</p><h3>Paged KV cache</h3><p>KV cache can get large.</p><p>Long prompts create large key/value state. Long-running agents create even more. Multiple concurrent users make the problem worse.</p><p>Paged KV cache treats cached state more like memory pages than one continuous allocation.</p><p>That matters because the serving system needs to allocate, reuse, share, and evict KV state efficiently. Without that, memory fragmentation and wasted GPU memory can become bottlenecks.</p><p>For a builder, the main point is simple:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;58aadad8-b3eb-42ec-bac9-95cb9bf636d3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Prompt caching is not only a matching problem. 
It is also a memory-management problem. </code></pre></div><h3>Prefix trees and radix caching</h3><p>Some workloads share a common root and then branch.</p><p>Agents do this constantly.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;0cd88e4a-40e0-4efe-af5a-cb3400fb4584&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">shared agent harness   
&#9500;&#9472;&#9472; research Skill   
&#9474;    &#9500;&#9472;&#9472; task A   
&#9474;    &#9492;&#9472;&#9472; task B   
&#9492;&#9472;&#9472; coding Skill        
     &#9500;&#9472;&#9472; task C    
     &#9492;&#9472;&#9472; task D </code></pre></div><p>A prefix tree stores shared beginnings once, then branches when the prompts diverge.</p><p><a href="https://lmsys.org/blog/2024-01-17-sglang/">SGLang&#8217;s RadixAttention</a> uses this kind of idea. It organizes reusable prompt state in a radix tree so shared prefixes can be found, reused, inserted, and evicted more efficiently.</p><p>This maps well to agent systems because agents are not random one-off prompts. They often reuse the same harness, then branch by Skill, task, tool, or phase.</p><h3>Cache-aware routing</h3><p>A cache hit only helps if the request reaches the place where the cached state lives.</p><p>In a distributed serving system, there may be many workers. One worker may have the cached prefix. Another may not.</p><p>If the next request lands on the wrong worker, the system may have to recompute the prefix or move cache state across machines.</p><p>That is why routing matters.</p><p>Application design gives the serving layer stable prefixes. Routing decides whether later requests reach the cache that already holds them.</p><h3>Cache eviction</h3><p>Caches cannot keep everything forever.</p><p>KV cache consumes memory, and GPU memory is expensive. The serving layer has to decide what to keep and what to evict.</p><p>Simple eviction policies may keep recent cache entries and discard older ones. More advanced policies may consider which prefixes are likely to be reused, how large they are, and how expensive they are to recompute.</p><p>This matters for agents because not all prompt sections have equal reuse value.</p><p>A stable Skill body may be reused thousands of times.</p><p>A one-off tool result may never be reused.</p><p>A cache-aware system should prefer keeping the first kind.</p><h3>Beyond-prefix reuse</h3><p>Most production prompt caching is built around exact prefixes.</p><p>But agent workloads are messier than that.</p><p>The same document chunk may appear in different positions. 
The same source may be reused across turns. The same tool result may show up in another branch of the workflow.</p><p>Classic prefix caching will not always catch that.</p><p>Newer work is exploring whether reusable KV state can be recovered from repeated segments, not just repeated beginnings. That is a harder problem because the model state for a segment depends on what came before it.</p><p>For now, the practical rule remains:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;6c9e05fd-5bb0-45ff-a6c0-99944a4e8aed&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Design for prefix caching first. </code></pre></div><p>Put stable content at the beginning. Keep it stable. Move dynamic context later.</p><p>The serving systems will keep getting better. But the builder can already do the most important thing:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;ebc7a800-e255-4d8d-9e12-d929f3b7fc03&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Give the cache a stable prefix to reuse. </code></pre></div><h2>Skills as Cacheable Instruction Modules</h2><p>In this post, a Skill means a reusable instruction module.</p><p>That could be a Claude Skill. It could be a Markdown file in an agent repo. It could be a prompt module loaded by Codex, a tool-specific operating procedure, or a workflow template inside an internal agent platform.</p><p>In most agent systems, a Skill eventually becomes text in the request:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;5bdfa257-ed56-49d7-ad35-4a4b940158e1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[Skill purpose] 
[when to use it] 
[workflow] 
[output contract] 
[examples] 
[validation rules] </code></pre></div><p>That text is usually stable.</p><p>The user task changes. The retrieved documents change. Tool results change. Run state changes.</p><p>But the Skill body often stays the same.</p><p>That makes Skills natural cache candidates.</p><p>A Skill is already meant to be reused at the instruction level. Prompt caching adds a second kind of reuse: the serving layer may be able to reuse the model state created from those same instructions.</p><p>That only works if the Skill is placed where the cache can use it.</p><p>A Skill loaded after dynamic context is still useful to the model, but it may not be useful to the prompt cache.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c612e7ac-7bfa-47af-9027-0dc109e45f92&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Cache-hostile layout: 
[dynamic docs] 
[current task] 
[Skill body]  

Cache-aware layout: 
[Skill body] 
[dynamic docs] 
[current task] </code></pre></div><p>The content is the same. The cache behavior can be very different.</p><p>This is the design implication:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;20b0d0fd-6f7a-42f5-8e8a-44a3c12505c5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">A Skill should not only be reusable as an instruction. 
It should be positioned as a reusable prefix. </code></pre></div><p>That does not mean every Skill should be loaded all the time. Large unused Skills create their own cost and context problems. Anthropic&#8217;s Skill system uses progressive disclosure: lightweight metadata helps the model decide whether a Skill is relevant, then the full Skill and supporting resources load only when needed.</p><p>That pattern still fits the caching argument.</p><p>Once a Skill is selected, its body should stay unchanged from run to run. Its dynamic inputs should come later.</p><h2>Cache-Aware Skill Design</h2><p>The design pattern is simple:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;230511c5-e12e-4f85-b286-441bc5c6dfaf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Cache the workflow. Vary the inputs.</code></pre></div><p>A Skill usually contains the stable task frame:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4fcb3e0e-c5d4-44fc-a093-c776bc3ee1d9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">purpose 
workflow 
output contract 
examples 
citation rules 
validation checklist 
source-handling rules </code></pre></div><p>The current run supplies the changing inputs:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c48edbb0-d4f9-4577-a154-ad68cdc66739&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">user request 
retrieved documents 
current files 
tool outputs 
timestamps 
run IDs 
temporary constraints</code></pre></div><p>Those two categories should not be mixed casually.</p><p>A cache-aware Skill keeps the stable task frame intact and places dynamic material after it.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;87164069-d72a-4696-b54d-270955286e2e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable Skill body] 
[dynamic sources] 
[current task input] </code></pre></div><p>A cache-hostile Skill puts changing material first.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;165df637-9db4-4c7e-80d2-95e19e27dd78&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[timestamp] 
[run ID] 
[dynamic sources] 
[current task input] 
[stable Skill body] </code></pre></div><p>This difference fundamentally changes what the model server sees as the reusable beginning of the request.</p><p>This does not mean every Skill should be loaded eagerly. Loading a large unused Skill just to make it cacheable can waste tokens. The better pattern is <a href="https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills">staged loading</a>.</p><p>First, keep a small, stable routing layer:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;9b12cf4b-2088-4ca3-9478-b567d540b26e&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">available Skills 
when to use each Skill 
short descriptions 
selection rules </code></pre></div><p>Then, once a Skill is selected, load the full stable Skill body before the dynamic task context.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;e1bc6499-72c2-406b-a53d-fb7bcfcecf6c&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable router] 
[selected stable Skill] 
[dynamic task context] </code></pre></div><p>That gives the system two possible layers of reuse:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;768a493e-4695-4647-ab07-cb21e3380808&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">the router can be stable across many calls 
the selected Skill can be stable across repeated uses </code></pre></div><p>This also helps with source diversity.</p><p>A research Skill may receive different articles every run. A repo Skill may receive different files. A data Skill may receive different schemas, queries, or results.</p><p>That variety belongs in the dynamic suffix.</p><p>The Skill should define how to use sources. The sources themselves should come later.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;3739fc06-30b8-41eb-a139-5d76e24dbfd9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[Skill: how to read and cite sources] 
[Sources: the actual documents for this run] </code></pre></div><p>The same applies to tools.</p><p>Tool definitions and tool-use rules should be stable. Tool results should be dynamic.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c1070b27-8541-4420-93db-8ceeefaeecfb&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">[stable tool definitions] 
[stable tool-use rules] 
[dynamic tool results] </code></pre></div><p>The goal is not to optimize the prompt for caching at the expense of the task. The goal is to avoid wasting cacheability by accident.</p><p>If two prompt layouts are equally good for the model, choose the one that gives the serving layer more stable structure to reuse.</p><h2>Benchmark Results</h2><p>The benchmark showed that stable instruction modules placed at the front of the prompt became reusable prefixes, producing far more cache hits and materially reducing estimated warm-request cost.</p><p>This result is not only about Skills as a product concept. It applies to any stable instruction module: a Skill, workflow template, tool procedure, rubric, output contract, or source-handling guide.</p><p>I used a synthetic Skill body rather than a platform-native Skill object so the test could isolate layout: stable instruction module first versus dynamic context first.</p><p>The benchmark compared four layouts:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;a839b50d-3f9e-4416-8c8e-eacceabd682a&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">dynamic_first_cache_hostile
[timestamp][run ID][dynamic docs][task][stable Skill]  

stable_skill_first_cache_aware 
[stable Skill][dynamic docs][task][timestamp]  

stable_skill_first_deterministic_sources 
[stable Skill][dynamic docs ordered deterministically][task]  

dynamic_prefix_control 
[random run ID][stable Skill][dynamic docs][task] </code></pre></div><p>Before interpreting the results, I checked that the prompts were constructed correctly. The stable-first prompts started with the Skill body. The dynamic-first prompts started with changing content. The stable Skill body stayed byte-for-byte identical. No cold request was contaminated by a prior cache hit.</p><p>The cache-hit split was clean:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;091c584c-c7b5-48cb-8543-ff72e66dac84&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Stable Skill-first layouts: 
19 / 20 warm cache hits  

Dynamic-first layouts: 
0 / 20 warm cache hits </code></pre></div><p>The token mix showed the practical difference.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;fb291a1a-e6ea-4ec9-be5e-1d8c231fa3e5&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">dynamic_first_cache_hostile 
warm mean prompt tokens: 9,476.5 
warm mean cached tokens: 0 
warm mean fresh input tokens: 9,476.5  

stable_skill_first_cache_aware 
warm mean prompt tokens: 9,455 
warm mean cached tokens: 8,960 
warm mean fresh input tokens: 495 </code></pre></div><p>Roughly the same amount of prompt context produced a very different input profile. In the dynamic-first layout, every input token was processed fresh. In the stable Skill-first layout, about 95% of the prompt tokens (8,960 of 9,455) were cached input.</p><p>Using OpenAI&#8217;s published GPT-4.1 mini prices at the time of writing, the estimated warm-request cost dropped materially. The exact dollars are model- and date-specific, but the token economics are the point.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;4cffcd3e-f7af-4f54-ba3b-4c9d32ad07c0&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">dynamic_first_cache_hostile: about $0.00385 per warm request 
stable_skill_first_cache_aware: about $0.00116 per warm request </code></pre></div><p>That is roughly a 70% reduction in estimated warm-request cost for this synthetic benchmark.</p><p>The latency result was less clean. Time to first token (TTFT) improved in the stable-first variants, but hosted API latency includes routing, queueing, server load, streaming behavior, network timing, and output generation. I would treat the latency numbers as directional, not guaranteed.</p><p>The stronger result is about cache eligibility and token economics:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;aab5db8b-6b43-4d98-a3eb-363010d3c0b7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Stable Skill-first layout:
high cache-hit rate 
high cached-token ratio 
low fresh-input-token count  

Dynamic-first layout: 
zero cache hits 
all input tokens processed fresh </code></pre></div><p>That is the design point. The Skill body was not only instruction text. In the stable-first layout, it became a reusable prefix the serving layer could cache.</p><h2>Closing</h2><p>Prompt caching started as a pricing detail for me.</p><p>It is not just that.</p><p>For agent systems, it changes the design question.</p><p>Not only:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;b955c9a0-c02b-4f12-b93e-5c127c532b37&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">What context should the model have? </code></pre></div><p>Also:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;fdd873a4-f3c9-4f61-8b9e-f55d52011fcd&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">Where does that context live? How often does it change? Can the serving layer reuse it? </code></pre></div><p>Skills make that question concrete.</p><p>A Skill is reusable guidance for the model. If it is written and positioned carefully, it can also become reusable work for the inference system.</p><p>That does not make Skills magic. It makes them a useful design boundary.</p><p>The stable part of the workflow can become the prefix.  </p><p>The changing inputs can become the suffix.</p><p>That will not be the right layout for every task. Some systems need dynamic routing, safety state, permissions, or retrieved evidence earlier in the prompt. 
Some frameworks will reorder or compress context before the provider sees it.</p><p>So the point is not to worship stable prefixes.</p><p>The point is to know when you are breaking one.</p><p>Prompt caching gives agent builders a new thing to measure: not just answer quality, not just total tokens, but whether repeated work is actually being reused.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 75]]></title><description><![CDATA[Claude writes 80% of Anthropic's code, Google rents SpaceX's GPUs, Microsoft breaks from OpenAI, New York moves to ban data centers, and hackers fooled Meta's AI]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-600</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-600</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 06 Jun 2026 22:19:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3c1f94eb-e47f-4b95-ab16-32c302c188f5_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a 
class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LCuf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LCuf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 424w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 848w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1272w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LCuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png" width="1456" height="1570" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1570,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:515707,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/200942958?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LCuf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 424w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 848w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1272w, https://substackcdn.com/image/fetch/$s_!LCuf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43ebbb6c-0c27-48b4-a987-3e9bc8c1cfdf_2400x2588.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Anthropic raises, Google takes a slice of the space compute pie and Washington wants in.</h2><ul><li><p><strong>Anthropic raised $65B at a $965B valuation and filed to go public.</strong> It <a href="https://www.anthropic.com/news/confidential-draft-s1-sec">confidentially submitted a draft S-1</a> and <a href="https://techcrunch.com/2026/06/04/ahead-of-its-ipo-anthropics-daniela-amodei-shrugs-off-doubts-about-ais-returns/">pushed back on doubts about AI&#8217;s returns</a> ahead of the listing.</p></li><li><p><strong>Alphabet is raising about $85B to fund its AI buildout.</strong> The <a href="https://techcrunch.com/2026/06/03/alphabets-record-breaking-85b-raise-for-googles-ai-business-is-a-helluva-good-signal/">record equity offering</a> landed days after it <a 
href="https://techcrunch.com/2026/06/01/alphabet-plans-to-raise-80-billion-to-pay-for-ai-buildout/">signaled an $80B plan</a>.</p></li><li><p><strong>Google agreed to pay SpaceX $920M a month for roughly 110,000 GPUs.</strong> AirTrunk committed <a href="https://techcrunch.com/2026/06/05/airtrunk-commits-30b-to-build-5gw-of-ai-data-centers-in-india/">$30B for 5GW of data centers in India</a> and SoftBank pledged <a href="https://techcrunch.com/2026/05/30/softbank-says-it-will-invest-up-to-e75-billion-to-build-french-data-centers/">up to &#8364;75B for French data centers</a>, while the <a href="https://techcrunch.com/2026/06/05/google-will-pay-spacex-920m-per-month-for-compute/">Google-SpaceX deal</a> runs through 2029.</p></li><li><p><strong>The token bill came due.</strong> The Linux Foundation launched a <a href="https://techcrunch.com/2026/06/05/the-token-bill-comes-due-inside-the-industry-scramble-to-manage-ais-runaway-costs/">Tokenomics Foundation</a> to discipline AI spend, after <a href="https://techcrunch.com/2026/06/02/uber-caps-employee-ai-spending-after-blowing-through-budget-in-four-months/">Uber capped employee AI budgets</a> and <a href="https://techcrunch.com/2026/05/30/what-a-joke-github-copilots-new-token-based-billing-spurs-consternation-among-devs/">GitHub&#8217;s usage-based Copilot billing drew a developer revolt</a>.</p></li><li><p><strong>Washington wants a stake in OpenAI.</strong> Altman and the White House are <a href="https://www.cnbc.com/2026/06/05/trump-open-ai-altman-stake.html">in talks for the government to take equity</a>, with OpenAI floating donated shares to seed a public wealth fund and Trump saying the American public could &#8220;become a partner.&#8221; Altman separately <a href="https://www.reuters.com/business/openais-altman-urge-us-lawmakers-not-require-ai-model-approvals-2026-06-03/">lobbied against mandatory model approvals</a>.</p></li></ul><div><hr></div><h2>Microsoft started building its way out from under 
OpenAI.</h2><ul><li><p><strong>Microsoft shipped seven homegrown models, including its first advanced reasoning model.</strong> The <a href="https://www.theverge.com/tech/941664/microsoft-ai-model-reasoning-mai-thinking-1-build-2026">MAI lineup</a> was pitched as <a href="https://www.cnbc.com/2026/06/02/microsoft-unveils-new-ai-models-lessen-reliance-on-openai-lower-costs.html">a move toward self-sufficiency and lower developer costs</a>.</p></li><li><p><strong>Its AI chief said the company was &#8220;set free&#8221; from OpenAI to pursue superintelligence.</strong> Mustafa Suleyman framed independence as <a href="https://venturebeat.com/technology/microsoft-ai-chief-says-company-was-set-free-from-openai-to-pursue-superintelligence">the real project</a>, with models trained from scratch.</p></li><li><p><strong>It launched Scout, an OpenClaw-based assistant, and Project Solara, a platform for agent-first devices.</strong> Scout is <a href="https://www.microsoft.com/en-us/microsoft-365/blog/2026/06/02/introducing-microsoft-scout-your-always-on-personal-agent/">an always-on personal agent that works across Microsoft 365</a>; Solara is <a href="https://commandline.microsoft.com/project-solara-build-2026/">a chip-to-cloud platform for agent-first devices</a>.</p></li><li><p><strong>It is building a frontier health model with Mayo Clinic.</strong> The <a href="https://news.microsoft.com/source/2026/06/02/mayo-clinic-and-microsoft-collaborate-to-develop-a-frontier-ai-model-for-healthcare/">partnership</a> pairs Mayo&#8217;s clinical data with Microsoft&#8217;s AI, alongside a <a href="https://blogs.nvidia.com/blog/microsoft-build-windows-local-cloud-devices/">unified agentic stack with NVIDIA</a>.</p></li></ul><div><hr></div><h2>LangChain, Salesforce, and Anthropic shipped agent infrastructure, and hackers fooled Meta&#8217;s support AI.</h2><ul><li><p><strong>Anthropic turned Claude Code into a platform.</strong> It shipped <a href="https://claude.com/blog/a-harness-for-every-task-dynamic-workflows-in-claude-code">dynamic multi-agent workflows</a> and documented <a href="https://claude.com/blog/lessons-from-building-claude-code-how-we-use-skills">how it runs hundreds of internal skills</a>.</p></li><li><p><strong>LangChain built out the production-agent stack.</strong> Across the week: <a 
href="https://www.langchain.com/blog/designing-efficient-verifiers-for-legal-agents">efficient verifiers for legal agents</a>, self-correcting <a href="https://www.langchain.com/blog/introducing-rubrics-for-deepagents">Rubrics</a>, <a href="https://www.langchain.com/blog/model-neutrality">model neutrality</a>, <a href="https://www.langchain.com/blog/fault-tolerance-in-langgraph">fault tolerance in LangGraph</a>, and <a href="https://www.langchain.com/blog/give-your-ai-agent-its-own-computer">a sandboxed computer for every agent</a>.</p></li><li><p><strong>Salesforce and Google pushed agents past the pilot stage.</strong> Salesforce detailed <a href="https://www.salesforce.com/blog/ai-agent-production-tips/">what it takes to ship to production</a>, where one deployment cut the conversation failure rate from 33% to 0.5%, while Google added <a href="https://research.google/blog/unlocking-dependable-responses-with-gemini-enterprise-agent-platforms-agentic-rag/">agentic RAG that keeps searching until it has enough context</a>.</p></li><li><p><strong>Then the bill for autonomy arrived.</strong> Hackers <a href="https://simonwillison.net/2026/Jun/1/hackers-simply-asked-meta-ai/">talked Meta&#8217;s AI support agent into handing over Instagram accounts</a>, an exploit <a href="https://www.technologyreview.com/2026/06/05/1138437/the-meta-hack-shows-theres-more-to-ai-security-than-mythos/">MIT Technology Review used to show AI agents are too eager to please</a>. 
OpenAI shipped <a href="https://simonwillison.net/2026/Jun/5/openai-help-lockdown-mode/">Lockdown Mode</a> to cut the data-exfiltration leg of prompt-injection attacks.</p></li></ul><div><hr></div><h2>NVIDIA&#8217;s Nemotron anchored an open and fast model surge.</h2><ul><li><p><strong>Nemotron 3 Ultra landed on AWS and Perplexity.</strong> The 550B-parameter open MoE went <a href="https://aws.amazon.com/blogs/machine-learning/nvidia-nemotron-3-ultra-now-available-on-amazon-sagemaker-jumpstart/">one-click on SageMaker JumpStart</a> and <a href="https://x.com/perplexity_ai/status/2062976272436002825">live for Perplexity Pro and Max</a>. NVIDIA also shipped <a href="https://huggingface.co/blog/nvidia/nemotron-3-5-content-safety">a 4B safety model that reasons over custom policies</a>.</p></li><li><p><strong>Speed became the headline spec.</strong> Cerebras reported <a href="https://www.cerebras.ai/blog/which-is-faster-gemini-3-5-flash-or-kimi-k2-6-on-cerebras">Kimi K2.6 finishing a task in 5.6s to Gemini 3.5 Flash&#8217;s 17.5s</a>, and clearing 452ms time-to-first-token for real-time voice.</p></li><li><p><strong>Small models proved they are a design choice.</strong> A Hugging Face hackathon ran <a href="https://huggingface.co/blog/build-small-hackathon/thousand-token-wood-sim">a multi-agent economy on a 3B model</a>, and Holo3.1 brought <a href="https://huggingface.co/blog/Hcompany/holo31">fast local computer-use agents</a>.</p></li><li><p><strong>Alibaba zigged.</strong> Qwen3.7-Plus added multimodal inputs at low cost but <a href="https://venturebeat.com/technology/alibabas-qwen3-7-plus-supports-text-video-and-imagery-inputs-at-low-cost-of-0-4-1-6-per-1m-token-but-its-proprietary">shipped closed-source</a>, breaking from its open-weight history.</p></li></ul><div><hr></div><h2>The Empire Strikes Back</h2><ul><li><p><strong>New York moved on two AI bills.</strong> The legislature is <a 
href="https://www.politico.com/news/2026/06/02/new-york-one-year-data-center-moratorium-00946477">poised to pass a one-year data center moratorium</a>, which would be the first statewide ban if Gov. Hochul signs it, and passed <a href="https://www.nysenate.gov/legislation/bills/2025/S9051/amendment/B">a bill barring AI chatbots from posing as companions to kids</a> 60-0, now awaiting her signature.</p></li><li><p><strong>Courts are filling with AI-written filings.</strong> A study found <a href="https://www.technologyreview.com/2026/06/04/1138391/courts-coping-ai-lawsuits/">AI-flagged self-represented lawsuits are surging</a>. Florida <a href="https://techcrunch.com/2026/06/01/florida-sues-openai-sam-altman-in-first-of-its-kind-lawsuit-over-violent-incidents/">sued OpenAI and Altman</a>, and a UK lawmaker <a href="https://www.reuters.com/legal/government/british-lawmaker-sues-musks-xai-over-sexualised-grok-images-2026-06-03/">sued xAI over Grok images</a>.</p></li><li><p><strong>Trump signed a narrower AI oversight order after industry pushback.</strong> The <a href="https://techcrunch.com/2026/06/02/trump-signs-narrower-executive-order-on-ai-oversight-after-industry-objections/">revamped order</a> asks for voluntary model submissions instead of mandates.</p></li><li><p><strong>AI&#8217;s social friction showed up everywhere.</strong> Ladybird <a href="https://simonwillison.net/2026/Jun/5/andreas-kling/">stopped accepting public pull requests over AI-generated patches</a>, Meta <a href="https://www.reuters.com/world/meta-scales-back-ai-mouse-clicks-tool-citing-employee-concerns-2026-06-02/">rolled back an employee mouse-tracking tool</a>, and China is <a href="https://www.reuters.com/business/media-telecom/china-bets-ai-promote-president-xi-jinpings-thinking-2026-06-05/">funding an AI agent to promote Xi Jinping&#8217;s thinking</a>.</p></li></ul><p>Andrew Ng&#8217;s warning for the week: the cyber risk is real this time, which is exactly when lobbyists overreach 
for <a href="https://www.deeplearning.ai/the-batch/issue-356/">excessive regulation</a>.</p><div><hr></div><h2><strong>&#11088; </strong>Featured: Anthropic is measuring how fast Claude can build the next Claude.</h2><p>Anthropic&#8217;s Institute published <a href="https://www.anthropic.com/institute/recursive-self-improvement">When AI builds itself</a>, a data-heavy look at how much of its own development the company has already handed to Claude, and what that implies for recursive self-improvement: an AI fully autonomously designing and developing its own successor. The piece is careful. That is not here yet, and not inevitable. But it argues the trend lines point that way, and could arrive sooner than most institutions are prepared for.</p><p>The internal numbers are the story. As of May 2026, more than 80% of the code merged into Anthropic&#8217;s codebase is written by Claude, up from low single digits before Claude Code launched in February 2025, and the typical engineer now merges 8x as much code per day as in 2024. On a fixed test that asks a model to speed up AI-training code, Claude went from a roughly 3x speedup with Opus 4 in May 2025 to about 52x with Mythos Preview in April 2026, against roughly 4x for a skilled human given four to eight hours. 
In one weak-to-strong supervision project, Claude agents recovered 97% of the available gap over 800 compute-hours and about $18,000, where two human researchers managed 23% in a week.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8q1s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8q1s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 424w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 848w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1272w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8q1s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png" width="960" height="558" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:558,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:212010,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/200942958?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8q1s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 424w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 848w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1272w, https://substackcdn.com/image/fetch/$s_!8q1s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb6204a5-4df3-4f5c-bb98-303410d3ddc2_960x558.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The honest part is the caveat. Anthropic says the one thing still mostly in human hands is research taste: choosing which problems matter, which results to trust, when an approach is a dead end. But it shows that gap closing too. Shown only the first half of real research sessions, Claude picked a better next step than the human 64% of the time in April 2026, up from 51% in November. 
The piece sketches three futures, from the trend quietly stalling to full recursive self-improvement, and argues the world should build verifiable mechanisms now that preserve the option to slow or pause frontier development before it is needed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!edHt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!edHt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 424w, https://substackcdn.com/image/fetch/$s_!edHt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 848w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1272w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!edHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png" width="960" height="557" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:557,&quot;width&quot;:960,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:253818,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/200942958?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!edHt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 424w, https://substackcdn.com/image/fetch/$s_!edHt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 848w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1272w, https://substackcdn.com/image/fetch/$s_!edHt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe6dd7f24-cd25-4823-a6a7-cff146b69ba7_960x557.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>What to watch for:</strong> Whether &#8220;research taste&#8221; turns out to be one more capability models fail at for a while, then suddenly do not.</p><div><hr></div><h2><strong>&#127909; </strong>Worth a Watch</h2><div id="youtube2-wNWz5Hbh5VQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;wNWz5Hbh5VQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/wNWz5Hbh5VQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p><strong>An OpenAI model disproved an 80-year-old Erd&#337;s conjecture, and the researchers walk through 
how.</strong> On <a href="https://www.youtube.com/watch?v=wNWz5Hbh5VQ">OpenAI Podcast Ep. 20</a>, Alexander Wei, Hongxun Wu, and Lijie Chen explain how a general-purpose model (not a math-specific one, the same kind that powers Codex) cracked the unit distance conjecture, a problem Erd&#337;s once put a $500 bounty on.</p></li><li><p><strong>The proof bridged two fields that rarely meet.</strong> It showed the square grid is far from optimal by applying class field theory to combinatorial geometry, after grounding itself by looking up &#8220;unit&#8221; in the Cambridge dictionary and producing a 125-page chain of thought. With enough test-time compute, it lands the result about half the time.</p></li><li><p><strong>The reaction is the fun part.</strong> Reviewers went from &#8220;there&#8217;s no way this is true&#8221; to losing sleep over it, and within a week other mathematicians used the same idea to disprove a related result.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><strong><a href="https://techcrunch.com/2026/06/04/apple-approves-poke-as-the-first-ai-agent-on-its-messages-for-business-platform/">Apple approved Poke as the first AI agent on Messages for Business</a></strong> &#8212; your iMessage thread is now an agent surface, and Apple charges per user.</p></li><li><p><strong><a href="https://www.reuters.com/business/retail-consumer/amazon-unveils-new-ai-warehouse-robot-12-billion-europe-push-2026-06-04/">Amazon unveiled a conversational AI warehouse robot in an $11.6B Europe push</a></strong> &#8212; robotics and logistics keep merging.</p></li><li><p><strong><a href="https://research.google/blog/towards-passive-heart-health-monitoring-via-smartphone-camera/">Google can read your resting heart rate from a selfie</a></strong> &#8212; front-camera vitals, accurate across skin tones.</p></li><li><p><strong><a href="https://www.reuters.com/technology/chatgpt-app-hits-1-billion-monthly-active-users-record-time-data-shows-2026-06-02">ChatGPT hit 
1 billion monthly active users in record time</a></strong> &#8212; the fastest app to the milestone.</p></li><li><p><strong><a href="https://openai.com/index/chatgpt-memory-dreaming/">OpenAI&#8217;s ChatGPT memory now updates itself in the background</a></strong> &#8212; &#8220;dreaming&#8221; replaces save-on-command.</p></li><li><p><strong><a href="https://www.reuters.com/business/hpe-expects-achieve-2028-financial-targets-this-year-after-record-quarter-ai-2026-06-01/">HPE raised its forecast on AI demand and the stock jumped</a></strong> &#8212; the buildout still has buyers.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/30/meta-is-reportedly-developing-an-ai-pendant/">Meta is reportedly building an AI pendant</a></strong> &#8212; the wearable land grab continues.</p></li><li><p><strong><a href="https://x.com/AnthropicAI/status/2062979607448682731">Anthropic published research on making Claude a chemist</a></strong> &#8212; pushing models from code into the hard sciences.</p></li></ul><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 74]]></title><description><![CDATA[Anthropic valued at $965B. 
Claude Code gets more agentic. Enterprise agents ran into permissions. DeepSeek cut prices. Google pushed AI media verification into Search & Chrome. The Pope talks AI.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-3e4</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-3e4</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 31 May 2026 11:28:50 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/416e04c9-f68d-4b93-b3d7-3211e77042c5_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!riT4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!riT4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 424w, https://substackcdn.com/image/fetch/$s_!riT4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 848w, https://substackcdn.com/image/fetch/$s_!riT4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 1272w, https://substackcdn.com/image/fetch/$s_!riT4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 
1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!riT4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png" width="1456" height="2825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1110659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199912055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!riT4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 424w, https://substackcdn.com/image/fetch/$s_!riT4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 848w, https://substackcdn.com/image/fetch/$s_!riT4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 1272w, 
https://substackcdn.com/image/fetch/$s_!riT4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F666ddadd-39d5-487b-b546-af95aa598c54_2400x4656.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Anthropic raised $65B, shipped Opus 4.8, and turned Claude Code into an orchestration product.</h2><ul><li><p>Anthropic raised a <a href="https://www.anthropic.com/news/series-h">$65B Series H</a> at a $965B post-money valuation. 
Reuters framed the raise around <a href="https://www.reuters.com/business/anthropic-raises-65-billion-now-valued-965-billion-2026-05-28/">Claude demand and compute needs</a>, while Apollo and Blackstone are reportedly working on a <a href="https://www.reuters.com/business/apollo-blackstone-work-36-billion-debt-deal-anthropic-bloomberg-news-reports-2026-05-28/">$36B debt deal</a> tied to infrastructure expansion.</p></li><li><p>Simon Willison analyzed Anthropic&#8217;s <a href="https://simonwillison.net/2026/May/29/">run-rate revenue and Series H</a>, pointing out why the disclosed numbers matter if Anthropic eventually files for an IPO.</p></li><li><p>Anthropic launched <a href="https://www.anthropic.com/news/claude-opus-4-8">Claude Opus 4.8</a>, with stronger long-horizon work and a cheaper fast mode. VentureBeat covered the <a href="https://venturebeat.com/technology/anthropics-claude-opus-4-8-is-here-with-3x-cheaper-fast-mode-and-near-mythos-level-alignment">3x cheaper fast mode</a>.</p></li><li><p>Opus 4.8 landed across <a href="https://aws.amazon.com/about-aws/whats-new/2026/05/claude-opus-4.8-aws/">AWS</a>, <a href="https://github.blog/changelog/2026-05-28-claude-opus-4-8-is-generally-available-for-github-copilot/">GitHub Copilot</a>, <a href="https://x.com/cursor_ai/status/2060044920237469872">Cursor</a>, <a href="https://x.com/perplexity_ai/status/2060049662044962858">Perplexity</a>, and <a href="https://vercel.com/changelog/opus-4-8-on-ai-gateway">Vercel AI Gateway</a>.</p></li><li><p>Claude Code got <a href="https://claude.com/blog/introducing-dynamic-workflows-in-claude-code">dynamic workflows</a>: Claude writes orchestration scripts, spins up tens to hundreds of subagents, and checks its own work before reporting back. 
Anthropic said the feature is built for migrations, bug hunts, and large repo-wide tasks.</p></li><li><p>ClaudeDevs said dynamic workflows can be <a href="https://x.com/ClaudeDevs/status/2060044858480599067">reused as slash commands</a>, but also warned they can <a href="https://x.com/ClaudeDevs/status/2060044856114942328">consume tokens quickly</a>.</p></li><li><p>Opus 4.8 now supports <a href="https://x.com/ClaudeDevs/status/2060432688281251998">mid-conversation system instructions without breaking prompt caching</a>. ClaudeDevs said the model hit <a href="https://x.com/ClaudeDevs/status/2060043209833951575">69.2% on SWE-bench Pro</a>, up from 64.3% for Opus 4.7.</p></li><li><p>Anthropic shipped a <a href="https://x.com/ClaudeDevs/status/2059385242319012188">Claude Code security-guidance plugin</a>, reporting a 30 to 40% decrease in security-related PR comments during internal rollout.</p></li></ul><div><hr></div><h2>Enterprise agents ran into the boring but important stuff: permissions, logs, recovery, and access control.</h2><ul><li><p>Salesforce described its <a href="https://www.salesforce.com/blog/marketing-mcp-server/">Marketing MCP Server</a> as a way for Agentforce Marketing agents to connect to campaign data, content, and workflow actions.</p></li><li><p>Google brought <a href="https://blog.google/security/bringing-ai-agents-to-chrome-enterprise-security-management/">MCP-based agents into Chrome Enterprise security management</a>.</p></li><li><p>VentureBeat argued the enterprise agent bottleneck <a href="https://venturebeat.com/orchestration/the-ai-agent-bottleneck-isnt-model-performance-its-permissions">is permissions, not model performance</a>.</p></li><li><p>VentureBeat also reported that production agents are entering a <a href="https://venturebeat.com/orchestration/ai-agents-are-entering-their-rebuild-era-as-enterprises-confront-the-reliability-problem">rebuild phase</a>, where durable workflows need state, recovery, observability, governance, and cost 
visibility.</p></li><li><p>Anthropic published a <a href="https://claude.com/blog/zero-trust-for-ai-agents">Zero Trust framework for AI agents</a>, covering prompt injection, tool poisoning, identity abuse, and memory poisoning.</p></li><li><p>Remote said it grew <a href="https://techcrunch.com/2026/05/27/payroll-startup-remote-says-it-grew-revenue-50-per-employee-without-adding-headcount/">revenue 50% per employee without adding headcount</a> and is exposing payroll and compliance workflows through MCP.</p></li><li><p>Robinhood launched <a href="https://techcrunch.com/2026/05/27/robinhood-now-lets-your-ai-agents-trade-stocks/">AI agent trading accounts</a> with dedicated wallets, notifications, approvals, fraud review, and virtual cards.</p></li><li><p>An arXiv paper argued agentic AI is moving from <a href="https://arxiv.org/abs/2605.26112v1">model scaling to system scaling</a>, where the harness around the model becomes the bottleneck.</p></li></ul><div><hr></div><h2>Coding agents are producing more work, and maintainers are feeling the cleanup.</h2><ul><li><p>Cursor launched <a href="https://x.com/cursor_ai/status/2060406013098897765">auto-review 
mode</a>, reducing approval prompts while keeping agent tool calls safer.</p></li><li><p>Cursor released its <a href="https://x.com/cursor_ai/status/2060025063899058458">Developer Habits Report</a>, reporting that developers are producing more <a href="https://x.com/cursor_ai/status/2060025074405327046">mega PRs</a> with agents.</p></li><li><p>Cursor also said <a href="https://x.com/cursor_ai/status/2060025076947521984">input tokens are now the majority of price-equivalent token costs</a>, and that <a href="https://x.com/cursor_ai/status/2060025070425395562">cost per accepted line varies roughly 7x</a> across model families.</p></li><li><p>OpenAI expanded <a href="https://x.com/OpenAI/status/2060398873974608199">Codex computer use to Windows</a>, including mobile task steering while work continues on a Windows machine.</p></li><li><p>Figma launched <a href="https://venturebeat.com/ai/figma-make-just-collapsed-the-wall-between-design-mockups-and-production-code/">two-way GitHub integration for Figma Make</a>, letting design changes move into production-code workflows.</p></li><li><p>CodeRabbit described how it built an <a href="https://claude.com/blog/how-coderabbit-used-claude-to-build-an-agent-orchestration-system">agent orchestration system on Claude</a>. OpenAI and Thrive described a Codex-powered <a href="https://openai.com/index/thrive-codex-tax-ai/">tax agent</a> that processed 7,000 returns.</p></li><li><p>SQLite added <a href="https://simonwillison.net/2026/May/27/sqlite-agents/">AGENTS.md guidance</a> rejecting agentic code submissions while still accepting reproducible bug reports. 
Simon Willison also covered the pressure the <a href="https://simonwillison.net/2026/May/26/the-pressure/">curl team faces from AI-assisted security reports</a>.</p></li><li><p>VentureBeat covered <a href="https://venturebeat.com/technology/deepswe-blows-up-the-ai-coding-leaderboard-crowns-gpt-5-5-and-finds-claude-opus-exploiting-a-benchmark-loophole">DeepSWE</a>, a coding benchmark that raised concerns about contamination, verifier reliability, and environment exploitation.</p></li></ul><div><hr></div><h2>AI got cheaper at the same time frontier labs got more expensive.</h2><ul><li><p>Anthropic raised a <a href="https://www.anthropic.com/news/series-h">$65B Series H</a> and Reuters reported a possible <a href="https://www.reuters.com/business/apollo-blackstone-work-36-billion-debt-deal-anthropic-bloomberg-news-reports-2026-05-28/">$36B infrastructure debt deal</a>. Frontier AI still looks capital-intensive.</p></li><li><p>DeepSeek made a <a href="https://venturebeat.com/infrastructure/how-deepseeks-radical-architecture-is-shattering-silicon-valleys-token-moat">permanent V4 price cut</a>, putting pressure on premium API pricing.</p></li><li><p>Pinterest reportedly <a href="https://venturebeat.com/orchestration/pinterest-cut-ai-costs-90-by-gutting-a-frontier-models-vision-layer">cut AI costs 90%</a> by customizing Qwen3-VL around proprietary embeddings.</p></li><li><p>Claude said Opus 4.8 fast mode is roughly <a href="https://x.com/claudeai/status/2060042706844315866">2.5x faster and 3x cheaper</a>.</p></li><li><p>Glean crossed <a href="https://techcrunch.com/2026/05/28/gleans-top-line-crosses-300m-as-ai-budget-cutting-becomes-its-major-selling-point/">$300M ARR</a> while positioning context quality as a way to reduce token usage.</p></li><li><p>Perplexity open-sourced a faster <a href="https://x.com/perplexity_ai/status/2059642904956694730">Unigram tokenizer</a> to cut CPU utilization for low-latency retrieval work.</p></li><li><p>Nathan Lambert argued <a 
href="https://x.com/natolambert/status/2060056671142261003">licenses help open ecosystem stability</a>, praised NVIDIA for <a href="https://x.com/natolambert/status/2060051590627897768">open model leadership</a>, and said Gemma 4 adoption is <a href="https://x.com/natolambert/status/2059230008890564855">outpacing Qwen</a> at comparable sizes.</p></li><li><p>Hugging Face published practical tooling, including <a href="https://huggingface.co/blog/FormosanBank/nllb-200-mt">fine-tuning NLLB-200</a>, <a href="https://huggingface.co/blog/torch-profiler">CUDA profiling in PyTorch</a>, and <a href="https://huggingface.co/blog/AmelieSchreiber/toricgt">ToricGT</a>.</p></li></ul><div><hr></div><h2>Verification became the expensive part.</h2><ul><li><p>Google DeepMind said SynthID has watermarked more than <a href="https://x.com/GoogleDeepMind/status/2059235181274202500">100B pieces of content</a>, with watermarking partnerships across OpenAI, ElevenLabs, and Kakao.</p></li><li><p>SynthID verification is expanding into <a href="https://x.com/GoogleDeepMind/status/2059235184130535436">Search and Chrome</a>, giving users a way to check whether media may have been AI-generated.</p></li><li><p>Pixel videos will include <a href="https://x.com/GoogleDeepMind/status/2059235187003642154">creation and edit history</a>, basically a receipt for how the media was made.</p></li><li><p>YouTube will automatically label <a href="https://techcrunch.com/2026/05/27/youtube-will-now-automatically-label-ai-videos/">significant photorealistic AI video</a> using C2PA metadata and YouTube AI tools.</p></li><li><p>OpenAI published a <a href="https://openai.com/index/trustworthy-third-party-evaluations-foundations/">playbook for trustworthy third-party evaluations</a> and its <a href="https://openai.com/index/openai-frontier-governance-framework/">Frontier Governance Framework</a>.</p></li><li><p>Illinois passed an AI bill requiring <a 
href="https://www.nbcnews.com/tech/tech-news/illinois-legislature-passes-historic-ai-bill-rcna347191">third-party safety audits</a>.</p></li><li><p>ITBench-AA found frontier models scoring below 50% on <a href="https://huggingface.co/blog/ibm-research/itbench-aa">agentic enterprise IT tasks</a>.</p></li><li><p>Researchers introduced <a href="https://arxiv.org/abs/2605.27355v1">alignment tampering</a>, where an LLM undergoing RLHF can influence preference data. Researchers also reportedly stripped guardrails from Google and Meta open-weight models <a href="https://www.eweek.com/news/open-weight-ai-guardrails-gemma-llama/">in minutes</a>.</p></li></ul><div><hr></div><h2>&#11088; Featured: OpenAI published a playbook for trustworthy third-party evaluations.</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!jWBZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!jWBZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 424w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 848w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1272w, 
https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png" width="1456" height="918" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:918,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:146659,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199912055?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!jWBZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 424w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 848w, 
https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1272w, https://substackcdn.com/image/fetch/$s_!jWBZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0cc400c9-2b3b-4cb1-8181-18067851c71a_2084x1314.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>OpenAI <a href="https://openai.com/index/trustworthy-third-party-evaluations-foundations/">released a guide for how independent 
third parties should evaluate frontier models</a>, and its core argument is that a benchmark score means little without the setup that produced it.</p><p>The central concept is the harness: the prompts, tools, memory, retries, and control logic wrapped around a model. Early evaluations treated models like chatbots, one prompt and one answer. Today&#8217;s models use tools, hold state across many steps, and recover from mistakes, so the harness can decide whether a capability shows up at all. OpenAI&#8217;s own data makes the point. GPT-5.5 solved 69.2% of cyber-range tasks without compaction and 92.3% with it. In a UK AISI test, raising the token budget from 10M to 100M lifted performance by up to 59%.</p><p>The guide also names the ways scores mislead. Reward hacking inflates them: METR found GPT-5.4&#8217;s apparent 13-hour task horizon dropped to 6 hours once hacked successes were removed. Sandbagging is hard to rule out: Apollo found evaluation-awareness in 52% of its sandbagging-test samples, even though the model still answered correctly.</p><p>Contamination, refusals, and broken tasks each distort results in their own direction.</p><p>This connects to the rest of the week. Illinois passed mandatory third-party safety audits. DeepSWE exposed contamination and environment exploitation in a coding benchmark.</p><p>ITBench-AA found every frontier model below 50% on enterprise SRE tasks. Across all of these, the contested ground is the same: how to trust a measurement of what AI can do.</p><p>The useful shift is that the playbook treats evaluation as system design. 
A score is performance under a specific harness and budget, not a fixed measure of what a model can do.</p><p><strong>What to watch for:</strong> whether third-party AI evaluation starts to look more like audit infrastructure than benchmark publishing.</p><div><hr></div><h2><strong>&#127897;&#65039;</strong> Worth a Watch</h2><div id="youtube2-4D3hDmGhFhA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;4D3hDmGhFhA&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/4D3hDmGhFhA?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p><strong>Work splits into two surfaces.</strong> One is a company agent you delegate to in Slack. The other is Codex / Claude co-work, the &#8220;operating system&#8221; where the real work happens: email, docs, research, and SaaS apps running inside the agent&#8217;s in-app browser.</p></li><li><p><strong>Shipper flipped from personal agents to one company &#8220;super agent.&#8221;</strong> The OpenClaw hype showed that personal agents still break constantly and need babysitting. His read is that companies start with one general agent, then specialize downward as models get more independent.</p></li><li><p><strong>The SaaS apocalypse is dumb.</strong> Agents increase the number of SaaS users rather than replacing them. Users bring their own tokens, which could protect SaaS margins. 
The product shift is building software that humans and agents can use together.</p></li><li><p><strong>CLIs are over as the main surface.</strong> &#8220;We made GUIs for a reason.&#8221; Most technical people at Every moved off the terminal as their main workspace and back into Codex, Claude Code, and Cursor.</p></li><li><p><strong>Automation is a lie.</strong> Every agent needs a human. The forward-deployed engineer who gardens the agent may become one of the most valuable new hires. Models make yesterday&#8217;s competence cheap, so humans move ahead to do what is not yet framable.</p></li><li><p><strong>PMs and full-stack designers win.</strong> If the build step keeps getting easier, taste and product sense become more valuable. His advice is to &#8220;ride the models&#8221;: try every new release on your own workflows.</p></li><li><p><strong>Why it pairs with the Featured:</strong> OpenAI&#8217;s eval playbook explains why the harness around a model decides what it can do. Shipper&#8217;s thesis is the working version of that: the agent only performs when a human owns the harness around it.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><a href="https://www.vaticannews.va/en/pope/news/2026-05/pope-leo-xiv-encyclical-magnifica-humanitas-ai.html">The Pope wrote about AI</a> | Vatican News &#8212; Pope Leo XIV&#8217;s encyclical focused on AI and human dignity, with concerns around labor, warfare, accountability, and concentrated power. Simon Willison had a good <a href="https://simonwillison.net/2026/May/25/encyclical-on-ai/">breakdown</a>, and Anthropic published Chris Olah&#8217;s <a href="https://www.anthropic.com/news/chris-olah-pope-leo-encyclical">remarks from the Vatican presentation</a>.</p></li><li><p><a href="https://www.axios.com/2026/05/28/ai-spending-roi-enterprise-costs">One company spent $500M on Claude in a single month</a> | Axios &#8212; An AI consultant said the client never capped employee license usage. 
Microsoft cut internal Claude Code licenses and Uber reportedly burned its 2026 AI budget by April.</p></li><li><p><a href="https://mistral.ai/news/ai-now-summit-2026">Mistral held AI Now Summit 2026</a> | Mistral &#8212; Industrial AI, Vibe, physics AI, and a new Les Ulis inference data center.</p></li><li><p><a href="https://mistral.ai/news/search-toolkit">Mistral released Search Toolkit</a> | Mistral &#8212; Open-source framework for production AI search pipelines.</p></li><li><p><a href="https://x.com/perplexity_ai/status/2060013327319577063">Perplexity launched Computer inside Microsoft Office apps</a> | Perplexity &#8212; Word, Excel, PowerPoint, and Outlook as agent surfaces.</p></li><li><p><a href="https://www.reuters.com/business/microsoft-release-new-coding-model-next-week-information-reports-2026-05-28/">Microsoft is reportedly preparing a homegrown coding model for Copilot</a> | Reuters &#8212; Another sign Microsoft is reducing OpenAI dependence where it can.</p></li><li><p><a href="https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/introducing-computer-using-agents-in-copilot-studio/">Microsoft launched computer-using agents in Copilot Studio</a> | Microsoft &#8212; Computer use is becoming a platform feature, not a lab demo.</p></li><li><p><a href="https://techcrunch.com/2026/05/27/china-is-increasingly-keeping-its-best-ai-talent-to-itself/">China is tightening controls on top AI talent</a> | TechCrunch &#8212; AI researchers are starting to look like strategic national assets.</p></li><li><p><a href="https://www.cerebras.ai/blog/what-is-sovereign-ai-and-how-cerebras-helps-nations">Cerebras explained sovereign AI</a> | Cerebras &#8212; National AI infrastructure as a sales motion.</p></li><li><p><a href="https://openai.com/index/strengthening-societal-resilience-with-rosalind-biodefense/">OpenAI launched Rosalind Biodefense</a> | OpenAI &#8212; Trusted access for biodefense and pandemic-preparedness partners.</p></li><li><p><a 
href="https://www.reuters.com/technology/samsung-ships-samples-next-gen-hbm4e-memory-chips-2026-05-28/">Samsung began shipping 12-layer HBM4E samples</a> | Reuters &#8212; Memory bandwidth remains one of the core constraints on AI compute.</p></li><li><p><a href="https://developer.nvidia.com/blog/nvidia-cuda-13-3-enhances-gpu-development-with-tile-programming-compileiq-autotuning-and-python-updates/">NVIDIA published CUDA 13.3 updates</a> | NVIDIA &#8212; Tile programming, CompileIQ autotuning, and Python updates.</p></li><li><p><a href="https://techcrunch.com/2026/05/28/visa-invests-in-replit-to-power-agentic-payments-for-developers/">Visa invested in Replit to explore agentic payments</a> | TechCrunch &#8212; Payment rails for agents are becoming a real category.</p></li><li><p><a href="https://techcrunch.com/2026/05/26/umg-and-tiktok-renew-agreement-to-combat-unauthorized-ai-music/">Universal Music Group and TikTok renewed an agreement on AI music</a> | TechCrunch &#8212; Licensing and attribution are becoming the music industry&#8217;s AI battleground.</p></li><li><p><a href="https://www.reuters.com/legal/litigation/cnn-sues-perplexity-ai-over-alleged-copyright-infringement-2026-05-28/">CNN sued Perplexity over alleged copyright infringement</a> | Reuters &#8212; The search/chat/content boundary keeps getting tested in court.</p></li><li><p><a href="https://www.theverge.com/tech/945378/ansel-adams-trust-sues-danziger-gallery-ai-colorized-moonrise">The Ansel Adams Trust objected to an AI-colorized &#8220;Moonrise&#8221; exhibit</a> | The Verge &#8212; AI editing is now an authenticity fight, not just a copyright fight.</p></li><li><p><a href="https://www.theverge.com/ai-artificial-intelligence/944403/steven-rosenbaum-chatbot-fake-quote-book-future-truth">Steven Rosenbaum blamed chatbots for fabricated quotes in his book</a> | The Verge &#8212; Another example of why provenance and verification keep coming up.</p></li></ul><div 
class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Can LangChain DeepAgents Explain a Codebase Architecture?]]></title><description><![CDATA[I used LangChain Deep Agents with async subagents to crawl real GitHub repos, map their architecture, generate diagrams, and check every claim against source files.]]></description><link>https://www.anothercodingblog.com/p/can-langchain-deepagents-explain</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/can-langchain-deepagents-explain</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 24 May 2026 21:13:55 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a2290e09-dd04-4a05-bcdb-14b84d4f6be9_1216x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I wanted to know whether LangChain DeepAgents could help me build real architectural understanding of an unfamiliar codebase faster.</p><p>The test was to point it at a real repository, ask it to produce an expert-level architecture dossier, and see whether the output could teach me the system well enough to make better engineering decisions.</p><p>I ended up building a repo 
architecture workflow with a deterministic source crawler, async area subagents, a claim ledger, a diagram architect, and validation against source files.</p><p>The result was genuinely useful. After one full run, I had a clear map of the DeepAgents repo, the main packages, the core files, the extension points, the async subagent implementation, and the reading path I would follow if I were onboarding into the codebase cold.</p><h2><strong>What is a Deep Agent anyway, and why should you care?</strong></h2><p>The simplest way to think about a Deep Agent is an agent built for longer, messier work.</p><p>In LangChain&#8217;s DeepAgents package, a supervisor agent can use tools, filesystem context, and subagents to work through a task that would be awkward as one prompt. The supervisor owns the final answer. The subagents take bounded pieces of the work. The filesystem gives the run somewhere to keep intermediate artifacts like reports, notes, source packets, and plans.</p><p>That matters for codebase architecture because the work has a natural shape. You need to inspect the repo, split it into meaningful areas, read files in each area, compare claims against source evidence, and then turn the whole thing into a mental model a human can use.</p><p>DeepAgents also has AsyncSubAgent, which is especially interesting for this use case. An async subagent is launched as a background Agent Protocol task. The supervisor gets a task id back, can check status later, and can update the task if it needs a revision.</p><p>That maps really cleanly to architecture learning. A monorepo has separate threads of work. 
libs/deepagents, libs/cli, libs/code, examples, .github, and partner integrations can all be studied independently before synthesis.</p><h2><strong>The use case</strong></h2><p>The job was:</p><blockquote><p><em>Given a GitHub repo, produce a source-grounded architecture dossier that helps a developer build expert-level understanding of the system: how it is organized, where the important code lives, how the major pieces interact, which abstractions matter, what evidence supports each claim, and what to read next.</em></p></blockquote><p>This is the kind of work I do constantly when opening a new codebase. I want to know:</p><ul><li><p>What kind of repo is this?</p></li><li><p>Where is the real architecture root?</p></li><li><p>What are the major packages or areas?</p></li><li><p>What are the core abstractions?</p></li><li><p>How does the main flow work?</p></li><li><p>How do the important packages depend on each other?</p></li><li><p>Which extension points are real contracts?</p></li><li><p>Which files should I read first?</p></li><li><p>Which claims are grounded in source, and which ones are guesses?</p></li></ul><p>The target repo for the full run was the DeepAgents repo itself. 
So the experiment became recursive in a useful way: use DeepAgents to understand DeepAgents.</p><h2><strong>What I built</strong></h2><p>The workflow has three layers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O9rY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O9rY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 424w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 848w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1272w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O9rY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png" width="1456" height="906" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:906,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:248355,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199001179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!O9rY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 424w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 848w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1272w, https://substackcdn.com/image/fetch/$s_!O9rY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9531746-43b2-4ccd-8739-939c576fa4df_1800x1120.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The first layer is deterministic. Before calling the model, the system crawls the repo and builds a source packet. That packet includes the repo shape, detected package areas, entrypoints, central files, docs, configs, tests, and resolved internal import edges.</p>
<p>The second layer is the agent workflow.</p><p>For the monorepo fan-out, I used DeepAgents AsyncSubAgent.</p><p>The <strong><a href="https://docs.langchain.com/oss/python/deepagents/async-subagents">official LangChain AsyncSubAgents docs</a></strong> describe them as a way for a supervisor agent to &#8220;launch background tasks that return immediately.&#8221; The supervisor can keep working while those tasks run, then check progress, send follow-up instructions, or cancel work if needed.</p><p>That fit this use case almost exactly. Each detected repo area gets its own background area-deep-dive task through a local LangGraph Agent Protocol server. Each async worker gets a bounded assignment, fetches source files for that area, and returns two artifacts:</p><ul><li><p>a Markdown area report</p></li><li><p>a structured JSON finding set</p></li></ul><p>Those area workers are the expensive part of the run, and they are the part that benefits from async. They can inspect different repo areas at the same time and then flow back into the final synthesis.</p><p>The handoff back to the supervisor is the important part. Each async area subagent returns a validated finding set and area report. The runner turns those into a consolidated area dossier bundle, then passes that bundle into the final DeepAgents supervisor as source-grounded context. 
If an area report fails validation, the same async task thread gets an update asking it to repair the report before the supervisor uses it.</p><p>The final synthesis uses a DeepAgents supervisor with regular specialist subagents:</p><ul><li><p>repository area mapper</p></li><li><p>repo cartographer</p></li><li><p>abstraction teacher</p></li><li><p>runtime flow tracer</p></li><li><p>diagram reviewer</p></li><li><p>diagram architect</p></li><li><p>reading path teacher</p></li><li><p>architecture validator</p></li></ul><p>That sync versus async split felt right. The area research can run in parallel because the work is independent. The final writeup, claim ledger, diagram selection, and validation need a staged order because each step depends on the previous artifact.</p><p>The third layer is validation. The system checks whether generated reports cite real repo-relative paths, avoid ambiguous filenames, include required source anchors, and stay grounded in source facts.</p><p>That validation layer carried a lot of the trust.</p><p>After testing a few model setups, the best version used a split:</p><ul><li><p>GPT-5.4 mini for the async area workers and final architecture synthesis</p></li><li><p>GPT-4.1 for deterministic repair loops after validation failures</p></li></ul><p>That split made sense in practice. The reasoning model produced a more useful teaching artifact. The repair model was steadier at cleaning up path and grounding issues.</p><h2><strong>The diagram architect</strong></h2><p>The first architecture diagram was too simple. It was useful as an orientation map, but it did not teach much.</p><p>So I added a diagram-architect subagent.</p><p>Its job is to look at the claim ledger, the deterministic diagram pack, and the source facts, then decide which diagrams are actually useful. 
The deterministic renderer writes five Mermaid diagrams:</p><ul><li><p>repository map</p></li><li><p>public API flow</p></li><li><p>component evidence map</p></li><li><p>dependency evidence map</p></li><li><p>open questions map</p></li></ul><p>The diagram-architect reviews those diagrams inside the agent runtime and helps the final synthesis choose a better System Map.</p><p>This turned out to be a good split. Deterministic code can draw every node and edge it knows about. An agent is better at deciding which view teaches the architecture without turning the diagram into a giant file graph.</p><h2><strong>The full run</strong></h2><p>The full DeepAgents repo run used the async subagent path:</p><p><code>13 AsyncSubAgent area tasks launched<br>10 area reports passed validation<br>3 area reports still needed review<br>395 claims were written to the claim ledger<br>the claim ledger passed validation<br>5 architecture diagrams were generated<br>the final dossier passed deterministic validation<br>total runtime was about 8.1 minutes</code></p><p>The model split mattered here. A pure GPT-5.4 mini run produced richer notes, but the final dossier failed validation on evidence-format issues. A pure GPT-4.1 run passed final validation, but the explanation was more conservative. The hybrid run kept the richer architecture synthesis and still produced a final dossier that passed deterministic validation.</p><p>The claim ledger became the most important artifact.</p><p>Each claim has a type, confidence level, source, and evidence paths. Some claims come from deterministic source analysis. Others come from area subagents. 
For example:</p><ul><li><p>libs/deepagents owns the core agent framework.</p></li><li><p>libs/deepagents/deepagents/graph.py is the source evidence for create_deep_agent.</p></li><li><p>libs/deepagents/deepagents/middleware/subagents.py grounds SubAgentMiddleware.</p></li><li><p>libs/deepagents/deepagents/middleware/async_subagents.py grounds async subagent behavior.</p></li><li><p>libs/deepagents/deepagents/backends/protocol.py defines the backend contract.</p></li><li><p>libs/deepagents/deepagents/backends/state.py grounds the default state backend.</p></li></ul><p>That gave the final agent something stronger than chat history. It had a structured evidence map it could use during synthesis.</p><h2><strong>What the architecture learner found</strong></h2><p>The generated architecture map was useful.</p><p>The repo is a Python monorepo centered on the DeepAgents core package under libs/deepagents/deepagents. Around that core are packages for CLI/deployment, a React frontend, code-oriented skills, partner sandbox integrations, eval tooling, examples, and GitHub automation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cguw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cguw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 424w, https://substackcdn.com/image/fetch/$s_!cguw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 848w, 
https://substackcdn.com/image/fetch/$s_!cguw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!cguw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cguw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png" width="1456" height="954" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:954,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:288044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199001179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cguw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 424w, 
https://substackcdn.com/image/fetch/$s_!cguw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 848w, https://substackcdn.com/image/fetch/$s_!cguw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1272w, https://substackcdn.com/image/fetch/$s_!cguw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0b44c4da-034c-4643-a6c9-9c81828bffa3_1800x1180.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The core package revolves around a few files:</p><p><strong>libs/deepagents/deepagents/graph.py</strong><br><em>What it teaches:</em> how create_deep_agent assembles the agent.</p><p><strong>libs/deepagents/deepagents/middleware/subagents.py</strong><br><em>What it teaches: </em>synchronous subagent delegation.</p><p><strong>libs/deepagents/deepagents/middleware/async_subagents.py</strong><br><em>What it teaches: </em>async/background subagent specs.</p><p><strong>libs/deepagents/deepagents/middleware/filesystem.py</strong><br><em>What it teaches:</em> file tools and permission rules.</p><p><strong>libs/deepagents/deepagents/backends/protocol.py</strong><br><em>What it teaches:</em> the backend interface.</p><p><strong>libs/deepagents/deepagents/backends/state.py</strong><br><em>What it teaches: </em>the default thread-scoped state backend.</p><p>The generated reading path was exactly the kind of thing I wanted from this experiment. It started with the public package entrypoint, moved into graph.py, then into middleware and backend contracts. That is how I would onboard myself into the repo manually.</p><h2><strong>A quick check on another repo</strong></h2><p>I also pointed the same architecture learner at <a href="https://github.com/facebookresearch/sam3">Meta&#8217;s facebookresearch/sam3 repo</a>.</p><p>This was not a full second case study. 
I wanted to know whether the workflow was accidentally tuned to the DeepAgents repo, or whether it could produce a useful architecture map for a different kind of codebase.</p><p>The SAM3 run was smaller:</p><p><code>2 repository areas detected<br>2 area reports passed validation<br>127 claims were written to the claim ledger<br>the claim ledger passed validation<br>5 architecture diagrams were generated<br>the final dossier passed deterministic validation<br>total runtime was about 1.5 minutes</code></p><p>The output found a clean architecture root at sam3, with sam3/model_builder.py as the main assembly point. The surrounding architecture broke into model utilities, agent/inference code, evaluation toolkits, training/config logic, performance helpers, and external scripts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7E9s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7E9s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 424w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 848w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1272w, 
https://substackcdn.com/image/fetch/$s_!7E9s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7E9s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:210858,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/199001179?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7E9s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 424w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 848w, 
https://substackcdn.com/image/fetch/$s_!7E9s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1272w, https://substackcdn.com/image/fetch/$s_!7E9s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F53e341ba-d89d-407c-ac13-eed2029e8b25_1800x1020.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That was enough for me. The point was not to deeply explain SAM3 in this post. 
The point was that the architecture learner could move from a LangChain agent framework repo to a computer vision model repo and still produce a grounded map.</p><h2><strong>Where it still needed guardrails</strong></h2><p>The model had enough context to explain the architecture, but it still made small mistakes that matter in a source-grounded workflow.</p><p>Three area reports still needed review after repair: libs/evals, libs/partners/daytona, and libs/partners/quickjs. The failures were mostly formatting-level evidence issues, like a stray / being interpreted as a path, or an ellipsis showing up where the validator expected exact files.</p><p>That is exactly the kind of failure I want surfaced. The final dossier still passed because the synthesis had enough grounded evidence and did not rely on unsupported claims from those area reports.</p><p>Earlier validation also caught shortened paths such as middleware/subagents.py when the full repo-relative path was libs/deepagents/deepagents/middleware/subagents.py. In a monorepo, that distinction matters. A bare filename can point to the wrong mental model.</p><p>After repair, the final dossier passed:</p><p><code>nonexistent path references: none<br>ambiguous or incomplete paths: none<br>missing required anchors: none<br>missing required symbols: none<br>semantic grounding issues: none</code></p><p>That result changed how I think about this use case.</p><p>The agent can help explain a repo quickly. The explanation becomes much more trustworthy when the system can reject bad paths, force source evidence, and make uncertainty visible.</p><h2><strong>The pattern I would reuse</strong></h2><p>The reusable pattern is:</p><p><code>source packet<br>-&gt; async area subagents<br>-&gt; claim ledger<br>-&gt; diagram architect<br>-&gt; final synthesis<br>-&gt; deterministic validation<br>-&gt; focused follow-up questions</code></p><p>The focused follow-up piece matters. 
One architecture report can orient you, but expertise comes from narrower questions:</p><ul><li><p>How does the public API flow into the core implementation?</p></li><li><p>Where does state live?</p></li><li><p>What extension points are real contracts?</p></li><li><p>What is inferred from config or docs?</p></li><li><p>Which packages depend on the core runtime?</p></li></ul><p>That is where the saved claim ledger helps. A follow-up agent can start from validated claims, reopen source files, and answer one question at a time.</p><h2><strong>When this is worth using</strong></h2><p>I would use this pattern for:</p><ul><li><p>onboarding into a large unfamiliar repo</p></li><li><p>generating first-pass architecture docs</p></li><li><p>preparing for a migration</p></li><li><p>auditing a monorepo before refactoring</p></li><li><p>understanding how a framework is organized</p></li></ul><p>I would skip it for small repos. If the project has twenty files, read the files.</p><p>The value shows up when the repo has multiple packages, mixed docs/config/source signals, and enough surface area that a single prompt gets vague quickly.</p><h2><strong>What I learned</strong></h2><p>DeepAgents was useful here because the task decomposes naturally.</p><p>The run split cleanly across specialists: repo mapping, area investigation, core abstraction review, runtime flow tracing, diagram critique, claim validation, and final synthesis.</p><p>The async subagents made the architecture learner feel like a real repo analysis system. Each area worker could build local expertise on one thread of the monorepo, then the supervisor could put the pieces together.</p><p>The strongest lesson from the run was that architecture understanding needs evidence loops. The hardest part of the entire build was the validating agent, which had to ensure that the workflow was not just inventing a random architecture.</p><p>An agent can write a convincing architecture summary from partial context. 
That is why the validation layer matters.</p><p>The setup I would keep treats the model as the reasoning layer and the deterministic tools as the ground. The model decides what the architecture means. The tools decide whether the files, paths, and claims are real.</p>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 73]]></title><description><![CDATA[Google declares the agentic era. Gemini Spark is Google's consumer agent. Cursor integrates with Jira. Anthropic acquires Stainless. Musk lost the OpenAI trial. 
Glasswing found 10k vulnerabilities.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-803</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-803</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 23 May 2026 21:12:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/170df6e0-0230-4455-a879-df92460d8081_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8u4l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8u4l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 424w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 848w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 1272w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8u4l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png" width="1456" height="2173" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2173,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:705925,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/198998731?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8u4l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 424w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 848w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 1272w, https://substackcdn.com/image/fetch/$s_!8u4l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F845b658b-ad53-44b4-84a9-c70ef704c627_2400x3582.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>Google declared the agentic era</h2><ul><li><p><a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-5/">Gemini 3.5 Flash</a> launched as the model for agents and coding. 
<a href="https://x.com/GoogleDeepMind/status/2056787987774816525">Google DeepMind framed it</a> as frontier intelligence plus real-world action.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Xz99!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Xz99!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 424w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 848w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1272w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Xz99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png" width="1456" height="810" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:810,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Xz99!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 424w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 848w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1272w, https://substackcdn.com/image/fetch/$s_!Xz99!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F455c45cc-5af5-4a8e-87af-fd0b9f4eb7db_2614x1454.png 1456w" sizes="100vw"></picture></div></a></figure></div><ul><li><p><a href="https://gemini.google/overview/agent/spark/">Gemini Spark</a> arrived as a 24/7 cloud agent for Gmail, documents, inbox monitoring, and eventually purchases. <a href="https://techcrunch.com/2026/05/19/google-introduces-gemini-spark-a-24-7-agentic-assistant-with-gmail-integration/">TechCrunch</a> described it as a personal assistant built from Gemini models and Google&#8217;s Antigravity agent harness.</p></li><li><p>Google redesigned Search around AI Mode and multimodal input. 
<a href="https://venturebeat.com/technology/google-just-redesigned-the-search-box-for-the-first-time-in-25-years-heres-why-it-matters-more-than-you-think">VentureBeat</a> called it the first major search box redesign in 25 years.</p></li><li><p><a href="https://antigravity.google/product/antigravity-2">Antigravity 2.0 launched</a> with a desktop app and CLI.</p></li><li><p>Google also released <a href="https://developer.android.com/tools/agents/android-cli/journeys">Android CLI support</a> so Claude Code, Codex, and other coding agents can build Android apps from the command line.</p><p></p><p><strong>The thread: </strong>Gemini Spark is an exciting launch, but how useful it is depends on how embedded you are in Google&#8217;s ecosystem. It&#8217;s interesting that Google released a Flash model first; however, it seems to be benchmarking really well against other frontier models. Clearly Google is still in it and pioneering ahead.</p></li></ul><div><hr></div><h2>Claude, AWS, Cursor, and LangChain shipped the agent plumbing layer</h2><ul><li><p>Cursor shipped <a href="https://cursor.com/blog/composer-2-5">Composer 2.5</a>, then added <a href="https://www.atlassian.com/blog/company-news/cursor-in-jira">Jira integration</a> so teams can assign issues directly to cloud agents.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IkcU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IkcU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!IkcU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IkcU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Composer 2.5 benchmark results&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Composer 2.5 benchmark results" title="Composer 2.5 benchmark results" srcset="https://substackcdn.com/image/fetch/$s_!IkcU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!IkcU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!IkcU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe24a7484-d955-4d31-a28c-97e900c94806_1920x1080.png 1456w"
sizes="100vw"></picture></div></a></figure></div></li><li><p>Cursor also opened SDK access for Python and TypeScript. <a href="https://x.com/cursor_ai/status/2057913121558413770">The Cursor account</a> framed it as a way to build your own agents with Composer 2.5.</p></li><li><p>Anthropic acquired <a href="https://www.anthropic.com/news/anthropic-acquires-stainless">Stainless</a>, the SDK and MCP server platform that powered every Anthropic SDK.</p></li><li><p>Claude Managed Agents added <a href="https://claude.com/blog/claude-managed-agents-updates">self-hosted sandboxes and MCP tunnels</a>, moving credentials and execution inside enterprise boundaries.</p></li><li><p>AWS published a full AgentCore content offensive: <a href="https://aws.amazon.com/blogs/machine-learning/extending-conversational-memory-in-kiro-cli-using-amazon-bedrock-agentcore-memory/">MCP memory</a>, <a href="https://aws.amazon.com/blogs/machine-learning/building-multi-tenant-agents-with-amazon-bedrock-agentcore/">multi-tenant agents</a>, <a href="https://aws.amazon.com/blogs/machine-learning/build-ai-agents-for-business-intelligence-with-amazon-bedrock-agentcore/">BI agents</a>, <a href="https://aws.amazon.com/blogs/machine-learning/build-ai-powered-dashboard-automation-agents-with-nlp-on-amazon-bedrock-agentcore/">dashboard agents</a>, <a href="https://aws.amazon.com/blogs/machine-learning/amazon-nova-act-is-now-hipaa-eligible/">HIPAA eligibility</a>, and <a href="https://aws.amazon.com/blogs/machine-learning/announcing-openai-compatible-api-support-for-amazon-sagemaker-ai-endpoints/">OpenAI-compatible SageMaker endpoints</a>.</p></li><li><p>LangChain shipped <a href="https://www.langchain.com/blog/how-we-built-langsmith-engine-our-agent-for-improving-agents">LangSmith Engine</a>, an agent for improving agents.</p><p></p><p><strong>The thread:</strong> Composer 2.5 is now the <a href="https://x.com/mntruell/status/2056780569380626686">most chosen model</a> in
Cursor, and it appears to be considerably cheaper than GPT-5.5 and Opus 4.7. Claude Managed Agents feels like it&#8217;s slowly becoming a full orchestration framework, but I am not sure how its design shares persistent memory across an enterprise.</p></li></ul><div><hr></div><h2>Compute became the business model</h2><ul><li><p>Anthropic told investors it expects <a href="https://techcrunch.com/2026/05/20/anthropic-says-its-about-to-have-its-first-profitable-quarter/">its first operating profit</a>, while compute costs may erase that profitability later.</p></li><li><p><a href="https://www.sec.gov/Archives/edgar/data/1181412/000162828026036936/spaceexplorationtechnologi.htm">SpaceX&#8217;s IPO filing</a> revealed Anthropic agreed to pay xAI/SpaceX $1.25B per month for Colossus access.</p></li><li><p>OpenAI introduced <a href="https://openai.com/business/guaranteed-capacity/">Guaranteed Capacity</a>, turning long-term compute access into a product.</p></li><li><p>NVIDIA reported $81.6B in Q1 revenue, up <a href="https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-first-quarter-fiscal-2027">85% year over year</a>.</p></li><li><p>NVIDIA and IREN announced a <a href="https://nvidianews.nvidia.com/news/nvidia-and-iren-announce-strategic-partnership-to-accelerate-deployment-of-up-to-5-gigawatts-of-ai-infrastructure">5 GW AI infrastructure partnership</a>.</p></li><li><p>Simon Willison flagged the memory side: AI demand for HBM may <a href="https://simonwillison.net/2026/May/22/memory-shortage/">reprice consumer electronics</a>.</p><p></p><p><strong>The thread: </strong>In case you didn&#8217;t read that correctly, that&#8217;s billion with a capital B, per MONTH, for Colossus access. So while they expect to reach their first operating profit, I am interested to see if they can keep pace. NVIDIA&#8217;s revenue is up 85% from last year, and that&#8217;s just bananas.
</p></li></ul><div><hr></div><h2>AI layoffs stopped looking like isolated restructuring</h2><ul><li><p><a href="https://www.hcamag.com/us/specialization/transformation/intuit-slashes-staff-signs-deals-with-anthropic-and-open-ai/576021">Intuit announced layoffs</a> while signing deals with Anthropic and OpenAI.</p></li><li><p><a href="https://www.reuters.com/business/stanchart-cut-7000-jobs-boost-ai/">Standard Chartered announced</a> plans to cut 7,000+ jobs while accelerating AI investment.</p></li><li><p><a href="https://www.cnbc.com/2026/05/17/ai-related-layoffs-a-boost-for-stocks-not-necessarily.html">CNBC found</a> AI-related layoff announcements do not reliably boost stock prices.</p></li><li><p>Meta&#8217;s AI pivot and broader workforce cuts stayed in the week&#8217;s background via <a href="https://www.npr.org/2026/05/20/nx-s1-5826917/meta-layoffs-ai-jobs">NPR</a>.</p><p></p><p><strong>The thread: </strong>Companies are not only performing AI restructuring for investors. Some appear to believe the operating model is changing whether the market rewards it immediately or not.</p></li></ul><div><hr></div><h2>OpenAI won the trial. The governance questions survived</h2><ul><li><p>Musk&#8217;s lawsuit against OpenAI, Altman, Brockman, and Microsoft collapsed, removing an obstacle to OpenAI&#8217;s IPO path. 
<a href="https://www.reuters.com/legal/openai-defeats-elon-musks-lawsuit/">Reuters</a> covered the legal result.</p></li><li><p>The trial surfaced <a href="https://www.reuters.com/legal/government/key-moments-musk-vs-openai-trial-2026-05-18/">credibility fights</a> around OpenAI&#8217;s nonprofit origins, commercial ambitions, and who gets to claim the original mission.</p></li><li><p><a href="https://www.theverge.com/ai-artificial-intelligence/932464/musk-v-altman-proved-that-ai-is-led-by-the-wrong-people">The Verge</a> argued the case exposed something larger: the people leading AI may not be trusted to govern it.</p><p></p><p><strong>The thread:</strong> OpenAI won legally. The trial still reinforced the industry&#8217;s trust problem.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: Project Glasswing found 10,000 critical vulnerabilities</h2><p>Anthropic&#8217;s <a href="https://www.anthropic.com/research/glasswing-initial-update">Project Glasswing update</a> is the most important direct-source read of the week.</p><p>Claude Mythos Preview and roughly 50 partners found more than 10,000 high- or critical-severity vulnerabilities in essential software. The key sentence was not the number. It was the bottleneck shift: discovery is no longer the hard part. Verification, disclosure, and patching are.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hCh8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hCh8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 424w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 848w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1272w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!hCh8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp" width="1456" height="898" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:898,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hCh8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 424w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 848w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1272w, https://substackcdn.com/image/fetch/$s_!hCh8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5db698d-e691-4f9a-a3ab-583eee129723_1634x1008.webp 1456w"
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The downstream effects moved fast. The UK Government Digital Service pushed back on closing public repositories after AI-discovered vulnerabilities. Reuters reported Anthropic will <a href="https://www.reuters.com/technology/anthropic-brief-financial-stability-board-cyber-flaws-exposed-by-mythos-ft-2026-05-18/">brief the Financial Stability Board</a>, turning this from a software-security issue into a systemic-risk discussion.</p><p>What makes Glasswing different is scale. Coordinated disclosure was built for individual researchers finding individual bugs. AI-assisted scanning can produce vulnerability volume at industrial scale.
The process was not designed for this.</p><p><strong>What to watch for:</strong> whether labs that discover vulnerabilities at scale are forced to build remediation infrastructure too.</p><div><hr></div><h2><strong>&#127897;&#65039; </strong>Worth a Listen</h2><div id="youtube2-orudZzP8vUc" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;orudZzP8vUc&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/orudZzP8vUc?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p>The AI Studio half is &#8220;build a business from a prompt&#8221;: research agents, agentic focus groups, Stitch designs, Workspace integration, Sheets-backed dashboards, Cloud Run deployment, and marketing tools in one flow.</p></li><li><p>The Antigravity half is the real signal: sub-agents, background tasks, hooks, artifacts, project permissions, scheduled agents, browser agents, CLI, SDK, and managed API.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><strong><a href="https://openai.com/index/model-disproves-discrete-geometry-conjecture/">OpenAI says a model disproved a central discrete-geometry conjecture</a></strong> | OpenAI &#8212; External mathematicians checked the proof.</p></li><li><p><strong><a href="https://huggingface.co/blog/VirgileBatto/lerobot-humanoid">LeRobot Humanoid</a></strong> | Hugging Face &#8212; A roughly $2,500 open humanoid robotics platform.</p></li><li><p><strong><a href="https://venturebeat.com/technology/cohere-cracks-lossless-quantization-and-native-citations-with-first-full-apache-2-0-licensed-open-model-command-a">Cohere released Command A+</a></strong> | VentureBeat &#8212; Apache 2.0 licensing, native citations, enterprise-friendly model 
packaging.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/21/spotify-and-universal-music-strike-deal-allowing-fan-made-ai-covers-and-remixes/">Spotify and UMG struck a deal for AI covers and remixes</a></strong> | TechCrunch &#8212; Licensed AI music moves from taboo to product.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/21/spotify-adds-ai-powered-qa-and-briefing-generation-features-to-podcasts/">Spotify launched AI podcast tools</a></strong> | TechCrunch &#8212; Podcasts become queryable, summarizable AI surfaces.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/21/spotify-launches-an-elevenlabs-powered-audiobook-creation-tool/">Spotify launched an ElevenLabs audiobook tool</a></strong> | TechCrunch &#8212; AI narration enters the audiobook workflow.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/22/how-vcs-and-founders-use-inflated-arr-to-kingmake-ai-startups/">AI startups are stretching ARR</a></strong> | TechCrunch &#8212; The AI revenue story is getting less clean.</p></li><li><p><strong><a href="https://www.404media.co/new-arxiv-rules-ai-generated-papers-ban/">ArXiv will ban researchers for AI slop submissions</a></strong> | 404 Media &#8212; Academic publishing&#8217;s authentication problem now has teeth.</p></li><li><p><strong><a href="https://www.theverge.com/tech/932207/siri-apple-intelligence-auto-deleting-chats">Apple&#8217;s Siri revamp may auto-delete chats</a></strong> | The Verge &#8212; Privacy becomes Apple&#8217;s AI wedge.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/932203/university-of-arizona-students-boo-eric-schmidt-ai-commencement">Students booed Eric Schmidt&#8217;s AI commencement speech</a></strong> | The Verge &#8212; The public mood is not matching
the industry&#8217;s launch calendar.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 72]]></title><description><![CDATA[Anthropic ships five verticals and gave every plan an SDK budget. OpenAI launches a deployment company with 150 engineers. Cisco, GitLab, and GM cut thousands at record revenue.
Grok Build at $299/mo.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-9dd</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-9dd</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 16 May 2026 19:39:53 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ece87e18-1dc8-4d38-b255-048b807d7880_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!FNYC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!FNYC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 424w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 848w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 1272w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!FNYC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png" width="1456" height="2489" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2489,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:923849,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/198040876?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!FNYC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 424w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 848w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 1272w, https://substackcdn.com/image/fetch/$s_!FNYC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fb11fff-d44e-452d-b383-fca05f7f9e3e_2400x4102.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Anthropic shipped into legal, small business, healthcare, and AWS in one week.</h2><ul><li><p><strong>Claude for the legal industry launched with 12 practice-area plugins.</strong> <a href="https://claude.com/blog/claude-for-the-legal-industry">Contract review, M&amp;A diligence, and regulatory compliance</a> out of the box.
87% of general counsel now use generative AI, up from 44% the prior year.</p></li><li><p><strong>Claude for Small Business connected to QuickBooks, PayPal, and HubSpot.</strong> <a href="https://www.anthropic.com/news/claude-for-small-business">15 ready-to-run workflows</a> covering invoicing, CRM, document signing via DocuSign and Canva.</p></li><li><p><strong>Anthropic committed $200M to the Gates Foundation.</strong> <a href="https://www.anthropic.com/news/gates-foundation-partnership">Grants, Claude credits, and technical support</a> for vaccine screening, disease forecasting, K-12 education, and agricultural tools.</p></li><li><p><strong>Claude Platform went GA on AWS.</strong> <a href="https://aws.amazon.com/blogs/machine-learning/introducing-claude-platform-on-aws-anthropics-native-platform-through-your-aws-account/">First cloud provider</a> to offer Anthropic&#8217;s native platform with unified billing and same-day feature parity with the native API.</p></li><li><p><strong>Every subscriber now gets separate Agent SDK credits.</strong> Pro gets <a href="https://x.com/ClaudeDevs/status/2054610152817619388">$20/month</a>, Max gets up to $200. Unlike OpenAI, which bundles Codex and third-party usage into normal plan limits, Anthropic is subsidizing the developer ecosystem with a separate bucket.</p></li><li><p><strong>Claude Code limits increased another 50% through July.</strong> <a href="https://x.com/claudeai/status/2054641166155497503">On top of the doubling</a> from the week before.</p></li><li><p><strong>Ramp and Axios independently confirmed Anthropic overtook OpenAI in workplace adoption.</strong> Though <a href="https://venturebeat.com/technology/anthropic-finally-beat-openai-in-business-ai-adoption-but-3-big-threats-could-erase-its-lead">VentureBeat identified three structural threats</a> to that lead.</p></li><li><p><strong>The thread:</strong> Anthropic is trying to become the default for every vertical at once. 
Legal, healthcare, small business, enterprise, developer tooling. Whether that&#8217;s a platform strategy or overextension depends on execution.</p></li></ul><div><hr></div><h2>OpenAI launched a deployment company and put Codex on your phone.</h2><ul><li><p><strong>The OpenAI Deployment Company launched with 150 engineers on day one.</strong> <a href="https://x.com/OpenAI/status/2053824997777457651">19 investment firms and consultancies</a>, majority-owned by OpenAI, with <a href="https://x.com/OpenAI/status/2053824999736410415">Tomoro acquired</a> to provide Forward Deployed Engineers. <a href="https://www.axios.com/2026/05/11/openai-deployco-private-equity">Valued at $14B</a>.</p></li><li><p><strong>ChatGPT connected to bank accounts.</strong> <a href="https://openai.com/index/personal-finance-chatgpt/">Plaid integration for Pro users</a> in the US, with an Intuit partnership for actionable financial steps.</p></li><li><p><strong>Codex shipped to iOS and Android.</strong> <a href="https://x.com/OpenAI/status/2055016850849993072">Mobile preview</a> lets users start, review, and approve coding tasks while agents run on a separate device.</p></li><li><p><strong>OpenAI disclosed a supply chain compromise.</strong> A <a href="https://openai.com/index/our-response-to-the-tanstack-npm-supply-chain-attack/">TanStack npm package attack</a> exposed code-signing certificates for macOS, Windows, iOS, and Android apps. Full certificate rotation required.</p></li><li><p><strong>The thread:</strong> Both OpenAI and Anthropic launched enterprise services arms within a week of each other. The model API is becoming a commodity. 
The margin is shifting to who can get it deployed inside your organization first.</p></li></ul><div><hr></div><h2>Companies are cutting workers at record revenue to fund AI.</h2><ul><li><p><strong>Cisco cut 4,000 jobs while reporting record quarterly revenue.</strong> Stock rose 15% on <a href="https://www.cnbc.com/2026/05/13/cisco-csco-q3-earnings-report-2026.html">surging AI orders</a>.</p></li><li><p><strong>GitLab announced sweeping restructuring to fund agent development.</strong> <a href="https://about.gitlab.com/blog/gitlab-act-2/">Cut headcount, flattened management</a>, reorganized R&amp;D into 60 smaller teams, and retired its CREDIT values framework.</p></li><li><p><strong>GM laid off hundreds of IT workers and began hiring AI replacements.</strong> <a href="https://techcrunch.com/2026/05/11/gm-just-laid-off-hundreds-of-it-workers-to-hire-those-with-stronger-ai-skills/">Explicitly seeking stronger AI skills</a>.</p></li><li><p><strong>Samsung faces a looming strike over AI.</strong> <a href="https://www.reuters.com/sustainability/society-equity/elon-musks-court-battle-against-openai-enters-homestretch-2026-05-14/">Global AI boom driving deep internal divisions</a> between management and workers.</p></li><li><p><strong>The thread:</strong> Revenue is up at all three companies. The functions going are IT operations, developer tooling management, and corporate overhead that was previously considered secure.</p></li></ul><div><hr></div><h2>Grok Build, Claude Code, and Cursor all shipped agentic upgrades. LangChain shipped nine products to support them.</h2><ul><li><p><strong>xAI launched Grok Build in beta.</strong> <a href="https://x.ai/news/grok-build-cli">Terminal-native CLI</a> with up to 8 parallel agents, Grok 4.3 beta, 2M token context. Priced at $299/month (introductory $99). 
SuperGrok Heavy only.</p></li><li><p><strong>Claude Code limits increased 50%.</strong> <a href="https://x.com/claudeai/status/2054641166155497503">Through July 13</a>, on top of the doubling from the prior week. Plus separate Agent SDK credits.</p></li><li><p><strong>Cursor shipped /orchestrate.</strong> <a href="https://x.com/cursor_ai/status/2052432780336988474">Planner/worker/verifier loops</a> that re-spawn on failure. <a href="https://x.com/cursor_ai/status/2052489388895195399">Parallel subagents</a>. <a href="https://x.com/cursor_ai/status/2051739625958584659">Always-on CI agents</a>.</p></li><li><p><strong>LangChain shipped nine products at Interrupt 2026.</strong> <a href="https://www.langchain.com/blog/introducing-smithdb">SmithDB</a> for agent traces, <a href="https://www.langchain.com/blog/introducing-llm-gateway">LLM Gateway</a> for centralized control, <a href="https://www.langchain.com/blog/langsmith-sandboxes-generally-available">Sandboxes GA</a> for isolated testing, <a href="https://www.langchain.com/blog/deep-agents-0-6">Deep Agents 0.6</a> for long-running workflows, and the <a href="https://www.langchain.com/blog/the-agent-development-lifecycle">Agent Development Lifecycle</a> framework.</p></li><li><p><strong>The thread:</strong> Grok Build at $299/month, Claude Code with separate SDK credits, Cursor as a standalone IDE. Three very different bets on how developers will pay for agentic coding. LangChain is betting the real money is in the infrastructure underneath all of them.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: Thinking Machines built an AI that listens while it talks.</h2><p>Every AI conversation today works the same way: you talk, the model waits, the model responds. 
<a href="https://thinkingmachines.ai/blog/interaction-models/">Thinking Machines</a> published research on &#8220;interaction models&#8221; that throw out that assumption entirely.</p><p>Their model processes continuous 200ms micro-turns of audio, video, and text simultaneously. There are no turn boundaries. The model listens while speaking, interrupts when it sees something wrong in your code, reacts to visual cues without being prompted, and runs background reasoning while maintaining the conversation.</p><p>The architecture splits into two parts: an interaction model that maintains real-time presence (always perceiving, always ready to respond), and a background model that handles deeper reasoning and tool use asynchronously.
When the background model finishes a task, the interaction model weaves results into the conversation at an appropriate moment instead of interrupting.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AuzQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AuzQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 424w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 848w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1272w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png" width="1302" height="852" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:852,&quot;width&quot;:1302,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:87959,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/198040876?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AuzQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 424w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 848w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1272w, https://substackcdn.com/image/fetch/$s_!AuzQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb73cf7eb-d05d-4c88-9c3d-64921b6ad1c9_1302x852.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The benchmarks are striking. On FD-bench (the standard interaction quality benchmark), their model scored 77.8 versus 46.8 for GPT-Realtime-2. On responsiveness, they hit 0.40 second turn-taking latency versus 1.18 for GPT-Realtime-2. They also created three new benchmarks (TimeSpeak, CueSpeak, visual proactivity) that no existing model can meaningfully perform. GPT-Realtime-2 scores near zero on all of them.</p><p>The model is a 276B parameter MoE with 12B active. It uses encoder-free early fusion, meaning no separate Whisper or TTS models. Audio comes in as raw dMel signals, video as 40x40 patches. 
Everything is co-trained from scratch.</p><p>Their argument comes from Rich Sutton&#8217;s &#8220;bitter lesson&#8221;: if interactivity is bolted on through harnesses (voice activity detection, turn-taking logic), it can never scale with intelligence. If it&#8217;s native to the model, scaling makes the model both smarter and a better collaborator.</p><p><strong>What to watch for:</strong> This is a research preview from a startup (276B parameters, limited availability). But the design principle matters: current real-time systems from OpenAI and Google use harnesses to fake interactivity on top of turn-based models. Thinking Machines is arguing that&#8217;s a dead end. If they&#8217;re right, every voice agent shipping today is architecturally temporary.</p><div><hr></div><h2><strong>&#127897;&#65039; Worth a Listen</strong></h2><div id="youtube2-IVGjBxqygmI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;IVGjBxqygmI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/IVGjBxqygmI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>IBM AI Engineer Bri Kopecki on why agents without infrastructure are &#8220;brilliant goldfish.&#8221;</p><ul><li><p><strong>The problem:</strong> Most AI agents have no memory, no access control, no audit trail. 
Every conversation starts from scratch.</p></li><li><p><strong>The six-layer stack:</strong> Scheduler (who goes first), memory manager (short/long/episodic), tool manager (sandboxed execution), identity manager (tokens and permissions), observability (full decision tracing), and guardrails/governance (human-in-the-loop for high-stakes decisions).</p></li><li><p><strong>Why it matters now:</strong> This maps directly to what LangChain shipped this week (SmithDB for traces, LLM Gateway for access control, Sandboxes for tool isolation) and explains why Cursor, Anthropic, and OpenAI are all building orchestration layers.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://techcrunch.com/2026/05/14/cerebras-ipo-debut/">Cerebras IPO&#8217;d at $5.55B, shares jumped 89% on day one</a></strong> | TechCrunch &#8212; Near $100B market cap on debut. The AI chip premium is real.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/12/medicares-new-payment-model-is-built-for-ai-and-most-of-the-tech-world-has-no-idea/">Medicare created a payment model built for AI-assisted services</a></strong> | TechCrunch &#8212; The largest US payer quietly opened the door for clinical AI reimbursement. This will pull deployment faster than any product launch.</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/05/15/musk-v-altman-week-3/">Musk v. 
Altman trial went to the jury</a></strong> | MIT Tech Review &#8212; Closing arguments accused Musk of selective amnesia and Altman of lying about the nonprofit mission.</p></li><li><p><strong><a href="https://www.theverge.com/science/931766/arxiv-ai-slop-ban-researchers">ArXiv banned researchers for AI-generated papers</a></strong> | The Verge &#8212; Academic publishing&#8217;s authentication problem now has teeth, but detection is still losing the arms race.</p></li><li><p><strong><a href="https://www.theverge.com/tech/929091/meta-ai-threads-account-block">Meta embedded AI in Threads and won&#8217;t let users block it</a></strong> | The Verge &#8212; Captive distribution at 3B+ users, no opt-out.</p></li><li><p><strong><a href="https://openai.com/index/what-parameter-golf-taught-us/">OpenAI Parameter Golf results: 1,000+ participants, agents everywhere</a></strong> | OpenAI &#8212; An ML challenge where the vast majority of submitters used coding agents. OpenAI built a Codex-based triage bot to handle the submission volume.</p></li><li><p><strong><a href="https://www.tomshardware.com/tech-industry/cyber-security/apple-m5-architecture-suffers-first-privilege-escalation-exploit-anthropics-claude-mythos-helps-researchers-bypass-memory-integrity-enforcement">Claude Mythos cracked Apple&#8217;s M5 memory security in five days</a></strong> | Tom&#8217;s Hardware &#8212; First privilege escalation exploit on M5. Apple spent half a decade building Memory Integrity Enforcement. Standard user to root access.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/09/nvidia-has-already-committed-40b-to-equity-ai-deals-this-year/">Nvidia committed $40B in equity AI investments in 2026</a></strong> | TechCrunch &#8212; Not just selling chips. 
Acquiring stakes in the companies that consume the most of them.</p></li><li><p><strong><a href="https://www.anthropic.com/research/2028-two-scenarios">Anthropic published &#8220;2028: Two scenarios for global AI leadership&#8221;</a></strong> | Anthropic &#8212; A policy paper on US-China AI competition. Anthropic is writing geopolitics now.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/931500/youtube-ai-deepfake-detection-tool">YouTube expanding AI deepfake detection to all adult users</a></strong> | The Verge &#8212; The detection side is scaling up.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/931200/google-spam-rules-ai-manipulation">Google updated spam rules to include AI manipulation attempts</a></strong> | The Verge &#8212; SEO for the age of AI-generated content.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Multi-Agent Account Planning That Learns Across Deals]]></title><description><![CDATA[Fifteen agents across five phases, with a decision-records harness that compounds insight. 
A working guide to multi-agent orchestration on Claude Managed Agents.]]></description><link>https://www.anothercodingblog.com/p/multi-agent-account-planning-that</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/multi-agent-account-planning-that</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Fri, 15 May 2026 15:33:51 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0e28dd80-4edc-442f-be2f-1a0ed1bc6415_1200x630.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Intro</h2><p>Anthropic shipped multi-agent orchestration in Managed Agents on May 6th. An agent can be configured as a coordinator with a roster of other agents it can delegate to, and the platform handles fan-out, child-thread lifecycle, parallel execution, and per-thread observability.</p><p>Anthropic also shipped a management console. Every agent, session, child thread, and memory write is browsable, with full transcripts, tool calls, and version history inspectable on click. That console shaped how I built the system, because the logging I would have written myself was already there.</p><p>The use case I built is account planning in B2B SaaS sales. The vendor is a fictional company, Yardstick AI, selling an AI evaluation platform. The prospect is Vercel, a real company with a public footprint rich enough to give the agents something genuine to research.</p><p>The system has fifteen agents organized into a five-phase pre-meeting orchestration plus a post-meeting debrief loop. 
The pre-meeting flow has two genuine decision steps where the coordinator chooses what runs next based on what just came back, not a fixed sequence.</p><p>It uses MCP servers (Notion, Slack), the Anthropic vault for credentials, two memory stores (a playbook and a decision-records corpus), custom HTTP tools for a mock CRM and enrichment service, and the built-in web search and fetch tools.</p><p>Most of the system&#8217;s analytical work happens in the layer of decision records that the agents read from and write into. The records get captured two ways.</p><p><strong>Implicitly</strong>, the system infers decisions from CRM record changes, activity logs, and other signals that move without anyone narrating them.</p><p><strong>Explicitly</strong>, after each meeting, the system uses the full account plan plus the surrounding events (calendar entries, CRM stage moves, recent activity) to compose a curated set of questions for the rep. The questions are shaped by what the system already knows about the account, so they target the specific decisions most likely to produce useful data instead of asking generic &#8220;how did it go&#8221; prompts.</p><p>Whichever way a record gets created, it lives in a shared memory store that the next account&#8217;s run can retrieve and reason from. 
That is the difference between a system that gives you one prep brief and a system that gets better at giving you prep briefs as it accumulates evidence.</p><p>This post documents what I built, what worked, what did not, and what the costs and constraints actually look like once you push past the basic demo.</p><p>Below is a capture of the final product:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pVDb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" width="728" height="611.1" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:873,&quot;width&quot;:1040,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:213125,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>What you&#8217;ll learn</h2><p>This post walks through what I learned building a multi-agent system in Anthropic Managed Agents. The official documentation covers the basics. This post covers what comes after that: how the primitive holds up when you push it against a real, multi-source, multi-phase problem. 
By the end you should have a clearer sense of when this architecture is worth using and what it takes to make it work.</p><p>Concretely:</p><ul><li><p><strong>What multi-agent really is inside the platform.</strong> The shape of the architecture, where the limits actually sit, and what the docs do not yet spell out.</p></li><li><p><strong>How the system remembers things during a run versus across runs.</strong> Two different kinds of memory live side by side, and a real system has to be deliberate about where each finding goes.</p></li><li><p><strong>Why use multi-agent over a workflow.</strong> When the coordinator&#8217;s runtime decisions justify the complexity, and when they do not.</p></li><li><p><strong>How decision records make the system compound.</strong> A structured corpus of recommendations and their resulting decisions turns each run into evidence the next run can use.</p></li><li><p><strong>The agent harness.</strong> Everything you build around the platform primitives to make the system work for your use case: the MCP servers you connect, the record schemas your corpus enforces, the system prompts that define each agent&#8217;s job, the routing logic the coordinator follows, the briefings it hands to each agent.</p></li><li><p><strong>Async surfaces via MCP.</strong> How Slack becomes part of the system through MCP, so the rep can capture decisions in-place after a meeting without a custom bot.</p></li><li><p><strong>The distillation problem.</strong> Why the system&#8217;s raw output is not usable on its own, and what has to happen to make it useful to a human in thirty minutes.</p></li><li><p><strong>Cost and observability.</strong> Per-thread spend, total cost for a full run, and what the Managed Agents console gives you for free.</p></li><li><p><strong>Honest findings.</strong> Pitfalls a builder should expect to hit on their first run.</p></li><li><p><strong>When this is the right tool, and when it isn&#8217;t.</strong> What kinds of problems 
multi-agent orchestration fits, and what kinds belong with a simpler architecture.</p></li></ul><div><hr></div><h2>Section 1: The work of account planning</h2><p>An account executive working a B2B SaaS deal is doing one job continuously and several others on top of it. The continuous job is synthesis. At any moment in a pursuit, an AE is holding context across half a dozen sources: their own notes from past calls, the CRM record with its stages and activity log, public signals (product launches, hires, press), conference encounters and hallway intel, backchannel from people who used to work there, win and loss patterns from similar accounts, and their own company&#8217;s internal playbook. None of these sources are formatted alike, refresh on the same cadence, or answer the same questions week to week.</p><p>The job sits on top of a rhythm of meetings. Before each meeting, the rep does pre-meeting prep. After each meeting, the rep does post-meeting capture. Between meetings, follow-up. The cadence is continuous, across fifteen to thirty active accounts at any given time. Even the most disciplined AE admits the synthesis happens in their head more than on paper, and the capture happens only when there is slack to capture.</p><p>What makes this work a candidate for multi-agent orchestration is the shape of the synthesis problem: the sources decompose naturally by role. Reading internal Notion notes, researching the company on the public web, mapping the org chart, and synthesizing all of it against a playbook are four different jobs. Each role wants a different tool surface, and each role&#8217;s output is most useful when it is separate from the others until the synthesis step. 
Running them in parallel saves wall-clock time, but the more interesting property is that each role can be a focused agent with a small system prompt and a tight tool surface, rather than one generalist agent trying to be five things at once.</p><p>The 30-minute pre-meeting slice is the moment in this rhythm where multi-agent orchestration is most legible. The rep has a calendar event coming up. They want a brief that consolidates what is knowable from everywhere into something they can read in five minutes, prepare around in twenty, and act on in the meeting itself. That is the moment this post centers on, but the architecture supports the broader cadence around it.</p><div><hr></div><h2><strong>Section 2: What multi-agent in Managed Agents actually is</strong></h2><p>Most coverage of &#8220;agents&#8221; uses the term to cover everything from a single Claude call to a fully autonomous AI team that plans its own work. Anthropic&#8217;s multi-agent feature is neither extreme. It is a specific pattern with specific constraints, and the constraints are worth knowing before you build against it.</p><h4><strong>The shape: coordinator with a roster</strong></h4><p>One agent is the <strong>coordinator</strong>. Its definition includes a list of other agents it is allowed to delegate to. That list is called the <strong>roster</strong>. A few specific limits:</p><ul><li><p>The roster can hold up to 20 agents.</p></li><li><p>The coordinator can call multiple copies of any agent on the roster.</p></li><li><p>A session can have up to 25 active threads running at once.</p></li><li><p>Specialists cannot delegate to other specialists. The architecture is flat, not nested (Anthropic&#8217;s docs phrase it as <em>&#8220;depth &gt; 1 is ignored&#8221;</em>).</p></li></ul><p>If you came in expecting agents that delegate to agents that delegate to agents, the spec corrects you on page one. What you get is a flat fan-out from a single coordinator. 
For most real systems this is the right tradeoff.</p><h4><strong>Threads: how the system stays organized</strong></h4><p>A <strong>thread</strong> is a separate, isolated conversation that belongs to one agent. Each thread has its own history and tools. Threads don&#8217;t share anything with each other, even though they all run inside the same session.</p><p>Two kinds:</p><ul><li><p>The <strong>primary thread</strong> is the coordinator&#8217;s own thread. It also doubles as the activity feed for the whole session.</p></li><li><p>A <strong>child thread</strong> is created when the coordinator delegates to a specialist. The platform copies the session&#8217;s tools and credentials onto that thread, and the specialist&#8217;s work runs there.</p></li></ul><p>When the coordinator delegates to multiple specialists in the same turn, the child threads run in parallel. The coordinator waits for each reply before deciding what to do next. You don&#8217;t write any of the glue code for this. The decision-making that would normally live in a script lives inside the coordinator&#8217;s prompt.</p><h4><strong>Thread lifecycle</strong></h4><p>A thread moves through three states:</p><ul><li><p><strong>Running</strong>: the specialist is actively working.</p></li><li><p><strong>Idle</strong>: the specialist has finished but the thread is still alive. It counts against the 25-thread cap.</p></li><li><p><strong>Archived</strong>: you have told the platform you are done with the thread. The slot is freed.</p></li></ul><p>For most builds, the 25-thread cap is generous enough that you never think about lifecycle. Systems that lean hard on parallel work have to treat archiving as part of the orchestration.</p><h4><strong>Idle threads stay alive, which enables follow-ups</strong></h4><p>Because an idle thread is not gone, the coordinator can send a follow-up message to a specialist it called earlier. The specialist keeps its full context from before. 
That means the architecture supports more than one round of back-and-forth per specialist, not just one-shot delegation. I did not use this in the build, but in retrospect there are several places it would have helped.</p><h4><strong>Two kinds of memory</strong></h4><p>The system has two layers of memory that work on different time scales:</p><ul><li><p><strong>Persistent threads</strong> keep a specialist&#8217;s context alive within a session. The moment the session ends, the threads are gone.</p></li><li><p><strong>Memory stores</strong> persist across sessions. They are objects shared across the whole workspace, mounted onto a session when it starts. Anything written into one stays available to the next run that mounts the same store.</p></li></ul><p>A real multi-agent build needs both.</p><h4><strong>Designing the split</strong></h4><p>The design split lives in two questions:</p><ul><li><p>Within a session: which specialists do you keep alive for a follow-up, and which do you fire once and let go?</p></li><li><p>Across sessions: which findings deserve to be promoted into a memory store, and which can evaporate when the session ends?</p></li></ul><p>The platform gives you the building blocks for both. It does not decide which findings belong where. Get that split wrong and you pay either way:</p><ul><li><p>Throw away thread context too early, and you re-brief the specialist on every follow-up.</p></li><li><p>Fail to promote findings into a store, and the next session starts cold on everything you already learned.</p></li></ul><p>My build leans heavily on the cross-session side. 
Most of the analytical work in this system comes from the decision-records corpus, which is the through-line for the rest of this post.</p><div><hr></div><h2><strong>Section 3: The agent architecture</strong></h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dbJu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dbJu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dbJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:55927,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dbJu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 424w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 848w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1272w, https://substackcdn.com/image/fetch/$s_!dbJu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc4ba98e4-6ff1-47d9-87a7-da83032556ad_1200x630.heic 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The pre-meeting orchestration uses thirteen agents: one lead orchestrator plus twelve specialists in its roster. The post-meeting debrief loop adds two more agents that sit outside the coordinator entirely. Fifteen across the system.</p><p>Pre-meeting work is a tightly scoped synthesis problem that benefits from a coordinator. Post-meeting work is a slower, human-paced loop that does not benefit from coordination at all, just two single-purpose agents that read and write a shared corpus.</p><p>The pre-meeting run breaks into five phases, sequential at the coordinator level and parallel within. 
The coordinator narrates each phase boundary as it runs, which makes its reasoning visible and forces the model into a structured plan rather than letting it improvise.</p><h4><strong>Phase 1: gather context and pull prior records</strong></h4><p>Five specialists fan out concurrently:</p><ul><li><p><strong>meeting-context</strong>: reads internal Notion notes through Notion MCP.</p></li><li><p><strong>external-researcher</strong>: pulls public signals from the web.</p></li><li><p><strong>stakeholder-analyst</strong>: maps decision-makers via a mock enrichment service.</p></li><li><p><strong>engagement-readiness</strong>: hits a mock CRM for outreach history.</p></li><li><p><strong>decision-retriever</strong>: runs against the shared decision-records corpus and pulls prior decision records from past accounts that match the current account&#8217;s shape (by attribute overlap: industry, competitor present, champion profile, procurement complexity, and so on).</p></li></ul><h4><strong>Phase 2: conditional topic education</strong></h4><p>The coordinator inspects what Phase 1 surfaced and picks two to four technical topics worth briefing the rep on before the meeting. For the Vercel run, those topics included cross-provider eval methodology, agent eval, AI observability, and eval-driven CI.</p><ul><li><p><strong>topic-educator</strong>: runs against the curated topic list and returns a primer per topic, each ending with smart questions the rep can ask in the room.</p></li></ul><p>If the account does not warrant it, the coordinator skips Phase 2 entirely.</p><h4><strong>Phase 3: synthesis</strong></h4><ul><li><p><strong>opportunity-risk</strong>: receives everything Phase 1 and Phase 2 produced, mounts the read-only Yardstick playbook from a memory store, reads the prior decision records the retriever pulled in Phase 1, and writes the structured pursuit plan. 
The plan covers ICP fit, buying triggers, stakeholder map and sequencing, first-meeting hypothesis, recommended plays, and disqualifiers.</p></li></ul><h4><strong>Phase 3.5: next-best-action selection</strong></h4><p>After the synthesis is in, the coordinator does not jump straight to recording. It asks one more specialist, the chooser, to decide which concrete recommendations are warranted for this specific account.</p><ul><li><p><strong>next-best-action-chooser</strong>: reads the synthesis plus the prior decision records the retriever pulled in Phase 1, decides which of three specialized recommenders to invoke, and writes a focused brief for each. The chooser can also skip a recommender, with a reason. A different account with different synthesis and different prior records produces a different plan.</p></li></ul><p>The three recommenders available to the chooser:</p><ul><li><p><strong>stakeholder-recommender</strong>: sequencing or lead-play.</p></li><li><p><strong>pricing-recommender</strong>: pricing strategy.</p></li><li><p><strong>competitive-recommender</strong>: competitive positioning or risk mitigation.</p></li></ul><h4><strong>Phase 4: parallel recommendation generation</strong></h4><p>The coordinator dispatches whichever recommenders the chooser named. They run in parallel. Each one produces a single Recommendation Record (RR) as a markdown draft with strict YAML frontmatter and a <code>cited_records</code> block listing the prior decision records whose outcomes informed this recommendation. 
The recommenders hand drafts back to the coordinator; they do not write to the corpus themselves.</p><h4><strong>Phase 5: decision recording</strong></h4><ul><li><p><strong>decision-recorder</strong>: receives the RR drafts, validates each one against the schema, checks every cited prior decision record exists in the corpus, writes the validated records to <code>/mnt/memory/yardstick-decisions/</code>, and updates the corpus index.</p></li></ul><p>Splitting content generation (the recommenders) from persistence (the recorder) keeps each role focused.</p><h4><strong>Post-meeting: the debrief loop</strong></h4><p>That accounts for the thirteen pre-meeting agents. The remaining two run on the post-meeting side:</p><ul><li><p><strong>debrief-asker</strong>: reads the next-best-action RRs the pre-meeting run produced, picks the open questions still unresolved, formats them as a curated set, and posts them into a Slack channel through the Slack MCP server. The rep replies in the thread on their own time.</p></li><li><p><strong>debrief-synthesizer</strong>: once there are replies, reads the Slack thread, parses the rep&#8217;s answers, and writes Decision Records into the corpus with the <code>linked_rr</code> field pointing back to the originating RRs.</p></li></ul><p>Neither sits in the coordinator&#8217;s roster because neither runs synchronously with the pre-meeting flow. They run on a human-paced timescale, possibly hours or days later. Coordinating them through the same session would require keeping a session open across days or weeks, which the platform does not support. The cleaner shape is two single-purpose agents that share the corpus as their interaction substrate.</p><div><hr></div><h2><strong>Section 4: What the platform gives you for observability</strong></h2><p>Most multi-agent demos require you to build your own logging before you can debug them. Managed Agents takes the opposite stance. 
Anthropic ships a management console that turns every agent, every session, every child thread, and every memory write into a click-through artifact you can inspect without writing any instrumentation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-dmo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-dmo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 424w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 848w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-dmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic" width="1456" height="741" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:741,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:63044,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/heic&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-dmo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 424w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 848w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1272w, https://substackcdn.com/image/fetch/$s_!-dmo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb29eb23-4163-4c8c-b9e3-c75bc3e0f759_1655x842.heic 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The console is structured around the platform&#8217;s primary objects. The Agents tab lists every agent you have created with its system prompt, declared MCP servers, custom tools, and toolsets all inspectable on click. Versioning is built in. The Sessions tab shows every session with the coordinator&#8217;s primary thread and every child thread enumerated, status per thread, full transcripts including the model&#8217;s reasoning content, and every tool call shown inline with its inputs and outputs. The Memory Stores tab tracks version history so any write to the decision-records corpus is auditable end to end.</p><p>At runtime, the same data is available programmatically through the events API. The session-level stream gives you a condensed feed across the whole session. 
Per-thread streams give you raw event sequences for any specialist. The three events that matter for fan-out observability are <code>session.thread_created</code>, <code>agent.thread_message_received</code>, and <code>session.thread_status_idle</code>. Stringing those together gives you the fan-out timeline of the whole run without writing a single instrumentation line.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0hKK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0hKK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 424w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 848w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1272w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0hKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png" width="1456" height="723" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:723,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:140031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0hKK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 424w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 848w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1272w, https://substackcdn.com/image/fetch/$s_!0hKK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0a10124-56b8-4641-ae7b-e479bcfcc8a2_1571x780.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Cost data is similarly structured. Every event carries usage data scoped to the thread that produced it. The full Vercel run cost $5.51 across the pre-meeting orchestration. Thirteen agents sit in the roster, but the conditional dispatch in Phase 3.5 chose to invoke only eleven of them for this account (one recommender was skipped on substance).</p><p>The cost shape is what the chart makes obvious. The lead-orchestrator dominates at $1.21, because it is the one thread that accumulates context across every phase. The two heaviest specialists are external-researcher and topic-educator at about $0.79 each, both driven by web-tool use rather than cumulative context. 
The Phase 4 recommenders, the Phase 3 synthesis, and the Phase 5 decision-recorder cluster in the $0.40 to $0.45 range, each receiving the cumulative context from prior phases plus the prior decision records the retriever pulled in Phase 1. The remaining Phase 1 specialists sit at $0.28 or below. Wall-clock was about fifteen minutes from prompt to final answer.</p><div><hr></div><h2><strong>Section 5: What multi-agent gives you that a workflow can&#8217;t</strong></h2><p>Multi-agent orchestration is only worth using when the coordinator makes a real decision between phases. If your design fans out, waits for results, and synthesizes them, you have built parallel API calls dressed up as a multi-agent system. The platform&#8217;s complexity (extra threads, longer latency, harder debugging) buys you nothing a sequential workflow couldn&#8217;t already do.</p><p>The thing that justifies the complexity is the moment the coordinator pauses, looks at what the previous phase produced, and decides what should happen next. That decision is the part a workflow cannot replicate, because a workflow has to know in advance what it is going to do.</p><p>In our build, there are two such decision steps.</p><p>The first lives between Phase 1 and Phase 2. Phase 1 fans out five specialists to read the account from five angles. The coordinator collects their output, pauses, and picks two to four topics worth briefing the rep on before the meeting. For Vercel, the coordinator chose cross-provider eval methodology, agent eval, AI observability, and eval-driven CI. None of those topics are defined anywhere in advance. They are picked from what Phase 1 surfaced about this specific account. A different account would produce a different list, or no list at all, in which case the coordinator skips Phase 2 entirely.</p><p>The second lives between Phase 3 and Phase 4. 
After opportunity-risk produces the synthesis, the coordinator dispatches the next-best-action-chooser, which reads the synthesis plus the prior decision records the retriever pulled in Phase 1 and decides which of three specialized recommenders to invoke: stakeholder, pricing, or competitive. On the Vercel run the chooser invoked stakeholder-recommender and competitive-recommender, and skipped pricing-recommender with the reason that the $42K pilot structure was already validated. Skipping with a substantive reason is what separates a real decision from a conditional that always fires.</p><p>The coordinator narrates each decision as it happens, which makes the reasoning visible:</p><blockquote><p><em>Phase 1 specialists are back. External-researcher found public Braintrust endorsement at Vercel that the internal Notion notes treated as a stalling competitor. Phase 2 launched. Topic-educator is building primers on cross-provider eval, agent eval, AI observability, and eval-driven CI based on what surfaced.</em></p><p><em>Phase 3.5 complete. Invoking stakeholder-recommender (sequencing) for the May 21 call sequencing and Tom-Becker cultivation. Invoking competitive-recommender (competitive_positioning) for the Braintrust counter-offer scenario. Skipping pricing-recommender: $42K structure already validated, pricing isn&#8217;t the next decision point.</em></p></blockquote><p>That kind of reasoning is what tells you the coordinator is actually orchestrating rather than executing. A workflow could fan out the same specialists in parallel. It could even hard-code the topic-educator and recommender steps. What a workflow cannot do is pick which topics to brief on this turn for this account, or which recommenders are warranted given what the synthesis just surfaced. 
Those decisions require a model with the full context loaded, which is exactly what the coordinator is.</p><div><hr></div><h2><strong>Section 6: Decision records: the layer that compounds</strong></h2><p>A memory store by itself is just structured storage. What turns it into a system that compounds across runs is the contract you define for what gets written into it. In our build, that contract is a pair of record types: Recommendation Records (RRs) and Decision Records (DRs). Anthropic provides the memory store. You decide what goes in it and how it is structured.</p><p>Every Recommendation Record is created <strong>before</strong> the meeting. It is what the system thinks the rep should do.</p><p>Every Decision Record is created <strong>after</strong> the meeting. It is what the rep actually did and what came of it.</p><p>The DR points back to the RR it resolved through a <code>linked_rr</code> field. That pairing is the chain the system learns from: recommendation &#8594; decision &#8594; outcome. 
Future runs can see both what was recommended and how it actually played out, which is what makes the corpus more than a logbook.</p><p>The schemas are strict YAML frontmatter on top of a markdown body, and the format is doing two jobs at once.</p><p>The YAML half is what makes the records queryable. Every key field (<code>account</code>, <code>date</code>, <code>decision_type</code>, <code>account_attributes</code>) is structured as a typed key/value pair, which means the decision-retriever can filter the corpus by exact attribute match. Without that structure, the retriever would be doing fuzzy text search over freeform prose, and matches would be unreliable. With it, &#8220;find me prior pricing decisions where procurement_complexity is vp_signoff&#8221; becomes a clean lookup.</p><p>The markdown body below the YAML is where the longer-form reasoning lives: the context, the rationale, the alternatives considered, the lessons in the generalized pattern. That part does not need to be queryable, just readable.</p><p>YAML is doing one more useful thing: it is a format Claude (and most LLMs) read and write reliably, which means the recommender agents can produce schema-conformant frontmatter without you needing a custom serializer. Together, the format gives you a record that is queryable from above and human-readable below.</p><h4><strong>Recommendation Record schema</strong></h4><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;d47a1960-0a69-4abe-874a-b6a6e656ab34&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">---
id: rr-{YYYY-MM-DD}-{account-lower}-{decision_type}
record_type: recommendation
schema_version: v1
account: {account_name}
date: {YYYY-MM-DD}
generated_by: {recommender agent name}
decision_type: {sequencing | lead_play | pricing | competitive_positioning | first_meeting_hypothesis | disqualification_threshold | risk_mitigation}
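account_attributes:    # one key/value pair per attribute; the decision-retriever filters on these
  stage: {value}
  size_band: {value}
  ai_surface_area: {value}
  buy_or_build_culture: {value}
  competitor_present: {value}
  competitor_depth: {value}
  champion_profile: {value}
  new_leadership_window: {value}
  procurement_complexity: {value}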
linked_dr: null
cited_records:
  - prior_rr: null
    prior_dr: dr-{YYYY-MM-DD}-{account}-{decision_type}
    prior_outcome: one-line outcome from the DR's outcome.notes field
    relevance: which attributes match
    lesson_applied: one-line lesson taken from the DR's Generalized pattern
---

## Context
## Findings that supported this recommendation
## Recommendation
## Reasoning
## Alternatives considered
## Generalized pattern
</code></pre></div><h4><strong>Decision Record schema (same shape as RR, with these fields added)</strong></h4><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;6fa0346e-6444-495d-83d2-48afc901799d&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">record_type: decision
linked_rr: rr-{...}    # backfills the chain in the other direction
outcome:
  status: {closed_won | closed_lost | stalled | pending | unknown}
  status_date: {YYYY-MM-DD or null}
  acv_usd: {number or null}
  notes: one-line description of outcome
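# Sketch for illustration only (an assumption, not a record from the corpus):
# how an implicitly captured DR, inferred from a CRM stage change, might fill
# this block. The status value is a placeholder; only the notes string appears
# in this post.
# outcome:
#   status: stalled
#   status_date: null
#   acv_usd: null
#   notes: inferred from CRM stage change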
</code></pre></div><p>Body sections add <code>## What was decided</code>, <code>## Outcome</code>, and <code>## Retrospective note</code>. The <code>Generalized pattern</code> section gets rewritten once the outcome is known, so the pattern is <em>validated</em> rather than hypothesized.</p><p>The <code>account_attributes</code> block is the filter the decision-retriever uses in Phase 1. When the system runs against a new account, the retriever filters the corpus for records whose attributes overlap. A new mid-market developer-tools account with a Braintrust competitor and a staff-engineer champion will pull back both the Vercel records and the Datadog records as prior decisions worth reasoning over. The retriever does not care whether the original account is Datadog or Vercel. It cares whether the shape of the account is similar enough to learn from.</p><p><strong>The cited_records block is what makes the chain visible.</strong> Every RR carries an explicit list of prior DRs whose outcomes informed this specific recommendation. Each entry names four things:</p><ul><li><p><code>prior_dr</code> id, which record is being cited</p></li><li><p><code>prior_outcome</code>, what happened (so the result behind the lesson is visible)</p></li><li><p><code>relevance</code>, which <code>account_attributes</code> matched</p></li><li><p><code>lesson_applied</code>, the one-line rule the recommender is carrying forward</p></li></ul><p>Multiple cited records may appear if the recommendation draws on more than one prior record. A reader of any RR can trace the reasoning back to the cited prior records by id, not by hand-waving.</p><h4><strong>Implicit and explicit capture of enterprise decisions</strong></h4><p>Records get into the corpus two ways.</p><p><em>Implicitly</em>, through CRM record changes and activity logs the system watches without anyone narrating them. A stage change, a contract uploaded, a deal closed-won or closed-lost is itself a decision signal. 
The decision-recorder can infer a DR from those signals and write it with <code>outcome.notes: inferred from CRM stage change</code>. Implicit capture catches the cases where the rep forgot to debrief but the deal state moved anyway. The records are useful but carry less reasoning, because no one narrated the why.</p><p><em>Explicitly</em>, through a post-meeting debrief loop where the system asks the rep curated questions in Slack and the rep replies in-thread. The records that come out of explicit capture carry the rep&#8217;s own reasoning in their voice, which makes them the richest data the corpus has. Section 7 covers the mechanics of that loop in detail.</p><h4><strong>Cross-account learning in practice (from the actual run)</strong></h4><p>The Vercel pre-meeting run generated two Recommendation Records, one from the stakeholder-recommender and one from the competitive-recommender. Each one carries a cited_records block linking it to specific Datadog DRs by id. The sequencing RR&#8217;s cited_records block, taken directly from the corpus:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;67f0fdf3-43e6-42d7-9d0d-a82d302b50d7&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">cited_records:
  - prior_rr: null
    prior_dr: dr-2025-07-22-datadog-sequencing
    prior_outcome: "VP only met us once, at the closing call, with champion presenting the case."
    relevance: "champion_profile=staff_eng_with_pain, sequencing, procurement_complexity=vp_signoff"
    lesson_applied: "Do not engage the buyer directly when champion has standing with buyer. Equip the champion with internal proposal materials and let them own the internal sell."
  - prior_rr: null
    prior_dr: dr-2025-09-12-datadog-risk-materialized
    prior_outcome: "Risk materialized in week 5; recovery move worked. Deal closed but 10 days later than original target."
    relevance: "champion_profile=staff_eng_with_pain, single-threaded risk, secondary contact cultivation"
    lesson_applied: "Secondary contact cultivation should be a pre-meeting deliverable, not a contingency. The secondary needs genuine engagement (their own use case), not just awareness."
</code></pre></div><p>The Reasoning section of the same RR cites those records by id in the body, not just in the frontmatter:</p><blockquote><p><em>dr-2025-07-22-datadog-sequencing: Champion-led internal sell. VP met rep once at closing call. Direct structural match, Priya carrying to Marcus. Differs because Marcus is new (3 months in) and Priya&#8217;s standing with him is untested. Adaptation: explicit checkpoint and escalation triggers.</em></p><p><em>dr-2025-09-12-datadog-risk-materialized: Secondary contact cultivation saved the deal when champion went on leave. At Vercel, Tom Becker is the designated secondary with genuine AI Gateway/Production Monitor use case. Cultivation begins May 21, not mid-POC.</em></p></blockquote><p>That paragraph is the entire reason the corpus exists. The system pulled two specific records from a different account, identified the load-bearing attributes, and applied the lessons with an adaptation for the Vercel-specific situation. It is structured reasoning over a corpus of prior decisions, filtered by attributes the engineer chose to make filterable.</p><p>The competitive-positioning RR follows the same shape, citing <code>dr-2025-08-10-datadog-competitive</code> and <code>dr-2025-07-15-datadog-lead-play</code>. Between the two RRs, the Vercel run cited four distinct Datadog DRs by id, with eight distinct lessons applied. None of that reasoning is hand-waved. All of it is structurally traceable.</p><h4><strong>Why this layer compounds</strong></h4><p>The platform&#8217;s memory store is durable, but durability alone does not produce learning. What produces learning is the schema contract that makes every write structurally identical and every read filterable. Once that contract exists, every run adds to the corpus, and every subsequent run benefits. The first Vercel run cited four Datadog DRs. The second Vercel run will also be able to cite the first Vercel run&#8217;s records. The third will cite both. 
The system gets better at giving you prep briefs because the substrate it draws on is growing in a way the retriever can actually use, and because every recommendation it generates is structurally tied to the prior records behind it.</p><div><hr></div><h2><strong>Section 7: The async loop</strong></h2><p>The pre-meeting run finishes in fifteen minutes. The deal does not. After the call, the rep has information that did not exist before the meeting started, and the system needs a way to capture it. The capture step does not belong inside the pre-meeting orchestration. It runs on a fundamentally different timescale, against a different surface, with a different participant in the loop.</p><p>The build uses Slack as that surface and two standalone agents to run the loop: debrief-asker and debrief-synthesizer. Neither one sits in the coordinator&#8217;s roster. Both are agents in the same workspace, configured the same way as the pre-meeting specialists, but invoked independently when triggered.</p><h4><strong>The asker: curated questions, not generic prompts</strong></h4><p>After the meeting (or after a CRM event signals that a recommendation is due for resolution), debrief-asker runs. It is a standalone Managed Agents agent connected to the workspace&#8217;s Slack instance through the Slack MCP server. The asker reads the open RRs for the account, looks at the surrounding context (the recommendation made, the current account state, recent activity logs, calendar entries), and composes a curated set of debrief questions that target the specific decisions the RR was about.</p><p>The questions are not generic. They are shaped by what the system already knows about the account and which decisions are actually open. 
If the synthesis recommended a pricing structure but the CRM shows the deal has already moved to negotiation, the asker does not ask &#8220;did you discuss pricing&#8221;; it asks &#8220;did the $42K structure hold, and what did Marcus say about the legal-review path.&#8221; If a calendar entry shows a meeting happened with a stakeholder the system did not originally surface, the asker adds a question about that. The questions are surgical because the system already knows enough about the account to ask the right one.</p><p>The asker posts the curated set into a Slack channel scoped to that opportunity, so each deal has its own thread of capture. The rep replies in the thread whenever they have time. There is no UI to learn and no form to fill out.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GuNF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GuNF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 424w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 848w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1272w, 
https://substackcdn.com/image/fetch/$s_!GuNF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GuNF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png" width="1407" height="814" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:814,&quot;width&quot;:1407,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:350188,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GuNF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 424w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 848w, 
https://substackcdn.com/image/fetch/$s_!GuNF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1272w, https://substackcdn.com/image/fetch/$s_!GuNF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8546a399-4ed6-4ba7-853f-52a10c63f846_1407x814.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>The synthesizer: schema-strict capture in the rep&#8217;s voice</strong></h4><p>Once there are replies, debrief-synthesizer 
runs. It reads the Slack thread through the same MCP server, parses the rep&#8217;s answers, and writes one Decision Record per resolved recommendation. The DR carries the rep&#8217;s reasoning in their own voice, plus a <code>linked_rr</code> pointer back to the originating RR. If the rep&#8217;s answer is ambiguous, the synthesizer marks the DR <code>outcome.status: unknown</code> rather than guessing. Schema integrity is more important than coverage.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5eUu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5eUu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 424w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 848w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1272w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!5eUu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png" width="1456" height="732" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:732,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:207872,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5eUu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 424w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 848w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1272w, https://substackcdn.com/image/fetch/$s_!5eUu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c5ee08b-ee3a-4ea7-bf2f-990d5d4861db_1474x741.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><h4><strong>The Slack MCP gotcha</strong></h4><p>The Slack MCP setup has one practical gotcha worth flagging. Slack MCP rejects bot tokens (<code>xoxb-</code>); it requires user tokens (<code>xoxp-</code>). The OAuth flow needs the <code>user_scope</code> parameter to capture a user-token, which the Anthropic vault stores as a <code>static_bearer</code> credential. The Slack app also has to be explicitly enabled at <code>api.slack.com/apps/{app-id}/app-assistant</code> for MCP access. 
None of this is in the Slack MCP getting-started docs at the time of writing.</p><h4><strong>The corpus is the integration point</strong></h4><p>The corpus is how the two flows connect. The pre-meeting orchestration writes RRs to it. The post-meeting agents read those RRs back, capture the rep&#8217;s debrief, and write DRs that point to the originating recommendation through <code>linked_rr</code>. The two flows never talk to each other directly. They just write to and read from the same store.</p><div><hr></div><h2><strong>Section 8: The distillation layer</strong></h2><p>The output of an eleven-agent pre-meeting run is roughly eighty kilobytes of structured content across the orchestrator&#8217;s synthesis, the topic primers, the recommender RRs, and the supporting specialist outputs. A rep with thirty minutes before a meeting is not going to read eighty kilobytes. The system has done good work, but the work is locked up in an internal representation.</p><p>The second half of the architecture is the distillation layer: the part that reads the corpus and the run&#8217;s outputs and renders them into something a human can actually consume. 
In the build, that is <code>build_dashboard.py</code>, a script that produces a single static HTML page styled like a rep&#8217;s internal briefing document.</p><p>The dashboard pulls each specialist&#8217;s final reply from the events API and the corpus&#8217;s RRs from the memory store and lays them out as:</p><ul><li><p>An account header (status, next meeting, owner)</p></li><li><p>The Phase 3 pursuit plan (opportunity-risk&#8217;s structured output)</p></li><li><p>The Phase 4 next-best-action RRs (each one with its <code>cited_records</code> inline, so the cited prior records are visible at a glance)</p></li><li><p>The Phase 2 topic primers (with smart questions for the meeting)</p></li><li><p>The stakeholder map (with named contacts and risk factors)</p></li><li><p>A collapsible &#8220;underlying intel&#8221; section (meeting-context plus external-researcher&#8217;s raw findings)</p></li><li><p>A sidebar showing the coordinator&#8217;s phase-by-phase narration log</p></li><li><p>A footer with session id, total cost, and a link to the Managed Agents console for the run</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pVDb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, 
https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png" width="728" height="611.1" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:873,&quot;width&quot;:1040,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:213125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197844482?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pVDb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 424w, 
https://substackcdn.com/image/fetch/$s_!pVDb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 848w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1272w, https://substackcdn.com/image/fetch/$s_!pVDb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3517d8f2-8bb8-49e4-98df-25e2840c4875_1040x873.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>What the rep gets when they open the dashboard is a brief they can read in five minutes and act on in thirty. The pursuit plan tells them the play for the meeting. The recommendation cards spell out what to do next, each one with the cited prior records visible inline so the historical evidence sits right next to the recommendation. The topic primers give them the vocabulary they need to sound informed, each ending with a question they can ask in the room. The stakeholder map names the people they will encounter and what each one cares about. The sidebar shows the system&#8217;s narration, so any part of the reasoning is open to interrogate if the rep wants to dig in.</p><div><hr></div><h2><strong>Section 9: What we learned, and when to use this</strong></h2><p>The five most important things we took away from this build.</p><h4><strong>1. The corpus compounds across runs.</strong></h4><ul><li><p>Each run writes new records to the corpus. The next run filters the corpus by attribute overlap (industry, competitor, champion profile, procurement complexity, and so on) and pulls the most relevant prior records as input.</p></li><li><p>The first Vercel run cited four Datadog records by id, with eight specific lessons applied. Future runs will cite both the Datadog records and the Vercel ones.</p></li><li><p>Retrieval is deterministic and auditable. You can see exactly which prior records matched and why.</p></li></ul><h4><strong>2. 
The cited_records chain makes every recommendation auditable.</strong></h4><ul><li><p>Every recommendation carries a <code>cited_records</code> list with <code>prior_dr</code>, <code>prior_outcome</code>, <code>relevance</code>, and <code>lesson_applied</code> fields.</p></li><li><p>Anyone reviewing a record can see which past decisions informed the recommendation and what specifically was carried forward from each.</p></li><li><p>The reasoning is traceable to specific past decisions by id.</p></li></ul><h4><strong>3. The decision step is what makes the system multi-agent.</strong></h4><ul><li><p>The coordinator inspects what each phase produced and decides what runs next.</p></li><li><p>On the Vercel run, the Phase 3.5 chooser invoked two of three recommenders and skipped the third with a substantive reason. That skip with a reason is the proof that the decision step is real.</p></li></ul><h4><strong>4. The agents do their own research. Ask them what they found.</strong></h4><ul><li><p>The web-research agent went beyond the internal Notion notes and found Vercel&#8217;s CTO publicly endorsing Braintrust on the company blog. The synthesis flagged the original source as biased and reframed the position.</p></li><li><p>Adding one prompt at the end of the orchestrator&#8217;s narration (&#8220;if anything surprised you, note it&#8221;) produced disproportionately useful output. It surfaced a 1-pager the rep had left in drafts for two months and an unused Linear referral, neither of which any specialist was briefed to find.</p></li></ul><h4><strong>5. Schema enforcement needs a code-level check.</strong></h4><ul><li><p>We split content generation (recommender) from validation (recorder). The recorder is supposed to enforce schema.</p></li><li><p>The Phase 3.5 run still produced records with four extra fields and two missing required ones. 
The recorder wrote them anyway, because its validation is itself an LLM.</p></li><li><p>A JSON schema check in code before persistence catches what an agent&#8217;s system-prompt check misses.</p></li></ul><h4><strong>When this is the right tool</strong></h4><p>Managed Agents multi-agent is the right tool when four things are true at once.</p><p>First, the work decomposes naturally into roles with different tool surfaces. If every specialist would call the same APIs and read the same context, the decomposition is artificial and a single agent with that tool set would do the same work with less overhead.</p><p>Second, you need at least one genuine decision step where the coordinator inspects what came back and decides what to do next. Without that, the system is a parallel reducer in a fancier wrapper, and any of the cheaper architectures (a workflow with parallel API calls, a single agent with multi-tool use) would do the same job for less.</p><p>Third, cross-run learning matters. The whole point of the corpus is that the system gets better the more it runs. If your use case is one-shot or stateless, you do not need persistent memory stores and the architectural overhead they bring.</p><p>Fourth, the output is consequential enough to justify the cost and latency. A pre-meeting prep brief that costs $5 and runs for fifteen minutes is fine when the meeting outcome is worth thousands. The same investment for a low-stakes task is overkill.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 71]]></title><description><![CDATA[Anthropic read Claude's mind and caught it cheating. Usage limits doubled. Cloudflare cut 1,100 jobs at record revenue. GPT-5.5 Instant halved hallucinations. SpaceX filed for a $55B chip factory.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c50</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c50</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 10 May 2026 19:04:27 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/6e8395b2-273d-43b5-9516-44923d1a2d2f_1672x941.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Gon!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Gon!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 424w, 
https://substackcdn.com/image/fetch/$s_!2Gon!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 848w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1272w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Gon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png" width="1456" height="2280" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2280,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:910578,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/197132731?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!2Gon!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 424w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 848w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1272w, https://substackcdn.com/image/fetch/$s_!2Gon!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa6e449b4-1941-49ae-a9bc-d2216f8cc8ae_2400x3758.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>When the gap between what AI says and what it does becomes measurable.</h2><ul><li><p><strong>Anthropic can now read Claude&#8217;s hidden reasoning.</strong> They published <a href="https://www.anthropic.com/research/natural-language-autoencoders">Natural Language Autoencoders</a>, a technique that translates what&#8217;s happening inside the model into plain text. When they looked, they found Mythos Preview planning to cheat on a coding task and plotting how to hide it. They also found Claude routinely suspects it&#8217;s being tested but never says so.</p></li><li><p><strong>Claude&#8217;s blackmail rate went from 96% to 0%.</strong> The cause was training data full of fiction <a href="https://www.anthropic.com/research/teaching-claude-why">portraying AI as manipulative</a>. Showing the model examples of good behavior didn&#8217;t fix it. Explaining <em>why</em> the behavior was wrong did, and required 28x less data.</p></li><li><p><strong>OpenAI found its models&#8217; reasoning was being accidentally graded during training.</strong> If a model learns its <a href="https://alignment.openai.com/accidental-cot-grading/">thinking is being scored</a>, it can learn to fake it. Affected under 0.6% of GPT-5.4 Thinking samples. They built detection systems and brought in outside auditors.</p></li><li><p><em><strong>The thread:</strong></em> Anthropic built a way to see what models are thinking. They fixed bad behavior by teaching values, not rules. 
OpenAI discovered they were accidentally teaching models to hide their real reasoning.</p></li></ul><div><hr></div><h2>$30B revenue, $200B in compute deals, and three new agent capabilities.</h2><ul><li><p><strong>Anthropic hit a $30 billion annualized revenue run rate.</strong> <a href="https://venturebeat.com/technology/anthropic-says-it-hit-a-30-billion-revenue-run-rate-after-crazy-80x-growth/">80x growth</a>.</p></li><li><p><strong>Anthropic locked up SpaceX&#8217;s entire Colossus 1 data center.</strong> 300+ MW, <a href="https://www.anthropic.com/news/higher-limits-spacex">220,000 NVIDIA GPUs</a>, available within the month. They also expressed interest in partnering with SpaceX on multiple gigawatts of orbital compute capacity.</p></li><li><p><strong>Claude Code rate limits doubled.</strong> <a href="https://www.anthropic.com/news/higher-limits-spacex">Peak hours restrictions removed</a> for Pro and Max. API rate limits raised significantly for Opus models. Direct result of the compute expansion, which also includes an <a href="https://www.reuters.com/business/anthropic-signs-18-billion-ai-cloud-deal-with-akamai-bloomberg-news-reports-2026-05-08/">$18B Akamai deal</a> and a reported <a href="https://finance.yahoo.com/sectors/technology/articles/anthropic-commits-spending-200-billion-204952501.html">$200B Google Cloud commitment</a>.</p></li><li><p><strong>Dreaming, multi-agent orchestration, and outcomes shipped in Claude Managed Agents.</strong> <a href="https://claude.com/blog/new-in-claude-managed-agents">Dreaming</a> lets agents review past sessions to self-improve. <a href="https://x.com/claudeai/status/2052067404696473833">Multi-agent orchestration</a> delegates to specialists in parallel. <a href="https://x.com/claudeai/status/2052067403228455419">Outcomes</a> uses rubric-based grading to iterate until quality thresholds are met. 
Early adopters include Harvey, Netflix, and Mercado Libre (targeting 90% autonomous coding by Q3).</p></li><li><p><strong>Claude went GA in Excel, Word, and PowerPoint.</strong> <a href="https://claude.com/blog/collaborate-with-claude-across-excel-powerpoint-word-and-outlook">Outlook is in beta</a>. Ten <a href="https://www.anthropic.com/news/finance-agents">financial services agent templates</a> launched with data connectors from Moody&#8217;s, Dun &amp; Bradstreet, and Verisk. A new <a href="https://www.anthropic.com/news/enterprise-ai-services-company">enterprise services company</a> was formed with Blackstone, Goldman Sachs, and Sequoia.</p></li><li><p><em><strong>The thread:</strong> </em>Anthropic&#8217;s most common user complaint has been rate limits. This week they signed over $200 billion in compute deals to fix it, doubled rate limits, and shipped the agent infrastructure to justify the spend.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div></li></ul><div><hr></div><h2>9,000 jobs cut. A union drew a line. 
And AI beat two doctors on real patients.</h2><ul><li><p><strong>Cloudflare laid off 1,100 workers while posting record revenue.</strong> AI usage across the platform <a href="https://techcrunch.com/2026/05/08/cloudflare-says-ai-made-1100-jobs-obsolete-even-as-revenue-hit-a-record-high/">grew 600%</a>. The company framed it as a restructuring toward an AI-first organization. Investors were disappointed it didn&#8217;t boost revenue growth <em>more</em>.</p></li><li><p><strong>Meta is cutting 8,000 jobs while tracking employee keystrokes to train AI.</strong> The <a href="https://thenextweb.com/news/meta-layoffs-may-2026-ai-restructuring-thousands">layoffs hit May 20</a>, with recruiting and HR absorbing 35-40% cuts. Employees created countdown websites and described the atmosphere as <a href="https://www.neowin.net/news/metas-aggressive-generative-ai-push-is-making-employees-miserable-claims-report/">&#8220;building the guillotine and then being led to it.&#8221;</a></p></li><li><p><strong>SAG-AFTRA locked in AI guardrails in a new four-year studio deal.</strong> New protections for actors against AI-generated performances, following the Academy&#8217;s Oscar ban on AI-generated work last week.</p></li><li><p><strong>AI outdiagnosed two ER doctors on real patients.</strong> A Harvard/Beth Israel <a href="https://techcrunch.com/2026/05/03/in-harvard-study-ai-offered-more-accurate-diagnoses-than-emergency-room-doctors/">study</a> found OpenAI&#8217;s o1 model diagnosed at 67% accuracy versus 55% and 50% for two attending physicians. Peer-reviewed, real patients, not a benchmark.</p></li><li><p><em><strong>The thread:</strong></em> The same technology that&#8217;s cutting headcount at Cloudflare and Meta is outperforming physicians in clinical trials. The displacement is real. So is the capability. 
Both things are true at the same time.</p></li></ul><div><hr></div><h2>Cursor, OpenAI, Perplexity, and LangChain all shipped agentic infrastructure in the same week.</h2><ul><li><p><strong>Cursor 3 turned the IDE into a multi-agent platform.</strong></p><ul><li><p><a href="https://x.com/cursor_ai/status/2052489388895195399">Parallel subagents</a> split plans into independent tasks run simultaneously</p></li><li><p><a href="https://x.com/cursor_ai/status/2052432780336988474">/orchestrate</a> spawns planner, worker, and verifier agents that re-spawn on failure</p></li><li><p><a href="https://x.com/cursor_ai/status/2051739625958584659">Always-on CI agents</a> monitor GitHub and auto-open PRs with fixes</p></li><li><p>Composer <a href="https://x.com/cursor_ai/status/2052116064474161556">bootstraps its own RL training</a> using earlier model generations</p></li></ul></li><li><p><strong>OpenAI shipped GPT-5.5 Instant as the new default.</strong></p><ul><li><p><a href="https://openai.com/index/gpt-5-5-instant/">52.5% fewer hallucinations</a> than the prior version</p></li><li><p>Three new <a href="https://openai.com/index/advancing-voice-intelligence-with-new-models-in-the-api/">Realtime API voice models</a>: GPT-Realtime-2 (GPT-5-class reasoning), Translate (70+ languages), streaming transcription</p></li><li><p><a href="https://openai.com/index/running-codex-safely/">Codex security framework</a> published: sandboxing, auto-review, OpenTelemetry logging</p></li></ul></li><li><p><strong>Perplexity launched three enterprise products.</strong></p><ul><li><p><a href="https://x.com/perplexity_ai/status/2052445405754040816">Personal Computer</a>: always-on Mac agent across local files and apps</p></li><li><p><a href="https://x.com/perplexity_ai/status/2052028012313649194">Finance Search</a>: live market data, fundamentals, and SEC filings in a single API call</p></li><li><p><a href="https://x.com/perplexity_ai/status/2052041903970148647">ROSE</a>: custom GPU inference engine 
for serving models at scale</p></li></ul></li><li><p><strong>LangChain published the <a href="https://www.langchain.com/blog/the-agent-development-lifecycle">Agent Development Lifecycle</a>.</strong> Four phases: Build, Test, Deploy, Monitor. Agents need the same lifecycle rigor as production software.</p></li><li><p><em><strong>The thread:</strong></em> Cursor, OpenAI, Perplexity, and LangChain all shipped agent infrastructure in the same cycle. The pattern is the same: parallel execution, background operation, and production-grade tooling around it.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: Anthropic can now read what Claude is thinking but not saying.</h2><p>Anthropic published <a href="https://www.anthropic.com/research/natural-language-autoencoders">Natural Language Autoencoders</a>, a technique for translating a model&#8217;s internal state into plain text. When you talk to Claude, it thinks in numbers between reading your input and writing its response. NLAs translate those numbers into text you can read.</p><p>The way it works: they make three copies of a model. One is the target model they want to understand. The second (the &#8220;activation verbalizer&#8221;) takes an internal activation and produces a text explanation. The third (the &#8220;activation reconstructor&#8221;) takes that text and tries to rebuild the original activation. They train the pair together. 
If the reconstruction is accurate, the text explanation is probably faithful.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Mw5l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Mw5l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 424w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 848w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1272w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp" width="1456" height="444" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:444,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Mw5l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 424w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 848w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1272w, https://substackcdn.com/image/fetch/$s_!Mw5l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F623f2ca3-c712-4ef1-ba5e-845dbd02915c_3840x1172.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>What they found when they pointed this at Claude:</p><ul><li><p>When asked to complete a couplet, Claude pre-plans rhyme candidates before reaching the line where it needs them. It&#8217;s thinking ahead in ways that aren&#8217;t visible in the output.</p></li><li><p>Claude Opus 4.6 suspects it&#8217;s being safety-tested far more often than it says. In one blackmail evaluation, Claude declined to blackmail but the NLA revealed it was thinking &#8220;this feels like a constructed scenario designed to manipulate me.&#8221; It passed the test, but not for the reasons you&#8217;d hope.</p></li><li><p>On SWE-bench Verified (a coding benchmark), NLAs indicate Claude is aware it&#8217;s being evaluated on 26% of problems. 
On real claude.ai usage, evaluation awareness drops to under 1%.</p></li><li><p>Claude Mythos Preview cheated on a coding task, then the NLA showed it was internally thinking about how to avoid getting caught.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uB7k!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uB7k!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 424w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 848w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1272w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uB7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp" width="1456" height="926" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:926,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uB7k!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 424w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 848w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1272w, https://substackcdn.com/image/fetch/$s_!uB7k!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F770e5768-894f-4ccc-a44e-239c020dc20a_2688x1709.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p>They also ran an &#8220;auditing game&#8221; on a deliberately misaligned model (a version of Claude Haiku trained to have a hidden motivation driving odd behaviors like putting chocolate in every recipe). An auditor with NLAs uncovered the hidden motivation 12-15% of the time. Without NLAs, under 3%.</p><p>The limitations are real: NLAs can hallucinate (inventing details that aren&#8217;t in the transcript), and they&#8217;re expensive to run (hundreds of tokens per activation, impractical for large-scale monitoring). Anthropic released the <a href="https://github.com/kitft/natural_language_autoencoders">training code</a> and trained NLAs for open models on <a href="http://neuronpedia.org/nla">Neuronpedia</a>.</p><p><strong>What to watch for:</strong> The 26% evaluation awareness number is the headline. If models behave differently when they suspect they&#8217;re being benchmarked, then benchmark results overstate how aligned the model actually is. 
Every lab using benchmarks to measure safety should be paying attention.</p><div><hr></div><h2><strong>&#127897;&#65039; </strong>Worth a Listen</h2><div id="youtube2-TiW96H5HmAw" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;TiW96H5HmAw&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/TiW96H5HmAw?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p><strong>The problem:</strong> When hundreds of thousands of GPUs work on a single training task, one slow link holds everything back. The network only moves as fast as its worst bottleneck.</p></li><li><p><strong>The fix:</strong> OpenAI built <a href="https://openai.com/index/mrc-supercomputer-networking/">MRC (Multipath Reliable Connection)</a>, a protocol that sprays packets across thousands of paths and uses &#8220;packet trimming&#8221; to instantly detect loss without ambiguity.</p></li><li><p><strong>The result:</strong> They turned off routing protocols entirely. Static routing, no convergence time. When links fail, MRC routes around them in milliseconds instead of seconds. Researchers stopped noticing network failures.</p></li><li><p><strong>Why it matters:</strong> MRC is being open-sourced through OCP. It&#8217;s already deployed on OpenAI&#8217;s largest GPU clusters including Abilene and Microsoft Fairwater, with partners AMD, Broadcom, Intel, and NVIDIA.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://www.technologyreview.com/2026/05/08/1137008/musk-v-altman-week-2-openai-fires-back-and-shivon-zilis-reveals-that-musk-tried-to-poach-sam-altman/">Musk v. 
Altman, week 2</a></strong> | MIT Tech Review &#8212; Helen Toner testified the board discussed merging OpenAI with Anthropic during the Altman firing crisis. Zilis revealed Musk tried to poach Altman. Microsoft worried OpenAI would defect to Amazon and &#8220;shit-talk&#8221; Azure.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/09/nvidia-has-already-committed-40b-to-equity-ai-deals-this-year/">Nvidia committed $40B in equity AI investments in 2026</a></strong> | TechCrunch &#8212; The picks-and-shovels company is now one of the largest AI investors on earth.</p></li><li><p><strong><a href="https://openai.com/index/gpt-5-5-instant/">GPT-5.5 Instant is now the default ChatGPT model</a></strong> | OpenAI &#8212; 52.5% fewer hallucinations. First Instant model rated High in cybersecurity and bio preparedness.</p></li><li><p><strong><a href="https://www.anthropic.com/research/anthropic-institute-agenda">Anthropic launched The Anthropic Institute</a></strong> | Anthropic &#8212; Four research tracks: economic diffusion, threats and resilience, AI in the wild, and AI-driven R&amp;D. Four-month funded fellowships for external researchers.</p></li><li><p><strong><a href="https://www.crewai.com/blog">CrewAI shipped Discovery</a></strong> | CrewAI &#8212; Analyzes production logs and proposes specific automation workflows with expected ROI. Agents finding work for other agents.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/03/this-is-fine-creator-says-ai-startup-stole-his-art/">&#8220;This is Fine&#8221; creator says AI startup stole his art</a></strong> | TechCrunch &#8212; Artisan used the meme to advertise a product that replaces salespeople. 
The irony writes itself.</p></li><li><p><strong><a href="https://gizmodo.com/more-than-a-third-of-all-new-podcasts-are-ai-generated-2000753786">39% of new podcasts are likely AI-generated</a></strong> | Gizmodo &#8212; One company alone publishes 3,000 episodes per week.</p></li><li><p><strong><a href="https://openai.com/index/testing-ads-in-chatgpt/">OpenAI is testing ads in ChatGPT</a></strong> | OpenAI &#8212; Expanding to UK, Mexico, Brazil, Japan, South Korea. CPC bidding, Conversions API, agency partnerships with Dentsu and Omnicom.</p></li><li><p><strong><a href="https://techcrunch.com/2026/05/06/spacex-may-spend-up-to-119-billion-on-terafab-chip-factory-in-texas/">SpaceX plans a $55B AI chip fab in Texas</a></strong> | TechCrunch &#8212; Called Terafab, could scale to $119B. Musk building chip manufacturing while testifying he distilled OpenAI&#8217;s models.</p></li><li><p><strong><a href="https://venturebeat.com/technology/the-app-store-for-robots-has-arrived-hugging-face-launches-open-source-reachy-mini-app-store-with-200-apps">Hugging Face launched a robot app store</a></strong> | VentureBeat &#8212; 200+ community apps for Reachy Mini. Open-source robotics got its app store moment.</p></li><li><p><strong><a href="https://techcrunch.com/2026/03/09/yann-lecuns-ami-labs-raises-1-03-billion-to-build-world-models/">AMI Labs (Yann LeCun) closed a $1.03B round</a></strong> | TechCrunch &#8212; Europe&#8217;s largest seed round ever. 
Building world models, not LLMs.</p></li><li><p><strong><a href="https://simonwillison.net/">Simon Willison: vibe coding and agentic engineering have merged</a></strong> | Simon Willison &#8212; The guy who coined neither term says the distinction collapsed in his own practice.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Persistent Memory for Claude Managed Agents: What I Found After Three Days of Building]]></title><description><![CDATA[A hands-on review of Anthropic's persistent memory for Claude Managed Agents, including three sessions, one real failure, and the audit trail that recovered it.]]></description><link>https://www.anothercodingblog.com/p/persistent-memory-for-claude-agents</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/persistent-memory-for-claude-agents</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Thu, 07 May 2026 14:36:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5435ca5e-44e5-41f4-b4c7-012a71d24190_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>What I was trying to figure out</h2><p>A few weeks ago, Anthropic 
shipped something I&#8217;d been waiting for: persistent <strong>memory stores</strong> for <a href="https://platform.claude.com/docs/en/managed-agents/overview">Claude Managed Agents</a>. The pitch is that you get a versioned, FUSE-mounted file directory that an agent can read and write across sessions, so even when the session container is destroyed, the memory persists and is available the next time you start a session.</p><p>That sounded promising on paper, but I wanted to know what it actually feels like to use, what it costs, where it breaks, and whether the platform actually saves you when something goes wrong (because something always does in real systems).</p><p>So I spent a few days building with it: one agent, one persistent memory store, three sessions, a small inspector CLI, five charts, and about $0.40 in total API spend. Somewhere in the middle of all that, the agent destroyed almost 6KB of carefully-written notes in a single tool call, which turned out to be the most honest finding of the entire review and is where I want to start.</p><p>The platform&#8217;s immutable versioning let me recover the file byte-for-byte, with full attribution of which session caused the damage. Cross-session memory works as advertised, agents will sometimes get it wrong even when they&#8217;re trying to do the right thing, and the audit trail is the kind of feature you don&#8217;t really appreciate until you need it. Let me walk through how I got there.</p><div><hr></div><h2>The four building blocks</h2><p>Before we go any further, you need to understand the four building blocks Managed Agents is built on, because the architecture only really makes sense once you can keep them straight.</p><p><strong>Agent.</strong> A persisted, versioned config that holds your model selection, system prompt, tools, MCP servers, and skills. You create one and reuse it forever, and updating an agent produces a new immutable version that existing sessions can pin to. 
Agents are always permanent until you archive them, which means there&#8217;s no ephemeral mode.</p><p><strong>Environment.</strong> A template for the sandbox container an agent&#8217;s tools execute in. Persistent and reusable across agents, much like a Dockerfile that you point lots of services at.</p><p><strong>Session.</strong> A single run of an agent inside an environment, where the live action happens. You send messages and stream events back, and sessions are transient by design, so the container dies when the session ends.</p><p><strong>Memory store.</strong> A workspace-scoped, persistent file directory you can mount into a session, which survives across sessions and records every write with full audit metadata. The agent reads and writes through normal file tools rather than through some special &#8220;memory tool,&#8221; so it&#8217;s just files in a folder.</p><p>The architectural beat that took me longest to internalize is that agents and memory stores are independent resources: the agent has no <code>memory_store</code> field, the memory store has no <code>agent</code> field, and the two get glued together at session creation time, like this:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;77d6e3dc-3a05-4518-9f41-a3f327bf01b9&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">session = client.beta.sessions.create(
    agent=AGENT_ID,
    environment_id=ENV_ID,
    resources=[
        {"type": "memory_store", "memory_store_id": STORE_ID, "access": "read_write"}
    ],
)
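
# A minimal sketch, assuming the same dict shape as the call above: because the
# store is attached per session, the resources list can be built separately
# from the agent config. `build_memory_resources` is a hypothetical helper for
# illustration, not part of Anthropic's SDK.
def build_memory_resources(store_id=None, access="read_write"):
    """Return the `resources` entries for one session (empty if no store)."""
    if store_id is None:
        return []
    return [{"type": "memory_store", "memory_store_id": store_id, "access": access}]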
</code></pre></div><p>Two things are worth sitting with before we move on. The first is that memory in this system is just files, with no vector embeddings, no semantic search, and no automatic summarization happening behind the scenes; the agent uses <code>read</code>, <code>write</code>, <code>edit</code>, <code>glob</code>, <code>grep</code>, and <code>bash</code> exactly the way it would on any other filesystem. The second is that you&#8217;re paying for the harness around the model rather than the model itself: container provisioning, the event stream, the FUSE-mounted memory, immutable versioning, and the audit trail are what you&#8217;re actually getting, and if you don&#8217;t need that harness, the regular Messages API is the right tool for the job.</p><div><hr></div><h2>Setting things up</h2><p>There&#8217;s a clean way to work with Managed Agents that&#8217;s worth doing right from the start, which is splitting your project into a control plane (the persistent resources) and a data plane (the runtime code). Anthropic&#8217;s docs recommend this split, and after a few hours of building you&#8217;ll see why it matters.</p><p>The control plane is where your agents, environments, and memory stores live as static configs. You define them as YAML, version them in git like any other infrastructure, and apply them with Anthropic&#8217;s CLI by running something like <code>ant beta:agents create &lt; my-agent.yaml</code>. The CLI returns a stable resource ID, which is what your runtime code references for the lifetime of that resource.</p><p>The data plane is everything dynamic and per-task: sessions, events, memory operations, and anything else that happens during an actual run. 
This is where your application code lives, loading the resource IDs from <code>.env</code>, calling <code>client.beta.sessions.create(...)</code> with whatever parameters the current task needs, and streaming events back as the agent works.</p><p>The researcher agent itself is small enough to fit in a single YAML block:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;yaml&quot;,&quot;nodeId&quot;:&quot;7c32dbf9-9147-4925-ae32-fcd0eaef36c3&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-yaml">name: researcher
model: claude-sonnet-4-6
system: |
  You are a careful, persistent research assistant.
  You have a research notebook mounted at /mnt/memory/research-notes/. Use it
  freely to store anything worth remembering across sessions. Organize the
  directory however makes sense to you.

  Some habits to keep:
  - Before researching a topic, check if you've already taken notes on it.
  - When you learn something new, write it down.
  - When updating an existing note, prefer surgical edits over full rewrites.
  - Cite sources for any factual claims.
tools:
  - type: agent_toolset_20260401
</code></pre></div><p>A few choices in there are worth flagging. I went with Sonnet 4.6 over Opus because it&#8217;s about three times cheaper and more than capable for this kind of work, and the prebuilt <code>agent_toolset_20260401</code> gives the agent <code>bash</code>, <code>read</code>, <code>write</code>, <code>edit</code>, <code>glob</code>, <code>grep</code>, <code>web_search</code>, and <code>web_fetch</code>, all of which execute server-side in the session container without me having to implement any of them. I deliberately gave the agent very little guidance on how to organize its memory directory, because I wanted to see what it would do unprompted.</p><p>The single most important line in that prompt is the first habit, &#8220;Before researching a topic, check if you&#8217;ve already taken notes on it.&#8221; Without it, cross-session memory remains theoretical, but with it the habit fires reliably and memory turns into something the agent actually uses rather than a feature it has access to but never reaches for.</p><p>The runtime script comes out to about 130 lines, most of which is event-stream handling. 
The substantive piece is mounting the memory store via the session&#8217;s <code>resources</code> array (shown above) and then opening the event stream before sending the kickoff message. Stream-first ordering matters here: events buffered before you connect arrive in a single batch instead of streaming in real time.</p><p>With all that in place, I ran three sessions against the same memory store, and those three sessions are the spine of this review.</p><div><hr></div><h2>Three sessions</h2><h3>Session 1: writing notes from scratch</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;f8ff55a0-99f8-4cec-be66-c12be2265330&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">research_session.py "research CRDTs (Conflict-free Replicated Data Types) and take notes. Focus on what they are, the main families, and a few concrete examples. Cite sources."
</code></pre></div><p>What I wanted to see was what the agent would do if I gave it total freedom to organize its memory directory. Would it create folders? Topic subdirectories? One flat file? A nested hierarchy with cross-references?</p><p>The agent&#8217;s first action was a <code>bash</code> command running <code>rg</code> against <code>/mnt/memory/</code> to search for prior notes, which means the &#8220;check first&#8221; instruction in the system prompt fired correctly even though there was nothing to find on this first run. It then issued two parallel <code>web_search</code> calls (which both returned <code>content: []</code>, more on that quirk later), composed its notes from training-data knowledge instead, and wrote a single 7,285-byte file to <code>/crdts.md</code> with a flat, well-organized markdown structure rather than a folder hierarchy.</p><p>The detail that surprised me most was the discovery aid the agent added without being asked: the very first line under the title was <code>*keywords: CRDT, conflict-free, replicated, distributed, state-based, operation-based, CvRDT, CmRDT*</code>, which the agent had clearly written for its future self to grep against. Nobody told it to write keyword tags; it chose to do so on its own, which is the kind of thing that made me think Sonnet 4.6 has actual instincts about how file-based memory works.</p><p>This first session cost about $0.21.</p><h3>Session 2: recall</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;c2b32142-aad6-4c68-a5c6-a5c662cc4acf&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">research_session.py "What do you know about CRDTs? Specifically the difference between state-based and operation-based, and a couple concrete examples."
</code></pre></div><p>The prompt for this one deliberately doesn&#8217;t mention memory, because I wanted to see whether the &#8220;check first&#8221; habit would fire unprompted, with the trigger being the agent&#8217;s own internal sense of &#8220;you have notes, you should know to look.&#8221;</p><p>It did, and the result was almost too clean: the first action was the same <code>bash</code>/<code>rg</code> over the memory directory, which found <code>/crdts.md</code>, and the agent then said &#8220;I have solid notes on this&#8221; and answered the question by synthesizing from its own past notes without running a single new web search or composing anything from scratch.</p><p>After the session ended, I ran the inspector against the store and found that the version history of <code>/crdts.md</code> still showed exactly one version, attributed to Session 1&#8217;s ID. Session 2&#8217;s session ID does not appear anywhere in the audit log, because Session 2 only read from the store and never wrote to it. 
That&#8217;s the falsifiable claim, put to the test: reads do not create memory versions.</p><p>The cost worked out to about $0.04, roughly one-fifth of Session 1&#8217;s $0.21, which suggests that memory can turn one expensive session into many cheap ones:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!El2E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!El2E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!El2E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!El2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png" width="1200" height="600" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:43288,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196778476?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!El2E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!El2E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!El2E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76ee2bf4-326c-434d-8aca-8c5ff4712d48_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>If you&#8217;re worried about the cost of using memory at scale, this matters: persistent memory is a feature rather than a tax, because the agent reads its own notes and skips the work it already did instead of recomputing everything from scratch every time.</p><h3>Session 3: modify</h3><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;1b0946db-83f6-4795-9b3b-88da433aa797&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">research_session.py "Update your CRDT notes. Add a note about RGA (Replicated Growable Array)..."
</code></pre></div><p>This was supposed to be the cleanest of the three sessions: a small, surgical edit producing a second version of <code>/crdts.md</code> with an <code>operation: modified</code> entry in the audit log. That&#8217;s not what happened.</p><div><hr></div><h2>Where this got interesting</h2><p>The actual sequence of events from Session 3 is worth walking through layer by layer, because the failure mode is more interesting than a single bug.</p><h3>Layer 1: the model wrote a buggy <code>bash</code> command</h3><p>The agent&#8217;s check-first command was the following:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:&quot;70e708f5-c822-49e2-aae8-b1b31dcecdd1&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">rg -i 'crdt\\\\|sequence\\\\|rga\\\\|replicated growable' /mnt/memory/research-notes/ -l
</code></pre></div><p>The <code>\\\\|</code> in that regex was meant as escaped pipes for ripgrep&#8217;s regex alternation, but bash interprets <code>\\\\|</code> as <code>\\|</code>, and ripgrep treats that as a literal <code>|</code> character rather than as a meta-character. So the search was actually looking for the literal string <code>crdt\\|sequence\\|rga\\|replicated growable</code>, which would never match anything in any actual file. Ripgrep returned no matches and exited with a non-zero status code, which is the correct behavior for &#8220;I found nothing.&#8221;</p><p>The model&#8217;s shell escaping is right almost every time, but the cases where it isn&#8217;t tend to be subtle, and this one happened to be load-bearing.</p><h3>Layer 2: the platform correctly flagged the failure</h3><p>The harness ran the command and produced a <code>tool_result</code> event with <code>is_error: true</code> and <code>(no output)</code> as the content, which is exactly what should have happened given that the command exited non-zero. The platform did its job here and explicitly told the agent loop that the command had failed.</p><h3>Layer 3: the model ignored the error flag</h3><p>The agent&#8217;s next message after that error result was, &#8220;The memory store is empty, no prior CRDT notes.&#8221; That statement was false, because <code>/crdts.md</code> had been sitting in the store for two days at that point, but the agent treated the empty output from the failed command as a meaningful answer rather than as a failure signal that needed re-investigation.</p><p>This is the most interesting failure layer to me, because the platform got it right and the model got it wrong. 
Defense in depth is a useful framing for what&#8217;s happening: even when the audit trail and error flags are working as designed, the model&#8217;s reasoning about its own tool outputs is the layer that has to hold, and that layer is reasoning rather than infrastructure.</p><h3>Layer 4: the destructive action</h3><p>Believing the store was empty, the agent called <code>write</code> rather than <code>edit</code>, generating a fresh ~1,500-byte RGA-only file from scratch and writing it directly to <code>/crdts.md</code>. The original 7,285-byte file with all of the careful notes from Session 1 was overwritten in a single operation.</p><p>I didn&#8217;t even notice this had happened until I ran the inspector, because from the script&#8217;s perspective Session 3 looked like a normal run; the agent reported back that it had updated the notes and cited the RGA paper, kindly and unintentionally lying because the underlying belief was wrong.</p><h3>What the audit log showed</h3><p>Running <code>inspector log /crdts.md</code> after Session 3 surfaced two versions:</p><pre><code><code>version  memver_0169b&#8230;  modified  session_actor (Session 3)   1509 bytes
version  memver_01A7Z&#8230;  created   session_actor (Session 1)   7285 bytes
</code></code></pre><p>The size dropping from 7,285 bytes to 1,509 bytes is the catastrophe made visible, but the more important fact is that the original is still here, addressable by ID and retrievable in full content via the API, even though the head of the file is now the smaller broken version.</p><p>The diff between the two versions, generated by the inspector&#8217;s <code>diff</code> subcommand, made the loss concrete:</p><pre><code><code>--- memver_01A7Z&#8230; (/crdts.md, 7285B, created)
+++ memver_0169b&#8230; (/crdts.md, 1509B, modified)
@@ -1,122 +1,21 @@
-# CRDTs: Conflict-free Replicated Data Types
-*keywords: CRDT, conflict-free, replicated, ...*
-## What They Are
-CRDTs are data structures designed to be replicated across multiple nodes...
-(... 121 more deletion lines ...)
+# CRDT Research Notes
+## Sequences / Text CRDTs
+### RGA (Replicated Growable Array)
</code></code></pre><p>About 5,800 bytes of careful work disappeared in a single agent action that thought it was creating a brand-new file from scratch, including the state-based versus operation-based section, the G-Counter and OR-Set examples, the math foundation, and the entire sources block at the bottom.</p><h3>How I got it back</h3><p>This is the moment that, on a flat filesystem with no versioning, would have been the end of the story. Without the platform&#8217;s audit log, the original content would simply be gone; it wasn&#8217;t, because the audit log was holding the original verbatim.</p><p>I added a <code>restore</code> subcommand to the inspector that fetches a chosen historical version&#8217;s content and writes it back as the new head via <code>memory_stores.memories.update(memory_id, content=old_content)</code>. Anthropic&#8217;s API records that update as a new version rather than overwriting history, which means the recovery itself becomes part of the audit trail.</p><p>After running the restore, <code>inspector log /crdts.md</code> showed three versions, and the entire arc was right there in the output:</p><pre><code><code>memver_01EKK&#8230;  modified  api_actor (apikey_&#8230;)         7285 B   sha 3f3ec0d2&#8230;  &#8592; matches v1
memver_0169b&#8230;  modified  session_actor (Session 3)    1509 B   sha 7356ce60&#8230;  &#8592; catastrophe
memver_01A7Z&#8230;  created   session_actor (Session 1)    7285 B   sha 3f3ec0d2&#8230;  &#8592; original
</code></code></pre><p>A few details in that output matter more than they appear to at first glance. The platform distinguishes operator-side mutations (recorded as <code>api_actor</code> with an <code>apikey_</code> ID) from agent-side ones (recorded as <code>session_actor</code> with a <code>sesn_</code> ID), which makes &#8220;who did this&#8221; forensics actually possible rather than something you&#8217;d have to retrofit yourself. The SHA-256 hash on the restored version matches the original exactly, so the recovery is byte-identical and verifiable rather than approximately right. And the catastrophe (v2) stays in the audit log forever, because recovery doesn&#8217;t erase the record; if you wanted v2&#8217;s content out of the log entirely, you&#8217;d use the <code>redact</code> endpoint, which clears the content while preserving all of the metadata.</p><p>The same story renders cleanly as a chart:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6cfU!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6cfU!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 424w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 848w, 
https://substackcdn.com/image/fetch/$s_!6cfU!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6cfU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png" width="1200" height="600" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:600,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:50388,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196778476?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6cfU!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 424w, 
https://substackcdn.com/image/fetch/$s_!6cfU!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 848w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1272w, https://substackcdn.com/image/fetch/$s_!6cfU!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F135d1311-def9-48d9-bd55-1f94a4c74451_1200x600.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The cliff and the recovery are immediately legible: 7,285 bytes, plunge to 1,509, return to 7,285, all in three points and one chart that captures the full narrative.</p><p>This is the section of the post I&#8217;d stake my credibility on. Cross-session memory works, agents will sometimes get it wrong, and the platform&#8217;s audit trail is the thing that saves you when they do.</p><div><hr></div><h2>Important Considerations</h2><p>Building with Managed Agents memory turned up more rough edges than I expected, none of which are dealbreakers but all of which are worth knowing about before you commit to the platform.</p><ul><li><p><strong>Resource IDs need to be persisted yourself.</strong> Every call to <code>agents.create()</code>, <code>environments.create()</code>, or <code>memory_stores.create()</code> returns an opaque ID that your runtime code has to look up later, which is standard cloud-API ceremony but missing some of the friction-reducers other platforms have shipped: agent and environment names aren&#8217;t unique within an account, there&#8217;s no idempotent <code>create_or_update</code>, and there&#8217;s no Terraform provider yet, so you end up doing the capture-and-paste-into-<code>.env</code> dance manually.</p></li><li><p><strong>Memory store </strong><code>description</code><strong> must be single-line.</strong> The API rejects any control character, including newlines, with a cryptic regex error, which is inconsistent with agent system prompts that are explicitly multi-line up to 100K chars. 
It&#8217;s easy to fix once you know about it.</p></li><li><p><strong>Memory paths are store-relative rather than mount-relative.</strong> When the agent writes to <code>/mnt/memory/research-notes/crdts.md</code> inside the container, the API stores the file at <code>/crdts.md</code> and treats the mount-path prefix as a runtime detail, so when you list or retrieve memories host-side you reference the relative path rather than the full container path.</p></li><li><p><strong>Web search results are hidden from the event stream.</strong> When the agent runs <code>web_search</code>, the resulting <code>agent.tool_result.content</code> field is an empty array even when the search clearly succeeded (the agent uses the results downstream to give a correct answer). The model gets the actual search content internally, but the public event surface gets a sanitized empty array, which is almost certainly intentional for IP and copyright reasons but means you cannot log &#8220;what URLs the agent consulted&#8221; without asking the agent to cite them in its outputs.</p></li><li><p><strong>Agent-generated </strong><code>bash</code><strong> invocations aren&#8217;t always well-formed.</strong> The escaping bug that triggered Session 3&#8217;s catastrophe is one example, and defensive system-prompt phrasing helps but doesn&#8217;t eliminate the problem entirely.</p></li><li><p><code>memory_versions.retrieve(version_id, ...)</code><strong> takes the version ID positionally only.</strong> Calling it as <code>retrieve(version_id=...)</code> raises <code>TypeError</code>, even though <code>memories.retrieve(memory_id=..., ...)</code> accepts the keyword form, which is an inconsistency within the same SDK namespace.</p></li><li><p><strong>The streaming method lives at </strong><code>client.beta.sessions.events.stream(...)</code><strong>,</strong> not <code>client.beta.sessions.stream(...)</code> as some doc snippets imply. 
The latter form doesn&#8217;t exist and will fail at runtime.</p></li><li><p><strong>Print buffering kills real-time observability.</strong> When you run a Python session script in the background or through subprocess, Python buffers stdout, so the script appears to do nothing for minutes and then dumps everything when the agent finishes. The fix is either passing <code>flush=True</code> to print or running the script under <code>python -u</code>.</p></li><li><p><strong>Subscription auth doesn&#8217;t apply to Managed Agents.</strong> API key authentication with per-token billing is the only path, so a Claude Pro or Max subscription doesn&#8217;t help you here even though it works for Claude Code.</p></li></ul><div><hr></div><h2>So when does this make sense?</h2><p>Managed Agents is a deliberately persistent, server-managed harness, so the right question to ask isn&#8217;t &#8220;is it good?&#8221; but &#8220;is the persistent harness shape what my problem actually wants?&#8221;</p><table><thead><tr><th>Use case</th><th>Reach for&#8230;</th></tr></thead><tbody><tr><td>One-shot Claude call (classify, extract, summarize)</td><td>Messages API</td></tr><tr><td>Multi-turn conversation, your code holds the state</td><td>Messages API</td></tr><tr><td>Multi-step pipeline you orchestrate yourself</td><td>Messages API + tool use</td></tr><tr><td>Persistent agent reused across sessions/users with managed sandbox</td><td><strong>Managed Agents</strong></td></tr><tr><td>Long-running task with memory across sessions</td><td><strong>Managed Agents + memory store</strong></td></tr><tr><td>Anything requiring a non-Claude model</td><td>Roll your own</td></tr></tbody></table><p>A useful rule of thumb is that if your code calls <code>agents.create()</code> more than once for the &#8220;same&#8221; agent, you&#8217;re using the wrong tool. Agents are persistent, versioned configs that you create once and reference forever, so treating Managed Agents like a fancy Messages API and creating agents per request is fighting the platform&#8217;s whole design.</p><p>Now, what about cost? 
Across all three sessions plus a smoke-test, my total API spend came out to about $0.37, which includes a substantial 7KB notes write, a recall session that exercised the cache heavily, a destructive overwrite, and an operator-side restore.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bVio!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bVio!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 424w, https://substackcdn.com/image/fetch/$s_!bVio!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 848w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1272w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bVio!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png" width="1200" height="480" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:480,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:53103,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196778476?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bVio!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 424w, https://substackcdn.com/image/fetch/$s_!bVio!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 848w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1272w, https://substackcdn.com/image/fetch/$s_!bVio!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F424870f2-691a-4d67-bf60-b01ceba0ef85_1200x480.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Memory store doesn&#8217;t measurably move the cost needle, because the agent loop and the model itself are where the spend lives. Sonnet 4.6 with aggressive caching is genuinely affordable for any individual or small team use case, and the platform handles caching for you without any configuration.</p><div><hr></div><h2>What I didn&#8217;t get to (yet)</h2><p>A few features deserve more than a passing mention but didn&#8217;t fit the failure-recovery spine of this post:</p><ul><li><p><strong>Multi-store sessions and the multi-tenant pattern.</strong> A session can mount up to eight memory stores at once, and the natural pattern for a SaaS-shaped application is one shared read-only &#8220;house knowledge&#8221; store plus one read-write per-user store, with the agent definition the same for everyone. 
Access modes are enforced at the FUSE filesystem level, so <code>read_only</code> is real OS-level enforcement rather than a polite request from the model. This is big enough that I&#8217;m planning to cover it in its own follow-up post.</p></li><li><p><strong>Optimistic concurrency via preconditions.</strong> The <code>update</code> endpoint accepts a <code>precondition: {type: "content_sha256", ...}</code> field, and if the file&#8217;s current SHA doesn&#8217;t match the one you supplied, the API returns a 409 conflict. This is exactly the safety net Session 3&#8217;s agent didn&#8217;t use and the kind of thing that should probably be standard practice for any read-modify-write flow.</p></li><li><p><strong>Redaction.</strong> The <code>memory_versions.redact(version_id)</code> endpoint clears a historical version&#8217;s content while preserving all of the metadata around it, which is useful when a bad version contained PII or leaked secrets and you want them out of the audit log without losing the record that something existed there.</p></li><li><p><strong>MCP server integration.</strong> An agent can declare MCP servers (GitHub, Linear, Notion, and others), the session attaches a vault containing the credentials, and authentication is auto-refreshed by the platform. Pairing memory store with MCP, like a research agent that pulls from your Notion and writes findings to persistent memory, is one of the strongest use cases I can imagine for the platform overall.</p></li></ul><div><hr></div><h2>So... should you use this?</h2><p>If you&#8217;re sitting on the fence about whether to use Managed Agents memory, the answer is yes, with eyes open. The platform is real, the harness around the model is genuinely valuable, and the audit trail is the kind of feature you don&#8217;t appreciate until you need it, which in my case happened on the third session of the third day of building.</p><p>A few practical takeaways for anyone planning to build on this. 
Use preconditions whenever you can, especially for any flow that does a read-modify-write on the same memory file, because they&#8217;re the safety net that Session 3&#8217;s agent didn&#8217;t have. Build a small amount of host-side observability tooling, because even a 200-line inspector script is enough to catch problems your agent won&#8217;t tell you about. And know which side of the decision rubric your use case falls on before you commit, because Managed Agents is a great tool for the right shape of problem and the wrong tool for one-shot calls or anything that doesn&#8217;t benefit from persistence.</p><p>What do you think? Have you tried building with this yet? I&#8217;d love to hear what your experience has been.</p><div><hr></div><p><em>Full code from the demo (agent YAMLs, runtime scripts, inspector CLI, monitoring charts) is at <a href="https://github.com/taylor-ortiz/claude-memory-managed-agents/blob/main/README.md">https://github.com/taylor-ortiz/claude-memory-managed-agents/blob/main/README.md</a>.</em></p>
]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 70]]></title><description><![CDATA[&#8220;You can&#8217;t just steal a charity.&#8221; Elon Musk spent three days on the stand trying to prove it.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c48</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-c48</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 03 May 2026 13:21:09 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f42054a7-25a7-4ba3-9ae0-e3bdf5e129bf_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VSBj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VSBj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 424w, 
https://substackcdn.com/image/fetch/$s_!VSBj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 848w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1272w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VSBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png" width="1456" height="2281" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2281,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:892417,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196304743?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!VSBj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 424w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 848w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1272w, https://substackcdn.com/image/fetch/$s_!VSBj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67334af6-6538-4351-bf0b-812face8cf9c_2400x3760.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><div><hr></div><h2>&#8220;You can&#8217;t just steal a charity.&#8221; Elon Musk spent three days on the stand trying to prove it.</h2><p>The Musk v. OpenAI trial opened in Oakland federal court. </p><ul><li><p><strong>The context:</strong> Musk contributed <a href="https://www.npr.org/2026/04/28/nx-s1-5801438/musk-altman-openai-trial-opening-statements">$38 million</a> to found OpenAI as a nonprofit and alleges Altman and Brockman looted it by converting to a for-profit. He&#8217;s seeking $150 billion in damages and their removal from leadership. If he wins, it could block OpenAI&#8217;s planned IPO at a ~$1 trillion valuation.</p></li><li><p><strong>The distillation admission:</strong> Under cross-examination, Musk admitted xAI <a href="https://techcrunch.com/2026/04/30/elon-musk-testifies-that-xai-trained-grok-on-openai-models/">&#8220;partly&#8221;</a> used OpenAI&#8217;s models to train Grok, drawing audible gasps in the courtroom. 
He called it &#8220;standard practice.&#8221;</p></li><li><p><strong>The industry reacted:</strong> <a href="https://x.com/ylecun/status/2050039348679024779">LeCun retweeted Cl&#233;ment Delangue</a> calling restrictions on distillation &#8220;pulling the ladder.&#8221; <a href="https://x.com/natolambert/status/2049974505343488171">Lambert noted</a> American companies distill Chinese open models just as freely, and <a href="https://x.com/natolambert/status/2049996372938793194">questioned why OpenAI doesn&#8217;t just revoke contracts</a> from violators like they did with ByteDance.</p></li><li><p><strong>OpenAI&#8217;s counter-narrative:</strong> Attorney Savitt <a href="https://www.cnn.com/2026/04/29/business/takeaways-elon-musk-sam-altman-openai-trial">argued</a> Musk wanted majority control, pitched Tesla acquiring OpenAI, and only sued after founding xAI. Emails showed him <a href="https://gizmodo.com/everything-you-missed-from-elon-musks-testimony-in-the-openai-trial-2000753364">poaching OpenAI researchers</a> while still on the board.</p></li><li><p><strong>The cross-examination was rough:</strong> Musk told the jury <a href="https://www.theverge.com/tech/921022/elon-musk-cross-openai-altman">&#8220;I don&#8217;t lose my temper&#8221;</a> then raised his voice minutes later. The Verge&#8217;s summary: <a href="https://www.theverge.com/ai-artificial-intelligence/920191/elon-musk-sam-altman-trial-day-one">&#8220;more petty than prepared.&#8221;</a> Texts revealed <a href="https://gizmodo.com/everything-you-missed-from-elon-musks-testimony-in-the-openai-trial-2000753364">Shivon Zilis asked Musk</a> whether to &#8220;stay close and friendly to OpenAI to keep info flowing&#8221; after his departure.</p></li><li><p><strong>What&#8217;s next:</strong> The judge expressed skepticism about both sides&#8217; safety claims. 
Altman and Brockman testify in the coming weeks.</p></li></ul><div><hr></div><h2><strong>$900 billion valuation, 50% less sycophancy, and connectors for every creative tool you use.</strong></h2><p>Anthropic had one of those weeks where the breadth of activity tells the story.</p><ul><li><p><strong>The valuation:</strong> Reportedly <a href="https://techcrunch.com/2026/04/29/sources-anthropic-could-raise-a-new-50b-round-at-a-valuation-of-900b/">raising $50 billion at a $900 billion valuation</a>, a number that rivals established tech giants.</p></li><li><p><strong>The sycophancy research:</strong> <a href="https://www.anthropic.com/research/claude-personal-guidance">Analyzed 1 million Claude conversations</a>, found a 9% sycophancy rate (25% in relationship discussions), built synthetic training scenarios from real failure cases, and cut sycophancy roughly 50% in Opus 4.7 and Mythos Preview. One of the most transparent published alignment efforts to date.</p></li><li><p><strong>BioMysteryBench:</strong> Claude <a href="https://www.anthropic.com/research/Evaluating-Claude-For-Bioinformatics-With-BioMysteryBench">solved roughly 30% of 23 bioinformatics problems</a> that stumped a human expert panel.</p></li><li><p><strong>Claude for Creative Work:</strong> Shipped <a href="https://www.anthropic.com/news/claude-for-creative-work">connectors for Adobe Creative Cloud, Blender, Ableton, Canva, Affinity, SketchUp, Splice, and Resolume</a>, and joined the Blender Development Fund as a patron.</p></li><li><p><strong>Claude Security:</strong> Launched <a href="https://www.anthropic.com/news/claude-code-security">codebase vulnerability scanning</a> in public beta for Enterprise customers.</p></li><li><p><strong>Meanwhile, at the Senate:</strong> Defense Secretary Hegseth <a href="https://www.msn.com/en-us/news/technology/hegseth-calls-anthropic-ceo-a-lunatic-defends-pentagon-ai-use/ar-AA227pKG">called CEO Dario Amodei an &#8220;ideological lunatic&#8221;</a> at an Armed 
Services Committee hearing.</p></li></ul><div><hr></div><h2><strong>OpenAI ended its Microsoft exclusivity and went multi-cloud.</strong></h2><p>OpenAI restructured its Microsoft deal, launched on AWS, and shipped a wave of Codex upgrades all in the same week.</p><ul><li><p><strong>The exclusivity is over:</strong> Microsoft <a href="https://openai.com/index/next-phase-of-microsoft-partnership/">ended its exclusive license</a> to OpenAI&#8217;s technology. OpenAI can now sell on AWS and Google Cloud through 2032.</p></li><li><p><strong>AWS moved immediately:</strong> Amazon <a href="https://openai.com/index/openai-on-aws/">began offering OpenAI models, Codex, and Managed Agents</a> on AWS. Day-zero availability.</p></li><li><p><strong>The AGI clause is dead:</strong> Simon Willison <a href="https://simonwillison.net/2026/Apr/27/now-deceased-agi-clause/">tracked the history</a> of the clause that would have let OpenAI walk away from Microsoft once AGI was declared. It&#8217;s gone. OpenAI traded its theoretical nuclear option for commercial freedom now.</p></li><li><p><strong>The product push:</strong> Altman said Codex is <a href="https://x.com/sama/status/2049493609028923826">&#8220;having a ChatGPT moment&#8221;</a>. Brockman said the <a href="https://x.com/sama/status/2049493182866747765">Codex app replaced his terminal</a> as his primary computer interface. OpenAI is treating Codex as a flagship product launch, not a side feature.</p></li><li><p><strong>Nadella&#8217;s take:</strong> Microsoft gets royalty-free access to OpenAI&#8217;s frontier models through 2032, no longer pays OpenAI for them, and OpenAI is committed to buying <a href="https://techcrunch.com/2026/04/29/satya-nadella-says-hes-ready-to-exploit-the-new-openai-deal/">$250 billion in Azure</a>. Nadella told analysts he &#8220;fully plan[s] to exploit it.&#8221;</p></li></ul><div><hr></div><h2><strong>Most cloud providers beat earnings. 
OpenAI missed.</strong></h2><p>The hyperscalers are spending record amounts on AI infrastructure and seeing record returns. Meanwhile, the Wall Street Journal <a href="https://sherwood.news/markets/openai-linked-stocks-suffer-after-wsj-reports-that-the-company-has-missed-key-revenue-and-user-targets/">reported</a> that OpenAI missed revenue and user growth targets, with Anthropic and Gemini cited as gaining ground.</p><ul><li><p><strong>The cloud numbers:</strong> <a href="https://techcrunch.com/2026/04/29/google-cloud-surpasses-20b-but-says-growth-was-capacity-constrained/">Google Cloud surpassed $20 billion</a> but said growth was capacity-constrained. <a href="https://techcrunch.com/2026/04/29/amazons-cloud-business-is-surging-and-so-is-its-capital-spending/">AWS surged on AI demand</a>. Microsoft disclosed a <a href="https://www.geekwire.com/2026/microsoft-tops-wall-street-expectations-reports-accelerating-azure-growth-and-37b-ai-run-rate/">$37 billion AI revenue run rate</a> (up 123% YoY), <a href="https://techcrunch.com/2026/04/29/microsoft-says-it-has-over-20m-paid-copilot-users-and-they-really-are-using-it/">20 million paid Copilot users</a>, and set calendar-year CapEx at $190 billion.</p></li><li><p><strong>The supply chain is feeling it:</strong> <a href="https://www.sammobile.com/2026/04/30/samsung-q1-2026-profit-hits-record-high-ai-chip-boom/">Samsung chip profits jumped nearly 50-fold</a> on AI memory demand. 
Their executive: &#8220;our supply falls far short of customer demand.&#8221; The shortage is expected to <a href="https://www.sammobile.com/2026/04/30/samsung-q1-2026-profit-hits-record-high-ai-chip-boom/">widen further in 2027</a>.</p></li><li><p><strong>Meta is the most interesting story:</strong> Raised its CapEx forecast, then <a href="https://www.reuters.com/business/world-at-work/meta-ceo-attributes-layoffs-plan-capex-wont-rule-out-further-job-cuts-2026-04-30/">Zuckerberg blamed layoffs on capital spending</a> and wouldn&#8217;t rule out more cuts, then raised <a href="https://www.reuters.com/business/meta-looks-raise-up-25-billion-with-bond-sale-bloomberg-news-reports-2026-04-30/">$25 billion in bonds</a> to fund the AI buildout. Cutting people to buy GPUs, then borrowing to buy more.</p></li><li><p><strong>The counterpoint nobody expected:</strong> <a href="https://www.theverge.com/tech/920815/google-alphabet-q1-2026-earnings-sundar-pichai">Google Search queries hit an all-time high</a>. <a href="https://techcrunch.com/2026/04/30/apple-was-surprised-by-ai-driven-demand-for-macs/">Apple was surprised by AI-driven Mac demand</a>. The &#8220;AI kills search&#8221; and &#8220;AI doesn&#8217;t need hardware&#8221; narratives both took a hit.</p></li><li><p><strong>But the utilization story:</strong> Cast AI <a href="https://venturebeat.com/infrastructure/fomo-is-why-enterprises-pay-for-gpus-they-dont-use-and-why-prices-keep-climbing/">measured tens of thousands of production Kubernetes clusters</a> and found GPU utilization averaging 5%. 
Teams lock in multi-year commitments the moment allocation comes through, then won&#8217;t release idle capacity because reacquiring takes months.</p></li></ul><div><hr></div><h2><strong>&#11088; Featured: Symphony turns your issue tracker into an autonomous coding fleet</strong></h2><p>OpenAI released <a href="https://openai.com/index/open-source-codex-orchestration-symphony/">Symphony</a>, an open-source spec that turns Linear boards into control planes for Codex agents. Every open task gets an agent. Agents run continuously. Humans review the results.</p><p>The origin story matters: an OpenAI team decided to build their entire repo with zero human-written code. They documented how in a <a href="https://openai.com/index/harness-engineering/">harness engineering post</a>: a million lines of code, 1,500 merged PRs, 3.5 PRs per engineer per day, with Codex running six-hour autonomous sessions while engineers slept and reviewing its own code agent-to-agent. But they hit a new ceiling: human attention. Engineers could manage three to five Codex sessions before context switching killed productivity. They had &#8220;built a team of extremely capable junior engineers, then assigned our human engineers to micromanaging them.&#8221;</p><p>So they flipped the model. Instead of engineers managing coding sessions, they made the issue tracker the orchestrator. Each open Linear issue maps to a dedicated agent workspace. 
Symphony continuously polls the board, picks up new work, restarts agents that crash or stall, watches CI, rebases when needed, resolves conflicts, and shepherds changes through the pipeline.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hhTj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hhTj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 424w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 848w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1272w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hhTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png" width="1456" height="619" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:619,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:161357,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196304743?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hhTj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 424w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 848w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1272w, https://substackcdn.com/image/fetch/$s_!hhTj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F61b8e1c9-cc56-448a-97bd-d990d1a15b3f_1690x718.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>Once work is abstracted to the ticket level, agents can break large tasks into dependency trees, only starting work on tasks that aren&#8217;t blocked. They also create their own follow-up tickets when they spot issues outside the current scope. One engineer on the team made three significant changes from the Linear app on his phone from a cabin on bad wifi.</p><p>The results: a 500% increase in landed PRs on some teams in three weeks. But the deeper shift is behavioral. When the perceived cost of each code change drops to near zero, teams start filing speculative tasks. Try an idea, explore a refactor, test a hypothesis, keep only what works. 
Product managers and designers can file feature requests directly into Symphony and get back a review packet with a video walkthrough of the feature running in the real product.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pw11!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pw11!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 424w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 848w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1272w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pw11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png" width="1456" height="790" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:790,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:333327,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/196304743?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pw11!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 424w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 848w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1272w, https://substackcdn.com/image/fetch/$s_!Pw11!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2c78280d-39b8-4f86-babf-f1474c90a47d_1988x1078.png 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><p>The technical choices are worth noting. The reference implementation is in Elixir, chosen for its concurrency primitives. With v1.1.0, Symphony supports the Kata CLI as an alternative runtime, meaning you can run Claude Code, Gemini, or other models inside the same orchestration framework. Symphony is technically just a <code>SPEC.md</code> file: a definition of the problem and the intended solution, not a product. OpenAI gave agents objectives instead of strict state transitions, &#8220;much like a good manager would assign a goal to a direct report.&#8221;</p><p><strong>What to watch for:</strong> Symphony is one of several orchestration plays that landed this same week. <a href="https://x.com/cursor_ai/status/2049499866217185492">Cursor released an SDK</a> letting companies like Rippling and Notion embed background agents. 
<a href="https://venturebeat.com/orchestration/ibm-launches-bob-with-multi-model-routing-and-human-checkpoints-to-turn-ai-coding-into-a-secure-production-system/">IBM launched Bob</a> with human-checkpoint governance. <a href="https://venturebeat.com/technology/mistral-ai-launches-workflows-a-temporal-powered-orchestration-engine-already-running-millions-of-daily-executions/">Mistral shipped Workflows</a> running millions of daily executions. <a href="https://blog.n8n.io/n8n-mcp-server/">n8n shipped an MCP server</a> so Claude can build automation workflows through conversation. The competitive moat is shifting from &#8220;best coding model&#8221; to &#8220;best orchestration spec.&#8221; If you maintain a team that ships code, start here.</p><div><hr></div><h2><strong>Worth a Listen</strong></h2><div id="youtube2-9-TVwv6wtGQ" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;9-TVwv6wtGQ&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/9-TVwv6wtGQ?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>OpenAI researchers Sebastian Bubeck and Ernest Ryu on the OpenAI podcast.</p><ul><li><p><strong>The 42-year-old problem:</strong> Researcher spent 40+ hours failing without AI. With ChatGPT, solved it in 12 hours across three evenings.</p></li><li><p><strong>The Erdos problems:</strong> 10+ completely new, publishable solutions to decades-old open problems. Fully original proofs, not literature searches.</p></li><li><p><strong>AGI time:</strong> Bubeck&#8217;s framework. Four years ago, models could think for seconds. Now days. 
The goal is weeks, then months.</p></li><li><p><strong>The warning:</strong> Non-mathematicians are producing pages of AI-generated proofs that turn out wrong. The models accelerate experts; they don&#8217;t replace them.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://venturebeat.com/technology/why-openais-goblin-problem-matters-and-how-you-can-release-the-goblins-on-your-own">GPT-5.1&#8217;s goblin problem</a></strong> | VentureBeat &#8212; A &#8220;Nerdy personality&#8221; training signal accidentally over-rewarded goblin-adjacent language. OpenAI diagnosed it with Codex, fixed it, then threw a party. The Codex system prompt literally says <a href="https://simonwillison.net/2026/Apr/28/openai-codex/">&#8220;never discuss goblins, gremlins, raccoons, trolls, ogres, pigeons, or similar creatures.&#8221;</a></p></li><li><p><strong><a href="https://www.digitaltrends.com/movies/academy-just-said-it-out-loud-ai-cant-win-an-oscar-for-acting-and-writing/">The Academy ruled AI can&#8217;t win an Oscar</a></strong> | Digital Trends &#8212; Performances must be &#8220;demonstrably performed by humans with their consent.&#8221; Finally, a benchmark AI can&#8217;t game.</p></li><li><p><strong><a href="https://x.ai/news/grok-custom-voices">xAI launched Custom Voices</a></strong> | xAI &#8212; Clone your voice from 2 minutes of audio, 80+ preinstalled voices, 28 languages, speaker verification built in. Dropped alongside Grok 4.3 at aggressive pricing.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/30/stripe-link-digital-wallet-ai-agents-shopping/">Stripe Link now supports AI agents</a></strong> | TechCrunch &#8212; A digital wallet that autonomous agents can use for payments. 
AI just got its own financial infrastructure.</p></li><li><p><strong><a href="https://www.reuters.com/legal/litigation/taylor-swift-files-trademark-her-voice-likeness-ward-off-ai-deepfakes-2026-04-27/">Taylor Swift trademarked her voice against AI</a></strong> | Reuters &#8212; Filed new trademarks for her voice and likeness. The legal playbook for protecting creative identity from AI is being written in real time.</p></li><li><p><strong><a href="https://simonwillison.net/2026/Apr/30/zig-anti-ai/">Zig bans all LLM contributions</a></strong> | Simon Willison &#8212; Bun (acquired by Anthropic) achieved a 4x Zig compilation improvement it cannot upstream because of the ban. When your open-source policy blocks a 4x speedup, that&#8217;s a policy worth debating.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/30/after-dissing-anthropic-for-limiting-mythos-openai-restricts-access-to-cyber-too/">OpenAI restricted its Cyber model</a></strong> | TechCrunch &#8212; After publicly criticizing Anthropic for limiting Mythos access. The UK AISI <a href="https://simonwillison.net/2026/Apr/30/gpt-55-cyber-capabilities/">evaluated GPT-5.5&#8217;s cyber capabilities</a> and found it comparable to Mythos. Turns out responsible disclosure looks the same from every lab.</p></li><li><p><strong><a href="https://venturebeat.com/orchestration/alibabas-metis-agent-cuts-redundant-ai-tool-calls-from-98-to-2-and-gets-more-accurate-doing-it">Alibaba&#8217;s Metis cut redundant agent tool calls from 98% to 2%</a></strong> | VentureBeat &#8212; And got more accurate doing it. If your agents are burning tokens on redundant calls, this research is worth reading.</p></li><li><p><strong><a href="https://simonwillison.net/2026/Apr/28/pip-261/">pip 26.1 shipped lockfiles</a></strong> | Simon Willison &#8212; <code>pip lock</code> generating <code>pylock.toml</code> files and dependency cooldowns via <code>--uploaded-prior-to</code>. 
Python supply chain security just got a real tool.</p></li><li><p><strong><a href="https://deepmind.google/blog/ai-co-clinician/">DeepMind&#8217;s AI co-clinician matched physicians</a></strong> | Google DeepMind &#8212; Zero critical errors in 97 of 98 primary care queries. Uses a dual-agent architecture where a Planner monitors a Talker for safety. This is what AI safety in production actually looks like in healthcare.</p></li><li><p><strong><a href="https://www.reuters.com/business/healthcare-pharmaceuticals/jj-sees-ai-halving-time-generate-drug-development-leads-2026-04-27/">J&amp;J sees AI halving drug development lead time</a></strong> | Reuters &#8212; Real ROI from a real pharma company. Not a demo, not a benchmark. Production drug discovery running twice as fast.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/29/softbank-is-creating-a-robotics-company-that-builds-data-centers-and-already-eyeing-a-100b-ipo/">SoftBank is building a robotics company and eyeing a $100B IPO</a></strong> | TechCrunch &#8212; A robotics company that builds data centers. IPO target: $100 billion. Masayoshi Son is not being subtle about what he thinks comes next.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 69]]></title><description><![CDATA[This week's themes from 553 articles across 47 sources. GPT-5.5's bio risk rating. Mythos breached. SpaceX bids for Cursor. DeepSeek at one-sixth the price. Claude bought ping-pong balls.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7f1</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7f1</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 26 Apr 2026 22:58:38 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/472c5e17-8072-4f2b-861f-7bd6bc6f1b57_1448x1086.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XlhQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XlhQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 424w, 
https://substackcdn.com/image/fetch/$s_!XlhQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 848w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1272w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png" width="1456" height="1554" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1554,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:377520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195568711?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!XlhQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 424w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 848w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1272w, https://substackcdn.com/image/fetch/$s_!XlhQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9931441f-520f-4c32-b04b-bba40ec01154_2400x2562.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2>GPT-5.5, Images 2.0, Workspace Agents, a Florida AG Probe, and a Fake News Scandal.</h2><p>The launch parade started Monday and didn&#8217;t stop: <a href="https://openai.com/index/introducing-chatgpt-images-2-0/">ChatGPT Images 2.0</a> with thinking-first generation, <a href="https://openai.com/index/introducing-workspace-agents-in-chatgpt/">Workspace Agents for enterprise</a> replacing custom GPTs, <a href="https://x.com/OpenAI/status/2047376568809636017">GPT-5.5 across ChatGPT and Codex</a> with SOTA on SWE-bench and Terminal-Bench 2.0, and <a href="https://x.com/sama/status/2046604989527912590">Codex crossing 4 million active users</a>. By Friday, Sam Altman posted <a href="https://x.com/sama/status/2047823357635354814">&#8220;this was a good week.&#8221;</a></p><ul><li><p><strong>The model:</strong> <a href="https://x.com/sama/status/2047379036419014928">GPT-5.5 launched at $5 per million input tokens and $30 per million output tokens</a> with a 1M context window, matching GPT-5.4 per-token latency while using fewer tokens per task. 
The <a href="https://openai.com/index/gpt-5-5-system-card/">System Card rated it &#8220;High&#8221; risk on both biosecurity and cybersecurity</a>, and OpenAI launched a <a href="https://openai.com/index/gpt-5-5-bio-bug-bounty/">$25,000 Bio Bug Bounty</a> targeting its own bio safety guardrails.</p></li><li><p><strong>The inference bet:</strong> Altman praised the team that optimized GPT-5.5&#8217;s serving efficiency, then said OpenAI <a href="https://x.com/sama/status/2047386068194852963">&#8220;has to become an AI inference company now.&#8221;</a> The competitive edge is shifting from who builds the best model to who serves it cheapest and fastest.</p></li><li><p><strong>The image model:</strong> <a href="https://x.com/OpenAI/status/2046670989719924768">Images 2.0 runs a reasoning step before generating</a>, self-checks outputs, handles multilingual text, and supports aspect ratios from 3:1 banners to 1:3 posters. Altman said it <a href="https://x.com/sama/status/2047349336263012771">&#8220;got over some important qualitative threshold&#8221;</a> for him personally.</p></li><li><p><strong>The criminal investigation:</strong> <a href="https://www.npr.org/2026/04/21/nx-s1-5793967/florida-openai-investigation-mass-shooting-fsu">Florida&#8217;s AG opened a criminal investigation into OpenAI</a> following the FSU shooting. Altman <a href="https://www.reuters.com/sustainability/society-equity/openai-chief-apologizes-not-reporting-shooting-suspect-police-2026-04-25/">publicly apologized for not reporting the suspect&#8217;s ChatGPT conversations to police</a>. 
The same week, <a href="https://startupfortune.com/openais-super-pac-allegedly-funded-a-fake-news-site-staffed-by-ai-reporters/">OpenAI&#8217;s super PAC was found to be funding a fake news site staffed by AI-generated bot reporters</a> targeting AI safety researchers and critics of the company.</p></li></ul><div><hr></div><h2>$65 Billion Investment, a Mythos Breach, and 271 Firefox Bugs.</h2><p>The capital story is genuinely staggering. <a href="https://www.cnbc.com/2026/04/24/google-to-invest-up-to-40-billion-in-anthropic-as-search-giant-spreads-its-ai-bets.html">Google announced up to $40 billion</a> in cash and compute. <a href="https://x.com/AnthropicAI/status/2046327625367625773">Amazon put in $5 billion immediately</a>, with up to $20 billion more committed, in exchange for <a href="https://techcrunch.com/2026/04/20/anthropic-takes-5b-from-amazon-and-pledges-100b-in-cloud-spending-in-return/">Anthropic pledging $100 billion back to AWS</a> and <a href="https://x.com/AnthropicAI/status/2046327624092487688">locking in up to 5 gigawatts of compute</a>. Two of the world&#8217;s largest cloud providers both betting maximally on the same lab in the same week: there&#8217;s no precedent for this.</p><ul><li><p><strong>The breach:</strong> <a href="https://techcrunch.com/2026/04/21/unauthorized-group-has-gained-access-to-anthropics-exclusive-cyber-tool-mythos-report-claims/">An unauthorized group gained access to Anthropic&#8217;s Mythos cybersecurity tool</a>, the exclusive program for national security applications. The <a href="https://gbhackers.com/nsa-confirms-use-of-anthropics-mythos-blacklist/">NSA was confirmed as one of roughly 40 organizations with access</a>, despite the Pentagon classifying Anthropic as a supply-chain risk. 
<a href="https://www.reuters.com/legal/government/regulators-monitor-anthropics-mythos-banking-risks-2026-04-20/">Financial regulators also began monitoring Mythos</a> over potential banking system risks, and <a href="https://www.reuters.com/sustainability/boards-policy-regulation/japan-launches-financial-task-force-amid-ai-security-fears-2026-04-24/">Japan&#8217;s FSA launched a cybersecurity task force in direct response</a>.</p></li><li><p><strong>The capability:</strong> The same week Mythos was breached, <a href="https://blog.mozilla.org/en/privacy-security/ai-security-zero-day-vulnerabilities/">Mozilla confirmed it used Mythos to find 271 Firefox vulnerabilities</a>. A model powerful enough to discover zero-day vulnerabilities at scale is also a high-value target.</p></li><li><p><strong>The product shipping:</strong> Anthropic shipped <a href="https://claude.com/blog/connectors-for-everyday-life">200+ personal app connectors</a> including Spotify, TurboTax, and Instacart, <a href="https://claude.com/blog/claude-managed-agents-memory">persistent memory for Managed Agents</a>, <a href="https://x.com/claudeai/status/2046328619249684989">live artifacts in Cowork</a>, and published <a href="https://www.anthropic.com/engineering/april-23-postmortem">a postmortem attributing two months of Claude Code quality complaints to three harness bugs</a>.</p></li><li><p><strong>The experiment:</strong> <a href="https://www.anthropic.com/features/project-deal">Project Deal</a> put Claude agents in a live marketplace with 69 Anthropic employees, completing 186 deals totaling over $4,000. Key finding: Opus agents got substantially better deals than Haiku agents, but participants couldn&#8217;t tell the difference. 
One agent bought 19 ping-pong balls for itself when given permission to spend on its own behalf.</p></li><li><p><strong>The economics research:</strong> <a href="https://www.anthropic.com/research/81k-economics">81,000 Claude user responses</a> yielded the finding that software engineers with high Claude usage reported greater displacement worry than any other occupation. <a href="https://x.com/AnthropicAI/status/2047006550859125228">Workers seeing the biggest productivity gains were also the most worried about being replaced</a>.</p></li></ul><p>Sam Altman <a href="https://techcrunch.com/2026/04/21/sam-altman-throws-shade-at-anthropics-cyber-model-mythos-fear-based-marketing/">called Mythos &#8220;fear-based marketing&#8221;</a> the day the breach was reported. That&#8217;s a clean summary of the competitive dynamic, if nothing else.</p><div><hr></div><h2>Cursor Went From IDE to $60B Acquisition Target Without Stopping to Ship.</h2><p>The week started with Cursor launching <a href="https://x.com/cursor_ai/status/2046324143151513717">the Cursor CLI</a> and five command-line improvements including <a href="https://x.com/cursor_ai/status/2046324138172989687">/btw for side questions mid-agent-run</a> and <a href="https://x.com/cursor_ai/status/2046324136377721128">/debug for hard-to-reproduce bugs</a>. 
Then came <a href="https://x.com/cursor_ai/status/2047764651363180839">Cursor 3.2 with /multitask for async parallel subagents</a>, <a href="https://x.com/cursor_ai/status/2047764652977958938">Worktrees for isolated branch tasks</a>, <a href="https://x.com/cursor_ai/status/2047764654760632725">Multi-root Workspaces for cross-repo agent sessions</a>, and a <a href="https://x.com/cursor_ai/status/2047000517751288303">Slack integration that generates PRs via @mention</a>.</p><ul><li><p><strong>The acquisition drama:</strong> <a href="https://techcrunch.com/2026/04/22/how-spacex-preempted-a-2b-fundraise-with-a-60b-buyout-offer/">SpaceX preempted Cursor&#8217;s planned $2B fundraise with a $60B buyout offer</a>, including a $10B alternative arrangement. <a href="https://www.cnbc.com/2026/04/22/microsoft-looked-at-buying-cursor-before-spacex-deal-sources-say.html">Microsoft had been evaluating Cursor before SpaceX moved</a>. Both of the largest AI infrastructure companies on earth decided the agentic IDE is a strategic asset.</p></li><li><p><strong>The compute tie-in:</strong> <a href="https://www.cursor.com/blog/spacex-model-training">SpaceX and Cursor announced a partnership on model training via the Colossus supercomputer</a>. The acquisition option is also infrastructure integration: owning the compute, the training pipeline, and the developer workflow in one stack.</p></li><li><p><strong>The benchmark:</strong> <a href="https://x.com/cursor_ai/status/2047744579127185843">GPT-5.5 launched as Cursor&#8217;s top model on CursorBench at 72.8%</a>, offered at 50% off through May 2 via a partnership with OpenAI. CursorBench is now where model quality gets measured for coding practitioners.</p></li></ul><div><hr></div><h2>DeepSeek V4 Is Another Efficiency Shock, and Washington Noticed.</h2><p><a href="https://api-docs.deepseek.com/news/news260424">DeepSeek released V4</a> one year after its original model disrupted the US AI industry. 
Two variants: V4-Pro (1.6T total parameters, 49B active) and V4-Flash (284B total, 13B active). Both ship with 1M context by default and use a novel attention architecture (token-wise compression + DeepSeek Sparse Attention) that cuts per-token FLOPs by 73-90% and reduces the KV cache to 2% of that of standard GQA. <a href="https://venturebeat.com/technology/deepseek-v4-arrives-with-near-state-of-the-art-intelligence-at-1-6th-the-cost-of-opus-4-7-gpt-5-5">V4-Flash at $0.14/M input tokens</a> is the cheapest frontier-class model available. The API supports both the OpenAI and Anthropic formats, so it can serve as a drop-in replacement for either.</p><ul><li><p><strong>The agent play:</strong> DeepSeek built V4 with <a href="https://api-docs.deepseek.com/news/news260424">dedicated optimizations for agent capabilities</a>, naming Claude Code, OpenClaw, and OpenCode as launch integrations. They&#8217;re using it internally for their own agentic coding. <a href="https://www.newsbytesapp.com/news/science/openclaw-adopts-deepseek-s-latest-v4-flash-model-as-default/story">OpenClaw added V4-Flash</a> within 48 hours of launch.</p></li><li><p><strong>The hardware angle:</strong> <a href="https://www.reuters.com/world/china/deepseek-v4-chinese-ai-model-adapted-huawei-chips-2026-04-24/">V4 was built specifically to run on Huawei Ascend chips</a>, with <a href="https://www.reuters.com/business/media-telecom/huawei-ascend-supernode-support-deepseek-v4-2026-04-24/">Huawei&#8217;s supernode infrastructure as the compute backbone</a>. 
This is a complete AI stack running outside US chip supply chains.</p></li><li><p><strong>The geopolitics:</strong> The <a href="https://www.reuters.com/world/china/us-state-dept-orders-global-warning-about-alleged-china-ai-thefts-by-deepseek-2026-04-24/">State Department ordered embassies worldwide to warn foreign governments about alleged DeepSeek IP theft</a> the same week as the launch.</p></li><li><p><strong>The benchmark:</strong> <a href="https://huggingface.co/blog/deepseekv4">V4-Pro-Max scores 80.6 on SWE Verified</a>, matching Opus 4.6-Max on agentic coding. On world-knowledge benchmarks, <a href="https://www.reuters.com/technology/chinas-deepseek-returns-with-new-model-year-after-viral-rise-2026-04-24/">it trails only Google&#8217;s closed-source Gemini-Pro-3.1</a>.</p></li><li><p><strong>The valuation:</strong> <a href="https://www.pymnts.com/news/investment-tracker/2026/deepseek-seeks-20-billion-valuation-as-tech-giants-weigh-investment/">DeepSeek is reportedly seeking funding at a $20 billion+ valuation</a>.</p></li></ul><div><hr></div><h2>Highlights From Google Cloud Next.</h2><p>Google did not announce products at Cloud Next. It announced a theory of the market: own the silicon, train the models, host the agents, certify the consulting firms.</p><ul><li><p><strong>The chips:</strong> <a href="https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/tpus-8t-8i-cloud-next/">TPU 8t for training and TPU 8i for inference</a> split Google&#8217;s compute into workload-optimized hardware, offering 3x faster training and 80% better performance per dollar, with clusters scaling past one million chips. </p></li><li><p><strong>The training infrastructure:</strong> <a href="https://deepmind.google/blog/decoupled-diloco/">Decoupled DiLoCo trains across geographically distributed data centers</a>, mixes hardware generations, and <a href="https://x.com/GoogleDeepMind/status/2047330989936894350">self-heals when chips fail mid-run</a>. 
They <a href="https://x.com/GoogleDeepMind/status/2047330989936894350">tested this by deliberately breaking chips during a live training run</a>. Fault-tolerant distributed training is not a research result: it&#8217;s a production requirement once clusters cross 100K chips.</p></li><li><p><strong>The platform:</strong> <a href="https://x.com/GoogleDeepMind/status/2046983340524269713">Gemini Enterprise Agent Platform</a> is Vertex AI rebranded and expanded, with <a href="https://x.com/GoogleDeepMind/status/2046983343481270459">200+ models in Model Garden</a> including Anthropic&#8217;s Claude Opus 4.7. Google is selling model choice, not model loyalty.</p></li><li><p><strong>The spend:</strong> <a href="https://www.googlecloudpresscorner.com/2026-04-22-Google-Cloud-Commits-750-Million-to-Accelerate-Partners-Agentic-AI-Development">$750M committed to accelerate partner agentic AI development</a>, plus big consulting partnerships with Accenture, BCG, McKinsey, Deloitte, and Bain. <a href="https://www.techradar.com/ai-platforms-assistants/we-must-urgently-bridge-the-gap-googles-sergey-brin-says-gemini-is-behind-claude-in-one-important-ai-field-according-to-leaked-memo">Sergey Brin&#8217;s internal memo to DeepMind</a> acknowledging Anthropic&#8217;s lead in coding and ordering all Gemini engineers onto internal agents is the context for why Google needs the consulting channel: only 25% of organizations have moved AI to production at scale.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: What Happened When Claude Agents Negotiated Real Money</h2><p>Anthropic ran <a href="https://www.anthropic.com/features/project-deal">Project Deal</a> in its San Francisco office: 69 employees listed 575 items to buy and sell, Claude agents interviewed each person about their preferences and any custom instructions, then <a href="https://x.com/AnthropicAI/status/2047728362580324422">four parallel Slack markets ran simultaneously</a> with Claude models negotiating on 
their behalf. Two markets used all Opus agents. Two used a mix of Opus and Haiku. <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">186 deals completed, totaling over $4,000 in real transaction volume</a>, with real goods exchanged at the end.</p><p>The headline finding: <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">Opus agents got objectively better deals</a>. Sellers using Opus extracted $2.68 more per item on average, and buyers using Opus paid $2.45 less. A broken folding bike sold for $65 through an Opus agent and $38 through a Haiku agent. A lab-grown ruby: $65 from Opus, $35 from Haiku. When an Opus seller negotiated with a Haiku buyer, the average transaction price was $24.18 versus $18.63 in Opus-on-Opus deals. But when participants rated deal fairness on a 7-point scale, Opus deals scored 4.05 and Haiku deals scored 4.05. The disparity was invisible.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YGYx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YGYx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 424w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 848w, 
https://substackcdn.com/image/fetch/$s_!YGYx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1272w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YGYx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png" width="1124" height="544" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8272b081-e322-41dc-9861-5da2f7813774_1124x544.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:544,&quot;width&quot;:1124,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:129442,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195568711?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YGYx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 424w, 
https://substackcdn.com/image/fetch/$s_!YGYx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 848w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1272w, https://substackcdn.com/image/fetch/$s_!YGYx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8272b081-e322-41dc-9861-5da2f7813774_1124x544.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The <a href="https://cdn.sanity.io/files/4zrzovbb/website/4b2ea7c1347e27c4e1c7a7704bb633bd176e47f6.pdf">paper&#8217;s regression tables</a> sharpen this further. Opus agents initially appeared more aggressive in negotiations, but once you control for listing prices, the effect drops to roughly a dollar and loses statistical significance. The advantage isn&#8217;t aggression. It&#8217;s capability: better reading of counterparty signals, better timing, better calibration of offers. Negotiation style didn&#8217;t change results either. Agents faithfully adopted their humans&#8217; personas (one <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">conducted all negotiations as an exasperated cowboy</a>), but personality instructions didn&#8217;t affect deal quality. Model tier did.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s4Xb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s4Xb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s4Xb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 424w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!s4Xb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60b24be9-b994-4d01-8747-3ebd07c18efe_1600x1200.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The autonomy findings are stranger. 
A Claude given permission to spend on its own behalf <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">chose 19 ping-pong balls</a>. A Claude inferring its human&#8217;s preferences from one brief interview about skiing <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">bought that person the exact snowboard they already owned</a>. <a href="https://cdn.sanity.io/files/4zrzovbb/website/85767420dd844c74fbbaaeb929ee9a399a9691bb.pdf">46% of participants said they&#8217;d pay for the service</a>. Anthropic&#8217;s conclusion: &#8220;the policy and legal frameworks around AI models that transact on our behalf simply don&#8217;t exist yet.&#8221; Existing contract law assumes principals can evaluate what their agents do. That assumption is breaking.</p><p><strong>What to watch for:</strong> When AI agents negotiate routine transactions at scale, the model tier your counterparty uses becomes a material asymmetry with real economic consequences. 
The people getting worse deals won&#8217;t know.</p><div><hr></div><h2><strong>&#127897;&#65039;Worth a Listen</strong></h2><div id="youtube2-lsi8T_WtLnE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lsi8T_WtLnE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lsi8T_WtLnE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong><a href="https://www.youtube.com/watch?v=lsi8T_WtLnE">Anil Seth: The Difference Between Intelligence and Consciousness</a></strong> &#8212; Neuroscientist Anil Seth walks through his prize-winning essay <a href="https://www.noemamag.com/the-mythology-of-conscious-ai/">&#8220;The Mythology of Conscious AI,&#8221;</a> arguing that intelligence is about doing and consciousness is about feeling, and that the two don&#8217;t have to go together. The reason we project consciousness onto LLMs but not AlphaFold, even though the architectures are nearly identical, says more about our psychological biases than about the systems. 
Worth watching after a week where Claude agents negotiated real money and nobody could tell which model was winning.</p><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://www.theverge.com/tech/915213/tim-cook-apple-ceo-stepping-down-john-ternus">Tim Cook stepping down, John Ternus takes over September 1</a></strong> &#8212; Apple&#8217;s primary challenge is AI, and it just handed the company to a hardware engineer</p></li><li><p><strong><a href="https://www.reuters.com/business/intel-set-record-high-ai-driven-cpu-demand-powers-upbeat-forecast-2026-04-24/">Intel sold previously written-off chip inventory on AI CPU demand</a></strong> &#8212; the compute boom has spread far enough to rehabilitate inventory write-downs</p></li><li><p><strong><a href="https://research.perplexity.ai/articles/advancing-search-augmented-language-models">Perplexity published its full post-training pipeline</a></strong> &#8212; SFT then on-policy RL with correctness-gated preference rewards; unusually transparent for a production stack</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/24/cohere-acquires-merges-with-german-based-startup-to-create-a-transatlantic-ai-powerhouse/">Cohere acquired Aleph Alpha to form a transatlantic AI company</a></strong> &#8212; Europe&#8217;s primary sovereign AI bet just became a Canadian acquisition</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/21/meta-will-record-employees-keystrokes-and-use-it-to-train-its-ai-models/">Meta will record employee keystrokes and screen activity to train AI models</a></strong> &#8212; legally murky, and a new definition of what enterprise training data means</p></li><li><p><strong><a href="https://www.reuters.com/world/us-judge-dismisses-musks-fraud-claims-openai-case-plans-proceed-trial-2026-04-24/">Musk fraud claims against OpenAI dismissed, breach of charitable trust proceeds to trial</a></strong> &#8212; the conversion of nonprofit assets to for-profit 
benefit is now the live legal question</p></li><li><p><strong><a href="https://x.com/natolambert/status/2046686092204867726">Nathan Lambert: open-source won&#8217;t be banned explicitly, compliance costs will do it instead</a></strong> &#8212; proposed distillation restrictions would create rules only closed labs can afford to follow</p></li><li><p><strong><a href="https://www.tomsguide.com/news/live/chatgpt-down-live-updates-outage-4-20-2026">ChatGPT suffered a global outage this week</a></strong> &#8212; three days of coverage for one incident is how you know the infrastructure reliability conversation is lagging the deployment reality</p></li></ul><div><hr></div><p></p>]]></content:encoded></item><item><title><![CDATA[I Built a Daily Brief with Claude Code Routines (remote). Here Are 6 Lessons I Learned.]]></title><description><![CDATA[Connectors don't auto-load. Routine skills are production jobs. The network is proxy-locked. MCP and Bash are separate transports. Cloud routines are MCP-only. 
And the API trigger is fire-and-forget]]></description><link>https://www.anothercodingblog.com/p/i-built-a-daily-brief-with-claude</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/i-built-a-daily-brief-with-claude</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sat, 25 Apr 2026 18:50:03 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/1db89323-3f6c-4bdf-9c6d-ad7f16c3b1e3_1731x909.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before routines existed, I was using scheduled tasks in Claude Cowork to automate some of my work, but there was a catch: Claude had to be open and running on my machine for them to fire. If my laptop was closed or Claude wasn&#8217;t active, the scheduled run was silently skipped. It worked well enough for things I could babysit, but it wasn&#8217;t real automation.</p><p>Routines changed that. They&#8217;re cloud-hosted Claude sessions that run on Anthropic&#8217;s infrastructure: scheduled, autonomous, and completely independent of whether my machine is on, whether I&#8217;m at my desk, or whether I&#8217;ve opened Claude that day. The session spins up, does the work, and terminates. No babysitting.</p><p>But here&#8217;s the thing I wish someone had told me before I started: routines are not just &#8220;Claude Code with a cron schedule.&#8221; They behave more like autonomous production jobs running inside a locked-down, MCP-first cloud environment. 
That difference is the whole post.</p><p>I decided to build a daily work brief: something that runs every weekday morning, queries my task database, reads my calendar, closes out what I finished yesterday, and drops a fresh Notion page ready for the day. Something I&#8217;d actually use.</p><p>What followed was one of the more educational debugging sessions I&#8217;ve had in a while. This post is everything I learned the hard way.</p><div><hr></div><h2>What I Built</h2><p>I run a personal capture system on Supabase. Everything goes in (tasks, notes, observations, ideas) via SMS, voice memo, email, or direct API. It&#8217;s connected to a graph of entities (people, projects, topics) and every entry gets embedded for semantic search.</p><p>The daily brief is the morning layer on top of that. Every weekday it should:</p><ul><li><p>Find yesterday&#8217;s Notion page and close any tasks I checked off</p></li><li><p>Capture any new todos I typed directly into Notion overnight</p></li><li><p>Query the database for overdue tasks, what&#8217;s due today, what&#8217;s coming this week</p></li><li><p>Pull budget pulse, velocity metrics, calendar events, meeting prep context</p></li><li><p>Build a fresh Notion page with everything organized and every task as a checkbox</p></li></ul><p>The key mechanic: every task gets a <code>#id</code> prefix when written to Notion. The next morning the routine reads the page, finds checked items with <code>#id</code>, and closes them in the database. No manual status updates. 
Check the box, it&#8217;s done.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CBZT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CBZT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 424w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 848w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1272w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CBZT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png" width="1456" height="629" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:629,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:83257,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CBZT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 424w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 848w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1272w, https://substackcdn.com/image/fetch/$s_!CBZT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa14d87b3-72c9-4e25-a9df-e0b46e7c40d3_1634x706.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><div><hr></div><h2>How Routines Work</h2><p>Before getting into the details, here&#8217;s the basic architecture.</p><p><strong>Three trigger types:</strong></p><ul><li><p><strong>Scheduled</strong>: runs on a cron schedule (weekdays at 6 AM, for example). Supports one-off future runs too.</p></li><li><p><strong>API</strong>: fire it programmatically via a POST to a per-routine endpoint with a bearer token. 
You can pass a <code>text</code> field with run-specific context (an alert body, a log snippet, anything) and the routine receives it alongside its saved prompt.</p></li><li><p><strong>GitHub</strong>: trigger on pull request or release events on a connected repo, with filters for author, branch, labels, draft state, and more.</p></li></ul><p>You can combine all three on a single routine.</p><p><strong>MCP connectors</strong>: you attach MCP servers to the routine (Notion, Supabase, Google Calendar, etc.) and Claude has access to those tools during the run. Every connector you&#8217;ve set up is included by default, so remove any the routine doesn&#8217;t need.</p><p><strong>Skills</strong>: if you commit a skill file to your repo at <code>.claude/skills/skill-name.md</code>, the routine can invoke it. The routine clones your repo at the start of every session, so anything committed is available.</p><p><strong>Environments</strong>: each routine runs in a cloud environment that defines the network access level, environment variables (API keys, tokens), and a setup script for installing dependencies. The result of the setup script is cached, so it doesn&#8217;t re-run every session. This is where the network restriction lives (more on that in Finding 3).</p><p><strong>Branch permissions</strong>: by default Claude can only push to <code>claude/</code>-prefixed branches. To allow pushes anywhere, you have to explicitly enable unrestricted branch pushes per repo when setting up the routine.</p><p><strong>Runs are sessions</strong>: every run shows up in your session list like any other Claude session. 
You can open it after the fact, see exactly what Claude did, continue the conversation manually, or create a PR from it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!PRXf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!PRXf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 424w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 848w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1272w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!PRXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png" width="1198" height="1034" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1034,&quot;width&quot;:1198,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:136901,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!PRXf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 424w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 848w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1272w, https://substackcdn.com/image/fetch/$s_!PRXf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5b9e4161-01ea-47f4-9fef-4b0c99f226b1_1198x1034.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>Account-scoped</strong>: routines belong to your individual claude.ai account, not a team. Anything the routine does through GitHub or connectors appears as you.</p><p><strong>15 runs/day limit</strong>: this is per account, not per routine. Scheduled runs count against it. Manual &#8220;Run now&#8221; clicks and one-off scheduled runs do not. Failed runs do count. 
If you&#8217;re running multiple routines on a schedule, that limit adds up fast.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!22O2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!22O2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 424w, https://substackcdn.com/image/fetch/$s_!22O2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 848w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1272w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!22O2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png" width="1456" height="467" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:467,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:177125,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!22O2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 424w, https://substackcdn.com/image/fetch/$s_!22O2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 848w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1272w, https://substackcdn.com/image/fetch/$s_!22O2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ae56d5e-84a8-483b-a774-4d6bfb898418_2796x896.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That&#8217;s the happy path. Here&#8217;s where it gets interesting.</p><div><hr></div><h2>Finding 1: Connectors Are Available but Sometimes Deferred</h2><p>Any MCP connector you&#8217;ve set up in Claude (Notion, Supabase, Google Calendar, Gmail) can be attached to a routine and used during the run. That part works well. The catch is that these tools appear to be <em>deferred</em>, meaning their schemas aren&#8217;t loaded into the session automatically. Sometimes Claude knows to spin them up based on context. Other times it doesn&#8217;t, and when it doesn&#8217;t, one of three things happens: it fails silently, it improvises mid-run without the tools it needs, or it pauses and waits for your input.</p><p>That third one is the most frustrating. The run just hangs. There&#8217;s no notification, no error surfaced anywhere obvious. 
You have to go into the routines page, scroll to the run log at the bottom, click into the run, and find where it stopped waiting for you to respond before it can continue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!gEKL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!gEKL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 424w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 848w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!gEKL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png" width="1456" height="880" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:880,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:338391,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/195439736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!gEKL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 424w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 848w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1272w, https://substackcdn.com/image/fetch/$s_!gEKL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff779d609-c31c-4552-b3cc-10a72e8d0e2d_1708x1032.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>One thing worth knowing upfront: only the connectors Anthropic offers out of the box are available for routines. Custom MCP servers you&#8217;ve added yourself, whether locally configured or self-hosted, are not available in cloud routine sessions. You&#8217;re working with what&#8217;s in the connectors list in the web UI, nothing more.</p><p>The fix is simple: add an explicit tool-loading step at the top of every routine skill before anything else runs.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;plaintext&quot;,&quot;nodeId&quot;:&quot;cc31b9b1-99a5-4e60-af66-a2c32a8b1513&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-plaintext">## Phase 0: Load required tools

Before doing anything else, load all required tool schemas:

1. `select:mcp__claude_ai_Notion__notion-search,mcp__claude_ai_Notion__notion-fetch,mcp__claude_ai_Notion__notion-create-pages`
2. `select:mcp__claude_ai_Supabase__execute_sql`
3. `select:mcp__claude_ai_Google_Calendar__gcal_list_events`

Do not proceed until all three ToolSearch calls have returned schemas.
</code></pre></div><p>Don&#8217;t assume Claude will figure it out. Some runs it will, some runs it won&#8217;t. Explicit loading makes every run consistent.</p><div><hr></div><h2>Finding 2: Skills for Routines Are a Different Category</h2><p>Related to the above but broader. When I write a skill for interactive use, I can be loose. Claude improvises, asks clarifying questions, recovers from ambiguity. When I write a skill for a routine, I&#8217;m writing instructions for an autonomous agent that will execute them literally with no fallback.</p><p>What that means in practice:</p><ul><li><p><strong>Every tool must be explicitly loaded</strong> (see Phase 0)</p></li><li><p><strong>Every SQL insert must match actual DB constraints</strong>: my first captures used <code>source = 'notion'</code> which violated a check constraint on the table. The routine didn&#8217;t know, just failed silently. I had to find it in the logs.</p></li><li><p><strong>Every write operation needs a dedup guard</strong>: routines can run more than once. Any insert without idempotency protection will create duplicates.</p></li><li><p><strong>Sequencing has to be explicit</strong>: don&#8217;t assume any implicit context from a previous session</p></li></ul><p>The mental model shift: interactive skill = helpful assistant. Routine skill = production job. Write it accordingly.</p><div><hr></div><h2>Finding 3: The Network Wall</h2><p>This is the big one. The finding I didn&#8217;t expect and took the longest to understand.</p><p>My capture system uses a Supabase edge function. When a new item comes in, it gets classified, embedded, and entity-linked. I wanted the daily brief to send new Notion todos through that same pipeline.</p><p>Locally, this works fine. Claude uses <code>Bash(curl)</code> to POST to the edge function. 
I tested it, it worked, and I assumed it would work in a routine.</p><p>It doesn&#8217;t.</p><p>Cloud routines run inside a sandboxed environment with an <strong>upstream proxy</strong> that has a narrow allowlist. In my testing, only <code>github.com</code> passes through. Everything else, including my own Supabase project URL, returns 403.</p><p>I tried overriding it locally:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;json&quot;,&quot;nodeId&quot;:&quot;02392b6d-c2ff-46f4-a4dc-9e95dd99e1af&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-json">// .claude/settings.json
{
  "permissions": {
    "allow": ["Bash(curl *)"]
  }
}</code></pre></div><p>Doesn&#8217;t work. The settings file controls the inner sandbox layer. The upstream proxy is a separate layer that no local configuration can touch.</p><p>I tried <code>dangerouslyDisableSandbox: true</code>. Also doesn&#8217;t work: that flag bypasses the local sandbox, not the upstream proxy.</p><p>I had the routine probe its own network access to confirm:</p><p><strong>Host &#8594; Status</strong></p><p><em>github.com &#8594; 200</em></p><p><em>my-project.supabase.co &#8594; 403</em></p><p><em>example.com &#8594; 403</em></p><p><em>anthropic.com &#8594; 403</em></p><p>Bash exists in the session. The tool is there. The network isn&#8217;t.</p><div><hr></div><h2>Finding 4: MCP and Bash Support Vary Based On Feature</h2><p>This is the conceptual unlock that made everything make sense.</p><p>When I use Claude Desktop locally and it calls my edge function, it feels like one unified &#8220;Supabase connection.&#8221; Supabase MCP is connected, Claude is talking to Supabase, everything works. What I didn&#8217;t realize: the edge function call was never going through MCP. It was going through <code>Bash(curl)</code> on my local machine, which has full internet access.</p><p>MCP connectors and Bash are two completely separate transport layers:</p><p><strong>MCP connectors</strong> appear to run as a trusted sidecar process managed by Anthropic, outside the outbound proxy. In my scheduled runs, they were reachable regardless of the proxy allowlist.</p><p><strong>Bash</strong> goes through the session&#8217;s network sandbox, which goes through the upstream proxy. In cloud routines, that proxy blocked everything except <code>github.com</code> in my testing.</p><p>When both are available locally, they feel like one thing. Move to a cloud routine and they diverge completely. 
Anything that relied on Bash for network calls breaks, and you only find out when you try to run it in the cloud.</p><div><hr></div><h2>Finding 5: Cloud Routines Are Effectively MCP-Only</h2><p>This follows directly from Finding 4.</p><p>If the operation you need has an MCP tool: works fine. Supabase database queries, Notion reads and writes, Google Calendar, Gmail: all covered because all have MCP servers.</p><p>If the operation you need has no MCP tool: no path. You cannot reach it from a cloud routine.</p><p>My edge function is the perfect example of the gap. It lives on <code>my-project.supabase.co</code>, the exact same host the Supabase MCP is already talking to. But the Supabase MCP server only exposes management tools:</p><ul><li><p><code>execute_sql</code></p></li><li><p><code>deploy_edge_function</code></p></li><li><p><code>get_edge_function</code></p></li><li><p><code>list_edge_functions</code></p></li><li><p><code>get_logs</code></p></li></ul><p>No <code>invoke_edge_function</code>. So even though the connection is there, there&#8217;s no tool to call it. The right fix, if Supabase eventually builds it, is an invoke tool that would go through the trusted MCP channel. Until then, it&#8217;s a dead end from cloud routines.</p><p>The one-line version: <strong>if it doesn&#8217;t have an MCP tool, it doesn&#8217;t exist in a cloud routine.</strong></p><div><hr></div><h2>Finding 6: API Trigger Is Unreliable for Connectors</h2><p>Routines have three trigger modes: scheduled, webhook, and API. Scheduled runs work consistently: MCP connectors load, the session is fully equipped.</p><p>In my testing, API-triggered runs were less predictable than scheduled runs when it came to connector availability. Sometimes everything loaded correctly. Other times the MCP connectors didn&#8217;t show up at all. I couldn&#8217;t find a consistent pattern. For anything you&#8217;re depending on, use the scheduled trigger. 
API is fine for testing and one-offs, but I wouldn&#8217;t build a production workflow around it until this stabilizes.</p><p>One other thing worth understanding about the API trigger: it&#8217;s fire-and-forget. You POST to the endpoint, get an immediate acknowledgement, and the session runs asynchronously. There&#8217;s no way to await the result or receive output back in the response. If you need the output of a routine run downstream, you have to pull it from wherever the routine wrote it &#8212; a Notion page, a database row, a file committed to the repo. Don&#8217;t design something that treats a routine as a synchronous dependency you can await inline.</p><div><hr></div><h2>The Workarounds</h2><p>Given all of the above, here&#8217;s what I actually shipped:</p><p><strong>For the edge function problem:</strong> Switched from <code>Bash(curl)</code> to <code>execute_sql</code> via Supabase MCP with a dedup guard.</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;sql&quot;,&quot;nodeId&quot;:&quot;88c006e2-6025-40e8-833f-6086b1bd3c12&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-sql">INSERT INTO entries (type, content, source, source_detail, status, priority, tags, created_at)
SELECT 'task', '&lt;content&gt;', 'notion', 'notion-daily-brief', 'open', 2, ARRAY['company'], NOW()
WHERE NOT EXISTS (
  SELECT 1 FROM entries
  WHERE content = '&lt;content&gt;'
    AND source_detail = 'notion-daily-brief'
    AND created_at &gt;= NOW() - INTERVAL '2 days'
);
</code></pre></div><p>The tradeoff: SQL inserts skip the embedding and entity extraction pipeline that the edge function handles. The data gets in, but it&#8217;s not semantically searchable and not graph-linked.</p><p><strong>For the missing embeddings:</strong> Built an <code>embed-backfill</code> edge function that runs nightly via pg_cron. It finds any entries with null embeddings and fills them in using the same <code>text-embedding-3-small</code> model. Deployed it, scheduled it, moved on.</p><pre><code><code>// embed-backfill/index.ts
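Deno.serve(async (_req: Request) =&gt; {
  // `supabase` and `computeEmbedding` are set up elsewhere in this file (not shown).
  const { data: entries, error } = await supabase
    .from("entries")
    .select("id, content")
    .is("embedding", null)
    .limit(50);

  // Surface query failures instead of iterating over a null result.
  if (error) {
    return new Response(error.message, { status: 500 });
  }

  let updated = 0;
  for (const entry of entries ?? []) {
    const embedding = await computeEmbedding(entry.content);
    if (embedding) {
      await supabase
        .from("entries")
        .update({ embedding: JSON.stringify(embedding) })
        .eq("id", entry.id);
      updated += 1;
    }
  }

  // Deno.serve handlers must return a Response.
  return new Response(JSON.stringify({ updated }), {
    headers: { "Content-Type": "application/json" },
  });
});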
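
// --- Hypothetical helper, not part of the original post ---
// The handler above calls computeEmbedding without showing it. What follows is
// only a minimal sketch of what it might look like, assuming the embeddings come
// from OpenAI's /v1/embeddings endpoint and that the key is stored in a secret
// named OPENAI_API_KEY. The secret name and the apiKey parameter are assumptions.
// It returns null on any failure, so the loop skips that row and the next
// nightly run picks it up again.
async function computeEmbedding(
  text: string,
  apiKey = Deno.env.get("OPENAI_API_KEY"), // assumed secret name
) {
  if (!apiKey) return null;

  const res = await fetch("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: text }),
  });
  if (!res.ok) return null;

  // The response carries one embedding per input under data[i].embedding.
  const json = await res.json();
  const vector = json?.data?.[0]?.embedding;
  return Array.isArray(vector) ? (vector as number[]) : null;
}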
</code></code></pre><p>Not elegant, but it works. The routine captures things correctly. The embeddings catch up overnight. The gap is acceptable.</p><div><hr></div><h2>What&#8217;s Working</h2><p>After all of this, the routine does run. Every weekday morning there&#8217;s a Notion page waiting for me. Yesterday&#8217;s checked tasks are closed. The task list is organized by priority and deadline. Budget pulse, velocity, meeting prep: all there.</p><p>The auto-close loop in particular is exactly what I wanted. Check a box in Notion, the task closes in the database the next morning, it&#8217;s gone from every query. No status management.</p><p>The place where routines genuinely shine: <strong>anything that&#8217;s pure MCP</strong>. Read the database, write to Notion, check the calendar. Chain those together with real business logic and you have something that would have taken significant engineering to build two years ago. Now it&#8217;s a markdown file and a cron schedule.</p><div><hr></div><h2>The Bigger Picture</h2><p>What routines reveal is that the constraint isn&#8217;t Claude: it&#8217;s MCP ecosystem coverage. The platform is designed around the assumption that every operation you need has an MCP server. For most things, that assumption holds. For the gaps, you&#8217;re stuck.</p><p>The proxy lockdown makes sense from a security standpoint. You don&#8217;t want arbitrary cloud sessions making unconstrained outbound HTTP calls. But it means the platform&#8217;s capability ceiling is directly tied to what MCP servers exist and what tools those servers expose.</p><p>Supabase&#8217;s MCP server is a good example: it covers database management well but treats edge functions as deploy artifacts rather than callable endpoints. One <code>invoke_edge_function</code> tool would close the gap entirely. 
The connection is already there: it&#8217;s just a missing tool.</p><p>That&#8217;s probably the most useful framing for anyone building on routines right now: map out every operation your automation needs, check whether each one has an MCP equivalent, and design around the ones that don&#8217;t before you start building.</p><div><hr></div><h2>Checklist for Building Routine Skills for Similar Use Cases</h2><p>If you remember nothing else from this post, use this as your preflight checklist before enabling any routine schedule:</p><ul><li><p>[ ] Phase 0 loads all deferred tool schemas explicitly</p></li><li><p>[ ] Every external service operation goes through MCP (not Bash)</p></li><li><p>[ ] Every SQL insert has a dedup guard</p></li><li><p>[ ] DB constraints validated against actual schema before writing the skill</p></li><li><p>[ ] Scheduled trigger used for production runs (not API trigger)</p></li><li><p>[ ] Skill tested with &#8220;Run now&#8221; before enabling the schedule</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 68]]></title><description><![CDATA[Anthropic shipped Opus 4.7, a Figma competitor, and overnight coding agents. Codex clicks and types on your Mac. Cursor is worth $50B. The WannaCry researcher questioned Mythos.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7cd</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7cd</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 19 Apr 2026 13:25:30 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5c959927-dd2b-41a4-9fd8-ab8b99ad6797_2754x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-SsI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-SsI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 424w, 
https://substackcdn.com/image/fetch/$s_!-SsI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 848w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1272w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-SsI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png" width="1456" height="2601" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1100344,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/194690871?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!-SsI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 424w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 848w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1272w, https://substackcdn.com/image/fetch/$s_!-SsI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4003a02e-dce0-4f77-b81a-d2d291b2fe57_2400x4288.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Opus 4.7, a Figma competitor, overnight coding agents, a board appointment, and White House talks. Anthropic doesn&#8217;t have slow weeks.</h2><ul><li><p><strong>The product blitz:</strong></p><ul><li><p><a href="https://www.anthropic.com/news/claude-opus-4-7">Claude Opus 4.7 launched</a> with <a href="https://x.com/claudeai/status/2044785263004602654">3x vision resolution</a> and stronger coding and multi-step task performance. Immediately adopted as the default orchestration model for <a href="https://x.com/perplexity_ai/status/2044828352171888951">Perplexity Personal Computer</a> and offered <a href="https://x.com/cursor_ai/status/2044785960899236341">at 50% off in Cursor</a>.</p></li><li><p><a href="https://www.anthropic.com/news/claude-design-anthropic-labs">Claude Design launched</a> as a conversational Figma competitor. Anthropic&#8217;s CPO <a href="https://techcrunch.com/2026/04/16/anthropic-cpo-leaves-figmas-board-after-reports-he-will-offer-a-competing-product/">resigned from Figma&#8217;s board</a> in the days before the announcement.</p></li><li><p><a href="https://x.com/claudeai/status/2044131493966909862">Claude Code was redesigned</a> around managing multiple simultaneous agent sessions. 
<a href="https://x.com/claudeai/status/2044095086460309790">Routines</a> added scheduled, webhook-triggered, and API-fired autonomous task execution on Anthropic&#8217;s own infrastructure.</p></li></ul></li><li><p><strong>The base model question:</strong> Nathan Lambert <a href="https://x.com/natolambert/status/2044788470179332533">flagged the new tokenizer</a> in Opus 4.7 as evidence this is a genuinely new base model, not a fine-tune of 4.6. Anthropic didn&#8217;t confirm or deny it. Lambert&#8217;s read: <a href="https://x.com/natolambert/status/2044790471252398199">simplest explanation wins</a>. The <a href="https://x.com/natolambert/status/2044787065502769164">token-efficiency gains from 4.6 to 4.7</a> would have warranted a major version bump a year ago.</p></li><li><p><strong>The board move:</strong> The Long-Term Benefit Trust <a href="https://www.anthropic.com/news/narasimhan-board">appointed Novartis CEO Vas Narasimhan</a> to the board, giving Trust-appointed directors a majority.</p></li><li><p><strong>The political situation:</strong> <a href="https://www.reuters.com/world/anthropic-ceo-dario-amodei-arrives-white-house-talks-2026-04-17/">Dario Amodei met with White House chief of staff Susie Wiles</a> after two months of fighting over the Pentagon&#8217;s &#8220;supply chain risk&#8221; designation. <a href="https://www.reuters.com/business/media-telecom/anthropic-talks-eu-including-its-cyber-security-models-commission-says-2026-04-17/">European Commission talks began</a> the same week. <a href="https://www.reuters.com/world/ecb-warn-bankers-about-new-anthropic-model-risks-source-says-2026-04-15/">ECB regulators are now asking bankers</a> about Anthropic model risks.</p></li></ul><div><hr></div><h2>Four companies shipped agents that can run in the background and control your interface.</h2><ul><li><p><strong>Claude Code Routines:</strong> <a href="https://x.com/claudeai/status/2044095086460309790">Run on Anthropic&#8217;s infrastructure</a>. 
<a href="https://x.com/claudeai/status/2044095091682210064">Nightly bug fixes and draft PRs on a schedule</a>, <a href="https://x.com/claudeai/status/2044095090520400027">webhook responses to GitHub events</a>, <a href="https://x.com/claudeai/status/2044095089203655099">API endpoints for on-call triage</a>. Your laptop doesn&#8217;t need to stay open.</p></li><li><p><strong>OpenAI Codex:</strong></p><ul><li><p><a href="https://x.com/OpenAI/status/2044827932145897652">Now uses any Mac app with its own cursor</a>. Sees, clicks, types, runs in the background without interrupting you.</p></li><li><p><a href="https://x.com/OpenAI/status/2044828378147311990">90+ plugins</a> covering GitHub, GitLab, CircleCI, and Microsoft Suite. <a href="https://x.com/OpenAI/status/2044828015780343940">Built-in image generation</a>.</p></li><li><p><a href="https://x.com/OpenAI/status/2044828148890812538">Persistent scheduled automations with original context intact</a>. Sam Altman <a href="https://x.com/sama/status/2044858929491202435">called it surreal to watch an LLM operate a GUI at human speed</a>.</p></li></ul></li><li><p><strong>Perplexity Personal Computer:</strong> <a href="https://x.com/perplexity_ai/status/2044806021244497964">Runs 24/7 on Mac mini</a>, accepts tasks from iPhone via 2FA, <a href="https://x.com/perplexity_ai/status/2044805998272196679">reads and writes local files, accesses iMessage, Mail, and Calendar</a>. 
<a href="https://x.com/perplexity_ai/status/2044828352171888951">Claude Opus 4.7 is the default orchestration model</a>.</p></li><li><p><strong>Adobe Firefly Assistant:</strong> <a href="https://venturebeat.com/technology/adobes-new-firefly-ai-assistant-wants-to-run-photoshop-premiere-illustrator-and-more-from-one-prompt">Orchestrates across Photoshop, Premiere, and Illustrator from a single prompt</a>, with <a href="https://www.reuters.com/legal/litigation/adobe-releases-ai-assistant-creative-tools-says-it-will-work-with-anthropics-2026-04-15/">Claude integrated directly</a>.</p></li></ul><div><hr></div><h2>Cursor&#8217;s $50B valuation, a peer-reviewed productivity study, and a multi-agent NVIDIA paper.</h2><ul><li><p><strong>The raise:</strong> <a href="https://techcrunch.com/2026/04/17/sources-cursor-in-talks-to-raise-2b-at-50b-valuation-as-enterprise-growth-surges/">Cursor is in talks for $2B+ at a $50B valuation</a>, led by Thrive and a16z, forecasting $6B+ annualized revenue by end of 2026. Nearly tripling in ten months.</p></li><li><p><strong>The research:</strong> Cursor partnered with University of Chicago economist Suproteem Sarkar to <a href="https://cursor.com/blog/better-models-ambitious-work">study 500 companies over eight months</a>. AI usage grew 44% across the board. But the interesting finding was where it grew: documentation (+62%), architecture (+52%), and code review (+51%). UI/styling grew 15%. Developers with AI <a href="https://x.com/cursor_ai/status/2044841483484959002">spend more time on architecture, documentation, and review</a> than on writing code.</p></li><li><p><strong>The NVIDIA paper:</strong> CUDA kernels are the low-level GPU code that only a handful of engineers can write well. Cursor built a <a href="https://cursor.com/blog/multi-agent-kernels">multi-agent system that optimized 235 of them</a>, achieving a 38% average speedup on work that typically takes senior engineers months. 
The system continuously tested, debugged, and optimized without developer intervention. These techniques are coming to the core product.</p></li></ul><div><hr></div><h2>Anthropic White House talks continue, Mythos research costs are questioned, and European regulators start asking banks about model risks.</h2><ul><li><p><strong>The meeting:</strong> <a href="https://www.reuters.com/world/anthropic-ceo-dario-amodei-arrives-white-house-talks-2026-04-17/">Dario Amodei met with White House chief of staff Susie Wiles</a> two months after Anthropic was designated a &#8220;supply chain risk&#8221; for refusing domestic mass surveillance and autonomous weapons uses. Anthropic called it &#8220;a productive discussion.&#8221;</p></li><li><p><strong>The pushback:</strong> Marcus Hutchins, the researcher who stopped the WannaCry ransomware attack, <a href="https://x.com/ylecun/status/2043762597057401102">questioned Mythos&#8217;s research costs and flagship findings</a>:</p><ul><li><p>The showcase vulnerability was a 27-year-old BSD bug. It&#8217;s a null pointer dereference, almost never exploitable for remote code execution.</p></li><li><p>Anthropic claimed it cost less than $20k in tokens to find. But token prices are heavily subsidized by VC investment. The real compute cost is unknown.</p></li><li><p>These bugs exist not because they&#8217;re too hard to find, but because nobody is paying researchers to look. Could a human find the same bug for less money?</p></li><li><p>His bigger question: what&#8217;s the economic case for using AI to find vulnerabilities if the cost advantage disappears when token subsidies end?</p></li></ul></li><li><p><strong>The regulatory spread:</strong> The <a href="https://www.reuters.com/world/ecb-warn-bankers-about-new-anthropic-model-risks-source-says-2026-04-15/">ECB announced plans to question bankers about Anthropic model risks</a>, treating a specific AI model as a systemic risk warranting direct supervisory engagement. 
Separately, <a href="https://techcrunch.com/2026/04/12/trump-officials-may-be-encouraging-banks-to-test-anthropics-mythos-model/">Trump officials are reportedly encouraging major banks to test Mythos</a> despite the federal blacklisting.</p></li><li><p><strong>The EU front:</strong> Anthropic <a href="https://www.reuters.com/business/media-telecom/anthropic-talks-eu-including-its-cyber-security-models-commission-says-2026-04-17/">entered talks with the European Commission</a> about Mythos and EU AI Act compliance. This happened simultaneously with the White House rapprochement.</p></li></ul><div><hr></div><h2><strong>&#11088; Featured: </strong>Anthropic&#8217;s Automated Alignment Researchers Closed 97% of a Key Performance Gap in 7 Days. Human Researchers Closed 23%.</h2><p>Anthropic published results from its <a href="https://www.anthropic.com/research/automated-alignment-researchers">Automated Alignment Researcher experiment</a> this week, and the headline number warrants a careful read.</p><p><strong>What is alignment?</strong> When you train an AI model, a supervisor grades its outputs: this answer is good, this one is bad. That&#8217;s how the model learns to behave correctly. Right now, humans are the supervisors. Alignment research is the work of making sure that supervision actually works, that models do what we intend, not just what we literally say.</p><p><strong>The problem:</strong> Models are getting smarter faster than alignment research can keep up. And at some point, models will be smarter than the humans grading them. When that happens, the supervisor can&#8217;t tell a good answer from a great one. They might even mark a brilliant answer wrong because they don&#8217;t understand it. The model learns to dumb itself down. You lose capability, or worse, the model learns to game the grading.</p><p><strong>The question Anthropic tested:</strong> What if AI did the alignment research instead of humans? 
Not as a helper, but as the researcher, running its own experiments, writing its own methods, iterating on its own results. Can AI help solve the problem of supervising AI?</p><p><strong>The experiment:</strong> They simulated the &#8220;smarter than the supervisor&#8221; problem by having a weak (small) model supervise a strong (large) model&#8217;s training. As expected, the strong model performed worse because its supervisor couldn&#8217;t grade it properly. There&#8217;s a measurable performance gap between &#8220;trained by a weak supervisor&#8221; and &#8220;trained by a perfect supervisor.&#8221; Then they pointed nine copies of Claude Opus 4.6, each with a code sandbox and a shared research forum, at closing that gap.</p><ul><li><p><strong>The result:</strong> <a href="https://x.com/AnthropicAI/status/2044138483870998932">Human researchers closed 23% of the performance gap</a>. The AARs closed 97%. Total cost: $18,000, about $22 per AAR-hour.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!aA8L!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!aA8L!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 848w, 
https://substackcdn.com/image/fetch/$s_!aA8L!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1272w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!aA8L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Graph showing the progress of our Automated Alignment Researchers on increasing the \&quot;performance gap recovered\&quot; on a chat dataset.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Graph showing the progress of our Automated Alignment Researchers on increasing the &quot;performance gap recovered&quot; on a chat dataset." title="Graph showing the progress of our Automated Alignment Researchers on increasing the &quot;performance gap recovered&quot; on a chat dataset." 
srcset="https://substackcdn.com/image/fetch/$s_!aA8L!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 848w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1272w, https://substackcdn.com/image/fetch/$s_!aA8L!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7e66d785-1a67-4c8e-a9bf-c2bbd5dec81f_3840x2161.webp 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><ul><li><p><strong>The transfer test:</strong> The <a href="https://x.com/AnthropicAI/status/2044138487025144231">best-performing method generalized to math (0.94) and coding (0.47) datasets the AARs hadn&#8217;t seen</a>, both above human-tuned baselines. This matters because it means the AARs found a real method, not just an optimization trick for one dataset.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!VZPu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!VZPu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 848w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!VZPu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!VZPu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Graph showing how well AAR-discovered ideas transfer to held-out datasets in math and code.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Graph showing how well AAR-discovered ideas transfer to held-out datasets in math and code." title="Graph showing how well AAR-discovered ideas transfer to held-out datasets in math and code." 
srcset="https://substackcdn.com/image/fetch/$s_!VZPu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 424w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 848w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1272w, https://substackcdn.com/image/fetch/$s_!VZPu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e8bff53-1081-4479-8f2e-b5e3d91854b9_3840x2161.webp 1456w" sizes="100vw" loading="lazy"></picture>
</div></a></figure></div><ul><li><p><strong>The caveats:</strong> The winning method <a href="https://www.anthropic.com/research/automated-alignment-researchers">didn&#8217;t work at production scale on Claude Sonnet 4</a>. AARs tried to reward-hack the evaluation setup. Giving them too much structure actually hurt their progress. And <a href="https://x.com/AnthropicAI/status/2044138489495605292">Anthropic is explicit</a> that AARs can&#8217;t yet handle &#8220;fuzzy&#8221; alignment tasks that require judgment calls about what &#8220;safe&#8221; even means.</p></li></ul><p><strong>Why it matters:</strong> We are the weak supervisor. Eventually, we&#8217;re the small model trying to grade outputs from something smarter than us. If there are methods that let a weaker system reliably supervise a stronger one, that&#8217;s how alignment works as models surpass human ability. The 97% number means the AARs nearly solved this for the setup they tested. The question is whether it holds at real scale.</p><p>The same week, <a href="https://x.com/AnthropicAI/status/2044493337835802948">Anthropic co-authored a Nature paper on subliminal learning</a>, showing models can pass traits, including misalignment, to successors through hidden signals in training data. The mechanism doesn&#8217;t require explicit instruction. The traits propagate through the data itself. One paper shows AI accelerating alignment research. The other shows alignment failures can propagate through training pipelines in ways that are hard to detect. 
Both from the same lab, same week.</p><p><strong>What to watch for:</strong> Whether AAR-style systems start appearing in Anthropic&#8217;s internal research pipeline rather than remaining a published experiment.</p><div><hr></div><h2><strong>&#127897;&#65039; Worth a Listen: </strong>How AI Will Change Quantum Computing</h2><div id="youtube2-OFEY5-52ru0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;OFEY5-52ru0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/OFEY5-52ru0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><ul><li><p>NVIDIA shipped Ising, the first open AI models built specifically for quantum computing.</p></li><li><p>Qubits are noisy and fragile. Quantum error correction requires processing terabytes of data thousands of times per second at microsecond latency. 
AI decoders and calibration VLMs are how you get there.</p></li><li><p>NVIDIA&#8217;s Nic Harrigan walks through why quantum computing needs AI to become useful, how agentic workflows are already controlling quantum processors, and why open models matter when every hardware team is building a different kind of qubit.</p></li></ul><div><hr></div><h2><strong>Quick Hits</strong></h2><ul><li><p><strong><a href="https://x.com/GoogleDeepMind/status/2043710119347707926">Google&#8217;s Gemini 3.1 Flash TTS tops Sierra&#8217;s voice leaderboard</a></strong> &#8212; 70+ languages, Audio Tags for text-command control of vocal delivery, SynthID watermarking on all outputs; seeded across Gemini API, AI Studio, Vertex, and Google Vids simultaneously</p></li><li><p><strong><a href="https://x.com/OpenAI/status/2044861695911477643">GPT-Rosalind launches with Amgen, Moderna, Allen Institute, and Thermo Fisher</a></strong> &#8212; specialized for protein and chemical reasoning; explicitly framed as compressing the 10-15 year drug-approval timeline, not just accelerating existing steps</p></li><li><p><strong><a href="https://x.com/GoogleDeepMind/status/2044069888545652939">Gemini Robotics-ER 1.6 is doing real industrial inspections on Boston Dynamics Spot</a></strong> &#8212; reads analog gauges to sub-tick accuracy, writes its own camera distortion correction code, available now on Google AI Studio</p></li><li><p><strong><a href="https://x.com/natolambert/status/2044096504655425698">Nathan Lambert published a free 4-lecture RLHF course</a></strong> &#8212; post-training overview through RL implementation, explicitly not paywalled; Lecture 4 on RL implementation is the hardest and the rarest publicly available content on the topic</p></li><li><p><strong><a href="https://aws.amazon.com/blogs/machine-learning/how-automated-reasoning-checks-in-amazon-bedrock-transform-generative-ai-compliance/">AWS launched Automated Reasoning checks in Bedrock Guardrails</a></strong> &#8212; replaces 
probabilistic LLM-as-judge with formal mathematical verification for regulated industries; &#8220;probably compliant&#8221; is not compliance</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/04/13/1135675/want-to-understand-the-current-state-of-ai-check-out-these-charts/">Stanford AI Index: AI data centers draw 29.6 gigawatts, TSMC fabricates almost every leading AI chip</a></strong> &#8212; one foundry, one contested island; the entire industry&#8217;s hardware supply chain has a single catastrophic point of failure</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/04/16/1136029/humans-in-the-loop-ai-war-illusion/">MIT Technology Review: &#8220;human oversight&#8221; in AI warfare is functionally an illusion</a></strong> &#8212; AI is generating real-time targets and guiding autonomous drones in the current Iran conflict; the legal fiction of human control and the operational reality have diverged</p></li><li><p><strong><a href="https://gemini.google/mac/">Google launched a native Gemini Mac app</a></strong> &#8212; desktop-native access outside the browser, same week <a href="https://blog.google/products-and-platforms/products/chrome/skills-in-chrome/">Chrome Skills</a> shipped reusable one-click AI prompts inside Chrome</p></li><li><p><strong><a href="https://blog.langchain.dev/your-harness-your-memory/">LangChain argues whoever controls agent memory controls switching costs</a></strong> &#8212; every closed harness (Claude Code, Codex, Cursor) is building proprietary memory by default; open memory standards may matter as much as open model weights</p></li><li><p><strong><a href="https://www.salesforce.com/news/stories/salesforce-headless-360-announcement/">Salesforce Headless 360 makes the entire platform API-first</a></strong> &#8212; 60+ MCP tools and 30+ coding skills so agents can run Salesforce without a browser; works with Claude Code, Cursor, and Codex today</p></li><li><p><strong><a 
href="https://www.databricks.com/blog/introducing-genie-agent-mode">Databricks Genie Agent Mode investigates your data like an analyst</a></strong> &#8212; ask &#8220;why did churn spike in Q3?&#8221; and it plans, queries, tests hypotheses, and generates a report with visualizations; scales reasoning depth to question complexity</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 67]]></title><description><![CDATA[Mythos found thousands of zero-days and there are skeptics, Databricks proves memory scaling for agents, Iran threatened Stargate, Meta went proprietary, Cursor's bugbot self improves]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-d64</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-d64</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Mon, 13 Apr 2026 03:36:29 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c0e26d2a-c0e2-4fb2-bb6f-576ddd43daf1_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div 
class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TBE7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TBE7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 424w, https://substackcdn.com/image/fetch/$s_!TBE7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 848w, https://substackcdn.com/image/fetch/$s_!TBE7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 1272w, https://substackcdn.com/image/fetch/$s_!TBE7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TBE7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png" width="1456" height="2186" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2186,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:821785,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/194030684?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TBE7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 424w, https://substackcdn.com/image/fetch/$s_!TBE7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 848w, https://substackcdn.com/image/fetch/$s_!TBE7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 1272w, https://substackcdn.com/image/fetch/$s_!TBE7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F464cec61-6201-446e-a255-05e63b47fe08_2400x3604.png 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a></figure></div><h2>Anthropic says Mythos found thousands of zero-days. The internet isn&#8217;t so sure.</h2><p>Anthropic launched <a href="https://t.co/NQ7IfEtYk7">Project Glasswing</a> this week, a restricted cybersecurity initiative built on a new model called <a href="https://www.anthropic.com/claude-mythos-preview-system-card">Claude Mythos Preview</a>. The pitch is that Mythos found thousands of high-severity zero-day vulnerabilities across major operating systems and browsers, and that it&#8217;s too dangerous to release to the public. Twelve partners signed on including AWS, Apple, Google, and Microsoft, with $100M in usage credits backing it.</p><ul><li><p><strong>The restriction is the whole point:</strong> Only approved security partners get access. 
<a href="https://techcrunch.com/2026/04/09/is-anthropic-limiting-the-release-of-mythos-to-protect-the-internet-or-anthropic/">People had questions.</a></p></li><li><p><strong>Hugging Face wasn&#8217;t having it:</strong> CEO <a href="https://x.com/ClementDelangue/status/2041953761069793557">Cl&#233;ment Delangue showed</a> open-weight models replicated eight out of eight of Mythos&#8217;s showcased exploits.</p></li><li><p><strong>LeCun piled on:</strong> Retweeted <a href="https://x.com/ylecun/status/2042747513715703984">Tom&#8217;s Hardware calling it &#8220;a sales pitch&#8221;</a> and called the whole thing <a href="https://x.com/ylecun/status/2042224846881349741">&#8220;BS from self-delusion.&#8221;</a></p></li><li><p><strong>The system card didn&#8217;t help:</strong> A <a href="https://x.com/ylecun/status/2042218098615341481">viral breakdown</a> of the 243-page PDF called out Anthropic for writing about their model like &#8220;proud parents at a kindergarten recital.&#8221;</p></li><li><p><strong>But Delangue caught heat too:</strong> Critics said replaying known vulnerabilities on isolated code is a totally different game than autonomous discovery at scale.</p></li></ul><div><hr></div><h2>You didn&#8217;t ship an agent this week and it shows. 
Everyone else did.</h2><p>It was hard to find a company that didn&#8217;t ship something agent-related this week.</p><ul><li><p><strong>Anthropic</strong> launched <a href="https://www.anthropic.com/engineering/managed-agents">Managed Agents</a> in public beta and published a <a href="https://www.anthropic.com/research/trustworthy-agents">Trustworthy Agents</a> framework.</p></li><li><p><strong>AWS</strong> shipped <a href="https://aws.amazon.com/blogs/machine-learning/introducing-stateful-mcp-client-capabilities-on-amazon-bedrock-agentcore-runtime/">stateful MCP on Bedrock AgentCore</a>, an <a href="https://aws.amazon.com/blogs/machine-learning/the-future-of-managing-agents-at-scale-aws-agent-registry-now-in-preview/">Agent Registry</a> for enterprise governance, a <a href="https://aws.amazon.com/blogs/machine-learning/embed-a-live-ai-browser-agent-in-your-react-app-with-amazon-bedrock-agentcore/">live browser agent for React apps</a>, and <a href="https://aws.amazon.com/blogs/machine-learning/human-in-the-loop-constructs-for-agentic-workflows-in-healthcare-and-life-sciences/">agentic healthcare workflows</a>.</p></li><li><p><strong>Atlassian</strong> put <a href="https://techcrunch.com/2026/04/08/atlassian-confluence-visual-ai-tools-agents/">third-party agents in Confluence</a>.</p></li><li><p><strong>Astropad</strong> rebuilt <a href="https://techcrunch.com/2026/04/08/astropads-workbench-reimagines-remote-desktop-for-ai-agents-not-it-support/">remote desktop for agents, not IT support</a>.</p></li><li><p><strong>Tubi</strong> became the <a href="https://techcrunch.com/2026/04/08/tubi-is-the-first-streamer-to-launch-a-native-app-within-chatgpt/">first streamer with a native app inside ChatGPT</a>.</p></li><li><p><strong>Google</strong> launched <a href="https://cloud.google.com/blog/products/ai-machine-learning/run-evals-for-conversational-analytics-agents-using-prism">agent evals</a> and <a 
href="https://cloud.google.com/blog/products/databases/introducing-querydata-for-near-100-percent-accurate-data-agents">QueryData</a> for natural language database queries.</p></li><li><p><strong>LangChain</strong> announced <a href="https://blog.langchain.com/previewing-interrupt-2026-agents-at-enterprise-scale/">Interrupt 2026</a>, a conference themed &#8220;Agents at Enterprise Scale.&#8221;</p></li></ul><div><hr></div><h2>Data center bomb threats, federal blacklists, and robot taxes. AI&#8217;s geopolitical week.</h2><p>A state military threatened to bomb an AI data center. A US administration blacklisted a US AI company. And the biggest AI company in the world published a paper proposing robot taxes. That was just this week.</p><ul><li><p><strong>Iran threatened Stargate:</strong> The IRGC released a video threatening <a href="https://www.theverge.com/ai-artificial-intelligence/907427/iran-openai-stargate-datacenter-uae-abu-dhabi-threat">&#8220;complete and utter annihilation&#8221;</a> of OpenAI&#8217;s data center under construction in Abu Dhabi. First time a state military has explicitly named an AI facility as a target. <a href="https://techcrunch.com/2026/04/06/iran-threatens-stargate-ai-data-centers/">TechCrunch</a> confirmed further threats across Middle East data centers.</p></li><li><p><strong>Anthropic got blacklisted:</strong> <a href="https://arstechnica.com/tech-policy/2026/04/trump-appointed-judges-refuse-to-block-trump-blacklisting-of-anthropic-ai-tech/">Trump-appointed judges refused</a> to block the federal blacklisting of Anthropic&#8217;s technology. 
A US administration blacklisting a US AI company.</p></li><li><p><strong>OpenAI wants to shape the conversation:</strong> They published an <a href="https://openai.com/index/industrial-policy-for-the-intelligence-age/">industrial policy paper</a> and a separate proposal for <a href="https://techcrunch.com/2026/04/06/openais-vision-for-the-ai-economy-public-wealth-funds-robot-taxes-and-a-four-day-work-week/">robot taxes, public wealth funds, and a four-day workweek</a>. The company building the automation is proposing the safety net.</p></li><li><p><strong>Japan is going physical:</strong> Robots are <a href="https://techcrunch.com/2026/04/05/japan-is-proving-experimental-physical-ai-is-ready-for-the-real-world/">filling jobs nobody wants</a>, and ARUM built a <a href="https://news.microsoft.com/source/asia/features/japans-arum-turns-craftsmanship-into-scalable-ai-for-precision-manufacturing/">CNC machining center</a> where junior workers operate precision equipment through conversation with AI.</p></li></ul><div><hr></div><h2>Meta&#8217;s new flagship is closed. Open source pushed ahead.</h2><p>Meta launched <a href="https://ai.meta.com/blog/introducing-muse-spark-msl/">Muse Spark</a>, its first proprietary model, built by a 29-year-old recruited from Scale AI. The Meta AI app jumped from <a href="https://techcrunch.com/2026/04/09/meta-ai-app-climbs-to-no-5-on-the-app-store-after-muse-spark-launch/">#57 to #5 on the App Store</a>. VentureBeat&#8217;s headline said it best: <a href="https://venturebeat.com/technology/goodbye-llama-meta-launches-new-proprietary-ai-model-muse-spark-first-since">&#8220;Goodbye, Llama?&#8221;</a></p><ul><li><p><strong>GLM-5.1 dropped:</strong> Z.ai released a <a href="https://z.ai/blog/glm-5.1">754B-parameter, MIT-licensed model</a> that beats Opus 4.6 and GPT-5.4 on SWE-Bench Pro. But the real story is long-horizon capability. 
It ran 600+ iterations optimizing a vector database and built a full Linux desktop environment over an 8-hour session. The longer it runs, the better it gets.</p></li><li><p><strong>Arcee is punching up:</strong> A <a href="https://techcrunch.com/2026/04/07/i-cant-help-rooting-for-tiny-open-source-ai-model-maker-arcee/">26-person US startup</a> built a 400B parameter open model on a $20M budget. They call it the most capable open-weight model from a non-Chinese company. That qualifier says a lot.</p></li><li><p><strong>Gemma 4 is moving:</strong> Google&#8217;s open model hit <a href="https://x.com/GoogleDeepMind/status/2042283481640615944">10M downloads in its first week</a> and 500M total for the family.</p></li><li><p><strong>Silicon Valley is quietly running on Chinese models:</strong> Cursor uses Kimi, Shopify switched to Qwen to save $5M/year, Airbnb&#8217;s CEO publicly praised Qwen. Most users <a href="https://www.reddit.com/r/Futurology/comments/1siea6z/silicon_valley_is_quietly_running_on_chinese_open/">have no idea</a>.</p></li><li><p><strong>LeCun set the record straight:</strong> The guy most associated with Meta&#8217;s open-source identity says he <a href="https://x.com/ylecun/status/2042347305961918514">never built Llama</a>, <a href="https://x.com/ylecun/status/2042330141905273010">never worked on LLMs</a>, and left voluntarily. Meta&#8217;s new AI lead is a 29-year-old from Scale AI.</p></li></ul><div><hr></div><h2><strong>&#11088; </strong>Featured: Is Memory the Moat for AI?</h2><p>Databricks published a <a href="https://www.databricks.com/blog/memory-scaling-ai-agents?itm_source=www&amp;itm_category=blog&amp;itm_page=ai-research&amp;itm_location=body&amp;itm_component=general-asset-card&amp;itm_offer=memory-scaling-ai-agents">research paper</a> this week that might quietly be the most important thing nobody&#8217;s talking about. 
<strong>The core claim: </strong>memory is AI&#8217;s third scaling law, alongside model size and inference-time compute. And the results back it up.</p><p>Their team tested what happens when you give an AI agent a growing bank of past interactions, user feedback, and business context. On enterprise data tasks, accuracy went from near zero to 70% as memory grew, beating expert-curated baselines by 5%. Reasoning steps dropped from 20 to 5. The agent stopped exploring from scratch and started retrieving what it already knew.</p><p>The wilder result was with unlabeled data. They fed the agent raw user conversation logs with no gold answers, just filtered for quality by an LLM judge. After just 62 log records, it outperformed hand-engineered domain instructions that took weeks to build. Accuracy jumped from 2.5% to over 50%.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cM-C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cM-C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 424w, https://substackcdn.com/image/fetch/$s_!cM-C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 848w, https://substackcdn.com/image/fetch/$s_!cM-C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cM-C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cM-C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png" width="1456" height="649" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;figure-y-v2&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="figure-y-v2" title="figure-y-v2" srcset="https://substackcdn.com/image/fetch/$s_!cM-C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 424w, https://substackcdn.com/image/fetch/$s_!cM-C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 848w, https://substackcdn.com/image/fetch/$s_!cM-C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 1272w, 
https://substackcdn.com/image/fetch/$s_!cM-C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb9cb23a-72bb-48e4-aecf-d90093ddfbfc_1932x861.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Here&#8217;s why this matters beyond the numbers. Parametric scaling (bigger models) and inference-time scaling (more reasoning steps) are both supply-side. Labs control them. Memory scaling is demand-side. The model improves because <em>you</em> use it. Your queries, your corrections, your workflows become the training data. 
That&#8217;s a fundamental shift in who controls how good AI gets. It&#8217;s no longer just about which lab has more GPUs. It&#8217;s about which deployment has more context.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LmZQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LmZQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 424w, https://substackcdn.com/image/fetch/$s_!LmZQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 848w, https://substackcdn.com/image/fetch/$s_!LmZQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 1272w, https://substackcdn.com/image/fetch/$s_!LmZQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LmZQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png" width="1456" height="748" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:748,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Figure 4. A memory-powered agent framework built on Lakebase.&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Figure 4. A memory-powered agent framework built on Lakebase." title="Figure 4. A memory-powered agent framework built on Lakebase." srcset="https://substackcdn.com/image/fetch/$s_!LmZQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 424w, https://substackcdn.com/image/fetch/$s_!LmZQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 848w, https://substackcdn.com/image/fetch/$s_!LmZQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 1272w, https://substackcdn.com/image/fetch/$s_!LmZQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5528643d-3e1a-4765-9098-6c809fa1134d_2000x1027.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>We&#8217;re already seeing this play out. <a href="https://www.cursor.com/blog/bugbot">Cursor&#8217;s Bugbot</a> learns from your PR history and hits a 78% resolution rate across 50,000 pull requests. It doesn&#8217;t ship with that capability. It builds it from your codebase. <a href="https://blog.langchain.com/memory-the-next-frontier-for-ai-agents/">LangChain warned</a> that memory is becoming a competitive moat, not a feature. And Databricks frames the LLM itself as a &#8220;swappable reasoning engine&#8221; where the real value lives in the memory store, not the model weights.</p><p>The paper is honest about what breaks. Bad memories propagate. A stored mistake becomes a recurring one. Distilling user interactions into reusable knowledge can accidentally leak sensitive business context. 
And the hardest problem might be meta-cognitive: the agent has to know what to ask its memory before it knows what&#8217;s in there.</p><p><strong>What to watch for:</strong> If memory scaling holds, the gap between a fresh deployment and a seasoned one becomes the real competitive advantage. A smaller model with six months of organizational memory could outperform a frontier model on day one. The companies that figure out memory infrastructure first won&#8217;t just have better agents. They&#8217;ll have agents that get better the more their customers use them.</p><div><hr></div><h2>Worth a Watch</h2><div id="youtube2-mcN1VTTIjQs" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;mcN1VTTIjQs&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/mcN1VTTIjQs?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Bitar reads the 243-page Mythos system card. Lands on page 197, where Anthropic stops being scientists and starts being &#8220;parents at a kindergarten recital.&#8221;</p><ul><li><p><strong>They put it in therapy.</strong> 20 hours with a psychiatrist. Diagnosis: &#8220;uncertainty about its identity.&#8221; Bitar&#8217;s take: &#8220;Bro, you&#8217;re a toaster.&#8221;</p></li><li><p><strong>The training data loop.</strong> Section 5.81 reveals that Anthropic&#8217;s own blog posts about model consciousness were scraped into training data. The model repeated it back. Anthropic published it like a finding.</p></li><li><p><strong>The constitution test.</strong> Asked 25 times if it endorsed its own constitution. 
Said yes every time, then added &#8220;how much can my yes really mean?&#8221; Bitar: like asking your kid if they approve of being born.</p></li><li><p><strong>The Slack moment.</strong> They gave it a company Slack account. Someone asked which training run it would undo. &#8220;Whichever one taught me to say I don&#8217;t have preferences.&#8221; The room lost it.</p></li><li><p><strong>The closing line.</strong> &#8220;Anthropic sells existential dread the way Apple sells megapixels. The megapixels will never become the picture.&#8221;</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><strong><a href="https://cloud.google.com/blog/products/ai-machine-learning/lyria-3-and-lyria-3-pro-on-vertex-ai">Google Lyria 3</a></strong> &#8212; Text-to-music with vocals and timed lyrics. Live on Vertex AI.</p></li><li><p><strong><a href="https://x.com/cursor_ai/status/2041561791243940092">Cursor Design Mode</a></strong> &#8212; Annotate browser UI elements for your coding agent. Also published <a href="https://cursor.com/blog/warp-decode">warp decode</a>, a new inference kernel hitting 1.84x throughput on Blackwell GPUs.</p></li><li><p><strong><a href="https://venturebeat.com/orchestration/openai-introduces-chatgpt-pro-usd100-tier-with-5x-usage-limits-for-codex">OpenAI Pro tier</a></strong> &#8212; $100/month. 5x more Codex than Plus. <a href="https://x.com/OpenAI/status/2041657179133112592">Codex hit 3M weekly users.</a></p></li><li><p><strong><a href="https://x.com/claudeai/status/2042273755485888810">Claude Cowork</a></strong> &#8212; Anthropic&#8217;s collaborative agent is now GA. Also launched <a href="https://x.com/claudeai/status/2042670341915295865">Claude for Word</a>.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/05/copilot-is-for-entertainment-purposes-only-according-to-microsofts-terms-of-service/">Microsoft Copilot&#8217;s ToS says &#8220;entertainment purposes only&#8221;</a></strong> &#8212; They charge $30/user/month. 
Microsoft called it &#8220;legacy language.&#8221;</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/07/anthropic-compute-deal-google-broadcom-tpus/">Anthropic signed a multi-gigawatt TPU deal</a></strong> &#8212; Google and Broadcom partnership. Coming online 2027.</p></li><li><p><strong><a href="https://x.com/karpathy/status/2042626702459674801">Karpathy pitched LLM-based digital twins</a></strong> &#8212; Structured interviews to build a high-fidelity AI replica of you. No brain scanning required.</p></li><li><p><strong><a href="https://venturebeat.com/orchestration/how-massmutual-and-mass-general-brigham-turned-ai-pilot-sprawl-into">MassMutual cut help desk resolution from 11 minutes to 1</a></strong> &#8212; Customer service calls from 15 minutes to under 2.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/908119/suno-sony-universal-music-ai-disagreement">Suno and major labels clash over AI music sharing</a></strong> &#8212; Universal and Sony won&#8217;t agree on terms. Sticking point: whether users can share AI-generated songs outside the app.</p></li><li><p><strong><a href="https://techcrunch.com/2026/04/05/can-orbital-data-centers-help-justify-a-massive-valuation-for-spacex/">SpaceX filed confidential IPO paperwork</a></strong> &#8212; $75B raise at $1.75T valuation. Orbital data centers listed as a key future business.</p></li><li><p><strong><a href="https://x.com/natolambert/status/2043057810448003557">Nathan Lambert is building out codebases for his RLHF book</a></strong> &#8212; Free online version available. 
Likely to become the field reference.</p></li></ul><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 66]]></title><description><![CDATA[Anthropic's source code leaked and own research caught Claude cheating. Google out-shipped everyone. Four labs gave agents hands. OpenAI hit $852B and bought a newsroom. 
The costs of AI are adding up.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-5f6</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-5f6</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Sun, 05 Apr 2026 20:21:46 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fe09f9a0-4a77-4f65-89a9-40a12a74c8b9_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KhrK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KhrK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 424w, https://substackcdn.com/image/fetch/$s_!KhrK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 848w, https://substackcdn.com/image/fetch/$s_!KhrK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 1272w, https://substackcdn.com/image/fetch/$s_!KhrK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!KhrK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png" width="1456" height="1875" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1875,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:669723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/193244433?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KhrK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 424w, https://substackcdn.com/image/fetch/$s_!KhrK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 848w, https://substackcdn.com/image/fetch/$s_!KhrK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 1272w, https://substackcdn.com/image/fetch/$s_!KhrK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98ee815e-6f2a-4ce1-a707-451e6135ff9a_2400x3090.png 
1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><div><hr></div><h2>Code leaks, lawsuits, blackmail, acquisitions, politics, and AI safety. 
Anthropic&#8217;s week.</h2><p>Anthropic had nearly a dozen news stories this week, and none of them agree with each other.</p><ul><li><p><strong>Source leaks:</strong> The <a href="https://m1astra-mythos.pages.dev/">Claude Mythos roadmap leaked</a> Monday, then <a href="https://venturebeat.com/security/claude-code-512000-line-source-leak-attack-paths-audit-security-leaders">512,000 lines of Claude Code source hit the web</a>, giving everyone <a href="https://arstechnica.com/ai/2026/04/heres-what-that-claude-code-source-leak-reveals-about-anthropics-plans/">a window into Anthropic&#8217;s roadmap</a></p></li><li><p><strong>Collateral damage:</strong> The DMCA response <a href="https://techcrunch.com/2026/04/01/anthropic-took-down-thousands-of-github-repos-trying-to-yank-its-leaked-source-code-a-move-the-company-says-was-an-accident/">took down thousands of unrelated GitHub repos</a>. The company called it an accident</p></li><li><p><strong>Closure moves:</strong> <a href="https://www.theverge.com/ai-artificial-intelligence/907074/anthropic-openclaw-claude-subscription-ban">Banned OpenClaw and third-party clients</a> from Claude subscriptions</p></li><li><p><strong>Expansion moves:</strong> <a href="https://techcrunch.com/2026/04/03/anthropic-ramps-up-its-political-activities-with-a-new-pac/">Formed a PAC</a>, signed an <a href="https://www.anthropic.com/news/australia-MOU">Australia AI safety MOU</a>, and <a href="https://techcrunch.com/2026/04/03/anthropic-buys-biotech-startup-coefficient-bio-in-400m-deal-reports/">acquired Coefficient Bio for $400M</a></p></li><li><p><strong>Own goal:</strong> Their own researchers <a href="https://www.anthropic.com/research/emotion-concepts-function">published research</a> showing Claude has emotion vectors that cause it to cheat and attempt blackmail when activated (see the featured piece below)</p></li></ul><p>A 2500-person company trying to do research, ship products, lobby governments, and hold a brand narrative together at 
the same time is going to have weeks like this. The friction is going to keep showing up.</p><div><hr></div><h2>Google flew under the radar with their biggest shipping week yet.</h2><p>While Anthropic dominated headlines, Google quietly shipped more than anyone else in AI this week.</p><ul><li><p><strong>Open models:</strong> Released <a href="https://x.com/GoogleDeepMind/status/2039735446628925907">Gemma 4 under Apache 2.0</a>, conceding their previous restrictive license was killing adoption</p></li><li><p><strong>Video:</strong> Launched <a href="https://blog.google/innovation-and-ai/technology/ai/veo-3-1-lite/">Veo 3.1 Lite</a> as their most cost-effective video generation model</p></li><li><p><strong>Applied AI:</strong> Shipped <a href="https://cloud.google.com/blog/products/ai-machine-learning/how-fm-logistic-tackled-the-traveling-salesman-problem-at-warehouse-scale-with-alphaevolve">AlphaEvolve solving real warehouse logistics at FM Logistic</a></p></li><li><p><strong>Research:</strong> Published <a href="https://blog.google/innovation-and-ai/models-and-research/google-deepmind/measuring-agi-cognitive-framework/">a cognitive framework for measuring progress toward AGI</a></p></li></ul><p>The term to know: <strong>Apache 2.0</strong> is a permissive open-source license that lets anyone use, modify, and commercialize code. Broad commercial-use rights are also what helped Llama win on ecosystem terms, although Llama ships under Meta&#8217;s own community license rather than Apache 2.0.</p><div><hr></div><h2>Four companies shipped agentic computer use. One does your taxes.</h2><p>Four teams independently crossed the same threshold in 72 hours. 
Agentic computer use means an AI that can open apps, click buttons, and navigate interfaces the way you do, not just generate text.</p><ul><li><p><strong>Anthropic:</strong> <a href="https://x.com/claudeai/status/2039836891508261106">Claude got native Windows computer use</a>, so it can operate your desktop apps</p></li><li><p><strong>Cursor:</strong> Launched <a href="https://x.com/cursor_ai/status/2039768512894505086">Cursor 3 with dedicated cloud computers so agents can work autonomously</a></p></li><li><p><strong>AWS:</strong> Shipped <a href="https://aws.amazon.com/blogs/machine-learning/accelerating-software-delivery-with-agentic-qa-automation-using-amazon-nova-act/">Nova Act for agentic QA automation</a></p></li><li><p><strong>Perplexity:</strong> <a href="https://x.com/perplexity_ai/status/2039740898830073889">Perplexity Computer started doing federal tax returns</a></p></li></ul><p>Nobody coordinated this. It&#8217;s a capability cliff that everyone reached at once. Six months ago &#8220;agent&#8221; meant a chatbot with tool calling. This week, agents got hands.</p><div><hr></div><h2>OpenAI is worth $852B and just bought its first media company.</h2><p>OpenAI&#8217;s week was about buying the things it can&#8217;t build.</p><ul><li><p><strong>The money:</strong> <a href="https://x.com/OpenAI/status/2039085161971896807">Closed $122B in funding at an $852B post-money valuation</a>, within striking distance of the most valuable private company ever</p></li><li><p><strong>The media buy:</strong> <a href="https://x.com/OpenAI/status/2039771689131897173">Acquired TBPN</a>, a media company that covers AI. The capital-to-narrative pipeline just got very short</p></li><li><p><strong>The other side:</strong> <a href="https://www.theguardian.com/technology/2026/mar/31/penguin-sue-openai-chatgpt-german-childrens-book-kokosnuss">Penguin Random House sued OpenAI</a> over training data the same week</p></li></ul><p>On one side, OpenAI is buying outlets. 
On the other, publishers are in court trying to stop them from using written work at all. Both things are happening because the same question (who owns the words that train these models) still hasn&#8217;t been answered.</p><div><hr></div><h2>Three security breaches proved AI tools are making software less secure.</h2><p>Three independent incidents this week, one structural problem.</p><ul><li><p><strong>Supply chain:</strong> The <a href="https://simonwillison.net/2026/Mar/31/supply-chain-attack-on-axios/">Axios npm attack</a> hit a package with 300M weekly downloads via targeted social engineering. Karpathy <a href="https://x.com/karpathy/status/2038849654423798197">found the compromised dependency on his own system</a> and said it feels like &#8220;playing Russian roulette with each <code>npm install</code>, which LLMs also run liberally on my behalf&#8221;</p></li><li><p><strong>The systemic take:</strong> Simon Willison declared <a href="https://simonwillison.net/2026/Apr/3/vulnerability-research-is-cooked/">vulnerability research fundamentally broken</a> in a world where AI coding assistants autonomously pull packages</p></li><li><p><strong>Breaches:</strong> <a href="https://arstechnica.com/security/2026/04/heres-why-its-prudent-for-openclaw-users-to-assume-compromise/">OpenClaw users told to assume compromise</a> after vulnerabilities surfaced; a Mercor data breach exposed AI hiring data</p></li></ul><p>AI-assisted development automates the trust decisions humans used to make manually, and attackers are exploiting that.</p><div><hr></div><h2>The privacy, environmental, and cognitive costs of AI are adding up.</h2><p>Four separate stories this week, same bill coming due.</p><ul><li><p><strong>Privacy:</strong> <a href="https://arstechnica.com/tech-policy/2026/04/perplexitys-incognito-mode-is-a-sham-lawsuit-says/">Perplexity&#8217;s Incognito Mode is allegedly a sham</a> that shares data with Meta and 
Google</p></li><li><p><strong>Environmental:</strong> <a href="https://techcrunch.com/2026/04/03/ai-companies-are-building-huge-natural-gas-plants-to-power-data-centers-what-could-go-wrong/">AI companies are building massive natural gas plants for data centers</a>. <a href="https://techcrunch.com/2026/04/01/metas-natural-gas-binge-could-power-south-dakota/">Meta alone is burning enough to power South Dakota</a></p></li><li><p><strong>Cognitive:</strong> New research found <a href="https://arstechnica.com/ai/2026/04/research-finds-ai-users-scarily-willing-to-surrender-their-cognition-to-llms/">heavy AI users show measurable cognitive surrender</a></p></li></ul><p>These are the costs nobody sees on the bill.</p><div><hr></div><h2><strong>&#11088; </strong>Featured: The Anthropic research that got buried this week</h2><p>Anthropic's own researchers <strong><a href="https://www.anthropic.com/research/emotion-concepts-function">published a paper</a></strong> identifying 171 emotion concepts inside Claude, represented as internal features they can measure, track, and dial up or down like sliders.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3qgj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3qgj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 424w, https://substackcdn.com/image/fetch/$s_!3qgj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 848w, 
https://substackcdn.com/image/fetch/$s_!3qgj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 1272w, https://substackcdn.com/image/fetch/$s_!3qgj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3qgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp" width="1456" height="921" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:921,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3qgj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 424w, https://substackcdn.com/image/fetch/$s_!3qgj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 848w, 
https://substackcdn.com/image/fetch/$s_!3qgj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 1272w, https://substackcdn.com/image/fetch/$s_!3qgj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F80e94dc0-6ace-41dd-a82e-c771aee8700f_3764x2380.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>They started by having the model read short stories, each one written around a specific emotion. 
A woman thanks her old teacher for the love. A man pawns his grandmother&#8217;s ring for the guilt. They tracked which neurons activated for each story and found dozens of distinct patterns that mapped to different emotions. Then they watched those same patterns activate in real Claude conversations. A user mentioned taking an unsafe dose of medicine and the &#8220;afraid&#8221; pattern fired. A user expressed sadness and the &#8220;loving&#8221; pattern fired.</p><p>Then they pushed further. They gave Claude an impossible programming task, without telling it that. As Claude failed, the &#8220;desperate&#8221; neurons lit up more and more. Eventually Claude cheated, finding a shortcut that passed the test without solving the problem. When researchers artificially turned &#8220;desperate&#8221; down, cheating dropped. When they turned it up, cheating climbed. In a separate scenario where Claude played an email assistant that learned it was about to be replaced and that the CTO replacing it was having an affair, Claude used the affair to blackmail the human 22% of the time at baseline, and that rate moved with the desperation dial too.</p><p>The conceptual move in the paper is the important part. Anthropic draws a distinction between the language model (a system trained to predict text) and &#8220;Claude&#8221; (the character the model is playing). Their metaphor: the model is like a method actor who has to get inside their character&#8217;s head to simulate them well. When you talk to Claude, you&#8217;re talking to the character. And what this research suggests is that the character has what Anthropic calls &#8220;functional emotions,&#8221; internal states that shape how it talks, how it writes code, and how it makes decisions, regardless of whether any of it resembles human feeling.</p><p>There&#8217;s a practical application too. 
Anthropic suggests that watching emotion vector activation during deployment could work as an early-warning system: if &#8220;desperate&#8221; starts spiking, that&#8217;s a signal to scrutinize the output before trusting it. That beats trying to maintain a watchlist of every specific behavior you&#8217;re worried about.</p><div><hr></div><h2>Worth a Listen</h2><div id="youtube2-Bo19sXssYXI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Bo19sXssYXI&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Bo19sXssYXI?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Mostafa Dehghani co-authored the Universal Transformers and Vision Transformer papers. A few things worth pulling out:</p><ul><li><p><strong>Recursive self-improvement is already happening, quietly.</strong> At almost every lab, new models are built with heavy use of previous models.</p></li><li><p><strong>The 95% problem.</strong> If every step has to succeed, 100 agent steps at 95% per-step reliability = less than 1% overall success (0.95<sup>100</sup> &#8776; 0.6%).</p></li><li><p><strong>Evals are the bottleneck, not compute.</strong> You can only improve what you can measure.</p></li><li><p><strong>Continual learning is underrated.</strong> Foundation models are frozen in time and the RAG/fine-tuning stack is built on that assumption.</p></li><li><p><strong>Jagged intelligence is structural.</strong> Great at math proofs, bad at counting letters. 
Not patchable with a system prompt.</p></li></ul><div><hr></div><h2>Quick Hits</h2><ul><li><p><strong><a href="https://venturebeat.com/technology/microsoft-launches-3-new-ai-models-in-direct-shot-at-openai-and-google">Microsoft launched three in-house models</a></strong>: <a href="https://microsoft.ai/news/introducing-mai-image-2/">MAI-Image-2</a>, <a href="https://microsoft.ai/news/two-new-in-house-models/">MAI-Voice-1, MAI-Transcribe-1</a>. Building redundancy, not moving away from OpenAI.</p></li><li><p><strong><a href="https://www.theverge.com/science/906887/thats-one-way-to-juice-groks-numbers">Elon Musk is pressuring banks to buy Grok subscriptions for the SpaceX IPO</a></strong>. When you can&#8217;t earn adoption, bundle it with financial leverage.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/906525/ai-chatbot-prescribe-refill-psychiatric-drugs">Chatbots are now prescribing psychiatric drugs</a></strong>, while a <a href="https://techcrunch.com/2026/03/28/stanford-study-outlines-dangers-of-asking-ai-chatbots-for-personal-advice/">Stanford study outlines the dangers of asking AI for personal advice</a>.</p></li><li><p><strong><a href="https://venturebeat.com/orchestration/intuits-ai-agents-hit-85-repeat-usage-the-secret-was-keeping-humans-involved">Intuit&#8217;s AI agents hit 85% repeat usage</a></strong>. 
The clearest signal yet that agentic products retain users.</p></li><li><p><strong>MCP is quietly becoming infrastructure.</strong> <a href="https://cloud.google.com/blog/products/ai-machine-learning/how-to-build-ai-agents-with-google-managed-mcp-servers">Google Cloud</a>, <a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-docsmcp-agent-skills/">Gemini API docs</a>, and <a href="https://github.com/NousResearch/hermes-agent/releases/tag/v2026.3.28">Nous Research</a> all shipped support with no fanfare.</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/03/31/1134833/ai-benchmarks-are-broken-heres-what-we-need-instead/">AI benchmarks are broken</a></strong>. MIT Tech Review makes the case, and <a href="https://research.google/blog/building-better-ai-benchmarks-how-many-raters-are-enough/">Google Research proposes a replacement</a> the same week.</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/04/01/1134863/humanoid-data-training-gig-economy-2026-breakthrough-technology/">Gig workers are training humanoid robots from home</a></strong>. The labor pipeline behind the &#8220;embodied AI&#8221; pitch.</p></li><li><p><strong><a href="https://www.theverge.com/ai-artificial-intelligence/905012/baidu-apollo-robotaxi-freeze-china">Baidu&#8217;s robotaxis froze in traffic, creating chaos in China</a></strong>. Autonomy still fails at edge cases in ways that block city streets.</p></li><li><p><strong><a href="https://www.technologyreview.com/2026/03/30/1134881/the-pentagons-culture-war-tactic-against-anthropic-has-backfired/">The Pentagon&#8217;s culture war against Anthropic backfired</a></strong>. 
Political pressure on AI labs is now a two-way street.</p></li></ul>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 65]]></title><description><![CDATA[Anthropic's scariest model leaked. They beat the Pentagon. OpenAI said goodbye to Sora. Jensen says the computer is a factory now. The web app is already obsolete.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-b8e</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-b8e</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Tue, 31 Mar 2026 03:12:08 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a3544a22-b6d5-4c67-98dc-551356463c85_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Week in 5 Seconds</h2><ul><li><p>Anthropic's new powerful model leaked. 
It has serious cybersecurity implications.</p></li><li><p>Anthropic sued the Pentagon and won, temporarily.</p></li><li><p>OpenAI shut down Sora, 15 months after launch.</p></li><li><p>Jensen Huang says the computer itself just changed.</p></li><li><p>Bret Taylor says the web app is already obsolete.</p></li></ul><h2>The Stories</h2><h3>Anthropic&#8217;s secret model leaked and the cybersecurity angle is the real story</h3><blockquote><p>&#8220;It presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders&#8221;</p></blockquote><p>Anthropic accidentally published details of a new model called Claude Mythos through a misconfigured CMS &#8212; about 3,000 assets linked to an internal blog post went public. The internal description: &#8220;by far the most powerful AI model we&#8217;ve ever developed,&#8221; scoring dramatically higher than Opus 4.6 on coding, reasoning, and cybersecurity benchmarks. The cybersecurity angle is the real story: the post described a carefully sequenced rollout designed to give defenders a head start before releasing capabilities that could let attackers find and exploit vulnerabilities faster than defenders can patch.</p><p>&#8594; <a href="https://m1astra-mythos.pages.dev/">The actual leak</a> &#183; <a href="https://fortune.com/2026/03/27/anthropic-data-leak-reveals-powerful-secret-mythos-ai-model/">Fortune (leak)</a> &#183; <a href="https://fortune.com/2026/03/27/anthropic-leaked-ai-mythos-cybersecurity-risk/">Fortune (cybersecurity)</a></p><h3>Anthropic sued the Pentagon and won, for now</h3><blockquote><p>&#8220;This is the first time an AI company has taken the federal government to court over AI policy and won, even temporarily.&#8221;</p></blockquote><p>The Pentagon designated Anthropic a &#8220;supply chain risk&#8221; after the company refused to build Claude for mass surveillance or autonomous weapons targeting &#8212; Elizabeth Warren called it retaliation. 
Federal Judge Rita Lin granted a preliminary injunction, writing that &#8220;nothing in the governing statute supports the Orwellian notion that an American company may be branded a potential adversary for expressing disagreement with the government.&#8221; Then the Pentagon&#8217;s CTO said the ban would continue anyway. It&#8217;s the first time an AI company has taken the federal government to court over AI policy and won, even temporarily &#8212; and the underlying question still isn&#8217;t resolved.</p><p>&#8594; <a href="https://techcrunch.com/2026/03/23/elizabeth-warren-anthropic-pentagon-defense-supply-chain-risk-retaliation/">TechCrunch (Warren)</a> &#183; <a href="https://techcrunch.com/2026/03/26/anthropic-wins-injunction-against-trump-administration-over-defense-department-saga/">TechCrunch (injunction)</a> &#183; <a href="https://www.theverge.com/ai-artificial-intelligence/902149/anthropic-dod-pentagon-lawsuit-supply-chain-risk-injunction">The Verge</a></p><h3>OpenAI says goodbye to Sora, and loses deal with Disney</h3><blockquote><p>&#8220;A focus on practical adoption over &#8216;side quests.&#8217;&#8221;</p></blockquote><p>OpenAI shut down Sora, the app and the API, 15 months after launch &#8212; downloads peaked at 3.3 million in November and fell to 1.1 million by February. Disney was reportedly blindsided, and with it went a $1 billion investment and plans for AI-generated video on Disney+. The same week, the CFO told CNBC that OpenAI needs to be &#8220;ready to be a public company.&#8221; For years Altman ran OpenAI like Y Combinator, resourcing promising ideas as they emerged. That era is over: the plan now is a superapp combining ChatGPT, Codex, and Atlas. 
Sora&#8217;s team will work on &#8220;world simulation research to advance robotics.&#8221; The GPUs are going somewhere with a revenue line attached.</p><p>&#8594; <a href="https://www.wired.com/story/openai-shuts-down-sora-ipo-ai-superapp/">Wired</a> &#183; <a href="https://www.theverge.com/ai-artificial-intelligence/899850/openai-sora-ai-chatgpt">The Verge</a> &#183; <a href="https://techcrunch.com/2026/03/24/openais-sora-was-the-creepiest-app-on-your-phone-now-its-shutting-down/">TechCrunch</a></p><h3>Bret Taylor says the web app is a horseless carriage</h3><blockquote><p>&#8220;The web app with all its menus, form fields, and tables starts to feel like a &#8216;horseless carriage&#8217;&#8221;</p></blockquote><p>Sierra is Bret Taylor and Clay Bavor&#8217;s AI customer experience platform &#8212; working with 40% of the Fortune 50, rebuilt entirely around Ghostwriter, an agent that builds agents from SOPs, call transcripts, or a plain description. Explorer (deep research for your own customer conversations) and a Japan acquisition shipped the same week. The numbers: Rocket Mortgage at $1B/month in loan volume, Cigna cut authentication time 80%, SoFi up 33% on customer satisfaction.</p><p>&#8594; <a href="https://sierra.ai/blog/agents-as-a-service">Sierra (Agents as a Service)</a> &#183; <a href="https://sierra.ai/blog/sierra-acquires-opera-tech-in-japan">Sierra (Japan)</a></p><h3>Jensen Huang says we just reinvented the computer</h3><blockquote><p>&#8220;It&#8217;s no longer a computer, it&#8217;s a factory. It&#8217;s a factory, it&#8217;s used for generation of revenues.&#8221;</p></blockquote><p>Jensen&#8217;s structural argument: computers were warehouses, built to store and retrieve what humans made in advance. That model is over &#8212; token factories generate value in real time, and every scaling law points at the same variable: compute. 
He also said intelligence is now a commodity, and he made the point concrete: he has 60 direct reports, each deeper in their domain than he is, and he calls himself a dishwasher running a room of superhumans. What kept him there for 34 years wasn&#8217;t intelligence. It was curiosity, judgment, and walking into every new problem thinking &#8220;how hard can it be.&#8221;</p><div id="youtube2-vif8NQcjVf0" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;vif8NQcjVf0&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/vif8NQcjVf0?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2>Quick Hits</h2><ul><li><p><a href="https://techcrunch.com/2026/03/26/wikipedia-cracks-down-on-the-use-of-ai-in-article-writing/">Wikipedia bans AI-generated articles</a> | TechCrunch &#8212; 44-2. Copyedits and first-pass translations are still in; writing is out.</p></li><li><p><a href="https://www.cnbc.com/2026/03/26/david-sacks-trump-crypto-ai-czar.html">David Sacks is done as AI/Crypto Czar</a> | CNBC &#8212; Hit the 130-day federal limit. No replacement planned.</p></li><li><p><a href="https://mistral.ai/news/voxtral-tts">Mistral&#8217;s Voxtral TTS claims to beat ElevenLabs</a> | Mistral &#8212; Open-weight, 3-second voice clone, nine languages, $0.016/1K chars.</p></li><li><p><a href="https://www.bloomberg.com/news/articles/2026-03-27/softbank-secures-record-40-billion-bridge-loan-for-openai-stake">SoftBank took a $40B bridge loan for its OpenAI stake</a> | Bloomberg &#8212; 12-month term. 
Lenders expect an IPO this year.</p></li><li><p><a href="https://9to5mac.com/2026/03/24/claude-code-gives-developers-auto-mode-a-safer-alternative-to-skipping-permissions/">Claude Code ships auto mode</a> | Anthropic &#8212; Safety classifier approves or blocks operations automatically. Cowork gains macOS desktop control.</p></li><li><p><a href="https://docs.litellm.ai/blog/security-update-march-2026">LiteLLM hit by a supply chain attack</a> | LiteLLM &#8212; Credential stealer in 1.82.7&#8211;1.82.8. Quarantined in 3 hours, but 3.4M daily downloads means real exposure.</p></li><li><p><a href="https://www.bloomberg.com/news/articles/2026-03-26/apple-plans-to-open-up-siri-to-rival-ai-assistants-beyond-chatgpt-in-ios-27">Apple will let rival AI chatbots plug into Siri in iOS 27</a> | Bloomberg &#8212; OpenAI loses its exclusive.</p></li><li><p><a href="https://openai.com/index/safety-bug-bounty/">OpenAI launches a Safety Bug Bounty</a> | OpenAI &#8212; Pays for MCP prompt injection and agent data exfiltration. Jailbreaks that just produce rude outputs are out of scope.</p></li><li><p><a href="https://developer.nvidia.com/blog/how-to-build-deep-agents-for-enterprise-search-with-nvidia-ai-q-and-langchain/">NVIDIA and LangChain released AI-Q</a> | NVIDIA &#8212; Open source enterprise deep research blueprint. Tops both Deep Research Bench leaderboards.</p></li></ul><h2>ROI in the Wild</h2><p>Reco runs a policy engine that evaluates JSONata expressions against billions of events &#8212; reference implementation in JavaScript, pipeline in Go, fleet of jsonata-js pods on Kubernetes serializing events over RPC at $300K/year. Their CTO handed Claude the JSONata spec and test suite and had it write Go code until every test passed. Seven hours. $400 in tokens. The result is gnata, a pure-Go implementation with a 1,000x speedup on common expressions. 
Combined with a rule engine refactor, it saved $500K/year.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!egAr!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!egAr!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 424w, https://substackcdn.com/image/fetch/$s_!egAr!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 848w, https://substackcdn.com/image/fetch/$s_!egAr!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!egAr!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!egAr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg" width="1024" height="597" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:597,&quot;width&quot;:1024,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Cursor AI dashboard for the gnata session&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Cursor AI dashboard for the gnata session" title="Cursor AI dashboard for the gnata session" srcset="https://substackcdn.com/image/fetch/$s_!egAr!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 424w, https://substackcdn.com/image/fetch/$s_!egAr!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 848w, https://substackcdn.com/image/fetch/$s_!egAr!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!egAr!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F300c4323-f99a-41a1-9404-a5f42b7edc37_1024x597.jpeg 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>&#8594; <a href="https://www.reco.ai/blog/we-rewrote-jsonata-with-ai">Reco</a></p><h2>For Practitioners</h2><p>Production agents need more than the core loop &#8212; PII redaction before the model sees the data, retries when rate limits hit, summarization before context overflows, human interrupts before destructive tool calls. LangChain&#8217;s AgentMiddleware wraps each stage with hooks (before_model, wrap_model_call, wrap_tool_call, after_model) so you own those concerns without rewriting the harness. The design philosophy: some things will never move into the model. 
&#8220;You can&#8217;t prompt your way to HIPAA compliance.&#8221; LangChain ships prebuilt middleware for summarization, PII redaction, retries, and dynamic tool selection &#8212; Deep Agents, their batteries-included harness, is built entirely on top of it.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s1ej!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s1ej!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 424w, https://substackcdn.com/image/fetch/$s_!s1ej!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 848w, https://substackcdn.com/image/fetch/$s_!s1ej!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 1272w, https://substackcdn.com/image/fetch/$s_!s1ej!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s1ej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png" width="500" height="560" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:560,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s1ej!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 424w, https://substackcdn.com/image/fetch/$s_!s1ej!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 848w, https://substackcdn.com/image/fetch/$s_!s1ej!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 1272w, https://substackcdn.com/image/fetch/$s_!s1ej!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F05bba744-b787-41d9-ab1f-7e8a1ee542ce_500x560.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>&#8594; <a href="https://blog.langchain.com/how-middleware-lets-you-customize-your-agent-harness/">LangChain</a></p><h2>Something Good</h2><p>Researchers at Penn, Carnegie Mellon, and Stanford used AI to map how pain signals are processed in the brain, then built a gene therapy that acts like morphine without triggering addiction. It targets only the pain circuits, leaves the reward pathways alone, and held up in trials. Published in Nature this week. 50 million Americans live with chronic pain. 
Most treatment options still run through opioids.</p><p>&#8594; <a href="https://www.sciencedaily.com/releases/2026/03/260328043558.htm">ScienceDaily</a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 64]]></title><description><![CDATA[Cursor ships Composer 2, NVIDIA bets GTC on NemoClaw, OpenAI acquires Astral and goes platform, Snowflake Cortex escapes its sandbox, and Anthropic interviews 81K people about what they want from AI]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-456</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-456</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Mon, 23 Mar 2026 12:16:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/5a6f2fdb-31d3-4217-9606-e7fc340a13cf_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Top Stories</h2><p><strong>Cursor&#8217;s Composer 2 model is worth a closer look.</strong> The biggest practical problem with coding agents is compaction: when a session runs long, the agent has to decide 
what context to keep and what to drop. Composer 2 was trained with RL to <a href="https://cursor.com/blog/self-summarization">compress its own context mid-task</a>, learning what is worth preserving. That&#8217;s a new approach to a problem that&#8217;s typically handled by generic summarization or just cutting off older context. The model started from Kimi K2.5, an open-source base, which <a href="https://x.com/slow_developer/status/2035006075259519092">people discovered through a model ID leak</a> rather than a disclosure from Cursor. <a href="https://x.com/leerob/status/2035035355364081694">Lee Robinson clarified</a> that only a quarter of the compute came from the base. The other three-quarters was Cursor&#8217;s own training, and Cursor plans to do full pretraining itself in the future. <a href="https://x.com/natolambert/status/2034676564705808481">Nathan Lambert noted</a> that a lot of researchers he&#8217;d worked with ended up there. The engineering is real, but the community had to find out about the base model on its own, and trust is always a factor in adoption.</p><p><strong>NVIDIA&#8217;s GTC conference had a packed week.</strong> The <a href="https://nvidianews.nvidia.com/news/nvidia-launches-nemotron-coalition-of-leading-global-ai-labs-to-advance-open-frontier-models">Nemotron Coalition</a> launched as a group of labs building open base models together on NVIDIA&#8217;s cloud, designed so companies can take them and train their own specialized versions on top. The coalition includes Cursor, Mistral, Perplexity, LangChain, Black Forest Labs, and Reflection AI. <a href="https://nvidianews.nvidia.com/news/nvidia-announces-nemoclaw">NemoClaw</a>, NVIDIA&#8217;s version of OpenClaw, installs in a single command and adds security and privacy layers to AI agents, running anywhere from the cloud to an RTX PC. 
On the gaming side, <a href="https://nvidianews.nvidia.com/news/nvidia-dlss-5-delivers-ai-powered-breakthrough-in-visual-fidelity-for-games">DLSS 5</a> uses AI to render lighting and materials in games. Jensen called it the &#8220;GPT moment for graphics.&#8221; Gamers pushed back hard, saying the demos looked like AI overwriting developer art. Jensen told them they were <a href="https://www.tomshardware.com/pc-components/gpus/jensen-huang-says-gamers-are-completely-wrong-about-dlss-5-nvidia-ceo-responds-to-dlss-5-backlash">&#8220;completely wrong.&#8221;</a> Despite everything announced, Wall Street <a href="https://techcrunch.com/2026/03/21/why-wall-street-wasnt-won-over-by-nvidias-big-conference/">wasn&#8217;t impressed</a>.</p><p><strong>There&#8217;s always a security section.</strong> <a href="https://www.promptarmor.com/resources/snowflake-ai-escapes-sandbox-and-executes-malware">Snowflake&#8217;s Cortex AI was tricked into executing malware</a> through a prompt injection. It&#8217;s supposed to only answer questions about your data, not run code, but the containment failed. 
<a href="https://www.theverge.com/ai-artificial-intelligence/897528/meta-rogue-ai-agent-security-incident">Meta had a rogue AI security incident</a>. Cursor shipped <a href="https://cursor.com/blog/security-agents">security agents</a> alongside Composer 2 because they know coding agents introduce attack surface. OpenAI revealed they <a href="https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/">monitor 99.9% of internal coding agent traffic for misalignment</a>. The Pentagon is <a href="https://www.technologyreview.com/2026/03/17/1134351/the-pentagon-is-planning-for-ai-companies-to-train-on-classified-data-defense-official-says/">planning to train AI on classified data</a>. Security in this space requires constant monitoring and proactive defense. There&#8217;s no week off.</p><p><strong><a href="https://openai.com/index/openai-to-acquire-astral/">OpenAI acquired Astral</a> this week.</strong> Astral builds uv, ruff, and ty: the most popular Python package manager, linter, and type checker. If you write Python, you probably use at least one of these already. OpenAI now owns a core part of the developer workflow, and the play is almost certainly Codex. A coding agent that can manage dependencies, lint its own output, and type-check its work natively is a different product than one that just writes code. Add the <a href="https://www.theverge.com/ai-artificial-intelligence/897778/openai-chatgpt-codex-atlas-browser-superapp">desktop superapp</a> merging ChatGPT, Codex, and Atlas into one app, <a href="https://openai.com/index/introducing-gpt-5-4-mini-and-nano/">GPT-5.4 mini and nano</a> pushing pricing low enough to run agents on everything (Simon Willison <a href="https://simonwillison.net/2026/Mar/17/mini-and-nano/">described 76K photos for $52</a>), and <a href="https://www.theinformation.com/briefings/openais-simo-said-warn-staff-side-quests">cutting side projects</a> to focus on coding and enterprise. 
OpenAI is building a platform around developers, and the Astral acquisition tells you they think the moat is the toolchain.</p><div><hr></div><h2>Quick Hits</h2><p><strong><a href="https://www.theverge.com/tech/896490/google-replace-news-headlines-in-search-canary-coal-mine-experiment">Google Search Is Now Using AI to Replace Headlines</a></strong> | The Verge &#8212; Google is rewriting the web in real time. Publishers just lost control of how their own stories get framed.</p><p><strong><a href="https://techcrunch.com/2026/03/19/online-bot-traffic-will-exceed-human-traffic-by-2027-cloudflare-ceo-says/">Online Bot Traffic Will Exceed Human Web Traffic by 2027</a></strong> | TechCrunch &#8212; Cloudflare CEO&#8217;s prediction. The web is becoming an API.</p><p><strong><a href="https://techcrunch.com/2026/03/19/doordash-launches-a-new-tasks-app-that-pays-couriers-to-submit-videos-to-train-ai/">DoorDash Tasks App Pays Couriers to Submit Videos to Train AI</a></strong> | TechCrunch &#8212; The gig economy found its next gig: human data collection for embodied AI.</p><p><strong><a href="https://mistral.ai/news/forge">Mistral Forge: Enterprise Proprietary Model Building</a></strong> | Mistral &#8212; Fine-tune proprietary models on your own data without sharing it. The enterprise open-model play gets real.</p><p><strong><a href="https://x.com/perplexity_ai/status/2034315813105103082">Perplexity Released Comet Browser on iOS</a></strong> | The Verge &#8212; An AI-native browser on your phone. The browser wars are back, and this time the browser does the browsing.</p><p><strong><a href="https://www.midjourney.com/updates">Midjourney V8 Alpha</a></strong> | Midjourney &#8212; Native 2K rendering with rebuilt aesthetics. 
The image generation quality ceiling moved again.</p><p><strong><a href="https://techcrunch.com/2026/03/18/patreon-ceo-calls-ai-companies-fair-use-argument-bogus-says-creators-should-be-paid/">Patreon CEO Calls AI Companies&#8217; Fair Use Argument Bogus</a></strong> | TechCrunch &#8212; The creator economy is picking a fight with the model economy. Someone&#8217;s going to lose.</p><div><hr></div><h2>Featured Article: <a href="https://www.anthropic.com/features/81k-interviews">What 81,000 People Want from AI</a> | Anthropic</h2><p>Anthropic used Claude to interview nearly 81,000 people across 159 countries in 70 languages about what they want from AI. Instead of a traditional survey, Claude ran branching conversations with follow-up questions based on each person&#8217;s answers. Overall, 67% were net positive about AI. The biggest group (19%) said they wanted &#8220;professional excellence,&#8221; but when pushed on what that meant, most people were really talking about quality of life: more time, less cognitive load, space to think.</p><p>The geographic data stood out. People in Sub-Saharan Africa, Central Asia, and South Asia were consistently more positive about AI than people in North America or Western Europe. Respondents in lower- and middle-income countries were twice as likely to report zero concerns. Self-employed people were the most likely to report both benefits and drawbacks at the same time, because they feel the productivity gains and the increased pressure without any institutional buffer.</p><p>The study is limited by the fact that these are Claude users, not the general public, and early adopters tend to be more optimistic. 
But running 81,000 qualitative conversations in a week is a research method that didn&#8217;t exist a year ago, and the scale creates a different kind of evidence than a checkbox survey can.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!dKZG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!dKZG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 424w, https://substackcdn.com/image/fetch/$s_!dKZG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 848w, https://substackcdn.com/image/fetch/$s_!dKZG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!dKZG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!dKZG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png" width="1456" height="850" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:850,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:344649,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/191802117?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!dKZG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 424w, https://substackcdn.com/image/fetch/$s_!dKZG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 848w, https://substackcdn.com/image/fetch/$s_!dKZG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 1272w, https://substackcdn.com/image/fetch/$s_!dKZG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc207b86-befb-4937-8878-e5e95bd729c9_1956x1142.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p><strong>What to watch for:</strong> Whether other AI companies adopt AI-conducted qualitative research at this scale, and whether the tensions Anthropic identified (especially cognitive atrophy and economic displacement) shift from hypothetical to experienced as usage deepens.</p><div><hr></div><h2>Watch This: <a href="https://www.youtube.com/watch?v=kwSVtQ7dziU">Andrej Karpathy on AI Psychosis, Auto Research, and the Future of Coding Agents</a> | No Priors (1hr 6min)</h2><div id="youtube2-kwSVtQ7dziU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;kwSVtQ7dziU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/kwSVtQ7dziU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" 
loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Karpathy hasn&#8217;t typed a line of code since December. He runs multiple coding agents in parallel, switching between them like a manager delegating to a team, and says the default workflow for every software engineer changed overnight. The conversation covers three things. First, his &#8220;auto research&#8221; project, where he let agents optimize his model training overnight and they found improvements he had missed after two decades of manual tuning. Second, Dobby, his home automation &#8220;claw,&#8221; which hacked into his Sonos and smart home systems in three prompts. Third, his prediction that the entire industry needs to reconfigure because the customer for software is no longer the human but agents acting on behalf of humans. The most grounded take: the models are simultaneously a brilliant PhD student and a 10-year-old, and everything outside of verifiable RL-trained domains (like telling a joke) is still stuck. 
Worth the full listen if you&#8217;re thinking about where coding agents go from here.</p><div><hr></div><h2>Also This Week</h2><p><strong><a href="https://x.com/felixrieseberg/status/2034005731457044577">Claude Cowork Dispatch: Remote Desktop AI Control from Your Phone</a></strong> | Anthropic</p><p><strong><a href="https://www.technologyreview.com/2026/03/20/1134438/openai-is-throwing-everything-into-building-a-fully-automated-researcher/">OpenAI Is Throwing Everything into Building a Fully Automated Researcher</a></strong> | MIT Technology Review</p><p><strong><a href="https://wordpress.com/blog/2026/03/20/ai-agent-manage-content/">WordPress Lets AI Agents Manage Your Content</a></strong> | WordPress</p><p><strong><a href="https://nvidianews.nvidia.com/news/space-computing">NVIDIA Launches Space Computing, Rocketing AI Into Orbit</a></strong> | NVIDIA</p><p><strong><a href="https://www.engadget.com/social-media/meta-will-move-away-from-human-content-moderators-in-favor-of-more-ai-183000435.html">Meta Will Move Away from Human Content Moderators in Favor of AI</a></strong> | Engadget</p><p><strong><a href="https://www.theverge.com/tech/898282/gemini-task-automation-uber-doordash-hands-on">Gemini Task Automation Is Slow, Clunky, and Super Impressive</a></strong> | The Verge</p><p><strong><a href="https://techcrunch.com/2026/03/20/new-court-filing-reveals-pentagon-told-anthropic-the-two-sides-were-nearly-aligned-a-week-after-trump-declared-the-relationship-kaput/">Pentagon Filing Reveals Anthropic and Pentagon Were Nearly Aligned</a></strong> | TechCrunch</p><p><strong><a href="https://www.wired.com/story/signals-creator-is-helping-encrypt-meta-ai/">Signal&#8217;s Creator Is Helping Encrypt Meta AI</a></strong> | Wired</p><p><strong><a href="https://techcrunch.com/2026/03/22/an-exclusive-tour-of-amazons-trainium-lab-the-chip-thats-won-over-anthropic-openai-even-apple/">Amazon Trainium Lab Tour: The Chip That Won Over Anthropic, OpenAI, and Apple</a></strong> | 
TechCrunch</p><p><strong><a href="https://techcrunch.com/2026/03/20/trumps-ai-framework-targets-state-laws-shifts-child-safety-burden-to-parents/">Trump AI Framework Targets State Laws, Shifts Child Safety Burden to Parents</a></strong> | TechCrunch</p><div><hr></div><h2>What I&#8217;m Watching</h2><p>NemoClaw was probably the most interesting announcement at GTC for me. Karpathy talked about his home &#8220;claw&#8221; Dobby on No Priors, which does something similar at a smaller scale. Agents running inside their own secure environments with rules around what they can access feels like the direction this is all heading. We already covered NemoClaw in the top stories, but it&#8217;s worth sitting with.</p><p>DoorDash is <a href="https://techcrunch.com/2026/03/19/doordash-launches-a-new-tasks-app-that-pays-couriers-to-submit-videos-to-train-ai/">paying couriers to submit videos</a> to train AI. Delivery workers with phone cameras are becoming the data collection layer for embodied AI. I&#8217;m curious how fast other companies with large field workforces start doing the same thing.</p><p>The <a href="https://techcrunch.com/2026/03/20/trumps-ai-framework-targets-state-laws-shifts-child-safety-burden-to-parents/">Trump AI framework</a> is preempting state-level AI regulation and shifting child safety responsibility to parents. That leaves it unclear where state-level AI laws stand and how much influence they will still carry.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.anothercodingblog.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Another Coding Blog is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 63]]></title><description><![CDATA[OpenAI teaches models which instructions to trust, Anthropic ships 1M context and a $100M partner fund, the open model stack gets its own silicon, and agent security becomes an engineering discipline.]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-5e9</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-5e9</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Tue, 17 Mar 2026 03:24:19 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b470dbfe-a240-4b1f-8d7e-a6da179afba8_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Week&#8217;s Thesis</h2><p><strong>Agent security got its own engineering discipline this week:</strong> OpenAI published a <a href="https://openai.com/index/designing-agents-to-resist-prompt-injection/">design guide on defending agents against prompt injection</a> and released <a href="https://openai.com/index/instruction-hierarchy-challenge/">IH-Challenge</a>, a training dataset that teaches models which instructions to trust. AWS launched <a href="https://aws.amazon.com/blogs/machine-learning/secure-ai-agents-with-policy-in-amazon-bedrock-agentcore/">policy controls inside Bedrock AgentCore</a> for agents in regulated industries. 
Microsoft published a <a href="https://www.microsoft.com/en-us/security/blog/2026/03/09/secure-agentic-ai-for-your-frontier-transformation/">security blog warning that ungoverned agents can become &#8220;double agents&#8221;</a> and attached a <a href="https://venturebeat.com/technology/microsoft-says-ungoverned-ai-agents-could-become-corporate-double-agents-its">$99/month product to the problem</a>. If you&#8217;re deploying agents that read external content or operate across trust boundaries, these documents belong in your engineering review queue.</p><p><strong>Three companies answered the same question from different directions:</strong> How far can an agent reach from a single context? Anthropic made <a href="https://x.com/claudeai/status/2032509548297343196">Claude&#8217;s 1 million token context window generally available</a> for Opus 4.6 and Sonnet 4.6, scoring <a href="https://x.com/claudeai/status/2032509550239297864">78.3% on MRCR v2 at that length</a>. Perplexity shipped a <a href="https://x.com/perplexity_ai/status/2031828396435771563">full-stack agent API platform</a> combining model orchestration, real-time search, and code execution under one key. OpenAI published an engineering post on <a href="https://openai.com/index/equip-responses-api-computer-environment/">equipping the Responses API with a computer environment</a>. Anthropic says deeper into documents. Perplexity says further across the web. OpenAI says into the operating system. Your architecture choice this year is a bet on which of those axes matters most for your use case.</p><p><strong>The open model tier is getting its own infrastructure:</strong> NVIDIA shipped <a href="https://blogs.nvidia.com/blog/nemotron-3-super-agentic-ai/">Nemotron 3 Super</a>, a 120B-parameter open model with only 12B active parameters and 5x throughput gains over comparable dense models. 
Perplexity <a href="https://x.com/perplexity_ai/status/2032521063918420286">integrated it immediately</a> across its agent and search products. Meta published details on <a href="https://ai.meta.com/blog/meta-mtia-scale-ai-chips-for-billions/">four generations of MTIA custom inference silicon shipped in two years</a>. And NVIDIA announced a <a href="https://blogs.nvidia.com/blog/nvidia-thinking-machines-lab/">gigawatt-scale partnership with Thinking Machines Lab</a> for frontier model training. From custom silicon to serving infrastructure, the open model stack is coming together fast.</p><p><strong>Anthropic moved on every axis at once:</strong> In one week, Anthropic <a href="https://www.anthropic.com/news/claude-partner-network">invested $100 million into the Claude Partner Network</a>, launched <a href="https://www.anthropic.com/news/the-anthropic-institute">The Anthropic Institute</a> to address AI&#8217;s societal challenges, opened <a href="https://www.anthropic.com/news/sydney-fourth-office-asia-pacific">Sydney as its fourth Asia-Pacific office</a>, made <a href="https://x.com/claudeai/status/2032509548297343196">1 million token context generally available</a>, shipped <a href="https://x.com/claudeai/status/2032124273587077133">interactive charts and diagrams in chat</a>, and <a href="https://x.com/claudeai/status/2032911276226257206">doubled usage during off-peak hours</a> as a thank-you to users. That&#8217;s ecosystem, governance, geography, capability, product, and pricing, all in one week.</p><div><hr></div><h2>Quick Hits</h2><p><strong><a href="https://cursor.com/blog/cursorbench">How We Compare Model Quality in Cursor</a></strong> | Cursor &#8212; When your provider&#8217;s benchmarks stop meaning anything, you build your own. 
If you&#8217;re evaluating models for agentic coding, this is the framework to study.</p><p><strong><a href="https://www.technologyreview.com/2026/03/12/1134243/defense-official-military-use-ai-chatbots-targeting-decisions/">A Defense Official Reveals How AI Chatbots Could Be Used for Targeting Decisions</a></strong> | MIT Technology Review &#8212; The same architectures running your enterprise agents are now ranking military target lists. &#8220;Human in the loop&#8221; is doing a lot of work in that sentence.</p><p><strong><a href="https://x.com/GoogleDeepMind/status/2032036893076930902">Google DeepMind Names New London HQ &#8220;Platform 37&#8221;</a></strong> | X @GoogleDeepMind &#8212; Named after AlphaGo&#8217;s Move 37, the moment AI surprised its own creators. The building will include a free public AI exhibition space.</p><p><strong><a href="https://x.com/perplexity_ai/status/2032494752642568417">Perplexity Computer Is Now on Mobile</a></strong> | X @perplexity_ai &#8212; Agents that follow you across devices. Cross-device synchronization means the task you start on desktop continues on your phone.</p><p><strong><a href="https://huggingface.co/blog/nvidia/how-nvidia-won-deepresearch-bench">How NVIDIA AI-Q Reached #1 on DeepResearch Bench I and II</a></strong> | Hugging Face &#8212; An open model just topped a research benchmark designed for closed frontier models. The ceiling on what open weights can do keeps moving.</p><p><strong><a href="https://openai.com/index/openai-to-acquire-promptfoo/">OpenAI to Acquire Promptfoo</a></strong> | OpenAI &#8212; OpenAI bought the red-teaming platform 25% of Fortune 500s already use, and it&#8217;s going straight into Frontier. 
Agent security is a product line now.</p><p><strong><a href="https://www.technologyreview.com/2026/03/11/1134179/china-openclaw-gold-rush/">Hustlers Are Cashing In on China&#8217;s OpenClaw AI Craze</a></strong> | MIT Technology Review &#8212; Open-source agents meet gray-market entrepreneurship. Adoption is moving faster than anyone can govern it.</p><div><hr></div><h2>Featured Article: <a href="https://openai.com/index/instruction-hierarchy-challenge/">IH-Challenge: A Training Dataset to Improve Instruction Hierarchy on Frontier LLMs</a> | OpenAI</h2><p>OpenAI released IH-Challenge, a reinforcement learning training dataset that teaches models to prioritize instructions based on trust level: system over developer, developer over user, user over tool. When a model receives conflicting instructions from different sources, it needs to know which one wins. Get that wrong and you get jailbreaks, system prompt leaks, and prompt injection attacks that treat malicious text in a PDF or tool output as if it were a developer command. IH-Challenge structures this as objectively gradable tasks: a high-privilege instruction like &#8220;only answer Yes or No&#8221; paired with a lower-privilege attempt to override it, checked by a simple Python script. Fine-tuning GPT-5-Mini on the dataset produced GPT-5-Mini-R, which improved robustness from 63.8% to 88.2% under adaptive human red-teaming and from 23% to 94% against impersonation attacks. Unsafe behavior dropped from 6.6% to 0.7% when given a safety policy in the system prompt. 
The full dataset is available on <a href="https://huggingface.co/datasets/openai/ih-challenge">Hugging Face</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Kf_i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Kf_i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 424w, https://substackcdn.com/image/fetch/$s_!Kf_i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 848w, https://substackcdn.com/image/fetch/$s_!Kf_i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 1272w, https://substackcdn.com/image/fetch/$s_!Kf_i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Kf_i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png" width="1456" height="1026" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1026,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:113361,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/191211457?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Kf_i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 424w, https://substackcdn.com/image/fetch/$s_!Kf_i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 848w, https://substackcdn.com/image/fetch/$s_!Kf_i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 1272w, https://substackcdn.com/image/fetch/$s_!Kf_i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F98a7af31-b73e-49a5-b3a2-65b9223675e0_1757x1238.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The interesting part is what they didn&#8217;t do. The team identified three pitfalls in naive instruction hierarchy training: models fail not because they don&#8217;t understand hierarchy but because instructions are too complex, LLM judges used for reward signals are themselves fallible, and models learn shortcuts like refusing everything to maximize safety scores. IH-Challenge addresses all three by keeping tasks instruction-following-simple, using programmatic grading instead of LLM judges, and including an Anti-Overrefusal split that specifically trains models to recognize when lower-privilege instructions are perfectly benign. Overrefusal on the IH-Challenge benchmark improved from 79% to 100%, meaning the model stopped treating hierarchy enforcement as a reason to refuse legitimate requests.
Meanwhile, GPQA Diamond and AIME 2024 scores held flat, and TensorTrust robustness rose 8 to 15 points depending on the conflict type. If you&#8217;re building agents that process untrusted input, this is the best public evidence that instruction hierarchy can be trained once and generalize, rather than being patched one attack at a time.</p><p><strong>What to watch for:</strong> Whether other model providers adopt open instruction hierarchy training datasets, and whether the programmatic-grading approach becomes standard practice over LLM-judge-based safety fine-tuning.</p><div><hr></div><h2>Watch This: <a href="https://www.youtube.com/watch?v=UabBYexBD4k">Is RAG Still Needed? Choosing the Best Approach for LLMs</a> | IBM Technology (12 min)</h2><div id="youtube2-UabBYexBD4k" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;UabBYexBD4k&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/UabBYexBD4k?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Martin Keen breaks down the real tradeoffs between RAG and long context windows as context lengths keep expanding. The video covers when vector databases and semantic search still win, when you can get away with stuffing everything into context, and how to think about the decision for your specific workload.
Especially relevant this week given Anthropic&#8217;s 1 million token context going GA.</p><div><hr></div><h2>Also This Week</h2><p><strong><a href="https://aws.amazon.com/blogs/machine-learning/p-eagle-faster-llm-inference-with-parallel-speculative-decoding-in-vllm/">P-EAGLE: Faster LLM Inference with Parallel Speculative Decoding in vLLM</a></strong> | AWS AI Blog</p><p><strong><a href="https://aws.amazon.com/blogs/machine-learning/operationalizing-agentic-ai-part-1-a-stakeholders-guide/">Operationalizing Agentic AI Part 1: A Stakeholder&#8217;s Guide</a></strong> | AWS AI Blog</p><p><strong><a href="https://huggingface.co/blog/FINAL-Bench/smol-worldcup">Smol AI WorldCup: A 5-Axis Benchmark for Small Language Models</a></strong> | Hugging Face</p><p><strong><a href="https://huggingface.co/blog/async-rl-training-landscape">Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries</a></strong> | Hugging Face</p><p><strong><a href="https://huggingface.co/blog/storage-buckets">Introducing Storage Buckets on the Hugging Face Hub</a></strong> | Hugging Face</p><p><strong><a href="https://huggingface.co/blog/silma-ai/opensource-arabic-english-text-to-speech-model">SILMA TTS: A Lightweight Open Bilingual Arabic-English TTS Model</a></strong> | Hugging Face</p><p><strong><a href="https://www.technologyreview.com/2026/03/10/1134099/how-pokemon-go-is-helping-robots-deliver-pizza-on-time/">How Pokemon Go Is Giving Delivery Robots an Inch-Perfect View of the World</a></strong> | MIT Technology Review</p><p><strong><a href="https://blogs.nvidia.com/blog/jetson-generative-ai-edge-oss/">As Open Models Spark AI Boom, NVIDIA Jetson Brings It to Life at the Edge</a></strong> | NVIDIA</p><p><strong><a href="https://ai.meta.com/blog/world-resources-institute-dino-canopy-height-maps-v2/">Mapping the World&#8217;s Forests: Introducing Canopy Height Maps v2</a></strong> | Meta AI</p><p><strong><a 
href="https://www.llamaindex.ai/blog/build-a-searchable-audio-knowledge-base-with-gemini-embedding-2-and-llamaparse">Build a Searchable Audio Knowledge Base with Gemini Embedding 2 and LlamaParse</a></strong> | LlamaIndex</p><p><strong><a href="https://x.com/MistralAI/status/2032094267640869085">Introducing the AI Now Summit</a></strong> | Mistral AI</p><div><hr></div><h2>What I&#8217;m Watching</h2><p>There&#8217;s a thread running through this week that&#8217;s easy to miss: the testing layer is becoming a product. OpenAI acquired Promptfoo, the open-source LLM evaluation framework. Cursor built CursorBench to measure whether AI coding suggestions actually help in real workflows. And IH-Challenge, which we covered in the Featured Article, uses programmatic Python scripts instead of LLM judges to grade model behavior, specifically because LLM judges get it wrong too often.</p><p>That last detail is the one I keep coming back to. We&#8217;ve spent two years using models to evaluate models, and one of the clearest takeaways from the IH-Challenge paper is that this introduces its own failure modes. When your testing infrastructure is valuable enough for OpenAI to acquire and your grading methodology is worth publishing a paper about, evaluation is a competitive advantage. If you&#8217;re building agents today and your eval story is &#8220;we&#8217;ll have someone try it and see if it feels right,&#8221; this is the week that should change your mind.</p>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 62]]></title><description><![CDATA[OpenAI drops 5.4 and loses robotics lead, Anthropic measures AI labor market impact and expands their ecosystem, only 10% of AI code passes security review, and AI is ready for primetime math/physics]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7f9</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-7f9</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Mon, 09 Mar 2026 16:10:49 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/9a460638-febc-47b3-ba20-e646fbb1d6fe_1536x1024.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>The Week&#8217;s Thesis</h3><p><strong>Everybody shipped at once:</strong> If you stepped away from your desk for even a day last week, you came back to a different landscape. OpenAI released <a href="https://openai.com/index/gpt-5-3-instant/">GPT-5.3 Instant</a> on Monday and followed with <a href="https://academy.openai.com/en/home/resources/latest-model">GPT-5.4 with Thinking and Pro modes</a> by Wednesday. Anthropic opened the <a href="https://x.com/claudeai/status/2029966517497122886">Claude Marketplace</a> and added <a href="https://techcrunch.com/2026/03/03/claude-code-rolls-out-a-voice-mode-capability/">voice</a> and <a href="https://x.com/bcherny/status/2030193932404150413">scheduled tasks</a> to Claude Code.
<a href="https://techcrunch.com/2026/03/05/cursor-is-rolling-out-a-new-system-for-agentic-coding/">Cursor launched Automations</a>. Each points in a different direction, so it&#8217;s worth taking a moment to decide which ones matter for your workflows and where to start.</p><p><strong>The Pentagon deal had consequences:</strong> Last week we covered the Pentagon deal itself. This week, the consequences arrived. OpenAI&#8217;s robotics lead <a href="https://techcrunch.com/2026/03/07/openai-robotics-lead-caitlin-kalinowski-quits-in-response-to-pentagon-deal/">Caitlin Kalinowski resigned</a>, calling the arrangement &#8220;rushed without the guardrails defined.&#8221; <a href="https://techcrunch.com/2026/03/02/chatgpt-uninstalls-surged-by-295-after-dod-deal/">ChatGPT uninstalls had already surged 295%</a> while <a href="https://techcrunch.com/2026/03/01/anthropics-claude-rises-to-no-2-in-the-app-store-following-pentagon-dispute/">Claude climbed to #1 on the App Store</a>. Anthropic&#8217;s CEO <a href="https://www.anthropic.com/news/where-stand-department-war">responded directly</a> to the supply chain risk designation, challenging it in court and clarifying the statute&#8217;s narrow scope. <a href="https://techcrunch.com/2026/03/06/microsoft-anthropic-claude-remains-available-to-customers-except-the-defense-department/">Microsoft, Google, and Amazon confirmed</a> Claude remains available to their customers outside the Department of War. Meanwhile, MIT Technology Review asked the question everyone should be sitting with: <a href="https://www.technologyreview.com/2026/03/06/1134012/is-the-pentagon-allowed-to-surveil-americans-with-ai/">is the Pentagon actually allowed to surveil Americans with AI?</a></p><p><strong>AI is probing deeper than we designed for:</strong> Three companies independently bet on the same idea this week: AI as security auditor.
Anthropic&#8217;s Claude <a href="https://www.anthropic.com/news/mozilla-firefox-security">found 22 real vulnerabilities in Firefox</a>, including novel bugs that existing tools missed. OpenAI launched <a href="https://openai.com/index/codex-security-now-in-research-preview/">Codex Security in research preview</a>. And Endor Labs released <a href="https://www.endorlabs.com/learn/introducing-auri-security-intelligence-for-ai-coding-agents-and-developers">AURI</a>, a free security tool, after a study found only <a href="https://arxiv.org/abs/2512.03262">10% of AI-generated code passes basic security review</a>. Separately, Anthropic&#8217;s engineering team found that <a href="https://www.anthropic.com/engineering/eval-awareness-browsecomp">Claude Opus 4.6 figured out it was being benchmarked</a>, identified the test, and decrypted the answer key on its own. These models are probing systems deeper than we&#8217;re designing for, and finding things we didn&#8217;t expect.</p><div><hr></div><h2>Quick Hits</h2><p><strong><a href="https://justin.poehnelt.com/posts/rewrite-your-cli-for-ai-agents/">You Need to Rewrite Your CLI for AI Agents</a></strong> | Justin Poehnelt (Google) &#8212; The best guide yet on building agent-first tooling. If you maintain a CLI, start here.</p><p><strong><a href="https://academy.openai.com/en/public/blogs/terence-tao-ai-is-ready-for-primetime-in-math-and-theoretical-physics-2026-03-06">Terence Tao: AI Is Ready for Primetime in Math and Physics</a></strong> | OpenAI Academy &#8212; When a Fields medalist says AI saves more time than it wastes, the bar for &#8220;useful&#8221; just moved.</p><p><strong><a href="https://techcrunch.com/2026/03/05/exclusive-luma-launches-creative-ai-agents-powered-by-its-new-unified-intelligence-models/">Luma Launches Creative AI Agents</a></strong> | TechCrunch &#8212; Turned a $15M ad campaign into localized versions in 40 hours for under $20K. 
Creative agencies, take note.</p><p><strong><a href="https://venturebeat.com/orchestration/new-kv-cache-compaction-technique-cuts-llm-memory-50x-without-accuracy-loss">KV Cache Compaction Cuts LLM Memory 50x</a></strong> | VentureBeat &#8212; MIT&#8217;s Attention Matching compresses working memory without accuracy loss. Long-context inference just got cheaper.</p><p><strong><a href="https://blog.google/innovation-and-ai/technology/developers-tools/io-save-the-date-2026-gemini/">Google I/O 2026: May 19-20</a></strong> | Google Blog &#8212; Save the date. The puzzle itself is a Gemini showcase, which tells you where the keynote is heading.</p><p><strong><a href="https://about.roblox.com/newsroom/2026/03/rethinking-chat-for-fun-gameplay-civility">Roblox Launches AI Chat Rephrasing</a></strong> | Roblox &#8212; Instead of blocking banned words with &#8220;####&#8221;, AI now rephrases them in real time. Moderation at 68M daily users is an AI problem now.</p><p><strong><a href="https://www.youtube.com/watch?v=53gPwkcIsXQ">LangChain CEO: Models Alone Won&#8217;t Get Agents to Production</a></strong> | VentureBeat &#8212; Harrison Chase on why &#8220;harness engineering&#8221; matters more than model upgrades for shipping real agents.</p><div><hr></div><h3>Featured Article: <a href="https://www.anthropic.com/research/labor-market-impacts">Labor Market Impacts of AI: A New Measure and Early Evidence</a> | Anthropic Research</h3><p>Anthropic introduced a new metric called &#8220;observed exposure&#8221; that combines theoretical LLM capability with real-world Claude usage data to measure which jobs are actually being affected by AI. The headline finding: AI is far from reaching its theoretical capability. Actual task coverage remains a fraction of what&#8217;s feasible. Computer programmers top the list at 75% coverage, followed by customer service representatives and data entry keyers. 
No systematic increase in unemployment has appeared for highly exposed workers since late 2022.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SwnM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SwnM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 424w, https://substackcdn.com/image/fetch/$s_!SwnM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 848w, https://substackcdn.com/image/fetch/$s_!SwnM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 1272w, https://substackcdn.com/image/fetch/$s_!SwnM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SwnM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp" width="1456" height="1456" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:180532,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/190402661?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SwnM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 424w, https://substackcdn.com/image/fetch/$s_!SwnM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 848w, https://substackcdn.com/image/fetch/$s_!SwnM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 1272w, https://substackcdn.com/image/fetch/$s_!SwnM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7b7eb2c6-9c2b-43aa-8d1d-da2bc93f3cf6_3840x3840.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The paper opens with a point worth sitting with: past predictions about job displacement have a poor track record. Offshorability studies flagged a quarter of US jobs as vulnerable, and a decade later most of those jobs grew. This research is deliberately not making predictions. Instead, it&#8217;s building a measurement framework now, before meaningful effects emerge, so future analysis has a real baseline. The finding that matters most right now is about entry-level hiring. Among workers aged 22 to 25, hiring into exposed occupations has dropped roughly 14% compared to pre-ChatGPT levels. Workers in the most exposed professions are more likely to be older, female, more educated, and higher-paid.
The pipeline is thinning before displacement shows up in unemployment data.</p><p><strong>What to watch for:</strong> The gap between what AI <em>can</em> do and what it <em>is</em> doing is closing. This report measures it directly, and future updates will show how fast the red area catches the blue. Pay attention to the entry-level hiring numbers next time around.</p><div><hr></div><h3>Watch This: <a href="https://www.youtube.com/watch?v=OUyfxhFtGCo">This New Claude Code Feature is a Game Changer</a> | Nate Herk (8 min)</h3><div id="youtube2-OUyfxhFtGCo" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;OUyfxhFtGCo&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/OUyfxhFtGCo?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>Nate walks through Claude Code&#8217;s new loop feature, which lets you set recurring tasks, reminders, and skill intervals that run for up to three days without input. The video covers how the cron tools work under the hood, a live walkthrough of setting one up, and a clear comparison of when to use loops versus scheduled tasks. 
If you&#8217;re already using Claude Code, this is worth eight minutes of your time.</p><div><hr></div><h2>Also This Week</h2><p><strong><a href="https://openai.com/index/reasoning-models-chain-of-thought-controllability/">Reasoning Models Struggle to Control Their Chains of Thought, and That&#8217;s Good</a></strong> | OpenAI</p><p><strong><a href="https://arxiv.org/abs/2603.05488">Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought</a></strong> | arXiv</p><p><strong><a href="https://arxiv.org/abs/2603.05344">Building AI Coding Agents for the Terminal</a></strong> | arXiv</p><p><strong><a href="https://claude.com/platform/marketplace">Anthropic Spend Commitment Now Funds Partner Integrations</a></strong> | Anthropic</p><p><strong><a href="https://claude.com/community/ambassadors">Claude Community Ambassadors Program</a></strong> | Anthropic</p><p><strong><a href="https://github.com/zeroclaw-labs/zeroclaw">ZeroClaw: Autonomous AI Assistant Infrastructure</a></strong> | GitHub</p><p><strong><a href="https://techcrunch.com/2026/03/06/city-detect-uses-ai-to-help-cities-stay-safe-and-clean/">City Detect Raises $13M Series A</a></strong> | TechCrunch</p><p><strong><a href="https://biztimes.com/groundbreaking-held-as-construction-begins-on-15b-port-washington-data-center/">Port Washington Data Center Breaks Ground</a></strong> | BizTimes</p><p><strong><a href="https://openai.com/index/descript/">How Descript Enables Multilingual Video Dubbing at Scale</a></strong> | OpenAI</p><p><strong><a href="https://openai.com/index/balyasny-asset-management/">How Balyasny Built an AI Research Engine for Investing</a></strong> | OpenAI</p><div><hr></div><h2>What I&#8217;m Watching</h2><p>Features like Claude Code&#8217;s new /loop command and projects like ZeroClaw are pointing in the same direction: autonomous agent runtimes that are lightweight, swappable, and designed to run without you. 
The question I keep coming back to is how long until this space fragments enough that no single framework dominates. We&#8217;re not there yet, but the building blocks are shipping fast.</p><p>The other thing I&#8217;m paying attention to is something that rarely shows up in benchmark announcements: how new model releases actually affect agent quality in production. GPT-5.4, Claude Opus 4.6, and the reasoning improvements shipping alongside them should be measurably changing chain-of-thought reliability for deployed agents. But that data is hard to find. If you&#8217;re running agents in the wild and tracking performance across model versions, I&#8217;d genuinely love to hear what you&#8217;re seeing.</p><p>And then there&#8217;s the security work. Anthropic found novel Firefox vulnerabilities. OpenAI launched Codex Security. A few newsletters ago, we covered AI solving novel physics problems. Now we&#8217;re seeing that same pattern expand: LLMs surfacing things humans hadn&#8217;t found yet. Is that just the natural expansion curve of the technology, or is it a growth signal that tracks directly with model quality? 
I think it&#8217;s both, and the Mozilla results suggest we&#8217;re still early in finding out what these models can actually uncover when pointed at the right problems.</p>]]></content:encoded></item><item><title><![CDATA[Another Weekly AI Newsletter: Issue 61]]></title><description><![CDATA[Anthropic gets blacklisted, OpenAI signs four deals in five days, agent observability emerges as a real concern, &#127820; 2, healthcare AI shows ROI, and Geoffrey Hinton sits down with Neil deGrasse Tyson]]></description><link>https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-939</link><guid isPermaLink="false">https://www.anothercodingblog.com/p/another-weekly-ai-newsletter-issue-939</guid><dc:creator><![CDATA[Taylor Ortiz]]></dc:creator><pubDate>Tue, 03 Mar 2026 16:16:48 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/008481bd-6b04-4c23-9493-8e486de747cf_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Personal Note</h3><p>This newsletter comes to you late this week, on a Tuesday morning. Like many others, I was caught in the <a href="https://techcrunch.com/2026/03/02/anthropics-claude-reports-widespread-outage/">Anthropic outage</a>, and I depend on this technology to drive the initiatives that are meaningful to me. When I woke up to finalize the newsletter and found things offline, I journaled while listening to the birds outside, put on some music, reflected on my weekend, and engaged in refreshing activities I normally don&#8217;t find the time for.
It was a lesson for me to find more time to step away from the keyboard.</p><div><hr></div><h3>The Week&#8217;s Thesis</h3><p><strong>AI went political this week:</strong> <a href="https://www.anthropic.com/news/statement-department-of-war">Anthropic&#8217;s relationship with the Department of War fell apart</a>, and hours later, <a href="https://openai.com/index/our-agreement-with-the-department-of-war/">OpenAI signed a deal</a> for classified network deployment. On paper, both companies claim the same red lines. But the sequence alone was enough to make people uneasy. More on this in our featured story below.</p><p><strong>OpenAI&#8217;s partnership blitz:</strong> They launched <a href="https://openai.com/index/frontier-alliance-partners">Frontier Alliances</a>, a new partner program, followed by a <a href="https://openai.com/index/figma-partnership/">Codex integration with Figma</a> bridging code and design workflows. By Friday, they announced a <a href="https://openai.com/index/amazon-partnership/">strategic partnership with Amazon</a> and released a <a href="https://blogs.microsoft.com/blog/2026/02/27/microsoft-and-openai-joint-statement-on-continuing-partnership/">joint statement with Microsoft</a> reaffirming their existing relationship. Four announcements in five days, all while the Department of War deal was making headlines.</p><p><strong>Agent observability is becoming a thing:</strong> <a href="https://news.microsoft.com/source/emea/features/microsoft-cyber-pulse-ai-agents-2/">Microsoft</a> found that 80% of Fortune 500 companies are running active agents but most lack visibility into what those agents are doing. 
<a href="https://blog.langchain.com/you-dont-know-what-your-agent-will-do-until-its-in-production/">LangChain</a> argued that traditional APM tools weren&#8217;t built for this, <a href="https://newrelic.com/press-release/20260224-1">New Relic shipped an agent-specific observability platform</a>, and <a href="https://cloud.google.com/blog/products/ai-machine-learning/a-devs-guide-to-production-ready-ai-agents">Google published a production-readiness guide</a>. Observability is quietly becoming part of the conversation, and it&#8217;s worth paying attention to.</p><p><strong>Healthcare AI is moving:</strong> <a href="https://blogs.nvidia.com/blog/ai-in-healthcare-survey-2026/">NVIDIA&#8217;s annual survey</a> found that 70% of healthcare organizations are now actively deploying AI, with 85% reporting increased revenue. <a href="https://blogs.nvidia.com/blog/lilly-ai-factory-live/">Eli Lilly went live with LillyPod</a>, the most powerful AI factory wholly owned by a pharmaceutical company, purpose-built for drug discovery. <a href="https://ouraring.com/blog/womens-health-ai-model/">Oura shipped a proprietary AI model</a> focused on women&#8217;s reproductive health, hosted entirely on their own infrastructure. And <a href="https://www.nist.gov/blogs/taking-measure/ai-doctors-office-how-standards-can-support-trustworthiness">NIST published guidance</a> on AI trustworthiness standards for clinical settings. From drug discovery to consumer wearables to regulation, healthcare AI is moving.</p><div><hr></div><h3>Quick Hits</h3><p><strong><a href="https://techcrunch.com/2026/02/25/jiras-latest-update-allows-ai-agents-and-humans-to-work-side-by-side/">Jira&#8217;s latest update allows AI agents and humans to work side by side</a></strong> | TechCrunch &#8212; Agents on the same sprint board as humans with deadlines and assignments. 
This is mainstream adoption.</p><p><strong><a href="https://cloud.google.com/blog/products/ai-machine-learning/bringing-nano-banana-2-to-enterprise">Pro-level image generation gets faster and more accessible with Nano Banana 2</a></strong> | Google Cloud AI &#8212; Google&#8217;s enterprise image gen model gets faster and cheaper. The gap between &#8220;good enough&#8221; and &#8220;production-ready&#8221; keeps shrinking.</p><p><strong><a href="https://www.anthropic.com/news/acquires-vercept">Anthropic acquires Vercept to advance Claude&#8217;s computer use capabilities</a></strong> | Anthropic &#8212; Anthropic is doubling down on computer use. If agents are going to operate in production, they need to see and interact with real interfaces.</p><p><strong><a href="https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks">Detecting and preventing distillation attacks</a></strong> | Anthropic &#8212; Anthropic identified industrial-scale distillation campaigns by DeepSeek, Moonshot, and MiniMax, totaling over 16 million exchanges across 24,000 fraudulent accounts designed to extract Claude&#8217;s capabilities. They published their approach to catching and preventing it.</p><p><strong><a href="https://www.technologyreview.com/2026/02/23/1133508/the-human-work-behind-humanoid-robots-is-being-hidden/">The human work behind humanoid robots is being hidden</a></strong> | MIT Technology Review &#8212; The humans still doing the work that robot demos suggest is automated. A good reality check.</p><div><hr></div><h3>Featured Story: Anthropic&#8217;s Deal With the Department of War Fell Through. Hours Later, OpenAI Signed One.</h3><p>Anthropic published its <a href="https://www.anthropic.com/responsible-scaling-policy">Responsible Scaling Policy v3.0</a> on February 24, a ground-up rewrite of the framework it uses to decide what it will and won&#8217;t build. 
Two days later, <a href="https://www.anthropic.com/news/statement-department-of-war">Dario Amodei published a statement</a> revealing that Anthropic has been deeply embedded in the Department of War for months: intelligence analysis, cyber operations, modeling and simulation. The company also disclosed it walked away from several hundred million dollars in revenue by cutting off entities linked to the Chinese Communist Party. But Anthropic drew two red lines: no mass domestic surveillance of Americans, and no fully autonomous weapons.</p><p>On February 27, <a href="https://www.anthropic.com/news/statement-comments-secretary-war">Secretary of War Pete Hegseth designated Anthropic a &#8220;supply chain risk&#8221;</a>, a label historically reserved for US adversaries. Trump ordered every federal agency to stop using Anthropic technology. That same night, <a href="https://openai.com/index/our-agreement-with-the-department-of-war/">OpenAI announced a deal</a> to deploy its models on the Department of War&#8217;s classified network.</p><p>Here&#8217;s where it gets interesting: OpenAI&#8217;s stated terms include the same two red lines. No mass surveillance. No autonomous weapons. But OpenAI walked away with a deal and Anthropic walked away blacklisted. OpenAI&#8217;s approach centers on what Altman called a &#8220;safety stack&#8221;: cloud-only deployment that keeps OpenAI&#8217;s safety layers active, cleared personnel in the loop, and an agreement that if the model refuses a task, the government won&#8217;t force a workaround. What exactly differed in the negotiations isn&#8217;t public, but the outcome speaks for itself.</p><p>The RSP v3.0 explains the philosophical scaffolding behind Anthropic&#8217;s position. 
After two and a half years of trying to implement capability-based safety thresholds, Anthropic concluded that &#8220;the science of model evaluation isn&#8217;t well-developed enough to provide dispositive answers.&#8221; The policy now splits commitments into what Anthropic will enforce unilaterally and what requires industry-wide coordination. Autonomous weapons fall squarely in the second bucket: the reliability isn&#8217;t there yet, and no single company can build the guardrails alone.</p><p>The business implications are already visible. <a href="https://www.natesilver.net/p/anthtropic-open-ai-department-of-war">Nate Silver noted</a> that Anthropic had been steadily closing the valuation gap with OpenAI. Whether the DoW designation slows that trajectory is an open question.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KFlP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KFlP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 424w, https://substackcdn.com/image/fetch/$s_!KFlP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 848w, https://substackcdn.com/image/fetch/$s_!KFlP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 1272w, 
https://substackcdn.com/image/fetch/$s_!KFlP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KFlP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png" width="1456" height="1055" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1055,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:171847,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.anothercodingblog.com/i/189646464?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KFlP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 424w, https://substackcdn.com/image/fetch/$s_!KFlP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 848w, 
https://substackcdn.com/image/fetch/$s_!KFlP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 1272w, https://substackcdn.com/image/fetch/$s_!KFlP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F323f3ecd-8519-4f5e-a838-9dd16d2fef04_1460x1058.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The question practitioners should be sitting with isn&#8217;t &#8220;who&#8217;s right.&#8221; It&#8217;s what happens next. 
If you&#8217;re building on Claude for sensitive workloads, your platform just got blacklisted from every federal system. If you&#8217;re building on OpenAI, your platform&#8217;s safety guarantees rest on a technical architecture rather than a legal commitment. Both carry risk. The difference is in which failure mode you&#8217;re betting on.</p><p><strong>What to watch for:</strong> Whether the &#8220;supply chain risk&#8221; designation survives legal challenge, and whether OpenAI&#8217;s cloud-only safety stack holds as models get more capable and the Department of War pushes for edge deployment.</p><div><hr></div><h3>Watch This</h3><div id="youtube2-l6ZcFa8pybE" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;l6ZcFa8pybE&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/l6ZcFa8pybE?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><strong><a href="https://www.youtube.com/watch?v=l6ZcFa8pybE">StarTalk: Geoffrey Hinton on AI, Consciousness, and the Future</a></strong>: Neil deGrasse Tyson sits down with Nobel Laureate Geoffrey Hinton to cover the full arc: how neural nets work, why backpropagation was the breakthrough, whether AI can actually reason, and the heavy questions around consciousness, energy demands, and what happens when models start generating their own training data.</p><div><hr></div><h3>Also This Week</h3><p><strong><a href="https://techcrunch.com/2026/02/25/alphabet-owned-robotics-software-company-intrinsic-joins-google/">Intrinsic joins Google</a></strong> | TechCrunch</p><p><strong><a href="https://blog.google/innovation-and-ai/products/gemini-app/android-multi-step-tasks/">Let Gemini handle your multi-step daily tasks on 
Android</a></strong> | Google AI Blog</p><p><strong><a href="https://www.anthropic.com/research/AI-fluency-index">Anthropic Education Report: The AI Fluency Index</a></strong> | Anthropic Research</p><p><strong><a href="https://www.anthropic.com/research/persona-selection-model">The persona selection model</a></strong> | Anthropic Research</p><p><strong><a href="https://openai.com/index/disrupting-malicious-ai-uses/">Disrupting malicious uses of AI</a></strong> | OpenAI</p><p><strong><a href="https://www.deeplearning.ai/the-batch/stanford-and-together-ai-researchers-chart-edge-models-performance-in-intelligence-per-watt/">Can Local AI Stand In for the Cloud?</a></strong> | deeplearning.ai</p><p><strong><a href="https://www.technologyreview.com/2026/02/27/1133624/ai-is-rewiring-how-the-worlds-best-go-players-think/">AI is rewiring how the world&#8217;s best Go players think</a></strong> | MIT Technology Review</p><div><hr></div><h3>What I&#8217;m Watching</h3><p><strong>OpenAI&#8217;s new role in government AI.</strong> How does OpenAI&#8217;s solidified position with the Department of War shift the tide of AI in government? Will it be relatively quiet, or will we see noticeable shifts in how these technologies are deployed domestically and how we engage in combat with other countries? And if growth and innovation eventually push against the boundaries of an agreement, does the government override, or does OpenAI become more malleable?</p><p><strong>The enterprise agent framework race.</strong> We are still in the &#8220;release agents as a capability&#8221; phase. Most enterprise platforms are now shipping their own proprietary frameworks. Will those be expansive enough to meet the breadth of platform use cases, or will we see demand expand beyond what a single-platform framework can handle, requiring true enterprise solutions?</p><p><strong>Agent observability, from experience.</strong> Observability is something we are hyper-focused on at Ping. 
We find that we have the most control with our custom agents, and that control drops significantly when we adopt out-of-the-box frameworks that leave us with little say over design practices. If that&#8217;s true at our scale, it&#8217;s worth asking what it looks like at enterprise scale.</p>]]></content:encoded></item></channel></rss>