<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Sayantan Das]]></title><description><![CDATA[Internal Deployed Engineer at Manulife Back in the day: R&D at Vector Institute, ETH and ISRO]]></description><link>https://blog.ucalyptus.me</link><image><url>https://substackcdn.com/image/fetch/$s_!VTCg!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77080f77-c6cc-41dd-98a2-840904db4e01_400x400.jpeg</url><title>Sayantan Das</title><link>https://blog.ucalyptus.me</link></image><generator>Substack</generator><lastBuildDate>Sun, 07 Jun 2026 06:50:59 GMT</lastBuildDate><atom:link href="https://blog.ucalyptus.me/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sayantan Das]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[ucalyptus@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[ucalyptus@substack.com]]></itunes:email><itunes:name><![CDATA[Sayantan Das]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sayantan Das]]></itunes:author><googleplay:owner><![CDATA[ucalyptus@substack.com]]></googleplay:owner><googleplay:email><![CDATA[ucalyptus@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sayantan Das]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Humans Aren't Leaving the Loop]]></title><description><![CDATA[Agents compress the execution cycle. Humans move up to the strategic one.]]></description><link>https://blog.ucalyptus.me/p/humans-arent-leaving-the-loop</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/humans-arent-leaving-the-loop</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Thu, 04 Jun 2026 01:45:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!6oBN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The popular story about agents is that they will remove humans from the loop. That framing misses what is actually changing. Humans are not disappearing. They are moving to a different loop.</p><p>The fast loop belongs to the agent: observe, decide, act, verify, repeat. The slower loop belongs to the human: set intent, define guardrails, evaluate outcomes, and change the system when the pattern of results says it should change.</p><p>When people say "the agent did the work," what they usually mean is that the agent handled the short-cycle execution path. The human still shaped the mission, constrained the operating envelope, and decided whether the results were acceptable. That is not removal. That is promotion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6oBN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6oBN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6oBN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6oBN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!6oBN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6oBN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!6oBN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!6oBN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!6oBN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!6oBN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e0c1921-9865-4699-81ac-25ec543bb503_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The old mental model was simple: human thinks, human acts, software assists.</p><p>The emerging model is different: human defines the frame, agent runs inside it, and software records enough state for the system to be inspectable and correctable.</p><p>That changes the role of human judgment. It matters less in the moment-to-moment mechanics of execution and more in the structure around execution.</p><p>The new human job is not to click faster than the machine. It is to decide:</p><ul><li><p>What outcome actually matters</p></li><li><p>What constraints cannot be violated</p></li><li><p>What tradeoffs are acceptable</p></li><li><p>What signals mean the system is drifting</p></li><li><p>When the frame itself needs to change</p></li></ul><p>This is a more strategic role, but it is also a more demanding one. A human in the strategic loop cannot hide behind activity. The system exposes whether the goals, thresholds, and review criteria were well chosen.</p><p>That is why the future of agents is not just about autonomy. It is about governance with shorter feedback cycles.</p><p>An agent can iterate through a task dozens or hundreds of times faster than a person. It can retry, branch, compare, and recover without waiting for a human to press the next button. But that speed is only useful if someone has defined the objective and the stopping conditions well enough for the fast loop to stay productive.</p><p>Without that frame, autonomy degrades into expensive motion. The agent looks busy. Logs fill up. Tools fire. Tokens burn. But the system is not converging on anything worth keeping.</p><p>The practical implication is that good human operators will start to look less like operators and more like system designers.</p><p>They will spend more time on:</p><ul><li><p>Intent definition</p></li><li><p>Policy and permissions</p></li><li><p>Success metrics</p></li><li><p>Review checkpoints</p></li><li><p>Exception routing</p></li><li><p>Post-run diagnosis</p></li></ul><p>And less time on:</p><ul><li><p>Manual execution</p></li><li><p>Repetitive coordination</p></li><li><p>Low-level tool invocation</p></li><li><p>Short-horizon status checking</p></li></ul><p>This is not a hand-wavy organizational shift. It maps cleanly to the engineering structure of an agent system.</p><p>At the execution layer, the agent runs a tight control loop. It reads the current state, chooses an action, calls a tool, checks the result, updates the state, and continues.</p><p>Above that sits a control plane shaped by humans: goals, policies, task boundaries, escalation rules, budgets, and evaluation criteria.</p><p>Above that sits an even slower loop: review the outcomes across many runs, identify patterns of failure or waste, and redesign the frame.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RRYz!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RRYz!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RRYz!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RRYz!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RRYz!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RRYz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RRYz!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RRYz!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RRYz!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RRYz!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe0fac278-64a9-49b9-9532-e5a90c22e6f6_1600x900.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In other words, humans move from the transaction layer to the architecture layer.</p><p>That does not make humans less important. It makes their mistakes more consequential and their judgment more leveraged.</p><p>A weak worker can be corrected one task at a time. A weak frame gets multiplied across every autonomous run.</p><p>That is the real shift agents introduce. They amplify not only execution, but also whatever upstream judgment shaped the execution.</p><p>The organizations that benefit most from agents will not be the ones that simply automate the most steps. They will be the ones that become good at designing loops.</p><p>They will know how to:</p><ul><li><p>Give agents clear intent without over-specifying every action</p></li><li><p>Set hard guardrails without freezing the system</p></li><li><p>Review outputs at the right altitude</p></li><li><p>Turn repeated failures into better policies</p></li><li><p>Decide when a human must step back into the fast loop</p></li></ul><p>This also explains why "human in the loop" is now too vague a phrase. The real question is: which loop?</p><p>Are humans approving every action in the fast path? Are they only reviewing samples after execution? Are they setting policy weekly and auditing exceptions daily? Are they redesigning the workflow monthly based on aggregate outcomes?</p><p>Those are very different systems, and they produce very different economics.</p><p>The companies that win with agents will treat loop design as a core capability. They will explicitly decide what runs at machine speed, what gets escalated, what gets logged, what gets measured, and what gets changed when the results stop matching intent.</p><p>That is the future of agents.</p><p>Not humans removed from the loop.</p><p>Humans promoted to a slower, higher-leverage one.</p>]]></content:encoded></item><item><title><![CDATA[Log-Centric Agent Architecture]]></title><description><![CDATA[How event logs, deterministic projections, and replayable state change the way advanced agents are built]]></description><link>https://blog.ucalyptus.me/p/log-centric-agent-architecture</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/log-centric-agent-architecture</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Thu, 04 Jun 2026 01:14:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TT_i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This tutorial is aimed at advanced readers who want to design agents around an append-only event log, deterministic projections, and replayable state instead of a chat loop with bolted-on memory and observability.</p><h2>The Problem With Conventional Agent Frameworks</h2><p>Most agent frameworks are <strong>LLM-first</strong>: the conversation loop is the core, tools are attached to it, rules layer on top, and logging is bolted on at the end for observability. State is persisted as retrievable "memory."</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TT_i!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TT_i!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TT_i!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TT_i!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TT_i!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TT_i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TT_i!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!TT_i!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!TT_i!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!TT_i!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8f3bfbba-60df-4fed-9d20-c1ae972887a8_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Figure 1. Conventional LLM-centric stacks treat logging as a secondary concern. In a log-centric design, the append-only event log becomes the source of truth, the working graph is derived from it, and the LLM is just one behavior among many.</em></p><p><strong>Consequences of this design:</strong></p><ul><li><p><strong>State is opaque:</strong> working memory lives in conversation context, not a queryable store.</p></li><li><p><strong>Replaying a run is hard:</strong> there is no source of truth; logs are side-effects, not the primary record.</p></li><li><p><strong>Forking is expensive:</strong> you must re-run from scratch because there is no checkpoint mechanism.</p></li><li><p><strong>Coordination between agents is explicit:</strong> agents pass messages directly instead of reacting to a shared world state.</p></li><li><p><strong>Observability is retroactive:</strong> tracing gets added after bugs appear.</p></li></ul><h2>The Core Inversion: Log-First Architecture</h2><p>The core move is to <strong>invert the stack</strong>. Make the append-only event log the source of truth. Everything else &#8212; the working graph, the agent's beliefs, and task state &#8212; becomes a <strong>deterministic projection</strong> of that log.</p><p>The architecture shift is simple but profound: instead of treating the LLM loop as the runtime and storage as a helper, the event log becomes the runtime backbone. The live graph is derived from that log, and behaviors operate against the graph rather than owning the whole system.</p><p><strong>The log is not observability. The log IS the agent.</strong></p><h2>A Concrete Runtime Shape</h2><p>To make the idea concrete, think in terms of a runtime with three layers: authoritative events, projected graph state, and reactive behaviors.</p><h3>Three-Layer Model</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AAOL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AAOL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AAOL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AAOL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AAOL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AAOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AAOL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AAOL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AAOL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AAOL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcffa7f4e-d4a7-434a-ae16-cd174c63e952_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Figure 2. A three-layer runtime separates source-of-truth events, projected state, and reactive behaviors. Relation behaviors are first-class, and every behavior writes back to the shared event log.</em></p><h3>Relation Behaviors &#8212; Edges as First-Class Citizens</h3><p>Traditional agent graphs put all logic in nodes. A stronger design puts <strong>semantic logic on edges</strong>:</p><ul><li><p>A task can <code>depend_on</code> another task</p></li><li><p>A task or belief can <code>contradict</code> an existing belief</p></li><li><p>Evidence can <code>support</code> or weaken a claim</p></li><li><p>Blocking, dependency, and contradiction logic live in the relationship itself</p></li></ul><p>When the <code>contradicts</code> relation fires, a behavior automatically triggers &#8212; no explicit coordination code needed.</p><h2>What the Log Enables: Four Superpowers</h2><h3>Deterministic Replay</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nFb6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nFb6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nFb6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nFb6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nFb6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nFb6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/dc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nFb6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nFb6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nFb6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nFb6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdc3cf407-cfa2-47db-9b4b-8e727f07de70_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Figure 3. Once execution history is stored as events, replay becomes deterministic and branch exploration becomes cheap: reuse the shared prefix, pay only for divergent work, and compare outcomes directly.</em></p><p>Any run can be reconstructed exactly from its log. This means:</p><ul><li><p>Full audit trail from goal to individual model call</p></li><li><p>Debugging without re-running expensive LLM calls</p></li><li><p>Compliance/reproducibility for regulated domains</p></li></ul><h3>Efficient Forking</h3><p>Fork at any historical event. The shared prefix replays from cache &#8212; no redundant LLM calls for the common history. Branch, diff, compare outputs.</p><h3>Complete Causal Lineage</h3><p>Every event knows what caused it:</p><pre><code>Goal: "Analyze market"
  &#9492;&#9472; spawned Task: "Research competitors"
       &#9492;&#9472; LLM call: "List top 5 competitors"
            &#9492;&#9472; Tool call: web_search("competitor analysis 2026")
                 &#9492;&#9472; Evidence node: search result
                      &#9492;&#9472; Behavior: "contradicts" edge fired
                           &#9492;&#9472; LLM call: "Revise belief about market stability"</code></pre><p>No more "why did the agent do that?" &#8212; the log has full provenance.</p><h3>Implicit Coordination (No A2A Protocol Needed)</h3><p>Agents coordinate by <strong>reacting to the graph</strong>, not by calling each other. No explicit A2A protocol. No DAG. No workflow engine.</p><h2>Deep Dive &#8212; Runtime Design Details</h2><h3>The Event Schema</h3><p>Every event in the log carries:</p><ul><li><p><strong>`id`:</strong><code>id</code>:** unique identifier.</p></li><li><p><strong>`type`:</strong><code>type</code>:** event category such as <code>task.created</code>, <code>llm.response</code>, or <code>tool.called</code>.</p></li><li><p><strong>`payload`:</strong><code>payload</code>:** event-specific data.</p></li><li><p><strong>`actor`:</strong><code>actor</code>:** which behavior emitted the event.</p></li><li><p><strong>`causality`:</strong><code>causality</code>:** the triggering event or events.</p></li><li><p><strong>`timestamp`:</strong><code>timestamp</code>:** wall-clock time of emission.</p></li></ul><p>The causality chain is what makes full lineage possible &#8212; every event knows its parent.</p><h3>Handling LLM Non-Determinism</h3><p>The hardest problem in event-sourced agents is that LLMs are not pure functions. The practical answer is <strong>content-addressed caching</strong> keyed on a normalized hash of the request such as system message, model parameters, and output schema. On replay, a matching hash serves the cached response from the log instead of making a fresh API call.</p><p><strong>Two replay modes:</strong></p><ul><li><p><strong>Permissive</strong> &#8212; serves cached responses, allows new calls for edited prompts (default)</p></li><li><p><strong>Strict</strong> &#8212; validates byte-for-byte reproducibility; flags the first offending event if anything diverges</p></li></ul><h3>Forking Economics</h3><p>The cost savings from cached prefix replay are concrete:</p><pre><code>200-step run, fork at step 150:
  &#9500;&#9472; Steps 0&#8211;149  &#8594;  replay from cache  &#8594;  $0, ~0ms per step
  &#9492;&#9472; Steps 150&#8211;200 &#8594;  execute normally  &#8594;  full cost

Without forking: pay 200 steps &#215; N variants
With forking:    pay 50 steps &#215; N variants + 150 steps once</code></pre><p>Fork lineage is verified by <strong>literal shared event IDs</strong> &#8212; not a copy, the same events. The structural diff then shows exactly which graph objects, relations, and patches differ between branches.</p><h3>The Determinism Contract</h3><p>Behaviors must not:</p><ul><li><p>Call <code>random()</code> directly (use event-record entropy instead)</p></li><li><p>Read wall-clock time directly (use event timestamp)</p></li><li><p>Generate fresh UUIDs (obtain from event records)</p></li><li><p>Perform I/O outside framework primitives</p></li><li><p>Depend on mutable global state</p></li></ul><p>The runtime does <strong>not</strong> statically enforce this. Violations surface at replay as a divergence error pinned to the first mismatched event. For LLM calls, the contract applies to replay (via cache), not initial execution.</p><h3>What a Real Run Looks Like</h3><p>In a realistic agent workflow, the event stream gets large quickly. A run can look like this:</p><pre><code>671 events
 93 objects (3 companies, 24 questions, 25 claims, 3 memos)
 76 relations
103 model calls
 48 tool calls
  0 lines of orchestration code</code></pre><p>Coordination was entirely reactive &#8212; no explicit scheduling, no DAG, no workflow engine.</p><p><em>Lineage is the deliverable</em> &#8212; every claim links back to its behavior, triggering event, and specific model request. The causal chain from goal to output is reconstructable from the log alone.</p><h3>Comparison to Memory-Layer Approaches</h3><p>Memory-layer systems treat memory as a derived layer atop the agent. A log-centric design inverts that relationship: the log is primary, and memory in the form of a graph is a projection of it. That means you can reconstruct any past memory state, not just the current one.</p><h3>Known Limitations</h3><p>This architecture still has real limitations:</p><ul><li><p><strong>Reactive cascade risk:</strong> behaviors can trigger loops, so runs need budgets such as event caps, cost limits, and recursion depth.</p></li><li><p><strong>Log scaling:</strong> replay cost is linear, and the system does not yet have mature compaction or checkpointing for million-event runs.</p></li><li><p><strong>Schema evolution:</strong> graph schema changes require migration tooling.</p></li><li><p><strong>Side-effect replay:</strong> external writes and emails happen only on first run; only responses replay cleanly.</p></li><li><p><strong>No empirical benchmarks:</strong> the contribution is architectural rather than a head-to-head performance comparison.</p></li></ul><h2>Where Akka Fits &#8212; Filling the Production Gap</h2><p>If you want to take this architecture into production, <strong>Akka</strong> is one of the clearest reference points for the surrounding infrastructure. It gives you battle-tested event sourcing, clustering, sharding, projections, and recovery semantics without asking you to invent those mechanisms from scratch.</p><h3>Conceptual Mapping</h3><p>The easiest way to read the mapping is this: the agent runtime contributes the agent-native abstractions, while Akka contributes the production-grade event-sourcing primitives. The concepts line up surprisingly cleanly.</p><h3>What Akka Brings That A Minimal Agent Runtime Doesn't Have Yet</h3><ul><li><p><strong>Log scaling:</strong> Akka Persistence snapshots let you snapshot state every N events and replay only the tail.</p></li><li><p><strong>Schema evolution:</strong> Akka Persistence event adapters can transform older event formats on read.</p></li><li><p><strong>Distributed multi-agent coordination:</strong> Akka Cluster Sharding routes behaviors to stable nodes and keeps state alive across failures.</p></li><li><p><strong>External side-effect deduplication:</strong> Akka's at-least-once delivery model pairs with idempotent receivers so tools do not get re-executed incorrectly on recovery.</p></li><li><p><strong>Backpressure in reactive cascades:</strong> Akka Streams gives you typed, backpressured pipelines so behaviors cannot flood one another.</p></li><li><p><strong>Observability and metrics:</strong> Akka Telemetry provides message-rate, mailbox, and journal-latency visibility out of the box.</p></li></ul><h3>How a Production Log-Centric Agent Would Look With Akka</h3><p>At production scale, you would expect persistent behavior actors writing to an event journal, projections building graph views from that journal, snapshots reducing replay cost, and a dedicated layer wrapping LLM calls as event-producing behaviors.</p><h3>The Shared Philosophical DNA</h3><p>Both this style of agent runtime and Akka are grounded in the same insight that the distributed systems community learned in the 2010s:</p><p><strong>Mutable shared state is the enemy. Immutable events are the foundation.</strong></p><p>Akka's <code>EventSourcedBehavior</code> (in Akka Typed) is nearly a direct implementation of a behavior attached to a projected log:</p><pre><code>// Akka EventSourcedBehavior &#8776; log-projected behavior
EventSourcedBehavior[Command, Event, State](
  persistenceId = PersistenceId("agent", agentId),
  emptyState   = AgentState.empty,
  commandHandler = (state, cmd) =&gt; Effect.persist(toEvent(cmd)),
  eventHandler   = (state, evt) =&gt; state.applyEvent(evt)   // deterministic projection
)</code></pre><p>The difference is scope. Akka is a general distributed computing toolkit. An LLM-native runtime adds content-addressed LLM caching, a causality-aware event schema, fork-and-diff workflows, and graph projections typed around agent concepts such as beliefs, tasks, evidence, and relations.</p><h3>When to Use Each</h3><p>The practical split is straightforward:</p><ul><li><p>Use a purpose-built log-centric agent runtime when you want the architecture directly, fast iteration, and agent-native concepts out of the box.</p></li><li><p>Use <strong>Akka</strong> when you already live in the JVM world and need battle-tested event sourcing, clustering, sharding, snapshots, and durable distributed runtime behavior.</p></li><li><p>Use a hybrid approach when you want the causal and replay model of a log-centric agent runtime but need the surrounding production infrastructure that mature event-sourcing stacks already solved.</p></li></ul><h2>Log-Centric vs Conventional &#8212; Side-by-Side</h2><p>The core difference is not UI, prompt style, or framework ergonomics. It is where truth lives and what you can do once that truth is durable.</p><ul><li><p><strong>Source of truth:</strong> conventional systems rely on LLM conversation context; log-centric systems rely on an append-only event log.</p></li><li><p><strong>State:</strong> conventional state is opaque and in-context; log-centric state is explicit and queryable through the graph.</p></li><li><p><strong>Replay:</strong> conventional systems re-run from scratch; log-centric systems replay deterministically from the log.</p></li><li><p><strong>Forking:</strong> conventional branching is expensive; log-centric branching is cheap because the shared prefix is cached.</p></li><li><p><strong>Agent coordination:</strong> conventional systems depend on explicit messages or A2A protocols; log-centric systems coordinate implicitly through graph reactivity.</p></li><li><p><strong>Observability:</strong> conventional observability is bolted on; in log-centric systems the record is intrinsic.</p></li><li><p><strong>LLM role:</strong> conventional systems make the LLM the loop; log-centric systems treat it as one behavior among many.</p></li><li><p><strong>Long-running agents:</strong> conventional systems hit context-window limits; log-centric systems persist durable graph state over time.</p></li></ul><h2>Where This Fits in the Agent Landscape</h2><p>This model sits between classic orchestration frameworks and full distributed-systems thinking. It borrows from event sourcing, stream processing, Redux-style projection, and actor systems, then applies those ideas to LLM agents in a way most agent stacks still do not.</p><h2>Resources</h2><ul><li><p><a href="https://arxiv.org/abs/2605.21997">arXiv paper</a></p></li><li><p><a href="https://activegraph.ai">Active Graph site</a></p></li><li><p><a href="https://github.com/yoheinakajima/activegraph">Active Graph GitHub</a></p></li><li><p><a href="https://docs.activegraph.ai">Active Graph docs</a></p></li><li><p><a href="https://x.com/yoheinakajima/status/2057812713045377055">Tweet &#8212; paper announcement</a></p></li><li><p><a href="https://x.com/yoheinakajima/status/2060068279574843614">Tweet &#8212; better visualization</a></p></li><li><p><a href="https://x.com/yoheinakajima/status/2057099245430222926">Active Graph open source tweet</a></p></li><li><p><a href="https://doc.akka.io/docs/akka/current/typed/persistence.html">Akka Persistence docs</a></p></li><li><p><a href="https://doc.akka.io/docs/akka-projection/current/">Akka Projection docs</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Evals Are a Feedback Loop, Not a Score]]></title><description><![CDATA[What the best agent teams in 2026 know about measuring their systems]]></description><link>https://blog.ucalyptus.me/p/evals-are-a-feedback-loop-not-a-score</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/evals-are-a-feedback-loop-not-a-score</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Wed, 20 May 2026 01:46:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!b5wE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Evals Are a Feedback Loop, Not a Score</h1><p>Most teams I talk to treat evals like a test suite. You build the agent, write some tests, watch it pass them, ship it. Done.</p><p>That model is wrong &#8212; and it's why so many agents plateau after the first sprint.</p><p>Evals are not a scoreboard. They're a training signal. Your agent will hill-climb on whatever you give it to optimize for. If your evals don't encode the real behavior you want &#8212; not the proxy, not the offline approximation, the <em>actual thing your users need</em> &#8212; you'll ship an agent that passes your tests and fails your users.</p><p>Here's what I've learned building agents in production, and what the field is converging on in 2026.</p><div><hr></div><h2>1. Evals are the substrate</h2><blockquote><p><em>"Evals are the substrate that determines what your agent does in production. They're training data for agents because we literally fit our agent to pass Evals via hill-climbing algorithms and human edits to pass failure modes. Once you get your agent into users hands, the eval generation loop compounds."</em></p></blockquote><p>The word "substrate" is doing real work here. Not tests. Not benchmarks. Substrate &#8212; the medium in which something grows.</p><p>Your agent grows to fit its evals. Every prompt edit, every tool tweak, every model swap &#8212; you measure the result against your evals and keep what passes. That's hill-climbing. It's happening whether you've named it or not.</p><p>The question isn't whether you're hill-climbing. It's whether you're climbing the right hill.</p><p>I've built systems where the eval was a holdout dataset &#8212; historical labels, frozen in time. We got very good at predicting the past. The metric looked great. Production results were flat. The hill we climbed wasn't the hill that mattered.</p><p>The gap between "passes offline evals" and "works in production" is exactly the gap between the frozen labels you have and the live behavior you actually care about. A user who asks a follow-up because the first answer was wrong &#8212; that's signal. A document that fails to extract correctly &#8212; signal. A recommendation that gets ignored &#8212; signal. None of those were in my eval set at first.</p><p>The moment you wire those production signals back into your evals, the hill changes shape. And the agent starts improving at what you actually care about.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!b5wE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!b5wE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!b5wE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b5wE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b5wE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!b5wE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!b5wE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!b5wE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!b5wE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!b5wE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc078caa3-9607-4dbc-9267-fe7c33ca3c3b_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>2. Hill-climbing is the mechanism</h2><blockquote><p><em>"This harness optimization process is becoming much more agent driven with humans reviewing and curating evals/rewards to hill climb on. Evals are a moat and thus data to produce evals is a moat. Especially true for vertical agent companies. This is because agents can fit to most Eval sets today."</em></p></blockquote><p>Here's the operational picture: you have an agent, you have a harness &#8212; the runtime, tools, prompts, skill definitions. You run the agent against your eval set. Some pass, some fail.</p><p>For failures, you do one of three things:</p><ol><li><p>Edit the prompt</p></li><li><p>Change a tool definition</p></li><li><p>Add the failure as a new eval case</p></li></ol><p>Then you run again. That loop &#8212; run &#8594; fail &#8594; edit &#8594; run &#8212; is hill-climbing. The harness is the surface you're optimizing over.</p><p>What I've found is that <em>curation</em> is the hardest part. It's easy to write a hundred evals. It's hard to write evals that are:</p><ul><li><p>Representative of what users actually encounter (not what you imagine they encounter)</p></li><li><p>Sensitive enough to distinguish genuinely good responses from superficially good ones</p></li><li><p>Robust enough not to be gamed by surface changes</p></li></ul><p>In document processing systems I've worked on, the best evals came from the failure queue &#8212; messages that failed in production, got stuck, or produced wrong output. Those aren't hypothetical edge cases. They're real ones. Each is a new data point about exactly where the agent breaks.</p><p>That failure queue is your eval generator, if you treat it as one.</p><div><hr></div><h2>3. The compound loop</h2><blockquote><p><em>"new goal unlocked: grind so everyone forgets what a harness is and they just tell us what their agent needs to do, and we just chef it all together for them (harness + evals + envs + infra + observability + self-improvement + good-vibes)"</em></p></blockquote><p>The vision here is compounding. Not just "evals improve the agent" &#8212; evals improve the agent &#8594; the improved agent reaches users &#8594; users surface new failure modes &#8594; those become new evals &#8594; the agent improves again.</p><p>Once this loop is running, it's hard to stop. Early on you're writing evals manually, dogfooding your own product to find where it breaks. Later, production data does most of the discovery for you. The best teams I've seen have an almost-automated pipeline from "something went wrong in prod" to "there's a new eval case in the backlog."</p><p>The key enablers:</p><p><strong>Observability</strong>: you have to be able to <em>see</em> failures. Structured logging with correlation IDs, run IDs, case IDs. If you can't query "show me all the cases where the agent's output was flagged," you can't close the loop.</p><p><strong>Categorization</strong>: not all failures are equal. Some are model failures. Some are tool failures. Some are eval failures (the eval expected the wrong answer). Knowing which is which tells you where to intervene.</p><p><strong>Low friction to add evals</strong>: if adding a new eval case takes an hour, you won't do it for every production failure. If it takes a minute, you will. This is infrastructure, not process.</p><p>The "chef it all together" framing is the right end state: a team that describes what the agent should do, and has infrastructure that measures whether it does that thing, surfaces the gaps, and closes them. The eval harness <em>is</em> the product, not the afterthought.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KlRV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KlRV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KlRV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KlRV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KlRV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KlRV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KlRV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KlRV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KlRV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KlRV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5719ee2-42f7-46f4-ac36-6ee2576cfffb_2481x859.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>4. Gall's Law: earn your complexity</h2><blockquote><p><em>"Gall's law: 'a complex system that works is invariably found to have evolved from a simple system that worked.' Also very relevant to agent systems. Most teams are trying to jump straight to autonomous complexity before they have evals, observability, or feedback loops in place. 2026 is the year of evals."</em></p></blockquote><p>The agent teams I've seen struggle have almost always skipped the observable-simple-system phase. They built an orchestrator with five subagents, a memory layer, and tool use on day one. When it breaks &#8212; and it breaks &#8212; there's no way to tell which part failed. The eval is "the user seemed unhappy." That's not a training signal; it's a symptom.</p><p>My rule: before you add complexity, make sure you can <em>see</em> what's happening. Traffic interception, request logging, structured output validation, human review queues &#8212; pick your poison, but have something. A simple agent with good observability will outrun a complex agent with none. You can improve what you can measure. You can only guess at what you can't.</p><p>The practical implication: your first eval should be manual. A human reviewing the agent's outputs and labeling them good or bad is a valid eval harness. It's slow, it doesn't scale, but it forces you to articulate what "good" looks like. That articulation is the hard part. Everything else is tooling.</p><p>Complexity earns its place when the simple system is working and you know exactly which constraint it's hitting. Not before.</p><div><hr></div><h2>5. The frozen benchmark trap</h2><blockquote><p><em>"A benchmark is a frozen view of a solvable world. An RL environment is the same world made interactive. A harness is the runtime that lets agents act in it. A recipe is a compressed solution trajectory. A data generator is the world sampled at scale."</em></p></blockquote><p>This mental map is the clearest framing I've seen of where evals need to go.</p><p>Most agent teams are at level 1: <strong>frozen benchmark</strong>. A test set, a score, a ship decision. The problem is that this test set was assembled from a distribution of inputs that existed at a point in time, labeled by people who had to make assumptions about what the right answer was. It doesn't capture the full distribution of what the agent will actually encounter. It captures what someone had time to label.</p><p>The unlock is level 2: <strong>interactive environment</strong>. Instead of "predict the right answer for these labeled inputs," it's "act in this environment and we'll observe what happens." The eval stops being a static check and starts being a simulation of real usage. Failures are now observable <em>behaviors</em>, not output mismatches.</p><p>From there, the natural progression:</p><ul><li><p><strong>Harness</strong>: you build the runtime around the environment. Tools, skills, APIs, prompts &#8212; the harness is the surface the agent operates on, and the surface you optimize over.</p></li><li><p><strong>Recipe</strong>: accumulated knowledge compressed into structure. A wireframe template, a skill definition, a system prompt that encodes what good behavior looks like. Recipes are what you get when a domain expert and an agent engineer work together long enough.</p></li><li><p><strong>Data generator</strong>: at scale, the world generates evals for you. Production traffic, user interactions, edge cases you didn't anticipate &#8212; they surface continuously and become the raw material for the next round of eval curation.</p></li></ul><p>Most teams in 2026 are at level 1. The best teams are building toward level 5. The distance between them is compounding every month.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cF34!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cF34!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cF34!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cF34!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cF34!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cF34!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!cF34!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!cF34!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!cF34!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!cF34!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5be088c-6e7f-468d-92a0-1e394dcc41d2_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>6. Evals as measurement infrastructure</h2><blockquote><p><em>"Evals, benchmarks, and RL environments. Measurement infrastructure that decides whether agents are safe to ship."</em></p></blockquote><p>"Measurement infrastructure" &#8212; not tests, not QA, infrastructure &#8212; is the right framing.</p><p>Infrastructure implies:</p><ul><li><p>Built once, used many times</p></li><li><p>Maintained and improved as the system evolves</p></li><li><p>A prerequisite for everything that depends on it</p></li><li><p>Skimping on it creates debt that compounds</p></li></ul><p>When I look back at agent systems I've built that stayed healthy versus ones that rotted quickly, the difference is almost always measurement infrastructure. Systems with eval harnesses improved iteratively. Systems without froze at their initial performance level because there was no way to know if a change was an improvement or a regression.</p><p>The "moat" framing is real. If your eval set encodes all the good behavior your agent needs to exhibit &#8212; edge cases, difficult inputs, real failure modes from production &#8212; that set is genuinely hard to replicate. It took real production data and real failure events to build. A competitor starting from scratch doesn't have that signal. They have benchmarks. You have signal.</p><p>This is why measurement infrastructure compounds in value over time, not just in the short run.</p><div><hr></div><h2>7. The human layer: who actually does this work</h2><blockquote><p><em>"Companies need help figuring out which models will work best for their workflows, they need extensive evals setup often, they need change management support for workflows, they need to get their data setup for the agents, and constant tuning of the agentic system for their process."</em></p></blockquote><p>The eval harness doesn't build itself. Someone has to:</p><ul><li><p>Watch production for failures</p></li><li><p>Translate failures into eval cases</p></li><li><p>Run the hill-climbing loop</p></li><li><p>Know when the metric is being gamed versus genuinely improving</p></li><li><p>Change the eval when the eval is wrong</p></li></ul><p>That's a human job. And it's one of the most technically demanding jobs in AI right now &#8212; precisely because it sits at the intersection of domain knowledge, engineering, and judgment about what actually matters.</p><p>The best AI teams I've worked alongside treat eval curation as a primary engineering discipline. Not something that happens after the "real work" is done. The eval harness <em>is</em> the real work. It's the system that makes the agent improvable. Everything else &#8212; the model choice, the infrastructure, the UI &#8212; is replaceable. The evals, and the process for generating them, aren't.</p><p>The practical implication: whoever is closest to production failures should have a direct line to your eval set. Whether that's a data scientist triaging a failure queue, a product manager reviewing user feedback, or an engineer watching structured logs &#8212; the signal is there. The question is whether you've built the pipeline to capture it, and whether the person who sees the failure can turn it into a new eval case in under five minutes.</p><p>If not, that's the first thing to fix.</p><div><hr></div><h2>Close the loop</h2><p>The teams winning on agents in 2026 aren't the ones with the best models. Models are increasingly commoditized. They're the teams with the best eval infrastructure &#8212; the ones who've turned production failures into training signal, moved from frozen benchmarks to interactive environments, and made "add an eval" a reflex rather than a project.</p><p>Hill-climbing never stops. The question is whether you're climbing the right hill, and whether your instrumentation is sensitive enough to tell.</p><p>Start with one eval. Make it manual. Make it honest. Wire it to something real &#8212; an actual failure mode, an actual user scenario your agent encountered yesterday. Then close the loop.</p><p>The compounding starts when you do.</p>]]></content:encoded></item><item><title><![CDATA[How I Run Long-Running Agents at an Insurance Company]]></title><description><![CDATA[Durable orchestration, checkpointing, and human-in-the-loop in production]]></description><link>https://blog.ucalyptus.me/p/how-i-run-long-running-agents-at</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/how-i-run-long-running-agents-at</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Wed, 20 May 2026 00:46:37 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!THal!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Insurance companies run on processes that take days, not milliseconds. A claim gets filed. It sits in a queue. An adjuster reviews it. Someone requests more documents. A supervisor signs off. Weeks pass. This is the reality I work in &#8212; and it turns out, it's a perfect forcing function for building robust long-running AI agents.</p><p>Over the past year, I've shipped several agents into production at the insurance company where I work. These aren't chatbots. They're systems that autonomously process documents, query internal databases, draft correspondence, escalate edge cases, and hand off to humans &#8212; across hours, sometimes days. Here's what I learned.</p><div><hr></div><h2>Why "Long-Running" Is a Different Problem</h2><p>Most agent tutorials show you something that completes in under 30 seconds. Prompt in, tool calls, response out. Clean.</p><p>Real enterprise workflows don't look like that. My agents need to:</p><ul><li><p>Wait for a human to upload a document before continuing</p></li><li><p>Pause mid-task because a downstream API is throttled</p></li><li><p>Resume after a weekend without losing context</p></li><li><p>Survive infrastructure restarts, deployments, and the occasional Kubernetes eviction</p></li></ul><p>The moment your agent spans more than one process lifetime, you've entered a different design space. You need <strong>persistent state, explicit checkpointing, and failure recovery</strong> &#8212; not just a retry decorator.</p><div><hr></div><h2>Architecture: The Three Layers</h2><p>After a lot of iteration, I landed on a three-layer architecture.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!THal!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!THal!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!THal!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!THal!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!THal!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!THal!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!THal!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!THal!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!THal!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!THal!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3847b8f1-e0dd-4551-8815-c61e238714ef_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Layer 1: The Orchestrator</strong></p><p>This is the durable brain. It knows what the agent is supposed to accomplish, what steps have been completed, and what comes next. It lives in the database, not in memory. Every time the agent advances a step, the orchestrator writes a checkpoint before doing anything else.</p><p>I model each agent run as a state machine. States are explicit: <code>WAITING_FOR_DOCUMENT</code>, <code>EXTRACTING_DATA</code>, <code>AWAITING_HUMAN_REVIEW</code>, <code>COMPLETE</code>, <code>FAILED</code>. Transitions are logged. You can reconstruct exactly where any agent run is &#8212; and was &#8212; at any point.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ol88!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ol88!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ol88!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ol88!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ol88!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ol88!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ol88!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 424w, https://substackcdn.com/image/fetch/$s_!ol88!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 848w, https://substackcdn.com/image/fetch/$s_!ol88!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!ol88!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6d8d05f6-3a45-40c2-ad77-12aebce4c570_1131x696.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Layer 2: The Worker</strong></p><p>Workers are stateless. They pick up a task from a queue, execute one step of the state machine, write results back to the orchestrator, and die. If a worker crashes mid-step, the orchestrator detects it via heartbeat timeout and reassigns the task. No task is lost because no task lives only in the worker's memory.</p><p>This is the biggest mindset shift from building "chatbots": <strong>treat every agent step as a unit of work that can fail and be retried independently</strong>.</p><p><strong>Layer 3: The Human-in-the-Loop Interface</strong></p><p>Some steps require a human. I don't fight this &#8212; I model it explicitly. When the agent reaches a step that needs human input, it writes a task to a work queue and suspends itself. A human completes the task (via a simple UI). The agent is notified and resumes.</p><p>This means agents can be "paused" for days waiting on a human without holding any compute. The orchestrator just has a row in the database marked <code>WAITING_FOR_HUMAN</code>.</p><div><hr></div><h2>Checkpointing: The Thing That Actually Saves You</h2><p>I checkpoint aggressively. Before every tool call. Before every LLM call. After every meaningful piece of work.</p><p>Here's the pattern I follow:</p><ol><li><p><strong>Write intent before acting.</strong> Before calling an external API, write "I am about to call X with these parameters" to the checkpoint store. If the process crashes, you know what was about to happen.</p></li></ol><ol><li><p><strong>Write result after completing.</strong> After the call succeeds, update the checkpoint to "X returned Y." Now recovery is: did we complete this step? If yes, skip it. If no, re-execute.</p></li></ol><ol><li><p><strong>Make tool calls idempotent.</strong> This isn't always possible, but where it is, do it. Generate a UUID per intent, pass it as an idempotency key to external APIs, and you can safely retry without duplicating effects.</p></li></ol><p>The checkpoint store is just a Postgres table. Nothing exotic. Each row has: <code>run_id</code>, <code>step_name</code>, <code>status</code>, <code>input_hash</code>, <code>output</code>, <code>created_at</code>. The agent loads its checkpoint on startup and skips any steps already marked complete.</p><div><hr></div><h2>Observability: What I Wish I'd Built Earlier</h2><p>When an agent run fails at step 7 of 12, three days into its execution, you need to be able to answer: <em>What did it do? What did it see? Why did it make that decision?</em></p><p>I now instrument every agent run with:</p><ul><li><p><strong>Structured event logs</strong>: Every LLM call, every tool invocation, every state transition. Logged as structured JSON with <code>run_id</code>, <code>step</code>, <code>timestamp</code>, <code>input</code>, <code>output</code>, <code>latency_ms</code>.</p></li><li><p><strong>Token usage tracking</strong>: LLM costs compound fast when you have hundreds of parallel agent runs. I log input and output tokens per call and alert when a run exceeds 2&#215; the expected token budget.</p></li><li><p><strong>A trace viewer</strong>: I built a simple internal UI that lets me pull up any agent run by ID and see its full timeline. This was a weekend project that has saved me countless hours of log-diving.</p></li><li><p><strong>Anomaly alerts</strong>: If a run has been in <code>WAITING_FOR_HUMAN</code> for more than 48 hours, we alert. If a run exceeds its expected duration by 3&#215;, we alert. These are just cron jobs querying the orchestrator table.</p></li></ul><p>The key insight: <strong>treat your agent runs like distributed transactions</strong>. The tooling exists for observing distributed systems &#8212; adapt it.</p><div><hr></div><h2>Handling Failures Gracefully</h2><p>Agents fail. LLMs hallucinate. APIs time out. Documents are malformed. I've accepted this and designed for it.</p><p><strong>Retry with backoff on transient failures.</strong> Network errors, rate limits, and temporary API outages are transient. I retry these with exponential backoff, up to a maximum. After max retries, the step fails and the run enters <code>FAILED_RETRIABLE</code> state.</p><p><strong>Escalate on semantic failures.</strong> If the LLM returns something structurally valid but semantically wrong (extracted a date that doesn't exist, classified a document into the wrong category), I don't silently swallow it. I escalate to human review. The agent writes a note explaining what it saw and why it's uncertain, and a human resolves it.</p><p><strong>Dead-letter runs for catastrophic failures.</strong> If a run hits an unrecoverable error, it goes to a dead-letter queue. A human triages it. We track dead-letter rate as a KPI. If it spikes, something is systematically wrong and we investigate immediately.</p><p><strong>Never let a run disappear silently.</strong> This sounds obvious but it's easy to miss. An agent that crashes without updating its status looks identical to an agent that's still running. I use heartbeats: every worker pings the orchestrator every 30 seconds. If a heartbeat is missed for 2 minutes, the run is marked as <code>SUSPECTED_DEAD</code> and reassigned.</p><div><hr></div><h2>Lessons From Production</h2><p><strong>Lesson 1: The context window is not your memory.</strong> Early on, I passed the entire history of a run into every LLM call. Costs exploded. Latency ballooned. Now I summarize completed steps into a compact "run summary" and only pass recent context in full. The agent doesn't need to re-read everything it's done &#8212; just know what it's already accomplished.</p><p><strong>Lesson 2: Humans are part of the system.</strong> I used to think of human-in-the-loop as a failure mode &#8212; something to minimize. Now I treat it as a first-class system component. Some decisions <em>should</em> go to humans. The agent's job is to make those handoffs clean and well-documented, not to eliminate them.</p><p><strong>Lesson 3: Idempotency is load-bearing.</strong> Every time I skipped making a tool idempotent because "it's unlikely to be retried," it got retried, and something bad happened. It's always worth the 10 extra minutes.</p><p><strong>Lesson 4: Schema-validate every LLM output.</strong> Every LLM call in my agents returns structured output validated against a schema at runtime. If the schema check fails, the step fails &#8212; loudly, with a detailed error &#8212; rather than propagating garbage downstream.</p><p><strong>Lesson 5: Separate the LLM from the decision.</strong> The LLM suggests. The orchestrator decides. I don't let LLM output directly trigger irreversible actions. There's always a thin validation and approval layer between "the model said X" and "the system did X."</p><div><hr></div><h2>The Payoff</h2><p>Running agents in this environment is genuinely hard. The failure modes are subtle, the stakes are real (insurance decisions affect people's lives), and the organizational context is conservative by design.</p><p>But it's also deeply rewarding. The agents I've shipped process hundreds of cases a week that would otherwise queue for days. They catch document inconsistencies humans miss at 2am. They make the humans who do review cases faster and more confident because the boring legwork is already done.</p><p>The patterns here &#8212; durable orchestration, aggressive checkpointing, explicit human handoff, observable runs &#8212; aren't specific to insurance. They apply anywhere agents need to survive contact with the real world.</p><p>Build for failure from day one. Your future self will thank you.</p>]]></content:encoded></item><item><title><![CDATA[They Said Extensions Were Dead. Then AI Needed a Browser.]]></title><description><![CDATA[The arc of browser plug-ins &#8212; from power-user toy, to security liability, to the quiet backbone of the agentic web.]]></description><link>https://blog.ucalyptus.me/p/they-said-extensions-were-dead-then</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/they-said-extensions-were-dead-then</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Mon, 18 May 2026 14:03:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NDWt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>They Said Extensions Were Dead. Then AI Needed a Browser.</h1><p><em>The arc of browser plug-ins &#8212; from power-user toy, to security liability, to the quiet backbone of the agentic web.</em></p><div><hr></div><p>There's a moment in late 2024 that I keep coming back to. Anthropic ships Computer Use &#8212; a way for Claude to take screenshots and control a mouse cursor. The cleanest path to doing this in a browser isn't a new protocol or a custom runtime. It's a Chrome extension.</p><p>Something that had been declared dead a hundred times.</p><div><hr></div><h2>Act I: The Golden Age (2008&#8211;2014)</h2><p>Firefox invented the modern browser extension. Not just as a feature &#8212; as a philosophy. A browser was an OS-in-waiting, and anyone could extend it. Greasemonkey let you rewrite any webpage with a few lines of JavaScript. AdBlock Plus blocked every ad on the internet. Firebug built the first real devtools before browsers had them.</p><p>When Chrome launched in 2008, Google took the model and made it faster, more sandboxed, and distributable via the Web Store. By 2013, there were tens of thousands of extensions. The category had its own celebrity developers, acquisition stories, and venture bets.</p><p>This era's defining trait: extensions as <em>consumer products</em>. You installed them for yourself. Password managers, tab managers, grammar checkers, ad blockers. The browser was a platform and extensions were apps.</p><div><hr></div><h2>Act II: The Long Decline (2015&#8211;2021)</h2><p>Then came the rot.</p><p>It arrived slowly &#8212; a few extensions caught data harvesting in 2017, then a few more. Between 2024 and 2026, coordinated malware campaigns would go on to affect over 8.8 million users across Chrome, Edge, and Firefox. The Web Store turned out to be a decent distribution channel for malware. Extensions requesting <code>&lt;all_urls&gt;</code> permission got acquired for five figures, had their update payload quietly replaced, and started exfiltrating browsing history to ad brokers.</p><p>Google's response was Manifest V3, announced in 2018, which became one of the most contentious spec debates in browser history. Replacing persistent background pages with short-lived service workers. Replacing the <code>webRequest</code> blocking API that ad blockers depended on with a declarative <code>declarativeNetRequest</code> API capped on rules.</p><p>The ad blocker developers were furious. uBlock Origin's developer wrote long posts explaining exactly why the new model was insufficient. The EFF weighed in. Users signed petitions.</p><p>The <em>vibe</em> around extensions shifted entirely. Security-conscious companies started blocking extensions in managed Chrome deployments. The narrative calcified: extensions are a supply-chain risk, a performance tax, a privacy gamble. Between 2018 and 2021, if you worked in developer tools or security tooling, extensions were something you used reluctantly &#8212; not a platform you built for.</p><div><hr></div><h2>Act III: The Resurgence (2022&#8211;Present)</h2><p>Four things happened nearly simultaneously, and together they changed the calculus entirely.</p><h3>1. Computer Use Agents</h3><p>In October 2024, Anthropic launched Computer Use &#8212; a capability that lets Claude take screenshots and control a computer. OpenAI followed with Operator in January 2025, powered by a model they called the Computer-Using Agent (CUA).</p><p>The performance numbers were striking. The open-source Browser Use framework hit 89% on WebVoyager (a standardized web task benchmark), compared to 87% for OpenAI Operator and 56% for Anthropic's initial Computer Use release. These numbers moved fast &#8212; by mid-2025, Anthropic's computer use tooling was shipping production-ready headers across Claude Opus 4.7 and Sonnet 4.6.</p><p>In August 2025, Anthropic launched Claude for Chrome &#8212; a Chrome extension that gives Claude a persistent sidecar in the browser, with permission to take actions on the user's behalf. Rolling out to 1,000 subscribers on the Max plan as a research preview. Google launched Gemini integrations with Chrome. Perplexity launched its own AI browser, Comet. OpenAI merged Operator directly into ChatGPT as "agent mode" in July 2025.</p><p>The browser extension is the agent's body. It has the right permissions &#8212; <code>&lt;all_urls&gt;</code>, content script injection, synthetic event dispatch &#8212; and it's already trusted by the browser's security model in a way an external process is not.</p><p>Extensions didn't become popular again because they got better. They became necessary because the use case <em>required them</em>.</p><h3>2. Remote CDP and Browser Harness Tooling</h3><p>The Chrome DevTools Protocol &#8212; the WebSocket API that Chrome exposes when launched with <code>--remote-debugging-port</code> &#8212; has existed since 2011. For years it was mostly used by Puppeteer and later Playwright for test automation.</p><p>What changed was the architecture around it. A new pattern emerged: a lightweight daemon process holds the CDP connection to an already-running Chrome, and scripts send one-shot JSON commands over a Unix socket. No Playwright overhead, no Node.js runtime, no process spawning &#8212; just a direct line to the user's real browser, with their cookies and authenticated sessions.</p><pre><code>browser-harness &lt;&lt;'PY'
new_tab("https://example.com")
wait_for_load()
print(page_info())
PY</code></pre><p>This made browser automation feel like a UNIX tool. First navigation is <code>new_tab()</code> not <code>goto()</code> &#8212; because you're attaching to the user's <em>live</em> browser, not a clean test instance.</p><p>Cloud providers took the same insight and productized it at scale. Browserbase, founded in 2024, raised $67.5M total &#8212; including a $40M Series B led by Notable Capital in June 2025, valuing them at $300M. By 2025 they were running 50 million browser sessions per year, with customers including Airtable, Instacart, Notion, Stripe, Perplexity, and Vercel. Forbes named them to their Next Billion-Dollar Startups list.</p><p>The pattern is the same everywhere: get a <code>cdpUrl</code>, resolve the WebSocket endpoint via <code>/json/version</code>, drive the remote Chrome instance. The browser becomes stateless infrastructure you rent by the session.</p><h3>3. WebGPU-Based In-Browser LLMs</h3><p>This one is weirder and arguably bigger in the long run.</p><p>WebGPU shipped in Chrome 113 in May 2023. Unlike WebGL, it's a proper compute API &#8212; shaders, compute pipelines, buffer access. For ML inference, this means matrix multiplications at speeds that benchmark at roughly 80% of native Metal or CUDA throughput. On an Apple M3 Max, WebLLM runs Llama 3.1 8B at 4-bit quantization at ~41 tokens per second &#8212; about 80% of native MLC-LLM performance. Phi 3.5 Mini hits 71 tokens per second. On discrete NVIDIA hardware, WebGPU is 10&#8211;15&#215; faster than WASM for token generation.</p><p>WebLLM, MediaPipe LLM Inference, and Transformers.js proved that Gemma 2B, Phi-3-mini, and Qwen-1.5B could run entirely in the browser &#8212; no server, no API key, no round-trip latency. The practical sweet spot is 1B&#8211;3B parameter models at 4-bit quantization for reliable cross-device performance.</p><p>Where do extensions re-enter? An extension can load a model <em>once</em> into the background service worker and expose it via message passing to every tab. This is meaningfully better than each page re-downloading and re-initializing a 2GB model file. The extension becomes a local model host.</p><p>The privacy angle is also genuinely new. An extension running Gemma via WebGPU processes data that never leaves the machine &#8212; legally and technically. No server to subpoena. No request to intercept. For health data, legal documents, personal finance: this is a capability class that didn't exist three years ago.</p><h3>4. Manifest V3 Actually Landed (And It's Fine)</h3><p>Here's the part that surprises people: after all the drama, Manifest V3 turned out to be largely fine for the new use cases.</p><p>Chrome 139, released July 2025, completed the MV3 transition by fully removing MV2 extensions. By August 2025, 73.4% of actively maintained extensions had migrated. More telling: 90% of <em>new</em> extension uploads are already in MV3. The developer community absorbed the change.</p><p>For agent workloads specifically, the MV3 constraints are non-issues or improvements. Event-driven service workers rather than persistent background pages? Better for agent tasks that should only run when triggered. The controversial <code>declarativeNetRequest</code> cap? Irrelevant to screenshot capture, DOM injection, or synthetic input dispatch.</p><p>The ad blocker developers had a real grievance &#8212; for ad blockers. For computer use agents, browser harnesses, and in-browser LLMs, MV3 is not a blocker. The new use cases were inadvertently designed around the new constraints.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NDWt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NDWt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NDWt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NDWt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NDWt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NDWt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NDWt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NDWt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NDWt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NDWt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F487a8508-8bbd-4af4-be8a-29e859495933_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>What This Actually Means</h2><p>The first wave of browser extensions was <em>consumer software</em>. The second wave is <em>infrastructure</em>.</p><p>A consumer extension is something you choose to install for yourself. Infrastructure is something an application depends on &#8212; something that has to be there, permissioned correctly, composing with other systems.</p><p>The AI extensions market reflects this shift. There were 238 AI-powered Chrome extensions with meaningful user bases in 2025. By early 2026, that number was 442 &#8212; an 85.7% year-over-year increase, with 115.5 million combined downloads. The market was valued at roughly $2.3 billion in 2025, with projections ranging from $8.2 billion to $17.5 billion by the early 2030s.</p><p>The technical primitives that make this possible &#8212; CDP, content scripts, native messaging, WebGPU &#8212; aren't new. They've been there for years. What changed is that we found use cases that need <em>exactly</em> those capabilities and nothing simpler.</p><div><hr></div><h2>Where It Goes</h2><p><strong>Extension-as-agent-runtime becomes a standard pattern.</strong> Just as every serious product eventually shipped a mobile app, every AI agent product involving the web will ship an extension. Not because it's trendy, but because the alternative &#8212; screen capture plus pixel-coordinate clicking &#8212; is brittle, slow, and has no access to the DOM or authenticated session state.</p><p><strong>In-browser model inference becomes a tier, not a novelty.</strong> As models shrink and WebGPU matures, the question won't be "can you run a model in the browser" but "which tier do you want &#8212; local, edge, or cloud?" Extensions that manage this routing will be the abstraction layer.</p><p><strong>CDP browser farms consolidate around a few infrastructure players.</strong> Browserbase's $300M valuation after 16 months of existence is a signal. The pattern is too clean not to become commodity infrastructure. Three or four players will own this the way AWS owns EC2.</p><p><strong>The security concerns didn't go away.</strong> The same properties that make extensions powerful for agents &#8212; <code>&lt;all_urls&gt;</code> permission, access to every page, background persistence &#8212; are the properties that made them dangerous before. Security researchers have already flagged that agentic browser extensions are prompt-injection surfaces: a malicious page can try to hijack the agent's actions by embedding instructions in its content. Simon Willison has argued that the entire concept of an agentic browser extension "is fatally flawed and cannot be built safely." That's probably too strong &#8212; but it's not wrong that the security model is unsolved.</p><div><hr></div><p>Extensions had a decade as consumer software, a half-decade as a liability, and are now becoming infrastructure for the agentic web. The category didn't change. The demand did.</p><p>The browser turns out to be the last-mile interface for AI agents &#8212; not because it was designed for that, but because it's where humans already are, where the data already lives, and where the sessions are already authenticated. The extension is the seam between the model and that world.</p><p>They were never really dead. They were just waiting for something to actually need them.</p><div><hr></div><p><em>Building in this space &#8212; browser agents, CDP tooling, WebGPU inference? I'd love to hear what you're working on.</em></p>]]></content:encoded></item><item><title><![CDATA[Agent Design Patterns Used in Enterprise AI Systems]]></title><description><![CDATA[What I've learned building LLM agents that actually run in production]]></description><link>https://blog.ucalyptus.me/p/agent-design-patterns-used-in-enterprise</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/agent-design-patterns-used-in-enterprise</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Sun, 17 May 2026 23:26:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!3za4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Agent Design Patterns Used in Enterprise AI Systems</h1><p><em>What I've learned building LLM agents that actually run in production</em></p><div><hr></div><p>There's a gap between the demo-tier agent tutorials you'll find online and what it actually takes to ship an LLM agent inside a large organization. The demos show you a loop. The enterprise asks: what happens when the LLM is slow? What happens when your tool returns garbage? What happens when the user asks a follow-up question and the agent has forgotten everything?</p><p>Over the past year I've been building agentic AI systems in a regulated financial services environment. The patterns below aren't academic &#8212; each one solved a real failure mode I ran into. I've stripped out all the proprietary details, but the engineering is real.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3za4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3za4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3za4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3za4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3za4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3za4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3za4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3za4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3za4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3za4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f23ab44-5583-4014-b82e-db54b1059ada_1376x768.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>The Enterprise Constraint Set</h2><p>Before diving into patterns, it helps to understand what's different in enterprise:</p><ul><li><p><strong>Latency budgets are tight.</strong> Users in ops or analytics workflows won't wait 8 seconds for a response.</p></li><li><p><strong>Cost per query matters at scale.</strong> An extra LLM call per message &#215; 10,000 queries/month adds up fast.</p></li><li><p><strong>Auditability is non-negotiable.</strong> In regulated industries, you need to be able to explain <em>why</em> the agent used a particular tool or returned a particular answer.</p></li><li><p><strong>Data lives behind access controls.</strong> You can't just "give the agent everything" &#8212; tool access has to be scoped to what the user is allowed to see.</p></li></ul><p>These constraints shape every pattern below.</p><div><hr></div><h2>Pattern 1: Hybrid Intent Classification (Model Router)</h2><p><strong>Problem:</strong> You need to know what the user is asking before you can pick the right tools. But calling an LLM for every classification is expensive and slow.</p><p><strong>Solution:</strong> Two-stage classification &#8212; regex heuristics first, LLM fallback only when heuristics fail.</p><pre><code>def classify_intent(query: str) -&gt; dict:
    # Stage 1: fast heuristics (&lt; 1ms)
    result = _classify_by_heuristics(query)
    if result is not None:
        return result

    # Stage 2: LLM fallback (~400&#8211;600ms)
    return _classify_by_llm(query, llm)

def _classify_by_heuristics(query: str) -&gt; Optional[str]:
    query_lower = query.lower()
    if CLINIC_PATTERNS.search(query_lower):
        return {"intent": "clinic", "entity": _extract_clinic_name(query)}
    if POSTAL_PATTERNS.search(query_lower):
        return {"intent": "postal", "entity": _extract_postal_code(query)}
    if DOCTOR_MENTION.search(query):
        return {"intent": "name", "entity": _extract_person_name(query)}
    return None  # signal: fall through to LLM</code></pre><p><strong>In practice:</strong> ~80% of queries are handled by heuristics. The LLM only fires for ambiguous cases. This keeps median latency under 100ms while preserving accuracy for edge cases.</p><p>The LLM classifier returns structured JSON:</p><pre><code>prompt = """You are an intent classifier. Return ONLY valid JSON:
{"intents": [{"type": "&lt;name|clinic|postal|unknown&gt;", "object": "&lt;value_or_null&gt;"}]}

Query: {query}"""</code></pre><p>Forcing JSON output makes the downstream routing deterministic. No parsing surprises.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7rki!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7rki!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7rki!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7rki!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7rki!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7rki!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7rki!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7rki!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7rki!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7rki!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8d8a38b3-c5c7-43c2-aa2d-32d4e29ea031_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>Pattern 2: Dynamic Tool Selection</h2><p><strong>Problem:</strong> Giving the agent access to every tool at all times is wasteful and confusing. An agent with 8 tools available will sometimes use the wrong one.</p><p><strong>Solution:</strong> Only enable tools that are relevant to the detected intent.</p><pre><code># Base set: always enabled
enabled_tools = [tools["analyze_data"]]

# Entity-specific tools: only when we have a named entity
has_entity = any(
    i["type"] in ["name", "clinic", "postal"]
    for i in classified_intents
)

if has_entity:
    enabled_tools.extend([
        tools["search_registry"],
        tools["get_records"],
        tools["aggregate_monthly"],
        tools["analyze_activity"],
        tools["get_summary_stats"],
    ])

agent = Agent(instructions=SYSTEM_PROMPT, tools=enabled_tools, model=model)</code></pre><p><strong>Why it matters:</strong> Fewer tools = smaller decision space = faster tool selection + fewer wrong-tool errors. It also gives you implicit routing &#8212; the agent physically cannot call a tool you haven't enabled.</p><div><hr></div><h2>Pattern 3: Multi-Algorithm Entity Search (Search Agent)</h2><p><strong>Problem:</strong> Users don't spell consistently. "Dr. Johnson" vs "Johnson, Sarah" vs "Dr. Sara Jonhson" are all the same person.</p><p><strong>Solution:</strong> Three-layer matching &#8212; exact, fuzzy (Levenshtein), n-gram (Jaccard). Apply all three; keep results that pass any one threshold.</p><pre><code>def _fuzz_score(x: str, y: str) -&gt; float:
    """Levenshtein-based partial ratio."""
    return 1.0 if x == y else fuzz.partial_ratio(x, y) / 100.0

def _ngram_sim(x: str, y: str) -&gt; float:
    """Jaccard similarity on token sets."""
    set_x, set_y = set(x.split()), set(y.split())
    inter = set_x &amp; set_y
    union = set_x | set_y
    return len(inter) / len(union) if union else 0.0

# Apply both scores, pass if either exceeds threshold
df["fuzz"] = df["NAME"].apply(lambda n: _fuzz_score(n.upper(), query_upper))
df["ngram"] = df["NAME"].apply(lambda n: _ngram_sim(n.upper(), query_upper))

matches = df[
    (df["NAME"].str.upper() == query_upper) |   # exact
    (df["fuzz"] &gt;= 0.6) |                        # fuzzy
    (df["ngram"] &gt;= 0.5)                         # n-gram
]</code></pre><p><strong>Why both fuzzy and n-gram?</strong></p><ul><li><p>Fuzzy catches typos: "Jonhson" &#8594; "Johnson"</p></li><li><p>N-gram catches reorderings: "Sarah Johnson" &#8594; "Johnson Sarah"</p></li><li><p>Combined: you get recall without sacrificing precision</p></li></ul><div><hr></div><h2>Pattern 4: Layered Fallback</h2><p><strong>Problem:</strong> Production systems fail. The LLM times out. The database is slow. The SDK isn't installed in some environment.</p><p><strong>Solution:</strong> Multiple independent fallback layers, each one degrading gracefully to the next.</p><pre><code># Layer 1: SDK availability
try:
    from agents import Agent, Runner, function_tool
    SDK_AVAILABLE = True
except ImportError:
    SDK_AVAILABLE = False

if not SDK_AVAILABLE:
    return "Agent SDK not available. Please install openai-agents."

# Layer 2: Classification fallback (heuristic &#8594; LLM &#8594; default)
intent = _classify_by_heuristics(query)
if intent is None:
    try:
        intent = _classify_by_llm(query, llm)
    except RateLimitError:
        intent = _classify_by_llm(query, llm)  # single retry
    except Exception:
        intent = {"type": "unknown", "object": None}  # safe default

# Layer 3: Tool execution fallback
try:
    result = run_tool(params)
except Exception as e:
    return json.dumps({"error": str(e), "success": False})</code></pre><p><strong>Key principle:</strong> Every layer has exactly one job. The SDK check doesn't know about rate limits. The rate limit handler doesn't know about tool failures. Each layer handles exactly one class of failure and passes everything else down.</p><div><hr></div><h2>Pattern 5: Session Memory and Context Reuse (Feedback Loop)</h2><p><strong>Problem:</strong> Users ask follow-up questions. "What about that second provider?" doesn't make sense without memory of the previous turn.</p><p><strong>Solution:</strong> SQLite-backed session memory + explicit context reuse for ambiguous follow-ups.</p><pre><code># On session start
session_id = user_session.get("id")
memory = SQLiteSession(session_id=session_id, db_path=":memory:")
user_session.set("agent_memory", memory)

# On each message
result = await Runner.run(
    agent,
    input=query,
    session=memory  # agent sees full conversation history
)

# Context reuse for unknown intents (follow-up questions)
if any(i["type"] == "unknown" for i in intents):
    previous_tools = user_session.get("previous_enabled_tools")
    if previous_tools:
        # "What about that other provider?" &#8594; reuse last search context
        enabled_tools = previous_tools</code></pre><p>The context reuse heuristic is simple: if the intent is "unknown" (no entities detected), the user is probably asking about something already established in the conversation. Reuse the tool set from the previous turn instead of starting fresh.</p><div><hr></div><h2>Pattern 6: In-Memory Data Caching</h2><p><strong>Problem:</strong> Loading a large DataFrame from a data lake on every query adds 2+ seconds of latency.</p><p><strong>Solution:</strong> Lazy-load into a module-level cache. First call pays the load cost; every subsequent call uses the in-memory copy.</p><pre><code>_registry_df: Optional[pd.DataFrame] = None
_records_df: Optional[pd.DataFrame] = None

def _load_registry() -&gt; pd.DataFrame:
    global _registry_df
    if _registry_df is None:
        dt = DeltaTable(REGISTRY_PATH)
        _registry_df = dt.to_pandas()
    return _registry_df</code></pre><p><strong>Result:</strong> First query: ~2s. Subsequent queries: ~50ms. 40&#215; improvement for a few lines of code.</p><p><strong>Watch out for:</strong> Memory pressure if DataFrames are very large, and stale data if the underlying source updates. For most analytics use cases, session-lifetime caching is a reasonable trade-off.</p><div><hr></div><h2>Pattern 7: Selective Result Limiting</h2><p><strong>Problem:</strong> Returning 50,000 rows to the LLM context window is expensive, slow, and usually useless.</p><p><strong>Solution:</strong> Return summaries and limited result sets by default. Only go wide when the user explicitly asks.</p><pre><code>@function_tool
def get_records(entity_ids: list[str]) -&gt; str:
    """Get records for the given entity IDs."""
    result = fetch_records(entity_ids, limit=100)  # never return everything
    return json.dumps(result, default=str)

@function_tool
def get_summary_stats(entity_ids: list[str]) -&gt; str:
    """Get summary statistics: first/last date, total amounts, counts."""
    # Returns 5 numbers instead of 5000 rows
    result = compute_summaries(entity_ids)
    return json.dumps(result, default=str)</code></pre><p>Design your tool set so the agent reaches for summaries first and raw records only when it genuinely needs them. This is a prompt engineering + tool design problem together &#8212; your tool descriptions need to clearly signal when to use which.</p><div><hr></div><h2>By the Numbers</h2><p>Two patterns that pay for themselves immediately &#8212; the classification split and the caching improvement:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uOe2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uOe2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uOe2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uOe2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uOe2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uOe2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uOe2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 424w, https://substackcdn.com/image/fetch/$s_!uOe2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 848w, https://substackcdn.com/image/fetch/$s_!uOe2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!uOe2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F580bb802-48ce-433d-941f-c93d8d0c5dd5_1622x673.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The caching win is almost embarrassingly simple. The classification split is worth thinking about carefully upfront &#8212; the 80% heuristic coverage number will vary by domain, but even 50% coverage halves your classification cost.</p><div><hr></div><h2>How These Patterns Combine</h2><p>These patterns aren't independent &#8212; they're designed to compose:</p><pre><code>Pattern 1 (Model Router)
    &#8594; classifies intent
    &#8594; feeds Pattern 2 (Dynamic Tool Selection)
        &#8594; scopes agent's decision space
        &#8594; Pattern 3 (Search Agent) handles entity lookup
        &#8594; Pattern 7 (Result Limiting) keeps context lean
            &#8594; Pattern 5 (Feedback Loop) handles follow-ups
                &#8594; Pattern 4 (Fallback) catches everything that breaks
                    &#8594; Pattern 6 (Caching) keeps it fast</code></pre><p>The compounding effect: classification is fast (Pattern 1), so tool selection is accurate (Pattern 2), so the agent stays focused, so responses are faster and cheaper, so you can afford the LLM fallback for genuinely hard cases.</p><div><hr></div><h2>What I'd Apply First</h2><p>If you're building an enterprise agent today and you can only pick three:</p><ol><li><p><strong>Hybrid intent classification</strong> &#8212; the cost/latency savings pay for everything else.</p></li><li><p><strong>Dynamic tool selection</strong> &#8212; directly reduces agent confusion in multi-tool systems.</p></li><li><p><strong>Layered fallback</strong> &#8212; production systems fail; don't let one failure mode take down the whole chain.</p></li></ol><p>The memory and caching patterns matter a lot once you're at scale, but the first three are table stakes for any agent you're running in front of real users.</p><div><hr></div><p><em>These patterns aren't specific to any one framework &#8212; I've used them with OpenAI Agents SDK, LangChain, and raw API calls. The underlying logic transfers.</em></p><p><em>If you're building something similar or have patterns that worked for you, I'd love to hear about it in the comments.</em></p>]]></content:encoded></item><item><title><![CDATA[Be an Internal FDE for Your Company]]></title><description><![CDATA[The most underrated career move in data science isn't a promotion. It's becoming the person everyone already trusts.]]></description><link>https://blog.ucalyptus.me/p/be-an-internal-fde-for-your-company</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/be-an-internal-fde-for-your-company</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Sun, 17 May 2026 20:52:21 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Of2V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1>Be an Internal FDE for Your Company</h1><p>Forward Deployed Engineers are the hottest job in tech right now.</p><p>OpenAI just launched its own FDE agency. Anthropic is hiring them at scale. Palantir built its entire go-to-market on them. Stripe created a new role &#8212; the "Forward Deployed AI Accelerator" &#8212; embedding AI-native engineers directly inside their marketing teams, assigned to cohorts of 20 people each, building custom tools alongside them until every person is self-sufficient. Box's CEO wrote publicly that building AI agents for internal functions is "a highly technical job, very much akin to a forward deployed engineer."</p><p>The external FDE market is exploding because someone has to do the hard part of deploying AI inside real organizations &#8212; understanding business processes end to end, wiring up the right models, setting up evals, managing workflow change, and tuning agentic systems continuously until they actually work.</p><p>Here's the thing: your company almost certainly won't bring one of these people in for your internal data team. The going rate is reportedly $10K/day for the big-name providers. And even if they did &#8212; it wouldn't be enough.</p><p>The CEO of Box said it clearly: "External FDEs, in my opinion, will not make your company an AI-first company. You can have the sleekest multi-agent orchestrations and still have the majority of your employee base hating AI, avoiding AI, and distrusting leadership decisions on it."</p><p>Stripe understood this when they designed their accelerator role. They built it specifically because "most employees won't upskill themselves. They'll need someone who is embedded within their teams to build alongside them."</p><p>That someone can be you &#8212; from the inside, with years of organizational context, at a fraction of the friction.</p><p>That's the Internal FDE. And if you're a data scientist or AI engineer inside a large enterprise, this is the most important role you're not being asked to play.</p><div><hr></div><h2>Why Your Work Disappears Without This Role</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Of2V!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Of2V!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Of2V!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Of2V!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Of2V!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Of2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Of2V!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Of2V!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Of2V!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Of2V!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14583eba-f9a0-48b9-a448-bf5fce5c070c_1904x783.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let me describe something that happens constantly in enterprise data teams.</p><p>A team of smart people spends six months building a genuinely excellent model. It performs well on holdout. The business case is solid. The model goes into production.</p><p>And then... it gets used by three analysts, never seen by the decision-makers it was built for, and quietly deprecated when a new VP comes in with different priorities.</p><p>The code was fine. The data science was fine. The problem was that nobody translated it.</p><p>Nobody walked the floors. Nobody told the story in language that connected to the business team's quarterly priorities. Nobody made the downstream teams feel like partners instead of an upstream data source. Nobody built the internal credibility that makes a model a business asset instead of a technical artifact.</p><p>This is the FDE gap.</p><p>Deploying agents &#8212; or any complex AI capability &#8212; is fundamentally harder than deploying software. Software generally works the same way every time. With agents, you're deploying the equivalent of <em>work output</em> inside the enterprise. The business expects tasks solved nearly end-to-end. That means someone has to deeply understand the business process, handle model selection, set up evals, manage workflow change, get the data right, and tune the system continuously. It's not a side project. It's a mission-critical engineering role.</p><p>In the external FDE model, a vendor does this for you &#8212; at enormous cost, with no lasting institutional knowledge. In the internal model, it's you. With years of organizational context they could never have.</p><div><hr></div><h2>What an Internal FDE Actually Does</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NB3b!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NB3b!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NB3b!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NB3b!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NB3b!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NB3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NB3b!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 424w, https://substackcdn.com/image/fetch/$s_!NB3b!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 848w, https://substackcdn.com/image/fetch/$s_!NB3b!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!NB3b!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F76074257-fd64-407d-8ddb-29ed15dace80_2218x732.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You are the connective tissue between deep technical work and business reality. That means four things, specifically.</p><p><strong>You are a translator.</strong> When you talk to the underwriting team, you don't lead with AUC scores. You lead with "here's what this means for your loss ratio." When you're presenting to the executive leadership team, you don't lead with model architecture. You lead with the business decision this changes. Translation is not dumbing down &#8212; it's respecting your audience enough to meet them in their world.</p><p><strong>You are a scout.</strong> Before anyone on your team builds anything, you've already had the conversations. You know what the business unit is worried about this quarter. You know where the downstream teams are frustrated. You know which operations lead would be an incredible champion if someone just brought them a relevant prototype. You don't wait for a project brief to discover business needs. You accumulate that intelligence continuously.</p><p><strong>You are a credibility builder.</strong> Technical credibility comes from doing good work. Business credibility comes from showing up consistently, being honest when something doesn't work, and making people's lives easier before you ask them for anything. The Internal FDE is known before any project starts. When a new initiative needs a data partner, your team's name comes up because you've been in those rooms.</p><p><strong>You are an amplifier.</strong> Great work that nobody knows about is just expensive hobby time. The Internal FDE makes sure that when the team delivers something valuable, the right people understand what it does, how it was built, and what's possible next. Not with self-promotion, but with clear, consistent communication that connects outputs to outcomes.</p><div><hr></div><h2>Your Unfair Advantage</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tY0P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tY0P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tY0P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tY0P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tY0P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tY0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tY0P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 424w, https://substackcdn.com/image/fetch/$s_!tY0P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 848w, https://substackcdn.com/image/fetch/$s_!tY0P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!tY0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcfb201be-76e1-4fa2-9982-d362ee4978ef_1904x863.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>External FDEs bring technical depth and pattern recognition from deploying similar systems across many companies. That's genuinely valuable.</p><p>But they lack something you already have: years of accumulated context about how <em>this specific organization</em> thinks, decides, and resists change.</p><p>There's a reason the best specialist consultants are almost always people who spent a decade inside the industry before going independent. If you tear your ACL, you want to go to an ACL surgeon &#8212; someone who has seen this specific failure mode hundreds of times in this specific context, not a generalist learning on your dime.</p><p>You are the ACL surgeon for your company's data problems. You know the internal politics. You know why the last initiative failed. You know which VP will champion something and which one will quietly kill it. You know which downstream team has the best data hygiene and which one will fight every integration request.</p><p>External FDEs have to earn that context from scratch, and they're on the clock while they do it. You already have it. The question is whether you're using it.</p><div><hr></div><h2>Lead With Beliefs, Not Features</h2><p>The most common mistake in internal technical communication: leading with the work instead of the so-what.</p><p>"We've built a gradient boosted model with 87 features and a cross-validated AUC of 0.84" is accurate. It also doesn't start a conversation with anyone who isn't already thinking about the same technical problem you are.</p><p>Compare it to a belief statement: "I believe our models will only drive value if the business team owns the output, not the AI team." That tells your stakeholders immediately where you stand and why. It gives them something to react to, agree with, or push back on. It starts a real conversation.</p><p>This is what the best FDEs do with their clients &#8212; they don't open with product specs, they open with a diagnosis of the problem and a clear point of view on what matters. The same applies internally.</p><p>Before any major presentation or communication, ask yourself: what's the belief I want this audience to leave with? Work backwards from that.</p><div><hr></div><h2>Building Your Internal Audience</h2><p>The first people who engage with your work set the tone for everyone else.</p><p>If the first reaction to your model readout is a VP saying "this is exactly what I've been asking for," everyone else in the room recalibrates. If the first reaction is "I don't understand what this means," you're spending the rest of the meeting recovering.</p><p>This means you need to do pre-work. Not to game the room, but to make sure the first engagers are informed engagers. Before any major internal presentation:</p><p><strong>Talk to one or two stakeholders in advance.</strong> Not to get their approval, but to understand their current mental model and make sure your framing lands. Ask them what question they'd most want this work to answer. Then answer that question first.</p><p><strong>Find your internal champion before you need one.</strong> In every business unit that interacts with your team, there's usually someone who "gets it" &#8212; who thinks quantitatively, who's frustrated with the status quo, who would love to have a data partner. That person is your ally. Invest in the relationship before you have a project that needs them.</p><p><strong>Make your first engagement easy.</strong> Leave stakeholders with something clear to do or respond to. Not "let us know if you have questions" &#8212; but "we're going to pilot this with two underwriters in Q3 &#8212; can you connect us with the right people on your team by end of next week?"</p><div><hr></div><h2>The Communication Types That Build Credibility</h2><p>Different types of internal communication build different kinds of trust. The most effective Internal FDEs cycle through all four.</p><p><strong>Thought leadership:</strong> Share a perspective on where AI and data are headed in your specific domain &#8212; pricing, risk, claims, whatever your function is. You have a unique vantage point. Nobody else at your company sits at the intersection of the data and the business problem you're closest to. That vantage point is valuable, but only if you share it. Write a short internal piece. Send it to three people you respect. If it resonates, expand it.</p><p><strong>Personal stories:</strong> The most underused communication type inside enterprise data teams. "Here's what I thought when we started this project, here's what we found, here's what we were wrong about" is more compelling than any polished methodology deck. It's also more credible. Anyone can dress up a success. The willingness to narrate the failure modes, the pivots, the things that surprised you &#8212; that's what makes people trust your judgment on the next project.</p><p><strong>How-to guides:</strong> Technical teams build institutional knowledge that lives in people's heads and disappears when they leave. The Internal FDE turns that knowledge into artifacts. A one-pager on how to use your team's outputs. A short guide on how to read a lift chart for a non-data audience. Documentation that enables the business team to get value from your work without needing you in the room every time. This is leverage.</p><p><strong>Predictions:</strong> "Here's what I think is going to matter in our space in the next twelve months" is a powerful thing to say internally, especially if you're right more often than not. It positions you as forward-looking, gives leadership something to react to, and often turns into a real project when someone says "actually, we've been thinking about that too."</p><div><hr></div><h2>The Practical Mechanics</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Z4TX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Z4TX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Z4TX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Z4TX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Z4TX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Z4TX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Z4TX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Z4TX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Z4TX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Z4TX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F816af639-25f9-4a0b-a295-94c859874e81_1376x768.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Here's what actually building an internal FDE practice looks like, week to week.</p><p><strong>Have one non-project coffee per week.</strong> One conversation with someone outside your immediate team that isn't about a current project. Just: what are you working on, what's keeping you up, what do you wish was different? This is how you build your scout intelligence. Over a year, these conversations compound into a map of the organization that nobody else has.</p><p><strong>Write one short internal thing per month.</strong> It doesn't have to be long. A one-page take on something interesting you've learned. A short summary of a paper that has implications for your work. A retrospective on a project that didn't go as expected. Share it with a small list of people who would genuinely find it interesting. Do not mass-blast to the whole data org. Quality of audience matters more than quantity.</p><p><strong>Create one reusable artifact per quarter.</strong> The Internal FDE leaves infrastructure behind. What would help the business teams work better with data, even when you're not in the room? A template. A guide. A decision framework. A cheat sheet. Something that gives you leverage without requiring your continuous presence.</p><p><strong>Show up before you're needed.</strong> The worst time to build relationships with a business unit is when you need something from them on a deadline. Show up at their all-hands when you're invited. Ask to sit in on a team meeting just to understand their workflow better. This is not political maneuvering. It's the basics of being a good colleague in a large organization where teams default to siloing.</p><div><hr></div><h2>The Objection: "I'm a Data Scientist, Not a Communicator"</h2><p>I've heard this from genuinely excellent technical people, and I want to push back on it.</p><p>Communication isn't a personality trait. It's a skill, and like any skill, it improves with practice and feedback. The data scientists who are most effective in enterprise settings aren't necessarily the most naturally outgoing people. They're the people who've built the habit of translating their work into business language, and who do it consistently.</p><p>There's also a structural argument here. In a large company, the value of technical work is not determined by the quality of the technical work. It's determined by whether decision-makers trust the team that produced it, understand what it does, and see a path to acting on it. That trust is built through human interaction, not through model cards.</p><p>You can be the best data scientist in the building, and if nobody above the technical level knows what you're working on or why it matters, you will have less impact than someone half as technically skilled who communicates well. That's not cynical &#8212; it's just how organizations work.</p><p>The Internal FDE role is not about becoming a politician or a salesperson. It's about deciding that you care enough about the impact of your work to do the last mile that makes it real.</p><div><hr></div><h2>What You're Really Building</h2><p>The frustrating thing about doing excellent technical work in a large organization is that excellence is necessary but not sufficient. A model that lives in a notebook, or gets used by three people, or gets deprecated when priorities shift &#8212; that model didn't matter, regardless of how technically sound it was.</p><p>The Internal FDE closes the gap between good work and real impact. It's the part of the job that nobody assigns you, that doesn't show up in your performance criteria, and that most technical people quietly resent having to do.</p><p>But it's also the part that determines whether your work changes anything.</p><p>Companies are now paying $10K a day to bring in external people to do this. They're creating new job titles for it. They're embedding engineers inside teams specifically to solve the change management and translation problem that technical teams consistently fail to solve on their own.</p><p>You can be that person from the inside. With more context, more continuity, and a fraction of the friction.</p><p>That's worth building.</p><div><hr></div><p><em>If you're building an AI or data practice inside a large enterprise and want to think through what the Internal FDE role looks like on your specific team, I'd love to hear what you're running into. Drop a reply or send a message.</em></p>]]></content:encoded></item><item><title><![CDATA[How to Build a Killer LLM Studio in 2026]]></title><description><![CDATA[A system design deep-dive: six services, one gateway, and the engineering decisions that actually matter]]></description><link>https://blog.ucalyptus.me/p/how-to-build-a-killer-llm-studio</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/how-to-build-a-killer-llm-studio</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Sun, 17 May 2026 20:13:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SRU-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most teams building on top of LLMs in 2026 are doing one of two things: calling an API and calling it a day, or drowning in MLOps complexity trying to replicate what hyperscalers do.</p><p>Neither works if you actually want to own your model's behaviour.</p><p>This post is a system design walkthrough of an <strong>LLM Studio</strong> &#8212; the software infrastructure that lets a small team run the full model-improvement loop end to end. Think of it like a HelloInterview deep dive, but the system being designed is your own fine-tuning platform.</p><div><hr></div><h2>What Problem Are We Actually Solving?</h2><p>The core loop of model ownership looks like this: you have a task, you have data, you want a model that gets better at that task over time. The challenge isn't the ML math &#8212; libraries handle that. The challenge is the <strong>software infrastructure</strong> around it.</p><p>You need to:</p><ul><li><p>Ingest and version training data (including synthetically generated data)</p></li><li><p>Run fine-tuning jobs without blocking everything else</p></li><li><p>Serve the resulting models without redeploying when new checkpoints land</p></li><li><p>Evaluate models against each other without writing one-off scripts</p></li><li><p>Let non-ML engineers trigger experiments through a UI</p></li></ul><p>A notebook can do one of these. A studio does all of them, reliably, repeatedly.</p><div><hr></div><h2>High-Level Architecture</h2><p>The studio is six loosely coupled services. Each has a single responsibility, its own database schema, and communicates over HTTP.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SRU-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SRU-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SRU-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SRU-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SRU-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SRU-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SRU-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SRU-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SRU-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SRU-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bec842b-89d2-4f57-973b-879c4f0c46db_2064x815.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The two most important separations:</p><p><strong>Finetuning and inference have different lifecycles.</strong> Training is GPU-hungry, bursty, and failure-tolerant &#8212; if a job crashes, retry it. Inference is always-on and latency-sensitive &#8212; if it crashes, users notice. Coupling them in one process means a bad training job can kill your serving layer.</p><p><strong>The data service is its own thing.</strong> Synthetic data generation is slow (minutes per batch), LLM-heavy, and produces intermediate artefacts that need versioning. Baking it into the API would turn every data request into a blocking call.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fwYS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fwYS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fwYS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fwYS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fwYS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fwYS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fwYS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fwYS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fwYS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fwYS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd1abdbc9-8f98-45a0-ae9b-b6a88cd068b7_2064x1183.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>Deep Dive: The Finetuning Service</h2><p>This is the most complex service. Its job is to take a submitted training request and reliably produce a fine-tuned model checkpoint.</p><h3>Job Scheduling</h3><p>The service uses <strong>APScheduler</strong> with a background scheduler running three recurring jobs:</p><pre><code>scheduler.add_job(finetuning_scheduler.run_fine_tuning_jobs, "interval", seconds=N)
scheduler.add_job(finetuning_scheduler.check_running_jobs,    "interval", seconds=N)
scheduler.add_job(finetuning_scheduler.check_jobs_waiting_for_checkpointing, "interval", seconds=N)</code></pre><p>Why a polling scheduler instead of a task queue? Because fine-tuning jobs are long-running (minutes to hours) and stateful. Celery is designed for short tasks. APScheduler lets the service own the full job lifecycle &#8212; queuing, running, checkpointing, failure &#8212; without fighting a task framework's assumptions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_lKR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_lKR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_lKR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_lKR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_lKR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_lKR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_lKR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 424w, https://substackcdn.com/image/fetch/$s_!_lKR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 848w, https://substackcdn.com/image/fetch/$s_!_lKR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!_lKR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F54f8599c-40f4-4e0f-a950-dd4a78c2b66d_2224x655.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Training Runners</h3><p>Two runners live inside the finetuning service: one for SFT (Supervised Fine-Tuning) and one for GRPO (the reinforcement learning variant). They're separate processes, not just functions &#8212; the runner is launched as a subprocess with explicit hyperparameter flags. This means:</p><ul><li><p>A runner crash doesn't take down the service</p></li><li><p>You can run multiple jobs in parallel on separate GPUs</p></li><li><p>Runner code can be updated without restarting the scheduler</p></li></ul><p>Checkpoints get written to a shared filesystem path and the service marks the job as <code>CHECKPOINTING</code> before promoting the adapter to the model registry.</p><div><hr></div><h2>Deep Dive: The Inference Service</h2><p>The inference service exposes an <strong>OpenAI-compatible chat completions endpoint</strong>. The model underneath can be Ollama (local dev) or vLLM (production), but the contract to callers doesn't change.</p><pre><code>POST /inference/v1/chat/completions
{ "model": "my-finetuned-adapter-v3", "messages": [...] }</code></pre><h3>Why Ollama Locally, vLLM in Prod?</h3><p>Ollama is trivially easy to set up on a laptop &#8212; <code>ollama pull llama3</code> and you're serving. But Ollama's throughput is single-request, which is fine for development and the model playground, but not for evaluation runs that fire 200 completions in parallel.</p><p>vLLM uses <strong>continuous batching</strong> &#8212; incoming requests share KV cache and GPU compute. For the GRPO training loop, which needs to generate 8 completions per prompt across thousands of prompts, vLLM is 5&#8211;10x faster. The service switches backends by environment variable; no code changes needed.</p><p>For burst production workloads, the same service can be pointed at a Modal-hosted endpoint &#8212; GPU spins up on demand, billing stops when idle.</p><div><hr></div><h2>Deep Dive: The Data Service (`svc`)</h2><p>The data service handles everything that touches training data: ingestion, synthetic generation, augmentation, curation, and evaluation. It's the most Celery-heavy service in the stack.</p><h3>The Synthetic Data Pipeline</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2IWu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2IWu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2IWu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2IWu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2IWu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2IWu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2IWu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 424w, https://substackcdn.com/image/fetch/$s_!2IWu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 848w, https://substackcdn.com/image/fetch/$s_!2IWu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!2IWu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe127c2d8-d859-49b1-a441-b014f1fad661_2224x783.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The pipeline is async by design. A generation request comes in via the API, gets dispatched to a Celery worker, and the frontend polls for progress. The worker:</p><ol><li><p>Ingests source documents (PDFs, plain text, YouTube transcripts via the API)</p></li><li><p>Chunks the content into token-bounded windows</p></li><li><p>Calls the LLM gateway to generate pairs (QA, CoT, or summaries)</p></li><li><p>Streams completed pairs back to the dataset in real-time</p></li><li><p>Marks the task complete and updates the dataset snapshot</p></li></ol><p>The three generation modes matter architecturally, not just academically:</p><ul><li><p><strong>QA pairs</strong> are cheap and fast to generate, good for instruction-following baselines</p></li><li><p><strong>Chain of Thought</strong> pairs cost ~3x more tokens but produce dramatically better structured-output models &#8212; the model learns to reason before answering</p></li><li><p><strong>Summaries</strong> are useful when your inference task involves compression, not extraction</p></li></ul><h3>Evaluation</h3><p>The evaluation service can run multiple models against a snapshot in parallel and score them using either an LLM-as-judge rubric or a bring-your-own-evaluator endpoint. Results stream back per-datapoint, so you see the leaderboard filling in live rather than waiting for a batch job to complete.</p><div><hr></div><h2>Deep Dive: The Gateway</h2><p>Every LLM call in the entire system &#8212; synthetic data generation, evaluation, GRPO reward scoring, the model playground &#8212; goes through a single gateway layer. This is not optional architecture astronautics. It pays for itself immediately.</p><p><strong>What you get for free:</strong></p><ul><li><p><strong>Cost visibility</strong> &#8212; which service is spending what, on which model</p></li><li><p><strong>Routing</strong> &#8212; fall back from GPT-4o to Sonnet if one provider is down</p></li><li><p><strong>Caching</strong> &#8212; identical prompts during eval runs don't get billed twice</p></li><li><p><strong>Rate limit management</strong> &#8212; the gateway absorbs burst traffic so individual services don't need to implement retry logic</p></li></ul><p>The gateway is a thin sidecar &#8212; one Docker container, no business logic. Portkey, LiteLLM, or a self-hosted proxy all work. The key is that it's a <strong>hard architectural boundary</strong>: no service calls an LLM provider directly.</p><div><hr></div><h2>Deep Dive: The API and Frontend</h2><p>The <code>api</code> service is the single front door. It handles:</p><ul><li><p><strong>Auth</strong> &#8212; API key verification for all service-to-service calls</p></li><li><p><strong>Job dispatch</strong> &#8212; accepts training requests and forwards them to the finetuning service</p></li><li><p><strong>Model registry</strong> &#8212; tracks which checkpoints exist and their metadata</p></li><li><p><strong>Dataset management</strong> &#8212; versions snapshots, stores split configurations</p></li></ul><p>The frontend is SvelteKit. The key UI surfaces are: dataset view (upload, inspect, generate), experiment configurator (pick base model + hyperparameters), job monitor (live training logs), and the model playground (side-by-side completion comparison).</p><div><hr></div><h2>Local vs Production</h2><p>Running locally and running in production are architecturally identical &#8212; same six services, same HTTP contracts. The only differences are the backing implementations:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7Abx!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7Abx!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7Abx!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7Abx!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7Abx!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7Abx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7Abx!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7Abx!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7Abx!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7Abx!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd7c31af4-9378-4149-b64f-4fe43dc19ebf_2224x1058.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The local stack boots with a single <code>docker compose up</code>. Each service gets its own container, Redis runs as a sidecar, and Ollama handles inference. Switching to production means swapping environment variables &#8212; no code changes.</p><div><hr></div><h2>Key Design Decisions</h2><p><strong>Why not a monolith?</strong> A monolith works until you need to scale the finetuning service to GPU machines while keeping the API on cheap CPU containers. Service separation lets each component scale independently and fail independently.</p><p><strong>Why polling for job state instead of webhooks?</strong> Training jobs run for minutes to hours. Webhook delivery has a real failure surface over that timeframe. The frontend polls <code>/api/jobs/:id</code> every few seconds &#8212; simple, robust, no webhook infrastructure needed.</p><p><strong>Why Celery for data tasks but APScheduler for training jobs?</strong> Celery is optimised for many short tasks with a shared worker pool. Training jobs are one long task per GPU &#8212; APScheduler's interval-based polling and explicit state machine fits better.</p><p><strong>Why OpenAI-compatible inference endpoints?</strong> Every evaluation library, every agent framework, every LLM client already speaks this protocol. Wrapping your fine-tuned model in a compatible endpoint means zero integration cost downstream.</p><div><hr></div><h2>What Breaks at Scale</h2><p>A few things that look fine in development but become problems in production:</p><p><strong>Shared filesystem for checkpoints.</strong> Locally, all services can read the same path. In production, you need an object store (S3, Azure Blob) as the checkpoint backend, with the finetuning and inference services both mounting it. This is a one-line config change if you design for it upfront.</p><p><strong>Celery worker saturation.</strong> A single LLM call for synthetic data generation can take 10&#8211;30 seconds. If you have 10 concurrent generation tasks each firing 100 LLM calls, your worker pool fills up fast. Separate worker queues for fast tasks (evaluation) and slow tasks (generation) early.</p><p><strong>No rate limiting on the evaluation service.</strong> Running evals against all models in your registry against a large dataset can fire thousands of LLM calls in minutes. The gateway handles rate limits, but the evaluation service should have a concurrency cap per eval job.</p><div><hr></div><h2>SGLang vs vLLM: Which One Goes in the Inference Service?</h2><p>The studio's inference service is deliberately abstracted behind an environment variable. Locally it points at Ollama. In production, you pick an engine. And in 2026, that means choosing between vLLM and SGLang &#8212; two projects that started as competitors but have now specialised into different niches.</p><p>The 2024 narrative of "SGLang is 3&#215; faster" no longer holds. vLLM's V1 rewrite (January 2025) pulled the two within 3&#8211;5% on dense 70B models at typical concurrency. But diverge from that workload profile and the gap opens sharply in one direction or the other.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nkCI!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nkCI!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nkCI!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nkCI!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nkCI!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nkCI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!nkCI!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 424w, https://substackcdn.com/image/fetch/$s_!nkCI!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 848w, https://substackcdn.com/image/fetch/$s_!nkCI!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!nkCI!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff89d43ae-c75b-4dcd-a595-5bb14aac22a4_2224x724.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Core Architectural Difference</h3><p>The engines diverge fundamentally on how they store KV cache.</p><p><strong>vLLM uses PagedAttention</strong> &#8212; fixed-size physical blocks, hash-based per-block sharing, similar to OS memory paging. Predictable, well-understood, maps cleanly to most hardware.</p><p><strong>SGLang uses RadixAttention</strong> &#8212; the entire live KV cache is a compressed prefix trie. New requests walk the tree from root; matched nodes mean their KV state is already on-GPU. When multiple requests share a system prompt, a retrieved document, or a conversation history, RadixAttention shares that prefix automatically across tenants with no configuration. The cache-aware router biases routing toward whichever worker holds the warmest matching prefix.</p><p>This isn't a marginal implementation detail. On agent loops, RAG pipelines, and multi-turn chat &#8212; anywhere prefixes repeat across requests &#8212; RadixAttention is the correct data structure and PagedAttention is the wrong one. The papers report up to 6.4&#215; speedup on those workloads.</p><h3>Where Each Wins</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kQxa!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kQxa!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kQxa!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kQxa!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kQxa!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kQxa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kQxa!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kQxa!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kQxa!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kQxa!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbd0f0812-f1be-437f-981e-aedb0ecf3ff5_2064x943.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The MoE picture is stark. On GPT-OSS-120B with a tuned configuration, vLLM hit 6,095 tok/s versus SGLang's 1,814 &#8212; a 3.4&#215; lead. DeepSeek V3/R1 runs in the other direction: SGLang's 96-H100 production deployment matches DeepSeek's own infrastructure throughput at roughly $0.20 per million output tokens, about one-fifth the cost of the public API. The DeepSeek GitHub repo officially recommends SGLang.</p><p>vLLM's durable advantage is breadth. It runs on NVIDIA, AMD, Google TPU, AWS Inferentia, Intel Gaudi, Apple Silicon, and IBM Z mainframes. SGLang covers most of those but TPU support lags. If you might multi-cloud, vLLM is the safer choice. It also covers ~200 model architectures &#8212; encoder-decoder, Mamba, Whisper, T5 &#8212; that SGLang doesn't support at all.</p><h3>The Production Decision</h3><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SHQZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SHQZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SHQZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SHQZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SHQZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SHQZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SHQZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SHQZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SHQZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SHQZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0565443b-1f73-4f52-80fb-95099dc15b9d_2064x1103.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The default for a generalist team is <strong>start with vLLM</strong>. Then add SGLang as a second inference backend behind the gateway for the specific workloads where it wins: prefix-heavy pipelines, DeepSeek/MoE serving, structured output at scale, or RL training rollouts.</p><p>Both engines expose identical <code>/v1/chat/completions</code> semantics. The application-layer switching cost is genuinely near zero. Run both, route by workload type, measure on real traffic before committing to one.</p><p>The deeper shift is that in 2026 you're not picking an inference engine &#8212; you're picking a <strong>stack</strong>. Dynamo or llm-d on top, vLLM or SGLang underneath, with the gateway routing between them. The orchestration layer is increasingly engine-agnostic. What remains different &#8212; RadixAttention's tree, vLLM's hardware breadth, SGLang's DeepSeek depth &#8212; is exactly what makes the choice still matter.</p><div><hr></div><p>The architecture is deliberately boring. Six services, HTTP, Postgres, Redis, one LLM gateway. The interesting engineering is in the details: the job state machine, the streaming data pipeline, the dual-backend inference layer. Those are where the real decisions live.</p><div><hr></div><p><em>If you're building something like this or have questions about any of the design choices, I'd love to hear from you.</em></p>]]></content:encoded></item><item><title><![CDATA[11 tools I built for myself]]></title><description><![CDATA[Small utilities that do one thing. Zero onboarding.]]></description><link>https://blog.ucalyptus.me/p/11-tools-i-built-for-myself</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/11-tools-i-built-for-myself</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Sun, 17 May 2026 18:24:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!rGFo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I keep a running list of small tools I've built for myself. Not products &#8212; just things I needed that didn't exist, or existed badly enough that building felt faster than tolerating.</p><p>Here's the current list, with what each one does and why I built it.</p><div><hr></div><h2>LinkNotes</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rGFo!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rGFo!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rGFo!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rGFo!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rGFo!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rGFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rGFo!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!rGFo!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!rGFo!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!rGFo!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1b2d7e10-ef58-4c6e-a81d-c27ff34f18b8_3024x1654.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> A private, password-protected link vault with a CRUD API. Each link has a title, URL, tags, source, and a notes field.</p><p><strong>Why I built it:</strong> Most bookmark managers optimize for quantity. Pocket, Raindrop, browser bookmarks &#8212; they're all great at storing links and terrible at storing <em>why</em> you saved them. Three months later you open a bookmark and have no idea what it was for.</p><p>LinkNotes forces a notes field. "Why does this matter?" is a required part of saving. The source field tracks where it came from. Tags let me filter by project or topic. And because it exposes a simple API, I can add links from scripts, automations, or other tools without touching the UI.</p><p>Built on Cloudflare Workers + D1. The whole thing is a single Worker &#8212; no external services, no database bills.</p><div><hr></div><h2>Tree Chat</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kyXl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kyXl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kyXl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kyXl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kyXl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kyXl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kyXl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!kyXl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!kyXl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!kyXl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F344db4b7-546b-4bc9-ac75-de20fcb5e70e_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> A chat interface where conversations branch into trees. Use <code>/branch [prompt]</code> at any point to fork from the current message and explore a different direction, preserving the original thread.</p><p><strong>Why I built it:</strong> Standard chat UIs have a fundamental problem: they're linear. If you're exploring a problem with an LLM &#8212; comparing two architectural approaches, exploring different framings of a question, trying a prompt variation &#8212; you have to either scroll back and lose your place, or start a new conversation and lose the context.</p><p>A tree solves this. Each branch inherits the context up to the fork point. You can explore three different answers to the same question simultaneously, navigate back to any node, and keep the whole exploration in one place.</p><p>Built on Cloudflare Pages with a D1 database for conversation persistence. Each node stores its parent ID and the full message content, and the tree is reconstructed on load.</p><div><hr></div><h2>TabReplay</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!beA2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!beA2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!beA2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!beA2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!beA2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!beA2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!beA2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!beA2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!beA2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!beA2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9fa053d2-fcb6-449f-ae36-6e498231e943_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Upload a file of URLs, and TabReplay lets you review each one with a live preview and triage them &#8212; keep, read now, or discard.</p><p><strong>Why I built it:</strong> Tab hoarding is a real problem. I'll open 40 tabs in a research session and then not close Chrome for a week because I'm afraid of losing something. The real answer isn't better tab management &#8212; it's committing to a decision about each tab before closing.</p><p>TabReplay makes that decision-making fast. Export your tabs as a list of URLs (most browsers support this), drop the file in, and review each one with the actual page loaded alongside it. It uses Playwright under the hood to visit and render each URL, so you see what's actually on the page, not just the title.</p><div><hr></div><h2>HAR Vision</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YGDR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YGDR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YGDR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YGDR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YGDR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YGDR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YGDR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!YGDR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!YGDR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!YGDR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F493ff7db-c75f-434d-86e9-4b97d75033c8_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Drop a <code>.har</code> file &#8212; your browser's network log &#8212; and get instant visual analysis, timeline view, and request filtering. Runs entirely in your browser; nothing leaves your machine.</p><p><strong>Why I built it:</strong> HAR files are invaluable for debugging, reverse engineering APIs, or understanding why a page is slow. But the raw JSON is unreadable, and the only decent viewer (Chrome DevTools) requires you to reproduce the session live. If you captured a HAR from a user's session or a CI test run, you're stuck parsing JSON by hand.</p><p>HAR Vision loads the file locally and gives you a filterable timeline &#8212; filter by domain, method, status code, or response time. Useful for finding the slow request in a 300-request page load, or spotting which API endpoints a web app calls.</p><div><hr></div><h2>ClipShot</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!V1w6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!V1w6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V1w6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V1w6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!V1w6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!V1w6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!V1w6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!V1w6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!V1w6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!V1w6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fce4a5911-da8c-4c0d-9751-9188493bba3d_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Drop a video file, scrub through a Premiere Pro-style thumbnail timeline, and capture frames at native resolution. Export individually or as a ZIP.</p><p><strong>Why I built it:</strong> I record a lot of screen recordings for documentation and demos. Extracting a good frame usually meant: open the video in QuickTime, scrub manually, take a screenshot, crop. ClipShot turns this into: drop the video, click the frame, done.</p><p>The whole pipeline runs in the browser using the HTML5 <code>&lt;video&gt;</code> element and Canvas API &#8212; no upload, no server, no waiting for processing. You capture exactly what you see in the timeline at native resolution, then export individual frames or bulk-download as a ZIP.</p><div><hr></div><h2>GridSplitter</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mq84!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mq84!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mq84!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mq84!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mq84!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mq84!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mq84!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!mq84!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!mq84!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!mq84!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F292bd432-e773-4e74-a045-f20f17545715_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Upload a sprite sheet, contact sheet, or photo grid &#8212; GridSplitter auto-detects the rows and columns using computer vision and splits it into individual files for download.</p><p><strong>Why I built it:</strong> I was building a small game prototype and needed to extract individual frames from sprite sheets I found online. Every existing tool either required you to manually specify the grid dimensions or was a desktop app. GridSplitter uses edge detection to find the grid structure automatically &#8212; you don't count rows or columns, you just upload and download.</p><p>The CV pipeline runs in the browser using a WASM port of OpenCV. Processing happens entirely client-side, so there's no file size limit and no privacy concern with uploading assets.</p><div><hr></div><h2>Healpix</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AGf_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AGf_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AGf_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AGf_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AGf_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AGf_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AGf_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!AGf_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!AGf_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!AGf_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9e82ba0-d6a7-4533-b898-f19682fff841_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> A browser-based photo healing tool. Paint over blemishes, dust spots, watermarks, or unwanted objects with a brush &#8212; the inpainting engine fills them in seamlessly.</p><p><strong>Why I built it:</strong> The Photoshop healing brush is one of those tools that feels like magic the first time you use it. But Photoshop is $55/month and overkill if that's all you need. Healpix does one thing &#8212; heal &#8212; using the same underlying inpainting algorithms (Telea and Navier-Stokes), in the browser, with no install.</p><p>Paint the region you want removed, release the mouse, and the engine fills it in. Adjustable brush size, full undo/redo history, and export when done.</p><div><hr></div><h2>Watermark Remover</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hf9O!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hf9O!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hf9O!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hf9O!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hf9O!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hf9O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hf9O!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!hf9O!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!hf9O!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!hf9O!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F497485b7-925a-4fd3-b947-a99de6784522_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Upload a PDF, set of images, or MP4 video &#8212; configure the watermark region &#8212; and it strips the watermark using CV inpainting. Processes PDF pages in batch and video frame-by-frame.</p><p><strong>Why I built it:</strong> Most watermark remover tools online are either broken, behind a paywall, or cap file sizes aggressively. This one uses the same Telea and Navier-Stokes inpainting algorithms that Healpix uses, but targets a fixed region across many frames &#8212; which is exactly how most watermarks work (static, bottom-right corner).</p><p>The video processing pipeline extracts frames at 1fps, inpaints each frame, then re-encodes the video with ffmpeg. Slow, but it works on arbitrarily long videos.</p><div><hr></div><h2>Swap Fitter</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fAI_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fAI_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fAI_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fAI_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fAI_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fAI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fAI_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!fAI_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!fAI_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!fAI_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd34fb482-ac47-4241-85bb-706e4a020ce2_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Two-step image compositing. Upload a base image, draw a selection over the region you want to replace, then upload a second image to fill that region. Runs client-side.</p><p><strong>Why I built it:</strong> I kept needing to answer "what would this look like if..." questions &#8212; swapping a UI component in a screenshot, replacing a label on a product photo, mocking up a design variant. Opening Figma or Photoshop for a two-minute task is too much friction. Swap Fitter is just the compositing step, nothing else.</p><p>Step 1: upload base image, draw selection. Step 2: upload replacement, which gets scaled to fit the selection. Export. Done.</p><div><hr></div><h2>AutoCarousel</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Oewf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Oewf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Oewf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Oewf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Oewf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Oewf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Oewf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Oewf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Oewf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Oewf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F684fffc4-479a-4797-8ffd-55b4fbe2b99a_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Upload images in order, set the slide duration (how long each image stays on screen), and export as an MP4 video file.</p><p><strong>Why I built it:</strong> Social media carousels &#8212; especially on LinkedIn and Instagram &#8212; get more reach than static images. But converting a set of slides or screenshots into a video carousel usually involves either a design tool with video export (slow) or a social media scheduler (locked in). AutoCarousel just turns images into a video, no account required.</p><p>Upload your images in order, drag to reorder if needed, set 3&#8211;5 seconds per slide, and export. That's it.</p><div><hr></div><h2>Video Looper</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GkZq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GkZq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GkZq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GkZq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GkZq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GkZq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GkZq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GkZq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GkZq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GkZq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8c6df364-6597-4178-9f4a-05d654f8ac30_3024x1654.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>What it is:</strong> Drop a short video clip and an audio track &#8212; the tool loops the video to exactly match the audio's duration and exports a single combined file.</p><p><strong>Why I built it:</strong> A common content format is a looping visual over a music track &#8212; ambience videos, demo backgrounds, lo-fi aesthetic clips. The problem: your video might be 4 seconds and your audio 3 minutes. Manually looping in a video editor means dragging the same clip 45 times.</p><p>Video Looper calculates exactly how many loops are needed, handles the partial last loop cleanly, and muxes the audio track in. Upload both files, click create, done.</p><div><hr></div><p>The pattern I've noticed across all of these: the tools I actually reach for are the ones that do exactly one thing and require zero configuration. No accounts, no settings screens, no onboarding. You understand them in ten seconds or you close the tab.</p><p>That's the design constraint I try to build to. Most of these took a few hours each. The hard part was figuring out what to leave out.</p>]]></content:encoded></item><item><title><![CDATA[I made my Claude agent unkillable on macOS]]></title><description><![CDATA[Three layers of self-healing for a Telegram bot that runs forever]]></description><link>https://blog.ucalyptus.me/p/i-made-my-claude-agent-unkillable</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/i-made-my-claude-agent-unkillable</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Sun, 17 May 2026 18:01:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8cmE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I wanted Claude on my phone. Not a wrapper, not a web UI &#8212; the actual Claude Code agent, with all my tools and context, reachable over Telegram.</p><p>Claude Code has a Telegram plugin for exactly this. You run <code>claude --channels plugin:telegram@claude-plugins-official</code>, pair your account, and suddenly you can message Claude from anywhere.</p><p>The problem: it kept dying.</p><p>Crash it once, it's gone. Reboot your Mac, it's gone. And since it runs as a foreground process, the moment you close the terminal, it's gone.</p><p>So I built a watchdog.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8cmE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8cmE!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8cmE!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8cmE!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8cmE!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8cmE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/df448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8cmE!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 424w, https://substackcdn.com/image/fetch/$s_!8cmE!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 848w, https://substackcdn.com/image/fetch/$s_!8cmE!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!8cmE!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdf448361-07d9-4933-820d-3334800148a7_3024x1542.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>The constraint that changes everything</h2><p>Claude Code is a TUI app &#8212; it needs a real terminal (a TTY) to run. You can't just start it as a background daemon directly under launchd. It fails silently.</p><p>The fix: tmux. Run Claude inside a tmux session, and launchd manages the watchdog that watches that session.</p><pre><code>launchd &#8594; watchdog.sh &#8594; tmux session &#8594; claude --channels</code></pre><p>This is the core architecture. Everything else is consequence.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!3UXO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!3UXO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3UXO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3UXO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3UXO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!3UXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!3UXO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 424w, https://substackcdn.com/image/fetch/$s_!3UXO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 848w, https://substackcdn.com/image/fetch/$s_!3UXO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!3UXO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3fa812a7-56bc-4787-a6c3-fa11b1ecf604_1097x548.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>Three layers of self-healing</h2><p>I didn't want to ever think about this again, so I designed for complete autonomy:</p><p><strong>Layer 1 &#8212; Process</strong>: <code>watchdog.sh</code> loops every 60 seconds and runs three checks:</p><ol><li><p>Is the tmux session alive?</p></li><li><p>Is the Claude PID alive inside it?</p></li><li><p>Does the Telegram Bot API respond to a <code>getMe</code> call?</p></li></ol><p>If any check fails, it kills the stale session and relaunches.</p><p><strong>Layer 2 &#8212; Watchdog</strong>: The watchdog itself runs under launchd with <code>KeepAlive: true</code>. If <code>watchdog.sh</code> crashes or exits for any reason, launchd restarts it within ~10 seconds.</p><p><strong>Layer 3 &#8212; Machine</strong>: The launchd agent has <code>RunAtLoad: true</code>. Reboot your Mac, log in, and within 60 seconds Claude is back on Telegram. No manual steps.</p><p>The only failures that still require human intervention: OAuth token expiry and bot token revocation. Everything else recovers automatically in 10&#8211;120 seconds.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sV_E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sV_E!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sV_E!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sV_E!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sV_E!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sV_E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg" width="728" height="409.5" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;captionedImage&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sV_E!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 424w, https://substackcdn.com/image/fetch/$s_!sV_E!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 848w, https://substackcdn.com/image/fetch/$s_!sV_E!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!sV_E!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa17b027a-1056-4b47-afe4-ad82dab7ec9e_929x475.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div><hr></div><h2>Two bugs I hit that weren't obvious</h2><p><strong>Bug 1: Don't use `getUpdates` as your health check.</strong><code>getUpdates</code> as your health check.**</p><p>My first instinct for the Telegram health check was to call <code>getUpdates</code> &#8212; if it returns updates, the bot is alive. Seems reasonable.</p><p>Wrong. Telegram only allows <em>one</em> long-poll consumer per bot token. The Claude plugin already holds that connection. When my watchdog called <code>getUpdates</code>, Telegram rejected it with a conflict error &#8212; and worse, it kicked the plugin's own connection, causing exactly the failure I was trying to detect.</p><p>The fix: <code>getMe</code> instead. It's a read-only endpoint that doesn't compete with the plugin's polling loop.</p><p><strong>Bug 2: `pgrep` is dangerous when you're running multiple bots.</strong><code>pgrep</code> is dangerous when you're running multiple bots.**</p><p>The original version cleaned up stale processes with:</p><pre><code>pkill -f "claude.*channels.*telegram"</code></pre><p>This killed every Claude channels process on the machine &#8212; including a completely separate bot (<code>cc-sdas</code>) that had nothing to do with this watchdog.</p><p>The fix: track only your own PID. The launcher writes its process ID to a file. The watchdog reads that file and only ever kills that specific PID. No pattern matching, no collateral damage.</p><div><hr></div><h2>Exponential backoff</h2><p>If something is fundamentally broken &#8212; say, OAuth expired &#8212; you don't want the watchdog hammering restarts every 60 seconds forever. So failures back off:</p><ul><li><p>1st failure &#8594; restart, wait 30s</p></li><li><p>2nd failure &#8594; restart, wait 60s</p></li><li><p>3rd+ failure &#8594; restart, wait 120s (capped)</p></li><li><p>On recovery &#8594; reset to normal interval</p></li></ul><p>This prevents restart storms while still recovering reasonably fast when the underlying issue is fixed.</p><div><hr></div><h2>What I'd do differently</h2><p>One thing I didn't solve: Claude shows a "Do you trust this project?" prompt on first launch in a new tmux session, because the working directory defaults to <code>/</code>. It waits for Enter &#8212; silently blocking startup. The workaround is <code>tmux send-keys -t claude-telegram Enter</code>, but the real fix is launching from <code>~</code> or pre-trusting the directory.</p><p>Also worth noting: this whole setup assumes your Mac stays on and logged in. The battery-dying scenario is genuinely unrecoverable without physical intervention. If you want true 24/7 availability, you'd need a VPS instead of a local Mac. For my use case (it's my own machine), this is good enough.</p><div><hr></div><h2>The repo</h2><p>Everything is at <a href="https://github.com/ucalyptus/claude-tmux-watchdog">github.com/ucalyptus/claude-tmux-watchdog</a> &#8212; the launcher, watchdog, health check script, and launchd plist. Should work on any Mac running Claude Code with the Telegram plugin.</p><p>If you're running Claude Code and want it reachable from your phone without babysitting it, this is the setup.</p>]]></content:encoded></item><item><title><![CDATA[The legal system wasn't built for you. Sulajh is.]]></title><description><![CDATA[AI-powered dispute resolution for the disputes that never make it to court.]]></description><link>https://blog.ucalyptus.me/p/the-legal-system-wasnt-built-for</link><guid isPermaLink="false">https://blog.ucalyptus.me/p/the-legal-system-wasnt-built-for</guid><dc:creator><![CDATA[Sayantan Das]]></dc:creator><pubDate>Sun, 17 May 2026 17:03:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!GLeS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The legal system was built for a world that doesn't exist anymore.</p><p>A world where you had months to spare. A world where hiring a lawyer for a &#8377;50,000 dispute made economic sense. A world where physically showing up to a courthouse was the only way to be heard.</p><p>That world is gone. But the system remains.</p><div><hr></div><h2>The math doesn't work</h2><p>Here's the absurdity: in India, the average civil case takes <strong>3&#8211;4 years</strong> to resolve. The average cost of litigation &#8212; factoring in lawyer fees, court fees, and lost time &#8212; often <strong>exceeds the disputed amount itself</strong>.</p><p>Which means most people don't pursue it. They eat the loss, swallow the injustice, and move on.</p><p>This isn't a gap in the system. It's the system working exactly as designed &#8212; for people with resources, not for everyone.</p><div><hr></div><h2>What if disputes just... resolved?</h2><p>That's the question behind <strong>Sulajh</strong> (Hindi: &#2360;&#2369;&#2354;&#2333; &#8212; to untangle, to resolve).</p><p>Sulajh is an online dispute resolution platform. You file a claim, the other party responds, a neutral reviews both sides with AI assistance, and you reach a settlement &#8212; all online, often in days.</p><a class="image-link image2 is-viewable-img" target="_blank" href="https://sulajh.ucalyptus.me" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GLeS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GLeS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GLeS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GLeS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GLeS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg" width="728" height="455" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:false,&quot;imageSize&quot;:&quot;normal&quot;,&quot;height&quot;:1800,&quot;width&quot;:2880,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Sulajh &#8212; online dispute resolution platform&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:&quot;https://sulajh.ucalyptus.me&quot;,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sulajh &#8212; online dispute resolution platform" title="Sulajh &#8212; online dispute resolution platform" srcset="https://substackcdn.com/image/fetch/$s_!GLeS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 424w, https://substackcdn.com/image/fetch/$s_!GLeS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 848w, https://substackcdn.com/image/fetch/$s_!GLeS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!GLeS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4a7f22dc-341e-44e0-8204-db6ff0dcf37e_2880x1800.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><p>No courtrooms. No travel. No months of waiting.</p><p>The process has four steps:</p><ol><li><p><strong>File a Claim</strong> &#8212; describe the dispute, upload your evidence</p></li><li><p><strong>Respondent is Notified</strong> &#8212; the other party gets a chance to respond</p></li><li><p><strong>Mediation &amp; Review</strong> &#8212; a case manager and a neutral review both sides; AI flags key points and identifies common ground</p></li><li><p><strong>Resolution</strong> &#8212; a fair settlement or a binding decision</p></li></ol><p>That's it.</p><div><hr></div><h2>What makes AI useful here</h2><p>Dispute resolution is fundamentally a reasoning problem. You have two versions of events, a set of documents, and a question: what's fair?</p><p>AI doesn't decide that. But it does something valuable: it reads everything. It doesn't miss the clause buried on page 7. It doesn't favor the party who talks more confidently. It doesn't get tired.</p><p>The AI in Sulajh acts as a research layer for the neutral &#8212; surfacing relevant precedents, summarizing documents, flagging inconsistencies, and identifying where the parties actually agree (which is usually more than either side admits).</p><p>The human neutral still makes the call. But they make it with better information, faster.</p><div><hr></div><h2>Who is this for?</h2><ul><li><p>A freelancer chasing an unpaid invoice</p></li><li><p>A tenant disputing a wrongful deduction from their security deposit</p></li><li><p>A small business with a supplier who delivered defective goods</p></li><li><p>A consumer whose warranty claim was rejected without reason</p></li></ul><p>These are disputes that never reach court &#8212; not because they're not valid, but because the economics don't make sense. Sulajh makes them resolvable.</p><div><hr></div><h2>The broader picture</h2><p>Online dispute resolution isn't new. eBay was quietly resolving 60 million disputes a year through automated mediation long before anyone was calling it ODR. PayPal, Airbnb, Uber &#8212; every platform-scale company built informal dispute systems because the alternative was chaos.</p><p>What's new is making this available outside of platforms. For disputes that happen in the real world, between real people, without a platform intermediary to absorb the coordination cost.</p><p>That's the hard part. And that's what Sulajh is trying to solve.</p><div><hr></div><p>The legal system will not reform itself fast enough. But technology can route around it &#8212; not by replacing justice, but by making it accessible.</p><p>Sulajh is early. But the direction is right.</p><div><hr></div><p><em>If you're dealing with a dispute &#8212; commercial, consumer, or otherwise &#8212; try Sulajh. Or just reply to this email. I'm curious what kinds of disputes feel most underserved to you. <a href="https://sulajh.ucalyptus.me">Sulajh</a></em>. Or just reply to this email. I'm curious what kinds of disputes feel most underserved to you.</p>]]></content:encoded></item></channel></rss>