Where to Meet

All proposals for new agent protocols, tools, and patterns answer some form of the same question — should we meet models where they're at, or should they meet us?

Debates about whether LLMs are truly "intelligent" are interesting, but outside the scope of this post. I find them distracting from the observation that at the interface level — the purely behaviorist inputs and outputs — their capabilities are converging on those of humans. So for the most part, I don't think LLMs need special guidance or treatment, and heavy-handed attempts to provide it seem short-sighted. As models keep improving, human tools should become sufficient for LLMs too.

The programmer's desire for premature optimization is evergreen. Joel Spolsky wrote in Strategy Letter VI,

As a programmer, thanks to plummeting memory prices, and CPU speeds doubling every year, you had a choice. You could spend six months rewriting your inner loops in Assembler, or take six months off to play drums in a rock and roll band, and in either case, your program would run faster. Assembler programmers don't have groupies.

We too can spend six months building the perfect agentic pattern, or we can simply wait for the new models. That's not to say LLMs are infinitely capable, though. They still have a fundamental constraint — the context window — and solutions that make more efficient use of context are likely worthwhile.

Even brief experimentation with agents reveals a remarkably clear and fractally detailed frontier of problems to solve. Several categories come to mind (many of them overlapping):

  • Tool calling (e.g. MCP, Skills)
  • Context efficiency — preserving context to avoid hitting limits (e.g. compaction, tool search, subagents)
  • Preprocessing, async data transformation — out-of-band processing and synthesis of information to make it more accessible to LLMs when requested at inference time (e.g. llms.txt, CLAUDE.md)
  • RAG — looking up external information (e.g. chunking, embedding, grep)
  • Prompting (e.g. xml tags in prompts)
  • Orchestration — coordinating multiple agents at once for more complex projects (e.g. subagents, Gas Town, swarms)
  • Standardization, commoditization — reusable specs to make swapping out models easy (e.g. ai-sdk, Openrouter, Opencode)

These are all valid and important problems, but the solutions that have emerged are immature and rapidly evolving. You can slice and dice and squeeze and filter the context a million ways; it's easy to try a new one (and you should, why not?). It'll take a while for best practices to materialize. Part of the reason for this is that evals are tedious to run — nobody wants to rigorously evaluate the optimal strategy when whatever you try will probably work pretty well, because the underlying models are smart enough to compensate.
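Part of why "whatever you try will probably work pretty well" is that even the crudest strategies are serviceable. The grep-based retrieval mentioned above, the simplest of the RAG options, can be sketched in a few lines. This is a toy illustration, not any framework's API; the function name and defaults are my own:

```python
import subprocess

def grep_retrieve(query: str, root: str = ".", max_lines: int = 20) -> str:
    """Naive retrieval: return matching lines (with file and line number)
    from a corpus of plain-text files. A stand-in for embedding-based RAG:
    the agent just greps and reads whatever comes back."""
    result = subprocess.run(
        ["grep", "-rni", query, root],  # recursive, line numbers, case-insensitive
        capture_output=True,
        text=True,
    )
    # Truncate to keep the snippet cheap in context-window terms.
    lines = result.stdout.splitlines()[:max_lines]
    return "\n".join(lines)
```

The point isn't that this beats a vector database; it's that the model is smart enough to compensate for the retriever's crudeness, which blunts the incentive to run rigorous evals.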

Even so, I expect several competing approaches to persist long-term, just as there are dozens of database architecture patterns, each with their own tradeoffs.

I recently came across this protocol, llms.txt, which illustrates these tradeoffs well.

llms.txt presents a more context-efficient way for LLMs to read websites. HTML is full of irrelevant layout and script data, whereas markdown is easy to ingest. And site information might be scattered across multiple pages, but llms.txt can synthesize it into one root file.
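For concreteness, the llms.txt proposal is just a markdown file at the site root: an H1 title, a blockquote summary, then sections of annotated links for an LLM to follow. The project name and URLs below are made up for illustration:

```markdown
# Example Framework

> A small web framework; this file points LLMs at the pages worth reading.

## Docs

- [Quick start](https://example.com/quickstart.md): install and first app
- [API reference](https://example.com/api.md): every public function

## Optional

- [Changelog](https://example.com/changelog.md): release history
```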

This protocol is designed to meet LLMs where they're at — treating them as distinct from human visitors and requiring extra work from developers to accommodate them. But how about a lazy alternative — what if the LLM screenshots the site, ingesting it visually just like a human would? This might even be more token-efficient than markdown, and it requires zero developer adoption. Or what about reader mode?

It's hard to say how website design will evolve as LLMs become the primary consumers of web content. You could argue that humans and LLMs are truly different in this respect — humans fall for pretty visuals, LLMs are purely rational text ingestors. Maybe sites like https://www.fastht.ml truly need a flashy homepage to entice human developers to use their shiny new framework, while LLMs quietly evaluate the boring technical specs. Or maybe website best practices simply shift to clearer, denser layouts for these kinds of products. I don't know if llms.txt is the answer.

In several instances, I've noticed the popular approach shift over time from bespoke agent abstractions to simple, universal solutions. One blog post describes the emergence of the filesystem as the preferred interface for agents. In retrospect, that's...kind of obvious? The filesystem is the computer's basic interface, and LLMs can navigate it just fine with a bash terminal.
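Taken to its logical end, the "filesystem as interface" pattern collapses into a single tool: hand the model a shell and return whatever comes back. A minimal sketch, where the function name and output format are my own assumptions rather than any particular framework's API:

```python
import subprocess

def bash_tool(command: str, timeout: int = 30) -> str:
    """The entire 'agent filesystem interface': run a shell command the
    model produced and return stdout (plus stderr on failure) as text."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    output = result.stdout
    if result.returncode != 0:
        # Surface failures so the model can react, instead of raising.
        output += f"\n[exit {result.returncode}] {result.stderr}"
    return output.strip()
```

With this one tool the model can `ls`, `grep`, `cat`, and edit files — no bespoke abstraction layer required.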

Merchants of Complexity

What really bothers me, though, is the far end of the spectrum — which often attracts the most attention. I'm skeptical of any elaborate pattern, especially one shrouded in jargon and hype. Complexity isn't inherently bad, but it must be earned, not conjured up ex nihilo. The ten-page LinkedIn promptsters, the elaborate-RAG-setup Redditors, and the perfect-Claude-plugin-cum-memecoin shills all raise immediate red flags in my head. There might be valuable alpha in their solutions, but it's hard to discern which parts are relevant and which are superfluous.

Most of us are too undiscerning in accepting these tools. I can't quite explain it, but there's a tempting impulse to "feel the AGI" and simply acquiesce to this wave of prolific, baroque maximalism. What we end up with is merchants of complexity spilling their neuroses onto the market.

DHH coined the term "merchants of complexity", but he attributes too much to malice and not enough to stupidity. The creators of these tools aren't simply greedy charlatans; they're just as enamored with their contraptions as their users are.

It sort of reminds me of the hyper-productivity optimizers — Atomic Habits, Roam Research and Obsidian, the color-coded notes and Quizlet girlies, etc. These same people are probably now spending days optimizing their Claude Code setup. (But to what end? Hyper-optimization often becomes a self-consuming tarpit.)

It also reminds me of the output of LLMs themselves, i.e. "LLM-speak": responses that seem correct and well-articulated at first glance, but are actually full of redundancy, overconfidence, and logical holes. It's hard to look at a 10-paragraph AI-generated plan and discern what information, specifically, is warranted and what isn't. It's easier to just hit "accept and auto-approve edits". Maybe this isn't a coincidence: perhaps these toolmakers are deferring their thinking to AI, and we're just seeing the bloated results.