Cheng Tan

Skin in the Game

2026-06-28T09:00:00-04:00

I want to ask a narrow, uncomfortable question about my own field. In computer systems research, in the AI era we can already see coming, what is actually valuable? Not pleasant, not publishable—valuable. What does a human still contribute that the world should pay for? The discomfort is that every answer I was raised on has quietly stopped being true.

Everything we valued got cheap

For thirty years my field had a clear hierarchy of proof, and AI is eating it from the bottom up.

We valued code. “Talk is cheap; show me the code” was the whole creed: code was the honest signal you couldn’t fake. A model writes a plausible implementation now while you read the prompt back.

We valued built systems, the heavy artifacts that took a team a year. The scaffolding, the glue, the second system once you know the shape—increasingly something you supervise rather than write.

We valued the beautifully written paper, because clear prose was scarce and stood in for clear thought. I can generate ten fluent framings of any idea before lunch, and so can you.

And here’s the one that stings, the one people in my field still flinch from: we valued ideas that work. The clever mechanism nobody else had. A model can propose a hundred mechanisms and try them in parallel while you sleep—enough of them good, once sieved by experiment, that “I had the idea” is no longer the high ground it was. Idea generation is becoming search, and search is a machine’s home turf.

The pattern is the whole essay: everything that can be generated has collapsed in value. So if anything human is still valuable, it cannot be a thing you generate.

The one thing the machine cannot do

There is exactly one move in this game that AI structurally cannot make. It cannot put anything at stake.

A model will assert that your proof is correct, and assert the opposite a moment later, and pay nothing either way. No reputation to lose, no name attached, no future in which today’s wrong claim costs it. This is not a weakness to be patched in the next version; it is what the thing is—an engine for producing statements at zero cost. It is the purest possible source of what game theory bluntly calls cheap talk: communication that’s free to emit and therefore, on its own, carries no information. A thousand confident AI answers don’t add up to one trustworthy one.

Which points straight at the scarce thing. A signal carries information in proportion to what it costs to send. The opposite of cheap talk is a costly signal, a claim you pay for if you’re wrong, and it is informative precisely because you wouldn’t have sent it otherwise. When a researcher stakes their reputation on “this approach is the one,” the spending is the message. Strip the cost away and you’re back to noise.

So the valuable unit of human work, in a world of cheap generation, is the staked claim: a judgment committed to publicly that you will pay for. I need a word for it. In an earlier post I called it “thoughts”—code is cheap; show me the thoughts. Too soft; a thought is free, and you can have a brilliant one and risk nothing. The thing I mean is a thought with a price on being wrong. The closest word I have is commitment: you commit in both senses—you state it, and you bind yourself to it.

The uncomfortable part

Let me admit what this implies, because it’s controversial and I’d rather say it than smuggle it.

The scarce human contribution is shifting from intelligence to accountability. Not how smart you are, not how much you produce—AI now wins both—but whether you’ll stand behind a specific claim and absorb the cost when it’s wrong. The researcher starts to look less like an author and more like an underwriter: someone who looks at a thousand machine-generated candidates and stakes their name on which one is real. I was trained to distrust exactly that person—the one who builds less and asserts more—and I’m no longer sure the training was right.

It sounds like it rewards loud, confident people over quiet, careful ones. Here’s why I don’t think it does, though I hold this loosely: confidence without cost is just more cheap talk, and the world learns to discount it fast. The filter isn’t volume; it’s paying. What survives is a calibrated track record—claims you staked that came true, over years. Being right in private earns you nothing; being loud while wrong eventually bankrupts you. Only being right on the record compounds.

What I’m left with

The old creed was “talk is cheap; show me the code,” a demand for a costly signal in an age when code was the costly thing. Code is cheap now. Words are cheap, designs are cheap, even ideas are getting cheap. The creed has to be rewritten for an age when generation costs nothing:

Don’t tell me what’s true. Tell me what you’ll stake on it, and what it costs you if you’re wrong.

That’s the last expensive thing. It might be the only one we still get to own.

The Three Pillars of Research

2026-06-21T12:00:00-04:00

Every craft worth learning, everywhere, for most of human history, has been taught the same way: a beginner stands next to a master and does the work until the work becomes theirs. Blacksmiths, surgeons, painters, chefs, violinists. We have had books for millennia and universities for centuries, and still, when the skill actually matters, we do not hand someone a manual—we put them next to someone who already has it. There is a Tang-dynasty line about this, from Han Yu, that I keep coming back to: 师者，传道授业解惑也. A teacher is one who transmits the Way, imparts the skills, and resolves doubts. Three jobs, written down in the year 802, and they are still the three jobs.

Why an apprenticeship? Because the part that matters can’t be written down. The knowledge is tacit: we always know more than we can put into words. You can read every paper on how to ride a bicycle and you will still fall off the bicycle. The knowledge that makes an expert lives mostly below language—in the hands, in the judgment, in the thousand small calls that no one ever wrote into a manual because no one could. The only known way to transfer it is proximity. You watch someone do it, you try, they correct you, you try again.

A PhD is the last apprenticeship most people will ever do, and the data backs the framing: roughly half of people who start a PhD don’t finish, and a leading factor in who finishes isn’t intelligence or undergraduate pedigree—it’s the advising relationship. The research on doctoral attrition points the same way: what happens to students after they arrive matters more for finishing than the qualifications they brought with them. And what happens after they arrive is, mostly, an apprenticeship that either works or doesn’t.

So here is a fair question for anyone in that position, and for me as someone on the other side of it: what, exactly, is supposed to get transmitted? If I am the master in this arrangement, what is the Way? I have spent a while trying to name it honestly, and I keep landing on three things.

The three pillars

Taste. Knowing which problems are worth your life. Not whether you can solve something—whether you should. Which of a hundred directions is worth a year, which clean-looking result is actually an artifact, when to kill your own favorite idea. Taste is the part people assume is innate and mystical. It isn’t. It’s learnable, and most of research failure is taste failure—brilliant execution pointed at a question nobody needed answered.

Execution. Turning a good idea into a real thing that works and that you can trust. For us in computer systems this means building: prototyping fast, diving into a million lines of someone else’s code, understanding the layers beneath you well enough that you’re not fooled by them, and—the unglamorous core—finishing before the deadline. An idea you can’t build is a wish.

Communication. Getting the thing out of your head and into other heads. Writing the paper, giving the talk, and the quieter daily kind: telling your collaborators the truth, early, especially when it’s bad. Research that nobody can understand or trust may as well not exist—the best idea in the world is worthless if it stays in your head.

That’s the framework. Three pillars. I’m aware that “three pillars” is the kind of phrase that should make you suspicious—every consultant has three of something. I won’t claim this carves reality perfectly at the joints; you can name things a researcher does that sit awkwardly across the lines (raising money, managing people, surviving the politics). But I think the three are non-arbitrary, for a reason worth stating up front, because it’s the spine of everything that follows.

Why these three: the work is modeling three things that aren’t you

Here is the claim. Research is the discipline of building accurate models of three systems, none of which is you, and the three pillars are just what it takes to model each one.

Taste is modeling the field—understanding it well enough to know what it actually needs, which problems are real, which results matter. Execution is modeling the machine—understanding the system and the hardware beneath you well enough that you can make them do what you intend. Communication is modeling other minds—understanding the people in your audience well enough to move an idea from your head into theirs. Decide what to work on, make it, move it between minds: each is an act of getting inside something that isn’t you and being right about it. That isn’t a slide with three boxes. It’s the same hard skill—accurate empathy for a system—pointed at three different targets. Hold onto that; the last post pays it off.

It also explains the failure modes, which is the practical test. Because the three are separable (you can be strong at one and weak at another), you can diagnose a stuck student by which model is broken. A student with taste and communication but no execution gives beautiful talks about systems that don’t run—their model of the machine is wrong. One with execution and communication but no taste ships flawless solutions to problems nobody has—their model of the field is wrong. One with taste and execution but no communication does excellent work that dies on their hard drive—they never modeled the reader at all. I have advised all three. Different broken model, different fix.

And there’s a happy coincidence I can’t resist. The three line up with Han Yu’s three verbs: 传道, transmit the Way, is taste—the sense of what’s worth doing; 授业, impart the skills, is execution—the craft of making the thing; 解惑, resolve doubts, is communication—the dialogue that only happens in language. A line from the year 802 and a framework I arrived at by watching students struggle in 2026 landing in the same place isn’t proof of anything. Old ideas about teaching and new ideas about research can rhyme by accident. But it did make me like the decomposition more.

One word on why taste comes first, since it’s the pillar people most doubt can be taught. Think of food. A good palate isn’t only a filter that keeps you from wasting a meal on something bad—it’s also what lets you enjoy eating, because you can taste the difference between fine and extraordinary. Research taste is the same: not just a filter against bad problems, but the thing that lets you enjoy the work, because you can feel the gap between a result that’s merely publishable and one that’s beautiful. People with taste have more fun, and since research is long, the fun is most of what keeps you in it.

The good news, which I’ll spend the next three posts on, is that all three pillars are trainable. Not equally easily—taste is the slowest and the most apprenticeship-shaped, execution rewards raw hours the most, communication is the most teachable and the most neglected. But none of them is a gift you either have or don’t. I’ve watched all three grow in people, including in myself, and the growth follows patterns regular enough to write down.

So that’s what I’m going to try to do here. One post per pillar:

Taste—how to judge work, and how to generate ideas worth judging.
Execution—coding vs. programming vs. hacking, systems, and the boring discipline that beats talent.
Communication—writing, speaking, and telling the truth to the people you work with.

The honest disclaimer up front: this is one systems researcher’s view, shaped by my own field and my own mistakes. Take what’s useful. But I’ve taught these three things for years now, mostly by proximity, the way they’ve always been taught—and writing them down is my attempt to make the apprenticeship a little less tacit. To tell a bit more than I thought I could.

Good Ideas Are Rare; Taste Is the Search

2026-06-21T11:00:00-04:00

This is the first of three posts on what I think research is made of. The overview argued for three pillars—taste, execution, communication. This one is about taste, the slipperiest of the three, the one people most want to believe is innate magic. It isn’t magic. It’s a search.

Here is the fact the whole post stands on: good ideas are rare. Not scarce-ish: rare, the far tail of an enormous space of things you could think, almost all of which are wrong, or boring, or already done. Research is the work of finding the rare good ones in that space. And taste is the search procedure: the thing that tells you where in the space to look, and the thing that tells you, once you’ve grabbed something, whether it’s actually any good. Those two moves have names, generating and judging, and your taste is just how well you make them.

So taste is the difference between searching that huge space efficiently and wandering it at random. The encouraging part, which people resist, is that the search is learnable. “Taste is just subjective” is a comfortable thing to say because it ends the argument—but it isn’t true, and the proof is in your own history. Look at work you loved five years ago and wince at now—your taste didn’t merely change, it got better, and you know it got better, which means there was a better to move toward. A skill with a direction like that can be trained. You just walk the direction, deliberately, for years, and your search gets sharper.

I’ll take the two moves in turn, starting with judging, because you can’t generate toward a target you can’t yet recognize.

Judging: the three questions

When a reviewer reads your paper, or a program committee decides your fate, or I read a draft from a student, the evaluation collapses—almost always—into three questions:

What’s new? (novelty)
Who cares? (importance)
Why now? (timeliness)

The first two aren’t my invention; they’re the questions every program committee and grant panel already asks, in one phrasing or another: what is new here, and if it works, who benefits. (The funding agencies dress them up as “intellectual merit” and “broader impacts,” but it’s the same two nerves.) The third, “why now?”, I add myself, because I’ve come to think timeliness is a real and separate nerve.

The fastest way to feel how load-bearing each question is: knock one out and watch the work collapse.

No “what’s new.” Suppose I build a free office suite that is 100% compatible with Microsoft Office. Who cares? Millions of people. Why now? It saves them money. Both great answers. But what’s new? Nothing—it’s a clone. So it’s a fine product and it is not research. Novelty isn’t optional; it’s the thing that makes it research at all.

No “who cares.” Suppose I invent a genuinely clever new data structure—never been done, provably elegant—that speeds up an operation no real system performs. Novel: yes. Timely: sure. But nobody cares, so it’s a puzzle, not a contribution. This is the most common failure mode among technically strong students, and the hardest to feel from the inside, because cleverness is so satisfying that it masquerades as importance.

No “why now.” Suppose I propose something both novel and important—say, a beautiful scheme that needs hardware nobody will have for twenty years, or that re-solves a problem the field already moved past. The honest reaction is “interesting, but not yet” or “interesting, but too late.” Timing is a real axis. The same idea is a triumph in 2015 and a footnote in 2005.

Hold onto the structure: a contribution has to survive all three questions, and most weak work dies on exactly one. When you read a paper, find the one it’s weakest on. When you write one, defend all three before you defend anything else.

The two-of-three rule

There’s a related heuristic I use for the shape of a strong systems paper. Good work tends to have three possible sources of merit:

a hard, real problem,
a novel idea, and
a substantial implementation and/or thorough evaluation.

You rarely get all three at a publishable level, and you almost never need all three. You need two. The combinations are each a recognizable species of paper:

Problem + idea (light on implementation): a clever, lightweight idea on an important problem—the kind of paper that’s mostly insight, a few pages, and changes how people think.
Problem + implementation (idea is straightforward): the heavyweight paper. The idea is “obvious in hindsight,” but making it actually work—at scale, for real—is a year of hard engineering and careful measurement. Building a real OS kernel in a memory-safe language is this kind of paper.
Idea + implementation (problem is niche): a sophisticated, well-built attack on a smaller problem. Narrower, but a pleasure, because both the thinking and the making are excellent.

This isn’t a law—it’s folklore, and the honest ancestor of it is Levin and Redell’s 1983 note on how to write a good systems paper, where they insist a paper must contain “at least one new idea” and then ask the author what was actually built and what was actually learned: “If you didn’t learn anything, it is a reasonable bet that your readers won’t either.” The two-of-three rule is just a compression of that. Use it as a checklist on your own work: if you can’t honestly name two of the three, you don’t have a paper yet—you have a start.

One more sharpening, on what “contribution” means, because students get this backwards. Lines of code are not contribution. Consider two results: I write 10,000 lines of C++ to make a system 10× faster, or I write 100 lines to make it 10% faster, for every application that will ever run on it. The second is often the bigger contribution. Effort is an input; impact is the output; taste is knowing they’re not the same number.

Generating: more shots, thrown well

Now the harder move. Judging is comparatively easy—you can learn it by reading a hundred papers with someone who has taste, which is most of what a reading group is for. Generating is the part that feels like magic, and the search frame is what dissolves the magic. Remember the setup: a vast space, the good ideas vanishingly rare in its tail. Two consequences for how you should actually work fall straight out of it.

The first is just honesty about the odds. You will generate far more bad ideas than good ones, no matter how good you get—“good” is the tail of a distribution; that’s what the word means. Producing duds isn’t a sign you’re failing. It’s the structure of the problem, and it never goes away.

The second is the strategy that follows: if hits are rare, take more shots. The folk version is the old line about having a lot of ideas and throwing the bad ones away. There’s a research-flavored version too—the claim that a creator’s number of great works tends to track their total output, as if the hit rate were roughly constant. It’s a contested claim, and shadowed by survivorship bias, since we mostly count the people whose volume did pay off and never see the prolific producers of pure noise. So don’t read it as a law. Read it as permission: generating a lot of bad ideas and discarding them fast is normal practice, not failure, and the researchers with the best ideas are often the ones having the most.
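A toy way to see why volume helps, under an assumption the paragraph above already flags as contested: suppose each idea is an independent draw with the same small chance of being good. The function names and the 2% hit rate below are made up for illustration, not measured.

```python
def p_at_least_one_hit(hit_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent ideas is a hit.

    Assumes a constant per-idea hit rate, which is a simplification.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every single attempt was a dud".
    return 1.0 - (1.0 - hit_rate) ** attempts


def expected_hits(hit_rate: float, attempts: int) -> float:
    """Expected number of hits under the same constant-rate assumption."""
    return hit_rate * attempts


if __name__ == "__main__":
    hypothetical_hit_rate = 0.02  # illustrative only
    for n in (1, 10, 50, 200):
        chance = p_at_least_one_hit(hypothetical_hit_rate, n)
        print(f"{n:>3} ideas: P(at least one hit) = {chance:.2f}, "
              f"expected hits = {expected_hits(hypothetical_hit_rate, n):.1f}")
```

Under these assumptions the chance of at least one hit climbs quickly with the number of attempts, and that is the case for treating duds as the normal cost of the search.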

But raw quantity is a dumb search—spraying the space at random, which no good researcher actually does. A good search is directed: you spend your shots where they’re likely to land. And directed search, however you run it, turns on one tradeoff you already know by name.

Exploration vs. exploitation

It’s the explore/exploit tradeoff: do you mine the promising vein you’ve already found (exploit), or wander off to look for a better one (explore)? Exploit too hard and you polish a local hill forever, publishing increments while the real mountain sits one valley over. Explore too hard and you start everything and finish nothing. A research career is a long sequence of this one decision, and taste is largely knowing, this month, which mode you should be in.
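For readers who know the tradeoff from bandit problems, here is a minimal epsilon-greedy sketch. It is a textbook toy, not a model of a research career: the "veins" are arms with fixed, hypothetical payoff probabilities, and `epsilon` is the fraction of the time you wander off to try a random one.

```python
import random


def epsilon_greedy(true_payoffs, epsilon, rounds, seed=0):
    """Run a simple epsilon-greedy agent on a Bernoulli bandit.

    true_payoffs: hypothetical success probability of each arm ("vein").
    epsilon: probability of exploring a random arm on a given round.
    Returns (total_reward, pulls_per_arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_payoffs)
    counts = [0] * n_arms        # how often each arm was tried
    estimates = [0.0] * n_arms   # running average reward per arm
    total = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm
        else:
            # exploit: pick the arm that currently looks best
            # (ties go to the lowest index)
            arm = max(range(n_arms), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_payoffs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running average
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts


if __name__ == "__main__":
    payoffs = [0.1, 0.5, 0.9]  # illustrative values only
    for eps in (0.0, 0.1, 1.0):
        reward, pulls = epsilon_greedy(payoffs, eps, rounds=1000)
        print(f"epsilon={eps}: reward={reward}, pulls={pulls}")
```

With `epsilon` at 0.0 this agent never leaves the first vein it tried, however poor it is; with `epsilon` at 1.0 it samples everything and commits to nothing.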

The two modes even seem to want different physical conditions. Exploration wants you away: alone, off the devices, mind unclamped from the immediate. The famous breakthrough stories all rhyme—the insight that arrives on a walk, in the shower, stepping onto a bus, never at the desk where you’d been grinding. Treat these as suggestive, not as data—they’re the stories winners tell, and we never hear from the equally idle people who got nothing. But there’s a modest real effect underneath, the one called incubation: step away from a hard problem and some part of you keeps working it. The instruction is what matters. If you are stuck, frequently the answer is not more hours at the desk—it’s a walk. Exploitation wants the opposite: the desk, the screen, the long uninterrupted afternoon of grinding a found idea into something real. Know which one you need and arrange your day for it.

The generate-judge loop, and your advisor

Put the two halves together and you get the actual engine of research: a loop. Generate an idea, judge it, kill it or keep it, generate again. Fast and merciless. The whole point of building taste-as-judgment is to make the judge in this loop sharp and quick, so you can run the loop many times—because, see above, you need many swings.

Two things make the loop run faster. The first is writing: you cannot reliably judge an idea that’s still only a feeling in your head, and the act of writing it down is what forces the judgment to get honest. That deserves its own treatment, and it gets one in the communication post; for now just know that the judging half of taste runs on a pen. The second is your advisor, who is, in this loop, a faster and more experienced verifier. The reason you meet with me every week is not for me to hand you ideas. It’s to be a high-quality judge you can query cheaply, so the loop runs against real taste before you’ve sunk a year into a bad branch. That is most of what advising is, and it’s why the apprenticeship model survives: judgment transfers by being used out loud, over and over, until one day it’s yours and you no longer need to knock on my door.

One last thing, the most encouraging thing I know about taste. Here is why beginners quit: you start with taste ahead of your ability, so everything you make disappoints you, and the gap is so painful that most people conclude they have no talent and stop. They’re wrong. That gap is the normal starting condition, and the only way to close it is to keep producing—a volume of work slowly drags your ability up to meet your taste. Your taste running ahead of your output isn’t evidence you’re bad at this. It’s the precondition for getting good. The disappointment is the pillar working. Keep taking swings.

Step back and notice what all of it—judging, generating, the loop, the years of reading—is really building: an accurate model of the field inside your own head, detailed enough that you can feel, from the inside, what’s worth doing and what isn’t. That’s the first of the three systems you learn to model. Next, a system that pushes back harder, because it either runs or it doesn’t: the machine.

Next: Execution—turning a judged idea into a real thing that works.

Execution: The Idea Is the Easy Part

2026-06-21T10:00:00-04:00

This is the second of three posts on the pillars of research. The first was taste, and it argued that having a good idea is hard—rare, the tail of a distribution, the slow fruit of trained judgment. So the title of this post is going to sound like a contradiction: the idea is the easy part. It isn’t a contradiction. It’s a change of frame. Conceiving the idea is hard; but once taste has handed you an idea worth a year, making it true is usually where the whole thing lives or dies—and that is the part people wave away as “just engineering.” That dismissal is the mistake this post is about.

Because a great idea with bad execution isn’t a great idea with an asterisk. It’s nothing. An idea is only a multiplier; it’s worth nothing until something executes it. Multiply your brilliant idea by zero execution and you know what you get.

The clearest example of the multiplier is the most painful one in our field. Through the 1970s, Xerox PARC built the graphical interface, Ethernet, the laser printer, and the Alto that tied them together—a strikingly large fraction of the modern personal computer. (The mouse came from Engelbart’s lab at SRI; PARC made it usable.) Xerox cashed in on some of this—laser printing became a real business—but it fumbled the computer itself: the Alto never shipped as a product and the 1981 Star flopped. Apple and Microsoft executed on the interface and took the industry. The hard ideas had been sitting there, invented, for years. Or Friendster: it had the social network before Facebook and a multi-million-user head start, and a big part of why it lost was unglamorous engineering—its pages were notoriously slow, reportedly taking many seconds to load, and the scaling problem never got fixed. Same idea. Different execution. Different history. Execution is not the part that comes after the important work. Frequently it is the important work.

Three things that aren’t the same: coding, programming, hacking

The first thing I want a student to be able to do is notice which of these they’re doing, because the words get used interchangeably and they are not the same activity.

Coding is writing the code—the local act of turning a known solution into syntax. It’s the smallest of the three and the one people overweight, probably because it’s the most visible.

Programming is solving a problem from a clean slate—designing the solution, choosing the structure, deciding what the thing is before it’s anything. This is where most of the actual thinking lives.

Hacking—and I mean this as a compliment, in the old sense—is solving a problem inside a gazillion lines of code you didn’t write. Finding the three places in a kernel where you need to intervene. Bending an enormous existing system to do something it was never meant to do. This is, honestly, most of what real systems research demands, and it’s the skill schools teach least.

The reason to keep these separate in your head is that research almost never happens at the coding level. If you find yourself stuck, the question “am I stuck on coding, programming, or hacking?” usually dissolves it. Stuck on coding is a syntax or API problem—trivial, look it up. Stuck on programming means you haven’t actually decided what you’re building—go back to the design. Stuck on hacking means you don’t understand the system you’re inside of yet—go read it. Different diagnoses, different cures.

There’s a deeper version of this distinction that’s worth internalizing early: software engineering is really just programming integrated over time. A throwaway script and a system that ten people will maintain for ten years are not the same kind of object, even if they do the same thing today. Fred Brooks made this point decades ago and put a number on it: turning a quick program into a polished product other people can use is roughly 3× the work, and turning it into a component of a larger system is another 3×—so the full “programming systems product” costs about nine times the garage version. Most of the surprise and pain in a student’s first big project is discovering that 9×. Know it’s coming.

What you need to be able to do

Here’s the concrete checklist I’d hand someone. Over a PhD you should, at least once:

Master one programming language completely—not “can write it,” but know it in your bones, the way you know a spoken language you dream in.
Hack one large-scale system—really get inside something huge and unfamiliar and make it do your bidding.
Build one system from scratch, so that you lose the fear of systems. This is the one that changes people. Once you’ve built a database, a kernel, a compiler from nothing, no codebase intimidates you again, because you know there’s no magic in there—just decisions, some good, some you’d make differently.
Have one good visualization/tooling habit. When you can see what your system is doing, debugging stops being archaeology.

And in your own area specifically: you need to be able to prototype fast, and you need to truly understand the layers beneath you—the OS, the hardware, the network—well enough that they can’t fool you. The general principle: the hardware cannot work optimally without help from the programmer, and you cannot give that help if the layer below you is a black box. Most subtle systems bugs and most surprising performance results live exactly at the seam where your mental model of the layer below diverges from what it actually does. Go down a level. It’s almost always worth it.

On the “don’t fear systems” point, one practical note about reading code, since diving into huge codebases scares people. You don’t really read code, you decode it: a piece of code is not literature, it’s a specimen. You don’t read a million-line system front to back like a novel. You find a thread—one request, one syscall, one function—and you pull it, and you follow it down, and you ignore the other 999,000 lines until you need them. Reading code is an active, surgical act, not a passive one. Once you believe that, large systems stop being walls and start being mazes, which at least have a path through.

Deliberate practice, not ten thousand hours

People love the “10,000-hour rule.” It’s wrong, or at least badly mangled—there is nothing magical about ten thousand hours, and the researcher whose work the rule was built on spent years saying so. The hours aren’t the mechanism. What builds skill is deliberate practice: working at the edge of your ability, on something just past what you can comfortably do, with feedback that tells you specifically what was wrong. Ten thousand hours of comfortable repetition makes you exactly as good as you were at hour one. An hour of the uncomfortable kind, with real feedback, moves you.

For us, the uncomfortable kind looks like: writing the part you’re avoiding because you’re not sure you can; profiling the system you assume you understand and being wrong; submitting the paper and reading every brutal review instead of flinching away. The feedback is the whole point—which is, not coincidentally, another argument for the apprenticeship, because a good advisor is a feedback machine calibrated to your specific weaknesses.

Build a system for yourself

Here’s a strange blind spot: systems people almost never build a system for their own work. Look at what your research life actually is—a stream of inputs arriving faster than you can process them (papers, ideas, deadlines, threads, half-thoughts), hard latency requirements, and a tiny leaky cache called your memory. You would never run a service this way. Yet most students try to hold it all in their heads, drop things, and—worse—burn the attention they needed for thinking on the holding. The principle is one sentence: your mind is for having ideas, not holding them. Get the open loops out of your head and into something you trust, and treat your own workflow as a system to be engineered, not a fixed trait of your personality.

Two parts of that system aren’t optional. The first is note-taking that is actually thinking, not transcription. Your notes aren’t a record of the work; done right, they are the work—the place where the thinking actually happens. Note-taking done well is generating and judging on paper, the taste loop from the last post externalized and made durable.

The second is the discipline to actually run the system, which is harder than picking one. Here’s the line worth tattooing on every first-year: you do not rise to the level of your goals, you fall to the level of your systems. Everyone wants to do great research; the wanting is uniform and useless. What differs between people is the boring daily machinery—when they read, how they capture, whether the deadline gets respected. So if you adopt a system, give it a fair trial: run it strictly for at least two weeks before you judge it. Most people quit a good system in three days because it feels awkward, which is just the cost of any new skill and tells you nothing.

Ship: deadlines, estimation, and what to cut

Last, the part that separates research that exists from research that almost existed: finishing. Three things to know.

You are bad at estimating, and not randomly—predictably bad, in one direction. It’s called the planning fallacy: we systematically underestimate how long our own tasks will take, even when we have mountains of evidence that we always run long. The fix isn’t optimism or willpower; it’s taking the outside view—ask how long things like this have actually taken you before, and trust that number over the story in your head.
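The outside view can be made mechanical. The sketch below is one hypothetical way to do it, not a standard method: it scales a gut estimate by the median overrun ratio from your own past tasks. The function name and the history data are invented for illustration.

```python
from statistics import median

def outside_view_estimate(gut_estimate_days, past_tasks):
    """Scale a gut estimate by the median overrun ratio of past tasks.

    past_tasks: list of (estimated_days, actual_days) pairs from your own history.
    """
    ratios = [actual / estimated for estimated, actual in past_tasks]
    return gut_estimate_days * median(ratios)

# Hypothetical history: (what I guessed, what it actually took), in days.
history = [(4, 8), (10, 15), (2, 5)]

# The story in my head says 7 days; my track record says to scale that up.
corrected = outside_view_estimate(7, history)
print(corrected)
```

The median is used rather than the mean so that one catastrophic project does not dominate the correction.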

Given that everything takes longer than you think, you will not finish everything, which means knowing what to cut is an execution skill, not a failure of one. Herbert Simon’s word for the right behavior is satisficing: don’t optimize every component, get each one good enough and move on. The Pareto split is brutally real in systems work—something like 20% of the effort gets you 80% of the result. Find that 20% for each piece, ship it, and spend your remaining time only on the parts where excellence actually changes the contribution.

And iterate fast. Make it work, then make it right, then make it fast—in that order, never reversed. Design your whole setup around an immediate connection between what you change and what you see: the shorter the loop between changing something and seeing what changed, the more times you can run it, and execution—like idea generation—rewards more turns of the loop. Tight loop, many iterations, finished system.

Underneath all of it—the three kinds of building, the layers, the tooling, the deadlines—execution is one thing: building an accurate enough model of the machine that it does what you intend instead of what you assumed. The bugs live exactly where your model and the machine disagree. That’s the second system you learn to model. The third pushes back hardest of all, because it has a mind of its own: other people.

Next: Communication—getting the finished thing out of your head and into everyone else’s.

Communication: Standing Where the Other Stands

2026-06-21T09:00:00-04:00

This is the last of three posts on the pillars of research, after taste and execution. You have judged a good idea and built a real thing. It is sitting on your hard drive, true and working and invisible. Communication is the pillar that decides whether it ever becomes part of anyone else’s thinking—and it’s the one students neglect most, because it feels like packaging applied to the real work after the real work is done. That’s exactly backwards, and showing you why is most of this post.

The title is the whole method, borrowed from a Chinese phrase I keep returning to: 易地而处—stand where the other stands; put yourself in their place. Every rule below is a special case of it. Writing, speaking, and the daily honesty of working with people all come down to the same move: stop modeling your own head and start modeling theirs.

Writing is not reporting. Writing is thinking.

Start with the deepest misconception, because it changes how you should work, not just how you should write. Most people think the sequence is: have the idea, do the research, then write it up. Write-up—as if writing were transcription, the clerical step at the end.

It isn’t. Writing is how you find out whether you actually have an idea. You know the feeling of a thought that seems complete and powerful in your head—and then you sit down to write it and it dissolves into mush, because it was never as finished as it felt. Ideas can feel complete; it’s only when you try to put them into words that you discover they’re not. And the flip side, the real payoff: half the ideas that end up in a piece are ones you thought of while writing it. Writing doesn’t just expose the holes; it generates the patches. Leslie Lamport’s version is the one I quote to students: if you’re thinking without writing, you only think you’re thinking.

Simon Peyton Jones turns this into a method, and it’s the single most useful piece of advice I can give a new researcher. The naive model is idea → do research → write paper. His model is idea → write the paper → do the research. Start writing the paper almost immediately, while the work is still half-formed, because “writing the paper is how you develop the idea in the first place.” The paper isn’t the report of the research. The paper is an instrument for doing the research—it forces you to be clear, it crystallizes what you don’t understand, and it shows you, early and cheaply, which parts are actually hard. Don’t wait until you “have something to write up.” Write to find out if you do.

Writing, mechanically

Once you accept writing as thinking, the craft rules follow from 易地而处—from relentlessly taking the reader’s side.

Your reader doesn’t care what you know. Writing isn’t about communicating your ideas to your readers; it’s about changing their ideas. The reader is not asking “why do you think that?”—they’re asking “why should I think that?” They don’t owe you a reading; you owe them a reason to keep going. Every sentence is spending their patience, and you’d better be buying something with it.

One idea. Peyton Jones again: a paper should have a single “ping”—one clear, sharp point. “You want to infect the mind of your reader with your idea, like a virus.” A virus carries one payload. If your paper has five contributions, your reader will remember none of them; if it has one, sharp and well-aimed, they’ll carry it out of the room. Decide what the one thing is. You may not know at the start—but you must know by the end.

Don’t make them walk your path. The single most common flaw in student drafts: recounting the project in the order it happened. “Do not recapitulate your personal journey of discovery,” Peyton Jones says. “This route may be soaked with your blood, but that is not interesting to the reader.” They don’t want the maze you wandered—they want the straight road to the idea, the one you can only draw after you’ve escaped. Your suffering is not structure.

And a concrete process, the one I actually use: flow → bullets → prose → paper. First the flow—the logical skeleton, the argument’s shape, before any sentences. Then bullets fleshing each beat. Only then prose. Only then polish. The mistake is starting at prose—writing beautiful paragraphs about points that, it turns out, are in the wrong order or shouldn’t exist. Get the skeleton right while it’s cheap to move bones around. Sentences are expensive to write and emotionally expensive to delete, so don’t write them until the structure is settled.

Talks: you have two minutes and one job

A paper and a talk are different instruments and people keep confusing them. A paper is for the record, read alone, re-readable. A talk is live, linear, un-rewindable, and the audience is exhausted and skeptical and checking their phone. 易地而处: design for that person, not for an attentive reader who doesn’t exist.

The purpose of a talk is not to convey your results. Peyton Jones is blunt: the goal is “to give your audience an intuitive feel for your idea” and “to make them eager to read your paper”—not to present every detail, and emphatically not “to impress your audience with your brainpower.” His budget for a conference talk is worth memorizing: “Motivation (20%) + your key idea (80%). Nothing else.” You cannot transmit the paper in twenty minutes. Don’t try. Make them want the paper.

Patrick Winston gave a famous talk at MIT for forty years, and two of his rules I pass on constantly. Start with a promise: in the first minute, tell them exactly what they’ll know or be able to do by the end; give them a reason to stay. And don’t open with a joke: they’re still settling in, still adjusting to your voice, not ready for it; you’ll get the laugh later when you’ve earned the room. (He also held that there is “a special circle in hell for those who use laser pointers,” a conviction I have come to share.)

The deepest enemy of a good talk has a name: the curse of knowledge. Once you know something, you cannot imagine not knowing it, and so you skip the very steps your audience needs. There’s a much-loved illustration: people tap out a famous song on a table—just the rhythm—and badly overestimate how many listeners can name it, because inside their own heads they hear the whole song, melody and all, while the listeners get only the bare taps. That gap—between the song in your head and the bare taps the audience actually receives—is every talk you’ve sat through and not understood. Your expertise is the tapping. Fight it by, again, 易地而处: sit in your own audience’s chair and ask what they actually have in their heads right now, which is almost nothing of what’s in yours.

One underrated, concrete thing: delivery, including your voice. Students obsess over slides and ignore how they sound, but a talk is heard. I’ll add the necessary caveat, since it’s everywhere: the famous “93% of communication is nonverbal” statistic is a myth—a real study, badly generalized, that only ever applied to mismatched emotional signals. Don’t believe the number. Do believe the underlying point: how you say it carries real weight, and a monotone will bury a good idea. Practice out loud. Hear yourself.

The quiet kind: talking to the people you work with

The flashy communication is papers and talks. The kind that determines whether your projects live or die is the daily one—how you talk to your collaborators and your advisor. Two principles.

Be honest, fast, especially when it’s bad. When something goes wrong—a result doesn’t replicate, you’ve fallen behind, you broke the build—the instinct is to hide it until you’ve fixed it. That instinct is the single most expensive habit in research. Tell me immediately. Yes, we might be frustrated for a moment; that passes, and a problem surfaced early is cheap while a problem hidden for a month can be fatal. Toyota built the best manufacturing system in the world partly on one idea: any worker on the line can pull a cord and stop everything the instant they see a defect—and when they pull it, a leader comes to help, not to punish. Pulling the cord is the heroism, not the failure. Build that cord into how we work. Pull it early.

This only functions on a foundation with a name: psychological safety—the shared sense that the team is safe for taking interpersonal risks, that you can admit a mistake or ask a dumb question without being punished for it. When teams are studied for what makes them effective, this keeps coming out near the top, ahead of raw talent or experience. That’s most of what I mean when I tell a student I’m on your side. I’m not the examiner waiting for you to slip. We are pointed at the same problem, and the work goes faster if you can tell me the truth without bracing for impact. The aim is candor with care: challenge people directly precisely because you’re on their side. Care without challenge is useless flattery; challenge without care is just cruelty. Both at once is how good groups actually talk.

Talk outward, too. Three rings, all worth your time. Talk to people outside your field—including, now, to AI, which is an endlessly patient outsider to every field at once; explaining your problem to something that doesn’t share your assumptions is one of the fastest ways to find them. Talk to people in your broad community, systems people generally, who share your language. And talk to the handful working your exact problem—track at least one line of work closely enough that, when people in the area think of it, they think of you. That last one is how a research identity actually forms: not by announcing it, but by being reliably present in one conversation until it becomes yours.

The pillar, and the point of all three

Communication is the pillar that makes the other two count. Taste with no communication is private good judgment nobody benefits from. Execution with no communication is a working system nobody adopts. The work becomes real—enters the shared body of what the field knows—only when it crosses from your head into others’. And every technique for making that crossing reduces to the same move: 易地而处, leave your own head and stand in theirs.

Which is, when you zoom out, what all three pillars have been about—the promise I made at the very start of this series, now come due. Taste is modeling the field well enough to know what it needs. Execution is modeling the machine well enough to make it obey. Communication is modeling other people well enough to reach them. Research is, in the end, three kinds of empathy—for the problem, for the system, and for the people—three different targets, one underlying skill: getting outside your own head and being right about what’s there instead.

That’s why I opened with a Tang-dynasty teacher and I’m closing with a Tang-dynasty phrase, and why they turn out to be the same idea. 传道, 授业, 解惑—transmit the Way, impart the skills, resolve doubts—are three faces of 易地而处, standing where the other stands: the field, the machine, the person across the table. It’s the Way I’m trying to transmit, and it can’t really be transmitted on a page. The rest is doing it next to someone, year after year, until it’s yours.

Generation Got Cheap. Judgment Didn’t.

2026-06-19T00:00:00-04:00

My field, computer systems, has a creed: talk is cheap; show me the code. For thirty years it was right. Ideas were easy to narrate and hard to build, so we trusted the people who built. Doing was the proof; talk was suspicion. That creed is exactly what AI is now inverting—and it is why I have started blogging again.

The two reasons I didn’t

I avoided blogging for most of my career, for two reasons that were both true. Writing well is expensive: not one article, but the years of apprenticeship behind it, and then the hours of drafting and polishing per piece. And my culture rewarded doing over saying—the person who quietly ships beats the one who narrates brilliant ideas and finishes nothing. Underneath, those were the same reason: words were costly to produce and cheap to fake, so words were a bad signal.

What changed

AI broke the first reason outright. The draft-check-polish loop that used to cost me hours is now something I hand off and supervise—the post you are reading was built that way, and so was the one beside it, figures and all. My job in that loop was not the prose. It was deciding what the argument should be, and killing the three versions that were wrong.

Execution is going the same way, slowly and only in parts. An agent will implement and test a well-specified idea while you watch. It will not, yet, do the part of my own research that is actually hard: the long tail where the code runs, passes the happy path, and is wrong in the way that matters. (I spend my days in that gap.) Cheap for the routine; still dear at the frontier. But the direction isn’t in doubt.

Ideas got cheap too

Here is the part the optimistic version of this essay skips. If AI makes prose and code cheap, it makes ideas cheap too—a model will brainstorm a hundred directions before you finish your coffee. So “have ideas” cannot be what saves us. Idea-generation deflates along with everything else.

What doesn’t deflate is judgment. Which of the hundred directions is worth a week. Which clean result is actually a measurement artifact. When to kill your own favorite. The faucet is open for everyone now; what stays rare is the taste to know which drops to keep. Generation got cheap. Selection didn’t.

Why that means writing

And judgment has a visibility problem. It used to ride along with the work: ship something good and the quality vouched for the taste behind it. When the work is cheap to produce, the work stops vouching for anything—and no one can read your mind. The only way to show judgment is to externalize it: to argue, in public, for this and not that. That is what a blog is. Not talk instead of doing; the record of the choices the doing no longer reveals. It is this blog’s motto, and I mean it now as an argument: code is cheap; show me the thoughts.

When the idea becomes runnable

A thought you write down need not only be read. For an idea that carries a cheap test—a benchmark, a small system, a clean experiment—I can already picture handing a blog post to an agent that builds it and reports back: works, or doesn’t, and here is why. Not every idea; the ones whose proof is expensive are exactly the ones that stay human. But for the rest, a written idea stops being a promissory note and edges toward something runnable. The oldest line in my field—show me the code—starts to fold into show me the thought, because the thought, increasingly, runs.

The part that’s still mine

So I am blogging again. Not to talk instead of build, but because building no longer shows what I think. The labor is getting cheap. The judgment is not. A blog is where I keep the part of the work that is still mine.

Correctness Without a Reference

2026-06-13T00:00:00-04:00

For as long as we have verified software, a specification has been something you can check. Given an output, you can decide whether it is correct. Sort a list: is the result ordered, and a permutation of the input? Two questions, both decidable. Most of what I have worked on stands on this: my system Cobra verifies whether a database served its transactions serializably; our Orochi project verifies whether an untrusted server returned what the real computation would have. Different systems, one bedrock: correctness is a property you can check.
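The two questions for sorting translate directly into a checker. This is a minimal sketch, assuming the elements are hashable and comparable; the function name is hypothetical.

```python
from collections import Counter

def is_sorted_correctly(original, result):
    """Check a claimed sort of `original` without re-sorting it ourselves."""
    # Question 1: is the result ordered?
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    # Question 2: is it a permutation of the input (same elements, same counts)?
    permutation = Counter(original) == Counter(result)
    return ordered and permutation

print(is_sorted_correctly([3, 1, 2], [1, 2, 3]))  # ordered and a permutation
print(is_sorted_correctly([3, 1, 2], [1, 2, 2]))  # ordered, but not a permutation
print(is_sorted_correctly([3, 1, 2], [1, 3, 2]))  # a permutation, but not ordered
```

The checker never needs to know how the sort was computed, only what a correct output looks like.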

Modern AI removes the bedrock. Ask a model to summarize four hundred pages, draft the brief, or run the agent that books your travel. Look at the output and try to decide—correct, or not? Nothing answers. The specification did not get harder to check. It stopped being a checkable object at all.

The “canonical solution” is to find a reference. Pick a trusted implementation and call the output correct when it matches—re-execute the computation, diff against a gold version. It is, for example, what GPU kernel developers do, and how deep-learning compilers are tested. But equality is probably the wrong relation. Sum a million floats in a different order and the last bits change; the answer is still correct, yet a reference check rejects it. And if the reference itself is wrong, you certify its bug as truth. We do not want equality to a reference. We want equivalence—and we have to say which differences are allowed.
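The float example is easy to reproduce at small scale. The sketch below adds the same three numbers in two orders with plain IEEE-754 double addition, then compares the results by strict equality and by a tolerance. The tolerance value is an arbitrary choice for illustration.

```python
import math
from functools import reduce
from operator import add

values = [0.1, 0.2, 0.3]

# Plain left-to-right addition. reduce is used instead of the built-in sum(),
# which may apply compensated summation in newer Python versions.
reference = reduce(add, values)            # (0.1 + 0.2) + 0.3
candidate = reduce(add, reversed(values))  # (0.3 + 0.2) + 0.1

exact_match = candidate == reference
close_enough = math.isclose(candidate, reference, rel_tol=1e-9)

print(reference, candidate)
print("equal to reference:", exact_match)
print("equivalent up to tolerance:", close_enough)
```

Both results are legitimate sums of the same inputs, yet the strict comparison rejects one of them. The tolerance check is one way of saying which differences are allowed.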

Physics settled this first

Sit in a sealed box with no windows and drop a ball. It falls. Are you on Earth, or in a rocket accelerating at one gravity? No measurement inside can tell. Einstein refused to ask which one is “real”: if no observation distinguishes them, they are the same. Sameness is not absolute. It is sameness with respect to what you can observe.

We have already seen this idea, observational equivalence, in several areas of computer science. The physics only makes it vivid: fix what you observe, and you fix what counts as correct.

The definition for “AI correctness”

So change the question. Not “did the machine produce the right answer?” but: could a legitimate run of the machine have produced what I see?

correct ⇔ there exists a legitimate run e with observe(e) = the outcome.

An output is correct if some legitimate execution explains it. To call it wrong is the harder claim: that no legitimate run could have produced it. The definition turns on three choices. What makes a run legitimate (this is the specification, under its true name). What makes two outcomes equal. And what we observe. The legitimate runs span exactly the differences that do not matter: reorder the float sum, fine; return 7 where every legitimate run yields 3, not fine.
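The definition can be written as a function whose parameters are the three choices. This is a toy sketch: it assumes the legitimate runs can be enumerated, and all names (`explained_by_some_run`, `add_one`, `double`) are hypothetical. The instance is two operations applied to x = 1 in either order.

```python
from itertools import permutations

def explained_by_some_run(outcome, legitimate_runs, observe, same):
    """Accept if at least one legitimate run's observation matches the outcome."""
    return any(same(observe(run), outcome) for run in legitimate_runs)

# Toy instance: two operations, and either order is a legitimate run.
def add_one(x):
    return x + 1

def double(x):
    return x * 2

def observe(run):
    """What we can see of a run: the final value, starting from x = 1."""
    x = 1
    for op in run:
        x = op(x)
    return x

legitimate_runs = list(permutations([add_one, double]))

def same(a, b):
    return a == b

print(explained_by_some_run(4, legitimate_runs, observe, same))  # (1 + 1) * 2
print(explained_by_some_run(3, legitimate_runs, observe, same))  # (1 * 2) + 1
print(explained_by_some_run(7, legitimate_runs, observe, same))  # no order yields 7
```

Accepting stops at the first run that matches. Rejecting 7 requires looking at every run, which is the asymmetry the definition predicts.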

If this feels familiar, it should. It echoes several established concepts in computer science. Take serializability as an example. A database history is correct precisely when there exists a serial order of its transactions consistent with the reads and writes we observed, and database checkers like Cobra spend their whole effort searching for that one legitimate order. Serializability was existential equivalence all along. We had not named the pattern.
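To make the serializability case concrete, here is a brute-force sketch of the search for one legitimate serial order. It is not how Cobra works. It tries every permutation, which is exponential, and the history encoding (tuples of `("r" | "w", key, value)`) is an assumption made for illustration.

```python
from itertools import permutations

def replay_matches(order, initial):
    """Replay transactions one after another; every read must see what was observed."""
    store = dict(initial)
    for txn in order:
        for kind, key, value in txn:
            if kind == "w":
                store[key] = value
            elif store.get(key) != value:  # observed read disagrees with this order
                return False
    return True

def find_serial_order(transactions, initial):
    """Return one serial order that explains the observed reads, or None."""
    for order in permutations(transactions):
        if replay_matches(order, initial):
            return order
    return None

initial = {"x": 0, "y": 0}

# t2 read the value t1 wrote, so the order t1, t2 explains the history.
t1 = [("r", "x", 0), ("w", "x", 1)]
t2 = [("r", "x", 1), ("w", "y", 1)]
print(find_serial_order([t2, t1], initial))

# Both transactions read x = 0 and then overwrote it: no serial order explains that.
t3 = [("r", "x", 0), ("w", "x", 1)]
t4 = [("r", "x", 0), ("w", "x", 2)]
print(find_serial_order([t3, t4], initial))
```

Returning the witness order, and not only a yes or no, keeps the existential shape visible: the order found is the explanation.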

Why this is useful

The pattern is what excites me, because it does not break as the systems get wilder. You climb from a chip to an agent by turning two dials (loosen what counts as legitimate, coarsen what you observe) and the sentence still means something. On a GPU kernel, the legitimate runs are the valid summation orders and you observe the numbers, up to ε. In LLM inference, they are the samplings the decoding policy allows and you observe the text (our Jailbreak Oracle Problem paper works through an example). In an agent, they are the allowed interleavings of tool calls and the world, and you observe the side effects: the flight booked, the calendar updated.

Same definition, three altitudes. Naming it pays off twice. The specification becomes a single object, the legitimacy predicate; every dispute about whether an AI is “correct” turns into a concrete dispute about which runs we are willing to call legitimate. And it shows where the work is hard, just by counting. Exhibiting one legitimate run is enough to accept; ruling out all of them is what it takes to reject.

This says only that the machine ran legitimately—not that its output was wise. Whether a result is helpful, or harmless, is a different question, and a later post.

Conclusion

Existential equivalence is one definition that travels: an output is correct when some legitimate run explains what we observe—whether that run is a summation order, a sampling path, or an agent’s trace. It does not make specifying AI easy. It tells us where the difficulty lives: in the legitimacy predicate, and in what we choose to observe. That is a smaller, sharper question than “is the AI right,” and a better place to start.

Every correctness question is a windowless box. The only decision that matters is what you let yourself observe. Make it honestly, and correctness stops being a verdict you pronounce. It becomes an explanation the world can offer—or cannot.

References

Cheng Tan, Changgeng Zhao, Shuai Mu, and Michael Walfish. Cobra: Making Transactional Key-Value Stores Verifiably Serializable. OSDI 2020.
Cheng Tan, Lingfan Yu, Joshua B. Leners, and Michael Walfish. The Efficient Server Audit Problem, Deduplicated Re-execution, and the Web. SOSP 2017.
Christos H. Papadimitriou. The Serializability of Concurrent Database Updates. Journal of the ACM 26(4), 1979.
Jiawei Liu, Jinkun Lin, Fabian Ruffy, Cheng Tan, Jinyang Li, Aurojit Panda, and Lingming Zhang. NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers. ASPLOS 2023.
Shuyi Lin, Anshuman Suri, Alina Oprea, and Cheng Tan. Toward Principled LLM Safety Testing: Solving the Jailbreak Oracle Problem. MLSys 2026.

Hello, and what this is

2026-06-12T00:00:00-04:00

A paper is the polished end of a long, messy process. By the time it ships, the doubts are sanded off and the dead ends deleted—good for the reader, bad for the record. The most useful part, how you actually got there, is exactly what gets thrown away.

And now we face the AI era, staring straight at it: much of what we stood on, and believed deeply, may no longer be true. When the artifact gets cheap, the thinking is what’s left.

So this is the other thing—a place for the rough cut. Notes on systems, verification, and faithful LLM systems. Arguments with myself. Ideas that aren’t papers and may never be.

Code is cheap. Show me the thoughts.

—Cheng Tan