<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="/naizhengtan_blog/feed.xml" rel="self" type="application/atom+xml" /><link href="/naizhengtan_blog/" rel="alternate" type="text/html" /><updated>2026-06-28T19:56:30-04:00</updated><id>/naizhengtan_blog/feed.xml</id><title type="html">Cheng Tan</title><subtitle>Personal blog of Cheng Tan. Notes, mostly on systems, verification, and faithful LLM systems.</subtitle><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><entry><title type="html">Skin in the Game</title><link href="/naizhengtan_blog/posts/skin-in-the-game/" rel="alternate" type="text/html" title="Skin in the Game" /><published>2026-06-28T09:00:00-04:00</published><updated>2026-06-28T09:00:00-04:00</updated><id>/naizhengtan_blog/posts/skin-in-the-game</id><content type="html" xml:base="/naizhengtan_blog/posts/skin-in-the-game/"><![CDATA[<p>I want to ask a narrow, uncomfortable question about my own field. In computer
systems research, in the AI era we can already see coming, what is actually
<em>valuable</em>? Not pleasant, not publishable—valuable. What does a human still
contribute that the world should pay for? The discomfort is that every answer I was
raised on has quietly stopped being true.</p>

<h2 id="everything-we-valued-got-cheap">Everything we valued got cheap</h2>

<p>For thirty years my field had a clear hierarchy of proof, and AI is eating it from
the bottom up.</p>

<p>We valued <strong>code.</strong> <em>Talk is cheap; show me the code</em> was the whole creed—code was
the honest signal you couldn’t fake. A model writes a plausible implementation now
while you read the prompt back.</p>

<p>We valued <strong>built systems</strong>, the heavy artifacts that took a team a year. The
scaffolding, the glue, the second system once you know the shape—increasingly
something you supervise rather than write.</p>

<p>We valued the <strong>beautifully written paper</strong>, because clear prose was scarce and
stood in for clear thought. I can generate ten fluent framings of any idea before
lunch, and so can you.</p>

<p>And here’s the one that stings, the one people in my field still flinch from: we
valued <strong>ideas that work.</strong> The clever mechanism nobody else had. A model can
propose a hundred mechanisms and <em>try them in parallel</em> while you sleep—enough of
them good, once sieved by experiment, that “I had the idea” is no longer the high
ground it was. Idea generation is becoming search, and search is a machine’s home
turf.</p>

<p>The pattern is the whole essay: <strong>everything that can be generated has collapsed in
value.</strong> So if anything human is still valuable, it cannot be a thing you <em>generate.</em></p>

<h2 id="the-one-thing-the-machine-cannot-do">The one thing the machine cannot do</h2>

<p>There is exactly one move in this game that AI structurally cannot make. It cannot
put anything at stake.</p>

<p>A model will assert that your proof is correct, and assert the opposite a moment
later, and pay nothing either way. No reputation to lose, no name attached, no future
in which today’s wrong claim costs it. This is not a weakness to be patched in the
next version; it is what the thing <em>is</em>—an engine for producing statements at zero
cost. It is the purest possible source of what game theory bluntly calls <strong>cheap
talk</strong>: communication that’s free to emit and therefore, on its own, carries no
information. A thousand confident AI answers don’t add up to one trustworthy one.</p>

<p>Which points straight at the scarce thing. A signal carries information in proportion
to what it costs to send. The opposite of cheap talk is a <strong>costly signal</strong>—a claim
you pay for if you’re wrong—and it is informative precisely because you wouldn’t
have sent it otherwise. When a researcher stakes their reputation on <em>this approach
is the one</em>, the spending is the message. Strip the cost away and you’re back to
noise.</p>

<p>So the valuable unit of human work, in a world of cheap generation, is the staked
claim: a judgment you commit to publicly and will pay for if it’s wrong. I need a word for it.
In an earlier post I called it “thoughts”—<em>code is cheap; show me the thoughts.</em>
Too soft; a thought is free, and you can have a brilliant one and risk nothing. The
thing I mean is a thought with a price on being wrong. The closest word I have is
<strong>commitment</strong>: you commit in both senses—you state it, and you bind yourself to
it.</p>

<h2 id="the-uncomfortable-part">The uncomfortable part</h2>

<p>Let me admit what this implies, because it’s controversial and I’d rather say it than
smuggle it.</p>

<p>The scarce human contribution is shifting from <em>intelligence</em> to <em>accountability.</em>
Not how smart you are, not how much you produce—AI now wins both—but whether
you’ll stand behind a specific claim and absorb the cost when it’s wrong. The
researcher starts to look less like an author and more like an <strong>underwriter</strong>:
someone who looks at a thousand machine-generated candidates and stakes their name on
which one is real. I was trained to distrust exactly that person—the one who builds
less and asserts more—and I’m no longer sure the training was right.</p>

<p>It sounds like it rewards loud, confident people over quiet, careful ones. Here’s why
I don’t think it does, though I hold this loosely: confidence without cost is just
more cheap talk, and the world learns to discount it fast. The filter isn’t volume;
it’s <em>paying.</em> What survives is a calibrated track record—claims you staked that
came true, over years. Being right in private earns you nothing; being loud while
wrong eventually bankrupts you. Only being right <em>on the record</em> compounds.</p>

<h2 id="what-im-left-with">What I’m left with</h2>

<p>The old creed was <em>talk is cheap; show me the code</em>—a demand for a costly signal in
an age when code was the costly thing. Code is cheap now. Words are cheap, designs are
cheap, even ideas are getting cheap. The creed has to be rewritten for an age when
generation costs nothing:</p>

<p><em>Don’t tell me what’s true. Tell me what you’ll stake on it, and what it costs you if
you’re wrong.</em></p>

<p>That’s the last expensive thing. It might be the only one we still get to own.</p>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="ai" /><category term="research" /><category term="philosophy" /><summary type="html"><![CDATA[I want to ask a narrow, uncomfortable question about my own field. In computer systems research, in the AI era we can already see coming, what is actually valuable? Not pleasant, not publishable—valuable. What does a human still contribute that the world should pay for? The discomfort is that every answer I was raised on has quietly stopped being true.]]></summary></entry><entry><title type="html">The Three Pillars of Research</title><link href="/naizhengtan_blog/posts/the-three-pillars-of-research/" rel="alternate" type="text/html" title="The Three Pillars of Research" /><published>2026-06-21T12:00:00-04:00</published><updated>2026-06-21T12:00:00-04:00</updated><id>/naizhengtan_blog/posts/the-three-pillars-of-research</id><content type="html" xml:base="/naizhengtan_blog/posts/the-three-pillars-of-research/"><![CDATA[<p>Every craft worth learning, everywhere, for most of human history, has been taught
the same way: a beginner stands next to a master and does the work until the work
becomes theirs. Blacksmiths, surgeons, painters, chefs, violinists. We have had
books for millennia and universities for centuries, and still, when the skill
actually matters, we do not hand someone a manual—we put them next to someone
who already has it. There is a Tang-dynasty line about this, from Han Yu, that I
keep coming back to: <em>师者，传道授业解惑也</em>. A teacher is one who transmits the Way,
imparts the skills, and resolves doubts. Three jobs, written down in the year 802,
and they are still the three jobs.</p>

<p>Why an apprenticeship? Because the part that matters can’t be written down. The
knowledge is tacit: we always know more than we can put into words. You can read
every paper on how to ride a bicycle and you will still fall off the bicycle. The
knowledge that makes an expert lives mostly below language—in the hands, in the
judgment, in the thousand small calls that no one ever wrote into a manual because
no one could. The only known way to transfer it is proximity. You watch someone
do it, you try, they correct you, you try again.</p>

<p>A PhD is the last apprenticeship most people will ever do, and the data backs the
framing: roughly half of people who start a PhD don’t finish, and a leading factor
in who finishes isn’t intelligence or undergraduate pedigree—it’s the advising
relationship. The research on doctoral attrition points the same way: what happens to
students after they arrive matters more for finishing than the qualifications they
brought with them. And what happens after they arrive is, mostly, an apprenticeship
that either works or doesn’t.</p>

<p>So here is a fair question for anyone in that position, and for me as someone on
the other side of it: what, exactly, is supposed to get transmitted? If I am the
master in this arrangement, what is the Way? I have spent a while trying to name
it honestly, and I keep landing on three things.</p>

<h2 id="the-three-pillars">The three pillars</h2>

<p><strong>Taste.</strong> Knowing which problems are worth your life. Not whether you <em>can</em> solve
something—whether you <em>should</em>. Which of a hundred directions is worth a year,
which clean-looking result is actually an artifact, when to kill your own favorite
idea. Taste is the part people assume is innate and mystical. It isn’t. It’s
learnable, and most research failure is taste failure: brilliant execution
pointed at a question nobody needed answered.</p>

<p><strong>Execution.</strong> Turning a good idea into a real thing that works and that you can
trust. For us in computer systems this means building: prototyping fast, diving
into a million lines of someone else’s code, understanding the layers beneath you
well enough that you’re not fooled by them, and—the unglamorous core—finishing
before the deadline. An idea you can’t build is a wish.</p>

<p><strong>Communication.</strong> Getting the thing out of your head and into other heads. Writing
the paper, giving the talk, and the quieter daily kind: telling your collaborators
the truth, early, especially when it’s bad. Research that nobody can understand or
trust may as well not exist—the best idea in the world is worthless if it stays in
your head.</p>

<p>That’s the framework. Three pillars. I’m aware that “three pillars” is the kind of
phrase that should make you suspicious—every consultant has three of something.
I won’t claim this carves reality perfectly at the joints; you can name things a
researcher does that sit awkwardly across the lines (raising money, managing
people, surviving the politics). But I think the three are non-arbitrary, for a
reason worth stating up front, because it’s the spine of everything that follows.</p>

<h2 id="why-these-three-the-work-is-modeling-three-things-that-arent-you">Why these three: the work is modeling three things that aren’t you</h2>

<p>Here is the claim. Research is the discipline of building accurate models of three
systems, none of which is you, and the three pillars are just what it takes to
model each one.</p>

<p><strong>Taste is modeling the field</strong>—understanding it well enough to know what it
actually needs, which problems are real, which results matter. <strong>Execution is
modeling the machine</strong>—understanding the system and the hardware beneath you well
enough that you can make them do what you intend. <strong>Communication is modeling other
minds</strong>—understanding the people in your audience well enough to move an idea from
your head into theirs. Decide what to work on, make it, move it between minds: each
is an act of getting inside something that isn’t you and being right about it. That
isn’t a slide with three boxes. It’s the same hard skill—accurate empathy for a
system—pointed at three different targets. Hold onto that; the last post pays it
off.</p>

<p>It also explains the failure modes, which is the practical test. Because the three
are <em>separable</em> (you can be strong at one and weak at another), you can diagnose a
stuck student by which model is broken. A student with taste and communication but
no execution gives beautiful talks about systems that don’t run—their model of
the machine is wrong. One with execution and communication but no taste ships
flawless solutions to problems nobody has—their model of the field is wrong. One
with taste and execution but no communication does excellent work that dies on
their hard drive—they never modeled the reader at all. I have advised all three.
Different broken model, different fix.</p>

<p>And there’s a happy coincidence I can’t resist. The three line up with Han Yu’s
three verbs: 传道, transmit the Way, is taste—the sense of what’s worth doing;
授业, impart the skills, is execution—the craft of making the thing; 解惑, resolve
doubts, is communication—the dialogue that only happens in language. A line from
the year 802 and a framework I arrived at by watching students struggle in 2026
landing in the same place isn’t proof of anything. Old ideas about teaching and new
ideas about research can rhyme by accident. But it did make me like the
decomposition more.</p>

<p>One word on why taste comes first, since it’s the pillar people most doubt can be
taught. Think of food. A good palate isn’t only a filter that keeps you from
wasting a meal on something bad—it’s also what lets you <em>enjoy</em> eating, because
you can taste the difference between fine and extraordinary. Research taste is the
same: not just a filter against bad problems, but the thing that lets you enjoy the
work, because you can feel the gap between a result that’s merely publishable and
one that’s beautiful. People with taste have more fun, and since research is long,
the fun is most of what keeps you in it.</p>

<p>The good news, which I’ll spend the next three posts on, is that all three pillars
are trainable. Not equally easily—taste is the slowest and the most
apprenticeship-shaped, execution rewards raw hours the most, communication is the
most teachable and the most neglected. But none of them is a gift you either have
or don’t. I’ve watched all three grow in people, including in myself, and the
growth follows patterns regular enough to write down.</p>

<p>So that’s what I’m going to try to do here. One post per pillar:</p>

<ul>
  <li><strong><a href="/naizhengtan_blog/posts/good-ideas-are-rare-taste-is-the-search/">Taste</a></strong>—how
to judge work, and how to generate ideas worth judging.</li>
  <li><strong><a href="/naizhengtan_blog/posts/execution-the-idea-is-the-easy-part/">Execution</a></strong>—coding
vs. programming vs. hacking, systems, and the boring discipline that beats talent.</li>
  <li><strong><a href="/naizhengtan_blog/posts/communication-standing-where-the-other-stands/">Communication</a></strong>—writing,
speaking, and telling the truth to the people you work with.</li>
</ul>

<p>The honest disclaimer up front: this is one systems researcher’s view, shaped by
my own field and my own mistakes. Take what’s useful. But I’ve taught these three
things for years now, mostly by proximity, the way they’ve always been taught—and
writing them down is my attempt to make the apprenticeship a little less tacit. To
tell a bit more than I thought I could.</p>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="research" /><category term="phd" /><category term="advice" /><summary type="html"><![CDATA[Every craft worth learning, everywhere, for most of human history, has been taught the same way: a beginner stands next to a master and does the work until the work becomes theirs. Blacksmiths, surgeons, painters, chefs, violinists. We have had books for millennia and universities for centuries, and still, when the skill actually matters, we do not hand someone a manual—we put them next to someone who already has it. There is a Tang-dynasty line about this, from Han Yu, that I keep coming back to: 师者，传道授业解惑也. A teacher is one who transmits the Way, imparts the skills, and resolves doubts. Three jobs, written down in the year 802, and they are still the three jobs.]]></summary></entry><entry><title type="html">Good Ideas Are Rare; Taste Is the Search</title><link href="/naizhengtan_blog/posts/good-ideas-are-rare-taste-is-the-search/" rel="alternate" type="text/html" title="Good Ideas Are Rare; Taste Is the Search" /><published>2026-06-21T11:00:00-04:00</published><updated>2026-06-21T11:00:00-04:00</updated><id>/naizhengtan_blog/posts/good-ideas-are-rare-taste-is-the-search</id><content type="html" xml:base="/naizhengtan_blog/posts/good-ideas-are-rare-taste-is-the-search/"><![CDATA[<p>This is the first of three posts on what I think research is made of. The
<a href="/naizhengtan_blog/posts/the-three-pillars-of-research/">overview</a> argued for
three pillars—taste, execution, communication. This one is about taste, the
slipperiest of the three, the one people most want to believe is innate magic. It
isn’t magic. It’s a search.</p>

<p>Here is the fact the whole post stands on: good ideas are rare. Not scarce-ish—
<em>rare</em>, the far tail of an enormous space of things you could think, almost all of
which are wrong, or boring, or already done. Research is the work of finding the
rare good ones in that space. And taste is the search procedure: the thing that
tells you where in the space to look, and the thing that tells you, once you’ve
grabbed something, whether it’s actually any good. Those two moves have names—
<strong>generating</strong> and <strong>judging</strong>—and your taste is just how well you make them.</p>

<p>So taste is the difference between searching that huge space efficiently and
wandering it at random. The encouraging part, which people resist, is that the
search is <em>learnable.</em> “Taste is just subjective” is a comfortable thing to say
because it ends the argument—but it isn’t true, and the proof is in your own
history. Look at work you loved five years ago and wince at now—your taste didn’t
merely <em>change</em>, it got <em>better</em>, and you know it got better, which means there was
a better to move toward. A skill with a direction like that can be trained. You just
walk the direction, deliberately, for years, and your search gets sharper.</p>

<p>I’ll take the two moves in turn, starting with judging, because you can’t generate
toward a target you can’t yet recognize.</p>

<h2 id="judging-the-three-questions">Judging: the three questions</h2>

<p>When a reviewer reads your paper, or a program committee decides your fate, or I
read a draft from a student, the evaluation collapses—almost always—into three
questions:</p>

<ul>
  <li><strong>What’s new?</strong> (novelty)</li>
  <li><strong>Who cares?</strong> (importance)</li>
  <li><strong>Why now?</strong> (timeliness)</li>
</ul>

<p>The first two aren’t my invention; they’re the questions every program committee and
grant panel already asks, in one phrasing or another—what is new here, and if it
works, who benefits. (The funding agencies dress them up as “intellectual merit” and
“broader impacts,” but it’s the same two nerves.) The third, “why now?”, I add
myself, because I’ve come to think timeliness is a real and separate nerve.</p>

<p>The fastest way to feel how load-bearing each question is: knock one out and watch
the work collapse.</p>

<p><strong>No “what’s new.”</strong> Suppose I build a free office suite that is 100% compatible
with Microsoft Office. Who cares? Millions of people. Why now? It saves them money.
Both great answers. But what’s new? Nothing—it’s a clone. So it’s a fine
<em>product</em> and it is not research. Novelty isn’t optional; it’s the thing that makes
it research at all.</p>

<p><strong>No “who cares.”</strong> Suppose I invent a genuinely clever new data structure—never
been done, provably elegant—that speeds up an operation no real system performs.
Novel: yes. Timely: sure. But nobody cares, so it’s a puzzle, not a contribution.
This is the most common failure mode among technically strong students, and the
hardest to feel from the inside, because cleverness is <em>so</em> satisfying that it
masquerades as importance.</p>

<p><strong>No “why now.”</strong> Suppose I propose something both novel and important—say, a
beautiful scheme that needs hardware nobody will have for twenty years, or that
re-solves a problem the field already moved past. The honest reaction is “interesting,
but not yet” or “interesting, but too late.” Timing is a real axis. The same idea
can be a triumph in 2015 and a footnote in 2005, when the field wasn’t ready for it.</p>

<p>Hold onto the structure: a contribution has to survive all three questions, and
most weak work dies on exactly one. When you read a paper, find the one it’s
weakest on. When you write one, defend all three before you defend anything else.</p>

<h2 id="the-two-of-three-rule">The two-of-three rule</h2>

<p>There’s a related heuristic I use for the <em>shape</em> of a strong systems paper. Good
work tends to have three possible sources of merit:</p>

<ol>
  <li>a <strong>hard, real problem</strong>,</li>
  <li>a <strong>novel idea</strong>, and</li>
  <li>a <strong>substantial implementation and/or thorough evaluation</strong>.</li>
</ol>

<p>You rarely get all three at a publishable level, and you almost never <em>need</em> all
three. You need <strong>two</strong>. The combinations are each a recognizable species of paper:</p>

<ul>
  <li><strong>Problem + idea</strong> (light on implementation): a clever, lightweight idea on an
important problem—the kind of paper that’s mostly insight, a few pages, and
changes how people think.</li>
  <li><strong>Problem + implementation</strong> (idea is straightforward): the heavyweight paper.
The idea is “obvious in hindsight,” but making it actually work—at scale, for
real—is a year of hard engineering and careful measurement. Building a real OS
kernel in a memory-safe language is this kind of paper.</li>
  <li><strong>Idea + implementation</strong> (problem is niche): a sophisticated, well-built attack
on a smaller problem. Narrower, but a pleasure, because both the thinking and the
making are excellent.</li>
</ul>

<p>This isn’t a law—it’s folklore, and the honest ancestor of it is Levin and
Redell’s 1983 note on how to write a good systems paper, where they insist a paper
must contain <em>“at least one new idea”</em> and then ask the author what was actually
built and what was actually learned: <em>“If you didn’t learn anything, it is a
reasonable bet that your readers won’t either.”</em> The two-of-three rule is just a
compression of that. Use it as a checklist on your own work: if you can’t honestly
name two of the three, you don’t have a paper yet—you have a start.</p>

<p>One more sharpening, on what “contribution” means, because students get this
backwards. Lines of code are not a contribution. Consider two results: I write 10,000
lines of C++ to make a system 10× faster on one workload, or I write 100 lines to make
it 10% faster <em>for every application that will ever run on it.</em> The second is often the
bigger contribution. Effort is an input; impact is the output; taste is knowing
they’re not the same number.</p>

<h2 id="generating-more-shots-thrown-well">Generating: more shots, thrown well</h2>

<p>Now the harder move. Judging is comparatively easy—you can learn it by reading a
hundred papers with someone who has taste, which is most of what a reading group is
for. Generating is the part that feels like magic, and the search frame is what
dissolves the magic. Remember the setup: a vast space, the good ideas vanishingly
rare in its tail. Two consequences for how you should actually work fall straight
out of it.</p>

<p>The first is just honesty about the odds. You will generate far more bad ideas than
good ones, <em>no matter how good you get</em>—“good” is the tail of a distribution;
that’s what the word means. Producing duds isn’t a sign you’re failing. It’s the
structure of the problem, and it never goes away.</p>

<p>The second is the strategy that follows: <strong>if hits are rare, take more shots.</strong> The
folk version is the old line about having a lot of ideas and throwing the bad ones
away. There’s a research-flavored version too, usually credited to Dean Keith Simonton
as the “equal-odds rule”: the claim that a creator’s number of
great works tends to track their <em>total</em> output, as if the hit rate were roughly
constant. It’s a contested claim, and shadowed by survivorship bias, since we mostly
count the people whose volume <em>did</em> pay off and never see the prolific producers of
pure noise. So don’t read it as a law. Read it as permission: generating a lot of
bad ideas and discarding them fast is normal practice, not failure, and the
researchers with the best ideas are often the ones having the most.</p>
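
<p>To put rough numbers on “take more shots,” here is a minimal sketch. It assumes every
idea is an independent try with the same small hit rate, which is exactly the kind of
simplification the survivorship caveat above should make you distrust. The function name
<code>chance_of_a_hit</code> and the 2% rate are illustrative assumptions, not measurements.</p>

<pre><code class="language-python">def chance_of_a_hit(hit_rate, shots):
    """Probability that at least one of `shots` independent tries succeeds."""
    # Assumption: tries are independent and share one fixed hit rate.
    return 1 - (1 - hit_rate) ** shots


# Hypothetical 2% hit rate: print how the odds move as volume grows.
for shots in (1, 10, 50, 100):
    print(shots, round(chance_of_a_hit(0.02, shots), 2))
</code></pre>

<p>Under those assumptions a single shot almost always misses, while somewhere past a few
dozen shots at least one hit becomes more likely than not. The hit rate never improved;
only the number of tries did.</p>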

<p>But raw quantity is a <em>dumb</em> search—spraying the space at random, which no good
researcher actually does. A good search is directed: you spend your shots where
they’re likely to land. And directed search, however you run it, turns on one
tradeoff you already know by name.</p>

<h2 id="exploration-vs-exploitation">Exploration vs. exploitation</h2>

<p>It’s the <a href="https://en.wikipedia.org/wiki/Multi-armed_bandit">explore/exploit tradeoff</a>:
do you mine the promising vein you’ve already found (exploit), or wander off to look
for a better one (explore)? Exploit too hard and you polish a local hill forever,
publishing increments while the real mountain sits one valley over. Explore too hard
and you start everything and finish nothing. A research career is a long sequence of
this one decision, and taste is largely knowing, this month, which mode you should
be in.</p>
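
<p>For readers who haven’t met the bandit framing, a minimal sketch of one textbook
strategy, epsilon-greedy, shows the tradeoff in miniature. It is an illustration, not a
model of how anyone actually picks research directions: the function name
<code>epsilon_greedy</code>, the payoff rates, and the fixed <code>epsilon</code> are all
assumptions made for the example.</p>

<pre><code class="language-python">import random


def epsilon_greedy(true_rates, epsilon, pulls, seed=0):
    """Run an epsilon-greedy bandit; return (total_reward, pull_counts)."""
    rng = random.Random(seed)
    n = len(true_rates)
    counts = [0] * n        # how many times each direction was tried
    estimates = [0.0] * n   # running mean payoff observed per direction
    total = 0
    for _ in range(pulls):
        # With probability epsilon, explore a random direction;
        # otherwise exploit the best estimate so far (ties go to the first).
        explore = rng.choices((True, False), weights=(epsilon, 1 - epsilon))[0]
        if explore:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda a: estimates[a])
        # Bernoulli payoff: 1 with probability true_rates[arm], else 0.
        rate = true_rates[arm]
        reward = rng.choices((1, 0), weights=(rate, 1 - rate))[0]
        counts[arm] += 1
        # Incremental update of the running mean for this direction.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total += reward
    return total, counts


# Hypothetical setup: a dead first direction next to a rich second one.
print(epsilon_greedy([0.0, 0.9], epsilon=0.0, pulls=100))
print(epsilon_greedy([0.0, 0.9], epsilon=0.1, pulls=100))
</code></pre>

<p>With <code>epsilon</code> at 0 the loop never leaves the first direction it tried, even
with a far better one sitting next to it, which is the local hill. With
<code>epsilon</code> at 1 it samples forever and never cashes in what it has learned.</p>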

<p>The two modes even seem to want different <em>physical conditions</em>. Exploration wants
you <em>away</em>: alone, off the devices, mind unclamped from the immediate. The famous
breakthrough stories all rhyme—the insight that arrives on a walk, in the shower,
stepping onto a bus, never at the desk where you’d been grinding. Treat these as
suggestive, not as data—they’re the stories winners tell, and we never hear from
the equally idle people who got nothing. But there’s a modest real effect underneath,
the one called <em>incubation</em>: step away from a hard problem and some part of you keeps
working it. The instruction is what matters. If you are stuck, frequently the answer
is not more hours at the desk—it’s a walk. Exploitation wants the opposite: the
desk, the screen, the long uninterrupted afternoon of grinding a found idea into
something real. Know which one you need and arrange your day for it.</p>

<h2 id="the-generate-judge-loop-and-your-advisor">The generate-judge loop, and your advisor</h2>

<p>Put the two halves together and you get the actual engine of research: a loop.
Generate an idea, judge it, kill it or keep it, generate again. Fast and merciless.
The whole point of building taste-as-judgment is to make the judge in this loop
sharp and quick, so you can run the loop many times—because, see above, you need
many swings.</p>

<p>Two things make the loop run faster. The first is <strong>writing</strong>: you cannot reliably
judge an idea that’s still only a feeling in your head, and the act of writing it
down is what forces the judgment to get honest. That deserves its own treatment,
and it gets one in the communication post—for now just know that the judging half
of taste runs on a pen. The second is <strong>your advisor</strong>, who is, in this loop, a
faster and more experienced <em>verifier</em>. The
reason you meet with me every week is not for me to hand you ideas. It’s to be a
high-quality judge you can query cheaply, so the loop runs against real taste
before you’ve sunk a year into a bad branch. That is most of what advising <em>is</em>,
and it’s why the apprenticeship model survives: judgment transfers by being used
out loud, over and over, until one day it’s yours and the weekly meeting isn’t needed.</p>

<p>One last thing, the most encouraging thing I know about taste. Here is why beginners
quit: you start with taste <em>ahead</em> of your ability, so everything you make
disappoints you, and the gap is so painful that most people conclude they have no
talent and stop. They’re wrong. That gap is the normal starting condition, and the
only way to close it is to keep producing—a volume of work slowly drags your
ability up to meet your taste. Your taste running ahead of your output isn’t evidence
you’re bad at this. It’s the precondition for getting good. The disappointment is the
pillar working. Keep taking swings.</p>

<p>Step back and notice what all of it—judging, generating, the loop, the years of
reading—is really building: an accurate model of the field inside your own head,
detailed enough that you can feel, from the inside, what’s worth doing and what
isn’t. That’s the first of the three systems you learn to model. Next, a system
that pushes back harder, because it either runs or it doesn’t: the machine.</p>

<p><em>Next: <a href="/naizhengtan_blog/posts/execution-the-idea-is-the-easy-part/">Execution</a>—turning
a judged idea into a real thing that works.</em></p>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="research" /><category term="phd" /><category term="advice" /><category term="taste" /><summary type="html"><![CDATA[This is the first of three posts on what I think research is made of. The overview argued for three pillars—taste, execution, communication. This one is about taste, the slipperiest of the three, the one people most want to believe is innate magic. It isn’t magic. It’s a search.]]></summary></entry><entry><title type="html">Execution: The Idea Is the Easy Part</title><link href="/naizhengtan_blog/posts/execution-the-idea-is-the-easy-part/" rel="alternate" type="text/html" title="Execution: The Idea Is the Easy Part" /><published>2026-06-21T10:00:00-04:00</published><updated>2026-06-21T10:00:00-04:00</updated><id>/naizhengtan_blog/posts/execution-the-idea-is-the-easy-part</id><content type="html" xml:base="/naizhengtan_blog/posts/execution-the-idea-is-the-easy-part/"><![CDATA[<p>This is the second of three posts on the pillars of research. The first was
<a href="/naizhengtan_blog/posts/good-ideas-are-rare-taste-is-the-search/">taste</a>, and it
argued that having a good idea is hard—rare, the tail of a distribution, the slow
fruit of trained judgment. So the title of <em>this</em> post is going to sound like a
contradiction: the idea is the easy part. It isn’t a contradiction. It’s a change
of frame. Conceiving the idea is hard; but once taste has handed you an idea worth a
year, <em>making it true</em> is usually where the whole thing lives or dies—and that is
the part people wave away as “just engineering.” That dismissal is the mistake this
post is about.</p>

<p>Because a great idea with bad execution isn’t a great idea with an asterisk. It’s
nothing. An idea is only a multiplier; it’s worth nothing until something executes
it. Multiply your brilliant idea by zero execution and you know what you get.</p>

<p>The clearest example of the multiplier is the most painful one in our field.
Through the 1970s, Xerox PARC built the graphical interface, Ethernet, the laser
printer, and the Alto that tied them together—a strikingly large fraction of the
modern personal computer. (The mouse came from Engelbart’s lab at SRI; PARC made it
usable.) Xerox cashed in on <em>some</em> of this—laser printing became a real
business—but it fumbled the computer itself: the Alto never shipped as a product
and the 1981 Star flopped. Apple and Microsoft executed on the interface and took
the industry. The hard ideas had been sitting there, invented, for years. Or
Friendster: it had the social network before Facebook and a multi-million-user head
start, and a big part of why it lost was unglamorous engineering—its pages were
notoriously slow, reportedly taking many seconds to load, and the scaling problem
never got fixed. Same idea. Different execution. Different history. Execution is not
the part that comes after the important work. Frequently it <em>is</em> the important work.</p>

<h2 id="three-things-that-arent-the-same-coding-programming-hacking">Three things that aren’t the same: coding, programming, hacking</h2>

<p>The first thing I want a student to be able to do is notice which of these they’re
doing, because the words get used interchangeably and they are not the same
activity.</p>

<p><strong>Coding</strong> is writing the code—the local act of turning a known solution into
syntax. It’s the smallest of the three and the one people overweight, probably
because it’s the most visible.</p>

<p><strong>Programming</strong> is solving a problem from a clean slate—designing the solution,
choosing the structure, deciding what the thing <em>is</em> before it’s anything. This is
where most of the actual thinking lives.</p>

<p><strong>Hacking</strong>—and I mean this as a compliment, in the old sense—is solving a
problem inside a gazillion lines of code you didn’t write. Finding the three places
in a kernel where you need to intervene. Bending an enormous existing system to do
something it was never meant to do. This is, honestly, most of what real systems
research demands, and it’s the skill schools teach least.</p>

<p>The reason to keep these separate in your head is that research <em>almost never
happens at the coding level.</em> If you find yourself stuck, the question “am I stuck
on coding, programming, or hacking?” usually dissolves it. Stuck on coding is a
syntax or API problem—trivial, look it up. Stuck on programming means you haven’t
actually decided what you’re building—go back to the design. Stuck on hacking
means you don’t understand the system you’re inside of yet—go read it. Different
diagnoses, different cures.</p>

<p>There’s a deeper version of this distinction that’s worth internalizing early:
software engineering is really just programming integrated over <em>time.</em> A throwaway
script and a system that ten people will maintain for ten years are not the same kind
of object, even if they do the same thing today. Fred Brooks made this point decades
ago and put a number on it: turning a quick program into a polished <em>product</em> other
people can use is
roughly 3× the work, and turning it into a component of a larger <em>system</em> is
another 3×—so the full “programming systems product” costs about nine times the
garage version. Most of the surprise and pain in a student’s first big project is
discovering that 9×. Know it’s coming.</p>

<h2 id="what-you-need-to-be-able-to-do">What you need to be able to do</h2>

<p>Here’s the concrete checklist I’d hand someone. Over a PhD you should, at least
once:</p>

<ul>
  <li><strong>Master one programming language</strong> completely—not “can write it,” but know it
in your bones, the way you know a spoken language you dream in.</li>
  <li><strong>Hack one large-scale system</strong>—really get inside something huge and
unfamiliar and make it do your bidding.</li>
  <li><strong>Build one system from scratch</strong>, so that you lose the fear of systems. This is
the one that changes people. Once you’ve built a database, a kernel, a compiler
from nothing, no codebase intimidates you again, because you know there’s no
magic in there—just decisions, some good, some you’d make differently.</li>
  <li><strong>Have one good visualization/tooling habit.</strong> When you can <em>see</em> what your
system is doing, debugging stops being archaeology.</li>
</ul>

<p>And in your own area specifically: you need to be able to <strong>prototype fast</strong>, and
you need to <strong>truly understand the layers beneath you</strong>—the OS, the hardware, the
network—well enough that they can’t fool you. The general principle: the hardware
cannot work optimally without help from the programmer, and you cannot give that help
if the layer below you is a black box. Most subtle systems bugs and most surprising
performance results live exactly at
the seam where your mental model of the layer below diverges from what it actually
does. Go down a level. It’s almost always worth it.</p>

<p>On the “don’t fear systems” point, one practical note about reading code, since
diving into huge codebases scares people. You don’t really <em>read</em> code, you <em>decode</em>
it: a piece of code is not literature, it’s a specimen. You don’t read a million-line
system front to back like a novel. You find a thread—one
request, one syscall, one function—and you pull it, and you follow it down, and
you ignore the other 999,000 lines until you need them. Reading code is an active,
surgical act, not a passive one. Once you believe that, large systems stop being
walls and start being mazes, which at least have a path through.</p>

<h2 id="deliberate-practice-not-ten-thousand-hours">Deliberate practice, not ten thousand hours</h2>

<p>People love the “10,000-hour rule.” It’s wrong, or at least badly mangled—there is
nothing magical about ten thousand hours, and Anders Ericsson, the researcher whose work
the rule was built on, spent years saying so. The hours aren’t the mechanism. What builds skill is
<strong>deliberate practice</strong>: working at the edge of your ability, on
something just past what you can comfortably do, with feedback that tells you
specifically what was wrong. Ten thousand hours of comfortable repetition makes you
exactly as good as you were at hour one. An hour of the uncomfortable kind, with
real feedback, moves you.</p>

<p>For us, the uncomfortable kind looks like: writing the part you’re avoiding because
you’re not sure you can; profiling the system you assume you understand and being
wrong; submitting the paper and reading every brutal review instead of flinching
away. The feedback is the whole point—which is, not coincidentally, another
argument for the apprenticeship, because a good advisor is a feedback machine
calibrated to your specific weaknesses.</p>

<h2 id="build-a-system-for-yourself">Build a system for yourself</h2>

<p>Here’s a strange blind spot: systems people almost never build a system for their
own work. Look at what your research life actually is—a stream of inputs arriving
faster than you can process them (papers, ideas, deadlines, threads, half-thoughts),
hard latency requirements, and a tiny leaky cache called your memory. You would
never run a service this way. Yet most students try to hold it all in their heads,
drop things, and—worse—burn the attention they needed for thinking on the
holding. The principle is one sentence: your mind is for <em>having</em> ideas, not
<em>holding</em> them. Get the open loops out of your head and into something you trust, and
treat your own workflow as a system to be engineered, not a fixed trait of your
personality.</p>

<p>Two parts of that system aren’t optional. The first is <strong>note-taking that is
actually thinking</strong>, not transcription. Your notes aren’t a record of the work; done
right, they <em>are</em> the work—the place where the thinking actually happens. Note-taking
done well is generating and judging on paper, the taste loop from the last post
externalized and made durable.</p>

<p>The second is the discipline to actually run the system, which is harder than
picking one. Here’s the line worth tattooing on every first-year: you do not rise to
the level of your goals, you fall to the level of your systems. Everyone wants to do
great research; the wanting is uniform and useless. What differs between
people is the boring daily machinery—when they read, how they capture, whether the
deadline gets respected. So if you adopt a system, give it a fair trial: run it
strictly for at least two weeks before you judge it. Most people quit a good system
in three days because it feels awkward, which is just the cost of any new skill and
tells you nothing.</p>

<h2 id="ship-deadlines-estimation-and-what-to-cut">Ship: deadlines, estimation, and what to cut</h2>

<p>Last, the part that separates research that exists from research that almost
existed: <strong>finishing.</strong> Three things to know.</p>

<p>You are bad at estimating, and not randomly—<em>predictably</em> bad, in one direction.
It’s called the <a href="https://en.wikipedia.org/wiki/Planning_fallacy">planning fallacy</a>:
we systematically underestimate how long our own tasks will take, even when we have
mountains of evidence that we always run long. The fix isn’t optimism or willpower;
it’s taking the <em>outside</em> view—ask how
long things <em>like</em> this have actually taken you before, and trust that number over
the story in your head.</p>

<p>Given that everything takes longer than you think, you will not finish everything,
which means <strong>knowing what to cut is an execution skill, not a failure of one.</strong>
Herbert Simon’s word for the right behavior is <em>satisficing</em>: don’t optimize every
component, get each one good enough and move on. The Pareto split is brutally real
in systems work—something like 20% of the effort gets you 80% of the result.
Find that 20% for each piece, ship it, and spend your remaining time only on the
parts where excellence actually changes the contribution.</p>

<p>And iterate fast. Make it work, then make it right, then make it fast—in that
order, never reversed. Design your whole setup around an immediate connection between
what you change and what you see: the shorter the loop between changing something and
seeing what changed, the more times you can run it, and execution—like idea
generation—rewards more turns of the loop. Tight loop, many iterations, finished
system.</p>

<p>Underneath all of it—the three kinds of building, the layers, the tooling, the
deadlines—execution is one thing: building an accurate enough model of the machine
that it does what you intend instead of what you assumed. The bugs live exactly
where your model and the machine disagree. That’s the second system you learn to
model. The third pushes back hardest of all, because it has a mind of its own:
other people.</p>

<p><em>Next: <a href="/naizhengtan_blog/posts/communication-standing-where-the-other-stands/">Communication</a>—getting
the finished thing out of your head and into everyone else’s.</em></p>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="research" /><category term="phd" /><category term="advice" /><category term="execution" /><summary type="html"><![CDATA[This is the second of three posts on the pillars of research. The first was taste, and it argued that having a good idea is hard—rare, the tail of a distribution, the slow fruit of trained judgment. So the title of this post is going to sound like a contradiction: the idea is the easy part. It isn’t a contradiction. It’s a change of frame. Conceiving the idea is hard; but once taste has handed you an idea worth a year, making it true is usually where the whole thing lives or dies—and that is the part people wave away as “just engineering.” That dismissal is the mistake this post is about.]]></summary></entry><entry><title type="html">Communication: Standing Where the Other Stands</title><link href="/naizhengtan_blog/posts/communication-standing-where-the-other-stands/" rel="alternate" type="text/html" title="Communication: Standing Where the Other Stands" /><published>2026-06-21T09:00:00-04:00</published><updated>2026-06-21T09:00:00-04:00</updated><id>/naizhengtan_blog/posts/communication-standing-where-the-other-stands</id><content type="html" xml:base="/naizhengtan_blog/posts/communication-standing-where-the-other-stands/"><![CDATA[<p>This is the last of three posts on the pillars of research, after
<a href="/naizhengtan_blog/posts/good-ideas-are-rare-taste-is-the-search/">taste</a> and
<a href="/naizhengtan_blog/posts/execution-the-idea-is-the-easy-part/">execution</a>. You
have judged a good idea and built a real thing. It is sitting on your hard drive,
true and working and invisible. Communication is the pillar that decides whether it
ever becomes part of anyone else’s thinking—and it’s the one students neglect
most, because it feels like packaging applied to the real work after the real work
is done. That’s exactly backwards, and showing you why is most of this post.</p>

<p>The title is the whole method, borrowed from a Chinese phrase I keep returning to:
<em>易地而处</em>—stand where the other stands; put yourself in their place. Every rule
below is a special case of it. Writing, speaking, and the daily honesty of working
with people all come down to the same move: stop modeling your own head and start
modeling theirs.</p>

<h2 id="writing-is-not-reporting-writing-is-thinking">Writing is not reporting. Writing is thinking.</h2>

<p>Start with the deepest misconception, because it changes how you should work, not
just how you should write. Most people think the sequence is: have the idea, do the
research, then write it up. Write<em>-up</em>—as if writing were transcription, the
clerical step at the end.</p>

<p>It isn’t. Writing is how you find out whether you actually have an idea. You know
the feeling of a thought that seems complete and powerful in your head—and then
you sit down to write it and it dissolves into mush, because it was never as
finished as it felt. Ideas can feel complete; it’s only when you try to put them
into words that you discover they’re not. And the flip side, the real payoff: half
the ideas that end up in a piece are ones you thought of <em>while writing it.</em> Writing
doesn’t just expose the holes; it generates the patches. Leslie Lamport’s version is
the one I quote to students: <em>if you’re thinking without writing, you only think
you’re thinking.</em></p>

<p>Simon Peyton Jones turns this into a method, and it’s the single most useful piece
of advice I can give a new researcher. The naive model is <em>idea → do research →
write paper.</em> His model is <em>idea → write the paper → do the research.</em> Start writing
the paper almost immediately, while the work is still half-formed, because <em>“writing
the paper is how you develop the idea in the first place.”</em> The paper isn’t the
report of the research. The paper is an instrument <em>for doing</em> the research—it
forces you to be clear, it crystallizes what you don’t understand, and it shows you,
early and cheaply, which parts are actually hard. Don’t wait until you “have
something to write up.” Write to find out if you do.</p>

<h2 id="writing-mechanically">Writing, mechanically</h2>

<p>Once you accept writing as thinking, the craft rules follow from <em>易地而处</em>—from
relentlessly taking the reader’s side.</p>

<p><strong>Your reader doesn’t care what you know.</strong> Writing isn’t about communicating your
ideas to your readers; it’s about <em>changing</em> their ideas. The reader is not asking
“why do you think that?”—they’re asking <em>“why should I think that?”</em> They don’t owe
you a reading; you owe them a reason to keep going. Every sentence is spending their
patience, and you’d better be buying something with it.</p>

<p><strong>One idea.</strong> Peyton Jones again: a paper should have a single “ping”—one clear,
sharp point. <em>“You want to infect the mind of your reader with your idea, like a
virus.”</em> A virus carries one payload. If your paper has five contributions, your
reader will remember none of them; if it has one, sharp and well-aimed, they’ll
carry it out of the room. Decide what the one thing is. You may not know at the
start—but you must know by the end.</p>

<p><strong>Don’t make them walk your path.</strong> The single most common flaw in student drafts:
recounting the project in the order it happened. <em>“Do not recapitulate your personal
journey of discovery,”</em> Peyton Jones says. <em>“This route may be soaked with your
blood, but that is not interesting to the reader.”</em> They don’t want the maze you
wandered—they want the straight road to the idea, the one you can only draw <em>after</em>
you’ve escaped. Your suffering is not structure.</p>

<p>And a concrete process, the one I actually use: <strong>flow → bullets → prose → paper.</strong>
First the <em>flow</em>—the logical skeleton, the argument’s shape, before any sentences.
Then <em>bullets</em> fleshing out each beat. Only then <em>prose</em>. Only then the <em>polish</em> that turns prose into a paper. The
mistake is starting at prose—writing beautiful paragraphs about points that, it
turns out, are in the wrong order or shouldn’t exist. Get the skeleton right while
it’s cheap to move bones around. Sentences are expensive to write and emotionally
expensive to delete, so don’t write them until the structure is settled.</p>

<h2 id="talks-you-have-two-minutes-and-one-job">Talks: you have two minutes and one job</h2>

<p>A paper and a talk are different instruments and people keep confusing them. A
paper is for the record, read alone, re-readable. A talk is live, linear,
un-rewindable, and the audience is exhausted and skeptical and checking their phone.
<em>易地而处</em>: design for <em>that</em> person, not for an attentive reader who doesn’t exist.</p>

<p>The purpose of a talk is not to convey your results. Peyton Jones is blunt: the goal
is <em>“to give your audience an intuitive feel for your idea”</em> and <em>“to make them
eager to read your paper”</em>—not to present every detail, and emphatically <em>not “to
impress your audience with your brainpower.”</em> His budget for a conference talk is
worth memorizing: <em>“Motivation (20%) + your key idea (80%). Nothing else.”</em> You
cannot transmit the paper in twenty minutes. Don’t try. Make them <em>want</em> the paper.</p>

<p>Patrick Winston gave a famous talk at MIT, <em>How to Speak</em>, for some forty years, and two of his rules I
pass on constantly. <strong>Start with a promise</strong>—in the first minute, tell them
exactly what they’ll know or be able to do by the end; give them a reason to stay.
And <strong>don’t open with a joke</strong>—they’re still settling in, still adjusting to your
voice, not ready for it; you’ll get the laugh later when you’ve earned the room. (He
also held that there is <em>“a special circle in hell for those who use laser
pointers”</em>—a conviction I have come to share.)</p>

<p>The deepest enemy of a good talk has a name: the <strong>curse of knowledge.</strong> Once you
know something, you cannot imagine not knowing it, and so you skip the very steps
your audience needs. There’s a much-loved illustration: people tap out a famous song
on a table—just the rhythm—and badly overestimate how many listeners can name it,
because inside their own heads they hear the whole song, melody and all, while the
listeners get only the bare taps. That gap—between the song in your head and the
bare taps the audience actually receives—is every talk you’ve sat through and not
understood. Your expertise is the tapping. Fight it by, again, <em>易地而处</em>: sit in your
own audience’s chair and ask what they actually have in their heads right now, which
is almost nothing of what’s in yours.</p>

<p>One underrated, concrete thing: <strong>delivery, including your voice.</strong> Students obsess
over slides and ignore how they sound, but a talk is <em>heard.</em> I’ll add the necessary
caveat, since it’s everywhere: the famous “93% of communication is nonverbal”
statistic is a myth—a real study, badly generalized, that only ever applied to
mismatched emotional signals. Don’t believe the number. <em>Do</em> believe the underlying
point: how you say it carries real weight, and a monotone will bury a good idea.
Practice out loud. Hear yourself.</p>

<h2 id="the-quiet-kind-talking-to-the-people-you-work-with">The quiet kind: talking to the people you work with</h2>

<p>The flashy communication is papers and talks. The kind that determines whether your
projects live or die is the daily one—how you talk to your collaborators and your
advisor. Two principles.</p>

<p><strong>Be honest, fast, especially when it’s bad.</strong> When something goes wrong—a result
doesn’t replicate, you’ve fallen behind, you broke the build—the instinct is to
hide it until you’ve fixed it. That instinct is the single most expensive habit in
research. Tell me immediately. Yes, we might be frustrated for a moment; that
passes, and a problem surfaced early is cheap while a problem hidden for a month can
be fatal. Toyota built the best manufacturing system in the world partly on one
idea: any worker on the line can pull a cord and stop <em>everything</em> the instant they
see a defect—and when they pull it, a leader comes <em>to help,</em> not to punish.
Pulling the cord is the heroism, not the failure. Build that cord into how we work.
Pull it early.</p>

<p>This only functions on a foundation with a name: <strong>psychological safety</strong>—the
shared sense that the team is safe for taking interpersonal risks, that you can admit
a mistake or ask a dumb question without being punished for it. When teams are
studied for what makes them effective, this keeps coming out near the top, ahead of
raw talent or experience. That’s most of what I mean when I tell a student <em>I’m on
your side.</em> I’m not the examiner waiting for you to slip. We are pointed at the same
problem, and the work goes faster if you can tell me the truth without bracing for
impact. The aim is candor <em>with</em> care: challenge people directly precisely because
you’re on their side. Care without challenge is useless flattery; challenge without
care is just cruelty. Both at once is how good groups actually talk.</p>

<p><strong>Talk outward, too.</strong> Three rings, all worth your time. Talk to people <em>outside</em>
your field—including, now, to AI, which is an endlessly patient outsider to every
field at once; explaining your problem to something that doesn’t share your
assumptions is one of the fastest ways to find them. Talk to people in your broad
community, systems people generally, who share your language. And talk to the
handful working your exact problem—track at least one line of work closely enough
that, when people in the area think of it, they think of you. That last one is how a
research identity actually forms: not by announcing it, but by being reliably
present in one conversation until it becomes yours.</p>

<h2 id="the-pillar-and-the-point-of-all-three">The pillar, and the point of all three</h2>

<p>Communication is the pillar that makes the other two count. Taste with no
communication is private good judgment nobody benefits from. Execution with no
communication is a working system nobody adopts. The work becomes <em>real</em>—enters
the shared body of what the field knows—only when it crosses from your head into
others’. And every technique for making that crossing reduces to the same move:
<em>易地而处</em>, leave your own head and stand in theirs.</p>

<p>Which is, when you zoom out, what all three pillars have been about—the promise I
made at the very start of this series, now come due. Taste is modeling the field
well enough to know what it needs. Execution is modeling the machine well enough to
make it obey. Communication is modeling other people well enough to reach them.
Research is, in the end, three kinds of empathy—for the problem, for the system,
and for the people—three different targets, one underlying skill: getting outside
your own head and being right about what’s there instead.</p>

<p>That’s why I opened with a Tang-dynasty teacher and I’m closing with a Tang-dynasty
phrase, and why they turn out to be the same idea. 传道, 授业, 解惑—transmit the
Way, impart the skills, resolve doubts—are three faces of 易地而处, standing where
the other stands: the field, the machine, the person across the table. It’s the Way
I’m trying to transmit, and it can’t really be transmitted on a page. The rest is
doing it next to someone, year after year, until it’s yours.</p>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="research" /><category term="phd" /><category term="advice" /><category term="communication" /><summary type="html"><![CDATA[This is the last of three posts on the pillars of research, after taste and execution. You have judged a good idea and built a real thing. It is sitting on your hard drive, true and working and invisible. Communication is the pillar that decides whether it ever becomes part of anyone else’s thinking—and it’s the one students neglect most, because it feels like packaging applied to the real work after the real work is done. That’s exactly backwards, and showing you why is most of this post.]]></summary></entry><entry><title type="html">Generation Got Cheap. Judgment Didn’t.</title><link href="/naizhengtan_blog/posts/generation-got-cheap-judgment-didnt/" rel="alternate" type="text/html" title="Generation Got Cheap. Judgment Didn’t." /><published>2026-06-19T00:00:00-04:00</published><updated>2026-06-19T00:00:00-04:00</updated><id>/naizhengtan_blog/posts/generation-got-cheap-judgment-didnt</id><content type="html" xml:base="/naizhengtan_blog/posts/generation-got-cheap-judgment-didnt/"><![CDATA[<p>My field, computer systems, has a creed: <em>talk is cheap; show me the code.</em> For thirty years it was
right. Ideas were easy to narrate and hard to build, so we trusted the people who
built. Doing was the proof; talk was suspicion. That creed is exactly what AI is
now inverting—and it is why I have started blogging again.</p>

<h2 id="the-two-reasons-i-didnt">The two reasons I didn’t</h2>

<p>I avoided blogging for most of my career, for two reasons that were both true.
Writing well is expensive: not one article, but the years of apprenticeship
behind it, and then the hours of drafting and polishing per piece. And my
culture rewarded doing over saying—the person who quietly ships beats the one
who narrates brilliant ideas and finishes nothing. Underneath, those were the
same reason: words were costly to produce and cheap to fake, so words were a bad
signal.</p>

<h2 id="what-changed">What changed</h2>

<p>AI broke the first reason outright. The draft-check-polish loop that used to cost
me hours is now something I hand off and supervise—the post you are reading was
built that way, and so was the one beside it, figures and all. My job in that loop
was not the prose. It was deciding what the argument should be, and killing the
three versions that were wrong.</p>

<p>Execution is going the same way, slowly and only in parts. An agent will
implement and test a well-specified idea while you watch. It will not, yet, do
the part of my own research that is actually hard: the long tail where the code
runs, passes the happy path, and is wrong in the way that matters. (I spend my
days in that gap.) Cheap for the routine; still dear at the frontier. But the
direction isn’t in doubt.</p>

<h2 id="ideas-got-cheap-too">Ideas got cheap too</h2>

<p>Here is the part the optimistic version of this essay skips. If AI makes prose
and code cheap, it makes <em>ideas</em> cheap too—a model will brainstorm a hundred
directions before you finish your coffee. So “have ideas” cannot be what saves
us. Idea-generation deflates along with everything else.</p>

<p>What doesn’t deflate is judgment. Which of the hundred directions is worth a
week. Which clean result is actually a measurement artifact. When to kill your
own favorite. The faucet is open for everyone now; what stays rare is the taste
to know which drops to keep. Generation got cheap. Selection didn’t.</p>

<h2 id="why-that-means-writing">Why that means writing</h2>

<p>And judgment has a visibility problem. It used to ride along with the work: ship
something good and the quality vouched for the taste behind it. When the work is
cheap to produce, the work stops vouching for anything—and no one can read your
mind. The only way to show judgment is to externalize it: to argue, in public,
for this and not that. That is what a blog is. Not talk instead of doing; the
record of the choices the doing no longer reveals. It is this blog’s motto, and I
mean it now as an argument: <em>code is cheap; show me the thoughts.</em></p>

<h2 id="when-the-idea-becomes-runnable">When the idea becomes runnable</h2>

<p>A thought you write down need not only be read. For an idea that carries a cheap
test—a benchmark, a small system, a clean experiment—I can already picture
handing a blog post to an agent that builds it and reports back: <em>works,</em> or
<em>doesn’t, and here is why.</em> Not every idea; the ones whose proof is expensive are exactly the
ones that stay human. But for the rest, a written idea stops being a promissory
note and edges toward something runnable. The oldest line in my field—<em>show me
the code</em>—starts to fold into <em>show me the thought,</em> because the thought,
increasingly, runs.</p>

<h2 id="the-part-thats-still-mine">The part that’s still mine</h2>

<p>So I am blogging again. Not to talk instead of build, but because building no
longer shows what I think. The labor is getting cheap. The judgment is not. A
blog is where I keep the part of the work that is still mine.</p>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="meta" /><category term="ai" /><category term="writing" /><summary type="html"><![CDATA[My field, computer systems, has a creed: talk is cheap; show me the code. For thirty years it was right. Ideas were easy to narrate and hard to build, so we trusted the people who built. Doing was the proof; talk was suspicion. That creed is exactly what AI is now inverting—and it is why I have started blogging again.]]></summary></entry><entry><title type="html">Correctness Without a Reference</title><link href="/naizhengtan_blog/posts/ai-correctness-without-a-reference/" rel="alternate" type="text/html" title="Correctness Without a Reference" /><published>2026-06-13T00:00:00-04:00</published><updated>2026-06-13T00:00:00-04:00</updated><id>/naizhengtan_blog/posts/ai-correctness-without-a-reference</id><content type="html" xml:base="/naizhengtan_blog/posts/ai-correctness-without-a-reference/"><![CDATA[<p>For as long as we have verified software, a specification has been something you
can check. Given an output, you can decide whether it is correct. Sort a list:
is the result ordered, and a permutation of the input? Two questions, both
decidable.
Most of what I have worked on stands on this:
my system <a href="https://www.usenix.org/conference/osdi20/presentation/tan">Cobra</a> verifies
whether a database served its transactions serializably;
our <a href="https://naizhengtan.github.io/doc/papers/efficient17tan.pdf">Orochi</a> project verifies
whether an untrusted server returned what the real computation would have.
Different systems, one bedrock: correctness is a property you can check.</p>
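
<p>To make “checkable” concrete, here is a minimal Python sketch of the sorting specification above. The name <code>is_correct_sort</code> is made up for illustration, and the code is not taken from Cobra or Orochi; it only shows that both questions can be decided from the input and the output alone, with no trusted sorter to compare against.</p>

```python
import operator
from collections import Counter


def is_correct_sort(inp, out):
    """Decide the two-question sorting specification for one output.

    Illustrative helper, not from any library. Assumes the elements are
    hashable (for Counter) and mutually comparable (for operator.le).
    """
    out = list(out)
    # Question 1: is the result ordered? Every element must be
    # less than or equal to its successor.
    ordered = all(map(operator.le, out, out[1:]))
    # Question 2: is it a permutation of the input? Same elements,
    # same multiplicities.
    permutation = Counter(inp) == Counter(out)
    return ordered and permutation
```

<p>Under these assumptions, <code>is_correct_sort([3, 1, 2], [1, 2, 3])</code> holds, while the output <code>[1, 2, 2]</code> fails the permutation question and <code>[2, 1, 3]</code> fails the ordering question.</p>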

<p>Modern AI removes the bedrock. Ask a model to summarize four hundred pages, draft
the brief, or run the agent that books your travel. Look at the output and try to
decide—correct, or not? Nothing answers. The specification did not get harder
to check. It stopped being a checkable object at all.</p>

<p>The “canonical solution” is to find a reference.
Pick a trusted implementation and call the
output correct when it matches—re-execute the computation, diff against a gold
version.
It is, for example, what GPU kernel developers do, and <a href="https://naizhengtan.github.io/doc/papers/nnsmith23liu.pdf">how deep-learning compilers are tested</a>.
But equality is probably the wrong relation.
Sum a million floats in a different order and the
last bits change; the answer is still correct, yet a reference check rejects it.
And if the reference itself is wrong, you certify its bug as truth.
We do not want equality to a reference.
We want equivalence—and we have to say which differences are allowed.</p>
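
<p>A tiny illustration of the float-sum point, as a sketch rather than a real kernel test.
Three values stand in for the million, and the tolerance <code>rel_tol=1e-9</code> is an
arbitrary choice of mine. The two summation orders disagree in the last bits, so a strict
reference check rejects a correct answer, while a tolerance-based comparison accepts it.</p>

```python
import math


def sum_in_order(xs):
    """Left-to-right floating-point sum; an explicit loop keeps the order exact."""
    total = 0.0
    for x in xs:
        total += x
    return total


values = [0.1, 0.2, 0.3]

reference = sum_in_order(values)             # plays the role of the "gold" result
candidate = sum_in_order(reversed(values))   # same numbers, different order

strict_match = reference == candidate                           # equality to a reference
tolerant_match = math.isclose(reference, candidate, rel_tol=1e-9)  # equivalence up to a tolerance

print(reference, candidate, strict_match, tolerant_match)
```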

<h2 id="physics-settled-this-first">Physics settled this first</h2>

<p>Sit in a sealed box with no windows and drop a ball. It falls. Are you on Earth,
or in a rocket accelerating at one gravity? No measurement inside can tell.
Einstein refused to ask which one is “real”: if no observation distinguishes
them, they are the same. Sameness is not absolute. It is sameness with respect to
what you can observe.</p>

<p><img src="/naizhengtan_blog/assets/img/topic2/fig1.svg" alt="A sealed box at rest on Earth and a sealed box accelerating through deep space at one gravity: from inside, no experiment can tell them apart." /></p>

<p>Computer science knows this idea as observational equivalence: two systems are the same if no observer we allow can tell them apart.
The physics only makes it vivid: fix what you observe, and you fix what counts as correct.</p>

<h2 id="the-definition-for-ai-correctness">The definition for “AI correctness”</h2>

<p>So change the question. Not “did the machine produce the right answer?” but: could
a legitimate run of the machine have produced what I see?</p>

<blockquote>
  <p>correct  ⇔  there exists a legitimate run <em>e</em> with observe(<em>e</em>) = the outcome.</p>
</blockquote>

<p>An output is correct if some legitimate execution explains it. To call it wrong is
the harder claim: that no legitimate run could have produced it. The definition
turns on three choices. What makes a run <em>legitimate</em>—this is the specification,
under its true name. What makes two outcomes <em>equal</em>. And what we <em>observe</em>. The
set of legitimate runs encodes exactly the differences that do not matter: reorder the float
sum, fine; return 7 where every legitimate run yields 3, not fine.</p>
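
<p>The definition can be written down almost literally. The sketch below is only that:
<code>find_witness</code> and its parameters are hypothetical names, and it assumes the set of
legitimate runs is small enough to enumerate, which is rarely true in practice. The three
choices appear as three arguments: the legitimate runs, the observation function, and the
equality on outcomes.</p>

```python
import operator
from functools import reduce
from itertools import permutations


def find_witness(outcome, legitimate_runs, observe, equal=operator.eq):
    """Return a legitimate run that explains the outcome, or None if none does."""
    for run in legitimate_runs:
        if equal(observe(run), outcome):
            return run
    return None


# Illustration: a run is one summation order over the same three floats.
values = [0.1, 0.2, 0.3]
runs = list(permutations(values))


def observe(order):
    # Left-to-right sum in exactly the given order.
    return reduce(operator.add, order, 0.0)


explained_a = find_witness(observe((0.1, 0.2, 0.3)), runs, observe)  # some order explains it
explained_b = find_witness(observe((0.3, 0.2, 0.1)), runs, observe)  # a different order explains this one
unexplained = find_witness(7.0, runs, observe)                       # no order sums to 7
```

<p>Accepting returns as soon as one witness is found; rejecting has to exhaust every run.</p>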

<p>If this feels familiar, it should.
It echoes several established concepts in computer science.
Take <a href="https://dl.acm.org/doi/10.1145/322154.322158">serializability</a> as an example. A database
history is correct precisely when there exists a serial order of its transactions consistent
with the reads and writes we observed, and database checkers like <a href="https://www.usenix.org/conference/osdi20/presentation/tan">Cobra</a>
spend their whole effort searching for that one legitimate order.
Serializability was <em>existential equivalence</em> all along.
We had not named the pattern.</p>
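
<p>Here is the serializability instance as a brute-force sketch. It is not how Cobra works;
it tries every serial order, which is exponential in the number of transactions. The history
encoding (operations as <code>("r" | "w", key, value)</code> tuples) and the function names are
assumptions made for this illustration.</p>

```python
from itertools import permutations


def replay_matches(order, txns, initial):
    """Replay transactions serially; every read must see the current value."""
    state = dict(initial)
    for tid in order:
        for op, key, value in txns[tid]:
            if op == "w":
                state[key] = value
            elif state.get(key) != value:  # observed read disagrees with this order
                return False
    return True


def find_serial_order(txns, initial=None):
    """Return one serial order that explains the observed history, or None."""
    for order in permutations(txns):
        if replay_matches(order, txns, initial or {}):
            return order
    return None


start = {"x": 0, "y": 0}

# T2 read T1's write, and T3 read both: one serial order explains everything.
serializable = {
    "T1": [("w", "x", 1)],
    "T2": [("r", "x", 1), ("w", "y", 2)],
    "T3": [("r", "y", 2), ("r", "x", 1)],
}

# Each transaction read the other's write: no serial order can explain that.
cyclic = {
    "T1": [("r", "y", 1), ("w", "x", 1)],
    "T2": [("r", "x", 1), ("w", "y", 1)],
}

witness = find_serial_order(serializable, start)
no_witness = find_serial_order(cyclic, start)
```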

<h2 id="why-this-is-useful">Why this is useful</h2>

<p>The pattern is what excites me, because it does not break as the systems get
wilder. You climb from a chip to an agent by turning two dials—loosen what
counts as legitimate, coarsen what you observe—and the sentence still means
something. On a GPU kernel, the legitimate runs are the valid summation orders and
you observe the numbers, up to ε. In LLM inference, they are the samplings
the decoding policy allows and you observe the text (for an example, see our paper on the <a href="https://naizhengtan.github.io/doc/papers/jo26shuyi.pdf">Jailbreak Oracle Problem</a>).
In an agent, they are the
allowed interleavings of tool calls and the world, and you observe the side
effects—the flight booked, the calendar updated.</p>
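
<p>The two dials can be shown as arguments to one function. This is a toy sketch under heavy
assumptions: the tool names, the <code>EFFECT</code> table, and the tolerance are all invented
for illustration, and real agents have far more interleavings than two calls allow. The same
<code>explained</code> check is used at the kernel altitude and at the agent altitude; only
the legitimate runs and the observation change.</p>

```python
import math
import operator
from functools import reduce
from itertools import permutations


def explained(outcome, legitimate_runs, observe, equal):
    """True if some legitimate run produces an observation equal to the outcome."""
    return any(equal(observe(run), outcome) for run in legitimate_runs)


# Kernel altitude: runs are summation orders, observation is the number, up to a tolerance.
values = [0.1, 0.2, 0.3]
kernel_runs = list(permutations(values))


def kernel_observe(order):
    return reduce(operator.add, order, 0.0)


def kernel_equal(a, b):
    return math.isclose(a, b, rel_tol=1e-9)


# Agent altitude: runs are orderings of tool calls, observation is the set of side effects.
EFFECT = {"book_flight": "flight booked", "update_calendar": "calendar updated"}
agent_runs = list(permutations(EFFECT))


def agent_observe(trace):
    # Coarsened observation: which effects happened, not in which order.
    return frozenset(EFFECT[call] for call in trace)


kernel_ok = explained(0.6, kernel_runs, kernel_observe, kernel_equal)
kernel_bad = explained(7.0, kernel_runs, kernel_observe, kernel_equal)
agent_ok = explained(frozenset({"flight booked", "calendar updated"}),
                     agent_runs, agent_observe, operator.eq)
agent_bad = explained(frozenset({"flight booked"}),
                      agent_runs, agent_observe, operator.eq)
```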

<p><img src="/naizhengtan_blog/assets/img/topic2/fig2.svg" alt="Existential equivalence at three altitudes---tensor, token, trace. The same question; climbing loosens legitimacy and coarsens observation." /></p>

<p>Same definition, three altitudes.
Naming it pays off twice.
The specification becomes a single object, the legitimacy predicate;
every dispute about whether an AI is “correct” turns into a concrete dispute about which runs we are willing to call legitimate.
And it shows where the work is hard, just by counting.
Exhibiting one legitimate run is enough to accept; ruling out all of them is what it takes to reject.</p>

<p>This says only that the machine ran legitimately—not that its output was wise.
Whether a result is helpful, or harmless, is a different question, and a later post.</p>

<h2 id="conclusion">Conclusion</h2>

<p>Existential equivalence is one definition that travels: an output is correct when
some legitimate run explains what we observe—whether that run is a summation
order, a sampling path, or an agent’s trace. It does not make specifying AI easy.
It tells us where the difficulty lives: in the legitimacy predicate, and in what
we choose to observe. That is a smaller, sharper question than “is the AI right,”
and a better place to start.</p>

<p>Every correctness question is a windowless box.
The only decision that matters is what you let yourself observe.
Make it honestly, and correctness stops being a verdict you pronounce.
It becomes an explanation the world can offer—or cannot.</p>

<h2 id="references">References</h2>

<ul>
  <li>Cheng Tan, Changgeng Zhao, Shuai Mu, and Michael Walfish.
<a href="https://www.usenix.org/conference/osdi20/presentation/tan">Cobra: Making Transactional Key-Value Stores Verifiably Serializable</a>.
OSDI 2020.</li>
  <li>Cheng Tan, Lingfan Yu, Joshua B. Leners, and Michael Walfish.
<a href="https://naizhengtan.github.io/doc/papers/efficient17tan.pdf">The Efficient Server Audit Problem, Deduplicated Re-execution, and the Web</a>.
SOSP 2017.</li>
  <li>Christos H. Papadimitriou.
<a href="https://dl.acm.org/doi/10.1145/322154.322158">The Serializability of Concurrent Database Updates</a>.
Journal of the ACM 26(4), 1979.</li>
  <li>Jiawei Liu, Jinkun Lin, Fabian Ruffy, Cheng Tan, Jinyang Li, Aurojit Panda, and Lingming Zhang.
<a href="https://naizhengtan.github.io/doc/papers/nnsmith23liu.pdf">NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers</a>.
ASPLOS 2023.</li>
  <li>Shuyi Lin, Anshuman Suri, Alina Oprea, and Cheng Tan.
<a href="https://naizhengtan.github.io/doc/papers/jo26shuyi.pdf">Toward Principled LLM Safety Testing: Solving the Jailbreak Oracle Problem</a>.
MLSys 2026.</li>
</ul>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="verification" /><category term="correctness" /><category term="ai" /><summary type="html"><![CDATA[For as long as we have verified software, a specification has been something you can check. Given an output, you can decide whether it is correct. Sort a list: is the result ordered, and a permutation of the input? Two questions, both decidable. Most of what I have worked on stands on this: my system Cobra verifies whether a database served its transactions serializably; our Orochi project verifies whether an untrusted server returned what the real computation would have. Different systems, one bedrock: correctness is a property you can check.]]></summary></entry><entry><title type="html">Hello, and what this is</title><link href="/naizhengtan_blog/posts/hello/" rel="alternate" type="text/html" title="Hello, and what this is" /><published>2026-06-12T00:00:00-04:00</published><updated>2026-06-12T00:00:00-04:00</updated><id>/naizhengtan_blog/posts/hello</id><content type="html" xml:base="/naizhengtan_blog/posts/hello/"><![CDATA[<p>A paper is the polished end of a long, messy process. By the time it ships,
the doubts are sanded off and the dead ends deleted—good for the reader,
bad for the record. The most useful part, <em>how you actually got there</em>, is
exactly what gets thrown away.</p>

<p>And now we face the AI era, staring straight at it: much of what we stood on,
and believed deeply, may no longer be true. When the artifact gets cheap, the
thinking is what’s left.</p>

<p>So this is the other thing—a place for the rough cut. Notes on systems,
verification, and faithful LLM systems. Arguments with myself. Ideas that
aren’t papers and may never be.</p>

<blockquote>
  <p>Code is cheap. Show me the thoughts.</p>
</blockquote>

<p>—Cheng Tan</p>]]></content><author><name>Cheng Tan</name><email>c.tan@northeastern.edu</email></author><category term="meta" /><summary type="html"><![CDATA[A paper is the polished end of a long, messy process. By the time it ships, the doubts are sanded off and the dead ends deleted—good for the reader, bad for the record. The most useful part, how you actually got there, is exactly what gets thrown away.]]></summary></entry></feed>