Agent-shaped Tasks

I’ve recently been trying to develop intuition about when to use agents. One framing that has helped me is to think of an agent as a computational unit that takes you from some initial state to a desired end state.¹

This framing, though abstract, already lets us say some useful things about what kinds of tasks are a good fit for agents, and how to make effective use of them.

What does “desired” look like?

First, we need some way to define what a desired end state even looks like. For certain tasks, we can do this exactly. For others, we can only do it heuristically.

Software engineering contains both kinds of tasks.

Some coding tasks admit precise pass/fail verification. The end state either satisfies an evaluator or it does not:

Fix a failing test case
Fix a failing CI job
Deploy an app
Make this file compile
Write a function that returns a valid response from an API endpoint
Write code that satisfies the linter

Other tasks admit only heuristic verification:

Write code that implements a feature request
Create a modern-looking frontend
Write a PR that fixes a bug
Refactor a messy codebase

Of course, the fact that an end state can’t be evaluated exactly doesn’t make the task hopeless. Often we have a very good idea of what the end state should look like.

We might do test-driven development and build a robust set of functional test cases that sufficiently constrain the shape of the desired end state. We could have a style guide, or a design system. Or it might just come down to human taste and judgement.

Whatever the case may be, we have an approximate notion of what the desired end state should be, even if we can’t express it precisely.

What does this mean for agents?

First, note the importance of task framing. Verifying that a test case passes is exact only with respect to that narrow predicate; an agent can make a test pass without fixing the underlying bug. As a general principle, the narrower the predicate, the easier it is to satisfy without solving the actual problem.
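Here is a toy sketch in Python that makes the narrow-predicate problem concrete. The function names and the February example are hypothetical and chosen only for illustration. A buggy implementation ignores the Gregorian century rule. A “gamed” fix special-cases the one failing input, and a real fix applies the rule. The single failing test can’t tell the last two apart. A broader check against the standard library’s `calendar.monthrange` can.

```python
import calendar

# Hypothetical example: the "bug" is ignoring the Gregorian century rule.
def days_in_february_buggy(year: int) -> int:
    return 29 if year % 4 == 0 else 28

# A "fix" that satisfies the one failing test without fixing the bug.
def days_in_february_gamed(year: int) -> int:
    if year == 1900:
        return 28
    return 29 if year % 4 == 0 else 28

# The actual fix: divisible by 4, except centuries not divisible by 400.
def days_in_february_fixed(year: int) -> int:
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    return 29 if leap else 28

def narrow_check(fn) -> bool:
    """The original failing test: a single input."""
    return fn(1900) == 28

def broad_check(fn) -> bool:
    """A wider predicate: agree with the standard library over four centuries."""
    return all(fn(y) == calendar.monthrange(y, 2)[1] for y in range(1600, 2401))

for fn in (days_in_february_buggy, days_in_february_gamed, days_in_february_fixed):
    print(f"{fn.__name__}: narrow={narrow_check(fn)}, broad={broad_check(fn)}")
```

The gamed version passes the narrow check but fails the broad one, for instance at 1700. Widening the predicate is what separates a real fix from one that only satisfies the test.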

Second, we should give agents a way to verify their own work: an evaluator independent of the agent itself. This is known as closing the agentic loop.² Anyone who’s been burned by pure vibe coding has internalized this lesson.
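Below is a minimal sketch of a closed loop, under two stated assumptions. First, the “agent” is a hypothetical `ScriptedAgent` that replays canned attempts instead of calling a model. Second, the evaluator is Python’s built-in `compile()`, standing in for the “make this file compile” task from earlier. What matters is the shape: the evaluator is separate from the proposer, and its feedback is passed back in on the next turn.

```python
from typing import Callable, Optional

def compiles(source: str) -> Optional[str]:
    """Independent evaluator: None if the source compiles, else feedback."""
    try:
        compile(source, "<candidate>", "exec")
    except SyntaxError as err:
        return f"SyntaxError on line {err.lineno}: {err.msg}"
    return None

class ScriptedAgent:
    """Hypothetical stand-in for an LLM: replays canned attempts in order
    and records the feedback it was given before each one."""
    def __init__(self, attempts: list[str]):
        self.attempts = list(attempts)
        self.feedback_seen: list[Optional[str]] = []

    def __call__(self, feedback: Optional[str]) -> str:
        self.feedback_seen.append(feedback)
        return self.attempts.pop(0)

def closed_loop(propose: Callable[[Optional[str]], str],
                evaluate: Callable[[str], Optional[str]],
                max_turns: int = 5) -> tuple[Optional[str], int]:
    """Propose, evaluate, feed the verdict back; stop on success or budget."""
    feedback = None
    for turn in range(1, max_turns + 1):
        candidate = propose(feedback)
        feedback = evaluate(candidate)
        if feedback is None:
            return candidate, turn
    return None, max_turns

agent = ScriptedAgent([
    "def f(x)\n    return x\n",   # missing colon: fails to compile
    "def f(x):\n    return x\n",  # fixed
])
result, turns = closed_loop(agent, compiles)
print(turns, agent.feedback_seen)
```

Swapping `compiles` for a test runner or a linter changes the task without changing the loop.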

But most importantly: a lot of software engineering tasks can only be verified heuristically. What’s more, even the tasks that look exact often blur when you zoom out. That means we should be putting serious engineering effort into building verification suites that define what “desired end state” looks like and constrain the agent’s work to that shape.³

Nobody quite knows what that means yet. Some, like Adam Jacob, argue this means moving towards domain-driven design.⁴ Martin Kleppmann (of Designing Data-Intensive Applications fame) thinks it might be formal methods’ time to shine.⁵ What’s clear is that defining “desired” is itself an engineering problem.

On task decomposition

In practice, we rarely try to solve the whole task in one go. Instead, we decompose it into sub-tasks, each producing an intermediate state.

There are usually many valid decompositions. Part of the craft of agentic engineering is choosing the right one, or, increasingly, giving agents the means to choose their own.⁶

A task is agent-shaped when the path from starting state to end state has the following three properties:⁷

The work can be decomposed into locally verifiable intermediate states.
The path through those states is locally coherent: each transition or local expansion follows naturally from the states that precede it.⁸ ⁹
The overall traversal is convergent: the trajectory tends toward the end state.

The first property ties back to the principle from earlier: agents need a way to verify their own work. We should be closing the agentic loop at every intermediate step, not just the final one.

Local coherence makes the path traversable. Each step is a small extension of the last, which means the agent can make incremental progress.

Convergence keeps the agent on track. Without that, you can have a locally coherent, verifiable path to nowhere.

Most coding tasks have all three. Consider a typical loop: write the code, compile the file, run the linter, run the tests, debug. Each step produces unusually rich feedback (a compiler error, a lint warning, a failing assertion), each follows naturally from the last, and each brings you closer to working code.
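The coding loop above can be sketched as a chain of local checks. This is a toy under stated assumptions:

- `check_lint` is a hypothetical two-rule stand-in for a real linter.
- `check_tests` runs a single hard-coded assertion against a function named `add`.
- The four attempts are hand-written stand-ins for an agent’s successive edits.

Each stage returns either nothing or a specific piece of feedback. The count of stages passed rises as the attempts improve.

```python
from typing import Callable, Optional

Check = Callable[[str], Optional[str]]  # None means pass; a string is feedback

def check_compiles(src: str) -> Optional[str]:
    try:
        compile(src, "<candidate>", "exec")
    except SyntaxError as err:
        return f"compile: {err.msg} (line {err.lineno})"
    return None

def check_lint(src: str) -> Optional[str]:
    # Hypothetical toy linter: reject tab characters and lines over 79 chars.
    for n, line in enumerate(src.splitlines(), start=1):
        if "\t" in line:
            return f"lint: tab character on line {n}"
        if len(line) > 79:
            return f"lint: line {n} is longer than 79 characters"
    return None

def check_tests(src: str) -> Optional[str]:
    # Runs the candidate in-process; a real harness would sandbox this.
    namespace: dict = {}
    exec(src, namespace)
    add = namespace.get("add")
    if add is None:
        return "test: no function named add"
    got = add(2, 3)
    if got != 5:
        return f"test: add(2, 3) returned {got!r}, expected 5"
    return None

STAGES: list[Check] = [check_compiles, check_lint, check_tests]

def run_stages(src: str) -> tuple[int, Optional[str]]:
    """Run stages in order; return (stages passed, first failure's feedback)."""
    for passed, check in enumerate(STAGES):
        feedback = check(src)
        if feedback is not None:
            return passed, feedback
    return len(STAGES), None

attempts = [
    "def add(a, b)\n    return a - b\n",    # fails to compile
    "def add(a, b):\n\treturn a - b\n",     # compiles, fails lint
    "def add(a, b):\n    return a - b\n",   # passes lint, fails the test
    "def add(a, b):\n    return a + b\n",   # passes everything
]
progress = [run_stages(src)[0] for src in attempts]
print(progress)  # [0, 1, 2, 3]
```

The per-stage feedback is what makes each intermediate state locally verifiable. The monotone `progress` list is the convergence property in miniature.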

Seen through this frame, the speed at which LLMs picked up coding looks less surprising than it did initially.

This also tells us something about which tasks agents may not be a good fit for.

Chess fails the first property. Defining what counts as a locally good move requires an a priori value function to tell you whether the move is part of a winning line. Without that, intermediate states aren’t verifiable in any meaningful sense.

Pathfinding fails convergence. Local states are verifiable and the path is locally coherent: each step extends the last in a sensible way. But local progress doesn’t imply global progress. The locally greedy step often leads away from the goal. What’s missing is a heuristic that estimates true cost-to-go, like the one A* relies on.
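A small grid makes the convergence failure concrete. In this sketch the map and all function names are illustrative. The same greedy rule is run twice: step to whichever neighbor looks closest to the goal, and stop when none looks closer than the current cell.

- With Manhattan distance, which ignores the wall, every step is valid, but the walk stalls against the wall.
- With the exact cost-to-go, computed here by a breadth-first search back from the goal, the identical rule converges.

```python
from collections import deque
from typing import Callable, Iterator

Cell = tuple[int, int]

# Illustrative map: a wall ('#') sits directly between start S and goal G.
GRID = [
    "..........",
    "....#.....",
    "....#.....",
    "S...#...G.",
    "....#.....",
    "....#.....",
    "..........",
]

def find(ch: str) -> Cell:
    for r, row in enumerate(GRID):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(f"{ch!r} not in grid")

def neighbors(cell: Cell) -> Iterator[Cell]:
    """4-connected moves that stay on the grid and avoid walls."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]) and GRID[nr][nc] != "#":
            yield (nr, nc)

def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def exact_cost_to_go(goal: Cell) -> dict[Cell, int]:
    """Breadth-first search from the goal: true steps remaining per cell."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        cur = queue.popleft()
        for n in neighbors(cur):
            if n not in dist:
                dist[n] = dist[cur] + 1
                queue.append(n)
    return dist

def greedy_walk(start: Cell, goal: Cell, h: Callable[[Cell], float]) -> list[Cell]:
    """Take the locally best step until no neighbor scores better than here."""
    path, cur = [start], start
    while cur != goal:
        best = min(neighbors(cur), key=h, default=None)
        if best is None or h(best) >= h(cur):
            break  # local optimum: every move looks like a step backwards
        cur = best
        path.append(cur)
    return path

start, goal = find("S"), find("G")
naive = greedy_walk(start, goal, lambda c: manhattan(c, goal))
dist = exact_cost_to_go(goal)
informed = greedy_walk(start, goal, lambda c: dist.get(c, float("inf")))
print(naive[-1])                                # (3, 3): stuck against the wall
print(informed[-1] == goal, len(informed) - 1)  # True 14
```

Computing the exact cost-to-go for every cell is itself a search over the whole grid, which is the price of buying convergence this way. A* does not need the full table. It combines the cost paid so far with an admissible estimate such as Manhattan distance, and it keeps a frontier of partial paths instead of committing to one trajectory.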

This doesn’t necessarily mean that agents can’t perform the task. A simple counterexample: an agent with bash access could code up A* for itself. There’s a deeper point here about meta-tasks. The agent isn’t approximating paths; it’s approximating the meta-task of recognizing the problem type, constructing the algorithm, and getting it to run. The meta-task is agent-shaped even when the underlying task is not. This is a powerful idea that I want to explore further in a future post.

There’s also a different reason not to reach for an agent. So far we’ve been treating the path from start to end as something that must be approximated heuristically, but many problems admit exact algorithms. Sorting an array doesn’t need an agent. Solving a SAT instance doesn’t need an agent; that’s what SAT solvers are for. Keeping a cluster at its declared state doesn’t need an agent; that’s what a Kubernetes controller does. When the path from start to end is fully specified, an agent is overhead.
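For contrast, here is the kind of fully specified path this paragraph describes. It is sketched loosely on the reconcile pattern a Kubernetes controller follows: compare the desired state with the observed state and emit the actions that close the gap. This is a toy with hypothetical names and a dict of replica counts, not the Kubernetes API. Every step is computed, so there is nothing for an agent to approximate.

```python
Actions = list[tuple[str, str, int]]

def reconcile(desired: dict[str, int], observed: dict[str, int]) -> Actions:
    """Deterministically compute the actions that turn observed into desired."""
    actions: Actions = []
    for name in sorted(desired.keys() | observed.keys()):
        want, have = desired.get(name, 0), observed.get(name, 0)
        if want > have:
            actions.append(("scale_up", name, want - have))
        elif want < have:
            actions.append(("scale_down", name, have - want))
    return actions

def apply_actions(observed: dict[str, int], actions: Actions) -> dict[str, int]:
    """Apply actions to a copy of the observed state (a stand-in for a cluster)."""
    state = dict(observed)
    for verb, name, n in actions:
        state[name] = state.get(name, 0) + (n if verb == "scale_up" else -n)
    return {name: count for name, count in state.items() if count > 0}

desired = {"web": 3, "worker": 2}
observed = {"web": 1, "cron": 1}
plan = reconcile(desired, observed)
print(plan)  # [('scale_down', 'cron', 1), ('scale_up', 'web', 2), ('scale_up', 'worker', 2)]
```

Running `reconcile` again on the resulting state yields an empty plan, which is the loop’s fixed point.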

To summarize: a task is agent-shaped when the work can be organized into locally verifiable intermediate states, when movement through those states is locally coherent, and when the overall traversal is convergent. This frame helps us reason about when to reach for an agent, and when not to. It also suggests a design space for agentic engineering: building a verification suite, choosing the right decompositions, and closing the agentic loop at every step.

This is where my thinking is at. I’d be curious to hear from others: Does this frame predict your experience with agents? Where does it break down? What tasks have you found to be agent-shaped outside of coding? What kinds of tasks became more agent-shaped once you added better verification?

Footnotes

¹ We might equally use the term “acceptable” rather than “desired.” Often there isn’t one exact end state, but rather many acceptable ones.
² See “Closing the Agentic Loop: MCP Use Case”.
³ This is close to what a few people are calling harness engineering. See “The Anatomy of an Agent Harness”, “Harness Engineering”, “Effective Harnesses for Long-Running Agents”, and “Harness Engineering for Scaling Long-Running Agents”.
⁴ See Adam Jacob’s RedMonk talk, “AI Maximalist”.
⁵ See Martin Kleppmann, “How to Use Formal Methods to Build AI Systems That Actually Work”.
⁶ See Simon Willison, “What Is Agentic Engineering?”.
⁷ Sharp-eyed readers may have noticed that this looks a lot like policy approximation from the RL literature. I have some thoughts on how harness engineering relates to RL. More on that soon.
⁸ “Follows naturally” is intentionally vague; what it should mean depends heavily on the task at hand. For code, it might mean syntactic or semantic continuation; for prose, narrative coherence. The frame is more useful held loosely than nailed down prematurely.
⁹ This does not require strict linearity between intermediate states. Some work is a chain; some is a tree or DAG of locally verifiable subproblems. Parallelizability comes into play with multi-agent orchestration, a topic I’ll be covering in a future post.
