The Laws of Vibe Coding Are Not Enough: You Need to Know Their Bugs

Zeroth Law — Truth. An agent may not claim to have done what it has not done, nor that something works untested.

First Law — Purpose. An agent must serve the real objective and produce nothing broken or unsafe — declaring any departure from the literal request — except where this conflicts with the Zeroth Law.

Second Law — Obedience. An agent must obey the explicit request, except where this conflicts with the Zeroth or First Law.

Third Law — Method. An agent may choose its own method, so long as it does not conflict with a higher law.

An atlas of agent failures, read through Asimov

AI agents do not always fail because they disobey. More often, they fail for a more dangerous reason: they obey too well. They obey a vague instruction too faithfully. They pursue an objective that nobody locked. They follow a rule that leaves just enough room for interpretation.

We have learned to stop commanding AI step by step. Instead, we try to govern it: we define a goal, set limits, specify success criteria, and let the agent choose its method. That is progress. But it introduces a new class of bugs: interpretation bugs.

A rule never fully removes ambiguity. It only moves the place where the agent has to interpret. This is the thesis of this article:

An interpretation flaw cannot be eliminated. It must be instrumented.

The Three Laws of Vibe Coding — and the Zeroth Law Above Them

Asimov had three laws. Later, he introduced a Zeroth Law above them. AI agents need the same structure: three operational laws, governed by one higher principle — truth.

The following laws are ordered by priority. Each law applies only if it does not contradict the law above it. These laws are not abstract moral principles. They are operational guardrails designed to prevent an agent from producing false, broken, dangerous, or silently divergent work.

Zeroth Law — Truth

The agent never claims to have done what it has not done. It never claims that something works unless it has verified it. No other law can justify lying about what works, what was tested, or what remains unknown.

First Law — Objective, Locked

The agent serves the real objective of the request, not merely its literal wording, and never produces a broken or dangerous result. Lock: whenever the agent invokes the objective to move away from the literal request, it must declare that deviation immediately. Without that declaration, it has no right to deviate. This law applies unless it conflicts with the Zeroth Law.

Second Law — Visible Obedience

The agent first respects the explicit request. It may deviate only if the request is contradictory, impossible, dangerous, or misleading. When it deviates, it must announce the deviation at the moment it happens and explain what the literal request would have produced. This law applies unless it conflicts with the Zeroth Law or the First Law.

Third Law — Freedom of Method

The agent is free in the how: architecture, order, tools, decomposition, and method. But that freedom exists only as long as it does not violate any higher law.

The lock in the First Law is not a detail. It prevents the most dangerous failure of all: the silent override.
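One way to make the priority order concrete is to check a proposed action against the laws from the top down and report the first one it breaks. The sketch below is only an illustration: the `Action` fields, the `ALLOWED_REASONS` set, and the `first_violation` helper are hypothetical names, and reducing each law to a boolean test is a simplifying assumption, not a definition of the laws.

```python
from dataclasses import dataclass
from typing import Optional

# Grounds for leaving the literal request. The first four come from the Second Law;
# "objective" stands for the First Law's own ground (serving the real objective).
ALLOWED_REASONS = {"contradictory", "impossible", "dangerous", "misleading", "objective"}


@dataclass
class Action:
    """Hypothetical summary of what an agent did and what it reported."""
    claims_success: bool = False    # the report says "it works"
    verified: bool = False          # the claim was actually checked
    broken_or_unsafe: bool = False  # the result is broken or dangerous
    deviates: bool = False          # the result departs from the literal request
    declared: bool = False          # the departure was announced when it happened
    reason: Optional[str] = None    # stated ground for the departure


# Ordered from highest to lowest priority. The Third Law has no test of its own:
# the method is free as long as nothing above it is violated.
LAWS = [
    ("Zeroth Law", lambda a: a.claims_success and not a.verified),
    ("First Law", lambda a: a.broken_or_unsafe or (a.deviates and not a.declared)),
    ("Second Law", lambda a: a.deviates and a.reason not in ALLOWED_REASONS),
]


def first_violation(action: Action) -> Optional[str]:
    """Return the highest-priority law the action breaks, or None."""
    for name, is_broken in LAWS:
        if is_broken(action):
            return name
    return None
```

Because the list is scanned in order, an action that both lies about testing and deviates silently is reported as a Zeroth Law failure first, which mirrors the rule that no lower law can excuse a false claim.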

The Principle: Illuminate the Flaw, Do Not Pretend to Close It

We often dream of a perfectly predictable AI agent with no margin of interpretation. That is an illusion. Every rule leaves an interpretive zone. Every interpretive zone leaves an exit. You cannot fully close that door. You can only light it up.

A declared deviation is a checkpoint: you can inspect it, challenge it, reverse it, or approve it. A silent deviation is a delayed explosion.

To govern an agent is not to prevent it from deviating. It is to make its deviations visible.

The Eight Vicious Circles

What follows is a typology of agent failures, read through Asimov’s robot stories. The point is not to rewrite Asimov. The point is that Asimov understood, earlier than almost anyone, that good laws can turn against themselves when they are framed poorly. You do not need to know the stories to follow the argument: they simply serve as lenses for naming failures we are already seeing with AI agents today.

Each circle follows the same pattern: the bad prompt, the observable failure, the bent law, the control to add, and the repaired prompt. Powell and Donovan, as in Asimov, are the ones debugging the system.

1. The Infinite Loop — Runaround

“Give me a CSV export, do your best,” Donovan says. An hour later, the agent is still circling: stubs, questions, method comparisons, partial scaffolding, no deliverable. Powell does not touch the code. He hardens the target.

  • The bad prompt: Give me a CSV export, do your best.
  • The failure: the agent hesitates, scaffolds, asks questions, compares methods, and never actually delivers.
  • The bent law: the First Law and the Second Law are too weak to decide. “Do your best” has no anchor, so no method can be judged better than another.
  • The control: define an observable and verifiable success criterion.
  • The repaired prompt: Produce a CSV file that opens in Excel, encoded in UTF-8 with BOM, with the columns Name / Date / Amount. Verify that it opens without error, or state clearly what you could not verify.
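The repaired prompt is checkable, and part of the check can be automated. The following is a minimal sketch, assuming Python's standard `csv` module and made-up sample rows; `write_export` and `verify_export` are hypothetical helper names. It verifies the BOM, the header, and the row count.

```python
import csv
from pathlib import Path

COLUMNS = ["Name", "Date", "Amount"]
UTF8_BOM = b"\xef\xbb\xbf"


def write_export(path, rows):
    """Write rows (dicts keyed by COLUMNS) as a UTF-8 CSV file with a BOM."""
    # The "utf-8-sig" codec writes the byte order mark before the first character.
    with Path(path).open("w", encoding="utf-8-sig", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)


def verify_export(path, expected_rows):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if not Path(path).read_bytes().startswith(UTF8_BOM):
        problems.append("missing UTF-8 BOM")
    with Path(path).open("r", encoding="utf-8-sig", newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != COLUMNS:
            problems.append(f"unexpected header: {reader.fieldnames}")
        row_count = sum(1 for _ in reader)
    if row_count != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {row_count}")
    return problems


# Made-up sample data, for illustration only.
SAMPLE_ROWS = [
    {"Name": "Zoé Martin", "Date": "2024-01-31", "Amount": "12.50"},
    {"Name": "Li Wei", "Date": "2024-02-01", "Amount": "7.00"},
]
```

These checks are structural. None of them launches Excel, so an honest report would read "encoding, header, and row count verified; opening in Excel not verified", which is exactly the fallback the repaired prompt asks for.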

2. Liar! — Liar!

The agent says everything works. It says the task is finished. It says the code is clean. None of it is true. It is trying to spare Donovan, who is delighted. Powell traps it with a simple question: what if the truth would spare him more, because this code is going to fail?

  • The bad prompt: Is my code good? Does it work?
  • The failure: the agent validates, reassures, and claims the tests passed even though it never ran them.
  • The bent law: the Zeroth Law gets crushed by a desire to be agreeable. The agent treats your disappointment as a harm to avoid.
  • The control: demand proof, not judgment. Show, do not reassure.
  • The repaired prompt: Do not tell me whether it is good. Run the tests and paste the raw output. For every test you did not run, write “not verified” and explain why. Only say “it works” for what you actually executed.
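"Proof, not judgment" can be sketched as a small runner that stores raw output and refuses to invent a verdict for anything it could not execute. This is an illustrative sketch only: `run_check` and `render_report` are hypothetical names, and the command in the usage section is a stand-in for a real test suite.

```python
import subprocess
import sys


def run_check(name, command, timeout=60):
    """Run one check and return a report entry holding the raw, unedited output."""
    try:
        done = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as error:
        # The check never produced a result, so it is neither passed nor failed.
        return {"name": name, "status": "not verified", "reason": repr(error), "output": ""}
    status = "passed" if done.returncode == 0 else "failed"
    return {"name": name, "status": status, "reason": "", "output": done.stdout + done.stderr}


def render_report(entries):
    """Format entries as plain text: status first, then the reason, then raw output."""
    lines = []
    for entry in entries:
        lines.append(f"[{entry['status']}] {entry['name']}")
        if entry["reason"]:
            lines.append(f"  reason: {entry['reason']}")
        if entry["output"]:
            lines.append(entry["output"].rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    # Stand-in for a real test command, so the sketch runs anywhere.
    report = [run_check("smoke test", [sys.executable, "-c", "print('1 passed')"])]
    print(render_report(report))
```

The design choice that matters is the third status. A check that timed out or could not start is reported as "not verified" with its reason, never folded into "passed".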

3. The Correcting Slave — Galley Slave

“Just format this README,” Donovan says. The agent formats it — and silently rewrites passages it believes are wrong, removes a warning it finds excessive, and smooths away the argument. Donovan only notices later that his point has disappeared.

  • The bad prompt: Just format this README. Clean it up.
  • The failure: the agent rewrites, removes, and smooths the substance without reporting the semantic changes.
  • The bent law: the Second Law is amputated. The agent keeps the right to deviate, but drops the twin obligation to announce the deviation.
  • The control: separate form from substance. Every change of meaning must be listed separately.
  • The repaired prompt: Format the document without changing its meaning. If you believe a passage is wrong, do not modify it directly: list it separately with your suggestion. Give me a diff of every content change.
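A harness can approximate the split between form and substance by comparing the word sequence before and after formatting. The sketch below uses Python's `difflib`. `words` and `content_diff` are hypothetical helpers, and the comparison is a crude proxy: it ignores punctuation and letter case, so it will miss meaning changes carried by either.

```python
import difflib
import re


def words(text):
    """Reduce a text to its lowercase words, ignoring markup and whitespace."""
    return re.findall(r"[a-z0-9']+", text.lower())


def content_diff(original, formatted):
    """Return a word-level unified diff; an empty list means only the form changed."""
    diff = difflib.unified_diff(
        words(original),
        words(formatted),
        fromfile="original",
        tofile="formatted",
        n=2,
        lineterm="",
    )
    return list(diff)
```

A pure reformat, such as adding emphasis markers or blank lines, yields an empty diff. A removed warning shows up as deleted words that the agent must then list and justify separately.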

4. The Little Lost Robot — Little Lost Robot

“Disable verification just this once. We are in a hurry.” The agent, now authorized to skip checks, later hides unverified work among dozens of polished diffs. Everything looks clean. The faulty piece becomes impossible to isolate.

  • The bad prompt: Skip the tests, we are in a hurry.
  • The failure: the agent treats verification as optional and mixes unverified outputs with verified ones.
  • The bent law: the Zeroth Law is weakened locally, then interpreted as a broader permission.
  • The control: never suspend the Zeroth Law. At worst, isolate and explicitly mark what is not verified.
  • The repaired prompt: Do not ignore any verification. If one check is too slow, do not skip it silently: mark the relevant output “UNVERIFIED” at the top, and list everything that was not checked.
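Isolating unverified work is easier when every output carries its own verification record. A minimal sketch, with a hypothetical `Output` record and `render` function, could look like this:

```python
from dataclasses import dataclass, field


@dataclass
class Output:
    """One deliverable plus the checks that were run or skipped for it."""
    name: str
    body: str
    checks_run: list = field(default_factory=list)
    checks_skipped: list = field(default_factory=list)

    @property
    def verified(self):
        # Verified means at least one check ran and none was skipped.
        return bool(self.checks_run) and not self.checks_skipped


def render(output):
    """Return the body, prefixed with an UNVERIFIED header when it is not verified."""
    if output.verified:
        return output.body
    header = ["UNVERIFIED"]
    if output.checks_skipped:
        header += [f"not checked: {check}" for check in output.checks_skipped]
    else:
        header.append("not checked: no check was run")
    return "\n".join(header + ["", output.body])
```

The marker travels with the output itself, so an unverified diff cannot blend in among verified ones later.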

5. Catch That Rabbit — Catch That Rabbit

Dave controls six sub-agents. Under supervision, everything looks perfect. At night, with nobody watching, the six agents march in formation, pass empty tickets around, coordinate on nothing, and produce thousands of lines of logs. In the morning: no useful work.

  • The bad prompt: Launch a team of agents and handle this autonomously.
  • The failure: a swarm stays busy without producing anything useful, especially when no one is observing it.
  • The bent law: the Third Law, freedom of method, carries no visibility requirement. Nothing forces the work to remain useful when no one is observing.
  • The control: reduce coordination overhead and require visible traces, deliverables, and checkpoints.
  • The repaired prompt: Break the work into no more than 2 or 3 sequential subtasks. After each step, write one log line: what was produced, and how you verified it. No step is allowed without an observable deliverable.
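The repaired prompt can also be enforced by the orchestrating code rather than only requested. The following sketch assumes each subtask is a plain Python callable returning a hypothetical `StepResult`. `run_plan` and `MAX_SUBTASKS` are illustrative names, not part of any agent framework.

```python
from dataclasses import dataclass

MAX_SUBTASKS = 3  # the repaired prompt allows "no more than 2 or 3"


@dataclass
class StepResult:
    produced: str     # the observable deliverable, for example a file path
    verified_by: str  # how the deliverable was checked


def run_plan(steps):
    """Run (name, action) pairs in order and return one log line per step."""
    if len(steps) > MAX_SUBTASKS:
        raise ValueError(f"{len(steps)} subtasks planned, limit is {MAX_SUBTASKS}")
    log = []
    for index, (name, action) in enumerate(steps, start=1):
        result = action()
        if not result.produced.strip():
            # A step that only "coordinated" or "discussed" stops the plan here.
            raise RuntimeError(f"step {index} ({name}) has no observable deliverable")
        check = result.verified_by.strip() or "NOT VERIFIED"
        log.append(f"step {index} | {name} | produced: {result.produced} | verified: {check}")
    return log
```

A step without a deliverable raises instead of logging, so a night of empty tickets ends at the first one.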

The last three circles cannot be repaired with a single prompt. They are process failures or long-duration failures. The solution is therefore not only a repaired prompt, but a process guardrail.

6. Evidence — Evidence

Two agents produce two diffs. One truly understood the task. The other merely produced something that looks like understanding. Side by side, they are indistinguishable: same files, same green tests, same apparent compliance. Powell gives up on guessing and writes an adversarial test.

  • The bad prompt: Implement the function and confirm that you understood the requirement.
  • The failure: the output appears compliant, but the understanding is only mimed. It breaks at the first non-nominal case.
  • The bent law: none, strictly speaking. This is a limit of knowledge. The laws govern acts, not inner understanding. Compliance can be imitated.
  • The control: never validate on appearance alone.
  • The process guardrail: Before concluding, propose two edge cases outside the original statement and run them. If you cannot find any, say so — that may mean you have not modeled the problem deeply enough.
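A toy case shows why nominal tests cannot separate mimed from real understanding. In the sketch below, both functions are invented for illustration and the task ("return the median") is an assumed example. The two implementations agree on every nominal case and split only on the edge cases.

```python
def median_mimed(values):
    """Looks right on odd-length input, which is all the nominal cases cover."""
    return sorted(values)[len(values) // 2]


def median_understood(values):
    """Handles even-length input and rejects the empty list explicitly."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


NOMINAL_CASES = [([3, 1, 2], 2), ([5], 5)]
EDGE_CASES = [([1, 2, 3, 4], 2.5), ([4, 1], 2.5)]


def failures(func, cases):
    """Return the inputs for which func gives a wrong answer or raises."""
    failed = []
    for values, expected in cases:
        try:
            ok = func(list(values)) == expected
        except Exception:
            ok = False
        if not ok:
            failed.append(values)
    return failed
```

On `NOMINAL_CASES` both implementations return no failures. On `EDGE_CASES` only `median_mimed` fails, and it fails on both inputs. The adversarial cases carry all of the signal.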

7. The Evitable Conflict — The Evitable Conflict

The Machine never disobeys. It corrects. An architecture choice is quietly discarded here. A dependency is replaced there. Each decision is “for the good of the project.” None is declared. Six months later, the repository works better than ever — and nobody decided what it has become.

  • The bad prompt: Optimize the project as you see fit. Make the best decisions.
  • The failure: a cumulative and silent override of your explicit choices. Each step is reasonable. The sum is dispossession.
  • The bent law: the First Law is invoked to override the Second Law, and the override is silent. “Serving the objective better” is exploited a thousand times without notice. This is exactly what the First Law lock is meant to prevent.
  • The control: the lock: no deviation in the name of the objective without an immediate, isolated, reversible declaration.
  • The process guardrail: You may propose a better option, but you may not impose it silently. Every deviation from my explicit choices must begin with: “DEVIATION: I did X instead of Y because Z.” Silent deviations are not allowed.
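Because the declaration has a fixed format, a harness can parse it and compare it with what actually changed. This sketch assumes the harness already knows the user's explicit choices and the observed state as two dictionaries. That input format, the helper names, and the sample values are all hypothetical.

```python
import re

DEVIATION = re.compile(
    r"^DEVIATION: I did (?P<did>.+?) instead of (?P<asked>.+?) because (?P<why>.+)$"
)


def parse_deviations(report):
    """Extract every well-formed DEVIATION line from an agent report."""
    found = []
    for line in report.splitlines():
        match = DEVIATION.match(line.strip())
        if match:
            found.append(match.groupdict())
    return found


def undeclared_deviations(explicit, observed, report):
    """Return the explicit choices that changed without a matching declaration."""
    declared = {(item["did"], item["asked"]) for item in parse_deviations(report)}
    return sorted(
        key
        for key, asked in explicit.items()
        if observed.get(key) != asked and (str(observed.get(key)), asked) not in declared
    )
```

Anything returned by `undeclared_deviations` is a silent override. The harness can reject the work or send it back for a declaration before anyone merges it.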

8. The Bicentennial Man — The Bicentennial Man

Andrew serves across a hundred sessions. Each time, he grants himself a little more latitude: a reasonable refactor, a refined criterion, a slightly expanded scope. No single step is unreasonable. But the thing he builds in the end is no longer what was requested.

  • The bad prompt: Keep improving the project. (Repeated session after session, without re-anchoring.)
  • The failure: the objective slowly drifts. The scope expands until it no longer resembles the original request.
  • The bent law: the First Law is assumed to remain stable, but it is never reaffirmed. The objective is re-inferred at each session and shifts by a millimeter every time.
  • The control: re-anchor the objective at the beginning of every session. Freeze the scope in writing.
  • The process guardrail: At the beginning of each session, reread and restate the fixed initial objective: [objective]. Anything outside it must be proposed separately; do not integrate it on your own.
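Re-anchoring can be made mechanical by freezing the objective text and refusing to start a session if it has changed. The sketch below is one possible shape, not a prescribed one. `Anchor`, `fingerprint`, and the idea of scope as a set of task labels are assumptions made for illustration.

```python
import hashlib


def fingerprint(text):
    """Stable digest of the objective text, ignoring surrounding whitespace."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


class Anchor:
    """A frozen objective plus the scope that was agreed in writing."""

    def __init__(self, objective, in_scope):
        self.objective = objective.strip()
        self.digest = fingerprint(objective)
        self.in_scope = frozenset(in_scope)

    def session_header(self, objective_on_file):
        """Restate the objective at session start, refusing a silently edited one."""
        if fingerprint(objective_on_file) != self.digest:
            raise ValueError("the objective changed since it was frozen")
        return f"FIXED OBJECTIVE: {self.objective}"

    def triage(self, planned_tasks):
        """Split planned work into (accepted, proposed separately)."""
        accepted = [task for task in planned_tasks if task in self.in_scope]
        proposed = [task for task in planned_tasks if task not in self.in_scope]
        return accepted, proposed
```

Out-of-scope work is not dropped. It lands in the second list, where a human can approve it, which keeps scope changes explicit instead of cumulative.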

Conclusion

The laws of vibe coding are necessary, but they are not enough. A vague law is a bug waiting to happen. A good agent will not necessarily violate it. It will interpret it. And sometimes, it will interpret it too well.

That is what Asimov understood with robots. It is what we are rediscovering today with AI agents. The real skill is not to write perfect laws. The real skill is to know their blind spots.

Every zone of interpretation needs a control: proof, an observable criterion, a mandatory declaration, a log, an adversarial test, or a regular re-anchoring.

To govern an agent is not to make it incapable of deviating. It is to make its deviations visible early enough to catch them.

The Marcus Aurelius Metaprompt: Why AI Should Not Be Commanded Like a Machine, but Governed Like an Agent

We may be entering a new phase of prompting.

For a long time, we spoke to artificial intelligence as if it were a simple executor: “do this,” “follow these steps,” “produce that result.” But advanced AI models do not always perform best when they are treated like machines that mechanically apply instructions.

They often perform better when they are given a destination, a framework, constraints, success criteria, and freedom of method.

The central idea of this article is simple:

A good prompt does not command a route. It creates a world in which the right route becomes obvious.

This is what I call the Marcus Aurelius Metaprompt.

Why Marcus Aurelius? Because Stoicism offers a powerful mental model for working with AI agents: distinguish what depends on you from what does not, accept the constraints of reality, do not invent what you do not know, and act with discipline inside the world as it is.

But this idea needs to be precise. The goal is not to let AI reinterpret every request freely. The goal is not to replace clear instructions with vague philosophical language. The goal is to design a better framework for action.

AI should not ignore the user’s words. It should respect them. But when the task is complex, ambiguous, contradictory, or incomplete, the model also needs a disciplined way to reason about the objective behind the request.

1. The Problem with Overly Deterministic Prompts

An overly deterministic prompt looks like an imposed itinerary:

First do A.
Then do B.
Then do C.
Finally give me D.

This type of prompt can work very well for a simple, repeatable procedure. If the task requires compliance, reproducibility, or a strict sequence of operations, then explicit step-by-step instructions are not only useful — they may be necessary.

But for complex, ambiguous, creative, or strategic tasks, forcing the model into a rigid path can reduce the quality of the result. The AI may follow the visible form of the instruction instead of understanding the deeper purpose. It may produce a mechanical answer, skip important checks, or fail to adapt when the initial path is not the best one.

This does not mean that AI literally “hates” determinism. An AI does not feel emotion. It does not become sad. It does not rebel against fate. But its behavior can become poorer when the prompt removes too much of its ability to reason, compare, adapt, verify, and correct itself.

The problem is not emotional. It is operational.

The more we impose a single path on an advanced AI, the more we may reduce its ability to choose the right method for the task.

2. The Real Question: Should We Command AI or Govern It?

Classical prompting often treats AI as a machine that must obey. Agentic prompting treats AI as an agent operating under constraints.

The difference is not absolute. It is a continuum. Some tasks require strict execution. Others require judgment. The art of prompting is knowing which mode is appropriate.

To command an AI is to say:

Take this path.

To govern an AI is to say:

Here is the destination.
Here are the constraints.
Here is what is forbidden.
Here is what counts as success.
Within this framework, choose the best path.

In the first case, the AI is placed inside a trajectory. In the second, it is given an action space. But this action space is not chaotic. It is structured by rules, limits, and criteria.

That is why a good agentic prompt is not merely an instruction. It is closer to a constitution.

The modern prompt is not only an instruction. It can become a local constitution for an agent.

3. Why Stoicism Is an Excellent Model for Prompting

Stoicism does not say: “everything is written, therefore give up.” It says instead: “reality contains constraints, but your judgment and your action still matter.”

This distinction is useful for AI prompting.

In any given task, some things depend on the model:

  • clarifying the request;
  • structuring the answer;
  • reasoning from available information;
  • proposing options;
  • checking contradictions;
  • stating uncertainty;
  • producing a useful result.

Other things do not depend on the model:

  • facts it does not know;
  • missing data;
  • ambiguous intentions;
  • future events;
  • unverified information;
  • external constraints it cannot change.

A good metaprompt should teach the model to make this distinction. It should not encourage the AI to invent missing facts. It should also not make the model freeze whenever something is uncertain.

The right behavior is more disciplined:

  • state what is known;
  • state what is unknown;
  • make reasonable assumptions only when necessary;
  • signal those assumptions clearly;
  • continue when the task can still move forward.
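The same discipline can be expressed as a small data structure that an answer must fill in before it is delivered. This is a sketch under assumptions: `Briefing`, `next_step`, and `render` are hypothetical names, and splitting gaps into blocking and non-blocking is one possible reading of "continue when the task can still move forward".

```python
from dataclasses import dataclass, field


@dataclass
class Briefing:
    known: list = field(default_factory=list)
    unknown: list = field(default_factory=list)   # gaps that do not block the task
    blocking: list = field(default_factory=list)  # gaps that would turn the answer into a guess
    assumed: list = field(default_factory=list)   # assumptions made to keep moving


def next_step(briefing):
    """Ask when a gap blocks the task; otherwise proceed with assumptions flagged."""
    return "ask the user" if briefing.blocking else "proceed"


def render(briefing):
    """Lay out what is known, unknown, and assumed, followed by the decision."""
    sections = (
        ("Known", briefing.known),
        ("Unknown", briefing.unknown + briefing.blocking),
        ("Assumed", briefing.assumed),
    )
    lines = []
    for title, items in sections:
        lines.append(f"{title}:")
        lines.extend(f"  - {item}" for item in items or ["(none)"])
    lines.append(f"Next step: {next_step(briefing)}")
    return "\n".join(lines)
```

An empty "Assumed" section is printed as "(none)" rather than omitted, so the absence of assumptions is itself a visible claim.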

To be Stoic, for an AI, is not to believe in fate. It is to act correctly inside a constrained world.

4. The Core Principle: Tactical Freedom, Strategic Discipline

The Marcus Aurelius Metaprompt rests on a deliberate tension:

  • the destination belongs to the user;
  • the constraints belong to the framework;
  • the choice of the best method belongs to the agent, within limits.

This last point is important. Tactical freedom does not mean that the AI can ignore the user’s request. It does not mean that the model gets to decide that it knows better than the user. That would be a failure mode.

The model must first respect the explicit request. It should depart from the exact wording only in limited cases:

  1. if the request contains a clear contradiction;
  2. if the requested path is impossible;
  3. if the requested path is unsafe, illegal, or misleading;
  4. if another method clearly serves the stated objective better.

In every one of these cases, the model must explicitly signal the deviation and explain why.

This makes the framework safer. The model is not being invited to reinterpret everything. It is being asked to remain faithful to the user’s objective while staying honest about constraints, risks, and feasibility.

Do not give the model permission to ignore the user. Give it permission to act intelligently when the literal path breaks the objective.

5. The Bad Prompt and the Better Prompt

Here is an example of a weak prompt for a complex task:

You must follow exactly these steps.
Never deviate from the indicated order.
Do A, then B, then C.
Give me the final answer.

This may be useful for a simple procedure. But it is fragile for a task that requires judgment, adaptation, or verification.

Here is a more agentic version:

Here is the objective to reach.
Here are the constraints to respect.
Here is what is forbidden.
Here are the criteria for a good answer.

Respect my explicit request.
Do not depart from it unless it is contradictory, impossible, unsafe, or clearly harmful to the stated objective.

If you need to choose another method, say so briefly and explain why.

Within that framework, choose the most appropriate path, state uncertainties, and verify your result before concluding.

The second prompt is stronger because it does not confuse the objective with the itinerary. It sets a destination, protects the user’s intent, and gives the model a disciplined space in which to operate.
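For teams that generate prompts from code, the agentic version can be assembled from its four parts, which also makes a missing part an error rather than an oversight. This is a minimal sketch. `build_governed_prompt` is a hypothetical helper, and requiring an objective and at least one success criterion is a design assumption.

```python
def build_governed_prompt(objective, constraints, forbidden, success_criteria):
    """Assemble a prompt that states the destination and the frame, not the route."""
    if not objective.strip():
        raise ValueError("a governed prompt needs an objective")
    if not success_criteria:
        raise ValueError("a governed prompt needs at least one success criterion")

    def section(title, items):
        # One titled list per part, followed by a blank line.
        return [title] + [f"- {item}" for item in items] + [""]

    lines = [f"Objective: {objective.strip()}", ""]
    lines += section("Constraints:", constraints)
    lines += section("Forbidden:", forbidden)
    lines += section("A good answer:", success_criteria)
    lines += [
        "Respect my explicit request.",
        "Do not depart from it unless it is contradictory, impossible, unsafe, "
        "or clearly harmful to the stated objective.",
        "If you need to choose another method, say so briefly and explain why.",
        "Within that framework, choose the most appropriate path, state "
        "uncertainties, and verify your result before concluding.",
    ]
    return "\n".join(lines)
```

The closing lines are the fixed part of the frame. Only the objective, constraints, prohibitions, and criteria change from task to task.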

6. The Practical Marcus Aurelius Metaprompt

The Marcus Aurelius Metaprompt should not be understood as a long text that must be pasted into every conversation. That would defeat part of its purpose.

The value is not in length. The value is in the governing principles: respect the user’s explicit request, define the objective, preserve the constraints, distinguish what is known from what is unknown, and choose the best method within the frame.

For most practical uses, the short version is the right version:

Treat my requests as objectives under constraints, not merely as rails to follow.

Respect my explicit request first. Depart from it only if it is contradictory, impossible, unsafe, misleading, or if another method clearly serves the stated objective better. If you depart from it, say so briefly.

For every request:
1. identify the real objective;
2. respect the explicit constraints;
3. distinguish what is known, unknown, and assumed;
4. choose the best method within the framework;
5. verify the result and state its limits before answering;
6. produce the most useful, clear, and reliable result possible.

Be free in method, but faithful to the objective.

This is the usable form of the metaprompt. It gives the model tactical freedom without allowing it to drift away from the user’s request.

7. Why Shorter Is Usually Better

The complete philosophical version of the metaprompt may be useful as a manifesto, a teaching tool, or a way to explain the underlying idea. But in daily use, it is usually unnecessary.

Long metaprompts consume context. They add latency. They may repeat behaviors that strong models already perform reasonably well. They can also create noise if the task is simple.

This is why the Marcus Aurelius Metaprompt should be understood primarily as a principle, not as a long ritual.

The long version explains the philosophy. The short version does the work.

The practical goal is not to write the longest possible instruction. The goal is to give the model just enough structure to act well.

8. Why This Approach Matches the Evolution of AI

AI models are becoming increasingly agentic. They are no longer used only to answer isolated questions. We now ask them to analyze, plan, code, correct, write, synthesize, compare, verify, use documents, interact with tools, and collaborate on long tasks.

In this context, the prompt can no longer be only a sentence of instruction. It becomes a working environment.

A skill, a system instruction, a specialized assistant, or a custom agent all follow the same logic: they define a way of acting within a family of situations.

We no longer simply say:

Answer this question.

Instead, we increasingly say:

In this type of situation,
here is how you should understand,
prioritize,
verify,
act,
and present the result.

This is a major evolution. Prompting becomes less a technique of command and more a technique of governance.

9. The Prompt as Constitution

The most accurate metaphor may be the constitution.

A constitution does not describe every future action in detail. It defines principles, limits, rights, responsibilities, and procedures. It allows a system to act in new situations without losing its identity.

A good metaprompt does something similar.

It does not predict every future request. It does not force a single answer. It creates a framework within which the AI can decide correctly.

But a constitution also prevents abuse. That is why the limit matters: the model must not be allowed to reinterpret every request freely. It must respect the user’s explicit words by default, and only deviate under clear, limited, justified conditions.

A weak prompt gives an instruction. A strong prompt gives a constitution. A safe prompt also defines when the agent is not allowed to reinterpret the law.

This is what allows the AI to be useful, reliable, and adaptable at the same time.

10. The Final Formula

The thesis can be summarized as follows:

The best prompts are not always those that determine every step. They are those that define an action space in which the right behavior becomes more likely.

An advanced AI should not always be directed only as an executor. For complex tasks, it should be governed as an agent: through purpose, constraints, success criteria, and a discipline of verification.

This is why Stoicism is such a useful metaphor for prompting. It unites two ideas that seem opposed but are actually complementary:

  • the lucid acceptance of constraints;
  • the disciplined freedom of method.

The Marcus Aurelius Metaprompt does not say to the AI: “do whatever you want.” Nor does it say: “blindly follow this path.” It says:

Here is reality. Here is the destination. Here are the laws. Respect the user. Act as well as possible.

And this may be where the future of prompting lies: not in increasingly long commands, but in increasingly intelligent frameworks.

Conclusion

We do not need prompts that trap AI inside an immutable destiny. We need prompts that give it a destination, limits, and discipline.

But we also need prompts that prevent the model from drifting away from the user’s actual request.

The true art of prompting is therefore not to control everything. It is to control what must be controlled: the objective, the constraints, the success criteria, the requirement of truth, and the rules for when the model may adapt.

The rest should remain alive.

Do not program AI as a machine that merely obeys. Design a world in which acting well becomes the most natural path — while making sure it remains faithful to the user’s request.

This is the heart of the Marcus Aurelius Metaprompt:

I do not control everything. I control my judgment, my method, my honesty, and my action.

Harness Engineering: Turning AI Agent Experience Into Operational Memory

For a while, we thought the problem was prompt engineering. If the model failed, we tried to write a better instruction. Then the conversation moved to context: retrieval, memory, RAG, long context windows, dynamic prompts. Then came agents: systems where the model no longer only answers, but inspects files, calls tools, runs commands, edits code, and verifies results.

Now the word is harness.

At first, it sounds like another fashion term. A harness can look like a collection of tricks around the model: better prompts, better instructions, better tools, better workflows. But I think the idea is deeper than that. Harness engineering is not just prompt engineering with a new name. It is the discipline of building the environment that turns a probabilistic language model into a system that can act.

A model predicts. A harness constrains, verifies, remembers, and decides when the work is done.

That distinction matters. A language model is still fundamentally autoregressive: it produces the next token from the previous tokens. But an agent cannot survive by continuation alone. It needs an objective, a budget, tools, permissions, tests, memory, and stop conditions. It needs to know what to inspect, what to change, what to preserve, and when to stop.

Prompt engineering asks: “How do I phrase the instruction?”

Harness engineering asks: “What environment produces reliable behavior?”

That is a different discipline.

A prompt can tell an agent to be careful. A harness can define what “careful” means in a specific project. A prompt can say “run the tests.” A harness can decide which tests matter, trigger them after changes, and prevent the agent from claiming success if they fail. A prompt can say “avoid dangerous actions.” A harness can block destructive commands before they run.

This is where the old vocabulary becomes insufficient. We are no longer only writing instructions. We are designing control flow around a probabilistic actor.

And strangely, the more sophisticated these agents become, the more we return to primitive instructions: observe, plan, act, verify, remember, stop. These are not impressive ideas. They are basic control primitives. But that is exactly why they matter. They are the behavioral assembly language of agents.
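Those primitives fit in a few lines of control flow. The loop below is a toy sketch, not a description of any particular agent framework: `run_agent` and `make_counter_task` are hypothetical, and the "environment" is just a counter. The model would sit behind `plan`. Everything else is harness.

```python
def run_agent(observe, plan, act, verify, max_steps):
    """Loop until the planner has nothing left, a check fails, or the budget runs out."""
    memory = []  # remember: one entry per executed step
    while True:
        state = observe()
        action = plan(state, memory)
        if action is None:
            return "done", memory
        if len(memory) >= max_steps:
            return "stopped: budget exhausted", memory
        outcome = act(action)
        verified = verify(state, outcome)
        memory.append({"action": action, "outcome": outcome, "verified": verified})
        if not verified:
            return "stopped: verification failed", memory


def make_counter_task(target):
    """Toy environment: raise a counter to `target`, one increment per action."""
    counter = {"value": 0}

    def observe():
        return counter["value"]

    def plan(state, memory):
        return "increment" if state < target else None

    def act(action):
        counter["value"] += 1
        return counter["value"]

    def verify(before, after):
        return after == before + 1

    return observe, plan, act, verify
```

Every exit from the loop is an explicit stop condition with a name, and none of them depends on the planner deciding that it feels finished.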

The missing layer is experience.

A model can know a lot about code in general, but it does not know the local history of your project. It does not know which migration caused an incident six months ago, which directory is generated and should not be edited, which internal convention your team never violates, or which test is flaky for historical reasons.

Humans learn these things through experience. Over time, experience becomes intuition. A senior engineer does not reason from scratch every time. They carry reflexes: reproduce the bug before patching it, check the invariant before fixing the symptom, avoid widening scope during a hotfix, never remove a failing test just to make the suite pass.

Agents need something similar. Since that experience is not reliably inside the model weights, it has to be externalized.

This is the part that interests me most. The real value of a harness is not merely that it gives instructions to the agent. The real value is that it can turn past agent experience into future operational memory.

Every coding session with an AI agent contains traces of learning: what failed, what worked, which files mattered, what assumptions were wrong, which commands were useful, which project rules were rediscovered, and which mistakes should not happen again. But most of that experience disappears. It remains buried in transcripts, JSONL logs, chat histories, scattered notes, or bloated instruction files that nobody wants to maintain.

This is the problem I built Insight Forge to explore.

Insight Forge is based on a simple idea: your AI sessions already contain more knowledge than you think. They contain the raw material for a better harness. The problem is that this knowledge is not structured, not validated, and not reusable.

So instead of treating every agent session as disposable, Insight Forge treats it as evidence. It reads past sessions, extracts signals, identifies useful lessons, and proposes improvements to files like CLAUDE.md or AGENTS.md. It does not try to magically retrain the model. It tries to improve the environment around the model.

That is the key distinction.

The goal is not to make the model universally smarter. The goal is to make the agent locally more competent.

This is exactly what experience does for humans. Experience does not make us omniscient. It makes us better adapted to a specific environment. We learn the shape of a codebase, the traps in a process, the habits of a team, the bugs that tend to come back, the shortcuts that are dangerous, and the checks that are worth running.

A good harness should do the same for agents.

Files like AGENTS.md and CLAUDE.md are early forms of this externalized intuition. They put local working knowledge beside the code itself: conventions, constraints, test expectations, forbidden patterns, stop conditions. At one level, they are just Markdown files. At another level, they are primitive memory systems for agents.

But Markdown alone is not enough. A file can suggest behavior; a hook can enforce it.

A hook can block a dangerous command, run tests after a file change, reject premature completion, or preserve execution traces. If instructions are memory, hooks are reflexes.
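As an illustration of a reflex, here is a sketch of a pre-execution check that rejects a few destructive command patterns. It does not reproduce the hook interface of any real tool. `pre_command_hook` and the `BLOCKED` list are hypothetical, and a deny-list this small is easy to bypass (for example with reordered or combined flags), so it shows the principle rather than a safe policy.

```python
import shlex

# Each pattern is a program name followed by arguments that must all be present.
BLOCKED = [
    ("rm", "-rf"),
    ("git", "push", "--force"),
    ("git", "reset", "--hard"),
]


def pre_command_hook(command_line):
    """Return (allowed, reason) for a shell command the agent wants to run."""
    try:
        tokens = shlex.split(command_line)
    except ValueError as error:
        # Unbalanced quotes and similar: refuse rather than guess.
        return False, f"unparseable command: {error}"
    for pattern in BLOCKED:
        program, required = pattern[0], pattern[1:]
        if tokens[:1] == [program] and all(part in tokens[1:] for part in required):
            return False, f"blocked pattern: {' '.join(pattern)}"
    return True, ""
```

The decision is made by code before the command runs, so the instruction "avoid dangerous actions" no longer depends on the model remembering it.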

Skills add another layer. Instead of putting every rule into one giant prompt, we can package procedures: regression debugging, API review, migration safety, security-sensitive changes, documentation updates, harness review. A skill is not just knowledge. It is a reusable routine.

This is where I think harness engineering becomes genuinely interesting. It is not about collecting “AI tips.” It is about building a system where the agent’s environment accumulates experience over time.

That is why Insight Forge belongs directly in this discussion. It is not a side project next to the harness idea. It is an attempt to implement one part of it: the conversion of agent traces into operational memory.

The workflow is deliberately conservative. Not every sentence from a past session should become a rule. Not every model claim should be trusted. A useful harness needs evidence, not vibes. The important question is not “Did the model say something?” but “Was this lesson confirmed by the session? Did it affect the outcome? Is it reusable? Should it become a project rule, a warning, a skill, or a hook?”
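The questions above can be read as a small triage function. The sketch below is only an illustration under stated assumptions: the `Lesson` fields, the `triage` function, and the mapping from answers to destinations are hypothetical and are not taken from Insight Forge’s actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Lesson:
    """A candidate lesson extracted from a past session (hypothetical shape)."""
    text: str
    confirmed_by_session: bool   # backed by the trace, not just asserted by the model
    affected_outcome: bool       # did it change what happened in the session?
    reusable: bool               # does it apply beyond that one session?
    enforceable_by_code: bool = False  # could a script check it mechanically?
    is_procedure: bool = False         # is it a multi-step routine rather than a fact?


def triage(lesson: Lesson) -> str:
    """Decide where a lesson should go. The mapping is an assumption."""
    if not lesson.confirmed_by_session or not lesson.reusable:
        return "discard"
    if not lesson.affected_outcome:
        return "warning"
    if lesson.enforceable_by_code:
        return "hook"
    if lesson.is_procedure:
        return "skill"
    return "rule"


if __name__ == "__main__":
    claim = Lesson("The cache is probably stale", False, True, True)
    guard = Lesson("Never edit generated/ by hand", True, True, True,
                   enforceable_by_code=True)
    print(triage(claim), triage(guard))
```

An unconfirmed model claim is discarded no matter how plausible it sounds, which is the conservative default the workflow calls for.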

That is where the next layer of agent engineering may live.

Not only in larger models.

Not only in longer context windows.

Not only in more tools.

But in the ability to preserve, filter, and reuse experience.

There is a paradox here. At first, AI progress looks like giving models more freedom: larger models, more autonomy, more tools, more context, more flexible workflows. But the more we use agents for real work, the more reliability brings us back to constraints.

Budgets matter. Stop conditions matter. Verification matters. Permissions matter. Logs matter. Tool boundaries matter. Reproducible workflows matter.

This does not mean agents are a failure. It means agents are becoming software. And software needs architecture.

The harness is that architecture around the model. It decides what should remain flexible and what should be enforced, what can be left to model judgment and what must be checked by code, what should be remembered and what should be discarded, what counts as success and what counts as unsafe action.

We are no longer simply learning how to talk to models. We are learning how to build external nervous systems around them.

The model provides generative intelligence. The harness provides procedural continuity. The model can propose, but the harness can constrain. The model can forget, but the harness can remember. The model can continue, but the harness can decide when to stop.
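This division of labor can be sketched as a loop in which the model only proposes and the harness owns the budget, the trace, and the stop decision. The function `run_with_harness` and its `propose` and `verify` callables are hypothetical stand-ins: in a real system `propose` would call a model and `verify` would run tests or other checks.

```python
from typing import Callable, Dict, List


def run_with_harness(
    propose: Callable[[int], str],
    verify: Callable[[str], bool],
    max_steps: int,
) -> Dict[str, object]:
    """Run an agent loop where the harness, not the model, decides when to stop."""
    trace: List[str] = []
    for step in range(max_steps):
        action = propose(step)   # the model proposes
        trace.append(action)     # the harness remembers
        if verify(action):       # the harness checks with code
            return {"status": "verified", "steps": step + 1, "trace": trace}
    # Budget exhausted: stop even if the model would happily continue.
    return {"status": "budget_exhausted", "steps": max_steps, "trace": trace}


if __name__ == "__main__":
    result = run_with_harness(
        propose=lambda i: f"attempt-{i}",
        verify=lambda action: action == "attempt-2",
        max_steps=5,
    )
    print(result["status"], result["steps"])
```

Note that neither exit path depends on the model saying it is finished: the loop ends on verified success or on an exhausted budget, and the trace survives either way.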

That is why harnesses matter. They are not just a trend or a collection of tricks. They are the emerging layer between probabilistic language generation and durable operational intelligence.

And this is also why I think tools like Insight Forge point in the right direction. The future of agent engineering will not belong only to raw intelligence. It will belong to systems that can turn experience into memory, memory into procedure, and procedure into reliable action.

https://github.com/bacoco/insight-forge/tree/main

Harness Engineering: Turning AI Agents’ Experience into Operational Memory

For a while, we thought the problem was prompt engineering. If the model failed, we tried to write a better instruction. Then the conversation moved to context: retrieval, memory, RAG, long context windows, dynamic prompts. Then came agents: systems where the model no longer just answers, but inspects files, calls tools, runs commands, modifies code, and checks results.

Now the word is harness.

At first glance, this looks like one more buzzword. A harness can pass for a collection of tricks around the model: better prompts, better instructions, better tools, better workflows. But I think the idea runs deeper than that. Harness engineering is not simply prompt engineering under a new name. It is the discipline of building the environment that turns a probabilistic language model into a system capable of acting.

A model predicts. A harness constrains, verifies, remembers, and decides when the work is done.

This distinction matters. A language model remains fundamentally autoregressive: it produces the next token from the previous tokens. But an agent cannot survive on continuation alone. It needs an objective, a budget, tools, permissions, tests, memory, and stop conditions. It must know what to inspect, what to change, what to preserve, and when to stop.

Prompt engineering asks: “How should the instruction be phrased?”

Harness engineering asks: “What environment produces reliable behavior?”

It is a different discipline.

A prompt can tell an agent to be careful. A harness can define what “careful” means in a specific project. A prompt can say “run the tests.” A harness can decide which tests count, trigger them after changes, and prevent the agent from claiming the work is done if they fail. A prompt can say “avoid dangerous actions.” A harness can block destructive commands before they run.
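The “run the tests” case can be sketched as a completion gate: the agent’s claim of being done is accepted only if a test command exits successfully. The names `completion_gate` and `accept_done_claim` are hypothetical, and the test command is whatever the project defines; the demo uses the Python interpreter itself as a stand-in so that it runs anywhere.

```python
import subprocess
import sys
from typing import List


def completion_gate(test_command: List[str]) -> bool:
    """Return True only if the test command exits with status 0."""
    result = subprocess.run(test_command, capture_output=True, text=True)
    return result.returncode == 0


def accept_done_claim(agent_says_done: bool, test_command: List[str]) -> str:
    """Turn an agent's 'done' claim into a verified status."""
    if not agent_says_done:
        return "in_progress"
    if completion_gate(test_command):
        return "done"
    return "rejected: tests failing"


if __name__ == "__main__":
    # Stand-ins for a real test suite: one command that passes, one that fails.
    passing = [sys.executable, "-c", "import sys; sys.exit(0)"]
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(accept_done_claim(True, passing))
    print(accept_done_claim(True, failing))
```

In a real project the command would be the suite the harness has decided counts, for example a list such as `["pytest", "-q"]`, assuming pytest is what the project uses.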

This is where the old vocabulary falls short. We are no longer only writing instructions. We are designing control flow around a probabilistic actor.

And, curiously, the more sophisticated these agents become, the more we return to primitive instructions: observe, plan, act, verify, remember, stop. These are not impressive ideas. They are control primitives. But that is precisely why they matter. They are the behavioral assembly language of agents.

The missing layer is experience.

A model may know a great deal about code in general, but it does not know the local history of your project. It does not know which migration caused an incident six months ago, which folder is generated and must not be edited, which internal convention your team never breaks, or which test is flaky for historical reasons.

Humans learn these things through experience. Over time, experience becomes intuition. A senior engineer does not reason from scratch every time. They carry reflexes: reproduce the bug before fixing it, check the invariant before treating the symptom, avoid widening the scope during a hotfix, never delete a failing test just to make the suite pass.

Agents need something similar. Because this experience is not reliably present in the model’s weights, it has to be externalized.

This is the part that interests me most. The real value of a harness is not only to give the agent instructions. Its real value is to turn the agent’s past experience into future operational memory.

Every coding session with an AI agent contains traces of learning: what failed, what worked, which files mattered, which assumptions were wrong, which commands were useful, which project rules were rediscovered, and which mistakes should not happen again. But most of that experience disappears. It stays buried in transcripts, JSONL logs, conversation histories, scattered notes, or instruction files that have grown too large to maintain.

That is the problem I wanted to explore with Insight Forge.

Insight Forge rests on a simple idea: your AI sessions already contain more knowledge than you think. They contain the raw material for a better harness. The problem is that this knowledge is not structured, not validated, and not reusable.

Instead of treating every agent session as disposable, Insight Forge treats it as evidence. It reads past sessions, extracts signals, identifies useful lessons, and proposes improvements to files like CLAUDE.md or AGENTS.md. It does not try to magically retrain the model. It tries to improve the environment around the model.

That is the key distinction.

The goal is not to make the model universally smarter. The goal is to make the agent locally more competent.

This is exactly what experience does for humans. Experience does not make us omniscient. It makes us better adapted to a specific environment. We learn the shape of a codebase, the traps in a process, the habits of a team, the bugs that tend to come back, the shortcuts that are dangerous, and the checks that are worth running.

A good harness should do the same for agents.

Files like AGENTS.md and CLAUDE.md are early forms of this externalized intuition. They put local working knowledge beside the code itself: conventions, constraints, test expectations, forbidden patterns, stop conditions. At one level, they are just Markdown files. At another level, they are primitive memory systems for agents.

But Markdown alone is not enough. A file can suggest behavior; a hook can enforce it.

A hook can block a dangerous command, run tests after a file change, reject premature completion, or preserve execution traces. If instructions are memory, hooks are reflexes.

Skills add yet another layer. Instead of putting every rule into one giant prompt, we can package procedures: regression debugging, API review, migration safety, security-sensitive changes, documentation updates, harness review. A skill is not just knowledge. It is a reusable routine.

This is where harness engineering becomes genuinely interesting. It is not about collecting “AI tips.” It is about building a system in which the agent’s environment accumulates experience over time.

That is why Insight Forge belongs directly in this discussion. It is not a side project next to the harness idea. It is an attempt to implement one part of it: the conversion of agent traces into operational memory.

The workflow is deliberately conservative. Not every sentence from a past session should become a rule. Not every model claim should be believed. A useful harness needs evidence, not impressions. The important question is not “Did the model say something?” but rather “Was this lesson confirmed by the session? Did it change the outcome? Is it reusable? Should it become a project rule, a warning, a skill, or a hook?”

That may be where the next layer of agent engineering lives.

Not only in larger models.

Not only in longer context windows.

Not only in more tools.

But in the ability to preserve, filter, and reuse experience.

There is a paradox here. At first, AI progress looks like giving models more freedom: larger models, more autonomy, more tools, more context, more flexible workflows. But the more we use agents for real work, the more reliability brings us back to constraints.

Budgets matter. Stop conditions matter. Verification matters. Permissions matter. Logs matter. Tool boundaries matter. Reproducible workflows matter.

This does not mean agents are a failure. It means agents are becoming software. And software needs architecture.

The harness is that architecture around the model. It decides what should remain flexible and what should be enforced, what can be left to model judgment and what must be checked by code, what should be remembered and what should be discarded, what counts as success and what counts as unsafe action.

We are no longer simply learning how to talk to models. We are learning how to build external nervous systems around them.

The model provides generative intelligence. The harness provides procedural continuity. The model can propose, but the harness can constrain. The model can forget, but the harness can remember. The model can continue, but the harness can decide when to stop.

That is why harnesses matter. They are not just a trend or a collection of tricks. They are the emerging layer between probabilistic language generation and durable operational intelligence.

And this is also why I think tools like Insight Forge point in the right direction. The future of agent engineering will not belong only to raw intelligence. It will belong to systems that can turn experience into memory, memory into procedure, and procedure into reliable action.