Research Note | April 23, 2026

Feedback Should Become Rules

Systems do not learn just because they store corrections. They learn when feedback is classified, scoped, and translated into durable behavior without contaminating memory or leaking internal repair notes back into the reply.

One of the easiest demos in agent design is apology. The assistant gets corrected, says thank you, retries, and sounds humble. That is not the same as learning. In many systems, feedback still lands as loose residue: another line in the transcript, another blob in memory, another prompt fragment that may or may not echo back later. The correction either disappears or survives in the wrong form.

AaronCore treats that as a structural problem. If a user says "that is not what I meant," "too short," "do not open that," or "remember this next time," those sentences are not four versions of the same event. They point at different failures, different scopes, and different places in the runtime. Pouring them into one bucket does not create improvement. It creates contamination.

Feedback only becomes valuable when it changes future behavior in the right place.

Correction is not memory

A lot of systems treat correction as if it were a memory write. The user objects, the objection gets stored, and the product calls that adaptation. But many corrections are not facts about the user and not durable memories about the relationship. They are judgments about what just went wrong. Some describe a routing error. Some describe reply shape. Some describe execution posture. Some do point to real preference. Those should not all be preserved through the same path.

When correction gets flattened into memory, the memory layer starts carrying policy debris. That makes recall less trustworthy. The system remembers that something felt wrong once, but not what kind of wrong it was, how long it should matter, or where it should apply. The result is a memory system that stores more text while grounding less behavior.

Learning without scope becomes contamination

A correction does not just need meaning. It needs scope. Some feedback should affect one retry and then disappear. Some should hold for the current session. Some should remain as a short-term pattern while the user is in a particular mode of work. A smaller set deserves to become a longer-lived rule. Without that distinction, the system becomes unstable in both directions: it either forgets too fast, or it drags local repair into unrelated future situations.

This is where a lot of products quietly fail. They promise learning, but the learning has no levels. Everything is either transient conversation noise or permanent personalization. Real growth sits in the middle. It needs explicit room for once, session, short-term, and long-term influence instead of a single undifferentiated bucket.
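
To make the four levels concrete, here is a minimal sketch in Python. It is not AaronCore's implementation. The names Scope, ScopedRule, and is_active, the abstract day counter, and the 14-day short-term window are all assumptions chosen for illustration. The point is only that each scope gets its own expiry test instead of sharing one bucket.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    ONCE = "once"
    SESSION = "session"
    SHORT_TERM = "short_term"
    LONG_TERM = "long_term"


# Assumed window for short-term rules; the note does not specify one.
SHORT_TERM_DAYS = 14


@dataclass
class ScopedRule:
    text: str
    scope: Scope
    session_id: str   # session in which the correction was given
    created_day: int  # abstract day counter, to keep the sketch clock-free
    uses: int = 0     # how many times the rule has already been applied


def is_active(rule, session_id, today):
    """Return True if the rule should still influence behavior."""
    if rule.scope is Scope.ONCE:
        # Affects a single retry in the same session, then disappears.
        return rule.session_id == session_id and rule.uses == 0
    if rule.scope is Scope.SESSION:
        return rule.session_id == session_id
    if rule.scope is Scope.SHORT_TERM:
        return today - rule.created_day <= SHORT_TERM_DAYS
    # LONG_TERM rules stay active until explicitly removed.
    return True
```

A caller would increment uses after applying a ONCE rule, so the same correction cannot follow the user into unrelated turns.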

Feedback needs type before it needs storage

Before feedback can be kept, it should be classified. Was the failure about routing to the wrong capability, producing the wrong kind of answer, choosing the wrong execution posture, violating a user preference, or mishandling identity and memory? That question matters more than simply asking whether the system should save the message. Classification determines where the repair belongs and what kind of future behavior it should alter.
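
The idea that type decides where a repair belongs can be sketched as a small lookup, again in Python. Everything here is hypothetical: the FeedbackType names mirror the five failure kinds listed above, but the cue phrases, the mapping from phrase to type, and the destination names such as capability_router are invented for this example. Keyword matching stands in for a real classifier only to keep the sketch runnable.

```python
from enum import Enum


class FeedbackType(Enum):
    ROUTING = "routing"
    REPLY_SHAPE = "reply_shape"
    EXECUTION_POSTURE = "execution_posture"
    PREFERENCE = "preference"
    IDENTITY_MEMORY = "identity_memory"


# Hypothetical destinations: where a repair of each type would be applied.
REPAIR_TARGET = {
    FeedbackType.ROUTING: "capability_router",
    FeedbackType.REPLY_SHAPE: "reply_formatter",
    FeedbackType.EXECUTION_POSTURE: "tool_policy",
    FeedbackType.PREFERENCE: "preference_rules",
    FeedbackType.IDENTITY_MEMORY: "memory_store",
}

# Toy cue phrases (assumed mapping); checked in order, first match wins.
CUES = [
    ("too short", FeedbackType.REPLY_SHAPE),
    ("do not open", FeedbackType.EXECUTION_POSTURE),
    ("remember this", FeedbackType.PREFERENCE),
    ("not what i meant", FeedbackType.ROUTING),
]


def classify(message):
    """Return a FeedbackType, or None if the message cannot be typed."""
    lowered = message.lower()
    for cue, kind in CUES:
        if cue in lowered:
            return kind
    return None


def repair_target(message):
    """Return the component a repair belongs to, or None if untyped."""
    kind = classify(message)
    return REPAIR_TARGET[kind] if kind is not None else None
```

Returning None for untyped feedback is a deliberate choice in this sketch: a correction that cannot be classified is not stored anywhere by default.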

This is one reason explicit runtime state matters. A system with state can attach the correction to an active task, a current phase, a tool decision, a verification miss, or a relationship rule. A system without those anchors is forced to rely on wording alone. Then the repair becomes fragile, because the only thing holding it together is whatever the latest turn happened to sound like.

The reply path and the learning path must split

Another common mistake is letting learning artifacts leak directly into the next user-facing answer. The system classifies the complaint, drafts internal notes, or records a rule candidate, and then those traces start bleeding into the reply. Suddenly the assistant sounds like it is narrating its own repair log. That is not transparency. It is boundary failure.

The user deserves an honest response, not the raw contents of the repair process. Internal learning should stay internal. The reply path should produce the next answer. The learning path should classify, scope, and preserve what needs to shape future behavior. Those two paths can inform each other, but they should not collapse into one stream of text.
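
One way to keep the two paths apart is to let them share only a narrow, structured signal. The Python sketch below assumes a hypothetical RuleCandidate record with two fields: a directive the reply path is allowed to read, and an internal note that never leaves the learning path. The function names and the single "too short" rule are illustrative, not a description of any real runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleCandidate:
    directive: str      # structured signal the reply path may consume
    internal_note: str  # repair-log text; stays on the learning path


def learning_path(feedback):
    """Classify the complaint and draft rule candidates (toy logic)."""
    if "too short" in feedback.lower():
        return [RuleCandidate(
            directive="expand_reply",
            internal_note="reply_shape miss: user wanted more detail",
        )]
    return []


def reply_path(draft, directives):
    """Build the next answer from directives only, never from notes."""
    if "expand_reply" in directives:
        return draft + " Here is more detail."
    return draft


def handle_turn(feedback, draft):
    candidates = learning_path(feedback)
    # Only directives cross the boundary; internal notes are withheld.
    directives = {c.directive for c in candidates}
    reply = reply_path(draft, directives)
    return reply, candidates
```

The learning path informs the reply through the directive set, but the reply function has no access to the note text, so it cannot narrate the repair log even by accident.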

Growth should stay inspectable

Once feedback becomes typed and scoped, improvement stops looking mystical. The system can say what changed, where that change lives, and how long it should matter. That makes growth inspectable instead of theatrical. Users do not need a machine that claims to be self-improving in the abstract. They need one that can survive correction without turning every objection into sludge.
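
As a small illustration of what inspectable could mean in practice, the Python sketch below assumes a hypothetical LearnedRule record carrying exactly the three things named above: what changed, where the change lives, and how long it should matter. The field names and the wording of the summary are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LearnedRule:
    change: str    # what changed
    location: str  # where the change lives (hypothetical component name)
    scope: str     # how long it should matter


def describe(rule):
    """Render a learned rule as a plain, user-readable summary."""
    return (
        f"Changed: {rule.change}. "
        f"Lives in: {rule.location}. "
        f"Applies for: {rule.scope}."
    )
```

If a stored correction cannot fill all three fields, that is a sign it was saved as text rather than learned as a rule.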

That is the larger point behind this page. Agents do not improve by accumulating more text. They improve when feedback becomes stateful enough to shape the future without polluting the past. Memory preserves what matters. Rules decide what should happen next. If a correction never crosses that boundary, the system has not really learned yet.