How procedural memory can reduce the cost and complexity of AI agents

A new technique from Zhejiang University and Alibaba Group gives large language model (LLM) agents a dynamic memory, making them more efficient and effective at complex tasks. The technique, called Memp, gives agents a “procedural memory” that is continuously updated as they gain experience, much like the way humans learn from practice.
Memp creates a lifelong learning environment where agents don’t have to start from scratch for every new task. Instead, they become progressively better and more efficient as they encounter new situations in real-world environments, a key requirement for reliable enterprise automation.
The case for procedural memory in AI agents
LLM agents hold promise for automating complex, multi-step tasks. In practice, though, these long-horizon tasks can be fragile. The researchers point out that unpredictable events such as network glitches, user interface changes or shifting data schemas can derail the entire process. For current agents, this often means starting over every time, which can be time-consuming and expensive.
Meanwhile, many complex tasks, despite surface differences, share deep structural commonalities. Instead of relearning these patterns every time, an agent should be able to extract and reuse its experience from past successes and failures, the researchers point out. This requires a specific “procedural memory,” which in humans is the long-term memory responsible for skills such as typing or riding a bicycle, which become automatic with practice.

Current agent systems often lack this capability. Their procedural knowledge is usually handcrafted by developers, stored in rigid prompt templates or embedded in the model’s parameters, which are expensive and slow to update. Even existing memory-augmented frameworks provide only coarse abstractions and don’t adequately address how skills should be built, indexed, corrected and eventually pruned over an agent’s lifecycle.
Consequently, the researchers note in their paper, “there is no principled way to quantify how effectively an agent evolves its procedural repertoire or to guarantee that new experiences improve rather than erode performance.”
How Memp works
Memp is a task-agnostic framework that treats procedural memory as a core component to be optimized. It consists of three key stages that operate in a continuous loop: building, retrieving and updating memory.
Memories are built from an agent’s past experiences, or “trajectories.” The researchers explored storing these memories in two formats: verbatim, step-by-step actions, or distilling those actions into higher-level, script-like abstractions. For retrieval, the agent searches its memory for the most relevant past experience when given a new task. The team experimented with different methods, such as vector search, to match the new task’s description against past queries, or extracting keywords to find the best fit.
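As a rough illustration of the build-and-retrieve side, here is a minimal Python sketch that stores trajectories both verbatim and as distilled, script-like abstractions, and matches a new task to past ones by keyword overlap (one of the retrieval methods the team tried; vector search is the other). All class and function names here are hypothetical, not taken from the Memp paper or code.

```python
# Hypothetical sketch of Memp-style memory building and retrieval.
# Matching uses simple keyword overlap here; the researchers also
# experimented with vector (embedding) search for the same purpose.

from dataclasses import dataclass, field

@dataclass
class MemoryEntry:
    task: str            # description of the past task
    trajectory: list     # verbatim step-by-step actions
    script: str = ""     # optional higher-level, script-like abstraction

@dataclass
class ProceduralMemory:
    entries: list = field(default_factory=list)

    def build(self, task: str, trajectory: list, script: str = "") -> None:
        """Store a finished trajectory as a new memory entry."""
        self.entries.append(MemoryEntry(task, trajectory, script))

    def retrieve(self, new_task: str):
        """Return the past experience whose task description best matches."""
        def overlap(entry: MemoryEntry) -> int:
            return len(set(new_task.lower().split()) &
                       set(entry.task.lower().split()))
        return max(self.entries, key=overlap, default=None)

memory = ProceduralMemory()
memory.build("heat an egg and put it on the table",
             ["go to fridge", "take egg", "heat egg", "place egg on table"],
             script="fetch item, apply appliance, deliver to target")
best = memory.retrieve("heat some bread and put it on the counter")
print(best.script if best else "no relevant memory")
```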
The most critical component is the update mechanism. Memp introduces several strategies to ensure the agent’s memory evolves. As the agent completes more tasks, its memory can be updated by simply adding the new experience, filtering for only successful outcomes or, most effectively, reflecting on failures to correct and revise the original memory.
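Building on the sketch above, the three update policies the researchers describe could be expressed along these lines; the `reflect` callback stands in for an LLM call that diagnoses a failure and revises the stored memory, and the function names are again illustrative.

```python
# Illustrative update policies for an evolving procedural memory,
# reusing the ProceduralMemory class sketched earlier.

def update_add_all(memory, task, trajectory, success: bool):
    """Naive policy: append every new experience, successful or not."""
    memory.build(task, trajectory)

def update_keep_successes(memory, task, trajectory, success: bool):
    """Filter policy: only successful outcomes become memories."""
    if success:
        memory.build(task, trajectory)

def update_reflect_on_failure(memory, task, trajectory, success: bool, reflect):
    """Reflection policy: failures trigger revision of the matched memory."""
    if success:
        memory.build(task, trajectory)
        return
    prior = memory.retrieve(task)
    if prior is not None:
        # 'reflect' would prompt an LLM with the failed run and the old
        # memory, returning a corrected trajectory to replace the stale one.
        prior.trajectory = reflect(prior.trajectory, trajectory)
```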

This focus on dynamic, evolving memory places Memp within a growing field of research aimed at making AI agents more reliable for long-horizon tasks. The work parallels other efforts, such as Mem0, which consolidates key information from long conversations into structured facts and knowledge graphs to ensure consistency. Similarly, A-MEM enables agents to create and link “memory notes” from their interactions, forming a complex knowledge structure over time.
However, co-author Runnan Fang highlighted a critical distinction between Memp and other frameworks.
“Mem0 and A-MEM are excellent works… but they focus on recalling salient content within a single trajectory or conversation,” Fang told VentureBeat. In essence, they help an agent remember “what” happened. “Memp, by contrast, targets cross-trajectory procedural memory.” It focuses on the “how-to” knowledge that can be generalized across similar tasks, preventing the agent from re-exploring from scratch each time.
“By distilling past successful workflows into reusable procedural priors, Memp raises success rates and shortens steps,” Fang added. “Crucially, we also introduce an update mechanism so that this procedural memory keeps improving; after all, practice makes perfect for agents as well.”
Overcoming the “cold start” problem
Although the concept of learning from past trajectories is powerful, it raises a practical question: how does an agent build its initial memory when there are no perfect examples to learn from? The researchers address this “cold start” problem with a pragmatic approach.
Fang explained that, rather than requiring a perfect “gold” trajectory up front, developers can first define a robust evaluation metric. This metric, which can be rule-based or even another LLM, scores the quality of an agent’s performance. “Once this metric is in place, we let state-of-the-art models explore within the agent workflow and retain the trajectories that achieve the highest scores,” Fang said. This process quickly bootstraps an initial set of useful memories, allowing a new agent to get up to speed without extensive manual programming.
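In code, that bootstrapping loop might look roughly like the following; `rollout` and `metric` are placeholders for the exploratory agent run and the rule-based or LLM-based scorer Fang describes, and the attempt count and threshold are arbitrary choices.

```python
# Hypothetical cold-start bootstrapping: let a strong model explore,
# score each trajectory with an evaluation metric, and seed memory with
# only the highest-scoring runs. No gold trajectories are required.

def bootstrap_memory(memory, tasks, rollout, metric,
                     attempts_per_task: int = 5, threshold: float = 0.8):
    for task in tasks:
        scored = []
        for _ in range(attempts_per_task):
            trajectory = rollout(task)          # exploratory run
            scored.append((metric(task, trajectory), trajectory))
        best_score, best_traj = max(scored, key=lambda s: s[0])
        if best_score >= threshold:             # keep only high-quality runs
            memory.build(task, best_traj)
```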
Memp in action
To test the framework, the team implemented Memp on top of powerful LLMs including GPT-4o, Claude 3.5 Sonnet and Qwen2.5, evaluating them on complex tasks such as household chores in the ALFWorld benchmark and information-seeking in TravelPlanner. The results showed that building and retrieving procedural memory allowed an agent to distill and effectively reuse its previous experience.
During testing, agents equipped with Memp not only achieved higher success rates but also became far more efficient. They eliminated fruitless exploration and trial-and-error, leading to a substantial reduction in both the number of steps and the token consumption required to complete a task.

One of the most significant findings for enterprise applications is that procedural memory is transferable. In one experiment, procedural memory generated by the powerful GPT-4o was given to a much smaller model, Qwen2.5-14B. The smaller model saw a significant performance boost, improving its success rate and reducing the steps needed to complete tasks.
According to Fang, this works because smaller models often handle simple, single-step actions well but falter when it comes to long-horizon planning and reasoning. The procedural memory from the larger model effectively fills this capability gap. This suggests that knowledge can be acquired with a frontier model, then deployed on smaller, more cost-effective models without losing the benefits of that experience.
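Because the memory lives outside the model weights, transferring it is conceptually just serializing one agent’s memory store and loading it into another’s. A minimal sketch, using the same hypothetical data structures as above and an illustrative JSON file format:

```python
# Sketch of procedural-memory transfer: memories distilled by a strong
# model (e.g., GPT-4o) are exported to a file and loaded by the agent
# running a smaller model. Helper names and format are assumptions.

import json

def export_memory(memory, path: str) -> None:
    with open(path, "w") as f:
        json.dump([{"task": e.task, "trajectory": e.trajectory,
                    "script": e.script} for e in memory.entries], f)

def import_memory(memory, path: str) -> None:
    with open(path) as f:
        for e in json.load(f):
            memory.build(e["task"], e["trajectory"], e["script"])
```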
Towards truly autonomous agents
By equipping agents with mechanisms to update their memory, the Memp framework allows them to continuously build and refine their procedural knowledge while operating in a live environment. The researchers found this endowed the agent with a “continuous, almost linear mastery of the task.”
However, the path to full autonomy requires overcoming another hurdle: many real-world tasks, such as producing a research report, lack a simple success signal. To improve continuously, an agent needs to know whether it did a good job. Fang says the future lies in using LLMs themselves as judges.
“Today we often combine powerful models with handcrafted rules to compute completion scores,” he noted. “This works, but handwritten rules are brittle and hard to generalize.”
An LLM-as-judge could provide the nuanced, supervisory feedback an agent needs to self-correct on complex, subjective tasks. This would make the whole learning loop more scalable and robust, marking a critical step toward building the resilient, adaptable and truly autonomous AI workers needed for sophisticated enterprise automation.
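A minimal sketch of what such an LLM-as-judge scorer could look like, assuming only a generic `llm` completion function rather than any particular API; the prompt wording and 0-to-10 scale are illustrative choices:

```python
# Hedged sketch of an LLM-as-judge completion scorer. 'llm' is assumed
# to be any function that takes a prompt string and returns the model's
# text reply; nothing here is tied to a specific provider.

JUDGE_PROMPT = """You are grading an AI agent.
Task: {task}
Agent's final output: {output}
On a scale of 0 to 10, how completely and correctly was the task done?
Reply with a single number."""

def judge_score(llm, task: str, output: str) -> float:
    reply = llm(JUDGE_PROMPT.format(task=task, output=output))
    try:
        return float(reply.strip()) / 10.0   # normalize to [0, 1]
    except ValueError:
        return 0.0                           # unparseable reply counts as failure
```

In the sketches above, such a judge could slot directly into the cold-start loop as the `metric`, replacing brittle handwritten completion rules.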