# [Paper Review] MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

I have been looking for a way to extract data from long, unstructured text using an LLM.

Given a PDF of a research paper, my first approach was to iterate over the pages and feed each page's text to an agent. But an agent is stateless: it has no information about the previous page, so data is lost whenever a piece of information is split across pages.

I then came up with the idea of a shared memory: a state that each agent can use at every step.

The paper’s goal is to enable agents to reason and infer better while reducing inference time and memory usage.

# 1\. Introduction

* Problems with traditional long-context data processing with LLMs
    
    * Full-context prompting, which appends all past turns regardless of their relevance
        
    * Growing inference cost and memory usage
        
    * Generalization limits beyond the training horizon
        
    * Overloaded and inefficient context
        
* Solution
    
    * A model that learns to consolidate its memory as part of its reasoning process
        
    * Memory that is shared by agents
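
To make the cost argument in the list above concrete, here is a minimal sketch comparing how the context grows per turn under the two strategies. The function names, the 200-token turn size, and the 300-token state budget are assumptions chosen for illustration, not figures from the paper.

```python
def full_context_sizes(turn_sizes):
    """Context size at each turn when every past turn is appended."""
    sizes, total = [], 0
    for size in turn_sizes:
        total += size
        sizes.append(total)
    return sizes


def consolidated_sizes(turn_sizes, state_budget):
    """Context size at each turn when only a bounded state plus the newest turn is kept."""
    return [state_budget + size for size in turn_sizes]


turn_sizes = [200] * 5  # hypothetical: 200 tokens of new text per turn
print(full_context_sizes(turn_sizes))       # [200, 400, 600, 800, 1000]
print(consolidated_sizes(turn_sizes, 300))  # [500, 500, 500, 500, 500]
```

Under these assumptions the full-context prompt grows linearly with the number of turns, while the consolidated prompt stays flat.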
        

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1770538101911/15fef603-358d-4da1-8b3a-740415caa043.png align="center")

# 2\. MEM1

* Annotate each component using XML-style tags
    
    * &lt;IS&gt; for internal state (reasoning)
        
        * summarizes past information
            
        * reasons about subsequent actions
            
    * &lt;query&gt; for environment queries
        
    * &lt;answer&gt; for the agent’s responses
        
    * &lt;info&gt; for external observations or tool outputs
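
As a rough sketch of how a turn annotated with the tags above could be split into its components, the snippet below uses a regular expression. The function `parse_turn` and the sample text are my own illustration and are not taken from the paper's code.

```python
import re

# Matches <IS>...</IS>, <query>...</query>, <answer>...</answer>, <info>...</info>.
# The backreference \1 makes sure the closing tag matches the opening one.
_TAG_PATTERN = re.compile(r"<(IS|query|answer|info)>(.*?)</\1>", re.DOTALL)


def parse_turn(text):
    """Return a dict mapping each tag found in `text` to its stripped content.

    If a tag appears more than once, the last occurrence wins.
    """
    return {tag: body.strip() for tag, body in _TAG_PATTERN.findall(text)}


# Hypothetical model output for one turn.
turn = "<IS>Page 1 names the cohort; need sample size next.</IS><query>sample size</query>"
parsed = parse_turn(turn)
print(parsed["IS"])
print(parsed["query"])
```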
        

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1770541971425/33033527-d84b-4271-be13-d4c38649a2b5.png align="center")

The figure shows that $IS_{t-1}$, $Query_{t-1}$, and $Info_{t-1}$ are consolidated into $IS_{t}$. This happens at every step, so that unnecessary data, which could hurt inference performance and data quality, is discarded.
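
The update described above can be sketched as a small loop. In MEM1 the model itself learns what to keep; here the `keep` callback is a hypothetical stand-in for that learned behaviour, and the state is a plain list of strings rather than generated text.

```python
from typing import Callable, List, Tuple


def consolidate(prev_state: List[str], prev_query: str, prev_info: str,
                keep: Callable[[str, str], bool]) -> List[str]:
    """Build IS_t from (IS_{t-1}, Query_{t-1}, Info_{t-1}), keeping only what `keep` accepts."""
    new_state = list(prev_state)
    if keep(prev_query, prev_info):
        new_state.append(prev_info)
    return new_state


def run(observations: List[Tuple[str, str]],
        keep: Callable[[str, str], bool]) -> List[str]:
    """Carry only the consolidated state forward; raw queries and infos are dropped each step."""
    state: List[str] = []
    for query, info in observations:
        state = consolidate(state, query, info, keep)
    return state


# Hypothetical (query, info) pairs; the "weather" result is irrelevant to the task.
observations = [("authors", "Zhou et al."), ("weather", "sunny"), ("year", "2025")]
print(run(observations, keep=lambda query, info: query != "weather"))
# ['Zhou et al.', '2025']
```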

# 3\. Experiment & Results

Interestingly, the MEM1 approach showed two key results:

* lower inference time than the other methods
    
* a better match count
    

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1770558287942/e654e196-705e-4722-b2a0-33d9d4f221aa.png align="center")

# Reflection

For our research, the full MEM1 process is more than we need, and it addresses a slightly different topic. However, it is notable that the paper presents a shared-memory technique that safely passes data to the next agent and discards incorrect data.

I will adapt the shared-memory idea to our pipeline, and the draft of the pipeline looks something like this.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1770559239501/35159817-30e2-4413-aa18-2b497fad4a0b.png align="center")

It may not capture everything accurately, but it seems at least feasible to apply to our research.
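
As a minimal sketch of the shared-memory idea in code, assuming one agent call per page, the loop below passes a memory string from page to page. `toy_agent` is a hypothetical stand-in for the LLM call: it treats complete sentences as extracted records and carries the unfinished tail over to the next page. The page texts are made up for illustration.

```python
from typing import Callable, List, Tuple

# An agent takes (shared memory, page text) and returns (extracted records, updated memory).
Agent = Callable[[str, str], Tuple[List[str], str]]


def extract_from_pages(pages: List[str], agent: Agent) -> List[str]:
    """Run the agent page by page, threading the shared memory through every step."""
    memory = ""
    records: List[str] = []
    for page in pages:
        found, memory = agent(memory, page)
        records.extend(found)
    return records


def toy_agent(memory: str, page_text: str) -> Tuple[List[str], str]:
    """Stand-in for an LLM call: emit complete sentences, keep the unfinished tail as memory."""
    text = (memory + " " + page_text).strip()
    *complete, tail = text.split(".")
    found = [sentence.strip() + "." for sentence in complete if sentence.strip()]
    return found, tail.strip()


# Hypothetical pages where one sentence is split across the page break.
pages = ["Cohort size was 120. Mean age", "was 71 years. Follow-up lasted"]
print(extract_from_pages(pages, toy_agent))
# ['Cohort size was 120.', 'Mean age was 71 years.']
```

Without the memory argument, the fragment "Mean age" on the first page would be lost, which is the data-loss problem a stateless agent runs into.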

Also, when considering whether to train a model for our specific domain (Cognitive Reserve), I asked myself, “Can we train it in such a way?” I concluded that we cannot, as CR needs a specific dataset and cannot be trained and generalised in this way.

# Reference

\[1\] Zhou, Z., Qu, A., Wu, Z., Kim, S., Prakash, A., Rus, D., ... & Liang, P. P. (2025). MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents. *arXiv preprint arXiv:2506.15841*.
