MIRIX: Multi-Agent Memory System for LLM-Based Agents

Hong-Wei Wu included in category Paper Introduction

2025-07-17 CC BY-NC 4.0

Explore MIRIX, a powerful new memory system for LLM-based agents. Learn how its unique 6-component architecture and multi-agent design achieve state-of-the-art results, outperforming methods like LangMem, Mem0, and MemGPT.

1 Introduction

This article introduces the paper MIRIX: Multi-Agent Memory System for LLM-Based Agents. As the title suggests, MIRIX is a paper on LLM memory, like LangMem, Mem0, and MemGPT, which we have discussed previously. The paper was published on arXiv in July 2025. The authors have open-sourced the code on GitHub, and the software built on this paper can be downloaded directly from the official MIRIX website.

Interestingly, the official MIRIX website not only provides software downloads but also displays benchmark results for multiple methods on LOCOMO and ScreenshotVQA. According to those results, MIRIX outperforms popular methods such as LangMem, Mem0, and Zep, and it is also one of the few methods that supports images as LLM memory.

2 The MIRIX Method Design

The design of the MIRIX method can be broadly divided into the following three aspects:

Memory Component Design
Memory Update Workflow
Conversation Workflow

2.1 Memory Component Design

[Figure 1] The 6 Memory Components defined in MIRIX

As shown in the figure above, MIRIX defines a total of 6 Memory Components. These components appear to be a synthesis of the memory components designed in LangMem and MemGPT. For instance, LangMem also includes Episodic, Semantic, and Procedural Memory, while Core Memory and Resource Memory correspond to Core Memory and Archival Memory in MemGPT, respectively.

Here is a breakdown of the information stored in each Memory Component:

Core Memory: Stores the most crucial information. Following the approach of MemGPT, Core Memory contains two sections: persona and human. The persona section holds the agent’s identity, tone, and expected behavior, while the human section stores information about the user’s identity.
Episodic Memory: Stores timestamped events. Each entry consists of the following:
- event_type: e.g., user_message, inferred_result, or system_notification
- summary: A brief description of the event
- details: A detailed description of the event
- actor: The initiator of the event, either user or assistant
- timestamp: e.g., 2025-03-05 10:15
Semantic Memory: Stores established facts or general information. For example, “Harry Potter is written by J.K. Rowling” or “John is a friend of the user who enjoys jogging and lives in San Francisco.” Information in Semantic Memory does not expire unless specifically removed or modified. Each entry consists of:
- name
- summary
- details
- source
Procedural Memory: Stores information that helps the agent solve complex and specific tasks. For example, few-shot demonstrations or step-by-step instructions provided to the agent. Each entry consists of:
- entry_type: Can be workflow, guide, or script
- description: A description of the task to be completed
- steps: Step-by-step instructions to complete the task
Resource Memory: Stores all information required by the user that does not fall into any of the above categories. Each entry consists of:
- title
- summary
- resource_type: e.g., doc, markdown, pdf_text, image, voice_transcript
- full content/excerpted content
Knowledge Vault: Stores confidential and sensitive information, such as the user’s address, contact information, and API keys. Each entry consists of:
- entry_type: e.g., credential, bookmark, contact_info, api_key
- source: e.g., user_provided, github
- sensitivity: low, medium, high
- secret_value
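
To make the field lists above concrete, here is a minimal sketch of the entries as Python dataclasses. The class names and the `content` field name are assumptions chosen for illustration. This is not the schema used in the MIRIX codebase, only a restatement of the fields listed in this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoreMemory:
    persona: str = ""  # agent identity, tone, expected behavior
    human: str = ""    # information about the user's identity


@dataclass
class EpisodicEntry:
    event_type: str  # e.g. "user_message", "inferred_result"
    summary: str
    details: str
    actor: str       # "user" or "assistant"
    timestamp: str   # e.g. "2025-03-05 10:15"


@dataclass
class SemanticEntry:
    name: str
    summary: str
    details: str
    source: str


@dataclass
class ProceduralEntry:
    entry_type: str  # "workflow", "guide", or "script"
    description: str
    steps: List[str] = field(default_factory=list)


@dataclass
class ResourceEntry:
    title: str
    summary: str
    resource_type: str  # e.g. "doc", "markdown", "image"
    content: str        # hypothetical name for the full or excerpted content


@dataclass
class KnowledgeVaultEntry:
    entry_type: str   # e.g. "credential", "api_key"
    source: str       # e.g. "user_provided"
    sensitivity: str  # "low", "medium", or "high"
    secret_value: str


# Example: one timestamped event for Episodic Memory.
event = EpisodicEntry(
    event_type="user_message",
    summary="User asked about a trip",
    details="The user asked for help planning a weekend trip.",
    actor="user",
    timestamp="2025-03-05 10:15",
)
```

Keeping a short `summary` next to the longer `details` is what allows a cheap first-pass lookup before any full content is read.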

2.2 Memory Update Workflow

[Figure 2] The Memory Update Workflow in MIRIX

The figure above illustrates how MIRIX updates memory. Given the user’s input, relevant information is first retrieved from the 6 Memory Components. The Meta Memory Manager then decides which Memory Component the input belongs to and hands the update task to the corresponding Memory Manager.
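
The routing step described above can be sketched roughly as follows. Everything here is a hypothetical illustration: the class names, the word-overlap retrieval, and the keyword-based `toy_router` are stand-ins. In MIRIX the routing decision is made by an LLM-based agent, not by keyword rules.

```python
from typing import Callable, Dict, List

MEMORY_TYPES = [
    "core", "episodic", "semantic",
    "procedural", "resource", "knowledge_vault",
]


class MemoryManager:
    """Owns one memory component (hypothetical, stores raw strings)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.entries: List[str] = []

    def retrieve(self, user_input: str) -> List[str]:
        # Naive word-overlap retrieval, a stand-in for real search.
        query = set(user_input.lower().split())
        return [e for e in self.entries if query & set(e.lower().split())]

    def update(self, user_input: str) -> None:
        self.entries.append(user_input)


def toy_router(user_input: str, retrieved: Dict[str, List[str]]) -> str:
    # Assumption: keyword rules replace the LLM's routing decision.
    text = user_input.lower()
    if "api key" in text or "password" in text:
        return "knowledge_vault"
    if "how to" in text or "step" in text:
        return "procedural"
    return "episodic"


class MetaMemoryManager:
    def __init__(self, router: Callable[[str, Dict[str, List[str]]], str]) -> None:
        self.managers = {name: MemoryManager(name) for name in MEMORY_TYPES}
        self.router = router

    def handle(self, user_input: str) -> str:
        # 1. Retrieve related information from all 6 components.
        retrieved = {n: m.retrieve(user_input) for n, m in self.managers.items()}
        # 2. Decide which component the input belongs to.
        target = self.router(user_input, retrieved)
        # 3. Delegate the update to that component's manager.
        self.managers[target].update(user_input)
        return target


meta = MetaMemoryManager(toy_router)
routed_to = meta.handle("My API key is abc123")
```

The point of the sketch is the separation of concerns: one coordinator decides where an input goes, and each component's manager is responsible only for its own store.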

2.3 Conversation Workflow

[Figure 3] The Conversation Workflow in MIRIX

Once the MIRIX Agent has collected sufficient memory, it can begin to answer the user’s questions based on that memory. The conversation process is shown in the figure above. Given the user’s input, relevant information is first retrieved from the Memory Base across the 6 Memory Components; this first pass is deliberately concise and does not include every detail. The Chat Agent then determines which Memory Component should handle the current input and triggers a “Conduct Specific Search” to retrieve more detailed and complete information from that component. Finally, it generates the response based on the retrieved information. If the Chat Agent determines that the user’s input requires a memory update, it can directly trigger the specific Memory Manager to update the relevant Memory Component.
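
The two-stage retrieval described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the word-overlap matching, and the "most hits wins" rule in `pick_component` are hypothetical. In MIRIX the Chat Agent (an LLM) makes the component choice and writes the final response.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Entry:
    summary: str  # short text used in the first, coarse pass
    details: str  # full text returned only by the specific search


def words(text: str) -> Set[str]:
    return set(text.lower().split())


def coarse_retrieve(memory_base: Dict[str, List[Entry]], query: str) -> Dict[str, List[str]]:
    """Stage 1: return only matching summaries from every component."""
    q = words(query)
    return {
        component: [e.summary for e in entries if q & words(e.summary)]
        for component, entries in memory_base.items()
    }


def pick_component(coarse: Dict[str, List[str]]) -> str:
    # Assumption: the component with the most coarse hits is chosen.
    return max(coarse, key=lambda c: len(coarse[c]))


def specific_search(memory_base: Dict[str, List[Entry]], component: str, query: str) -> List[str]:
    """Stage 2: return full details from the chosen component only."""
    q = words(query)
    return [e.details for e in memory_base[component] if q & words(e.summary)]


def gather_context(memory_base: Dict[str, List[Entry]], query: str) -> Tuple[str, List[str]]:
    coarse = coarse_retrieve(memory_base, query)
    component = pick_component(coarse)
    return component, specific_search(memory_base, component, query)


memory_base = {
    "semantic": [Entry(
        "John friend jogging",
        "John is a friend of the user who enjoys jogging and lives in San Francisco.",
    )],
    "episodic": [Entry(
        "user asked about weather",
        "The user asked about the weather forecast.",
    )],
}
component, context = gather_context(memory_base, "Where does John live?")
```

The coarse pass keeps the first lookup small, and the detailed text is only read from the single component that looks relevant.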

3 Experimental Results

In its experiments, the MIRIX paper used two datasets: ScreenshotVQA and LOCOMO.

ScreenshotVQA is a multimodal LLM memory dataset created for this paper. This benchmark includes 5,886, 18,178, and 5,349 screenshots collected from 3 users over 1 day, 20 days, and 1 month, respectively, along with 11, 21, and 55 corresponding questions. LOCOMO, on the other hand, is a text-only LLM memory dataset containing 10 conversations, each averaging about 600 dialogue turns and 26K tokens, with roughly 200 questions per conversation.

For the evaluation metric, the authors designed an LLM-as-a-Judge method based on GPT-4.1. Additionally, the MIRIX Agent used gemini-2.5-flash-preview-04-17 and gpt-4.1-mini as its backbone models for the ScreenshotVQA and LOCOMO datasets, respectively.
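
As a rough illustration of how LLM-as-a-Judge verdicts turn into an accuracy score, consider the sketch below. The `stub_judge` function is a hypothetical placeholder that uses substring matching. In the paper the verdict comes from GPT-4.1, and the authors' actual judging prompt is not reproduced here.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str, str]  # (question, reference answer, predicted answer)


def stub_judge(question: str, reference: str, prediction: str) -> bool:
    # Placeholder for an LLM call that returns a correct/incorrect verdict.
    return reference.lower() in prediction.lower()


def judged_accuracy(
    samples: List[Sample],
    judge: Callable[[str, str, str], bool] = stub_judge,
) -> float:
    """Fraction of predictions the judge marks as correct."""
    if not samples:
        return 0.0
    correct = sum(1 for q, ref, pred in samples if judge(q, ref, pred))
    return correct / len(samples)


samples = [
    ("Where does John live?", "San Francisco", "John lives in San Francisco."),
    ("Who wrote Harry Potter?", "J.K. Rowling", "I am not sure."),
]
accuracy = judged_accuracy(samples)
```

Swapping `stub_judge` for a function that calls a judge model leaves the aggregation unchanged, which is why the judge is passed in as a parameter.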

[Table 1] MIRIX experimental results on ScreenshotVQA

[Table 2] MIRIX experimental results on LOCOMO

According to the experimental results above, the MIRIX Agent outperformed the baseline methods on both datasets.

4 Conclusion

This article has introduced the LLM Memory method proposed in the MIRIX paper. After reading this paper, what impressed me the most was the 6 Memory Components defined in MIRIX. They cover almost every conceivable usage scenario and address the shortcomings in memory component design found in LangMem, Mem0, and MemGPT. As for the Memory Update Workflow and Conversation Workflow, I found them to be less novel. However, MIRIX’s approach of designing a specific Memory Agent for each component has led to better performance than other baseline methods, suggesting that its prompt design is likely worth studying.