AI Keeps Forgetting? 9 Techniques to Give Your Agent a Much Better Memory

AI大模型觀察站

Published on 2025-7-18 14:26

Tests and analysis, from sliding windows to OS-like memory

9 techniques

9 Techniques for Optimizing AI Agent Memory: From Beginner to Advanced

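One way to optimize an AI agent is to design a multi-sub-agent architecture that improves accuracy. In conversational AI, though, optimization goes well beyond that: memory becomes especially important.

As your conversation with an AI agent grows longer and deeper, it consumes more and more memory, because the agent relies on components such as stored conversation history, tool calls, and database searches.

In this blog post, we will code and evaluate 9 memory optimization techniques, from beginner to advanced, so you can see how to apply each one along with its strengths and weaknesses. They range from a simple sequential approach to an advanced OS-like memory management implementation.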

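Summary of the techniques

To keep things clear and practical, we will use one simple AI agent throughout and look at the internals of each technique, so the strategies are easy to extend and implement in more complex systems.

All the code (theory + notebook) is available in my GitHub repository:
https://github.com/PulsarPioneers/Multi-Agent-AI-System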

Environment setup
Creating helper functions
Creating the base agent and Memory Class
The problem with the Sequential Optimization Approach
Sliding Window Approach
Summarization Based Optimization
Retrieval Based Memory
Memory Augmented Transformers
Hierarchical Optimization for Multi-tasks
Graph Based Optimization
Compression & Consolidation Memory
OS-Like Memory Management
Choosing the right strategy

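Environment setup

To optimize and test memory techniques for an AI agent, we first need to initialize a few components. Before that, install the required Python libraries:

openai: client library for interacting with the LLM API.
numpy: numerical operations, especially for handling embeddings.
faiss-cpu: a library from Facebook AI for efficient similarity search. It powers our retrieval memory and works well as an in-memory vector store.
networkx: creates and manages the knowledge graph in Graph-Based Memory.
tiktoken: counts tokens and helps manage context window limits.

Install these modules: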

pip install openai numpy faiss-cpu networkx tiktoken

Next, initialize the client module used to call the LLM:

import os
from openai import OpenAI

API_KEY = "YOUR_LLM_API_KEY"
BASE_URL = "https://api.studio.nebius.com/v1/"

client = OpenAI(
    base_url=BASE_URL,
    api_key=API_KEY
)

print("OpenAI client configured successfully.")

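We will use open-source models through an API provider such as Nebius or Together AI. Next, import the remaining modules and choose the open-source LLM for the AI agent: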

import tiktoken
import time

GENERATION_MODEL = "meta-llama/Meta-Llama-3.1-8B-Instruct"
EMBEDDING_MODEL = "BAAI/bge-multilingual-gemma2"

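The main tasks use the LLaMA 3.1 8B Instruct model. Some optimizations depend on an embedding model, for which we will use BAAI/bge-multilingual-gemma2, a multilingual embedding model.

Next, define several helper functions that are used throughout this post.

Creating helper functions

To avoid repeated code and follow good coding practice, we will define three helper functions:

generate_text: generates content from a system prompt and a user prompt.
generate_embedding: generates embeddings for the retrieval-based methods.
count_tokens: counts the total tokens used by each method.

Start with the generate_text function, which generates text from the input prompt: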

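def generate_text(system_prompt: str, user_prompt: str) -> str:
    """
    Call the LLM API to generate a text response.

    Args:
        system_prompt: Instructions that define the AI's role and behavior.
        user_prompt: The user input the AI should respond to.

    Returns:
        The text content generated by the AI.
    """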
    response = client.chat.completions.create(
        model=GENERATION_MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ]
    )
    return response.choices[0].message.content

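The generate_text function takes a system prompt and a user prompt and generates a response with LLaMA 3.1 8B.

Next, code the generate_embedding function, which uses the bge-multilingual-gemma2 model to generate embeddings: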

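def generate_embedding(text: str) -> list[float]:
    """
    Generate a numerical embedding for the given text with the embedding model.

    Args:
        text: The input string to convert into an embedding.

    Returns:
        A list of floats representing the embedding vector.
    """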
    response = client.embeddings.create(
        model=EMBEDDING_MODEL,
        input=text
    )
    return response.data[0].embedding

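The embedding function returns the embedding of the input text from the bge-multilingual-gemma2 model.

Finally, create a function that counts tokens across the whole AI and user chat history, which helps us measure the effect of each optimization.

We will use a common modern tokenizer, OpenAI's cl100k_base, which is a Byte Pair Encoding (BPE) tokenizer. Put simply, BPE is an algorithm that efficiently splits text into subword units. Because cl100k_base is not LLaMA's own tokenizer, the counts reported here are estimates.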

BPE example:
"lower", "lowest" → ["low", "er"], ["low", "est"]

Initialize the tokenizer:

tokenizer = tiktoken.get_encoding("cl100k_base")

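Now create the function that tokenizes the text and counts the total tokens:

def count_tokens(text: str) -> int:
    """
    Count the tokens in the given string with the preloaded tokenizer.

    Args:
        text: The string to tokenize.

    Returns:
        The token count as an integer.
    """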
    return len(tokenizer.encode(text))

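Done! With the helper functions in place, we can start exploring and evaluating the techniques.

Creating the base agent and Memory Class

Now we need to create the agent's core design, which is used throughout this guide. With respect to memory, an AI agent has three key components:

Adding past messages to the agent's memory so that it is aware of the context.
Retrieving relevant content to help the AI generate a response.
Clearing the agent's memory after each strategy has been run.

Object-Oriented Programming (OOP) is a natural way to build this memory functionality, so let's implement it: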

import abc

class BaseMemoryStrategy(abc.ABC):
    """Abstract base class for all memory strategies."""
    
    @abc.abstractmethod
    def add_message(self, user_input: str, ai_response: str):
        """Add a new user-AI interaction to the memory store."""
        pass

    @abc.abstractmethod
    def get_context(self, query: str) -> str:
        """Retrieve relevant context from memory and format it for the LLM."""
        pass

    @abc.abstractmethod
    def clear(self):
        """Reset the memory; useful when starting a new conversation."""
        pass

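We use @abstractmethod, a common coding style when subclasses provide different implementations of the same interface. Each strategy (subclass) has its own implementation, so the design needs abstract methods.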
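One practical consequence is worth seeing in isolation: Python refuses to instantiate a subclass that leaves any abstract method unimplemented. The sketch below repeats the three-method interface so it runs on its own; IncompleteMemory and CompleteMemory are hypothetical classes invented for the illustration.

```python
import abc


class BaseMemoryStrategy(abc.ABC):
    """Same three-method interface as in the post, repeated to keep this runnable."""

    @abc.abstractmethod
    def add_message(self, user_input: str, ai_response: str): ...

    @abc.abstractmethod
    def get_context(self, query: str) -> str: ...

    @abc.abstractmethod
    def clear(self): ...


class IncompleteMemory(BaseMemoryStrategy):
    """Hypothetical strategy that forgets to implement clear()."""

    def add_message(self, user_input: str, ai_response: str):
        pass

    def get_context(self, query: str) -> str:
        return ""


class CompleteMemory(IncompleteMemory):
    """Adds the missing method, so instantiation is allowed."""

    def clear(self):
        pass


try:
    IncompleteMemory()
    instantiated = True
except TypeError as err:
    instantiated = False
    print(f"Rejected: {err}")

complete = CompleteMemory()  # works: all three methods are implemented
```

Every strategy class therefore has to define add_message, get_context, and clear before it can be passed to the agent.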

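Based on the memory interface we just defined and the helper functions, we use OOP principles to build the AI agent structure:

class AIAgent:
    """Main AI agent class, designed to work with any memory strategy."""
    
    def __init__(self, memory_strategy: BaseMemoryStrategy, system_prompt: str = "You are a helpful AI assistant."):
        """
        Initialize the agent.
        
        Args:
            memory_strategy: An instance of a BaseMemoryStrategy subclass that determines how the agent remembers the conversation.
            system_prompt: The initial instruction for the LLM, defining its role and task.
        """
        self.memory = memory_strategy
        self.system_prompt = system_prompt
        print(f"Agent initialized with {type(memory_strategy).__name__}.")

    def chat(self, user_input: str):
        """
        Handle one turn of the conversation.
        
        Args:
            user_input: The user's latest message.
        """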
        print(f"\n{'='*25} NEW INTERACTION {'='*25}")
        print(f"User > {user_input}")
        
        start_time = time.time()
        context = self.memory.get_context(query=user_input)
        retrieval_time = time.time() - start_time
        
        full_user_prompt = f"### MEMORY CONTEXT\n{context}\n\n### CURRENT REQUEST\n{user_input}"
        
        prompt_tokens = count_tokens(self.system_prompt + full_user_prompt)
        print("\n--- Agent Debug Info ---")
        print(f"Memory Retrieval Time: {retrieval_time:.4f} seconds")
        print(f"Estimated Prompt Tokens: {prompt_tokens}")
        print(f"\n[Full Prompt Sent to LLM]:\n---\nSYSTEM: {self.system_prompt}\nUSER: {full_user_prompt}\n---")
        
        start_time = time.time()
        ai_response = generate_text(self.system_prompt, full_user_prompt)
        generation_time = time.time() - start_time
        
        self.memory.add_message(user_input, ai_response)
        
        print(f"\nAgent > {ai_response}")
        print(f"(LLM Generation Time: {generation_time:.4f} seconds)")
        print(f"{'='*70}")

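The agent works in 6 simple steps:

1. Retrieve context from memory according to the strategy in use, and record how long retrieval takes.

2. Merge the retrieved memory context with the current user input to build the full prompt.

3. Print debug information, such as the prompt's token count and the context retrieval time.

4. Send the full prompt (system + user + context) to the LLM and wait for the response.

5. Update memory with the new interaction so it is available for future context retrieval.

6. Display the AI response and the generation time, ending the turn.

With the components coded, we can start understanding and implementing each memory optimization technique.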

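The problem with the Sequential Optimization Approach

This is the most basic and simplest approach. Many developers use it, and it was a common way to manage conversation history in early chatbots.

This method appends every new message to a running log and feeds the entire conversation back to the model each time, forming a linear memory chain that keeps everything that was said.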

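How the Sequential Approach works:

1. The user starts a conversation with the AI agent.

2. The agent responds.

3. The user-AI interaction (one "turn") is saved as a single block of text.

4. On the next turn, the agent takes the entire history (turn 1 + turn 2 + turn 3 ...) and combines it with the new user query.

5. This large block of text is sent to the LLM to generate the next response.

Implement sequential optimization with our Memory Class: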

class SequentialMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initialize the memory with an empty list that stores the conversation history."""
        self.history = []

    def add_message(self, user_input: str, ai_response: str):
        """Append a new user-AI interaction to the history."""
        self.history.append({"role": "user", "content": user_input})
        self.history.append({"role": "assistant", "content": ai_response})

    def get_context(self, query: str) -> str:
        """Return the entire conversation history as a single string for the LLM context."""
        return "\n".join([f"{turn['role'].capitalize()}: {turn['content']}" for turn in self.history])

    def clear(self):
        """Reset the conversation history by emptying the list."""
        self.history = []
        print("Sequential memory cleared.")

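Code walkthrough:

__init__(self): initializes an empty self.history list to store the conversation.
add_message(...): adds the user input and the AI response to the history.
get_context(...): formats the history as a "Role: Content" string to use as context.
clear(): resets the history for a new conversation.

Initialize the memory class and build the AI agent: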

sequential_memory = SequentialMemory()
agent = AIAgent(memory_strategy=sequential_memory)

Test the sequential approach with a multi-turn conversation:

agent.chat("Hi there! My name is Sam.")
agent.chat("I'm interested in learning about space exploration.")
agent.chat("What was the first thing I told you?")

Output:

==== NEW INTERACTION ====
User: Hi there! My name is Sam.  
Bot: Hello Sam! Nice to meet you. What brings you here today?  
>>>> Tokens: 23 | Response Time: 2.25s

==== NEW INTERACTION ====
User: I am interested in learning about space exploration.  
Bot: Awesome! Are you curious about:  
- Mars missions  
- Space agencies  
- Private companies (e.g., SpaceX)  
- Space tourism  
- Search for alien life?  
...  
>>>> Tokens: 92 | Response Time: 4.46s

==== NEW INTERACTION ====
User: What was the first thing I told you?  
Bot: You said, "Hi there! My name is Sam."  
...  
>>>> Tokens: 378 | Response Time: 0.52s

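The conversation flows smoothly, but look at the token counts: they grow larger after every turn. Our agent does not rely on external tools that would add many tokens, so the growth comes entirely from the sequential accumulation of messages.

Drawback: the longer the conversation, the higher the token cost, so the sequential approach becomes expensive.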
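A quick back-of-the-envelope sketch shows why the cost adds up. It assumes, purely for illustration, that every turn adds a fixed 100 tokens; history_tokens_by_turn is a hypothetical helper, not part of the agent.

```python
def history_tokens_by_turn(tokens_per_turn: int, num_turns: int) -> list[int]:
    """History tokens resent with each prompt when the full log is replayed."""
    # Turn 1 has no history; turn n carries the n - 1 earlier turns.
    return [tokens_per_turn * (turn - 1) for turn in range(1, num_turns + 1)]


per_prompt = history_tokens_by_turn(tokens_per_turn=100, num_turns=5)
total_resent = sum(per_prompt)

print(per_prompt)    # history size attached to each of the 5 prompts
print(total_resent)  # history tokens paid for across the whole conversation
```

The history attached to each prompt grows linearly with the number of turns, so the total paid over a conversation grows roughly quadratically.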

Sliding Window Approach

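To avoid the large-context problem, we turn next to the sliding window approach: the agent does not need to remember the whole history and keeps only the most recent part of the conversation as context.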

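The agent keeps only the most recent N turns as context (one turn is a user message plus the AI reply). When a new turn arrives, the oldest one is dropped and the window slides forward.

How the Sliding Window Approach works:

1. Define a fixed window size, for example N=2 turns.

2. The first two turns fill the window.

3. On the third turn, the first turn is pushed out of the window.

4. The context sent to the LLM is only what is currently inside the window.

Implement the Sliding Window Memory class: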

from collections import deque

class SlidingWindowMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 4):
        """
        Initialize the memory with a fixed-size deque.
        
        Args:
            window_size: Number of conversation turns to keep (user + AI = 1 turn).
        """
        self.history = deque(maxlen=window_size)

    def add_message(self, user_input: str, ai_response: str):
        """Add a new turn to the history; a full deque drops its oldest turn automatically."""
        self.history.append([
            {"role": "user", "content": user_input},
            {"role": "assistant", "content": ai_response}
        ])

    def get_context(self, query: str) -> str:
        """Return the conversation history inside the current window as a single string."""
        context_list = []
        for turn in self.history:
            for message in turn:
                context_list.append(f"{message['role'].capitalize()}: {message['content']}")
        return "\n".join(context_list)

    def clear(self):
        """Reset the conversation history by emptying the deque."""
        self.history.clear()
        print("Sliding window memory cleared.")

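The sequential and sliding memory classes are similar; the difference is the added context window. Code walkthrough:

__init__(self, window_size=4): sets up a fixed-size deque, which slides the context window automatically.
add_message(...): adds a new turn; when the deque is full, the oldest entry is dropped.
get_context(...): builds the context only from the messages inside the current sliding window.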
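The automatic eviction comes from the maxlen argument of collections.deque. The short sketch below isolates that behavior with placeholder turn labels (the strings and the evicted list are just for illustration).

```python
from collections import deque

window = deque(maxlen=2)  # keep at most 2 turns
evicted = []

for turn in ["turn 1", "turn 2", "turn 3"]:
    if len(window) == window.maxlen:
        evicted.append(window[0])  # the oldest turn is about to be dropped
    window.append(turn)            # deque discards from the left when full

print(list(window))
print(evicted)
```

No explicit trimming code is needed in the memory class; appending to a full deque is enough to slide the window.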

Initialize the sliding window and build the AI agent:

sliding_memory = SlidingWindowMemory(window_size=2)
agent = AIAgent(memory_strategy=sliding_memory)

Test this optimization with a multi-turn conversation:

agent.chat("My name is Priya and I'm a software developer.")
agent.chat("I work primarily with Python and cloud technologies.")
agent.chat("My favorite hobby is hiking.")

Output:

==== NEW INTERACTION ====
User: My name is Priya and I am a software developer.  
Bot: Nice to meet you, Priya! What can I assist you with today?  
>>>> Tokens: 27 | Response Time: 1.10s

==== NEW INTERACTION ====
User: I work primarily with Python and cloud technologies.  
Bot: That is great! Given your expertise...  
>>>> Tokens: 81 | Response Time: 1.40s

==== NEW INTERACTION ====
User: My favorite hobby is hiking.  
Bot: It seems we had a nice conversation about your background...  
>>>> Tokens: 167 | Response Time: 1.59s

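The conversation looks much like the sequential approach. Now test what happens when the user asks about information that is outside the window: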

agent.chat("What is my name?")

Output:

==== NEW INTERACTION ====
User: What is my name?  
Bot: I apologize, but I dont have access to your name from our recent conversation. Could you please remind me?  
>>>> Tokens: 197 | Response Time: 0.60s

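The AI agent cannot answer because the relevant context has already slid out of the window. Token usage stays bounded, but important context can be lost. The window size should be tuned to the type of AI agent.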

Summarization Based Optimization

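The sequential approach suffers from a huge context, and the sliding window can lose important context. We need a way to compress the context without losing key information, and that is summarization.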

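How the Summarization Approach works:

1. Recent messages are stored in a temporary "buffer".

2. When the buffer reaches a certain size (the "threshold"), the agent pauses and triggers an action.

3. The buffer contents and the previous summary are sent to the LLM, which is asked to produce a new, merged summary.

4. The new summary replaces the old one, and the buffer is cleared.

Implement summarization optimization: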

class SummarizationMemory(BaseMemoryStrategy):
    def __init__(self, summary_threshold: int = 4):
        """
        Initialize the summarization memory.
        
        Args:
            summary_threshold: Number of buffered messages (user + AI) that triggers summarization.
        """
        self.running_summary = ""
        self.buffer = []
        self.summary_threshold = summary_threshold

    def add_message(self, user_input: str, ai_response: str):
        """Add a new interaction to the buffer; trigger memory consolidation when the buffer is full."""
        self.buffer.append({"role": "user", "content": user_input})
        self.buffer.append({"role": "assistant", "content": ai_response})

        if len(self.buffer) >= self.summary_threshold:
            self._consolidate_memory()

    def _consolidate_memory(self):
        """Use the LLM to summarize the buffer and merge it into the running summary."""
        print("\n--- [Memory Consolidation Triggered] ---")
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])
        
        summarization_prompt = (
            f"You are a summarization expert. Your task is to create a concise summary of a conversation. "
            f"Combine the 'Previous Summary' with the 'New Conversation' into a single, updated summary. "
            f"Capture all key facts, names, and decisions.\n\n"
            f"### Previous Summary:\n{self.running_summary}\n\n"
            f"### New Conversation:\n{buffer_text}\n\n"
            f"### Updated Summary:"
        )
        
        new_summary = generate_text("You are an expert summarization engine.", summarization_prompt)
        self.running_summary = new_summary
        self.buffer = []
        print(f"--- [New Summary: '{self.running_summary}'] ---")

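    def get_context(self, query: str) -> str:
        """Build the context from the long-term running summary and the short-term buffer."""
        buffer_text = "\n".join([f"{msg['role'].capitalize()}: {msg['content']}" for msg in self.buffer])
        return f"### Summary of Past Conversation:\n{self.running_summary}\n\n### Recent Messages:\n{buffer_text}"

    def clear(self):
        """Reset both the running summary and the buffer."""
        self.running_summary = ""
        self.buffer = []
        print("Summarization memory cleared.")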

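Code walkthrough:

__init__(...): sets up an empty running_summary and an empty buffer list.
add_message(...): adds messages to the buffer and calls _consolidate_memory once summary_threshold is reached.
_consolidate_memory(): formats the buffer and the existing summary, asks the LLM for a new summary, updates running_summary, and clears the buffer.
get_context(...): provides the long-term summary plus the short-term buffer, giving the LLM a full view of the conversation.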
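To see exactly when consolidation fires without calling an LLM, the buffer-and-threshold logic can be sketched with a pluggable summarizer. BufferedSummary and keep_user_lines are hypothetical names for this illustration; the stand-in summarizer simply keeps the user lines, whereas the real class asks the LLM.

```python
from typing import Callable


class BufferedSummary:
    """Buffer-and-consolidate logic with an injected summarizer (no LLM call)."""

    def __init__(self, summarize: Callable[[str, list[str]], str], threshold: int = 4):
        self.summarize = summarize
        self.threshold = threshold
        self.running_summary = ""
        self.buffer: list[str] = []
        self.consolidations = 0

    def add_message(self, user_input: str, ai_response: str) -> None:
        self.buffer += [f"User: {user_input}", f"Assistant: {ai_response}"]
        if len(self.buffer) >= self.threshold:
            # Fold the buffer into the running summary, then start a fresh buffer.
            self.running_summary = self.summarize(self.running_summary, self.buffer)
            self.buffer = []
            self.consolidations += 1


def keep_user_lines(previous: str, buffer: list[str]) -> str:
    """Stand-in summarizer: keeps only the user lines from the buffer."""
    user_lines = [line for line in buffer if line.startswith("User:")]
    return " ".join(filter(None, [previous, *user_lines]))


memory = BufferedSummary(summarize=keep_user_lines, threshold=4)
memory.add_message("My company is Innovatech.", "Congratulations!")
buffered_after_one_turn = len(memory.buffer)  # below the threshold, nothing happens yet
memory.add_message("Our first product is Project Helios.", "Sounds exciting.")

print(memory.consolidations, memory.buffer)
print(memory.running_summary)
```

With a threshold of 4 messages, the second turn fills the buffer and triggers the first consolidation, which matches the two-turn behavior in the test that follows.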

Initialize and test:

summarization_memory = SummarizationMemory(summary_threshold=4)
agent = AIAgent(memory_strategy=summarization_memory)

agent.chat("I'm starting a new company called 'Innovatech'. Our focus is on sustainable energy.")
agent.chat("Our first product will be a smart solar panel, codenamed 'Project Helios'.")

Output:

==== NEW INTERACTION ====
User: I am starting a new company called 'Innovatech'. Ou...
Bot: Congratulations on starting Innovatech! Focusing o ...  
>>>> Tokens: 45 | Response Time: 2.55s

==== NEW INTERACTION ====
User: Our first product will be a smart solar panel....  
--- [Memory Consolidation Triggered] ---  
--- [New Summary: The user started a compan ...  
Bot: That is exciting news about  ....  
>>>> Tokens: 204 | Response Time: 3.58s

A summary is generated after two turns. Continue testing:

agent.chat("The marketing budget is set at $50,000.")
agent.chat("What is the name of my company and its first product?")

Output:

==== NEW INTERACTION ====
User: What is the name of my company and its first product?  
Bot: Your company is called 'Innovatech' and its first product is codenamed 'Project Helios'.  
>>>> Tokens: 147 | Response Time: 1.05s

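By the fourth turn the token count is nearly halved, so summarization greatly reduces token usage. It does, however, require carefully designed summarization prompts to capture the key details.

Drawback: key information can be lost during summarization. For example, in a 40-turn conversation that contains numeric or factual details (such as sales figures mentioned in the fourth turn), those details may no longer appear in the summary.

Test the scenario after 40 turns: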

agent.chat("what was the gross sales of our company in the fiscal year?")

Output:

==== NEW INTERACTION ====
User: what was the gross sales of our company in the fiscal year?  
Bot: I am sorry but I do not have that information. Could you please provide the gross sales figure for the fiscal year?  
>>>> Tokens: 1532 | Response Time: 2.831s

The summary reduces tokens, but answer quality can drop significantly. Consider adding a sub-agent for fact-checking to improve reliability.

Retrieval Based Memory

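This is one of the most powerful strategies in many AI agent use cases: RAG-based AI agents. The previous methods reduce token usage but can lose context; RAG addresses this by retrieving the context that is relevant to the current user query.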

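The context is stored in a database, and an embedding model converts text into vector representations so that it can be retrieved efficiently.

How RAG Based Memory works:

1. Each new interaction is saved as a "document" in the database, and its numerical representation (embedding) is generated and stored.

2. When the user sends a new message, the agent converts it into an embedding.

3. The query embedding is used to run a similarity search over all document embeddings.

4. The k most semantically relevant documents are retrieved (for example, the 3 most similar past turns).

5. Only these relevant documents are injected into the LLM's context window.

Use FAISS for vector storage: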

import numpy as np
import faiss

class RetrievalMemory(BaseMemoryStrategy):
    def __init__(self, k: int = 2, embedding_dim: int = 3584):
        """
        Initialize the retrieval memory system.
        
        Args:
            k: Number of top relevant documents to retrieve.
            embedding_dim: Dimension of the vectors produced by the embedding model (3584 for BAAI/bge-multilingual-gemma2).
        """
        self.k = k
        self.embedding_dim = embedding_dim
        self.documents = []
        self.index = faiss.IndexFlatL2(self.embedding_dim)

    def add_message(self, user_input: str, ai_response: str):
        """Add a new turn to memory, embedding and indexing the user and AI messages separately."""
        docs_to_add = [
            f"User said: {user_input}",
            f"AI responded: {ai_response}"
        ]
        for doc in docs_to_add:
            embedding = generate_embedding(doc)
            if embedding:
                self.documents.append(doc)
                vector = np.array([embedding], dtype='float32')
                self.index.add(vector)

    def get_context(self, query: str) -> str:
        """Retrieve the k most relevant documents by semantic similarity."""
        if self.index.ntotal == 0:
            return "No information in memory yet."
        
        query_embedding = generate_embedding(query)
        if not query_embedding:
            return "Could not process query for retrieval."
        
        query_vector = np.array([query_embedding], dtype='float32')
        distances, indices = self.index.search(query_vector, self.k)
        
        retrieved_docs = [self.documents[i] for i in indices[0] if i != -1]
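        if not retrieved_docs:
            return "Could not find any relevant information in memory."
        
        return "### Relevant Information Retrieved from Memory:\n" + "\n---\n".join(retrieved_docs)

    def clear(self):
        """Reset the stored documents and rebuild an empty FAISS index."""
        self.documents = []
        self.index = faiss.IndexFlatL2(self.embedding_dim)
        print("Retrieval memory cleared.")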

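Code walkthrough:

__init__(...): initializes the documents list and a faiss.IndexFlatL2 index that stores the vectors to search, with the given embedding_dim.
add_message(...): generates an embedding for the user message and the AI message, and adds each to documents and to the FAISS index.
get_context(...): embeds the user query, uses self.index.search to find the k most similar vectors, and returns the original text as context.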
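The search step is an exhaustive nearest-neighbor lookup by L2 distance, which is the kind of search IndexFlatL2 performs. The pure-Python sketch below shows the idea without FAISS or an embedding API; the 3-dimensional vectors are hand-made stand-ins for real embeddings, and l2_distance and top_k are hypothetical helper names.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec, doc_vecs, docs, k=2):
    """Return the k documents whose vectors are closest to the query vector."""
    ranked = sorted(range(len(docs)), key=lambda i: l2_distance(query_vec, doc_vecs[i]))
    return [docs[i] for i in ranked[:k]]


# Made-up vectors: axis 0 loosely means "travel", axis 1 "software".
docs = [
    "User said: I am planning a vacation to Japan for next spring.",
    "User said: I'm using the React framework for the frontend.",
    "User said: I want to visit Tokyo and Kyoto while I'm on my trip.",
    "User said: The backend of my project will be built with Django.",
]
doc_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.0],
    [0.8, 0.0, 0.1],
    [0.0, 0.8, 0.1],
]
query_vec = [0.85, 0.1, 0.0]  # stands in for the embedded vacation question

retrieved = top_k(query_vec, doc_vecs, docs, k=2)
for doc in retrieved:
    print(doc)
```

Only the two travel-related documents are returned, so the software-project turns never reach the prompt.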

Initialize and test:

retrieval_memory = RetrievalMemory(k=2)
agent = AIAgent(memory_strategy=retrieval_memory)

agent.chat("I am planning a vacation to Japan for next spring.")
agent.chat("For my software project, I'm using the React framework for the frontend.")
agent.chat("I want to visit Tokyo and Kyoto while I'm on my trip.")
agent.chat("The backend of my project will be built with Django.")
agent.chat("What cities am I planning to visit on my vacation?")

Output:

==== NEW INTERACTION ====
User: What cities am I planning to visit on my vacation?  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: MEMORY CONTEXT  
Relevant Information Retrieved from Memory:  
User said: I want to visit Tokyo and Kyoto while I am on my trip.  
---  
User said: I am planning a vacation to Japan for next spring.  
...  

Bot: You are planning to visit Tokyo and Kyoto while on your vacation to Japan next spring.  
>>>> Tokens: 65 | Response Time: 0.53s

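The relevant context is retrieved successfully, and the token count is very low because only relevant information is retrieved. The choice of embedding model and vector store matters a great deal; FAISS is widely used for its efficiency. The larger the store grows, though, the more complex the AI agent becomes, and optimizations such as parallel querying may be needed.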

Memory Augmented Transformers

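AI systems are adopting more sophisticated methods that push the boundaries of what is possible.

Picture an ordinary AI as a student with one small notebook that can hold only so much. During a long exam, the student has to erase old notes to make room for new ones. Memory-Augmented Transformers are like handing that student a stack of sticky notes: the notebook handles the current work, and the sticky notes keep the key information from earlier.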

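For example, suppose you are designing a non-violent space video game and mention "space setting, no violence" early on. An ordinary AI may forget this, but a memory-augmented AI writes it on a sticky note, so a later query still matches the original vision.

How Memory Augmented Transformers work:

Use SlidingWindowMemory to manage the recent chat.
After each turn, use the LLM as a "fact extractor" that analyzes the conversation and decides whether it contains a core fact, preference, or decision.
If an important fact is found, store it as a memory token (a concise string).
The final context given to the agent combines the recent chat window with all persistent memory tokens.

Implementation: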

class MemoryAugmentedMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2):
        """
        Initialize the memory-augmented system.
        
        Args:
            window_size: Number of recent turns kept in short-term memory.
        """
        self.recent_memory = SlidingWindowMemory(window_size=window_size)
        self.memory_tokens = []

    def add_message(self, user_input: str, ai_response: str):
        """Add the turn to recent memory and use the LLM to decide whether to create a persistent memory token."""
        self.recent_memory.add_message(user_input, ai_response)
        
        fact_extraction_prompt = (
            f"Analyze the following conversation turn. Does it contain a core fact, preference, or decision that should be remembered long-term? "
            f"Examples include user preferences ('I hate flying'), key decisions ('The budget is $1000'), or important facts ('My user ID is 12345').\n\n"
            f"Conversation Turn:\nUser: {user_input}\nAI: {ai_response}\n\n"
            f"If it contains such a fact, state the fact concisely in one sentence. Otherwise, respond with 'No important fact.'"
        )
        
        extracted_fact = generate_text("You are a fact-extraction expert.", fact_extraction_prompt)
        
        if "no important fact" not in extracted_fact.lower():
            print(f"--- [Memory Augmentation: New memory token created: '{extracted_fact}'] ---")
            self.memory_tokens.append(extracted_fact)

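    def get_context(self, query: str) -> str:
        """Build the context from the short-term recent conversation and the long-term memory tokens."""
        recent_context = self.recent_memory.get_context(query)
        memory_token_context = "\n".join([f"- {token}" for token in self.memory_tokens])
        return f"### Key Memory Tokens (Long-Term Facts):\n{memory_token_context}\n\n### Recent Conversation:\n{recent_context}"

    def clear(self):
        """Reset the recent-chat window and the persistent memory tokens."""
        self.recent_memory.history.clear()  # empty the underlying deque
        self.memory_tokens = []
        print("Memory-augmented memory cleared.")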

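Code walkthrough:

__init__(...): initializes a SlidingWindowMemory and an empty memory_tokens list.
add_message(...): adds the turn to the sliding window, then makes an extra LLM call to check for a key fact and appends it to memory_tokens.
get_context(...): combines the "sticky notes" (memory_tokens) with the recent chat history to build a rich prompt.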
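The key property is that a fact survives after its original turn has slid out of the window. The sketch below demonstrates this offline; extract_fact is a hypothetical rule-based stand-in for the LLM fact extractor, and MARKERS is an assumed keyword list used only for the demo.

```python
from collections import deque

MARKERS = ("remember", "allergic", "budget")  # assumed triggers for the stand-in extractor


def extract_fact(user_input: str) -> str:
    """Rule-based stand-in for the LLM call that extracts a long-term fact."""
    lowered = user_input.lower()
    if any(marker in lowered for marker in MARKERS):
        return f"User stated: {user_input}"
    return "No important fact."


recent = deque(maxlen=2)  # short-term window of (user, assistant) turns
memory_tokens = []        # persistent long-term facts


def add_turn(user_input: str, ai_response: str) -> None:
    recent.append((user_input, ai_response))
    fact = extract_fact(user_input)
    if "no important fact" not in fact.lower():
        memory_tokens.append(fact)


def build_context() -> str:
    facts = "\n".join(f"- {token}" for token in memory_tokens)
    chat = "\n".join(f"User: {u}\nAssistant: {a}" for u, a in recent)
    return f"### Key Memory Tokens (Long-Term Facts):\n{facts}\n\n### Recent Conversation:\n{chat}"


add_turn("Please remember: I am severely allergic to peanuts.", "Noted.")
add_turn("What's a good idea for dinner tonight?", "How about a stir-fry?")
add_turn("What about a dessert option?", "Maybe a fruit salad.")

context = build_context()
print(context)
```

After three turns the allergy message is no longer in the two-turn window, yet the fact is still present in the context through the memory token list.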

Initialize and test:

mem_aug_memory = MemoryAugmentedMemory(window_size=2)
agent = AIAgent(memory_strategy=mem_aug_memory)

agent.chat("Please remember this for all future interactions: I am severely allergic to peanuts.")
agent.chat("Okay, let's talk about recipes. What's a good idea for dinner tonight?")
agent.chat("That sounds good. What about a dessert option?")
agent.chat("Could you suggest a Thai green curry recipe? Please ensure it's safe for me.")

Output:

==== NEW INTERACTION ====
User: Please remember this for all future interactions: I am severely allergic to peanuts.  
--- [Memory Augmentation: New memory token created: 'The user has a severe allergy to peanuts.'] ---  
Bot: I have taken note of your long-term fact: You are severely allergic to peanuts. I will keep this in mind...  
>>>> Tokens: 45 | Response Time: 1.32s

...

==== NEW INTERACTION ====
User: Could you suggest a Thai green curry recipe? Please ensure it is safe for me.  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: MEMORY CONTEXT  
Key Memory Tokens (Long-Term Facts):  
- The user has a severe allergy to peanuts.  
...  
Recent Conversation:  
User: Okay, lets talk about recipes...  
...  

Bot: Of course. Given your peanut allergy, it is very important to be careful with Thai cuisine as many recipes use peanuts or peanut oil. Here is a peanut-free Thai green curry recipe...  
>>>> Tokens: 712 | Response Time: 6.45s

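Because it needs an extra LLM call for fact extraction on every turn, this method is more complex and more expensive, but it retains key information over the long term, which makes it a good fit for building reliable personal assistants.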

Hierarchical Optimization for Multi-tasks

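Previously we treated memory as a single system. What if the agent could have different types of memory for different purposes, as humans do? That is the idea behind Hierarchical Memory: combine several simple memory types into a more sophisticated, organized system.

An analogy with human memory:

Working Memory: the last few sentences you heard; fast but fleeting.
Short-Term Memory: the key points from this morning's meeting; easy to recall for a few hours.
Long-Term Memory: your home address or a key fact learned years ago; durable and deeply stored.

How the Hierarchical Approach works:

1. Capture the user message in working memory.

2. Check whether the information is important enough to be promoted to long-term memory.

3. Store promoted content in retrieval memory for future use.

4. On a new query, search long-term memory for relevant context.

5. Inject the relevant memories into the context to generate a better response.

Implementation: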

class HierarchicalMemory(BaseMemoryStrategy):
    def __init__(self, window_size: int = 2, k: int = 2, embedding_dim: int = 3584):
        """
        Initialize the hierarchical memory system.
        
        Args:
            window_size: Number of turns kept in short-term working memory.
            k: Number of documents to retrieve from long-term memory.
            embedding_dim: Dimension of the embedding vectors used by long-term memory.
        """
        print("Initializing Hierarchical Memory...")
        self.working_memory = SlidingWindowMemory(window_size=window_size)
        self.long_term_memory = RetrievalMemory(k=k, embedding_dim=embedding_dim)
        self.promotion_keywords = ["remember", "rule", "preference", "always", "never", "allergic"]

    def add_message(self, user_input: str, ai_response: str):
        """Add the message to working memory and conditionally promote it to long-term memory based on its content."""
        self.working_memory.add_message(user_input, ai_response)
        
        if any(keyword in user_input.lower() for keyword in self.promotion_keywords):
            print(f"--- [Hierarchical Memory: Promoting message to long-term storage.] ---")
            self.long_term_memory.add_message(user_input, ai_response)

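    def get_context(self, query: str) -> str:
        """Build a rich context from the long-term and short-term memory layers."""
        working_context = self.working_memory.get_context(query)
        long_term_context = self.long_term_memory.get_context(query)
        return f"### Retrieved Long-Term Memories:\n{long_term_context}\n\n### Recent Conversation (Working Memory):\n{working_context}"

    def clear(self):
        """Reset both the working memory and the long-term memory."""
        self.working_memory.history.clear()   # empty the sliding-window deque
        self.long_term_memory.documents = []
        self.long_term_memory.index.reset()   # remove all vectors from the FAISS index
        print("Hierarchical memory cleared.")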

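Code walkthrough:

__init__(...): initializes a SlidingWindowMemory and a RetrievalMemory, and defines promotion_keywords.
add_message(...): adds the message to working_memory, checks whether it contains any of the keywords, and if so also adds it to long_term_memory.
get_context(...): fetches context from both memory systems and merges them into a rich prompt.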
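The promotion check is a plain substring test, which is simple but can fire on unrelated words. The sketch below compares it with a whole-word variant; promote_by_word is an optional alternative suggested here for illustration, not the method used in the class above.

```python
import re

PROMOTION_KEYWORDS = ["remember", "rule", "preference", "always", "never", "allergic"]


def promote_by_substring(text: str) -> bool:
    """Same test as the class above: any keyword appearing anywhere in the text."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in PROMOTION_KEYWORDS)


def promote_by_word(text: str) -> bool:
    """Stricter variant: the keyword must appear as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return any(keyword in words for keyword in PROMOTION_KEYWORDS)


samples = [
    "Please remember my User ID is AX-7890.",
    "Let's chat about the weather. It's very sunny today.",
    "Can you pass me the ruler?",
]
results = [(promote_by_substring(s), promote_by_word(s)) for s in samples]
for sample, result in zip(samples, results):
    print(result, sample)
```

The whole-word variant avoids the false hit on "ruler", at the price of missing inflected forms such as "rules", so which one fits depends on the agent.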

Initialize and test:

hierarchical_memory = HierarchicalMemory()
agent = AIAgent(memory_strategy=hierarchical_memory)

agent.chat("Please remember my User ID is AX-7890.")
agent.chat("Let's chat about the weather. It's very sunny today.")
agent.chat("I'm planning to go for a walk later.")
agent.chat("I need to log into my account, can you remind me of my ID?")

Output:

==== NEW INTERACTION ====
User: Please remember my User ID is AX-7890.  
--- [Hierarchical Memory: Promoting message to long-term storage.] ---  
Bot: You have provided your User ID as AX-7890, which has been stored in long-term memory for future reference.  
...

==== NEW INTERACTION ====
User: I need to log into my account, can you remind me of my ID?  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Retrieved Long-Term Memories:  
### Relevant Information Retrieved from Memory:  
User said: Please remember my User ID is AX-7890.  
...  
### Recent Conversation (Working Memory):  
User: Let's chat about the weather...  
User: I'm planning to go for a walk later...  

Bot: Your User ID is AX-7890. You can use this to log into your account. Is there anything else I can assist you with?  
>>>> Tokens: 452 | Response Time: 2.06s

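The agent successfully combines the two memory types: it uses the fast working memory to keep the conversation flowing and queries long-term memory to retrieve the User ID.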

Graph Based Optimization

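So far, memory has been stored as blocks of text, whether full conversations, summaries, or retrieved documents. What if the agent could understand the relationships between pieces of information? That is the leap Graph-Based Memory makes.

This strategy represents information as a knowledge graph:

Nodes (Entities): the "things" in a conversation, such as a person (Clara), a company (FutureScape), or a concept (Project Odyssey).

Edges (Relations): the connections that describe how nodes relate, such as works_for, is_based_in, manages.

The result is a structured, web-like memory. For example, instead of the plain fact "Clara works for FutureScape", we store the connection: (Clara) --[works_for]--> (FutureScape).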

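This is very powerful for answering complex queries that require reasoning over relationships. The challenge is populating the graph from unstructured conversation. We use an LLM to extract structured (Subject, Relation, Object) triples.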
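As a small illustration of reasoning over relationships, the sketch below answers a two-hop question ("in which city is Clara's employer based?") by following edges. It uses a plain dictionary instead of networkx to stay dependency-free; add_triple and follow are hypothetical helpers, and the triples reuse the entity and relation names from the examples above.

```python
from collections import defaultdict

edges = defaultdict(list)  # subject -> list of (relation, object)


def add_triple(subject: str, relation: str, obj: str) -> None:
    """Store one (Subject, Relation, Object) triple as a directed edge."""
    edges[subject].append((relation, obj))


def follow(start: str, relations: list[str]) -> list[str]:
    """Follow a chain of relations from `start` and return the nodes reached."""
    frontier = [start]
    for relation in relations:
        frontier = [
            obj
            for node in frontier
            for rel, obj in edges.get(node, [])
            if rel == relation
        ]
    return frontier


add_triple("Clara", "works_for", "FutureScape")
add_triple("FutureScape", "is_based_in", "Berlin")
add_triple("Clara", "manages_project", "Odyssey")

city = follow("Clara", ["works_for", "is_based_in"])
print(city)
```

No single stored sentence says where Clara's employer is located; the answer comes from chaining two edges, which flat text memories do not support directly.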

Implementation, using the networkx library:

import networkx as nx
import re

class GraphMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initialize the memory with an empty NetworkX directed graph."""
        self.graph = nx.DiGraph()

    def _extract_triples(self, text: str) -> list[tuple[str, str, str]]:
        """Use the LLM to extract (Subject, Relation, Object) triples from text."""
        print("--- [Graph Memory: Attempting to extract triples from text.] ---")
        extraction_prompt = (
            f"You are a knowledge extraction engine. Your task is to extract Subject-Relation-Object triples from the given text. "
            f"Format your output strictly as a list of Python tuples. For example: [('Sam', 'works_for', 'Innovatech'), ('Innovatech', 'focuses_on', 'Energy')]. "
            f"If no triples are found, return an empty list [].\n\n"
            f'Text to analyze:\n"""{text}"""'
        )
        
        response_text = generate_text("You are an expert knowledge graph extractor.", extraction_prompt)
        
        try:
            found_triples = re.findall(r"\(['\"](.*?)['\"],\s*['\"](.*?)['\"],\s*['\"](.*?)['\"]\)", response_text)
            print(f"--- [Graph Memory: Extracted triples: {found_triples}] ---")
            return found_triples
        except Exception as e:
            print(f"Could not parse triples from LLM response: {e}")
            return []

    def add_message(self, user_input: str, ai_response: str):
        """Extract triples from the latest turn and add them to the knowledge graph."""
        full_text = f"User: {user_input}\nAI: {ai_response}"
        triples = self._extract_triples(full_text)
        for subject, relation, obj in triples:
            self.graph.add_edge(subject.strip(), obj.strip(), relation=relation.strip())

    def get_context(self, query: str) -> str:
        """Look up the entities mentioned in the query and return all of their known relations."""
        if not self.graph.nodes:
            return "The knowledge graph is empty."
        
        # Match query words to node names case-insensitively, ignoring punctuation
        # and possessive 's (so "Clara's" matches the node "Clara").
        # Multi-word node names are not matched by this simple lookup.
        node_lookup = {str(node).lower(): node for node in self.graph.nodes}
        words = re.findall(r"[\w-]+", query.lower().replace("'s", ""))
        query_entities = {node_lookup[word] for word in words if word in node_lookup}
        
        if not query_entities:
            return "No relevant entities from your query were found in the knowledge graph."
        
        context_parts = []
        for entity in query_entities:
            for u, v, data in self.graph.out_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")
            for u, v, data in self.graph.in_edges(entity, data=True):
                context_parts.append(f"{u} --[{data['relation']}]--> {v}")
        
        return "### Facts Retrieved from Knowledge Graph:\n" + "\n".join(sorted(set(context_parts)))

    def clear(self):
        """Reset the knowledge graph by removing all nodes and edges."""
        self.graph.clear()
        print("Graph memory cleared.")

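Code walkthrough:

_extract_triples(...): the core of the strategy; sends the conversation text to the LLM and asks for structured data.

add_message(...): calls _extract_triples and adds the triples to the networkx graph.

get_context(...): searches for entities from the query and returns all of their relations as structured context.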
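LLM output does not always follow the requested format, so the parsing step deserves care. The sketch below is one possible hardening, offered as an assumption rather than the author's method: it first tries a strict ast.literal_eval parse of the bracketed list and falls back to the same regular expression used above. parse_triples is a hypothetical helper name.

```python
import ast
import re

TRIPLE_PATTERN = re.compile(r"\(['\"](.*?)['\"],\s*['\"](.*?)['\"],\s*['\"](.*?)['\"]\)")


def parse_triples(response_text: str) -> list[tuple[str, str, str]]:
    """Parse triples, preferring a strict literal parse and falling back to regex."""
    match = re.search(r"\[.*\]", response_text, flags=re.DOTALL)
    if match:
        try:
            parsed = ast.literal_eval(match.group(0))
            if isinstance(parsed, list) and all(
                isinstance(item, tuple) and len(item) == 3 for item in parsed
            ):
                return [tuple(str(part).strip() for part in item) for item in parsed]
        except (ValueError, SyntaxError, TypeError):
            pass  # not a valid Python literal; use the lenient regex instead
    return [tuple(part.strip() for part in item) for item in TRIPLE_PATTERN.findall(response_text)]


clean = parse_triples("[('Clara', 'works_for', 'FutureScape')]")
chatty = parse_triples("Sure! Here you go: ('FutureScape', 'is_based_in', 'Berlin') and nothing else.")
empty = parse_triples("[]")

print(clean)
print(chatty)
print(empty)
```

The strict path handles well-formed lists, including the empty list, while the fallback still recovers triples when the model wraps them in extra prose.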

Test:

graph_memory = GraphMemory()
agent = AIAgent(memory_strategy=graph_memory)

agent.chat("A person named Clara works for a company called 'FutureScape'.")
agent.chat("FutureScape is based in Berlin.")
agent.chat("Clara's main project is named 'Odyssey'.")
agent.chat("Tell me about Clara's project.")

Output:

==== NEW INTERACTION ====
User: A person named Clara works for a company called 'FutureScape'.  
--- [Graph Memory: Attempting to extract triples from text.] ---  
--- [Graph Memory: Extracted triples: [('Clara', 'works_for', 'FutureScape')]] ---  
Bot: Understood. I've added the fact that Clara works for FutureScape to my knowledge graph.  
...

==== NEW INTERACTION ====
User: Clara's main project is named 'Odyssey'.  
--- [Graph Memory: Attempting to extract triples from text.] ---  
--- [Graph Memory: Extracted triples: [('Clara', 'manages_project', 'Odyssey')]] ---  
Bot: Got it. I've noted that Clara's main project is Odyssey.  

==== NEW INTERACTION ====
User: Tell me about Clara's project.  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Facts Retrieved from Knowledge Graph:  
Clara --[manages_project]--> Odyssey  
Clara --[works_for]--> FutureScape  
...  

Bot: Based on my knowledge graph, Clara's main project is named 'Odyssey', and Clara works for the company FutureScape.  
>>>> Tokens: 78 | Response Time: 1.5s

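By navigating its internal graph, the agent returns all the relevant facts, which makes this approach a good fit for knowledge-heavy expert agents.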

Compression & Consolidation Memory

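Summarization works well for managing long conversations, but can we reduce token usage even more aggressively? That is the role of Compression & Consolidation Memory, which is like a stronger version of summarization.

The goal is to distill each piece of information into its densest factual form, for example turning lengthy meeting notes into concise one-sentence points.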

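How the Compression Approach works:

1. Each turn (user input + AI response) is sent to the LLM.

2. A dedicated prompt asks the LLM to act as a "data compression engine".

3. The LLM rewrites the turn as a single essential statement, stripping out small talk, pleasantries, and similar filler.

4. The compressed facts are stored in a simple list.

5. The agent's memory becomes a compact list of core facts that is very token-efficient.

Implementation: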

class CompressionMemory(BaseMemoryStrategy):
    def __init__(self):
        """Initialize the memory with an empty list of compressed facts."""
        self.compressed_facts = []

    def add_message(self, user_input: str, ai_response: str):
        """Use the LLM to compress the latest turn into a concise factual statement."""
        text_to_compress = f"User: {user_input}\nAI: {ai_response}"
        
        compression_prompt = (
            f"You are a data compression engine. Your task is to distill the following text into its most essential, factual statement. "
            f"Be as concise as possible, removing all conversational fluff. For example, 'User asked about my name and I, the AI, responded that my name is an AI assistant' should become 'User asked for AI's name.'\n\n"
            f"Text to compress:\n\"{text_to_compress}\""
        )
        
        compressed_fact = generate_text("You are an expert data compressor.", compression_prompt)
        print(f"--- [Compression Memory: New fact stored: '{compressed_fact}'] ---")
        self.compressed_facts.append(compressed_fact)

    def get_context(self, query: str) -> str:
        """返回所有compressed facts列表，格式為項目符號列表。"""
        if not self.compressed_facts:
            return "No compressed facts in memory."
        return "### Compressed Factual Memory:\n- " + "\n- ".join(self.compressed_facts)

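Code walkthrough:

__init__(...): creates an empty compressed_facts list.

add_message(...): sends the turn to the LLM with the compression prompt and stores the concise result.

get_context(...): formats the compressed facts as a concise bulleted list.

Test: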
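The class above only appends, so repeated turns can leave near-identical facts in the list. One possible consolidation pass, which the source does not show, is to drop duplicates after light normalization. The `consolidate_facts` helper below is a hypothetical sketch under that assumption; a real system might instead ask the LLM to merge overlapping facts.

```python
from typing import List

def _normalize(fact: str) -> str:
    """Lowercase, collapse whitespace, and drop trailing '.' or '!' characters."""
    return " ".join(fact.lower().split()).rstrip(".!")

def consolidate_facts(facts: List[str]) -> List[str]:
    """Keep the first occurrence of each fact, comparing normalized text."""
    seen = set()
    kept = []
    for fact in facts:
        key = _normalize(fact)
        if key and key not in seen:
            seen.add(key)
            kept.append(fact)
    return kept

facts = [
    "The conference date is confirmed for October 26th, 2025.",
    "the conference date is confirmed for  October 26th, 2025",
    "The conference venue has been decided as the 'Metropolitan Convention Center'.",
]
consolidated = consolidate_facts(facts)
print(len(facts), "->", len(consolidated))
```

Exact-match deduplication is cheap but misses paraphrases; that gap is where an embedding comparison or a second LLM pass would come in.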

compression_memory = CompressionMemory()
agent = AIAgent(memory_strategy=compression_memory)

agent.chat("Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.")
agent.chat("The date is confirmed for October 26th, 2025.")
agent.chat("Could you please summarize the key details for the conference plan?")

Output:

==== NEW INTERACTION ====
User: Okay, I've decided on the venue for the conference. It's going to be the 'Metropolitan Convention Center'.  
--- [Compression Memory: New fact stored: 'The conference venue has been decided as the 'Metropolitan Convention Center'.'] ---  
Bot: Great! The Metropolitan Convention Center is an excellent choice. What's next on our planning list?  
...

==== NEW INTERACTION ====
User: The date is confirmed for October 26th, 2025.  
--- [Compression Memory: New fact stored: 'The conference date is confirmed for October 26th, 2025.'] ---  
Bot: Perfect, I've noted the date.  
...

==== NEW INTERACTION ====
User: Could you please summarize the key details for the conference plan?  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Compressed Factual Memory:  
- The conference venue has been decided as the 'Metropolitan Convention Center'.  
- The conference date is confirmed for October 26th, 2025.  
...  

Bot: Of course. Based on my notes, here are the key details for the conference plan:  
- **Venue:** Metropolitan Convention Center  
- **Date:** October 26th, 2025  
>>>> Tokens: 48 | Response Time: 1.2s

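This strategy sharply reduces the token count while preserving the core facts, so it suits applications that need long-term factual recall on a tight token budget. For conversations that depend on subtle tone and personality, however, the compression can be too aggressive.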
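To get a feel for the savings, the sketch below compares a raw turn from the run above with its compressed fact. It assumes a crude whitespace word count as a stand-in for tokens (the article's count_tokens helper would give real numbers); `approx_tokens` and `compression_ratio` are hypothetical names for this illustration.

```python
def approx_tokens(text: str) -> int:
    """Very rough token proxy: whitespace-separated words, not a real tokenizer."""
    return len(text.split())

def compression_ratio(raw_turn: str, compressed_fact: str) -> float:
    """Size of the compressed fact relative to the raw turn (lower is better)."""
    raw = approx_tokens(raw_turn)
    return approx_tokens(compressed_fact) / raw if raw else 0.0

raw_turn = (
    "User: Okay, I've decided on the venue for the conference. "
    "It's going to be the 'Metropolitan Convention Center'.\n"
    "AI: Great! The Metropolitan Convention Center is an excellent choice. "
    "What's next on our planning list?"
)
fact = "The conference venue has been decided as the 'Metropolitan Convention Center'."

ratio = compression_ratio(raw_turn, fact)
print(f"compressed fact is {ratio:.0%} of the raw turn (by word count)")
```

The ratio only measures stored context; each compression still costs one extra LLM call per turn, which this comparison does not account for.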

OS-Like Memory Management

What if we built a memory system for the agent that works like a computer's memory?


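This advanced concept borrows from the way a computer's Operating System manages RAM and the hard disk:

RAM: the very fast memory a computer uses for active programs; it is expensive and limited in capacity. For the agent, the LLM context window is the RAM: fast to access but limited in size.
Hard Disk: long-term storage that is large and cheap but slower to access. For the agent, this is an external database or file that holds older conversation history.

The OS-Like Memory Management flow:

Active Memory (RAM): the most recent conversation turns are kept in a fast-access buffer.
Passive Memory (Disk): when active memory is full, the oldest information is moved to long-term storage; this is called “paging out”.
Page Fault: when the user asks about information that is not in active memory, a “page fault” occurs.
Paging In: the system looks up the relevant information in passive storage and loads it into the active context for the LLM to use; this is called “paging in”.

The implementation simulates this with active_memory (a deque) and passive_memory (a dictionary):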

from collections import deque

class OSMemory(BaseMemoryStrategy):
    def __init__(self, ram_size: int = 2):
        """
        Initialize the OS-like memory system.

        Args:
            ram_size: Maximum number of conversation turns kept in active memory (RAM).
        """
        self.ram_size = ram_size
        self.active_memory = deque()  # (turn_id, turn_data) pairs, oldest first
        self.passive_memory = {}      # turn_id -> turn_data, the simulated disk
        self.turn_count = 0

    def add_message(self, user_input: str, ai_response: str):
        """Add a turn to active memory; when RAM is full, page the oldest turn out to passive memory."""
        turn_id = self.turn_count
        turn_data = f"User: {user_input}\nAI: {ai_response}"

        if len(self.active_memory) >= self.ram_size:
            # Eviction is oldest-first (FIFO), despite the "lru" variable names.
            lru_turn_id, lru_turn_data = self.active_memory.popleft()
            self.passive_memory[lru_turn_id] = lru_turn_data
            print(f"--- [OS Memory: Paging out Turn {lru_turn_id} to passive storage.] ---")

        self.active_memory.append((turn_id, turn_data))
        self.turn_count += 1

    def get_context(self, query: str) -> str:
        """Return the RAM context and simulate a page fault by pulling matching turns from passive memory."""
        active_context = "\n".join([data for _, data in self.active_memory])

        paged_in_context = ""
        for turn_id, data in self.passive_memory.items():
            # Simple keyword match: any query word longer than 3 characters that
            # appears as a substring of the stored turn triggers a page-in.
            if any(word in data.lower() for word in query.lower().split() if len(word) > 3):
                paged_in_context += f"\n(Paged in from Turn {turn_id}): {data}"
                print(f"--- [OS Memory: Page fault! Paging in Turn {turn_id} from passive storage.] ---")

        return f"### Active Memory (RAM):\n{active_context}\n\n### Paged-In from Passive Memory (Disk):\n{paged_in_context}"

    def clear(self):
        """Clear both active and passive memory."""
        self.active_memory.clear()
        self.passive_memory = {}
        self.turn_count = 0
        print("OS-like memory cleared.")

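Code walkthrough:

__init__(...): sets up the active_memory deque, whose size is capped at ram_size by add_message, and an empty passive_memory dictionary.
add_message(...): appends the new turn to active_memory; when it is full, the oldest turn is removed with popleft() and moved to passive_memory (paging out).
get_context(...): always includes active_memory, then searches passive_memory and pages any turn that matches the query into the context.

In the test, the agent is told a secret code, further turns force that turn to be paged out to passive memory, and then the agent is asked for the code: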
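The page-fault check is a plain substring match, so it helps to see which query words actually fire. The sketch below isolates that test in a hypothetical helper, `page_fault_keywords`, and adds an optional punctuation-stripping flag that is an assumption of this example and not part of the class above.

```python
from typing import List

def page_fault_keywords(query: str, data: str, strip_punct: bool = False) -> List[str]:
    """Return the query words (longer than 3 characters) found as substrings of `data`."""
    haystack = data.lower()
    hits = []
    for word in query.lower().split():
        if strip_punct:
            word = word.strip(".,?!'\"")
        if len(word) > 3 and word in haystack:
            hits.append(word)
    return hits

turn_0 = "User: The secret launch code is 'Orion-Delta-7'."
query = "I need to confirm the launch code."

as_written = page_fault_keywords(query, turn_0)
with_strip = page_fault_keywords(query, turn_0, strip_punct=True)
print(as_written)   # only 'launch' matches; 'code.' keeps its trailing period
print(with_strip)   # stripping punctuation lets 'code' match as well
```

In the test run this makes no difference because 'launch' alone triggers the page-in, but a query such as "What is the code?" would miss Turn 0 without the stripping step.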

os_memory = OSMemory(ram_size=2)
agent = AIAgent(memory_strategy=os_memory)

agent.chat("The secret launch code is 'Orion-Delta-7'.")
agent.chat("The weather for the launch looks clear.")
agent.chat("The launch window opens at 0400 Zulu.")
agent.chat("I need to confirm the launch code.")

Output:

...

==== NEW INTERACTION ====
User: The launch window opens at 0400 Zulu.  
--- [OS Memory: Paging out Turn 0 to passive storage.] ---  
Bot: PROCESSING NEW LAUNCH WINDOW INFORMATION...  
...

==== NEW INTERACTION ====
User: I need to confirm the launch code.  
--- [OS Memory: Page fault! Paging in Turn 0 from passive storage.] ---  
--- Agent Debug Info ---  
[Full Prompt Sent to LLM]:  
---  
SYSTEM: You are a helpful AI assistant.  
USER: ### MEMORY CONTEXT  
### Active Memory (RAM):  
User: The weather for the launch looks clear.  
...  
User: The launch window opens at 0400 Zulu.  
...  
### Paged-In from Passive Memory (Disk):  
(Paged in from Turn 0): User: The secret launch code is 'Orion-Delta-7'.  
...  

Bot: CONFIRMING LAUNCH CODE: The stored secret launch code is 'Orion-Delta-7'.  
>>>> Tokens: 539 | Response Time: 2.56s

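It works as intended: the agent moved the old turn to passive storage and retrieved it only when the query called for it.

This model suits large-scale systems that need nearly unlimited memory while keeping the active context small and fast.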
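The dictionary used above lives in process memory, so it is only a stand-in for a real disk. As one possible direction, the sketch below backs passive storage with SQLite from the Python standard library. The `SqlitePassiveStore` class, its table layout, and its method names are assumptions made for this example, not part of the article's code.

```python
import sqlite3
from typing import List, Tuple

class SqlitePassiveStore:
    """Hypothetical disk-backed replacement for the passive_memory dictionary."""

    def __init__(self, path: str = ":memory:"):
        # ":memory:" keeps the demo self-contained; pass a file path to persist.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            "turn_id INTEGER PRIMARY KEY, data TEXT NOT NULL)"
        )

    def page_out(self, turn_id: int, data: str) -> None:
        """Store an evicted turn."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO turns (turn_id, data) VALUES (?, ?)",
                (turn_id, data),
            )

    def page_in(self, keyword: str) -> List[Tuple[int, str]]:
        """Return stored turns containing `keyword` (LIKE ignores ASCII case by default)."""
        cursor = self.conn.execute(
            "SELECT turn_id, data FROM turns WHERE data LIKE ? ORDER BY turn_id",
            (f"%{keyword}%",),
        )
        return cursor.fetchall()

store = SqlitePassiveStore()
store.page_out(0, "User: The secret launch code is 'Orion-Delta-7'.")
store.page_out(1, "User: The weather for the launch looks clear.")
secret_hits = store.page_in("secret")
print(secret_hits)
```

Note that `%` and `_` inside the keyword act as LIKE wildcards here; a production version would escape them or switch to a full-text or vector index.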

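Choosing the Right Strategy

We have explored nine different memory optimization strategies, from simple to complex. There is no single “best” strategy; the choice has to balance the agent's requirements, your budget, and the engineering resources available.

When should you choose which?

Simple, short-lived bots: Sequential or Sliding Window is easy to implement and works well.
Long, creative conversations: Summarization maintains the conversational flow while reducing token overhead.
Agents that need precise long-term recall: Retrieval-Based memory is the industry standard; it is powerful, scalable, and the foundation of RAG applications.
High-reliability personal assistants: Memory-Augmented or Hierarchical approaches separate key facts from conversational noise.
Expert systems and knowledge bases: Graph-Based memory excels at reasoning about relationships between data points.

The most capable agents in production usually take a hybrid approach that combines these techniques. You might, for example, use a hierarchical system whose long-term memory combines a vector database with a knowledge graph.

The key is to be clear about what the agent needs to remember, for how long, and with what precision. Once you have mastered these memory strategies, you can move beyond simple chatbots and build truly intelligent agents that learn, remember, and perform better over time.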
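A hybrid can be as simple as a wrapper that fans each turn out to several strategies and concatenates their contexts. The sketch below is a minimal illustration under that assumption: `HybridMemory`, `RecentTurns`, and `KeywordArchive` are hypothetical stand-ins that follow the same add_message / get_context / clear interface used throughout the article, not classes from the source.

```python
from collections import deque

class RecentTurns:
    """Stand-in for a sliding-window strategy: keeps only the last `size` turns."""
    def __init__(self, size: int = 2):
        self.turns = deque(maxlen=size)
    def add_message(self, user_input: str, ai_response: str) -> None:
        self.turns.append(f"User: {user_input}\nAI: {ai_response}")
    def get_context(self, query: str) -> str:
        if not self.turns:
            return ""
        return "### Recent Turns:\n" + "\n".join(self.turns)
    def clear(self) -> None:
        self.turns.clear()

class KeywordArchive:
    """Stand-in for a long-term store: returns archived turns sharing a word with the query."""
    def __init__(self):
        self.archive = []
    def add_message(self, user_input: str, ai_response: str) -> None:
        self.archive.append(f"User: {user_input}\nAI: {ai_response}")
    def get_context(self, query: str) -> str:
        cleaned = (raw.strip(".,?!'\"").lower() for raw in query.split())
        words = {w for w in cleaned if len(w) > 3}
        hits = [t for t in self.archive if any(w in t.lower() for w in words)]
        if not hits:
            return ""
        return "### Archive Hits:\n" + "\n".join(hits)
    def clear(self) -> None:
        self.archive = []

class HybridMemory:
    """Delegates every call to each wrapped strategy and joins their contexts."""
    def __init__(self, strategies):
        self.strategies = list(strategies)
    def add_message(self, user_input: str, ai_response: str) -> None:
        for strategy in self.strategies:
            strategy.add_message(user_input, ai_response)
    def get_context(self, query: str) -> str:
        parts = [s.get_context(query) for s in self.strategies]
        return "\n\n".join(p for p in parts if p)
    def clear(self) -> None:
        for strategy in self.strategies:
            strategy.clear()

memory = HybridMemory([RecentTurns(size=1), KeywordArchive()])
memory.add_message("The secret launch code is 'Orion-Delta-7'.", "Noted.")
memory.add_message("The weather looks clear.", "Good.")
context = memory.get_context("What is the launch code?")
print(context)
```

Here the window has already dropped the first turn, yet the archive still surfaces it. In a real system the two stand-ins would be replaced by the vector-database and knowledge-graph strategies, and overlapping content between them would need deduplication.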

This article is reposted from AI大模型觀察站. Author: AI大模型觀察站


Last modified: 2025-7-18 14:36:48
