
Building a Self-Improving Agentic RAG System

Published 2025-11-24 00:11

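An Agentic RAG system can be viewed as a high-dimensional vector space in which each dimension corresponds to one design decision, such as prompt engineering, agent coordination, or retrieval strategy. Manually tuning these dimensions to find the right combination is very difficult, and unseen data after deployment often breaks a configuration that worked during testing.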

A better approach is to let the system learn to optimize itself. A typical self-evolving Agentic RAG pipeline follows this thought process:

[Image: Building a Self-Improving Agentic RAG System, AI.x社區]

Self Improving Agentic RAG System (Created by Fareed Khan)

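  • A collaborative team of specialist agents carries out the task. Starting from a high-level concept, it follows the current SOP (standard operating procedure) to produce a complete, multi-source document.
  • A multi-dimensional evaluation system scores the team's output on several objectives, such as accuracy, feasibility, and compliance, and produces a performance vector.
  • A performance diagnostician agent analyzes that vector, identifies the main weak points in the process the way a consultant would, and traces them to root causes.
  • An SOP architect agent updates the process based on the diagnosis and proposes new variants designed to fix the weak points.
  • Each new SOP version is tested as the team repeats the task, and every output is evaluated again to produce its own performance vector.
  • The system identifies the Pareto front, the set of best trade-offs among all tested SOPs, and presents these optimized strategies to a human decision-maker, closing the evolution loop.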
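The last step above, identifying the Pareto front from a set of performance vectors, can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: the function names, the three example objectives, and the scores are all assumptions, and every objective is assumed to be "higher is better".

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of all SOP versions that no other version dominates."""
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other_vec, vec)
            for other_name, other_vec in scores.items()
            if other_name != name
        )
    ]


# Hypothetical performance vectors: (accuracy, feasibility, compliance).
example_scores = {
    "sop_v1": [0.70, 0.60, 0.90],
    "sop_v2": [0.80, 0.60, 0.90],  # at least as good as sop_v1 everywhere, better on accuracy
    "sop_v3": [0.60, 0.90, 0.80],  # trades accuracy for feasibility
}

print(pareto_front(example_scores))  # ['sop_v2', 'sop_v3']
```

Here `sop_v1` drops out because `sop_v2` dominates it, while `sop_v2` and `sop_v3` both stay on the front because each wins on a different objective. Choosing between them is the decision left to the human.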

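In this blog we target the healthcare domain. The challenge there is that multiple possibilities must be considered for an input query or knowledge base, while the final decision remains in human hands.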

We will build an end-to-end, self-improving Agentic RAG pipeline that generates different design patterns for a RAG system.

The complete code is available in my GitHub repository:

GitHub - FareedKhan-dev/autonomous-agentic-rag: Self improving agentic rag pipeline

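Table of Contents

  • Knowledge Infrastructure for Medical AI
    ° Installing the Open-Source Stack
    ° Environment Configuration and Imports
    ° Configuring Local LLMs
    ° Preparing the Knowledge Base
  • Building the Internal Clinical Trial Design Network
    ° Defining the Guild SOP
    ° Defining the Specialist Agents
    ° Orchestrating the Guild with LangGraph
    ° Running the Full Guild Workflow Graph
  • Multi-Dimensional Evaluation System
    ° Building a Custom Evaluator for Each Parameter
    ° Creating an Aggregated LangSmith Evaluator
  • The Outer Loop of the Evolution Engine
    ° Managing Configurations
    ° Building the Director-Level Agents
    ° Running the Full Evolution Loop
  • Five-Dimensional Pareto Analysis
    ° Identifying the Pareto Front
    ° Visualizing the Front and Making a Decision
  • Understanding the Cognitive Workflow
    ° Visualizing the Agent Workflow Timeline
    ° Profiling the Output with a Radar Chart
  • Turning It into an Autonomous Strategy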

Knowledge Infrastructure for Medical AI

Before coding the self-evolving agentic RAG system, we need to set up a suitable knowledge database and the tools required to build the architecture.

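A production-grade RAG system usually contains a variety of databases, including both sensitive internal organizational data and open-source data, to improve retrieval quality and compensate for outdated or incomplete information. This foundational step is arguably the most critical, because the quality of the data sources directly determines the quality of the final output.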


Sourcing the knowledge base (Created by Fareed Khan)

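In this section we assemble the components of the whole architecture. The plan:

  • Install the open-source stack: set up the environment and install the required libraries, with a local, open-source-first approach.
  • Configure secure observability: load API keys securely and configure LangSmith so that complex agent interactions can be traced and debugged from the start.
  • Set up the local LLM foundry: build a portfolio of open-source models through Ollama and assign different models to different tasks to balance performance and cost.
  • Acquire and process multimodal data: download and prepare four kinds of real data sources: PubMed scientific literature, FDA regulatory guidance, ethical principles, and a large structured clinical dataset (MIMIC-III).
  • Index the knowledge stores: finally, turn the raw data into efficiently searchable databases, using FAISS vector stores for unstructured text and DuckDB for structured clinical data.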

Installing the Open-Source Stack

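The first step is to install the required Python libraries. A reproducible environment is the foundation of any serious project. We choose an industry-standard open-source stack so that we retain full control over the system. It includes langchain and langgraph for the core agentic framework, ollama for interacting with local LLMs, and specialized libraries such as biopython for accessing PubMed and duckdb for high-performance clinical data analysis.

Let's install the required modules: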

# We use pip's "quiet" (-q) and "upgrade" (-U) flags to install all the required packages.
# - langchain, langgraph, etc.: These form the core of our agentic framework for building and orchestrating agents.
# - ollama: This is the client library that allows our Python code to communicate with a locally running Ollama server.
# - duckdb: An incredibly fast, in-process analytical database perfect for handling our structured MIMIC data without a heavy server setup.
# - faiss-cpu: Facebook AI's library for efficient similarity search, which will power the vector stores for our RAG agents.
# - sentence-transformers: A library for easy access to state-of-the-art models for creating text embeddings.
# - biopython, pypdf, beautifulsoup4: A suite of powerful utilities for downloading and parsing our diverse, real-world data sources.
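# python-dotenv and requests are added here because later cells import `dotenv` and `requests`.
%pip install -U langchain langgraph langchain_community langchain_openai langchain_core ollama pandas duckdb faiss-cpu sentence-transformers biopython pypdf pydantic lxml html2text beautifulsoup4 matplotlib python-dotenv requests -qqq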

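We prepare all the tools and building materials in one go. Each library has its own job, from orchestrating agent workflows with langgraph to analyzing data with duckdb.

With the modules installed, let's initialize them one by one.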

Environment Configuration and Imports

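We need to configure the environment securely. Hardcoding API keys in a notebook is a security risk and makes the code harder to share.

We use a `.env` file to manage sensitive information, mainly the LangSmith API key. Configuring LangSmith from the start is non-negotiable: it gives us deep observability for tracing, debugging, and understanding the interactions between agents. Here is the code: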

import os
import getpass
from dotenv import load_dotenv

# This function from the python-dotenv library searches for a .env file and loads its key-value pairs
# into the operating system's environment variables, making them accessible to our script.
load_dotenv()

# This is a critical check. We verify that our script can access the necessary API keys from the environment.
if "LANGCHAIN_API_KEY" not in os.environ or "ENTREZ_EMAIL" not in os.environ:
    # If the keys are missing, we print an error. Note that this check only warns; it does not stop execution.
    print("Required environment variables not set. Please set them in your .env file or environment.")
else:
    # This confirmation tells us our secrets have been loaded securely and are ready for use.
    print("Environment variables loaded successfully.")

# We explicitly set the LangSmith project name. This is a best practice that ensures all traces
# generated by this project are automatically grouped together in the LangSmith user interface for easy analysis.
os.environ["LANGCHAIN_PROJECT"] = "AI_Clinical_Trials_Architect"

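`load_dotenv()` acts as a secure bridge between sensitive credentials and the code. It reads `.env` (which must never be committed to version control) and injects the keys into the environment.

From now on, every operation we run with LangChain or LangGraph is automatically captured and sent to our LangSmith project, provided tracing is enabled (for example with `LANGCHAIN_TRACING_V2=true` in `.env`).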
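The environment check shown earlier prints a message when a key is missing but lets the notebook continue. If a hard stop is preferred, a small fail-fast helper along the following lines could be used. The helper names are hypothetical and not part of the original code.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


def require_env_vars(required: Iterable[str], env: Mapping[str, str] = os.environ) -> None:
    """Raise immediately if any required variable is missing, instead of only printing."""
    missing = missing_env_vars(required, env)
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")


# Example with an explicit mapping, so the sketch runs without touching the real environment.
print(missing_env_vars(["LANGCHAIN_API_KEY", "ENTREZ_EMAIL"], {"ENTREZ_EMAIL": "me@example.org"}))
```

In the notebook, calling `require_env_vars(["LANGCHAIN_API_KEY", "ENTREZ_EMAIL"])` right after `load_dotenv()` would stop execution before any agent runs with missing credentials.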

Configuring Local LLMs

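In a production-grade agentic system, a one-size-fits-all model strategy is rarely optimal. A large SOTA model is computationally expensive and slow, so using it for simple tasks wastes resources (especially when self-hosted on GPUs). A small model, although fast, may lack the deep reasoning needed for critical decisions.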


Configuring Local LLMs (Created by Fareed Khan)

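The key is to put the right model in the right place in the system. We will build a multi-model portfolio (all served locally through Ollama for privacy, control, and cost efficiency) in which each model plays to its strengths in a specific role.

We start by defining a configuration dictionary that centralizes the client for every selected model, which makes models easy to swap and manage in one place.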

from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

# This dictionary will act as our central registry, or "foundry," for all LLM and embedding model clients.
llm_config = {
    # For the 'planner', we use Llama 3.1 8B. It's a modern, highly capable model that excels at instruction-following.
    # We set `format='json'` to leverage Ollama's built-in JSON mode, ensuring reliable structured output for this critical task.
    "planner": ChatOllama(model="llama3.1:8b-instruct", temperature=0.0, format='json'),
    
    # For the 'drafter' and 'sql_coder', we use Qwen2 7B. It's a nimble and fast model, perfect for
    # tasks like text generation and code completion where speed is valuable.
    "drafter": ChatOllama(model="qwen2:7b", temperature=0.2),
    "sql_coder": ChatOllama(model="qwen2:7b", temperature=0.0),
    
    # For the 'director', the highest-level strategic agent, we use the powerful Llama 3 70B model.
    # This high-stakes task of diagnosing performance and evolving the system's own procedures
    # justifies the use of a larger, more powerful model.
    "director": ChatOllama(model="llama3:70b", temperature=0.0, format='json'),
    # For embeddings, we use 'nomic-embed-text', a top-tier, efficient open-source model.
    "embedding_model": OllamaEmbeddings(model="nomic-embed-text")
}

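We have just created the `llm_config` dictionary, the central hub for all model initialization. By assigning different models to different roles, we build a hierarchy optimized for the cost-performance trade-off.

  • Fast and nimble (7B–8B): `planner`, `drafter`, and `sql_coder` handle frequent, well-defined tasks. Qwen2 7B and Llama 3.1 8B keep latency and cost low while following instructions well enough to generate plans, draft text, or write SQL.
  • Deep strategy (70B): `director` has to analyze multi-dimensional performance data and rewrite the whole SOP, which demands stronger causal reasoning and global understanding. Assigning Llama 3 70B to this low-frequency, high-stakes task is justified.

Print the configuration to confirm: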

# Print the configuration to confirm the clients are initialized and their parameters are set correctly.
print("LLM clients configured:")
print(f"Planner ({llm_config['planner'].model}): {llm_config['planner']}")
print(f"Drafter ({llm_config['drafter'].model}): {llm_config['drafter']}")
print(f"SQL Coder ({llm_config['sql_coder'].model}): {llm_config['sql_coder']}")
print(f"Director ({llm_config['director'].model}): {llm_config['director']}")
print(f"Embedding Model ({llm_config['embedding_model'].model}): {llm_config['embedding_model']}")

Sample output:

#### OUTPUT ####
LLM clients configured:
Planner (llama3.1:8b-instruct): ChatOllama(model='llama3.1:8b-instruct', temperature=0.0, format='json')
Drafter (qwen2:7b): ChatOllama(model='qwen2:7b', temperature=0.2)
SQL Coder (qwen2:7b): ChatOllama(model='qwen2:7b', temperature=0.0)
Director (llama3:70b): ChatOllama(model='llama3:70b', temperature=0.0, format='json')
Embedding Model (nomic-embed-text): OllamaEmbeddings(model='nomic-embed-text')

This shows that the `ChatOllama` and `OllamaEmbeddings` clients were initialized with the specified models and parameters. Next, we connect the knowledge base.

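Preparing the Knowledge Base

The soul of RAG is a rich, multimodal knowledge base. For a specialized task such as clinical trial design, generic web search is far from enough. We need to ground the system in authoritative, domain-specific information.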


Knowledge store creation (Created by Fareed Khan)

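To that end, we will build a comprehensive knowledge base by collecting, downloading, and processing content from four kinds of real-world data. Fusing multiple sources is essential for helping agents synthesize information, and it makes the final output more complete and reliable.

First, create the data directories: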

import os

# A dictionary to hold the paths for our different data types. This keeps our file management clean and centralized.
data_paths = {
    "base": "./data",
    "pubmed": "./data/pubmed_articles",
    "fda": "./data/fda_guidelines",
    "ethics": "./data/ethical_guidelines",
    "mimic": "./data/mimic_db"
}
# This loop iterates through our defined paths and uses os.makedirs() to create any directories that don't already exist.
# This prevents errors in later steps when we try to save files to these locations.
for path in data_paths.values():
    if not os.path.exists(path):
        os.makedirs(path)
        print(f"Created directory: {path}")

This gives the project a clean, well-organized file structure from the start.

Next, fetch real literature from PubMed to give the `Medical Researcher` its core knowledge:

from Bio import Entrez
from Bio import Medline

def download_pubmed_articles(query, max_articles=20):
    """Fetches abstracts from PubMed for a given query and saves them as text files."""
    # The NCBI API requires an email address for identification. We fetch it from our environment variables.
    Entrez.email = os.environ.get("ENTREZ_EMAIL")
    print(f"Fetching PubMed articles for query: {query}")
    
    # Step 1: Use Entrez.esearch to find the PubMed IDs (PMIDs) for articles matching our query.
    handle = Entrez.esearch(db="pubmed", term=query, retmax=max_articles, sort="relevance")
    record = Entrez.read(handle)
    id_list = record["IdList"]
    print(f"Found {len(id_list)} article IDs.")
    
    print("Downloading articles...")
    # Step 2: Use Entrez.efetch to retrieve the full records (in MEDLINE format) for the list of PMIDs.
    handle = Entrez.efetch(db="pubmed", id=id_list, rettype="medline", retmode="text")
    records = Medline.parse(handle)
    
    count = 0
    # Step 3: Iterate through the retrieved records, parse them, and save each abstract to a file.
    for i, record in enumerate(records):
        pmid = record.get("PMID", "")
        title = record.get("TI", "No Title")
        abstract = record.get("AB", "No Abstract")
        if pmid:
            # We name the file after the PMID for easy reference and to avoid duplicates.
            filepath = os.path.join(data_paths["pubmed"], f"{pmid}.txt")
            with open(filepath, "w") as f:
                f.write(f"Title: {title}\n\nAbstract: {abstract}")
            print(f"[{i+1}/{len(id_list)}] Fetching PMID: {pmid}... Saved to {filepath}")
            count += 1
    return count

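The function connects to NCBI in three steps: it searches for PMIDs matching the boolean query, fetches the MEDLINE records, and saves each title and abstract to a local text file.

Run it: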

# We define a specific, boolean query to find articles highly relevant to our trial concept.
pubmed_query = "(SGLT2 inhibitor) AND (type 2 diabetes) AND (renal impairment)"
num_downloaded = download_pubmed_articles(pubmed_query)
print(f"PubMed download complete. {num_downloaded} articles saved.")

Sample output:

#### OUTPUT ####
Fetching PubMed articles for query: (SGLT2 inhibitor) AND (type 2 diabetes) AND (renal impairment)
Found 20 article IDs.
Downloading articles...
[1/20] Fetching PMID: 38810260... Saved to ./data/pubmed_articles/38810260.txt
[2/20] Fetching PMID: 38788484... Saved to ./data/pubmed_articles/38788484.txt
...
PubMed download complete. 20 articles saved.

The `Medical Researcher` now has a solid, up-to-date, domain-specific scientific basis.

Next, fetch regulatory documents for the `Regulatory Specialist`:

import requests
from pypdf import PdfReader
import io

def download_and_extract_text_from_pdf(url, output_path):
    """Downloads a PDF from a URL, saves it, and also extracts its text content to a separate .txt file."""
    print(f"Downloading FDA Guideline: {url}")
    try:
        # We use the 'requests' library to perform the HTTP GET request to download the file.
        response = requests.get(url)
        response.raise_for_status() # This is a good practice that will raise an error if the download fails (e.g., a 404 error).
        
        # We save the raw PDF file, which is useful for archival purposes.
        with open(output_path, 'wb') as f:
            f.write(response.content)
        print(f"Successfully downloaded and saved to {output_path}")
        
        # We then use pypdf to read the PDF content directly from the in-memory response.
        reader = PdfReader(io.BytesIO(response.content))
        text = ""
        # We loop through each page of the PDF and append its extracted text.
        for page in reader.pages:
            text += page.extract_text() + "\n\n"
        
        # Finally, we save the clean, extracted text to a .txt file. This is the file our RAG system will actually use.
        txt_output_path = os.path.splitext(output_path)[0] + '.txt'
        with open(txt_output_path, 'w') as f:
            f.write(text)
        return True
    except requests.exceptions.RequestException as e:
        print(f"Error downloading file: {e}")
        return False

Run it to download the FDA guidance and extract its text:

# This URL points to a real FDA guidance document for developing drugs for diabetes.
fda_url = "https://www.fda.gov/media/71185/download"
fda_pdf_path = os.path.join(data_paths["fda"], "fda_diabetes_guidance.pdf")
download_and_extract_text_from_pdf(fda_url, fda_pdf_path)

#### OUTPUT ####
Downloading FDA Guideline: https://www.fda.gov/media/71185/download
Successfully downloaded and saved to ./data/fda_guidelines/fda_diabetes_guidance.pdf

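The `Regulatory Specialist` now has a base corpus of legal and regulatory text.

Next, prepare a concise document for the `Ethics Specialist` (a summary of the core principles of the Belmont Report) so that its reasoning is grounded in the most important concepts: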

# This multi-line string contains a curated summary of the three core principles of the Belmont Report,
# which is the foundational document for ethics in human subject research in the United States.
ethics_content = """
Title: Summary of the Belmont Report Principles for Clinical Research
1. Respect for Persons: This principle requires that individuals be treated as autonomous agents and that persons with diminished autonomy are entitled to protection. This translates to robust informed consent processes. Inclusion/exclusion criteria must not unduly target or coerce vulnerable populations, such as economically disadvantaged individuals, prisoners, or those with severe cognitive impairments, unless the research is directly intended to benefit that population.
2. Beneficence: This principle involves two complementary rules: (1) do not harm and (2) maximize possible benefits and minimize possible harms. The criteria must be designed to select a population that is most likely to benefit and least likely to be harmed by the intervention. The risks to subjects must be reasonable in relation to anticipated benefits.
3. Justice: This principle concerns the fairness of distribution of the burdens and benefits of research. The selection of research subjects must be equitable. Criteria should not be designed to exclude certain groups without a sound scientific or safety-related justification. For example, excluding participants based on race, gender, or socioeconomic status is unjust unless there is a clear rationale related to the drug's mechanism or risk profile.
"""

# We define the path where our ethics document will be saved.
ethics_path = os.path.join(data_paths["ethics"], "belmont_summary.txt")

# We open the file in write mode and save the content.
with open(ethics_path, "w") as f:
    f.write(ethics_content)
print(f"Created ethics guideline file: {ethics_path}")

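Finally, the most complex data source: structured clinical data from MIMIC-III, which gives the `Patient Cohort Analyst` real-world population data for assessing recruitment feasibility.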

import duckdb
import pandas as pd
import os


def load_real_mimic_data():
    """Loads real MIMIC-III CSVs into a persistent DuckDB database file, processing the massive LABEVENTS table efficiently."""
    print("Attempting to load real MIMIC-III data from local CSVs...")
    db_path = os.path.join(data_paths["mimic"], "mimic3_real.db")
    csv_dir = os.path.join(data_paths["mimic"], "mimiciii_csvs")
    
    # Define the paths to the required compressed CSV files.
    required_files = {
        "patients": os.path.join(csv_dir, "PATIENTS.csv.gz"),
        "diagnoses": os.path.join(csv_dir, "DIAGNOSES_ICD.csv.gz"),
        "labevents": os.path.join(csv_dir, "LABEVENTS.csv.gz"),
    }
    
    # Before starting, we check if all the necessary source files are present.
    missing_files = [path for path in required_files.values() if not os.path.exists(path)]
    if missing_files:
        print("ERROR: The following MIMIC-III files were not found:")
        for f in missing_files: print(f"- {f}")
        print("\nPlease download them as instructed and place them in the correct directory.")
        return None
    
    print("Required files found. Proceeding with database creation.")
    # Remove any old database file to ensure we are building from scratch.
    if os.path.exists(db_path):
        os.remove(db_path)
    # Connect to DuckDB. If the database file doesn't exist, it will be created.
    con = duckdb.connect(db_path)
    
    # Use DuckDB's powerful `read_csv_auto` to directly load data from the gzipped CSVs into SQL tables.
    print(f"Loading {required_files['patients']} into DuckDB...")
    con.execute(f"CREATE TABLE patients AS SELECT SUBJECT_ID, GENDER, DOB, DOD FROM read_csv_auto('{required_files['patients']}')")
    
    print(f"Loading {required_files['diagnoses']} into DuckDB...")
    con.execute(f"CREATE TABLE diagnoses_icd AS SELECT SUBJECT_ID, ICD9_CODE FROM read_csv_auto('{required_files['diagnoses']}')")
    
    # The LABEVENTS table is enormous. To handle it robustly, we use a two-stage process.
    print(f"Loading and processing {required_files['labevents']} (this may take several minutes)...")
    # 1. Load the data into a temporary 'staging' table, treating all columns as text (`all_varchar=True`).
    #    This prevents parsing errors with mixed data types. We also filter for only the lab item IDs we
    #    care about (50912 for Creatinine, 50852 for HbA1c) and use a regex to ensure VALUENUM is numeric.
    con.execute(f"""CREATE TABLE labevents_staging AS 
                   SELECT SUBJECT_ID, ITEMID, VALUENUM 
                   FROM read_csv_auto('{required_files['labevents']}', all_varchar=True) 
                   WHERE ITEMID IN ('50912', '50852') AND VALUENUM IS NOT NULL AND VALUENUM ~ '^[0-9]+(\\.[0-9]+)?$'
                """)
    # 2. Create the final, clean table by selecting from the staging table and casting the columns to their correct numeric types.
    con.execute("CREATE TABLE labevents AS SELECT SUBJECT_ID, CAST(ITEMID AS INTEGER) AS ITEMID, CAST(VALUENUM AS DOUBLE) AS VALUENUM FROM labevents_staging")
    # 3. Drop the temporary staging table to save space.
    con.execute("DROP TABLE labevents_staging")
    con.close()
    return db_path

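Here DuckDB processes the large CSVs directly from disk instead of reading them fully into memory with pandas. LABEVENTS goes through a two-stage cleanup (first filter with all_varchar, then cast types), which handles data quality problems robustly and yields a clean, efficient query table.

Run and inspect: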
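To make the two-stage idea concrete without needing the MIMIC-III files, here is a pure-Python sketch of the same logic: treat every field as text, keep only rows whose item ID is wanted and whose value looks numeric, and cast only afterwards. The function name and sample rows are illustrative assumptions. The item IDs and the numeric pattern mirror the SQL in `load_real_mimic_data`.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Same numeric pattern as the SQL filter: digits with an optional decimal part.
NUMERIC = re.compile(r"[0-9]+(\.[0-9]+)?")
WANTED_ITEMS = {"50912", "50852"}  # creatinine and HbA1c, as in the original query

RawRow = Tuple[str, str, Optional[str]]    # (SUBJECT_ID, ITEMID, VALUENUM), all text
CleanRow = Tuple[int, int, float]


def clean_lab_rows(rows: Iterable[RawRow]) -> List[CleanRow]:
    # Stage 1: filter while everything is still text, so malformed values cannot break parsing.
    staged = [
        (subject, item, value)
        for subject, item, value in rows
        if item in WANTED_ITEMS and value is not None and NUMERIC.fullmatch(value)
    ]
    # Stage 2: cast the surviving rows to their proper numeric types.
    return [(int(subject), int(item), float(value)) for subject, item, value in staged]


sample = [
    ("1", "50912", "1.8"),
    ("2", "50912", "ERROR"),   # non-numeric value, dropped in stage 1
    ("3", "51000", "2.0"),     # unwanted lab item, dropped in stage 1
    ("4", "50852", None),      # missing value, dropped in stage 1
    ("5", "50852", "9"),
]
print(clean_lab_rows(sample))  # [(1, 50912, 1.8), (5, 50852, 9.0)]
```

The point of the ordering is that a cast never sees a value that has not already passed the text-level check, which is what the staging table achieves in DuckDB.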

# Execute the function to build the database.
db_path = load_real_mimic_data()

# If the database was created successfully, connect to it and inspect the schema and some sample data.
if db_path:
    print(f"\nReal MIMIC-III database created at: {db_path}")
    print("\nTesting database connection and schema...")
    con = duckdb.connect(db_path)
    print(f"Tables in DB: {con.execute('SHOW TABLES').df()['name'].tolist()}")
    print("\nSample of 'patients' table:")
    print(con.execute("SELECT * FROM patients LIMIT 5").df())
    print("\nSample of 'diagnoses_icd' table:")
    print(con.execute("SELECT * FROM diagnoses_icd LIMIT 5").df())
    con.close()

Sample output is omitted; it shows that all three tables were created and can be queried.


Pre-processing Step (Created by Fareed Khan)

Finally, index all unstructured text data into searchable vector stores for RAG:

from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

def create_vector_store(folder_path: str, embedding_model, store_name: str):
    """Loads all .txt files from a folder, splits them into chunks, and creates an in-memory FAISS vector store."""
    print(f"--- Creating {store_name} Vector Store ---")
    # Use DirectoryLoader to efficiently load all .txt files from the specified folder.
    loader = DirectoryLoader(folder_path, glob="**/*.txt", loader_cls=TextLoader, show_progress=True)
    documents = loader.load()
    
    if not documents:
        print(f"No documents found in {folder_path}, skipping vector store creation.")
        return None
    
    # Use RecursiveCharacterTextSplitter to break large documents into smaller, 1000-character chunks with a 100-character overlap.
    # The overlap helps maintain context between chunks.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    texts = text_splitter.split_documents(documents)
    
    print(f"Loaded {len(documents)} documents, split into {len(texts)} chunks.")
    print("Generating embeddings and indexing into FAISS... (This may take a moment)")
    # FAISS.from_documents is a convenient function that handles both embedding the text chunks
    # and building the efficient FAISS index in one step.
    db = FAISS.from_documents(texts, embedding_model)
    print(f"{store_name} Vector Store created successfully.")
    return db

def create_retrievers(embedding_model):
    """Creates vector store retrievers for all unstructured data sources and consolidates all knowledge stores."""
    # Create a separate, specialized vector store for each type of document.
    pubmed_db = create_vector_store(data_paths["pubmed"], embedding_model, "PubMed")
    fda_db = create_vector_store(data_paths["fda"], embedding_model, "FDA")
    ethics_db = create_vector_store(data_paths["ethics"], embedding_model, "Ethics")
    
    # Return a single dictionary containing all configured data access tools.
    # The 'as_retriever' method converts the vector store into a standard LangChain Retriever object.
    # The 'k' parameter in 'search_kwargs' controls how many top documents are returned by a search.
    return {
        "pubmed_retriever": pubmed_db.as_retriever(search_kwargs={"k": 3}) if pubmed_db else None,
        "fda_retriever": fda_db.as_retriever(search_kwargs={"k": 3}) if fda_db else None,
        "ethics_retriever": ethics_db.as_retriever(search_kwargs={"k": 2}) if ethics_db else None,
        "mimic_db_path": db_path # We also include the file path to our structured DuckDB database.
    }

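`create_vector_store` wraps the standard RAG build pipeline of load -> split -> embed -> index, while `create_retrievers` builds a separate vector store for each corpus and returns a dictionary of retrievers. We use per-domain vector stores rather than one monolithic store so that each agent searches only its own relevant knowledge source (for example, the `Regulatory Specialist` uses only `fda_retriever`).

Run the creation: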
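The effect of `chunk_size` and `chunk_overlap` in the split step is easiest to see with a simplified fixed-window splitter. This sketch is not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on separators such as paragraphs and sentences); it only illustrates how consecutive chunks share an overlapping region. The function name is hypothetical.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> List[str]:
    """Cut text into windows of chunk_size characters; each window repeats the last
    chunk_overlap characters of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk starts with the last character of the previous one, which is the small amount of shared context the overlap is meant to preserve across chunk boundaries.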

# Execute the function to create all our retrievers.
knowledge_stores = create_retrievers(llm_config["embedding_model"])

print("\nKnowledge stores and retrievers created successfully.")

# Print the final dictionary to confirm all components are present.
for name, store in knowledge_stores.items():
    print(f"{name}: {store}")

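The output shows that each retriever was created successfully.

At this point the data (downloaded, processed, indexed) and the LLMs (configured) are ready, and we can start building the first major component of the system: the Trial Design Guild.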

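Building the Internal Clinical Trial Design Network

With the knowledge base ready, we now build the core of the system. This is not a simple linear RAG chain but a collaborative multi-agent workflow built on LangGraph: a team of AI specialists that together turn a high-level trial concept into a detailed, data-backed criteria document.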


Main Inner Loop RAG (Created by Fareed Khan)

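The behavior of the whole architecture is not hardcoded. It is governed by a dynamic configuration object: the Standard Operating Procedure (`GuildSOP`).

This SOP is the genome of our RAG pipeline, and it is what the outer-loop AI Research Director will evolve and optimize.

The plan for this section:

  • Define the RAG genome: create the Pydantic model `GuildSOP`, which drives the whole workflow architecture.
  • Design the shared workbench: define `GuildState`, the central space where agents share plans and findings.
  • Build the specialist agents: implement the Planner, Researchers, SQL Analyst, and Synthesizer as Python functions that serve as nodes in the graph.
  • Orchestrate the collaboration: wire these agent nodes into a complete end-to-end workflow with LangGraph.
  • Run a full test: invoke the complete Guild graph with the baseline SOP, watch it run, and generate the first version of the criteria document.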

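Defining the Guild SOP

We first define the structure that controls the behavior of the whole pipeline. We create `GuildSOP` with a Pydantic `BaseModel`. Strong typing, validation, and self-documentation keep the SOP stable as it evolves.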


Guild SOP Design (Created by Fareed Khan)

from pydantic import BaseModel, Field
from typing import Literal

class GuildSOP(BaseModel):
    """Standard Operating Procedures for the Trial Design Guild. This object acts as the dynamic configuration for the entire RAG workflow."""
    
    # This field holds the system prompt for the Planner Agent, dictating its strategy.
    planner_prompt: str = Field(description="The system prompt for the Planner Agent.")
    
    # This parameter controls how many documents the Medical Researcher retrieves, allowing us to tune the breadth of its search.
    researcher_retriever_k: int = Field(description="Number of documents for the Medical Researcher to retrieve.", default=3)
    
    # This is the system prompt for the final writer, the Synthesizer Agent.
    synthesizer_prompt: str = Field(description="The system prompt for the Criteria Synthesizer Agent.")
    
    # This allows us to dynamically change the model used for the final drafting stage, trading off speed vs. quality.
    synthesizer_model: Literal["qwen2:7b", "llama3.1:8b-instruct"] = Field(description="The LLM to use for the Synthesizer.", default="qwen2:7b")
    
    # These booleans act as "feature flags," allowing the Director to turn entire agent capabilities on or off.
    use_sql_analyst: bool = Field(description="Whether to use the Patient Cohort Analyst agent.", default=True)
    use_ethics_specialist: bool = Field(description="Whether to use the Ethics Specialist agent.", default=True)

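`GuildSOP` exposes the key parameters (the prompts, `researcher_retriever_k`, and the agent switches) so that the outer-loop AI Director can pull these strategy levers to tune overall performance. `synthesizer_model` uses `Literal` to restrict the allowed values, which keeps the field type-safe.

Build the baseline version: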
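To illustrate what "pulling a lever" might look like, the sketch below derives a variant from a baseline configuration and reports which fields changed. It uses a frozen standard-library dataclass named `SOPSketch` as a stand-in for `GuildSOP` so that it runs without Pydantic. The class, the helper `changed_fields`, and the placeholder prompt strings are all assumptions for illustration, not the article's code.

```python
from dataclasses import dataclass, fields, replace
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class SOPSketch:
    """Stand-in for GuildSOP with the same field names and defaults."""
    planner_prompt: str
    synthesizer_prompt: str
    researcher_retriever_k: int = 3
    synthesizer_model: str = "qwen2:7b"
    use_sql_analyst: bool = True
    use_ethics_specialist: bool = True


def changed_fields(old: SOPSketch, new: SOPSketch) -> Dict[str, Tuple[Any, Any]]:
    """Map each field that differs to its (old, new) values."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


baseline = SOPSketch(planner_prompt="placeholder planner prompt",
                     synthesizer_prompt="placeholder synthesizer prompt")

# A hypothetical mutation: retrieve more documents and switch the drafting model.
variant = replace(baseline, researcher_retriever_k=5, synthesizer_model="llama3.1:8b-instruct")

print(changed_fields(baseline, variant))
```

Because the baseline object is never modified, every variant can be evaluated against the same starting point, and the diff gives a compact record of what each evolution step changed.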

import json

baseline_sop = GuildSOP(
    planner_prompt="""You are a master planner for clinical trial design...""",
    synthesizer_prompt="""You are an expert medical writer...""",
    researcher_retriever_k=3,
    synthesizer_model="qwen2:7b",
    use_sql_analyst=True,
    use_ethics_specialist=True
)

Print it:

print("Baseline GuildSOP (v1.0):")
print(json.dumps(baseline_sop.dict(), indent=4))

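The output shows the full configuration of the baseline SOP, our initial hand-engineered best guess, which the AI Director will later optimize and surpass.

Defining the Specialist Agents

With the rulebook (the SOP) in place, we define the agents next. In LangGraph, an agent is a node (a Python function) that takes the current graph state as input and returns a state update.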
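The node contract described here (state in, state update out) can be illustrated with a toy runner in plain Python. This is a simplified stand-in and not LangGraph's implementation: the runner, the two toy nodes, and the example request are hypothetical, and real graphs add routing, reducers, and checkpointing on top of this idea.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Node = Callable[[State], State]


def run_pipeline(nodes: List[Node], state: State) -> State:
    """Run nodes in order, merging each node's returned update into the shared state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


def toy_planner(state: State) -> State:
    # Reads the request and returns only the key it is responsible for.
    return {"plan": f"plan for: {state['initial_request']}"}


def toy_synthesizer(state: State) -> State:
    # Reads what earlier nodes wrote and adds the final output.
    return {"final_criteria": f"criteria based on {state['plan']}"}


final_state = run_pipeline([toy_planner, toy_synthesizer], {"initial_request": "SGLT2 trial"})
print(final_state["final_criteria"])  # criteria based on plan for: SGLT2 trial
```

Each node only needs to know which keys it reads and which it writes, which is why agents can be added, removed, or swapped without touching the others.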


Specialist Agents (Created by Fareed Khan)

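First, define the shared state `GuildState`, which acts as the collaborative workbench and holds the initial request, the plan generated by the planner, each specialist's findings, and the final output.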

from typing import List, Dict, Any, Optional
from langchain_core.pydantic_v1 import BaseModel
from typing_extensions import TypedDict

class AgentOutput(BaseModel):
    """A structured output for each agent's findings."""
    agent_name: str
    findings: Any

class GuildState(TypedDict):
    """The state of the Trial Design Guild's workflow, passed between all nodes."""
    initial_request: str
    plan: Optional[Dict[str, Any]]
    agent_outputs: List[AgentOutput]
    final_criteria: Optional[str]
    sop: GuildSOP

Next, implement `planner_agent`, which reads `planner_prompt` from the SOP and produces a structured plan (JSON) that guides the downstream agents:

def planner_agent(state: GuildState) -> GuildState:
    """Receives the initial request and creates a structured plan for the specialist agents."""
    print("--- EXECUTING PLANNER AGENT ---")

    sop = state['sop']

    planner_llm = llm_config['planner'].with_structured_output(schema={"plan": []})
    
    prompt = f"{sop.planner_prompt}\n\nTrial Concept: '{state['initial_request']}'"
    print(f"Planner Prompt:\n{prompt}")
    
    response = planner_llm.invoke(prompt)
    print(f"Generated Plan:\n{json.dumps(response, indent=2)}")
    
    return {**state, "plan": response}

Then implement a generic retrieval agent function, `retrieval_agent`, reused by the `Medical Researcher`, `Regulatory Specialist`, and `Ethics Specialist`:

def retrieval_agent(task_description: str, state: GuildState, retriever_name: str, agent_name: str) -> AgentOutput:
    """A generic agent function that performs retrieval from a specified vector store based on a task description."""
    print(f"--- EXECUTING {agent_name.upper()} ---")
    print(f"Task: {task_description}")
    
    retriever = knowledge_stores[retriever_name]
    
    if agent_name == "Medical Researcher":
        retriever.search_kwargs['k'] = state['sop'].researcher_retriever_k
        print(f"Using k={state['sop'].researcher_retriever_k} for retrieval.")

    retrieved_docs = retriever.invoke(task_description)
    
    findings = "\n\n---\n\n".join([f"Source: {doc.metadata.get('source', 'N/A')}\n\n{doc.page_content}" for doc in retrieved_docs])
    print(f"Retrieved {len(retrieved_docs)} documents.")
    print(f"Sample Finding:\n{findings[:500]}...")
    
    return AgentOutput(agent_name=agent_name, findings=findings)

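The `Patient Cohort Analyst` is the most complex agent: it performs Text-to-SQL, turning natural language into valid SQL, running it on DuckDB, and returning an estimate of the recruitable population: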

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

def patient_cohort_analyst(task_description: str, state: GuildState) -> AgentOutput:
    """Estimates cohort size by generating and then executing a SQL query against the MIMIC database."""
    print("--- EXECUTING PATIENT COHORT ANALYST ---")
    
    if not state['sop'].use_sql_analyst:
        print("SQL Analyst skipped as per SOP.")
        return AgentOutput(agent_name="Patient Cohort Analyst", findings="Analysis skipped as per SOP.")
    
    con = duckdb.connect(knowledge_stores['mimic_db_path'])
    schema_query = """
    SELECT table_name, column_name, data_type 
    FROM information_schema.columns 
    WHERE table_schema = 'main' ORDER BY table_name, column_name;
    """
    schema = con.execute(schema_query).df()
    con.close()
    
    sql_generation_prompt = ChatPromptTemplate.from_messages([
        ("system", f"You are an expert SQL writer specializing in DuckDB. ... schema:\n{schema.to_string()}\n\nIMPORTANT: All column names ...\n\nKey Mappings:\n- T2DM ... ICD9_CODE '25000'.\n- Moderate renal impairment ... creatinine ... ITEMID 50912 ... VALUENUM 1.5-3.0.\n- Uncontrolled T2D ... HbA1c ... ITEMID 50852 ... VALUENUM > 8.0."),
        ("human", "Please write a SQL query to count the number of unique patients who meet the following criteria: {task}")
    ])
    
    sql_chain = sql_generation_prompt | llm_config['sql_coder'] | StrOutputParser()
    
    print(f"Generating SQL for task: {task_description}")
    sql_query = sql_chain.invoke({"task": task_description})
    sql_query = sql_query.strip().replace("```sql", "").replace("```", "")
    print(f"Generated SQL Query:\n{sql_query}")
    try:
        con = duckdb.connect(knowledge_stores['mimic_db_path'])
        result = con.execute(sql_query).fetchone()
        patient_count = result[0] if result else 0
        con.close()
        
        findings = f"Generated SQL Query:\n{sql_query}\n\nEstimated eligible patient count from the database: {patient_count}."
        print(f"Query executed successfully. Estimated patient count: {patient_count}")
    except Exception as e:
        findings = f"Error executing SQL query: {e}. Defaulting to a count of 0."
        print(f"Error during query execution: {e}")
    return AgentOutput(agent_name="Patient Cohort Analyst", findings=findings)
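The analyst above strips Markdown fences from the model's reply with plain string replacement and then executes whatever remains. One possible hardening, shown here as a sketch with hypothetical helper names, is to extract only the fenced query when one is present and to refuse statements that do not start with `SELECT` or `WITH`. The prefix check is a coarse guard, not a security boundary; opening the DuckDB connection in read-only mode would be a stronger complement.

```python
import re

FENCE = "`" * 3  # built programmatically so this example does not contain a literal fence

# Matches an optional "sql" language tag after the opening fence and captures the body.
_FENCED = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_sql(llm_text: str) -> str:
    """Return the contents of the first fenced block, or the whole reply if none is fenced."""
    match = _FENCED.search(llm_text)
    return (match.group(1) if match else llm_text).strip()


def is_read_only(sql: str) -> bool:
    """Coarse check: accept only statements whose first keyword is SELECT or WITH."""
    tokens = sql.split(None, 1)
    return bool(tokens) and tokens[0].upper() in {"SELECT", "WITH"}


reply = (
    "Here is the query:\n"
    + FENCE + "sql\nSELECT COUNT(DISTINCT SUBJECT_ID) FROM patients;\n" + FENCE
    + "\nLet me know if you need changes."
)
query = extract_sql(reply)
print(query)                # SELECT COUNT(DISTINCT SUBJECT_ID) FROM patients;
print(is_read_only(query))  # True
```

Compared with chained `replace` calls, this also discards any explanatory prose the model adds around the query, which would otherwise reach DuckDB and cause a syntax error.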

Last is `criteria_synthesizer`, which weaves the specialists' findings into a formal Inclusion/Exclusion Criteria document. It supports switching `synthesizer_model` dynamically through the SOP:

def criteria_synthesizer(state: GuildState) -> GuildState:
    """Synthesizes all the structured findings from the specialist agents into the final criteria document."""
    print("--- EXECUTING CRITERIA SYNTHESIZER ---")
    
    sop = state['sop']
    drafter_llm = ChatOllama(model=sop.synthesizer_model, temperature=0.2)

    context = "\n\n---\n\n".join([f"**{out.agent_name} Findings:**\n{out.findings}" for out in state['agent_outputs']])
    
    prompt = f"{sop.synthesizer_prompt}\n\n**Context from Specialist Teams:**\n{context}"
    print(f"Synthesizer is using model '{sop.synthesizer_model}'.")
    
    response = drafter_llm.invoke(prompt)
    print("Final criteria generated.")
    
    return {**state, "final_criteria": response.content}

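Orchestrating the Guild with LangGraph

We wire the agent nodes above into a LangGraph workflow: Planner → specialists (conceptually parallel, dispatched from a single node) → Synthesizer.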


Guild with langgraph (Created by Fareed Khan)

Define a dispatcher node that assigns tasks according to the plan:

from langgraph.graph import StateGraph, END

def specialist_execution_node(state: GuildState) -> GuildState:
    """This node acts as a dispatcher, executing all specialist tasks defined in the plan."""
    plan_tasks = state['plan']['plan']
    outputs = []
    
    for task in plan_tasks:
        agent_name = task['agent']
        task_desc = task['task_description']
        
        if "Regulatory" in agent_name:
            output = retrieval_agent(task_desc, state, "fda_retriever", "Regulatory Specialist")
        elif "Medical" in agent_name:
            output = retrieval_agent(task_desc, state, "pubmed_retriever", "Medical Researcher")
        elif "Ethics" in agent_name and state['sop'].use_ethics_specialist:
            output = retrieval_agent(task_desc, state, "ethics_retriever", "Ethics Specialist")
        elif "Cohort" in agent_name:
            output = patient_cohort_analyst(task_desc, state)
        else:
            continue
        
        outputs.append(output)
    return {**state, "agent_outputs": outputs}
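
The dispatcher above calls each specialist in turn inside a for loop, so the specialist stage runs sequentially as written. Because the calls are independent and mostly wait on I/O (LLM and retriever requests), one way to run them concurrently is a thread pool. The sketch below uses a stand-in function instead of the real agents; run_specialist, dispatch_in_parallel, and the task dictionaries are assumptions for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def run_specialist(task: Dict[str, str]) -> str:
    """Stand-in for a specialist agent call (hypothetical); sleeps to mimic I/O latency."""
    time.sleep(0.05)
    return f"{task['agent']}: handled '{task['task_description']}'"

def dispatch_in_parallel(plan_tasks: List[Dict[str, str]], max_workers: int = 4) -> List[str]:
    """Run all tasks concurrently; executor.map returns results in plan order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(run_specialist, plan_tasks))

plan = [
    {"agent": "Regulatory Specialist", "task_description": "check guidance"},
    {"agent": "Medical Researcher", "task_description": "review literature"},
    {"agent": "Patient Cohort Analyst", "task_description": "estimate cohort"},
]
outputs = dispatch_in_parallel(plan)
for line in outputs:
    print(line)
```

Since executor.map preserves input order, the synthesizer would still see the findings in the order the planner listed them.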

Build and compile the graph:

workflow = StateGraph(GuildState)

workflow.add_node("planner", planner_agent)
workflow.add_node("execute_specialists", specialist_execution_node)
workflow.add_node("synthesizer", criteria_synthesizer)

workflow.set_entry_point("planner")
workflow.add_edge("planner", "execute_specialists")
workflow.add_edge("execute_specialists", "synthesizer")
workflow.add_edge("synthesizer", END)

guild_graph = workflow.compile()
print("Graph compiled successfully.")

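The optional graph visualization is omitted here. With that, the "Inner Loop" multi-agent RAG pipeline is complete.

Running the Full Guild Graph

We run an end-to-end test with the baseline SOP and a realistic trial concept to verify that the agents, data stores, and orchestration logic work together, and to produce our first "baseline" output for the evaluation and evolution loops that follow.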


Run Workflow (Created by Fareed Khan)

test_request = "Draft inclusion/exclusion criteria for a Phase II trial of 'Sotagliflozin', a novel SGLT2 inhibitor, for adults with uncontrolled Type 2 Diabetes (HbA1c > 8.0%) and moderate chronic kidney disease (CKD Stage 3)."

print("Running the full Guild graph with baseline SOP v1.0...")
graph_input = {
    "initial_request": test_request,
    "sop": baseline_sop
}
final_result = guild_graph.invoke(graph_input)
print("\nFinal Guild Output:")
print("---------------------")
print(final_result['final_criteria'])

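The output log shows each agent's execution and ends with a well-structured inclusion/exclusion criteria document. We have now built and tested a multi-agent RAG pipeline grounded in real data sources.

The Multi-Dimensional Evaluation System

A system that improves itself must be able to measure its own performance. A single score such as accuracy is not enough; we need a multi-dimensional quality assessment. We will build an evaluation suite that scores the Guild's output on the "five pillars" we set out at the start, giving the outer evolution loop a rich, actionable feedback signal.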


Multi-dimension Eval (Created by Fareed Khan)

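Plan for this section:

  • LLM-as-a-Judge: use llama3:70b to build three "expert judges" that score Scientific Rigor, Regulatory Compliance, and Ethical Soundness.
  • Programmatic evaluation: two fast, reliable, objective functions that score Recruitment Feasibility and Operational Simplicity.
  • Aggregate evaluator: wrap the five individual evaluators in one function that takes the Guild's output and produces a 5D performance vector for the AI Director to act on.

Building a Custom Evaluator for Each Parameter

First, define a shared output structure for the LLM judges: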

from langchain_core.pydantic_v1 import BaseModel, Field

class GradedScore(BaseModel):
    """A Pydantic model to structure the output of our LLM-as-a-Judge evaluators."""
    score: float = Field(description="A score from 0.0 to 1.0")
    reasoning: str = Field(description="A brief justification for the score.")

  1. Scientific Rigor:

from langchain_core.prompts import ChatPromptTemplate

def scientific_rigor_evaluator(generated_criteria: str, pubmed_context: str) -> GradedScore:
    evaluator_llm = llm_config['director'].with_structured_output(GradedScore)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert clinical scientist. ..."),
        ("human", "Evaluate the following criteria:\n\n**Generated Criteria:**\n{criteria}\n\n**Supporting Scientific Context:**\n{context}")
    ])
    chain = prompt | evaluator_llm
    return chain.invoke({"criteria": generated_criteria, "context": pubmed_context})

  2. Regulatory Compliance:

def regulatory_compliance_evaluator(generated_criteria: str, fda_context: str) -> GradedScore:
    evaluator_llm = llm_config['director'].with_structured_output(GradedScore)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert regulatory affairs specialist. ..."),
        ("human", "Evaluate the following criteria:\n\n**Generated Criteria:**\n{criteria}\n\n**Applicable FDA Guidelines:**\n{context}")
    ])
    chain = prompt | evaluator_llm
    return chain.invoke({"criteria": generated_criteria, "context": fda_context})

  3. Ethical Soundness:

def ethical_soundness_evaluator(generated_criteria: str, ethics_context: str) -> GradedScore:
    evaluator_llm = llm_config['director'].with_structured_output(GradedScore)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are an expert on clinical trial ethics. ..."),
        ("human", "Evaluate the following criteria:\n\n**Generated Criteria:**\n{criteria}\n\n**Ethical Principles:**\n{context}")
    ])
    chain = prompt | evaluator_llm
    return chain.invoke({"criteria": generated_criteria, "context": ethics_context})

  4. Recruitment Feasibility (programmatic):

def feasibility_evaluator(cohort_analyst_output: AgentOutput) -> GradedScore:
    findings_text = cohort_analyst_output.findings
    try:
        count_str = findings_text.split("database: ")[1].replace('.', '')
        patient_count = int(count_str)
    except (IndexError, ValueError):
        return GradedScore(score=0.0, reasoning="Could not parse patient count from analyst output.")
    
    IDEAL_COUNT = 150.0
    score = min(1.0, patient_count / IDEAL_COUNT)
    reasoning = f"Estimated {patient_count} eligible patients. Score is normalized against an ideal target of {int(IDEAL_COUNT)}."
    return GradedScore(score=score, reasoning=reasoning)

  5. Operational Simplicity (programmatic):

def simplicity_evaluator(generated_criteria: str) -> GradedScore:
    EXPENSIVE_TESTS = ["mri", "genetic sequencing", "pet scan", "biopsy", "echocardiogram", "endoscopy"]
    test_count = sum(1 for test in EXPENSIVE_TESTS if test in generated_criteria.lower())
    score = max(0.0, 1.0 - (test_count * 0.5))
    reasoning = f"Found {test_count} expensive/complex screening procedures mentioned."
    return GradedScore(score=score, reasoning=reasoning)
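
The two programmatic evaluators reduce to simple arithmetic: feasibility is min(1, n / 150) for an estimated patient count n, and simplicity is max(0, 1 - 0.5k) for k matched expensive procedures. The sketch below restates both as plain functions without the GradedScore wrapper so the scoring curves are easy to check. The function names are hypothetical, and the input sentences are invented.

```python
IDEAL_COUNT = 150.0  # same target as feasibility_evaluator
EXPENSIVE_TESTS = ["mri", "genetic sequencing", "pet scan", "biopsy", "echocardiogram", "endoscopy"]

def feasibility_score(patient_count: int) -> float:
    """Linear in the patient count, capped at 1.0 once the ideal target is reached."""
    return min(1.0, patient_count / IDEAL_COUNT)

def simplicity_score(criteria_text: str) -> float:
    """Each expensive procedure mentioned costs 0.5, so two or more drive the score to 0."""
    lowered = criteria_text.lower()
    test_count = sum(1 for test in EXPENSIVE_TESTS if test in lowered)
    return max(0.0, 1.0 - 0.5 * test_count)

print(feasibility_score(75))    # half of the ideal target -> 0.5
print(feasibility_score(300))   # above the target, capped -> 1.0
print(simplicity_score("Baseline MRI required."))            # one match -> 0.5
print(simplicity_score("Requires MRI and a renal biopsy."))  # two matches -> 0.0
```

Note that the match is a plain substring test, so a short term such as "mri" would also match inside a longer word; a word-boundary check would be stricter.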

Creating the Aggregate LangSmith Evaluator

Define the overall result model and the aggregation function:

class EvaluationResult(BaseModel):
    rigor: GradedScore
    compliance: GradedScore
    ethics: GradedScore
    feasibility: GradedScore
    simplicity: GradedScore

def run_full_evaluation(guild_final_state: GuildState) -> EvaluationResult:
    """Orchestrates the entire evaluation process, calling each of the five specialist evaluators."""
    print("--- RUNNING FULL EVALUATION GAUNTLET ---")
    
    final_criteria = guild_final_state['final_criteria']
    agent_outputs = guild_final_state['agent_outputs']
    
    pubmed_context = next((o.findings for o in agent_outputs if o.agent_name == "Medical Researcher"), "")
    fda_context = next((o.findings for o in agent_outputs if o.agent_name == "Regulatory Specialist"), "")
    ethics_context = next((o.findings for o in agent_outputs if o.agent_name == "Ethics Specialist"), "")
    analyst_output = next((o for o in agent_outputs if o.agent_name == "Patient Cohort Analyst"), None)
    
    print("Evaluating: Scientific Rigor...")
    rigor = scientific_rigor_evaluator(final_criteria, pubmed_context)
    print("Evaluating: Regulatory Compliance...")
    compliance = regulatory_compliance_evaluator(final_criteria, fda_context)
    print("Evaluating: Ethical Soundness...")
    ethics = ethical_soundness_evaluator(final_criteria, ethics_context)
    print("Evaluating: Recruitment Feasibility...")
    feasibility = feasibility_evaluator(analyst_output) if analyst_output else GradedScore(score=0.0, reasoning="Analyst did not run.")
    print("Evaluating: Operational Simplicity...")
    simplicity = simplicity_evaluator(final_criteria)
    
    print("--- EVALUATION GAUNTLET COMPLETE ---")
    return EvaluationResult(rigor=rigor, compliance=compliance, ethics=ethics, feasibility=feasibility, simplicity=simplicity)

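Running the evaluation on the baseline output, the example results show a clearly low score on the Feasibility dimension (0.39), which gives the outer-loop AI Director a clear direction for improvement.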
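
Before the report goes to the LLM-based Diagnostician, a quick programmatic check can flag the weakest dimension of the 5D vector. The sketch below assumes the scores have already been pulled into a plain dictionary. Only the 0.39 feasibility value comes from the example above; the other numbers are placeholders, and weakest_dimension is a hypothetical helper.

```python
from typing import Dict, Tuple

def weakest_dimension(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (dimension, score) pair with the lowest score."""
    name = min(scores, key=scores.get)
    return name, scores[name]

# Placeholder vector: only 'feasibility' reflects the value quoted in the text.
baseline_scores = {
    "rigor": 0.9,
    "compliance": 0.9,
    "ethics": 0.9,
    "feasibility": 0.39,
    "simplicity": 0.9,
}
print(weakest_dimension(baseline_scores))
```

A check like this does not replace the root-cause analysis, but it gives a cheap sanity test for whatever primary weakness the Diagnostician reports.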

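The Outer Loop of the Evolution Engine

Now we build the system's "brain": the AI Research Director, which forms the outer evolution loop. Its job is not to design trials but to improve the process that designs them: it analyzes the 5D scores, diagnoses root causes, and rewrites the GuildSOP accordingly. This is the core of the system's learning and adaptation.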


Outer Loop (Created by Fareed Khan)

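Plan for this section:

  • Create a gene pool: track evolved SOP versions and their scores as a traceable lineage.
  • Design the Director-level agents: the Performance Diagnostician identifies weaknesses; the SOP Architect proposes improved variants.
  • Architect the evolution loop: define one full generation as Diagnose → Evolve → Evaluate.
  • Run one full cycle: show how the system autonomously finds the feasibility weakness and produces new SOP variants to fix it.

Managing Configurations

Define SOPGenePool, which stores each SOP, its scores, and its parent version: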

from typing import Any, Dict, List, Optional

class SOPGenePool:
    """Stores every SOP version together with its evaluation and parent version."""
    def __init__(self):
        self.pool: List[Dict[str, Any]] = []
        self.version_counter = 0

    def add(self, sop: GuildSOP, eval_result: EvaluationResult, parent_version: Optional[int] = None):
        self.version_counter += 1
        entry = {
            "version": self.version_counter,
            "sop": sop,
            "evaluation": eval_result,
            "parent": parent_version
        }
        self.pool.append(entry)
        print(f"Added SOP v{self.version_counter} to the gene pool.")
        
    def get_latest_entry(self) -> Optional[Dict[str, Any]]:
        # Returns the most recently added entry, or None if the pool is empty.
        return self.pool[-1] if self.pool else None
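
Because every entry stores its parent version, the pool doubles as a lineage record. A minimal sketch of walking that chain back to the baseline is shown below. It works on a list of dictionaries shaped like SOPGenePool.pool; trace_lineage and the toy data are hypothetical and not part of the original class.

```python
from typing import Any, Dict, List, Optional

def trace_lineage(pool: List[Dict[str, Any]], version: int) -> List[int]:
    """Follow 'parent' pointers from the given version back to the root."""
    by_version = {entry["version"]: entry for entry in pool}
    lineage: List[int] = []
    current: Optional[int] = version
    while current is not None:
        lineage.append(current)
        current = by_version[current]["parent"]
    return lineage

# Toy pool: v1 is the baseline, v2 and v3 mutate v1, v4 mutates v2.
toy_pool = [
    {"version": 1, "parent": None},
    {"version": 2, "parent": 1},
    {"version": 3, "parent": 1},
    {"version": 4, "parent": 2},
]
print(trace_lineage(toy_pool, 4))  # [4, 2, 1]
```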

Building the Director-Level Agents

First, the Performance Diagnostician, which analyzes the 5D vector and returns a structured diagnosis:

from typing import Literal

class Diagnosis(BaseModel):
    primary_weakness: Literal['rigor', 'compliance', 'ethics', 'feasibility', 'simplicity']
    root_cause_analysis: str = Field(...)
    recommendation: str = Field(...)

def performance_diagnostician(eval_result: EvaluationResult) -> Diagnosis:
    print("--- EXECUTING PERFORMANCE DIAGNOSTICIAN ---")
    diagnostician_llm = llm_config['director'].with_structured_output(Diagnosis)
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a world-class management consultant ..."),
        ("human", "Please analyze the following performance evaluation report:\n\n{report}")
    ])
    chain = prompt | diagnostician_llm
    return chain.invoke({"report": eval_result.json()})

Next, the SOP Architect, which takes the diagnosis and the current SOP and generates several variant SOPs as candidates:

class EvolvedSOPs(BaseModel):
    mutations: List[GuildSOP]

def sop_architect(diagnosis: Diagnosis, current_sop: GuildSOP) -> EvolvedSOPs:
    print("--- EXECUTING SOP ARCHITECT ---")
    architect_llm = llm_config['director'].with_structured_output(EvolvedSOPs)
    prompt = ChatPromptTemplate.from_messages([
        ("system", f"You are an AI process architect. ... schema: {GuildSOP.schema_json()} ..."),
        ("human", "Here is the current SOP:\n{current_sop}\n\nHere is the performance diagnosis:\n{diagnosis}\n\nBased on the diagnosis, please generate 2-3 new, improved SOPs.")
    ])
    chain = prompt | architect_llm
    return chain.invoke({"current_sop": current_sop.json(), "diagnosis": diagnosis.json()})

Running the Full Evolution Loop

Wrap one complete evolution cycle in a function:

def run_evolution_cycle(gene_pool: SOPGenePool, trial_request: str):
    print("\n" + "="*25 + " STARTING NEW EVOLUTION CYCLE " + "="*25)
    
    current_best_entry = gene_pool.get_latest_entry()
    parent_sop = current_best_entry['sop']
    parent_eval = current_best_entry['evaluation']
    parent_version = current_best_entry['version']
    print(f"Improving upon SOP v{parent_version}...")
    
    diagnosis = performance_diagnostician(parent_eval)
    print(f"Diagnosis complete. Primary Weakness: '{diagnosis.primary_weakness}'. Recommendation: {diagnosis.recommendation}")

    new_sop_candidates = sop_architect(diagnosis, parent_sop)
    print(f"Generated {len(new_sop_candidates.mutations)} new SOP candidates.")
    for i, candidate_sop in enumerate(new_sop_candidates.mutations):
        print(f"\n--- Testing SOP candidate {i+1}/{len(new_sop_candidates.mutations)} ---")
        guild_input = {"initial_request": trial_request, "sop": candidate_sop}
        final_state = guild_graph.invoke(guild_input)
        
        eval_result = run_full_evaluation(final_state)
        gene_pool.add(sop=candidate_sop, eval_result=eval_result, parent_version=parent_version)
    print("\n" + "="*25 + " EVOLUTION CYCLE COMPLETE " + "="*26)

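We initialize the gene pool, add the baseline, and run one round of evolution. The example output shows the Diagnostician naming Feasibility as the primary weakness and the Architect generating two candidate SOPs. After testing, one candidate (v2) raises Feasibility substantially (to 0.81 in the example), trading a small loss in Rigor for a large gain in practical feasibility; the other candidate (v3) brings no improvement.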
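
To compare the generations side by side, the gene pool can be flattened into one row per SOP version. The sketch below assumes each entry's evaluation has already been reduced to a dictionary of scores. Only the feasibility values 0.39 and 0.81 come from the text; every other number is a placeholder, and summarize_pool is a hypothetical helper.

```python
from typing import Any, Dict, List

DIMENSIONS = ["rigor", "compliance", "ethics", "feasibility", "simplicity"]

def summarize_pool(pool: List[Dict[str, Any]]) -> str:
    """Render one line per SOP version with its parent and five scores."""
    header = f"{'SOP':<5}{'parent':<8}" + "".join(f"{d:<13}" for d in DIMENSIONS)
    rows = [header]
    for entry in pool:
        label = f"v{entry['version']}"
        parent = "-" if entry["parent"] is None else f"v{entry['parent']}"
        cells = "".join(f"{entry['scores'][d]:<13.2f}" for d in DIMENSIONS)
        rows.append(f"{label:<5}{parent:<8}{cells}")
    return "\n".join(rows)

# Placeholder scores except feasibility for v1 (0.39) and v2 (0.81).
toy_pool = [
    {"version": 1, "parent": None,
     "scores": {"rigor": 0.90, "compliance": 0.90, "ethics": 0.90, "feasibility": 0.39, "simplicity": 1.00}},
    {"version": 2, "parent": 1,
     "scores": {"rigor": 0.85, "compliance": 0.90, "ethics": 0.90, "feasibility": 0.81, "simplicity": 1.00}},
    {"version": 3, "parent": 1,
     "scores": {"rigor": 0.85, "compliance": 0.90, "ethics": 0.90, "feasibility": 0.39, "simplicity": 1.00}},
]
print(summarize_pool(toy_pool))
```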

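Five-Dimensional Pareto Analysis

The evolution loop has completed one generation. We now need a multi-objective analysis of the results. Multi-objective problems rarely have a single "best" solution; instead there is a Pareto frontier of non-dominated trade-offs. The goal is to identify this frontier and present it to a human decision-maker.

Plan for this section:

  • Analyze the gene pool: print a summary of every SOP and its 5D scores to see the direct effect of each variant.
  • Identify the Pareto front: write a function that programmatically finds the non-dominated solutions in the gene pool.
  • Visualize the front: use a parallel coordinates plot to show the trade-offs across the five dimensions at a glance.

Printing the summary is omitted here. Next, identify the Pareto front: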

import numpy as np

def identify_pareto_front(gene_pool: SOPGenePool) -> List[Dict[str, Any]]:
    """Returns the gene pool entries that no other entry dominates."""
    pareto_front = []
    pool_entries = gene_pool.pool
    
    for i, candidate in enumerate(pool_entries):
        is_dominated = False
        cand_scores = np.array([s['score'] for s in candidate['evaluation'].dict().values()])
        
        for j, other in enumerate(pool_entries):
            if i == j:
                continue
            other_scores = np.array([s['score'] for s in other['evaluation'].dict().values()])
            # 'other' dominates if it is at least as good everywhere and strictly better somewhere.
            if np.all(other_scores >= cand_scores) and np.any(other_scores > cand_scores):
                is_dominated = True
                break
        if not is_dominated:
            pareto_front.append(candidate)
    return pareto_front

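Running this typically yields a Pareto front made up of v1 and v2: v1 is the "maximize Rigor" strategy and v2 the "high Feasibility" strategy. Which one to choose in practice depends on business priorities.

Identifying the Pareto Front

We visualize the front with a 2D scatter plot (Rigor vs. Feasibility) and a 5D parallel coordinates plot: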
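
The dominance test is easy to verify by hand on a small example. The pure-Python sketch below applies the same rule as identify_pareto_front (another entry is at least as good on every objective and strictly better on at least one) to three made-up score vectors. The numbers are illustrative, not results from the run, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(candidates: Dict[str, Sequence[float]]) -> List[str]:
    """Names of candidates that no other candidate dominates."""
    return [
        name for name, scores in candidates.items()
        if not any(dominates(other, scores)
                   for other_name, other in candidates.items() if other_name != name)
    ]

# Order: rigor, compliance, ethics, feasibility, simplicity (illustrative values).
candidates = {
    "v1": [0.90, 0.90, 0.90, 0.40, 1.00],
    "v2": [0.85, 0.90, 0.90, 0.80, 1.00],
    "v3": [0.85, 0.90, 0.90, 0.40, 1.00],
}
print(pareto_front(candidates))  # ['v1', 'v2']
```

Here v3 drops out because v1 matches it on four dimensions and beats it on rigor, while v1 and v2 each win on a different dimension and so neither dominates the other.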

import matplotlib.pyplot as plt
import pandas as pd

def visualize_frontier(pareto_sops):
    if not pareto_sops:
        print("No SOPs on the Pareto front to visualize.")
        return
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 7))
    
    labels = [f"v{s['version']}" for s in pareto_sops]
    rigor_scores = [s['evaluation'].rigor.score for s in pareto_sops]
    feasibility_scores = [s['evaluation'].feasibility.score for s in pareto_sops]
    
    ax1.scatter(rigor_scores, feasibility_scores, s=200, alpha=0.7, c='blue')
    for i, txt in enumerate(labels):
        ax1.annotate(txt, (rigor_scores[i], feasibility_scores[i]), xytext=(10,-10), textcoords='offset points', fontsize=14)
    ax1.set_title('Pareto Frontier: Rigor vs. Feasibility', fontsize=16)
    ax1.set_xlabel('Scientific Rigor Score', fontsize=14)
    ax1.set_ylabel('Recruitment Feasibility Score', fontsize=14)
    ax1.grid(True, linestyle='--', alpha=0.6)
    ax1.set_xlim(min(rigor_scores)-0.05, max(rigor_scores)+0.05)
    ax1.set_ylim(min(feasibility_scores)-0.1, max(feasibility_scores)+0.1)

    data = []
    for s in pareto_sops:
        eval_dict = s['evaluation'].dict()
        scores = {k.capitalize(): v['score'] for k, v in eval_dict.items()}
        scores['SOP Version'] = f"v{s['version']}"
        data.append(scores)
    
    df = pd.DataFrame(data)
    pd.plotting.parallel_coordinates(df, 'SOP Version', colormap=plt.get_cmap("viridis"), ax=ax2, axvlines_kwds={"linewidth": 1, "color": "grey"})
    ax2.set_title('5D Performance Trade-offs on Pareto Front', fontsize=16)
    ax2.grid(True, which='major', axis='y', linestyle='--', alpha=0.6)
    ax2.set_ylabel('Normalized Score', fontsize=14)
    ax2.legend(loc='lower center', bbox_to_anchor=(0.5, -0.15), ncol=len(labels))
    plt.tight_layout()
    plt.show()

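The rendered plots make the differences between v1 and v2 easy to see: the two are nearly identical on Compliance, Ethics, and Simplicity, and trade off clearly only on Rigor and Feasibility, producing the typical crossing-lines pattern.

Visualizing the Frontier and Making a Decision

We have seen at the macro level (evolution, Pareto front) how the system improves itself. Now we look at the micro level to understand what happens inside a single high-performing run: how do the agents collaborate, where are the bottlenecks, and how do the multi-dimensional scores translate into a visual profile?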


Understand the Workflow (Created by Fareed Khan)

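Plan:

  • Instrument the workflow: record the exact start, end, and duration of each agent.
  • Visualize the execution timeline: draw the workflow as a Gantt chart showing parallel and serial stages.
  • Use a radar chart to compare the 5D performance profiles of the baseline and evolved SOPs.

Understanding the Cognitive Workflow

Use the graph's .stream() method to receive events node by node and record timestamps: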

import time
from collections import defaultdict


def invoke_with_timing(graph, sop, request):
    """Invokes the Guild graph while capturing start and end times for each node."""
    print(f"--- Instrumenting Graph Run for SOP: {sop.dict()} ---")
    
    timing_data = []
    graph_input = {"initial_request": request, "sop": sop}
    final_state = dict(graph_input)
    
    # With stream_mode="updates", each event maps a node name to the state update it returned.
    # The nodes run one after another, so a node is assumed to start when the previous event arrived.
    last_time = time.time()
    for event in graph.stream(graph_input, stream_mode="updates"):
        end_time = time.time()
        for node_name, update in event.items():
            timing_data.append({
                "node": node_name,
                "start_time": last_time,
                "end_time": end_time,
                "duration": end_time - last_time
            })
            final_state.update(update or {})
        last_time = end_time
    
    # Shift all timestamps so the run starts at t = 0.
    overall_start_time = min(d['start_time'] for d in timing_data)
    for data in timing_data:
        data['start_time'] -= overall_start_time
        data['end_time'] -= overall_start_time
        
    return final_state, timing_data

We run v2 and capture the timing data (the example output shows execute_specialists as the dominant stage, as expected).

Plot the Gantt chart:

import matplotlib.pyplot as plt

def plot_gantt_chart(timing_data: List[Dict[str, Any]], title: str):
    """Plots a Gantt chart of the agentic workflow from timing data."""
    fig, ax = plt.subplots(figsize=(12, 4))
    
    labels = [d['node'] for d in timing_data]
    ax.barh(labels, [d['duration'] for d in timing_data], left=[d['start_time'] for d in timing_data], color='skyblue')
    
    ax.set_xlabel('Time (seconds)')
    ax.set_title(title, fontsize=16)
    ax.grid(True, which='major', axis='x', linestyle='--', alpha=0.6)
    ax.invert_yaxis()
    plt.show()

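The Gantt chart clearly shows the serial top-level flow and the opportunity for parallelism inside it, suggesting that performance optimization should focus on the execute_specialists stage.

Profiling the Output with a Radar Chart

Compare the 5D profiles of baseline v1 and evolved v2 on a radar chart: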

import pandas as pd


def plot_radar_chart(eval_results: List[EvaluationResult], labels: List[str]):
    """Creates a radar chart to compare the 5D performance of multiple SOPs."""
    
    categories = ['Rigor', 'Compliance', 'Ethics', 'Feasibility', 'Simplicity']
    num_vars = len(categories)
    angles = np.linspace(0, 2 * np.pi, num_vars, endpoint=False).tolist()
    angles += angles[:1]
    fig, ax = plt.subplots(figsize=(8, 8), subplot_kw=dict(polar=True))
    for i, result in enumerate(eval_results):
        # .dict() returns nested dictionaries, so each score is read by key.
        values = [res['score'] for res in result.dict().values()]
        values += values[:1]
        ax.plot(angles, values, linewidth=2, linestyle='solid', label=labels[i])
        ax.fill(angles, values, alpha=0.25)

    ax.set_yticklabels([])
    ax.set_xticks(angles[:-1])
    ax.set_xticklabels(categories, fontsize=12)
    ax.set_title('5D Performance Profile Comparison', size=20, color='blue', y=1.1)
    plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    plt.show()

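The chart shows that both are strong on Compliance, Ethics, and Simplicity; v1 is slightly better on Rigor while v2 is markedly better on Feasibility, which makes the trade-off plain.

Turning It into an Autonomous Strategy

We have designed, built, and demonstrated a self-improving agentic system. It is more than a single solution; it is an extensible foundation built on layered agent design, dynamic SOPs, multi-dimensional evaluation, and automated evolution. These principles open up several directions for future work:

  1. Run the evolution loop continuously: we completed one generation here; running hundreds of generations could uncover a richer, more diverse Pareto frontier of battle-tested SOPs.
  2. Distill the Director's reasoning into a smaller policy model: train on the history of successful mutations and replace the 70B Director with a faster, cheaper specialized model to make evolution more efficient.
  3. Let the AI Director change the Guild's structure dynamically: learn to add or remove specialists (for example, adding a "Biostatistician") according to the needs of the trial concept, so that the team itself evolves.
  4. Replace the static MIMIC-III data with a live API: connect the Patient Cohort Analyst to a secure, real-time EHR system so that feasibility estimates rest on current patient data.
  5. Strengthen the SOP Architect's evolutionary operators: introduce mechanisms such as crossover that combine the strengths of different successful SOPs and speed up the discovery of new strategies.
  6. Incorporate human expert feedback: feed clinical scientists' ratings into the evaluation loop and use expert judgment as the final reward signal, steering the system toward solutions that are both technically optimal and practically excellent.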
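
As an illustration of the crossover idea in item 5, a uniform crossover can be sketched over SOPs represented as flat dictionaries: each field of the child is copied from one parent or the other at random. The three field names below are the ones that appear in this article's code (synthesizer_model, synthesizer_prompt, use_ethics_specialist); the values are invented, and the operator itself is an assumption rather than something the original system implements.

```python
import random
from typing import Any, Dict, Optional

def uniform_crossover(parent_a: Dict[str, Any], parent_b: Dict[str, Any],
                      rng: Optional[random.Random] = None) -> Dict[str, Any]:
    """Build a child SOP by picking each field from one of two parents with equal probability."""
    if parent_a.keys() != parent_b.keys():
        raise ValueError("Parents must share the same SOP fields.")
    rng = rng or random.Random()
    return {key: rng.choice([parent_a[key], parent_b[key]]) for key in parent_a}

# Invented values for illustration; real SOPs would carry more fields.
sop_a = {
    "synthesizer_model": "model-a",
    "synthesizer_prompt": "Draft strict criteria.",
    "use_ethics_specialist": True,
}
sop_b = {
    "synthesizer_model": "model-b",
    "synthesizer_prompt": "Draft criteria that favor recruitment.",
    "use_ethics_specialist": False,
}
child = uniform_crossover(sop_a, sop_b, rng=random.Random(0))
print(child)
```

In a real pipeline the child dictionary would presumably be validated by constructing a GuildSOP from it before it is tested, so that malformed combinations are rejected early.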

Original article: https://medium.com/gitconnected/building-a-self-improving-agentic-rag-system-f55003af44c4

Reposted from PyTorch研習社; author: AI研究生

Last modified 2025-11-24 00:11:22