How to Build a Question Answering System with a Knowledge Graph and an LLM

Published 2025-11-18 07:37


[Figure: knowledge graph built from the mock FAQ document]

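Today I'm presenting a simple Question Answering (QA) system built on a knowledge graph (constructed with the method from Part 1) and an LLM (Gemma3-4b-it-qat, the same model as before). I chose Gemma3-4b because it is small enough to run on an ordinary laptop and still follows instructions very well.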

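In this post we take an FAQ text about a fictional smartphone as the sample, generate a knowledge graph for it with the code from the previous article (same repository), and then build a system that answers questions about the product, like this: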

[Figure: some example questions and answers]

In this post we will:

Learn what a QA system is
Walk through our approach
Look at the code
Discuss limitations and possible improvements

What Is a QA System

Quoting Google's definition:

A question answering (QA) system is a software application that takes a user's question in natural language and provides a direct, relevant answer by processing the question's intent and retrieving information from a knowledge source or generating a new response.

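In this post, our "knowledge source" is a mock FAQ text that I generated with Gemma3; you can find it in this blog's GitHub repository. We can run the repository's main.py to build the KG and save it to the output directory: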

python main.py --inputpath ./input/sample-faq.txt --outlabel faq

This command saves the networkx graph to a file named "nx_graph.pkl", which we load later when building the QA system.

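Our Approach

The core idea: extract entities/keywords from the user's question, find all nodes and edges related to them, and pass that information to the LLM together with the question so that it answers based on what is in the KG. Formally, the approach is as follows.

Given a question q and a knowledge graph G built from any corpus: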

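1. Use the LLM to extract named entities (entity_keywords) and relations/predicates (relation_keywords) from q.
2. Enumerate all possible pairs from entity_keywords. This lets us query the graph later with source and target arguments, since we cannot assume in advance whether an entity is a source or a target.
3. For each pair (u, v) from step 2, find all paths between u and v in G. This surfaces every relation, path, and piece of knowledge connecting the two entities. Adding this step was a game-changer.
4. For each pair of adjacent nodes (source and target) on every path found, extract the relation between them, for example (box, include, charger).
5. Add the resulting "triple" to the list relations.
6. Handle relation_keywords similarly: for each relation r, find all edges labeled r, form triples, and add them to the same list relations.
7. Finally, pass these triples to the LLM together with the question q (wrapped in a prompt) and have it generate an answer from the given facts (triples) and the query.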
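To make steps 2 to 6 concrete before bringing in the LLM and networkx, here is a minimal, dependency-free sketch. It is not the code from the repository: the edge list is invented for illustration, and a small hand-written recursive search stands in for the graph library's path search. The names EDGES, simple_paths, and collect_triples are hypothetical.

```python
from itertools import combinations

# Toy edge list invented for illustration: (source, relation, target).
EDGES = [
    ("box", "include", "charger"),
    ("box", "include", "phone"),
    ("phone", "support", "nano sim"),
]


def simple_paths(edges, node, target, seen=()):
    """Yield each cycle-free path from node to target as a list of triples."""
    seen = seen + (node,)
    for u, rel, v in edges:
        if u != node or v in seen:
            continue  # edge does not start here, or would revisit a node
        if v == target:
            yield [(u, rel, v)]
        else:
            for rest in simple_paths(edges, v, target, seen):
                yield [(u, rel, v)] + rest


def collect_triples(edges, entity_keywords, relation_keywords):
    """Steps 2-6: gather triples for the extracted keywords."""
    triples = []
    for u, v in combinations(entity_keywords, 2):  # step 2: entity pairs
        for path in simple_paths(edges, u, v):     # step 3: paths u -> v
            triples.extend(path)                   # steps 4-5: keep each edge
    # Step 6: every edge whose relation label matches a relation keyword.
    triples.extend(e for e in edges if e[1] in relation_keywords)
    return triples


triples = collect_triples(EDGES, ["box", "charger"], ["support"])
print(triples)
```

Here the entity pair contributes the edge on the box-to-charger path, and the relation keyword contributes the edge labeled support; step 7 would then hand both triples to the LLM.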

Code

First, build the knowledge graph with main.py:

python main.py --inputpath ./input/sample-faq.txt --outlabel faq

Then load the graph from the pickle file generated in the previous step:

import pickle

# graph_file is the path to the "nx_graph.pkl" file written by main.py
with open(graph_file, "rb") as f:
    G = pickle.load(f)

We need a function that takes a text input and a system-level prompt and returns the LLM's response. The reusable function below does that:

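import ollama

model = "gemma3:4b-it-qat"  # the model tag used later in this post

def get_llm_response(text, system_prompt):
    # Send the system prompt and the user text in a single chat call.
    response = ollama.chat(model=model, messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text}
        ])
    resp_content = response['message']['content']
    return resp_content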

Next we need to extract entities and relations from the given query (step 1 above). I put together a basic system prompt, shown below:

system_prompt_key_words = """You are a helpful assistant, expert of English language who can extracts keyword from the given question in root form (e.g. ran becomes run) and lowercase.
The returned keywords should be critical to answer the question.
Categorize the keywords into entity and relation keywords.
keywords must be in root form and lowercase.
The response should be in following format, no additional text:
{"entity": [list of entity keywords], "relation": [list of relation keywords]}"""

import json

# The prompt asks for bare JSON, so the reply is parsed directly.
response = get_llm_response(query, system_prompt_key_words)
keyword_resp = json.loads(response)
entity_keywords = keyword_resp.get('entity', [])
relation_keywords = keyword_resp.get('relation', [])
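The snippet above relies on the model returning bare JSON. Small models sometimes wrap the JSON in a Markdown code fence or add stray text, in which case json.loads raises an error. The following is a hedged sketch of a more tolerant parser; parse_keywords is a hypothetical helper, not part of the repository, and the sample reply is made up.

```python
import json
import re


def parse_keywords(raw):
    """Parse the model reply into (entity_keywords, relation_keywords).

    Tolerates a code fence or extra text around the JSON object and falls
    back to two empty lists when no valid JSON object is found.
    """
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return [], []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return [], []
    entities = [str(k).strip().lower() for k in data.get("entity", [])]
    relations = [str(k).strip().lower() for k in data.get("relation", [])]
    return entities, relations


# Made-up reply that wraps the JSON in a Markdown code fence.
fence = "`" * 3
reply = fence + 'json\n{"entity": ["Box", "charger"], "relation": ["include"]}\n' + fence
print(parse_keywords(reply))
```

Returning empty lists on failure means the graph search simply finds nothing for that query, which may be preferable to crashing in an interactive session.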

Suppose entity_keywords is [box, charger, phone]. We need every possible source-target pair so that we can query the graph thoroughly:

from itertools import combinations
pairs = list(combinations(entities, 2))
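One detail worth checking: combinations yields each unordered pair once, in list order. If the graph is directed (an assumption here; the post does not state the graph type), a path search from u to v does not cover v to u, so permutations would be one way to try both directions. A small comparison under that assumption:

```python
from itertools import combinations, permutations

entities = ["box", "charger", "phone"]

# Each unordered pair once, following the order of the list.
unordered = list(combinations(entities, 2))
# Both directions of every pair.
ordered = list(permutations(entities, 2))

print(len(unordered), unordered)
print(len(ordered), ordered)
```

The trade-off is twice as many path searches, which matters little for a handful of keywords.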

For each entity pair, we look up all the nodes and edges connecting them in the graph:

import networkx as nx
paths = list(nx.all_simple_paths(G, source=u, target=target_nodes))
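As a quick illustration of what this call returns, here is a sketch on a toy graph invented for this example. It assumes a MultiDiGraph, which is a guess based on how the edges are indexed later in the post. The membership check and the cutoff argument are additions of this sketch, not part of the original code: all_simple_paths raises NodeNotFound when the source is not in the graph, and cutoff bounds the path length.

```python
import networkx as nx

# Toy graph invented for illustration.
G = nx.MultiDiGraph()
G.add_edge("box", "charger", relation="include")
G.add_edge("box", "phone", relation="include")
G.add_edge("phone", "charger", relation="use")

u, target_nodes = "box", ["charger"]

paths = []
# Guard against keywords that do not exist as nodes in the graph.
if u in G:
    # target may be a single node or an iterable of nodes;
    # cutoff limits how many edges a path may contain.
    paths = list(nx.all_simple_paths(G, source=u, target=target_nodes, cutoff=3))

for path in paths:
    print(path)
```

Each path is a list of node names, so the relation labels still have to be read from the edges between consecutive nodes.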

All of the steps above (steps 1-6, including the code so far) are implemented in the following function:

def search_kg2(G, query):
    # Step 1: ask the LLM for entity and relation keywords.
    response = get_llm_response(query, system_prompt_key_words)
    keyword_resp = json.loads(response)
    entity_keywords = keyword_resp.get('entity', [])
    relation_keywords = keyword_resp.get('relation', [])
    entities = [part.strip() for part in entity_keywords]
    # Step 2: enumerate entity pairs.
    pairs = list(combinations(entities, 2))
    relations = []
    for u, v in pairs:
        # get_nodes is a helper that is not shown in this post.
        target_nodes = get_nodes(G, v)
        # Step 3: all simple paths from u to any of the target nodes.
        paths = list(nx.all_simple_paths(G, source=u, target=target_nodes))
        for path in paths:
            # Steps 4-5: one triple per edge between consecutive nodes.
            for i in range(len(path)-1):
                for key in G[path[i]][path[i+1]]:
                    rel = G[path[i]][path[i+1]][key]['relation']
                    relations.append((path[i], rel, path[i+1]))

    # Step 6: add every edge whose relation label equals a relation keyword.
    for rel_keyword in relation_keywords:
        relations.extend([(u, rel, v) for u, v, rel in G.edges.data("relation") if str(rel) == rel_keyword])

    return relations

Once the function above has returned all the edges as triples (entity->relation->entity), we embed those triples in an instruction-style prompt and pass it to the LLM:

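# triples are the (source, relation, target) tuples returned by search_kg2
triples = search_kg2(G, query)
context = f"""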
    You are given facts from a knowledge graph:

    {triples}

    Answer the user query based ONLY on these facts.
    Answer in full sentence.
    Query: {query}
    """
response = ollama.chat(model="gemma3:4b-it-qat",
             messages=[{"role": "user", "content": context}])
print(f'query: {query}\nAnswer:{response["message"]["content"]}')

It returns answers like these:

[Figure: answers returned by the system]

As you can see, for questions where the relevant data/facts are missing, the LLM sensibly declines to answer.
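The list returned by search_kg2 can contain the same triple more than once, for instance when several paths share an edge or when a relation keyword matches an edge that a path already covered. A minimal sketch of one way to deduplicate the triples and render them one fact per line before building the prompt follows; build_context is a hypothetical helper, and the prompt wording is copied from the snippet above.

```python
def build_context(triples, query):
    """Render unique triples as one fact per line and wrap them in the prompt."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(triples))
    facts = "\n".join(f"({s}, {r}, {t})" for s, r, t in unique)
    return (
        "You are given facts from a knowledge graph:\n\n"
        f"{facts}\n\n"
        "Answer the user query based ONLY on these facts.\n"
        "Answer in full sentence.\n"
        f"Query: {query}"
    )


# Made-up input with a repeated triple.
sample = [("box", "include", "charger"), ("box", "include", "charger")]
context = build_context(sample, "Does the box include a charger?")
print(context)
```

A shorter, duplicate-free fact list keeps the prompt small, which helps when the model has a limited context window.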

All the code for this post is in the file https://github.com/nayash/knowledge-graph-demo/blob/master/qa-from-kg.ipynb.

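Limitations

As shown above, we built a basic QA system with very little effort, because the LLM takes care of much of the work, such as text preprocessing and extraction. But it is not perfect yet. My preliminary evaluation turned up at least a few problems.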

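The system cannot answer "what is the warranty period?" because in the graph warranty is a relation label, whereas the keyword extraction pulls it out of the question as a named entity, so the system finds no edge. The core system prompt we use to build the knowledge graph therefore still needs tuning.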
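Besides tuning the prompt, a retrieval-side workaround might be to match every extracted keyword against node names and relation labels alike, so a keyword that lands in the wrong category still finds its edge. This is only a sketch of that idea, not something the post implements; the edge list (including the "1 year" value) is invented and match_any_role is a hypothetical helper.

```python
# Toy triples invented for illustration; "warranty" appears only as a relation.
EDGES = [
    ("phone", "warranty", "1 year"),
    ("box", "include", "charger"),
]


def match_any_role(edges, keywords):
    """Return triples where a keyword equals the source, relation, or target."""
    wanted = {k.strip().lower() for k in keywords}
    return [e for e in edges if wanted & {part.lower() for part in e}]


# "warranty" was extracted as an entity, yet it still matches the relation.
hits = match_any_role(EDGES, ["warranty"])
print(hits)
```

With the networkx graph, the triples could be taken from G.edges(data="relation"); the cost of this looser matching is that more loosely related facts may end up in the prompt.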

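Some other questions have to be rephrased slightly before the system can answer them. I found that these cases always trace back to either how the KG was built or which keywords were extracted from the query, and both can be fixed by improving the prompts. For example, one edge in the graph I used is: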

phone → support_dual_sim → nano sim

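This is clearly not ideal, but it can be corrected by designing the KG-construction prompt more carefully. As I mentioned in the previous article, I originally generated the prompt with ChatGPT and modified it only slightly. In a real production setting, more time should go into refining the prompt. For enterprise applications, a larger model is also worth trying, since resource constraints are no longer an issue.

Overall, this approach is promising and can be combined with RAG to improve answer quality. Perhaps in the next post I will refine this system with more carefully crafted prompts and RAG.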

Reposted from AI大模型觀察站; author: AI研究生


Last modified 2025-11-18 09:41:14
