Multi-Document Agentic RAG Workflow

Published: 2024-05-31 07:23:23

Author: 知觉之门


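Introduction

Large language models (LLMs) have changed how we extract insights from large volumes of text. In financial analysis, LLM applications are being built to help analysts answer complex questions about company performance, earnings reports, and market trends.

One such application uses a retrieval-augmented generation (RAG) pipeline to extract information from financial statements and other sources.

Suppose a financial analyst wants to understand the key takeaways from a company's second-quarter earnings call, with a particular focus on the technological moats the company is building. A question like this goes beyond simple lookup and requires a more sophisticated approach. This is where LLM agents come in.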

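What is an agent?

According to LlamaIndex, an "agent" is an automated reasoning and decision engine. It takes in a user input or query and can make internal decisions about how to execute that query in order to return the correct result. The key components of an agent can include, but are not limited to:

Breaking down a complex question into smaller ones
Choosing an external tool to use and coming up with the parameters for calling it
Planning out a set of tasks
Storing previously completed tasks in a memory module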

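An LLM agent is a system that combines techniques such as planning, tailored focus, memory utilization, and the use of different tools to answer complex questions.

Here is how an LLM agent could be developed to answer the question above:

Planning: The agent first needs to understand the nature of the question and create a plan for extracting the relevant information. This involves identifying key terms such as "second-quarter earnings call" and "technological moats" and determining the best sources for that information.
Tailored focus: The agent then concentrates on the specific aspect of the question that relates to technological moats. This involves filtering out irrelevant information and focusing on the details that best fit the analyst's query.
Memory: The agent uses its memory to recall relevant information from past earnings calls, company reports, and other sources. This provides background that supports its analysis.
Using different tools: The agent uses a range of tools and techniques to extract and analyze information. This might include natural language processing (NLP) algorithms, sentiment analysis, and topic modeling to gain deeper insight into the earnings call.
Breaking down the complex question: Finally, the agent breaks the complex question into simpler sub-parts, which makes it easier to extract the relevant information and give a coherent answer.

Source: General components of an agent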

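Tool calling

In standard RAG, the LLM is used mainly to synthesize information.

Tool calling, by contrast, adds a query-understanding layer on top of the RAG pipeline, so users can ask complex queries and get more precise results. The LLM works out how to use the vector database rather than just consuming its output.

Tool calling lets the LLM interact with its external environment through a dynamic interface: the model not only selects the appropriate tool but also infers the arguments needed to execute it. As a result, it can understand requests better and generate better responses than standard RAG.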
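
As a rough illustration of "select a tool and infer its arguments", the sketch below dispatches a tool call expressed as JSON. The tool names, their bodies, and the hand-written `model_output` string are all hypothetical; a real function-calling model would produce the JSON itself, and LlamaIndex handles this dispatch internally.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tools; names and bodies are illustrative only.
def search_pages(query: str, page_numbers: list) -> str:
    return f"searched {query!r} on pages {page_numbers}"

def summarize(topic: str) -> str:
    return f"summary of {topic!r}"

# Registry the dispatcher looks tools up in.
TOOLS: Dict[str, Callable[..., Any]] = {
    "search_pages": search_pages,
    "summarize": summarize,
}

def dispatch(tool_call_json: str) -> Any:
    """Run the tool named in a model-produced tool call."""
    call = json.loads(tool_call_json)
    fn = TOOLS[call["name"]]          # the tool the model selected
    return fn(**call["arguments"])    # the arguments the model inferred

# A tool call as a function-calling model might emit it (hand-written here).
model_output = '{"name": "search_pages", "arguments": {"query": "results", "page_numbers": ["2"]}}'
result = dispatch(model_output)
print(result)
```

The point of the sketch is the division of labor: the model decides which entry of the registry to use and with what arguments, and ordinary code executes the call.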

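Agent reasoning loop

What if the user asks a complex question made up of several steps, or a vague question that needs clarification? This is where the agent reasoning loop comes in. Instead of calling a tool in a single shot, the agent reasons over its tools across multiple steps.

Source: LlamaIndex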
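
The multi-step behavior described above can be sketched as a plain loop. This is a toy under stated assumptions: `fake_policy` stands in for the LLM's decision at each step, and the two lookup tools and their return values are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools keyed by name; the values they return are made up.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup_revenue": lambda quarter: f"revenue for {quarter}: 10",
    "lookup_costs": lambda quarter: f"costs for {quarter}: 4",
}

def fake_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: pick the next action from what is known so far."""
    if len(observations) == 0:
        return ("lookup_revenue", "Q2")
    if len(observations) == 1:
        return ("lookup_costs", "Q2")
    return ("finish", "; ".join(observations))

def run_agent(question: str, max_steps: int = 5) -> str:
    """Call tools step by step until the policy decides to finish."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = fake_policy(question, observations)
        if action == "finish":
            return argument
        # Execute the chosen tool and feed its output into the next decision.
        observations.append(TOOLS[action](argument))
    return "stopped: step limit reached"

answer = run_agent("What was the Q2 margin?")
print(answer)
```

The step limit is a design choice worth keeping in any real loop: it bounds cost and latency if the model never decides it is done.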

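Agent architecture

In LlamaIndex, an agent consists of two components:

AgentRunner
AgentWorker

AgentRunner objects interact with AgentWorkers.

AgentRunners are orchestrators. They:

Store state
Store conversational memory
Create tasks
Maintain tasks
Run the steps of each task
Expose the high-level, user-facing interface

AgentWorkers are responsible for:

Selecting and using tools
Choosing which LLM to use with the tools

Source: LlamaIndex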
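
To make the split of responsibilities concrete, here is a toy runner and worker. `ToyTask`, `ToyWorker`, and `ToyRunner` are hypothetical classes written for this sketch; they are not LlamaIndex's `AgentRunner` or `AgentWorker` and only mirror the division described above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ToyTask:
    query: str
    steps_done: List[str] = field(default_factory=list)

class ToyWorker:
    """Executes one step of a task at a time (stand-in for tool and LLM use)."""
    def run_step(self, task: ToyTask) -> Tuple[str, bool]:
        step_no = len(task.steps_done) + 1
        output = f"step {step_no} for {task.query!r}"
        is_last = step_no >= 2  # this toy worker always needs two steps
        return output, is_last

class ToyRunner:
    """Owns state and memory, creates tasks, and drives the worker."""
    def __init__(self, worker: ToyWorker) -> None:
        self.worker = worker
        self.memory: List[str] = []  # conversation history across calls

    def chat(self, query: str) -> str:
        task = ToyTask(query)
        output, done = "", False
        while not done:
            output, done = self.worker.run_step(task)
            task.steps_done.append(output)
        # Record the exchange so later calls could draw on it.
        self.memory.extend([f"user: {query}", f"assistant: {output}"])
        return output

runner = ToyRunner(ToyWorker())
reply = runner.chat("compare the papers")
print(reply)
```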

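Calling the agent's query method queries the agent in a one-off manner and does not preserve state. That is where memory comes in: it maintains the conversation history. The agent keeps the chat history in a conversational memory buffer. By default, the memory buffer is a flat list of items, kept as a rolling buffer whose size depends on the LLM's context window. So when the agent decides to use a tool, it draws on the previous conversation history as well as the current chat to carry out the next set of actions.

Here we build a multi-document agent that works across several documents. This walkthrough implements agentic RAG over a small set of arXiv papers (the download step below fetches four), and the same approach extends to more documents.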
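
A rolling buffer of this kind can be sketched in a few lines. The `RollingBuffer` class and the one-token-per-word counter are assumptions made for illustration; LlamaIndex's own memory classes and tokenizers differ.

```python
from typing import List

def count_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())

class RollingBuffer:
    """Keeps every message but only returns the newest ones that fit the limit."""
    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages: List[str] = []

    def put(self, message: str) -> None:
        self.messages.append(message)

    def get(self) -> List[str]:
        kept: List[str] = []
        used = 0
        # Walk backwards from the newest message until the budget is spent.
        for message in reversed(self.messages):
            cost = count_tokens(message)
            if used + cost > self.token_limit:
                break
            kept.append(message)
            used += cost
        return list(reversed(kept))

buffer = RollingBuffer(token_limit=6)
buffer.put("user: summarize page two")
buffer.put("assistant: done")
buffer.put("user: now compare both papers")
window = buffer.get()
print(window)
```

With a limit of 6 toy tokens, only the latest message fits, so older turns silently drop out of what the model sees.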

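Technology stack

LlamaIndex: LlamaIndex is a data framework for context-augmented LLM applications.
Mistral API: Developers can interact with Mistral models through its API, which is similar to OpenAI's API.

Mistral Large brings new capabilities and strengths:

It is natively fluent in English, French, Spanish, German, and Italian, with a nuanced understanding of grammar and cultural context.
Its 32K-token context window allows precise information recall from large documents.
Its precise instruction following lets developers design their own moderation policies; Mistral used it to set up the system-level moderation of le Chat.
It is natively capable of function calling.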

Code implementation

The code was implemented in Google Colab.

Install the required dependencies:

 
%%writefile requirements.txt
llama-index
llama-index-llms-huggingface
llama-index-embeddings-fastembed
fastembed
Unstructured[md]
chromadb
llama-index-vector-stores-chroma
llama-index-llms-groq
einops
accelerate
sentence-transformers
llama-index-llms-mistralai
llama-index-llms-openai

 
!pip install -r requirements.txt

Download the documents to process:

 
!mkdir data

! wget "https://arxiv.org/pdf/1810.04805.pdf" -O ./data/BERT_arxiv.pdf
! wget "https://arxiv.org/pdf/2005.11401" -O ./data/RAG_arxiv.pdf
! wget "https://arxiv.org/pdf/2310.11511" -O ./data/self_rag_arxiv.pdf
! wget "https://arxiv.org/pdf/2401.15884" -O ./data/crag_arxiv.pdf

Import the required dependencies:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from typing import List, Optional

# Allow nested event loops so async query engines can run inside the notebook.
import nest_asyncio
nest_asyncio.apply()

Read the document:

 
documents = SimpleDirectoryReader(input_files = ['./data/self_rag_arxiv.pdf']).load_data()
print(len(documents))
print(f"Document Meta{documents[0].metadata}")

Split the document into chunks (nodes):

 
splitter = SentenceSplitter(chunk_size=1024,chunk_overlap=100)
nodes = splitter.get_nodes_from_documents(documents)
print(f"Length of nodes : {len(nodes)}")
print(f"get the content for node 0 :{nodes[0].get_content(metadata_mode='all')}")

Output:

 
Length of nodes : 43
get the content for node 0 :page_label: 1
file_name: self_rag_arxiv.pdf
file_path: data/self_rag_arxiv.pdf
file_type: application/pdf
file_size: 1405127
creation_date: 2024-05-11
last_modified_date: 2023-10-19

Preprint.
SELF-RAG: LEARNING TO RETRIEVE , GENERATE ,AND
CRITIQUE THROUGH SELF-REFLECTION
... (truncated)
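
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified word-based splitter. It is a stand-in written for this sketch, not `SentenceSplitter`'s actual algorithm, which works on tokens and tries to respect sentence boundaries.

```python
from typing import List

def split_with_overlap(words: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split a word list into windows of chunk_size that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks

words = [f"w{i}" for i in range(10)]
chunks = split_with_overlap(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last word of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.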

Instantiate the vector store

 
import chromadb
db = chromadb.PersistentClient(path="./chroma_db_mistral")
chroma_collection = db.get_or_create_collection("multidocument-agent")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Instantiate the embedding model

 
from llama_index.embeddings.fastembed import FastEmbedEmbedding
from llama_index.core import Settings

embed_model = FastEmbedEmbedding(model_name="BAAI/bge-small-en-v1.5")

Settings.embed_model = embed_model

Settings.chunk_size = 1024

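Instantiate the LLM

import os
from google.colab import userdata  # Colab secrets helper; only available inside Google Colab
from llama_index.llms.mistralai import MistralAI

# Read the Mistral API key from Colab's secret store.
os.environ["MISTRAL_API_KEY"] = userdata.get("MISTRAL_API_KEY")
llm = MistralAI(model="mistral-large-latest")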


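Brief overview:

The code so far shows how to split text into chunks, set up a vector database to hold them, and instantiate the embedding model and the LLM. The sample output above shows the first chunk of the Self-RAG paper, which presents a framework for improving the quality and factual accuracy of LLM generation.

Note that parts of the code need adapting to your environment: for example, supply your own MISTRAL_API_KEY and adjust the other parameters to your needs.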

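Instantiate the vector query tool and summary tool for a specific document

LlamaIndex data agents process natural-language input to perform actions rather than only generate responses. The key to building effective data agents lies in tool abstractions. But what exactly is a tool in this context? Think of a tool as an API interface designed for an agent to interact with, rather than for a human.

Core concepts:

**Tool:** In essence, a tool consists of a generic interface plus basic metadata such as a name, a description, and a function schema.
**Tool Spec:** Describes the details of an API, providing a comprehensive specification of a service's API that can be converted into a range of tools.

Several types of tools are available:

**FunctionTool:** Converts any user-defined function into a tool and can infer the function's schema.
**QueryEngineTool:** Wraps an existing query engine. Because the agent abstractions inherit from BaseQueryEngine, this tool can also wrap an agent.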
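
Inferring a schema from a function can be illustrated with the standard library alone. The `describe_function` helper below is hypothetical and much simpler than what `FunctionTool` does internally; it only shows that a name, a description, and parameter information are all recoverable from the function itself.

```python
import inspect
from typing import Any, Callable, Dict, List, Optional

def describe_function(fn: Callable[..., Any]) -> Dict[str, Any]:
    """Collect name, docstring, and parameter info from a function signature."""
    signature = inspect.signature(fn)
    params: Dict[str, Dict[str, Any]] = {}
    for name, param in signature.parameters.items():
        params[name] = {
            "annotation": param.annotation,
            # A parameter without a default must be supplied by the caller.
            "required": param.default is inspect.Parameter.empty,
        }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": params,
    }

def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
    """Search the index, optionally restricted to some pages."""
    return "stub"

schema = describe_function(vector_query)
print(schema["name"], list(schema["parameters"]))
```

This is why docstrings and type hints matter for tools: they are the only description of the function that the model gets to see.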

 
# The name only labels the tools; the nodes come from the document loaded above.
name = "BERT_arxiv"
vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db")

def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
    '''
    perform vector search over index on
    query(str): query string needs to be embedded
    page_numbers(List[str]): list of page numbers to be retrieved,
    leave blank if we want to perform a vector search over all pages
    '''
    page_numbers = page_numbers or []
    # One exact-match filter per requested page, combined with OR below.
    metadata_dict = [{"key": "page_label", "value": p} for p in page_numbers]

    query_engine = vector_index.as_query_engine(
        similarity_top_k=2,
        filters=MetadataFilters.from_dicts(metadata_dict, condition=FilterCondition.OR),
    )

    response = query_engine.query(query)
    return response

# The docstring above becomes the tool description shown to the LLM.
vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}", fn=vector_query)

summary_index = SummaryIndex(nodes)
summary_query_engine = summary_index.as_query_engine(
    response_mode="tree_summarize",
    use_async=True,
)
summary_query_tool = QueryEngineTool.from_defaults(
    name=f"summary_tool_{name}",
    query_engine=summary_query_engine,
    description=("Use ONLY IF you want to get a holistic summary of the documents."
                 "DO NOT USE if you have specified questions over the documents."),
)

Test the LLM

 
response = llm.predict_and_call([vector_query_tool],
"Summarize the content in page number 2",
verbose=True)

=== Calling Function ===
Calling function: vector_tool_BERT_arxiv with args: {"query": "summarize content", "page_numbers": ["2"]}
=== Function Output ===
The content discusses the use of RAG models for knowledge-intensive generation tasks, such as MS-MARCO and Jeopardy question generation, showing that the models produce more factual, specific, and diverse responses compared to a BART baseline. The models also perform well in FEVER fact verification, achieving results close to state-of-the-art pipeline models. Additionally, the models demonstrate the ability to update their knowledge as the world changes by replacing the non-parametric memory.
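
The call above passed `page_numbers` as `["2"]`, which the tool turns into metadata filters joined with OR. The effect of such a filter can be sketched over plain dictionaries; the `filter_by_pages` helper and the sample nodes are invented for illustration and do not use LlamaIndex's filter classes.

```python
from typing import Dict, List

def filter_by_pages(nodes: List[Dict[str, str]], page_numbers: List[str]) -> List[Dict[str, str]]:
    """Keep nodes whose page_label equals any requested page (OR semantics)."""
    if not page_numbers:
        return nodes  # no filter: search across all pages
    wanted = set(page_numbers)
    return [node for node in nodes if node["page_label"] in wanted]

# Hypothetical nodes; page labels are strings, as in the PDF reader's metadata.
nodes = [
    {"page_label": "1", "text": "abstract"},
    {"page_label": "2", "text": "introduction"},
    {"page_label": "3", "text": "method"},
]

selected = filter_by_pages(nodes, ["2", "3"])
print([node["text"] for node in selected])
```

Note that the labels are compared as strings, which is why the tool signature takes `List[str]` rather than integers.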

Helper function that builds the vector store tool and summary tool for any document

from typing import Tuple

def get_doc_tools(file_path: str, name: str) -> Tuple[FunctionTool, QueryEngineTool]:
    '''
    Build the vector query tool and the summary query tool for one document.
    '''
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    print("length of nodes")
    splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=100)
    nodes = splitter.get_nodes_from_documents(documents)
    print(f"Length of nodes : {len(nodes)}")

    vector_index = VectorStoreIndex(nodes, storage_context=storage_context)
    vector_index.storage_context.vector_store.persist(persist_path="/content/chroma_db")

    def vector_query(query: str, page_numbers: Optional[List[str]] = None) -> str:
        '''
        perform vector search over index on
        query(str): query string needs to be embedded
        page_numbers(List[str]): list of page numbers to be retrieved,
        leave blank if we want to perform a vector search over all pages
        '''
        page_numbers = page_numbers or []
        # One exact-match filter per requested page, combined with OR below.
        metadata_dict = [{"key": "page_label", "value": p} for p in page_numbers]

        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(metadata_dict, condition=FilterCondition.OR),
        )

        response = query_engine.query(query)
        return response

    vector_query_tool = FunctionTool.from_defaults(name=f"vector_tool_{name}", fn=vector_query)

    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_query_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=("Use ONLY IF you want to get a holistic summary of the documents."
                     "DO NOT USE if you have specified questions over the documents."),
    )
    return vector_query_tool, summary_query_tool

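Prepare input lists with the document names and paths

import os

root_path = "/content/data"
file_name = []
file_path = []
for file in os.listdir(root_path):
    # Only PDFs are documents; skip anything else in the folder.
    if file.endswith(".pdf"):
        file_name.append(file.split(".")[0])
        file_path.append(os.path.join(root_path, file))

print(file_name)
print(file_path)

The output recorded for the original run was:

['self_rag_arxiv', 'crag_arxiv', 'RAG_arxiv', '', 'BERT_arxiv']
['/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf',
 '/content/data/BERT_arxiv.pdf']

This recorded output lists five names, one of them empty, and repeats the BERT path five times. That pattern is consistent with a loop that tests and joins a stale file variable instead of its own loop variable, so non-PDF entries pass the check and every path points at the same file. With the loop variable used consistently, as in the code above, each PDF appears once with its own path.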

Note: FunctionTool expects tool names to match the pattern ^[a-zA-Z0-9_-]+$
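
Since tool names here are built from file names, it can help to check them against that pattern before creating the tools. The `to_tool_name` helper below is a hypothetical convenience written for this sketch, not part of LlamaIndex.

```python
import re

# Pattern quoted in the note above.
TOOL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]+$")

def to_tool_name(prefix: str, file_stem: str) -> str:
    """Build a tool name, replacing disallowed characters with underscores."""
    cleaned = re.sub(r"[^a-zA-Z0-9_-]", "_", file_stem)
    name = f"{prefix}_{cleaned}"
    if not TOOL_NAME_PATTERN.match(name):
        raise ValueError(f"invalid tool name: {name!r}")
    return name

print(to_tool_name("vector_tool", "self rag.v2"))
print(bool(TOOL_NAME_PATTERN.match("vector tool")))
```

File stems with spaces or dots would otherwise produce names that fail the pattern.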

Generate the vector tool and summary tool for each document

papers_to_tools_dict = {}
for name, filename in zip(file_name, file_path):
    vector_query_tool, summary_query_tool = get_doc_tools(filename, name)
    papers_to_tools_dict[name] = [vector_query_tool, summary_query_tool]

 
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28
length of nodes
Length of nodes : 28

Put the tools into a flat list

 
initial_tools = [t for f in file_name for t in papers_to_tools_dict[f]]
initial_tools

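Stuffing too many tool choices into the LLM prompt causes several problems:

The tools may not fit in the prompt, especially when there are many documents, because each document is modeled as its own set of tools.
Cost and latency climb as the token count grows.
The prompt can also become confusing, so the LLM fails to follow instructions.

The solution is to perform RAG at the tool level. To do this, we use LlamaIndex's ObjectIndex class.

The ObjectIndex class allows arbitrary Python objects to be indexed, which makes it flexible enough for many use cases. For example, an ObjectIndex can index Tool objects that an agent then retrieves from, which is how it is used here.

VectorStoreIndex is a key component of LlamaIndex that handles the storage and retrieval of data. It works by:

Accepting a list of Node objects and building an index from them.
Using different vector stores as the storage backend, which adds flexibility and scalability to the application.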
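
Retrieval at the tool level can be illustrated without any framework: rank tools by how similar their descriptions are to the query and keep the top k. The word-count "embedding", the tool names, and the descriptions below are all assumptions for this sketch; a real setup would use a learned embedding model and the index shown next.

```python
import math
from collections import Counter
from typing import List, Tuple

def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts. Real systems use a learned model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_tools(query: str, tools: List[Tuple[str, str]], top_k: int) -> List[str]:
    """Return the names of the top_k tools whose descriptions best match the query."""
    query_vec = embed(query)
    ranked = sorted(tools, key=lambda tool: cosine(query_vec, embed(tool[1])), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical (name, description) pairs.
tools = [
    ("summary_tool_bert", "holistic summary of the bert paper"),
    ("vector_tool_self_rag", "specific questions about the self rag paper"),
    ("vector_tool_crag", "specific questions about the corrective rag paper"),
]
selected = retrieve_tools("questions about corrective rag", tools, top_k=2)
print(selected)
```

Only the selected tools are then placed in the prompt, which keeps prompt size roughly constant as the number of documents grows. It also means tool descriptions need to be distinctive, since they are what the retriever matches against.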

 
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(initial_tools,index_cls=VectorStoreIndex)

Use the ObjectIndex as a retriever

 
obj_retriever = obj_index.as_retriever(similarity_top_k=2)
tools = obj_retriever.retrieve("compare and contrast the papers self rag and corrective rag")

print(tools[0].metadata)
print(tools[1].metadata)

 
ToolMetadata(description='Use ONLY IF you want to get a holistic summary of the documents.DO NOT USE if you have specified questions over the documents.', name='summary_tool_self_rag_arxiv', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

ToolMetadata(description='vector_tool_self_rag_arxiv(query: str, page_numbers: Optional[List[str]] = None) -> str\n\nperform vector search over index on\nquery(str): query string needs to be embedded\npage_numbers(List[str]): list of page numbers to be retrieved,\nleave blank if we want to perform a vector search over all pages\n', name='vector_tool_self_rag_arxiv', fn_schema=<class 'pydantic.v1.main.vector_tool_self_rag_arxiv'>, return_direct=False)

Set up the RAG agent

 
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(tool_retriever=obj_retriever,
 llm=llm,
 system_prompt="""You are an agent designed to answer queries over a set of given papers.
 Please always use the tools provided to answer a question.Do not rely on prior knowledge.""",
 verbose=True)
agent = AgentRunner(agent_worker)

Ask question 1

 
response = agent.query("Compare and contrast self rag and crag.")
print(str(response))

Added user message to memory: Compare and contrast self rag and crag.
=== LLM Response ===
Sure, I'd be happy to help you understand the differences between Self RAG and CRAG, based on the functions provided to me.

Self RAG (Retrieval-Augmented Generation) is a method where the model generates a holistic summary of the documents provided as input. It's important to note that this method should only be used if you want a general summary of the documents, and not if you have specific questions over the documents.

On the other hand, CRAG (Contrastive Retrieval-Augmented Generation) is also a method for generating a holistic summary of the documents. The key difference between CRAG and Self RAG is not explicitly clear from the functions provided. However, the name suggests that CRAG might use a contrastive approach in its retrieval process, which could potentially lead to a summary that highlights the differences and similarities between the documents more effectively.

Again, it's crucial to remember that both of these methods should only be used for a holistic summary, and not for answering specific questions over the documents.

Ask question 2

 
response = agent.query("Summarize the paper corrective RAG.")
print(str(response))

Added user message to memory: Summarize the paper corrective RAG.
=== Calling Function ===
Calling function: summary_tool_RAG_arxiv with args: {"input": "corrective RAG"}
=== Function Output ===
The corrective RAG approach is a method used to address issues or errors in a system by categorizing them into three levels: Red, Amber, and Green. Red signifies critical problems that need immediate attention, Amber indicates issues that require monitoring or action in the near future, and Green represents no significant concerns. This approach helps prioritize and manage corrective actions effectively based on the severity of the identified issues.
=== LLM Response ===
The corrective RAG approach categorizes issues into Red, Amber, and Green levels to prioritize and manage corrective actions effectively based on severity. Red signifies critical problems needing immediate attention, Amber requires monitoring or action soon, and Green indicates no significant concerns.
assistant: The corrective RAG approach categorizes issues into Red, Amber, and Green levels to prioritize and manage corrective actions effectively based on severity. Red signifies critical problems needing immediate attention, Amber requires monitoring or action soon, and Green indicates no significant concerns.

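Conclusion

Unlike a standard RAG pipeline, which suits simple queries over a small number of documents, this agentic approach adapts based on initial findings to improve further retrieval. Here we built an autonomous research agent that strengthens our ability to engage with and analyze data comprehensively.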
