Mastering RAG: Building Local AI Applications with Langchain and Ollama

Published: 2024-07-26 14:37:19

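Introduction

The rise of large language models (LLMs) has produced a new category of tools. LLMs still have limitations, though, especially in business use cases that depend on up-to-date information or proprietary data. This article looks at two ways to address those limitations: fine-tuning and RAG.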

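Limitations of LLMs

Conventional LLMs are expensive to train and only have access to public information. Business applications need a model that can give current answers grounded in internal knowledge. There are two common ways to get there: fine-tuning and RAG.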

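Fine-tuning

Fine-tuning is the process of further training a pretrained model on a specific dataset so that it adapts to a particular task or domain. It is like giving a generalist assistant extra, targeted training until it becomes an expert in one field.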

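RAG

RAG (retrieval-augmented generation) is an approach in which the model retrieves relevant information from an external source in order to generate more accurate and more informative responses. Unlike models that rely only on what they learned during pretraining, a RAG system looks up additional data in a database or search engine and combines that data with the question when generating its response.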

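Open-source solutions

The open-source community offers an answer to the legal and security concerns around LLMs. Since Meta released the first Llama model, the community has responded quickly, making it possible to experiment locally.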

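RAG workflow

The core RAG workflow consists of four steps: text chunking, vector storage, semantic search, and prompt composition.

1. Text chunking: split the content into chunks of text so that relevant passages can be retrieved more precisely.
2. Vector storage: convert each chunk into a vector and store it in a vector database.
3. Semantic search: search the content using these numerical representations and return the most relevant chunks.
4. Prompt composition: combine the question with the retrieved content to produce a more accurate prompt.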
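
The four steps above can be sketched end to end with nothing but the standard library. This is only an illustration under loud assumptions: the "embedding" here is a bag-of-words count vector rather than a real embedding model, the "vector database" is a plain list, and the function names (`chunk_text`, `embed`, `search`, `build_prompt`) are hypothetical, not part of Langchain or ChromaDB.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 80) -> list[str]:
    """Step 1: group words into chunks of roughly `chunk_size` characters."""
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def embed(text: str) -> Counter:
    """Step 2: toy 'embedding', a word-count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, store: list[tuple[str, Counter]], k: int = 1) -> list[str]:
    """Step 3: return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 4: combine the question with the retrieved chunks."""
    joined = "\n\n".join(context)
    return f"Use the context to answer.\n\nContext:\n{joined}\n\nQuestion: {question}"


document = (
    "Ollama runs open-source models on a local machine. "
    "ChromaDB stores text as vectors for similarity search. "
    "Streamlit turns Python scripts into web apps."
)
# The "vector store": every chunk kept next to its vector
store = [(chunk, embed(chunk)) for chunk in chunk_text(document, chunk_size=60)]
question = "Which tool stores vectors?"
print(build_prompt(question, search(question, store, k=1)))
```

In the real application, Langchain's text splitter, Ollama embeddings, and ChromaDB take over steps 1 to 3; the shape of the pipeline stays the same.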

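Building a RAG application

The example code below shows how to build a RAG application with Langchain, ChromaDB, Ollama, and Streamlit.

1. Langchain is a framework for building applications powered by large language models (LLMs). By bringing chains, agents, and retrieval strategies together, it simplifies the whole path from concept to working application. At its core is the notion of a chain; chains make up the application's cognitive architecture.
2. ChromaDB is a lightweight open-source vector database that is well suited to small-scale services and use cases. It plays an important role in LLM-based applications because it stores text data as vectors, the format AI and ML models work with natively. It can be thought of as the AI's memory.
3. Ollama is a tool that lets users run open-source models locally with little effort. It takes much of the complexity out of integrating these models into an application, so developers can quickly use recent models such as Meta's Llama3 for local development and testing.
4. Streamlit is an open-source framework for quickly building web interfaces on top of machine learning and data science applications. It lets developers turn data scripts into shareable web apps in pure Python with no front-end experience, which makes it a good fit for rapid prototyping and deployment.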

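Initializing the project

The project uses Poetry for dependency management and to install the required packages. The PyPDFLoader, Chroma, Ollama, and OllamaEmbeddings classes used below ship in langchain-community, and PyPDFLoader relies on pypdf, so both are included here.

poetry add langchain langchain-community chromadb streamlit pypdf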

Creating the document upload UI

Streamlit is used to create a simple interface that lets the user upload a PDF document.

import streamlit as st
from langchain_core.messages import AIMessage


def init_ui():
    """Initializes the UI"""
    st.set_page_config(page_title="Langchain RAG Bot", layout="wide")
    st.title("Langchain RAG Bot")

    # Initialise session state
    if "chat_history" not in st.session_state:
        st.session_state.chat_history = [
            AIMessage(content="Hello, I'm here to help. Ask me anything!")
        ]

    with st.sidebar:
        st.header("Document Capture")
        st.write("Please select a single document to use as context")
        st.markdown("**Please fill the below form :**")
        with st.form(key="Form", clear_on_submit=True):
            uploaded_file = st.file_uploader("Upload", type=["pdf"], key="pdf_upload")
            submit = st.form_submit_button(label="Upload")

    # Only persist when the form was submitted with a file attached
    if submit and uploaded_file is not None:
        persist_file(uploaded_file)
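
The article does not show the body of `persist_file`. A minimal sketch of what such a helper might look like follows, assuming it simply writes the upload into a data directory. The `DATA_DIR` value and the extra `data_dir` parameter are assumptions made for illustration; the only things relied on are that the uploaded object exposes `.name` and `.getvalue()`, as Streamlit's uploaded files do.

```python
from pathlib import Path

# Assumed upload directory; the article does not show where DATA_DIR is defined.
DATA_DIR = Path("data")


def persist_file(uploaded_file, data_dir: Path = DATA_DIR) -> Path:
    """Write an uploaded file to disk and return the path it was saved to.

    `uploaded_file` is expected to expose `.name` and `.getvalue()`.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so a crafted name cannot escape data_dir
    target = data_dir / Path(uploaded_file.name).name
    target.write_bytes(uploaded_file.getvalue())
    return target
```

Returning the saved path is a design choice for this sketch; it makes the helper easy to test and lets the caller index that exact file instead of scanning the directory.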

Ollama

Make sure Ollama is running locally so that embeddings can be created in the correct format.
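
One way to verify this before building the vector store is a quick reachability check. The sketch below assumes Ollama is listening on its default address, `http://localhost:11434`; the function name `is_ollama_running` is hypothetical, and the check only confirms that an HTTP server answers there, not that a particular model has been pulled.

```python
import urllib.request


def is_ollama_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at `base_url`.

    11434 is Ollama's default port; adjust `base_url` if your setup differs.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeouts, and HTTP errors all end up here
        # (urllib's URLError and HTTPError are subclasses of OSError)
        return False
```

In the Streamlit app this could guard the indexing step, for example by calling `st.error(...)` and returning early when the check fails.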

Initializing the vector store

The document content is converted into vectors and stored in ChromaDB.

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

# DATA_DIR and DB_DIR are pathlib.Path constants defined elsewhere in the project
# (the upload folder and the ChromaDB persistence folder respectively)


def init_vector_store():
    """
    Initializes and returns ChromaDB vector store from document chunks

    Returns:
        ChromaDB: Initialized vector store
    """
    # Get the first file - in reality this would be more robust
    files = [f for f in DATA_DIR.iterdir() if f.is_file()]
    if not files:
        st.error("No files uploaded")
        return None

    # Get the path to the first file in the directory
    first_file = files[0].resolve()
    # Use the PDF loader in Langchain to fetch the document text (one Document per page);
    # splitting happens in the next step, so load() is enough here
    loader = PyPDFLoader(str(first_file))
    document = loader.load()

    # Now we initialise the text splitter we will use on the document
    text_splitter = RecursiveCharacterTextSplitter()
    document_chunks = text_splitter.split_documents(document)

    # Lastly, we initialise the vector store using the split document
    vector_store = Chroma.from_documents(
        documents=document_chunks,
        embedding=OllamaEmbeddings(),
        persist_directory=str(DB_DIR),
        collection_name="pdf_v_db",  # Important if you want to reference the DB later
    )

    return vector_store
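
To give a feel for what a recursive character splitter does, here is a simplified standard-library sketch of the idea: try the coarsest separator first (paragraphs), and fall back to finer ones (lines, then words, then a hard cut) only for pieces that are still too long. This is an illustration of the technique, not Langchain's implementation; it leaves out chunk overlap entirely, and `recursive_split` is a hypothetical name.

```python
def recursive_split(text: str, chunk_size: int, separators=("\n\n", "\n", " ")) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Pieces are merged greedily while they fit; a piece that is too long on
    its own is split again with the next, finer separator.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: hard-cut into fixed-size slices
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # piece still fits into the chunk being built
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # Piece alone is too long: recurse with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph, line one.\nFirst paragraph, line two.\n\nSecond paragraph."
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

Splitting on natural boundaries first tends to keep related sentences in the same chunk, which is what makes the later semantic search return coherent passages.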

Creating the retrieval chain

A retrieval chain is built so that relevant content can be fetched based on the user's query.

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.retrievers import RetrieverOutputLike
from langchain_core.runnables import Runnable


def get_related_context(vector_store: Chroma) -> RetrieverOutputLike:
    """
    Will retrieve the relevant context based on the user's query
    using Approximate Nearest Neighbor search (ANN)

    Args:
        vector_store (Chroma): The initialized vector store with context

    Returns:
        RetrieverOutputLike: The chain component to be used with the LLM
    """

    # Specify the model to use
    llm = Ollama(model="llama3")

    # Here we are using the vector store as the source
    retriever = vector_store.as_retriever()

    # Create a prompt that will be used to query the vector store for related content
    prompt = ChatPromptTemplate.from_messages([
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
        ("user", "Given the above conversation, generate a search query to look up to get information relevant to the conversation"),
    ])

    # Create the chain element which will fetch the relevant content from ChromaDB
    chain_element = create_history_aware_retriever(llm, retriever, prompt)
    return chain_element


def get_context_aware_prompt(context_chain: RetrieverOutputLike) -> Runnable:
    """
    Combines the chain element that fetches content with one that then creates the
    prompt used to interact with the LLM

    Args:
        context_chain (RetrieverOutputLike): The retriever chain that can
            fetch related content from ChromaDB

    Returns:
        Runnable: The full runnable chain that can be executed
    """

    # Specify the model to use
    llm = Ollama(model="llama3")

    # A standard prompt template which combines chat history with the user query
    # NOTE: You MUST pass the context into the system message
    prompt = ChatPromptTemplate.from_messages([
        ("system", "You are a helpful assistant that can answer the user's questions. Use provided context to answer the question as accurately as possible:\n\n{context}"),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
    ])

    # This method creates a chain for passing documents to a LLM
    docs_chain = create_stuff_documents_chain(llm, prompt)

    # Now we merge the context chain & docs chain to form the full prompt
    rag_chain = create_retrieval_chain(context_chain, docs_chain)
    return rag_chain
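
It can help to see the data flow of these two chain elements without the framework. The sketch below mimics it with plain callables: with no chat history the raw input is used as the search query, otherwise the conversation is first condensed into a standalone query; the retrieved documents are then "stuffed" into the `{context}` slot of the system prompt. All names here (`history_aware_retrieve`, `stuff_documents`, `rag_answer`, and the callables passed in) are hypothetical stand-ins, and messages are simplified to `(role, content)` tuples.

```python
from typing import Callable

Message = tuple[str, str]  # (role, content), a simplified stand-in for message objects


def history_aware_retrieve(
    user_input: str,
    chat_history: list[Message],
    rewrite_query: Callable[[str, list[Message]], str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """With no history, search with the raw input; otherwise let the LLM
    condense the conversation into a standalone search query first."""
    query = rewrite_query(user_input, chat_history) if chat_history else user_input
    return retrieve(query)


def stuff_documents(system_template: str, docs: list[str]) -> str:
    """Place all retrieved documents into the {context} slot of the prompt."""
    return system_template.format(context="\n\n".join(docs))


def rag_answer(
    user_input: str,
    chat_history: list[Message],
    rewrite_query: Callable[[str, list[Message]], str],
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[Message], str], str],
    system_template: str,
) -> dict:
    """Retrieve context, build the system prompt, and ask the model."""
    docs = history_aware_retrieve(user_input, chat_history, rewrite_query, retrieve)
    system_prompt = stuff_documents(system_template, docs)
    answer = generate(system_prompt, chat_history, user_input)
    return {"input": user_input, "context": docs, "answer": answer}
```

The returned dictionary carries the retrieved context alongside the answer, which is handy for showing sources in the UI or for debugging poor retrievals.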

Chat interface

Streamlit's built-in chat components are used to interact with the RAG-backed LLM.

from langchain_core.messages import AIMessage, HumanMessage


def get_response(user_query: str) -> str:
    """
    Will use the query to fetch context & form a query to send to an LLM.
    Responds with the result of the query

    Args:
        user_query (str): Query input by the user

    Returns:
        str: Answer from the LLM
    """
    context_chain = get_related_context(st.session_state.vector_store)
    rag_chain = get_context_aware_prompt(context_chain)

    res = rag_chain.invoke({
        "chat_history": st.session_state.chat_history,
        "input": user_query,
    })
    return res["answer"]


def init_chat_interface():
    """
    Initializes a chat interface which will leverage our rag chain & a local LLM
    to answer questions about the context provided
    """

    user_query = st.chat_input("Ask a question....")
    if user_query is not None and user_query != "":
        response = get_response(user_query)

        # Add the current chat to the chat history
        st.session_state.chat_history.append(HumanMessage(content=user_query))
        st.session_state.chat_history.append(AIMessage(content=response))

    # Print the chat history
    for message in st.session_state.chat_history:
        if isinstance(message, HumanMessage):
            with st.chat_message("Human"):
                st.write(message.content)
        elif isinstance(message, AIMessage):
            with st.chat_message("AI"):
                st.write(message.content)

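Conclusion

LLMs are powerful, but they are not without drawbacks. With some creative thinking and the right tools, those challenges can be turned into opportunities. Combining fine-tuning and RAG with open-source models and frameworks such as Langchain, ChromaDB, Ollama, and Streamlit provides a strong foundation for putting LLMs to practical use.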
