RAG in Practice (Part 1): Simple RAG

Published: 2025-11-13 05:49:57

Author: 影子的分享站


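This article opens a hands-on RAG tutorial series covering a wide range of RAG scenarios, with all code in Python; I conservatively expect the series to run to more than 30 articles. This first installment walks through the simplest possible RAG pipeline. All of the code is included in the article, but for convenience the complete project code is also available through Knowledge Planet (知识星球).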

Environment setup

conda create -n rag-learning python=3.10.0

conda activate rag-learning

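Install the dependencies listed in requirements.txt (for example with pip install -r requirements.txt):

• pymupdf (imported in code as fitz; the separate PyPI package named fitz is an unrelated project)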
• numpy
• openai
• requests
• rank-bm25
• scikit-learn
• networkx
• matplotlib
• tqdm
• Pillow
• faiss-cpu

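Simple RAG pipeline overview

The following implements a simple RAG pipeline in five steps:

1. Text extraction: load and preprocess the text data.
2. Chunking: split the data into smaller chunks to improve retrieval performance.
3. Embedding creation: convert the text chunks into numerical vectors with an embedding model.
4. Semantic search: retrieve the chunks relevant to the user's query.
5. Response generation: have an AI model generate a response from the retrieved text.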
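Before wiring in PyMuPDF, an embedding API and faiss, the five steps can be seen end to end in a toy, dependency-free sketch. Everything in it is an illustrative assumption: toy_embed is a bag-of-words term counter standing in for a real embedding model (so its "semantic" search is purely lexical), toy_rag and cosine are hypothetical helpers, and step 5 only assembles the prompt instead of calling an LLM.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in 'embedding' (assumption): a bag-of-words term-count vector, not a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_rag(document, query, chunk_size=60, overlap=20, top_k=1):
    """Run the five pipeline steps on an in-memory string (hypothetical toy helper)."""
    # 1. Text extraction/preprocessing: the document is already a string; normalize whitespace.
    text = " ".join(document.split())
    # 2. Chunking: fixed-size character windows, each overlapping the previous one.
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    # 3. Embedding creation: one (toy) vector per chunk.
    chunk_vecs = [toy_embed(chunk) for chunk in chunks]
    # 4. Search: rank chunks by cosine similarity to the query vector.
    query_vec = toy_embed(query)
    ranked = sorted(zip(chunks, chunk_vecs), key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
    context = [chunk for chunk, _ in ranked[:top_k]]
    # 5. Response generation: a real pipeline would send this prompt to an LLM; the toy returns it.
    prompt = "\n".join(context) + f"\nQuestion: {query}"
    return context, prompt


context, prompt = toy_rag(
    "Agents observe their environment and act on it. Faiss is a library for vector search.",
    "vector search library",
)
print(prompt)
```

The real pipeline below swaps each toy piece for a real component: PyMuPDF for extraction, an embedding model for toy_embed, a faiss index for the sorted scan, and an LLM call for the returned prompt.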

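Import dependencies

import os
from enum import Enum

import faiss
import fitz  # PyMuPDF
import numpy as np
from openai import OpenAI


class LLMsProvider(Enum):
    OPENAI = "openai"


def get_llms_provider(provider: LLMsProvider, model_id: str = None):
    """Return an OpenAI-compatible client for the given provider (model_id is currently unused)."""
    if provider == LLMsProvider.OPENAI:
        # The official openai SDK is pointed at DashScope's OpenAI-compatible endpoint,
        # authenticated with the DASHSCOPE_API_KEY environment variable.
        return OpenAI(
            api_key=os.getenv("DASHSCOPE_API_KEY"),
            base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
        )
    raise ValueError(f"Unsupported provider: {provider}")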

PDF parsing

In this example we use the PyMuPDF library (imported as fitz) to extract the text from a PDF file.

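def extract_text_from_pdf(pdf_path):
    """Extract and concatenate the plain text of every page in a PDF."""
    all_text = ""
    # The context manager closes the document when we are done with it.
    with fitz.open(pdf_path) as my_pdf:
        for page in my_pdf:
            # Append (+=) each page's text; plain assignment would keep only the last page.
            all_text += page.get_text("text")
    return all_text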
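Step 1 of the overview mentions preprocessing, but extract_text_from_pdf returns the raw page text, including the tabs and hard line breaks visible in the extracted endnotes. A light cleanup pass could run before chunking. The helper below is a hypothetical, heuristic sketch (its name and rules are assumptions, not part of the original code):

```python
import re


def clean_pdf_text(raw_text):
    """Heuristic cleanup of PDF-extracted text (hypothetical helper, not from the original article)."""
    # Re-join words hyphenated across a line break, e.g. "retrie-\nval" -> "retrieval".
    # Caveat: a genuine hyphen at a line end ("Chain-of-\nThought") is dropped as well.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw_text)
    # Blank lines separate paragraphs; inside a paragraph, collapse tabs, newlines and spaces.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = (" ".join(p.split()) for p in paragraphs)
    return "\n\n".join(p for p in cleaned if p)
```

It would be applied as extract_text = clean_pdf_text(extract_text_from_pdf(pdf_path)). Whether the dehyphenation rule helps depends on the document, so it is worth checking a few cleaned pages by eye.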

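Running the extraction

The output shown below is the text of the PDF's final page (page 42, the endnotes); the chunking, retrieval and answer outputs later in this walkthrough are all derived from that text.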

pdf_path = "../data/google-ai-Agents-whitepaper.pdf"

extract_text = extract_text_from_pdf(pdf_path)
extract_text

"Agents\n42\nSeptember 2024\nEndnotes\n1.\t Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: \nhttps://arxiv.org/abs/2210.03629\n2.\t Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'. \nAvailable at: https://arxiv.org/pdf/2201.11903.pdf.\n3.\t Wang, X. et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'. \nAvailable at: https://arxiv.org/abs/2203.11171.\n4.\t Diao, S. et al., 2023, 'Active Prompting with Chain-of-Thought for Large Language Models'. Available at: \nhttps://arxiv.org/pdf/2302.12246.pdf.\n5.\t Zhang, H. et al., 2023, 'Multimodal Chain-of-Thought Reasoning in Language Models'. Available at: \nhttps://arxiv.org/abs/2302.00923.\n6.\t Yao, S. et al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: \nhttps://arxiv.org/abs/2305.10601.\n7.\t Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Available at: \nhttps://arxiv.org/abs/2305.08291.\n8.\t Google. 'Google Gemini Application'. Available at: http://gemini.google.com.\n9.\t Swagger. 'OpenAPI Specification'. Available at: https://swagger.io/specification/.\n10.\tXie, M., 2022, 'How does in-context learning work? A framework for understanding the differences from \ntraditional supervised learning'. Available at: https://ai.stanford.edu/blog/understanding-incontext/.\n11.\t Google Research. 'ScaNN (Scalable Nearest Neighbors)'. Available at: \nhttps://github.com/google-research/google-research/tree/master/scann.\n12.\t LangChain. 'LangChain'. Available at: https://python.langchain.com/v0.2/docs/introduction/.\n"

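Text chunking

Once the text has been extracted, we split it into smaller, overlapping chunks to improve retrieval accuracy. Thanks to the overlap, any passage no longer than the overlap, such as a short sentence split across a chunk boundary, still appears intact in at least one chunk.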

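def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    # Each new chunk starts (chunk_size - overlap) characters after the previous one.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]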

text_chunks = chunk_text(extract_text, 1000, 200)
print("Number of text chunks:", len(text_chunks))

print("\n First text chunk:")
print(text_chunks[0])

Number of text chunks: 3

 First text chunk:
Agents
42
September 2024
Endnotes
1.     Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: 
https://arxiv.org/abs/2210.03629
2.     Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'. 
Available at: https://arxiv.org/pdf/2201.11903.pdf.
3.     Wang, X. et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'. 
Available at: https://arxiv.org/abs/2203.11171.
4.     Diao, S. et al., 2023, 'Active Prompting with Chain-of-Thought for Large Language Models'. Available at: 
https://arxiv.org/pdf/2302.12246.pdf.
5.     Zhang, H. et al., 2023, 'Multimodal Chain-of-Thought Reasoning in Language Models'. Available at: 
https://arxiv.org/abs/2302.00923.
6.     Yao, S. et al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: 
https://arxiv.org/abs/2305.10601.
7.     Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Av
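To make the sliding window concrete, the hypothetical helper below lists the character offsets that chunk_text visits, using the same step of chunk_size - overlap (the helper itself is an illustration, not part of the pipeline):

```python
def chunk_spans(text_length, chunk_size=1000, overlap=200):
    """Return the (start, end) character offsets produced by the sliding-window chunker."""
    step = chunk_size - overlap  # distance between consecutive chunk starts
    return [(start, min(start + chunk_size, text_length)) for start in range(0, text_length, step)]


# A 2,000-character text with the article's defaults:
for start, end in chunk_spans(2000, chunk_size=1000, overlap=200):
    print(f"chars {start}-{end} ({end - start} chars)")
```

For a 2,000-character text this gives three windows, (0, 1000), (800, 1800) and (1600, 2000): neighbouring windows share 200 characters and the last one is shorter. When the text ends inside the overlap, the final window is entirely contained in the previous one. Context 2 in the retrieval output later in this article, which holds only the tail of the last endnote, is such a window.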

Text embedding

def create_embeddings(text, client, model="text-embedding-v4"):
    """Embed a string or a list of strings.

    Returns (float32 matrix with one row per input, number of vectors, vector dimension).
    """
    response = client.embeddings.create(model=model, input=text)
    embeddings = [item.embedding for item in response.data]
    passages_embs = np.array(embeddings, dtype=np.float32)  # faiss expects float32
    return passages_embs, passages_embs.shape[0], passages_embs.shape[1]

client = get_llms_provider(LLMsProvider.OPENAI)
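create_embeddings sends every passage in a single request. That works for a handful of chunks, but embedding endpoints generally cap how many inputs one request may contain (check the DashScope documentation for the exact limit of text-embedding-v4), so embedding a whole document needs batching. A minimal sketch, in which embed_in_batches, the embed_batch callable and the default batch_size=10 are all assumptions:

```python
import numpy as np


def embed_in_batches(texts, embed_batch, batch_size=10):
    """Embed texts in slices of at most batch_size items (hypothetical helper; batch_size is an assumption).

    embed_batch is any callable that maps a list of strings to a list of vectors.
    Returns the same (matrix, num_vectors, dim) triple as create_embeddings.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    vectors = []
    for start in range(0, len(texts), batch_size):
        # Each call sees at most batch_size inputs, keeping requests under the provider's cap.
        vectors.extend(embed_batch(texts[start:start + batch_size]))
    matrix = np.array(vectors, dtype=np.float32)
    return matrix, matrix.shape[0], matrix.shape[1]


# Possible wiring to the article's client (assumption, not run here):
# def dashscope_batch(batch):
#     response = client.embeddings.create(model="text-embedding-v4", input=batch)
#     return [item.embedding for item in response.data]
# passages_embs, num_vec, vec_dim = embed_in_batches(text_chunks, dashscope_batch)
```

Passing the embedding call in as a function keeps the batching logic independent of the client, so it can be tested with a fake embedder.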

Building the knowledge base (vector index)

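def build_index(passages):
    """Embed the passages and store them in a faiss IVF index scored by inner product."""
    passages_embs, num_vec, vec_dim = create_embeddings(passages, client)
    # Normalize in place: on unit-length vectors, inner product equals cosine similarity.
    faiss.normalize_L2(passages_embs)
    # IVF splits the vectors into nlist clusters (here about sqrt(N)); the flat quantizer
    # assigns each vector to its nearest cluster centroid.
    nlist = int(np.sqrt(num_vec))
    quantizer = faiss.IndexFlatIP(vec_dim)
    faiss_index = faiss.IndexIVFFlat(quantizer, vec_dim, nlist, faiss.METRIC_INNER_PRODUCT)
    # train() runs k-means to find the centroids; add() files each vector under one of them.
    # At query time only faiss_index.nprobe clusters (default 1) are scanned.
    faiss_index.train(passages_embs)
    faiss_index.add(passages_embs)
    return faiss_index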

faiss_index = build_index(text_chunks)

WARNING clustering 3 points to 1 centroids: please provide at least 39 training points
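The warning comes from the training step of IndexIVFFlat: train() runs k-means to place the nlist centroids, and faiss asks for at least 39 training points per centroid, while this index has only 3 chunks. Clustering pays off on large collections; for a knowledge base this small, an exact flat index such as faiss.IndexFlatIP needs no training and scans every vector. The sketch below reproduces in plain NumPy what a flat inner-product index computes over L2-normalized vectors (exact cosine top-k), so it runs without faiss; exact_cosine_topk is a hypothetical name.

```python
import numpy as np


def exact_cosine_topk(passage_embs, query_emb, k=3):
    """Exact top-k cosine search: what a flat inner-product index returns on normalized vectors."""
    # L2-normalize rows (and the query) so that the inner product equals cosine similarity.
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    scores = p @ q                     # one cosine score per passage
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]    # indices of the highest scores first
    return scores[order], order
```

With faiss itself, the flat variant of build_index would call faiss.normalize_L2(passages_embs), create index = faiss.IndexFlatIP(vec_dim) and call index.add(passages_embs), with no train() step.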

Searching the knowledge base

def search(query, faiss_index, passages, recall_topk=3):
    """Return up to recall_topk passages most similar to the query, each with its score."""
    query_emb, _, _ = create_embeddings([query], client)
    # Normalize the query like the indexed passages so that scores are cosine similarities.
    faiss.normalize_L2(query_emb)
    res_distance, res_index = faiss_index.search(query_emb, recall_topk)
    results = []
    for index, score in zip(res_index[0], res_distance[0]):
        if index == -1:  # faiss pads with -1 when fewer than recall_topk hits are found
            continue
        results.append({"text": passages[index], "score": float(score)})
    return results

query = "What is an agent"

top_chunks = search(query, faiss_index, text_chunks)

for i, chunk in enumerate(top_chunks):
    print(f"Context {i + 1}:\n{chunk}\n==============")

Context 1:
{'text': "Agents\n42\nSeptember 2024\nEndnotes\n1.\t Shafran, I., Cao, Y. et al., 2022, 'ReAct: Synergizing Reasoning and Acting in Language Models'. Available at: \nhttps://arxiv.org/abs/2210.03629\n2.\t Wei, J., Wang, X. et al., 2023, 'Chain-of-Thought Prompting Elicits Reasoning in Large Language Models'. \nAvailable at: https://arxiv.org/pdf/2201.11903.pdf.\n3.\t Wang, X. et al., 2022, 'Self-Consistency Improves Chain of Thought Reasoning in Language Models'. \nAvailable at: https://arxiv.org/abs/2203.11171.\n4.\t Diao, S. et al., 2023, 'Active Prompting with Chain-of-Thought for Large Language Models'. Available at: \nhttps://arxiv.org/pdf/2302.12246.pdf.\n5.\t Zhang, H. et al., 2023, 'Multimodal Chain-of-Thought Reasoning in Language Models'. Available at: \nhttps://arxiv.org/abs/2302.00923.\n6.\t Yao, S. et al., 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: \nhttps://arxiv.org/abs/2305.10601.\n7.\t Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Av", 'score': 0.27010855078697205}
==============
Context 2:
{'text': 'vailable at: https://python.langchain.com/v0.2/docs/introduction/.\n', 'score': 0.15880665183067322}
==============
Context 3:
{'text': " 2023, 'Tree of Thoughts: Deliberate Problem Solving with Large Language Models'. Available at: \nhttps://arxiv.org/abs/2305.10601.\n7.\t Long, X., 2023, 'Large Language Model Guided Tree-of-Thought'. Available at: \nhttps://arxiv.org/abs/2305.08291.\n8.\t Google. 'Google Gemini Application'. Available at: http://gemini.google.com.\n9.\t Swagger. 'OpenAPI Specification'. Available at: https://swagger.io/specification/.\n10.\tXie, M., 2022, 'How does in-context learning work? A framework for understanding the differences from \ntraditional supervised learning'. Available at: https://ai.stanford.edu/blog/understanding-incontext/.\n11.\t Google Research. 'ScaNN (Scalable Nearest Neighbors)'. Available at: \nhttps://github.com/google-research/google-research/tree/master/scann.\n12.\t LangChain. 'LangChain'. Available at: https://python.langchain.com/v0.2/docs/introduction/.\n", 'score': 0.14151382446289062}
==============
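Top-k retrieval always returns k chunks, however weak the match, and the scores above (0.27, 0.16 and 0.14) are all low. One common refinement is to drop hits below a similarity threshold before building the prompt. A sketch, in which select_context and the min_score and max_chunks values are illustrative assumptions that would need tuning for each embedding model:

```python
def select_context(hits, min_score=0.3, max_chunks=3):
    """Keep at most max_chunks hits scoring at least min_score, best first (thresholds are assumptions)."""
    kept = sorted((h for h in hits if h["score"] >= min_score), key=lambda h: h["score"], reverse=True)
    return kept[:max_chunks]
```

With the default threshold none of the three hits above would survive, so the pipeline could skip the LLM call and return the "not enough information" message directly.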

LLM response generation

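def generate_response(system_prompt, user_message, model="qwen-plus"):
    """Send a system prompt plus a user message to the chat model and return the raw response."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # minimize sampling randomness for more reproducible answers
        messages=[
            {"role": "system", "content": system_prompt},
            # The retrieved context and the question belong in a user turn, not a second system turn.
            {"role": "user", "content": user_message},
        ],
    )
    return response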

system_prompt = "You are an AI Agent assistant that strictly answers based on the given context. If the answer cannot be derived directly from the provided context, respond with: 'I do not have enough information to answer that.'"
# Create the user prompt based on the top chunks
user_prompt = "\n".join([f"Context {i + 1}:\n{chunk['text']}\n=====================================\n" for i, chunk in enumerate(top_chunks)])
user_prompt = f"{user_prompt}\nQuestion: {query}"

# Generate AI response
ai_response = generate_response(system_prompt, user_prompt)
print(f"ai_response:{ai_response}")

ai_response:ChatCompletion(id='chatcmpl-f4693ca5-c995-4af3-b9b9-572ad33258db', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I do not have enough information to answer that.', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1762959661, model='qwen-plus', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=10, prompt_tokens=723, total_tokens=733, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0)))

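RAG evaluation

To score the answer, the same model acts as a judge: it compares the RAG answer with a reference answer supplied in the code and returns 1 for a close match, 0.5 for a partially aligned answer and 0 for an incorrect or unsatisfactory one.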

# Define the system prompt for the evaluation system
evaluate_system_prompt = "You are an intelligent evaluation system tasked with assessing the AI assistant's responses. If the AI assistant's response is very close to the true response, assign a score of 1. If the response is incorrect or unsatisfactory in relation to the true response, assign a score of 0. If the response is partially aligned with the true response, assign a score of 0.5."

# Create the evaluation prompt by combining the user query, AI response, true response, and evaluation system prompt
answer = "Agents are autonomous and can act independently of human intervention, especially when provided with proper goals or objectives they are meant to achieve"

evaluation_prompt = f"User Query: {query}\nAI Response:\n{ai_response.choices[0].message.content}\nTrue Response: {answer}\n{evaluate_system_prompt}"

# Generate the evaluation response using the evaluation system prompt and evaluation prompt
evaluation_response = generate_response(evaluate_system_prompt, evaluation_prompt)

# Print the evaluation response
print(evaluation_response.choices[0].message.content)

evaluation_response

0



ChatCompletion(id='chatcmpl-7a18d6c1-0b86-4806-b4c8-6fff1f59f81a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='0', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1762959661, model='qwen-plus', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=1, prompt_tokens=215, total_tokens=216, completion_tokens_details=None, prompt_tokens_details=PromptTokensDetails(audio_tokens=None, cached_tokens=0)))
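The judge replied with the bare string 0 here, but a reply such as "Score: 0.5" or a score followed by an explanation would not convert cleanly with float(). To average scores over many test questions, a tolerant parser helps. parse_judge_score is a hypothetical sketch that accepts only the three values the evaluation prompt defines and returns None for anything else:

```python
import re

# Matches a standalone 0.5, 1 / 1.0 or 0 / 0.0 that is not part of a longer number.
_SCORE_RE = re.compile(r"(?<![\d.])(0\.5|1(?:\.0)?|0(?:\.0)?)(?!\.?\d)")


def parse_judge_score(reply):
    """Extract the first allowed score (0, 0.5 or 1) from a judge reply; None if absent (hypothetical helper)."""
    match = _SCORE_RE.search(reply)
    return float(match.group(1)) if match else None
```

Averaging the parsed scores over a set of question and reference-answer pairs, and counting the None results separately, gives a simple aggregate metric for the pipeline.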

The complete project code for this article is available through Knowledge Planet (知识星球).