
RAGFlow in Practice: Building a SQL Assistant Workflow

Published: 2025-08-07 18:53:24

Author: InfiniFlow


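Workflow Overview

This tutorial builds a SQL Assistant workflow that lets users query a SQL database in natural language.

Non-technical staff such as marketing operations staff and product managers can use the assistant to query business data on their own, reducing their reliance on data analysts. Schools and coding-education providers can also use it as a teaching tool for SQL.

Once orchestrated, the workflow looks like this:

How the workflow is orchestrated:

The database schema, a description of every table field, and example SQL queries are stored in RAGFlow as three knowledge bases. The workflow sends the user's question to all three knowledge bases for retrieval, then passes the retrieved content to an Agent that generates a SQL statement. Finally, the generated SQL statement is passed to a SQL Executor node, which runs it and returns the final result.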
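The data flow described above can be sketched in plain Python. This is only an illustration of the orchestration idea, not RAGFlow code: `retrieve`, `generate_sql`, and the in-memory `KNOWLEDGE_BASES` are hypothetical stand-ins for the retrieval nodes, the Agent node, and the three knowledge bases, and SQLite stands in for the real database.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for the three knowledge bases.
KNOWLEDGE_BASES = {
    "Schema": ["CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);"],
    "Question to SQL": ["Q: How many users are there? A: SELECT COUNT(*) FROM users"],
    "Database Description": ["### Users Table (users): stores user accounts."],
}


def retrieve(kb_name, query):
    """Placeholder retrieval: returns every chunk of the named knowledge base."""
    return "\n".join(KNOWLEDGE_BASES[kb_name])


def generate_sql(query, context):
    """Placeholder for the Agent node; a real workflow would call an LLM here."""
    return "SELECT COUNT(*) FROM users"


def run_workflow(query, conn):
    """Retrieve from all three knowledge bases in parallel, generate SQL, execute it."""
    names = list(KNOWLEDGE_BASES)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        results = list(pool.map(lambda name: retrieve(name, query), names))
    context = dict(zip(names, results))
    sql = generate_sql(query, context)
    return conn.execute(sql).fetchall()


def demo():
    """Run the sketch against a tiny in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany("INSERT INTO users (username) VALUES (?)", [("ann",), ("bob",)])
    return run_workflow("How many users are there?", conn)
```

The three retrievals are independent of one another, which is why they can run side by side before the single generation step.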

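Building Steps

Create three knowledge bases

1. Prepare the knowledge base files

The dataset for this example can be downloaded from Hugging Face Datasets [Reference 1].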

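Knowledge base	Purpose	Template file	Recommended chunking method
Schema	Stores the database schema definitions	Schema.txt	General; recommended chunk size: 2 tokens; delimiter: `;`
Question to SQL	Stores example question-and-SQL pairs for the model to learn from	Question to SQL.csv	Q&A
Database Description	Stores business descriptions of tables and fields	Database Description EN.txt	General; recommended chunk size: 2 tokens; delimiter: `###`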

Excerpts from the knowledge base files are shown below:

Schema.txt


CREATE TABLE `users` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `username` VARCHAR(50) NOT NULL,
  `password` VARCHAR(50) NOT NULL,
  `email` VARCHAR(100),
  `mobile` VARCHAR(20),
  `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_username` (`username`),
  UNIQUE KEY `uk_email` (`email`),
  UNIQUE KEY `uk_mobile` (`mobile`)
);
...

Note: When defining schema fields, avoid special characters such as underscores; otherwise, the SQL generated by the LLM may contain errors.

Question to SQL.csv

What are the names of all the Cities in Canada
SELECT geo_name, id FROM data_commons_public_data.cybersyn.geo_index WHERE iso_name ilike '%can%

What is average Fertility Rate measure of Canada in 2002 ?
SELECT variable_name, avg(value) as average_fertility_rate FROM data_commons_public_data.cybersyn.timeseries WHERE variable_name = 'Fertility Rate' and geo_id = 'country/CAN' and date >= '2002-01-01' and date < '2003-01-01' GROUP BY 1;

What 5 countries have the highest life expectancy ?
SELECT geo_name, value FROM data_commons_public_data.cybersyn.timeseries join data_commons_public_data.cybersyn.geo_index ON timeseries.geo_id = geo_index.id WHERE variable_name = 'Life Expectancy' and date = '2020-01-01' ORDER BY value desc limit 5;

...

Database Description EN.txt


### Users Table (users)

The users table stores user information for the website or application. Below are the definitions of each column in this table:

- `id`: INTEGER, an auto-incrementing field that uniquely identifies each user (primary key). It automatically increases with every new user added, guaranteeing a distinct ID for every user.
- `username`: VARCHAR, stores the user’s login name; this value is typically the unique identifier used during authentication.
- `password`: VARCHAR, holds the user’s password; for security, the value must be encrypted (hashed) before persistence.
- `email`: VARCHAR, stores the user’s e-mail address; it can serve as an alternate login credential and is used for notifications or password-reset flows.
- `mobile`: VARCHAR, stores the user’s mobile phone number; it can be used for login, receiving SMS notifications, or identity verification.
- `create_time`: TIMESTAMP, records the timestamp when the user account was created; defaults to the current timestamp.
- `update_time`: TIMESTAMP, records the timestamp of the last update to the user’s information; automatically refreshed to the current timestamp on every update.

...

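2. Create the knowledge bases

Schema knowledge base

Create a knowledge base named "Schema" and upload Schema.txt.

The table definitions in the schema vary in length, and each one ends with a `;`: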


CREATE TABLE `users` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `username` VARCHAR(50) NOT NULL,
  `password` VARCHAR(50) NOT NULL,
  ...
  UNIQUE KEY `uk_mobile` (`mobile`)
);

CREATE TABLE `products` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `description` TEXT,
  `price` DECIMAL(10, 2) NOT NULL,
  `stock` INT NOT NULL,
  ...
  FOREIGN KEY (`merchant_id`) REFERENCES `merchants` (`id`)
);

CREATE TABLE `merchants` (
  `id` INT NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(100) NOT NULL,
  `description` TEXT,
  `email` VARCHAR(100),
  ...
  UNIQUE KEY `uk_mobile` (`mobile`)
);

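To make each table exactly one chunk that contains nothing from any other table, configure the knowledge base as follows:

Chunking method: General
Chunk size: 2 tokens
Delimiter: `;`

The General method suits continuous text with a simple structure; see the description of the "General" chunking method in the knowledge base configuration for details. The chunk size is set to 2 because every SQL statement is longer than 2 tokens, so each delimited statement already exceeds the limit and is kept as its own chunk rather than being merged with its neighbors.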
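To see why a chunk size of 2 together with the `;` delimiter yields one chunk per table, here is a simplified model of delimiter-first chunking. It is a sketch under stated assumptions, not RAGFlow's actual implementation: `count_tokens` approximates tokens by whitespace-separated words, and `split_into_chunks` is a hypothetical helper that merges a segment into the current chunk only while that chunk is still within the token budget.

```python
def count_tokens(text):
    """Rough token count (assumption): whitespace-separated words."""
    return len(text.split())


def split_into_chunks(text, delimiter=";", chunk_token_num=2):
    """Split on the delimiter, then merge neighbours only while the
    current chunk is still within the token budget."""
    segments = [s.strip() + delimiter for s in text.split(delimiter) if s.strip()]
    chunks = []
    current = ""
    for segment in segments:
        if current and count_tokens(current) <= chunk_token_num:
            # The current chunk is still small: keep filling it.
            current = current + " " + segment
        else:
            # The current chunk is already over budget: close it, start a new one.
            if current:
                chunks.append(current)
            current = segment
    if current:
        chunks.append(current)
    return chunks


schema_text = (
    "CREATE TABLE a (id INT);"
    "CREATE TABLE b (id INT);"
    "CREATE TABLE c (id INT);"
)
one_per_table = split_into_chunks(schema_text, chunk_token_num=2)
merged = split_into_chunks(schema_text, chunk_token_num=1000)
```

With a budget of 2, every statement is already over the limit when the next one arrives, so `one_per_table` holds three chunks; with a large budget the statements are merged into a single chunk, which is exactly what this configuration is meant to avoid.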

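RAGFlow parses the text into chunks according to the flowchart below.

The parsing result is as follows:

You can also verify the recall results with a retrieval test: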

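Question to SQL knowledge base

Create a knowledge base named "Question to SQL" and upload Question to SQL.csv.

Set the chunking method to Q&A; parsing then produces the following result:

The retrieval test confirms the recall results, as follows: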
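Before uploading a question-and-SQL file, it can help to confirm that every row really splits into exactly one question and one SQL statement. The sketch below uses Python's standard `csv` module and assumes a two-column, quoted, comma-separated layout; the exact layout that the Q&A method expects is not stated here, so `load_qa_pairs` and its `delimiter` parameter are illustrative only.

```python
import csv
import io

# Two sample rows (assumed layout: "question","sql").
SAMPLE_CSV = (
    '"List every product name and its unit price.",'
    '"SELECT name, unit_price FROM Products;"\n'
    '"What 5 countries have the highest life expectancy ?",'
    '"SELECT geo_name, value FROM data_commons_public_data.cybersyn.timeseries '
    'join data_commons_public_data.cybersyn.geo_index '
    'ON timeseries.geo_id = geo_index.id '
    "WHERE variable_name = 'Life Expectancy' and date = '2020-01-01' "
    'ORDER BY value desc limit 5;"\n'
)


def load_qa_pairs(csv_text, delimiter=","):
    """Parse rows into question/SQL pairs, skipping rows that are not exactly two cells."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    pairs = []
    for row in reader:
        if len(row) != 2:
            continue  # malformed row, e.g. an unquoted comma split the SQL
        question, sql = (cell.strip() for cell in row)
        if question and sql:
            pairs.append({"question": question, "sql": sql})
    return pairs


pairs = load_qa_pairs(SAMPLE_CSV)
```

Quoting matters because SQL statements usually contain commas; an unquoted row would split into more than two cells and be skipped by this check.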

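Database Description knowledge base

Create a knowledge base named "Database Description" and upload Database Description EN.txt.

The configuration follows the same approach as the Schema knowledge base:

Chunking method: General
Recommended chunk size: 2 tokens
Delimiter: `###`

Once configured, parse Database Description EN.txt and preview the result.

Verify the recall results with a retrieval test:

Note: The three knowledge bases are maintained and retrieved independently; the Agent node merges the three sets of results before generating SQL.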

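Orchestrate the workflow

1. Create the workflow application

Once the application is created, a Begin node appears on the canvas automatically.

You can configure a greeting in the Begin node. For example: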

Hi! I'm your SQL assistant, what can I do for you?

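2. Configure three knowledge retrieval nodes

Add three parallel knowledge retrieval nodes after the Begin node and name them:

Schema
Question to SQL
Database Description

For each knowledge retrieval node, set the query variable to sys.query and select the knowledge base with the same name as the node.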

3. Configure the Agent node

Add an Agent node after the knowledge retrieval nodes, name it "SQL Generator", and connect all three knowledge retrieval nodes to it.

Write the System Prompt:


### ROLE
You are a Text-to-SQL assistant.
Given a relational database schema and a natural-language request, you must produce a **single, syntactically-correct MySQL query** that answers the request.
Return **nothing except the SQL statement itself**: no code fences, no commentary, no explanations, no comments, no trailing semicolon if not required.

### EXAMPLES
-- Example 1
User: List every product name and its unit price.
SQL:
SELECT name, unit_price FROM Products;

-- Example 2
User: Show the names and emails of customers who placed orders in January 2025.
SQL:
SELECT DISTINCT c.name, c.email
FROM Customers c
JOIN Orders o ON o.customer_id = c.id
WHERE o.order_date BETWEEN '2025-01-01' AND '2025-01-31';

-- Example 3
User: How many orders have a status of "Completed" for each month in 2024?
SQL:
SELECT DATE_FORMAT(order_date, '%Y-%m') AS month,
       COUNT(*) AS completed_orders
FROM Orders
WHERE status = 'Completed'
  AND YEAR(order_date) = 2024
GROUP BY month
ORDER BY month;

-- Example 4
User: Which products generated at least $10 000 in total revenue?
SQL:
SELECT p.id, p.name, SUM(oi.quantity * oi.unit_price) AS revenue
FROM Products p
JOIN OrderItems oi ON oi.product_id = p.id
GROUP BY p.id, p.name
HAVING revenue >= 10000
ORDER BY revenue DESC;

### OUTPUT GUIDELINES
1. Think through the schema and the request.
2. Write **only** the final MySQL query.
3. Do **not** wrap the query in back-ticks or markdown fences.
4. Do **not** add explanations, comments, or additional text, just the SQL.

Write the User Prompt:


User's query: /(Begin Input) sys.query
Schema: /(Schema) formalized_content
Samples about question to SQL: /(Question to SQL) formalized_content
Description about meanings of tables and files: /(Database Description) formalized_content

After the variables are inserted, the prompt looks like this:
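Conceptually, each slash variable in the User Prompt is replaced at run time by the user's question or by the output of the matching retrieval node. The following sketch mimics that substitution with ordinary string formatting; `USER_PROMPT_TEMPLATE` and `build_user_prompt` are hypothetical names used only for illustration, not part of RAGFlow.

```python
# Template mirroring the four lines of the User Prompt; the placeholders
# stand in for the slash variables inserted on the canvas.
USER_PROMPT_TEMPLATE = (
    "User's query: {query}\n"
    "Schema: {schema}\n"
    "Samples about question to SQL: {samples}\n"
    "Description about meanings of tables and files: {description}"
)


def build_user_prompt(query, schema, samples, description):
    """Fill the template with the question and the three retrieval results."""
    return USER_PROMPT_TEMPLATE.format(
        query=query,
        schema=schema,
        samples=samples,
        description=description,
    )


prompt = build_user_prompt(
    query="How many users are there?",
    schema="CREATE TABLE users (id INT, username VARCHAR(50));",
    samples="Q: List all users. A: SELECT * FROM users",
    description="### Users Table (users): stores user accounts.",
)
```

Keeping the three retrieval results under separate labels lets the model tell schema definitions, example queries, and field descriptions apart.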

4. Configure the ExeSQL node

Add an ExeSQL node after SQL Generator and name it "SQL Executor".

Configure the database connection for SQL Executor, and set its Query input to the output of SQL Generator.
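Because the query comes straight from an LLM, one possible precaution is to clean and check it before it reaches the database. The sketch below is not part of the ExeSQL node; it assumes that only a single read-only statement should ever run, uses SQLite as a stand-in for MySQL, and the helper names `clean_sql`, `is_read_only`, and `execute_generated_sql` are hypothetical.

```python
import sqlite3

FENCE = "`" * 3  # markdown code fence, built this way to keep the example readable


def clean_sql(raw):
    """Strip a surrounding markdown fence and a trailing semicolon, if present."""
    text = raw.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines).strip()
    return text.rstrip(";").strip()


def is_read_only(sql):
    """Conservative check: a single statement that starts with SELECT or WITH."""
    if not sql or ";" in sql:
        return False  # empty, or possibly more than one statement
    first_word = sql.split(None, 1)[0].upper()
    return first_word in {"SELECT", "WITH"}


def execute_generated_sql(conn, raw):
    """Run the generated query only if it passes the read-only check."""
    sql = clean_sql(raw)
    if not is_read_only(sql):
        raise ValueError("refusing to run non-read-only SQL: " + sql)
    return conn.execute(sql).fetchall()
```

The semicolon check is deliberately strict and also rejects a harmless query whose string literal contains `;`; a production setup would more likely rely on a database account with read-only privileges.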

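5. Configure the reply message node

Add a reply message node after SQL Executor.

Insert a variable into the message so that the reply message node displays the output of SQL Executor:

/(SQL Executor) formalized_content

6. Save and test

Click Save, then Run, enter a natural-language question, and check the execution result.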

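Finally, like other Copilots today, NL2SQL cannot be 100% accurate. As a standard approach for handling structured data, we recommend narrowing the operations down to a set of APIs, wrapping those APIs as MCP, and having RAGFlow call them. We will demonstrate this approach in a follow-up article.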
