Deep Dive | How Claude Agent Dynamically Loads Skills, Step by Step (Part 2)

Published: 2026-01-26 09:29:05

Author: Mr杂货铺


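This picks up where the previous post left off. I cut corners there and did not go deep enough.

This post takes a closer look at how Claude Agent builds its context and how the Skill-related prompts are injected.

Capturing the request logs

First, we relay the requests through the litellm proxy, which lets us capture everything the Agent sends.

After setting up the proxy config, edit litellm's anthropic endpoints directly so that request_body is appended to a log file: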

# litellm/proxy/anthropic_endpoints/endpoints.py
@router.post(
    "/v1/messages",
    tags=["[beta] Anthropic `/v1/messages`"],
    dependencies=[Depends(user_api_key_auth)],
    include_in_schema=False,
)
async def anthropic_response(  # noqa: PLR0915
    fastapi_response: Response,
    request: Request,
    user_api_key_dict: UserAPIKeyAuth = Depends(user_api_key_auth),
):
    ...
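    request_data = await _read_request_body(request=request)
    # Added for this experiment: append every request body as one JSON line.
    # Needs `import json` at the top of the module if it is not already imported.
    with open("/Users/leemysw/Projects/demo/agent_log.jsonl", 'a', encoding='utf-8') as f:
        f.write(json.dumps(request_data, ensure_ascii=False) + '\n')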
    ...

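As before, create an agent with the feishu-docx skill for testing and ask it "我有什么 skills" ("What skills do I have?").

Log analysis

The backend recorded 5 log entries. A single question produced 5 requests, which is quite far from the usual "one question, one answer" picture. Let's go through them one by one.

Look closely at the system prompt and model fields of these 5 entries: one says "File search specialist", another says "Software architect", and only one actually handles the user's question. Claude Code is running sub-agents with different roles in the background.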
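To make this comparison repeatable, the log can be reduced to a few fields per request. The following is a minimal sketch, assuming the JSONL file written by the patch above (one request body per line); the helper names `summarize_request` and `summarize_log` are made up for illustration, and the role is assumed to sit in the last system block, as it does in the entries shown in this post.

```python
import json
from pathlib import Path


def first_text(content):
    """Return a message's text whether content is a string or a list of blocks."""
    if isinstance(content, str):
        return content
    for block in content or []:
        if isinstance(block, dict) and block.get("type") == "text":
            return block.get("text", "")
    return ""


def summarize_request(body):
    """Reduce one logged request body to the fields compared in this post."""
    system = body.get("system")
    # Assumption: the role description is the last system block.
    if isinstance(system, list) and system:
        role = first_text([system[-1]])
    else:
        role = system or ""
    messages = body.get("messages") or []
    user_text = first_text(messages[0].get("content")) if messages else ""
    return {
        "model": body.get("model"),
        "max_tokens": body.get("max_tokens"),
        "role": role[:60],
        "user_text": user_text[:40],
        "tool_count": len(body.get("tools") or []),
    }


def summarize_log(path):
    """Read a JSONL log (one request body per line) and summarize every entry."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [summarize_request(json.loads(line)) for line in lines if line.strip()]
```

Printing the result of `summarize_log(...)` for the log file gives one short row per request, which is enough to tell the probe, the warm-ups and the real user request apart.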

Request 1: token count probe

{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 1,
  "messages": [{"role": "user", "content": "count"}],
  "tools": [/* full list of tool definitions */]
}

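max_tokens is set to 1, which leaves the model essentially no room to answer. This request is not a real question; it exists to count tokens. The Claude API response includes an input_tokens figure in its usage data, so a tiny request like this reveals how many tokens the system prompt and tool definitions take up. Before doing any work, the agent finds out how much of its context is left.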
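The probe idea can be sketched in a few lines. This is an illustration, not Claude Code's actual implementation: `build_probe`, `input_tokens` and `remaining_context` are hypothetical helpers, the response is assumed to be a Messages API response already parsed into a dict with a `usage.input_tokens` field, and the context window size is something the caller has to supply.

```python
def build_probe(model, tools, system=None):
    """Build a minimal request whose only purpose is to read back input token usage."""
    body = {
        "model": model,
        "max_tokens": 1,  # the reply is irrelevant, so cap it at one token
        "messages": [{"role": "user", "content": "count"}],
        "tools": tools,
    }
    if system is not None:
        body["system"] = system
    return body


def input_tokens(response):
    """Read usage.input_tokens from a response parsed into a dict."""
    return response["usage"]["input_tokens"]


def remaining_context(response, context_window):
    """Tokens left after the fixed overhead; context_window is supplied by the caller."""
    return context_window - input_tokens(response)
```

Sending `build_probe(...)` through any HTTP client and passing the parsed response to `remaining_context` gives the space left for the conversation itself.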

Request 2: Explore Agent warm-up

{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 32000,
  "messages": [{"role": "user", "content": [{"type": "text", "text": "Warmup", "cache_control": {"type": "ephemeral"}}]}],
  "system": [
    {"type": "text", "text": "You are Claude Code, Anthropic's official CLI for Claude."},
    {"type": "text", "text": "You are a file search specialist for Claude Code..."}
  ],
  "tools": []
}

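This is the warm-up for the file search specialist (Explore Agent).

• It uses haiku: cheap and fast, since a sub-agent does not need a very strong model
• tools is empty; the warm-up carries no tools
• cache_control is ephemeral, part of Anthropic's prompt caching, so later requests with the same prefix can reuse the cached content
• max_tokens is 32000

Request 3: Plan Agent warm-up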

{
  "model": "glm-4.7",
  "max_tokens": 32000,
  "messages": [{"role": "user", "content": [{"type": "text", "text": "Warmup"}]}],
  "system": [
    {"type": "text", "text": "You are Claude Code, Anthropic's official CLI for Claude."},
    {"type": "text", "text": "You are a software architect and planning specialist for Claude Code..."}
  ],
  "tools": []
}

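This is the warm-up for the software architect (Plan Agent). The structure matches request 2, but the model is glm-4.7 (a third-party model I configured manually). haiku is enough for searching files, while architecture planning calls for a stronger model.

Request 4: the real user request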

{
  "model": "glm-4.7",
  "max_tokens": 32000,
  "messages": [{"role": "user", "content": [{"type": "text", "text": "我有什么skills"}]}],
  "system": [{"type": "text", "text": "You are a Claude agent, built on Anthropic's Claude Agent SDK."}],
  "tools": [/* Task, Bash, Glob, Grep, Read, Edit, Write, Skill, ... */]
}

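This is the request that actually handles the user's input.

• The system prompt is different: the main agent calls itself "Claude agent", while the sub-agents call themselves "Claude Code"
• The tools list holds 18 tools by default, which costs a lot of tokens
• The Skill tool's description carries available_skills, the injection point described in the previous post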
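The last point can be checked directly against a logged request. The sketch below assumes each skill is listed as a `<skill>` entry with `<name>`, `<description>` and `<location>` children, which is the layout seen in this log; `parse_available_skills` and `skills_in_request` are hypothetical helper names, and the regex is only as robust as that assumption.

```python
import re

# One <skill> entry with its three child elements, tolerant of surrounding whitespace.
SKILL_RE = re.compile(
    r"<skill>\s*<name>\s*(?P<name>.*?)\s*</name>\s*"
    r"<description>\s*(?P<description>.*?)\s*</description>\s*"
    r"<location>\s*(?P<location>.*?)\s*</location>\s*</skill>",
    re.DOTALL,
)


def parse_available_skills(tool_description):
    """Extract name/description/location for every <skill> entry in the text."""
    return [match.groupdict() for match in SKILL_RE.finditer(tool_description)]


def skills_in_request(body):
    """Find the tool named "Skill" in a request body and list the skills it advertises."""
    for tool in body.get("tools") or []:
        if tool.get("name") == "Skill":
            return parse_available_skills(tool.get("description", ""))
    return []
```

Running `skills_in_request` on request 4 should list the feishu-docx skill if the description follows the layout assumed here.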

Request 5: Explore Agent warm-up, again

{
  "model": "claude-haiku-4-5-20251001",
  "max_tokens": 21333,
  "messages": [{"role": "user", "content": [{"type": "text", "text": "Warmup"}]}],
  "system": [/* same as request 2 */],
  "tools": [],
  "temperature": 1
}

Another warm-up? A closer look shows the parameters are not quite the same:

Parameter	Request 2	Request 5
max_tokens	32000	21333
temperature	(not set)	1

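I am not sure what this one is for. It may be a side effect of my modified default configuration; perhaps the warm-ups were originally meant for haiku, sonnet and opus? If you know, please tell me.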
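Differences like these are easy to miss by eye, so it helps to diff the request bodies mechanically. A minimal sketch, where `diff_params` is a hypothetical helper that compares only top-level keys and skips the bulky fields by default:

```python
ABSENT = "<absent>"  # placeholder for a key that one request does not have


def diff_params(a, b, ignore=("messages", "system", "tools")):
    """Return {key: (value_in_a, value_in_b)} for top-level keys that differ."""
    changed = {}
    for key in sorted(set(a) | set(b)):
        if key in ignore:
            continue
        left, right = a.get(key, ABSENT), b.get(key, ABSENT)
        if left != right:
            changed[key] = (left, right)
    return changed
```

Applied to requests 2 and 5 this reports max_tokens and temperature. Passing `ignore=()` would also flag messages, because in the entries shown above the Warmup block of request 2 carries cache_control and the one in request 5 does not.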

Prompt structure of the Skill tool

Now for the part we care about most: what the Skill tool's prompt looks like.

The full definition of the Skill tool can be found in request 4:

{
  "name": "Skill",
  "description": "Execute a skill within the main conversation\n\n<skills_instructions>\nWhen users ask you to perform tasks, check if any of the available skills below can help complete the task more effectively. Skills provide specialized capabilities and domain knowledge.\n\nHow to invoke:\n- Use this tool with the skill name only (no arguments)\n- Examples:\n  - `skill: \"pdf\"` - invoke the pdf skill\n  - `skill: \"xlsx\"` - invoke the xlsx skill\n  - `skill: \"ms-office-suite:pdf\"` - invoke using fully qualified name\n\nImportant:\n- When a skill is relevant, you must invoke this tool IMMEDIATELY as your first action\n- NEVER just announce or mention a skill in your text response without actually calling this tool\n- This is a BLOCKING REQUIREMENT: invoke the relevant Skill tool BEFORE generating any other response about the task\n- Only use skills listed in <available_skills> below\n- Do not invoke a skill that is already running\n- Do not use this tool for built-in CLI commands (like /help, /clear, etc.)\n</skills_instructions>\n\n<available_skills>\n<skill>\n<name>\nfeishu-docx\n</name>\n<description>\nExport Feishu/Lark cloud documents to Markdown. Supports docx, sheets, bitable, and wiki. Use this skill when you need to read, analyze, or reference content from Feishu knowledge base. (project)\n</description>\n<location>\nmanaged\n</location>\n</skill>\n</available_skills>\n",
  "input_schema": {
    "type": "object",
    "properties": {
      "skill": {
        "type": "string",
        "description": "The skill name (no arguments). E.g., \"pdf\" or \"xlsx\""
      }
    },
    "required": ["skill"]
  }
}

Pulling out the description and formatting it:

Execute a skill within the main conversation

<skills_instructions>
When users ask you to perform tasks, check if any of the available skills below can help complete the task more effectively. Skills provide specialized capabilities and domain knowledge.

How to invoke:
- Use this tool with the skill name only (no arguments)
- Examples:
  - `skill: "pdf"` - invoke the pdf skill
  - `skill: "xlsx"` - invoke the xlsx skill
  - `skill: "ms-office-suite:pdf"` - invoke using fully qualified name

Important:
- When a skill is relevant, you must invoke this tool IMMEDIATELY as your first action
- NEVER just announce or mention a skill in your text response without actually calling this tool
- This is a BLOCKING REQUIREMENT: invoke the relevant Skill tool BEFORE generating any other response about the task
- Only use skills listed in <available_skills> below
- Do not invoke a skill that is already running
- Do not use this tool for built-in CLI commands (like /help, /clear, etc.)
</skills_instructions>

<available_skills>
<skill>
<name>feishu-docx</name>
<description>Export Feishu/Lark cloud documents to Markdown. Supports docx, sheets, bitable, and wiki. Use this skill when you need to read, analyze, or reference content from Feishu knowledge base. (project)</description>
<location>managed</location>
</skill>
</available_skills>

The key points, in brief:

• Whenever the user asks for a task, the model should check whether one of the listed Skills can complete it more effectively
• The tool is called with the skill name only, either short ("pdf") or fully qualified ("ms-office-suite:pdf")
• If a Skill is relevant, calling the tool must be the first action, before any other response about the task; merely mentioning the Skill in text is not allowed
• Only Skills listed in <available_skills> may be used, a Skill that is already running must not be invoked again, and the tool is not for built-in CLI commands such as /help or /clear

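A few design ideas can be read from this prompt.

Forced triggering

The prompt repeatedly stresses "IMMEDIATELY", "BLOCKING REQUIREMENT" and "BEFORE generating any other response", using forceful wording to constrain the model's behavior.

Metadata embedded in the tool description

The <available_skills> block is placed in full inside the Skill tool's description. When a user request arrives, the model can read straight from the tool description which skills are available and what each one does. Each skill has three fields: name, description and location.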

A minimal input_schema

{
  "type": "object",
  "properties": {
    "skill": {
      "type": "string",
      "description": "The skill name (no arguments). E.g., \"pdf\" or \"xlsx\""
    }
  },
  "required": ["skill"]
}

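The Skill tool takes a single parameter: the skill name. This suggests the tool's job is only to trigger the injection of the skill's content, not to carry execution parameters. The actual execution logic and parameter passing happen in later tool calls.

This fits the single-responsibility principle well. The Skill tool does one thing: inject the content of SKILL.md into the context. What the user wants to do with the skill is left to the model, which reads the skill content and then decides how to call other tools.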
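To make the single-responsibility idea concrete, here is a rough sketch of what a handler for such a tool could look like. It is not the SDK's actual implementation: `run_skill_tool`, the `<skills_root>/<name>/SKILL.md` directory layout, the shape of the returned dict and the way a qualified name is resolved are all assumptions made for illustration.

```python
from pathlib import Path


def run_skill_tool(tool_input, skills_root):
    """Handle a Skill tool call by returning the skill's SKILL.md as the tool result."""
    name = tool_input["skill"]
    # Assumption: a qualified name such as "plugin:skill" resolves by its last segment.
    short = name.split(":")[-1]
    # Refuse anything that could escape the skills directory.
    if not short or "/" in short or "\\" in short or short in (".", ".."):
        return {"is_error": True, "content": f"Invalid skill name: {name}"}
    # Assumed layout: <skills_root>/<skill name>/SKILL.md
    skill_file = Path(skills_root) / short / "SKILL.md"
    if not skill_file.is_file():
        return {"is_error": True, "content": f"Unknown skill: {name}"}
    return {"is_error": False, "content": skill_file.read_text(encoding="utf-8")}
```

The handler takes no task arguments at all: it only turns a name into text for the context, and everything after that is up to the model and its other tools.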

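Takeaways

Token consumption is higher than expected

Claude Code is fairly token-hungry; one task I ran earlier consumed over a million tokens. The logs show that the tools definitions alone are very long, with a detailed description for every tool. In practice, you can drop tools you do not need.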
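One way to decide which tools to drop is to rank them by how much space their definitions take in a logged request. The sketch below uses serialized character length as a rough stand-in for token count, which is an approximation; `tool_sizes` and `keep_tools` are hypothetical helpers operating on a request body dict.

```python
import json


def tool_sizes(body):
    """Return (tool name, serialized size in characters) pairs, largest first."""
    sizes = [
        (tool.get("name", "?"), len(json.dumps(tool, ensure_ascii=False)))
        for tool in body.get("tools") or []
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)


def keep_tools(body, allowed):
    """Return a copy of the request body that keeps only the tools named in allowed."""
    trimmed = dict(body)
    trimmed["tools"] = [t for t in body.get("tools") or [] if t.get("name") in allowed]
    return trimmed
```

`tool_sizes` on request 4 shows which definitions dominate, and `keep_tools` illustrates the effect of an allowlist on the payload.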

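The cost of warm-up

Warming up multiple sub-agents improves response speed, but it also adds overhead. Every time the Agent starts, it fires warm-up requests for several sub-agents, even if the conversation never needs file search or architecture planning.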
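The share of warm-up traffic can be measured from the log. A small sketch, assuming a warm-up is recognizable by a first user message whose only text is "Warmup", as in the entries shown in this post; `is_warmup` and `warmup_share` are hypothetical names.

```python
def is_warmup(body):
    """True if the first user message consists of exactly the text "Warmup"."""
    messages = body.get("messages") or []
    if not messages:
        return False
    content = messages[0].get("content")
    if isinstance(content, str):
        return content == "Warmup"
    texts = [
        block.get("text")
        for block in content or []
        if isinstance(block, dict) and block.get("type") == "text"
    ]
    return texts == ["Warmup"]


def warmup_share(bodies):
    """Return (number of warm-up requests, total number of requests)."""
    bodies = list(bodies)
    return sum(is_warmup(body) for body in bodies), len(bodies)
```

For the five requests analyzed here this gives 3 warm-ups out of 5.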

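Using prompt caching

cache_control: {"type": "ephemeral"} appears frequently in the logs. This is Anthropic's prompt caching feature: by caching the system prompt and tool definitions, later requests can reuse them, cutting latency and cost.

If you call the Claude API from your own system, consider enabling prompt caching too, especially when your system prompt or tool definitions are long.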
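As a sketch of what enabling it can look like at the payload level, the helper below marks the last system block and the last tool definition with a cache_control entry, so the content up to each marker becomes cacheable. `with_prompt_caching` is a hypothetical helper, and the placement of the markers is one reasonable choice rather than the only one; minimum cacheable lengths and other limits are described in the official documentation.

```python
def with_prompt_caching(body):
    """Return a copy of a Messages API request body with cache markers added."""
    cached = dict(body)

    system = body.get("system")
    if isinstance(system, str):
        # Caching needs the block form, so wrap a plain string system prompt.
        system = [{"type": "text", "text": system}]
    if system:
        system = [dict(block) for block in system]
        system[-1]["cache_control"] = {"type": "ephemeral"}  # marks the end of the system prompt
        cached["system"] = system

    tools = [dict(tool) for tool in body.get("tools") or []]
    if tools:
        tools[-1]["cache_control"] = {"type": "ephemeral"}  # marks the end of the tool definitions
        cached["tools"] = tools
    return cached
```

The function copies the blocks it touches, so the original request body is left unchanged and can still be sent without caching for comparison.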

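Different models for different sub-agents

The logs show the Explore Agent on haiku and the Plan Agent on glm-4.7. Differentiated configuration like this is a good way to control cost and play to each model's strengths:

• Simple tasks (file search) go to a cheap, small model
• Complex tasks (architecture planning) go to a stronger, larger model
• Pick the model that fits the task instead of using one model for everything

The same idea is worth borrowing when you design your own multi-agent system.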
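In your own system this can start as a plain lookup table. The sketch below reuses the two model names seen in the logs; the role keys "explore" and "plan", the default, and `pick_model` itself are illustrative assumptions rather than anything taken from Claude Code.

```python
# Role -> model mapping; model names are the ones observed in the request logs.
MODEL_BY_ROLE = {
    "explore": "claude-haiku-4-5-20251001",  # file search: cheap and fast
    "plan": "glm-4.7",  # architecture planning: stronger model
}
DEFAULT_MODEL = "glm-4.7"  # assumed fallback for roles without an explicit entry


def pick_model(role):
    """Return the model configured for a sub-agent role, or the default."""
    return MODEL_BY_ROLE.get(role, DEFAULT_MODEL)
```

Keeping the mapping in one place makes it easy to swap a model for a single role and compare cost and quality.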
