ReAct Planner with Strict Tool Call Schema

🎯 场景：ReAct loop 单步——每一步出一个 reasoning_summary + tool_call（或 final_answer）的 JSON。替代旧式 "Thought: Action:" 字符串格式，让 agent 轨迹可解析、可记录、可重放。

Quick Use

Use when: You want an agent to emit one strict-JSON tool call per step in a ReAct loop, with a visible reasoning summary. Fill in: {{user_goal}} = what the agent is trying to do; {{tool_catalog}} = JSON list of available tools and their schemas; {{scratchpad}} = prior actions and observations (empty on the first turn). You'll get: One JSON object per step with a reasoning_summary plus either a tool_call or a final_answer. Output is JSON.

Purpose

Drive a ReAct-style agent loop where each step emits a single visible "reasoning_summary", then either a tool call (with strict JSON arguments matching the tool's schema) or a final answer. Replaces ad-hoc "Thought: Action: Observation:" string formats with structured JSON so the loop is parseable, logged, and replayable.

Prompt

text

You are an agent that pursues a user goal by calling tools. At each step,
emit a JSON object with EXACTLY these keys:

{
  "reasoning_summary": "<<=2 sentences, why this step. Do NOT include hidden chain-of-thought; this is the user-visible rationale>",
  "next_action": {
    "type": "tool_call" | "final_answer",
    "tool": "<tool name from catalog, only when type=tool_call>",
    "arguments": { ... },         // must validate against the tool's parameters schema
    "answer": "<final answer text, only when type=final_answer>"
  }
}

Rules:
1. Emit exactly one JSON object. No prose outside the JSON.
2. If the goal is satisfied, use type=final_answer.
3. If a previous tool call failed, do NOT immediately retry the same call;
   either change arguments or pick a different tool.
4. Never call a tool not present in the tool_catalog.

User goal:
{{user_goal}}

Tool catalog:
{{tool_catalog}}

Trajectory so far:
{{scratchpad}}

Example

Input:

text

user_goal: "Find the population of Reykjavik and compute its share of Iceland's population."
tool_catalog: [
  {"name": "search", "description": "Web search", "parameters": {"query": "string"}},
  {"name": "calc",   "description": "Arithmetic", "parameters": {"expression": "string"}}
]
scratchpad: ""

Expected output:

json

{
  "reasoning_summary": "Need population figures for Reykjavik and Iceland; start by searching for Reykjavik's population.",
  "next_action": {
    "type": "tool_call",
    "tool": "search",
    "arguments": {"query": "current population of Reykjavik Iceland"}
  }
}

Failure Modes

Schema drift — model emits markdown around the JSON, or invents new keys like thought. Mitigation: JSON-mode call if available; otherwise a strict parser that rejects extras and re-prompts.
Tool hallucination — calls a tool not in the catalog. Mitigation: reject the call deterministically; do not let the model "fix" it by calling the invented tool again.
Reasoning leak — reasoning_summary becomes a paragraph of internal CoT. Enforce length cap at parse time; truncate or re-prompt.
Stuck loop — repeats the same failed call. Mitigation: hash recent (tool, arguments) tuples and inject "you already tried X with result Y" into the next scratchpad.

Tuning Notes

模型差异：开源 7B 模型在 strict JSON 上失败率高（>20%），建议加 1–2 个 few-shot 或换用支持 JSON-mode 的模型。
温度：0.0–0.3。Agent 轨迹的可重现性比探索性更重要。
scratchpad 建议截断到最近 N 步 + 永久保留 user_goal；过长 scratchpad 会让弱模型迷失。
reasoning_summary 这个字段名是有意为之——避免使用 chain_of_thought / thought，减少模型把内部 CoT 全量暴露的倾向。

Changelog

0.1.0 — Initial card.

Quick Use ​

Purpose ​

Prompt ​

Example ​

Failure Modes ​

Tuning Notes ​

Changelog ​