# Multi-Agent Management Architecture

**Version:** 2.0
**Date:** 2026-03-06
**Maintainer:** Eason

**Current state:** Main (陈医生) is the only active agent. The life agent (张大师) has been removed. All agents are defined in `agents.yaml`.

> **For Main Agent (陈医生):** You are the Hub Agent. This document is both an architecture reference and your operations manual.
> When the user asks you to create, maintain, troubleshoot, or remove an agent, jump to the matching Playbook section (11-14) and follow its steps.

---

## 1. Hub-and-Spoke Model

Main agent acts as the **memory hub** -- responsible for publishing shared knowledge,
maintaining the project registry, and onboarding new agents. All other agents (local or
remote) are **spokes** that consume shared memory and contribute their own private/project
memories.

```
Main Agent (Hub) - defined in agents.yaml
  |-- publish_knowledge()           --> Qdrant mem0_v4_shared (visibility=public)
  |-- publish_knowledge(project_id=X) --> (visibility=project)
  |-- maintain project_registry.yaml
  |-- maintain docs & best practices
  |
  +-- Local Spokes (same server, same Qdrant)
  |     |-- local-cli: main (openclaw gateway)
  |     |-- local-systemd: <agent_id> (port 187XX)
  |
  +-- Remote Spokes (Tailscale VPN -> Qdrant)
        +-- remote-http: <agent_id> (health via HTTP)
```

---

## 2. Memory Visibility Model

All agents share one Qdrant collection: `mem0_v4_shared`.
Isolation is achieved through metadata fields.

| Visibility | Who can read | Metadata filter |
|-----------|-------------|-----------------|
| public | All agents | `visibility=public` |
| project | Same project members | `visibility=project, project_id=X` |
| private | Only the writing agent | `visibility=private, agent_id=X` |

Project membership is defined in `skills/mem0-integration/project_registry.yaml`.
Main agent is registered as member of all projects for audit access.

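The visibility table can be read as a set of metadata filters that an agent is allowed to apply when it queries `mem0_v4_shared`. The sketch below only illustrates that reading; it is not the logic in `mem0_client.py`. The function names `readable_filters` and `matches` are hypothetical, and representing a filter as a flat dict of metadata fields is an assumption.

```python
from typing import Any


def readable_filters(agent_id: str, project_ids: list[str]) -> list[dict[str, Any]]:
    """Return one metadata filter per slice of the collection the agent may read."""
    filters: list[dict[str, Any]] = [{"visibility": "public"}]
    for project_id in project_ids:
        filters.append({"visibility": "project", "project_id": project_id})
    filters.append({"visibility": "private", "agent_id": agent_id})
    return filters


def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """True when every field named in the filter equals the memory's metadata."""
    return all(metadata.get(key) == value for key, value in flt.items())


# Hypothetical example: agent "crypto" belongs only to project "crypto".
filters = readable_filters("crypto", ["crypto"])
memory = {"visibility": "project", "project_id": "advert", "agent_id": "main"}
print(any(matches(memory, f) for f in filters))  # False: crypto is not in advert
```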

---

## 3. Agent Registry (agents.yaml)

**Path:** `/root/.openclaw/workspace/agents.yaml`

This file is the **single source of truth** for all agent definitions. All tooling reads from it dynamically:

| Consumer | Purpose |
|----------|---------|
| `deploy.sh` | Service management (start/stop/debug/fix) |
| `agent-monitor.js` | Health monitoring |
| `local_search.py` | Agent lookup for search |
| `memory_cleanup.py` | Agent-aware cleanup |
| `onboard.sh` / `offboard.sh` | Add/remove agents |

**Helper script:** `scripts/parse_agents.py` parses agents.yaml for bash/JS:

```bash
python3 scripts/parse_agents.py list          # list agent IDs
python3 scripts/parse_agents.py info <id>     # get agent info as KEY=VALUE (shell-safe quoted)
python3 scripts/parse_agents.py services      # list all agents with service details (tab-separated)
python3 scripts/parse_agents.py ids           # space-separated agent IDs (for bash loops)
```

> **Note:** The `info` subcommand outputs single-quoted values (`KEY='value'`) that are safe
> for `eval` in bash, even when values contain spaces, CJK characters, or special shell
> metacharacters. The `services` subcommand uses tab (`\t`) as the delimiter to avoid
> collisions with `|` or spaces in command strings.

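To see why single-quoted `KEY='value'` output survives `eval`, here is a minimal sketch of that quoting scheme. It is an illustration under stated assumptions, not the code of `scripts/parse_agents.py`: the helper names `sh_single_quote`, `format_info`, and `parse_info` are hypothetical. The standard-library `shlex.split` is used only to check that a POSIX shell would read the values back unchanged.

```python
import shlex


def sh_single_quote(value: str) -> str:
    """Wrap a value in single quotes; an embedded ' is emitted as '"'"'."""
    return "'" + value.replace("'", "'\"'\"'") + "'"


def format_info(fields: dict[str, str]) -> str:
    """Render fields as KEY='value' lines, one per field."""
    return "\n".join(f"{key}={sh_single_quote(str(value))}" for key, value in fields.items())


def parse_info(text: str) -> dict[str, str]:
    """Split the lines with POSIX shell rules and recover the original values."""
    result: dict[str, str] = {}
    for token in shlex.split(text):
        key, _, value = token.partition("=")
        result[key] = value
    return result


# Hypothetical field names; the real keys depend on agents.yaml.
fields = {"AGENT_ID": "crypto", "AGENT_NAME": "加密 分析师's $HOME | x"}
assert parse_info(format_info(fields)) == fields
```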

**Agent types supported:**

| Type | Description |
|------|-------------|
| `local-cli` | Managed via `openclaw gateway` CLI (main agent) |
| `local-systemd` | Managed via user-level systemd unit |
| `remote-http` | Remote agent checked via HTTP health endpoint |

---

## 4. Agent Lifecycle

### 4.1 Onboard (create)

```bash
cd /root/.openclaw/workspace/templates
./onboard.sh <agent_id> <agent_name> <project_id> [qdrant_host]
```

**Fully automated.** This script:

1. Creates workspace at `agents/<agent_id>-workspace/` (IDENTITY.md, SOUL.md, mem0 config)
2. Registers the agent in `agents.yaml`
3. Registers in `project_registry.yaml`
4. For local agents: generates systemd service + env file, installs, enables
5. Reloads `openclaw-agent-monitor` so it picks up the new agent

**Examples:**

```bash
./onboard.sh crypto "CryptoBot" crypto                    # local agent
./onboard.sh remote1 "RemoteBot" advert 100.115.94.1      # remote agent
```

**Remaining manual steps (local-systemd):** Edit IDENTITY.md, create `~/.openclaw-<agent_id>/openclaw.json`, then start the service.

### 4.2 Offboard (retire)

```bash
cd /root/.openclaw/workspace/templates
./offboard.sh <agent_id> [--keep-data]
```

**Options:**

- (default) Full removal: stops service, removes from agents.yaml and project_registry, deletes workspace, profile, and Qdrant memories
- `--keep-data` Unregister only: keeps workspace and profile files

**Examples:**

```bash
./offboard.sh crypto               # full removal
./offboard.sh crypto --keep-data   # keep files, just unregister
```

The main (hub) agent cannot be offboarded.

---

## 5. Knowledge Publishing

Main agent can publish best practices and shared knowledge to Qdrant:

**Via Python:**

```python
import asyncio

from mem0_client import mem0_client


async def main():
    # start() and publish_knowledge() are awaited, so they must run inside a coroutine.
    await mem0_client.start()
    await mem0_client.publish_knowledge(
        content="Always use EnvironmentFile= in systemd services for upgrade safety",
        category="knowledge",
        visibility="public",
    )


asyncio.run(main())
```

**Via CLI:**

```bash
python3 mem0_integration.py publish '{"content":"...", "visibility":"public"}'
```

**Via Node.js plugin (index.js):**

The `publish` action is available through the same spawn interface used by `search` and `add`.

### Visibility Guidelines

| Content type | Visibility | Example |
|-------------|-----------|---------|
| System best practices | public | "Use deploy.sh fix-service after upgrades" |
| Project-specific knowledge | project | "{agent_id} uses Google Calendar API" |
| User preferences | private | "User prefers dark mode" |
| API keys, secrets | NEVER store | Use environment variables |

---

## 6. Cold Start Preload

When a new session starts, `session_init.py` calls `cold_start_search()` which
retrieves memories in three phases:

1. **Phase 0 (public)**: Best practices, shared config -- available to all agents
2. **Phase 1 (project)**: Project-specific guidelines -- based on agent's project membership
3. **Phase 2 (private)**: Agent's own recent context

Results are deduplicated, ordered by phase priority, and injected into the System Prompt.

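The merge step described above (deduplicate, then order by phase priority) could look roughly like the sketch below. This is not the body of `cold_start_search()`. The function name `merge_phases` is hypothetical, and it assumes each result is a dict carrying a unique `id` field.

```python
from typing import Iterable


def merge_phases(
    public: Iterable[dict],
    project: Iterable[dict],
    private: Iterable[dict],
) -> list[dict]:
    """Merge the three phases in priority order, keeping the first copy of each id."""
    merged: list[dict] = []
    seen: set = set()
    # Phase 0 = public, 1 = project, 2 = private; earlier phases win on duplicates.
    for phase, results in enumerate((public, project, private)):
        for memory in results:
            key = memory["id"]  # assumed unique identifier of a memory
            if key in seen:
                continue
            seen.add(key)
            merged.append({**memory, "phase": phase})
    return merged
```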

---

## 7. Local Agent Configuration

Local agents run on the same server and connect to Qdrant at `localhost:6333`.

Key configuration points:
- `openclaw.json`: `collection_name: "mem0_v4_shared"` (NOT agent-specific collections)
- `systemd/<agent_id>-gateway.env`: contains `MEM0_DASHSCOPE_API_KEY`
- `EnvironmentFile=` in the service unit references the env file

---

## 8. Remote Agent Configuration

Remote agents run on different servers and connect to Qdrant via Tailscale.

### Prerequisites

1. Tailscale installed and joined to the same tailnet on both servers
2. Qdrant accessible at the hub server's Tailscale IP (e.g., `100.115.94.1:6333`)
3. Tailscale ACL allows the remote server to access port 6333

### Environment File

```
MEM0_QDRANT_HOST=100.115.94.1
MEM0_DASHSCOPE_API_KEY=sk-...
OPENAI_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
```

### Onboarding

```bash
./onboard.sh remote1 "RemoteBot" advert 100.115.94.1
```

The 4th argument sets `MEM0_QDRANT_HOST` in the generated env file. The agent is automatically added to `agents.yaml` and the monitor picks it up on reload.

### Monitoring

The monitor reads from `agents.yaml` dynamically. Remote agents (type `remote-http`) are checked via their `health_url`. Remote agents cannot be auto-started from the hub; the monitor will only alert on failure.

---

## 9. Agent Monitor Service Hardening

The `openclaw-agent-monitor.service` runs as a system-level systemd service with the following security constraints:

| Directive | Value | Purpose |
|-----------|-------|---------|
| `ProtectSystem` | `strict` | Mounts entire filesystem read-only |
| `ProtectHome` | `read-only` | Home directory is read-only |
| `ReadWritePaths` | `/root/.openclaw/workspace/logs /run/user/0` | Whitelist for writes: log output + D-Bus for `systemctl --user` |
| `NoNewPrivileges` | `true` | Cannot gain new privileges |
| `MemoryMax` | `512M` | OOM guard |
| `CPUQuota` | `20%` | Prevent monitor from starving other processes |

**Why `/run/user/0`?** The monitor uses `systemctl --user start/stop` to manage gateway processes, which requires D-Bus access at the user runtime directory. Without this path whitelisted, `ProtectSystem=strict` would block the D-Bus socket and prevent auto-restart.

**Initialization order in `agent-monitor.js`:**

1. `loadConfig()` -- read `openclaw.json`
2. `ensureLogDir()` -- create log directory (must happen before any `this.log()` calls)
3. `loadMonitoredServices()` -- parse `agents.yaml` (may log errors on failure)
4. Signal handlers + start monitoring loop

---

## 10. File Reference

| File | Purpose |
|------|---------|
| `agents.yaml` | Single source of truth for agent registry |
| `scripts/parse_agents.py` | Parses agents.yaml for bash/JS consumers |
| `skills/mem0-integration/mem0_client.py` | Core client: search, write, publish, cold_start |
| `skills/mem0-integration/mem0_integration.py` | CLI interface: init, search, add, publish, cold_start |
| `skills/mem0-integration/session_init.py` | Three-phase cold start hook |
| `skills/mem0-integration/project_registry.yaml` | Agent-to-project membership |
| `templates/onboard.sh` | Automated agent onboarding (adds to agents.yaml, installs service, reloads monitor) |
| `templates/offboard.sh` | Clean one-command agent removal |
| `templates/agent-workspace/` | Workspace file templates |
| `templates/systemd/` | Service and env file templates |
| `agent-monitor.js` | Config-driven health monitor (reads agents.yaml) |
| `deploy.sh` | Service management (reads agents.yaml) |
| `docs/EXTENSIONS_ARCHITECTURE.md` | Systemd, monitor, upgrade safety |
| `docs/MEMORY_ARCHITECTURE.md` | Four-layer memory system detail |

---

# PART B: Operational Playbooks (Operations Manual for the Main Agent)

> **Sections 11-14 below are step-by-step guides for the Main Agent (陈医生) to follow when carrying out operations in conversation.**
> When the user says "help me create a new agent", "check agent status", "clean up memories", or "remove an agent",
> follow the matching section. Each step is marked as either a question to ask the user (🗣️) or an action you perform yourself (🔧).

---

## 11. Playbook: Interactive Onboarding (Create a New Agent)

When the user says "I want to create a new agent" or expresses a similar intent, follow the flow below.

### 11.1 Information Gathering (🗣️ ask the user step by step)

Collect the information in the order below. Ask only 1-2 questions per round; do not list everything at once.

**Round 1: Basic identity**

```
To collect:
1. agent_id    : lowercase English identifier, no spaces (e.g. crypto, hr_bot, advert_pm)
2. agent_name  : display name, may be Chinese (e.g. "加密分析师", "HR助手")
```

Sample question: "What ID should the new agent use? (lowercase English, e.g. crypto) And what should its display name be?"

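Before moving on to Round 2, the agent_id rule from Round 1 can be checked mechanically. The sketch below is one possible reading of "lowercase English identifier, no spaces". The exact pattern (a lowercase letter followed by lowercase letters, digits, or underscores), the name `is_valid_agent_id`, and the duplicate check against already registered IDs are all assumptions, not rules enforced by `onboard.sh`.

```python
import re

# Assumed pattern: starts with a lowercase letter, then lowercase letters, digits, or "_".
AGENT_ID_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_agent_id(agent_id: str, existing_ids: set[str]) -> bool:
    """Accept IDs such as crypto, hr_bot, advert_pm that are not registered yet."""
    if not AGENT_ID_RE.fullmatch(agent_id):
        return False
    return agent_id not in existing_ids


print(is_valid_agent_id("hr_bot", {"main"}))  # True
print(is_valid_agent_id("HR Bot", {"main"}))  # False: uppercase and a space
print(is_valid_agent_id("main", {"main"}))    # False: already registered
```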

**Round 2: Role definition**

```
To collect:
3. role        : one-sentence role description (e.g. "cryptocurrency market analysis and investment strategy assistant")
4. scope       : areas of responsibility, 2-5 items (e.g. "market monitoring, strategy analysis, risk alerts")
5. personality : personality / communication style (e.g. "professional and rigorous, data-driven, mildly humorous")
```

Sample question: "What is this agent's role? What is it responsible for? What communication style would you like it to have?"

**Round 3: Project membership**

```
To collect:
6. project_id  : owning project (existing: advert, global; or a new one)
7. new_project : for a new project, its name and description
```

Show the existing projects first by reading `skills/mem0-integration/project_registry.yaml`.

Sample question: "Which project does this agent belong to? Existing projects are: advert (advertising business) and global (global). Do you need a new project?"

**Round 4: Telegram Bot**

```
To collect:
8. bot_token   : Telegram Bot Token
```

If the user does not have a token yet, give them these instructions:

```
Steps to create a Telegram Bot:
1. Search for @BotFather in Telegram and send /newbot
2. Enter the bot display name when prompted (e.g. CryptoBot)
3. Enter the bot username (must end in "bot", e.g. openclaw_crypto_bot)
4. BotFather returns a token (format: 1234567890:ABCdef...)
5. Send that token to me
```

**Round 5: Deployment type**

```
To collect:
9. deploy_type : local (localhost) or remote (Tailscale IP)
10. qdrant_host : Tailscale IP address, needed for remote deployments
```

Sample question: "Will this agent be deployed on this server or remotely? If remotely, what is the Tailscale IP?"

### 11.2 Port Allocation Rules

| Port | Purpose |
|------|------|
| 18789 | main agent (in use) |
| 18790 | 2nd local agent |
| 18791 | 3rd local agent |
| ... | increments by one |

🔧 Automatic allocation: read the number of agents registered in `agents.yaml`; port = 18789 + count.
Remote agents do not need a port allocated on this server.

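The allocation rule above can be sketched in a few lines. The first function is the documented rule (port = 18789 + count). The second is a hypothetical, more defensive variant that picks the lowest port not already in use, which avoids a collision if an earlier agent was offboarded or if remote agents are counted. The shape of the agent records (dicts with an optional `port` key) is an assumption about what is parsed from `agents.yaml`.

```python
BASE_PORT = 18789  # port of the main agent


def next_port_by_count(agents: list[dict]) -> int:
    """Documented rule: base port plus the number of registered agents."""
    return BASE_PORT + len(agents)


def next_free_port(agents: list[dict]) -> int:
    """Defensive variant (assumption): lowest port from BASE_PORT up that no agent uses."""
    used = {agent["port"] for agent in agents if agent.get("port") is not None}
    port = BASE_PORT
    while port in used:
        port += 1
    return port


# Hypothetical registry after one local agent (port 18790) was offboarded.
agents = [
    {"id": "main", "port": 18789},
    {"id": "crypto", "port": 18791},
]
print(next_port_by_count(agents))  # 18791, which collides with "crypto"
print(next_free_port(agents))      # 18790
```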
### 11.3 Execution (🔧 run in order)

Once the information is collected, run the steps below. **Report progress to the user after each step.**

**Step 1: Run onboard.sh**

```bash
cd /root/.openclaw/workspace/templates
# Local agent:
./onboard.sh <agent_id> "<agent_name>" <project_id>
# Remote agent:
./onboard.sh <agent_id> "<agent_name>" <project_id> <qdrant_host>
```

This automatically creates the workspace, registers the agent in agents.yaml and project_registry,
generates the systemd service/env files, and reloads the monitor.

**Step 2: Fill in IDENTITY.md**

Write to `agents/<agent_id>-workspace/IDENTITY.md`:

```markdown
# Agent Identity

- **Name**: <agent_name>
- **Agent ID**: <agent_id>
- **Role**: <role description provided by the user>
- **Project**: <project_id>
- **Created**: <today's date>

## Scope
<responsibilities provided by the user, one per line>

## Communication Style
<personality / communication style described by the user>
```

**Step 3: Fill in SOUL.md**

Write to `agents/<agent_id>-workspace/SOUL.md`:

```markdown
# <agent_name> - Core Personality

## Beliefs
<2-3 core beliefs derived from the role the user described>

## Behavior Rules
- Follow shared best practices from public memory
- Respect memory visibility boundaries (public/project/private)
- Log important decisions to memory for team awareness
<2-3 additional role-specific behavior rules>

## Communication Style
<the communication style described by the user, expanded into 2-3 concrete sentences>
```

**Step 4: For a new project, register it in project_registry.yaml**

If Round 3 produced a new project, edit `skills/mem0-integration/project_registry.yaml`:

```yaml
  <project_id>:
    name: "<project name>"
    description: "<project description>"
    members:
      - "<agent_id>"
      - "main"
    owner: "main"
```

**Step 5: Create openclaw.json**

This is the most critical step. Copy main's configuration and modify it:

```bash
# Create the profile directory first in case it does not exist yet.
mkdir -p /root/.openclaw-<agent_id>
cp /root/.openclaw/openclaw.json /root/.openclaw-<agent_id>/openclaw.json
```

**Fields that must be changed (field mapping table):**

| JSON path | Value in main | Value for the new agent |
|-----------|----------|----------------|
| `agents.list[0].id` | `"main"` | `"<agent_id>"` |
| `agents.defaults.workspace` | `"/root/.openclaw/workspace"` | `"/root/.openclaw/workspace/agents/<agent_id>-workspace"` |
| `channels.telegram.botToken` | `"7047245486:AAF..."` | `"<token provided by the user>"` |
| `gateway.port` | `18789` | `<assigned port>` |
| `gateway.controlUi.allowedOrigins[2]` | `"http://100.115.94.1:18789"` | **`"http://100.115.94.1:<port>"` (must match this agent's gateway.port)** |
| `gateway.controlUi.dangerouslyDisableDeviceAuth` | `true` | **Keep `true`** (otherwise opening the Control UI from the Tailscale IP reports "device identity required" and the browser device has to be paired first) |
| `gateway.controlUi.allowInsecureAuth` | absent or `true` | **Recommended `true`** (same as main; needed for HTTP access from a non-localhost address, where the browser cannot generate a device key, otherwise "device identity required" is still reported) |
| `plugins.entries.mem0-integration.config.agent_id` | `"main"` | `"<agent_id>"` |

⚠️ **Control UI access**: if `allowedOrigins[2]` is not changed to this agent's port, opening `http://100.115.94.1:<port>/` reports **"origin not allowed"** and the pairing page cannot be opened. When creating openclaw.json, always change `gateway.port` and `gateway.controlUi.allowedOrigins[2]` together.

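The field mapping table lends itself to a small transformation over the parsed JSON. The sketch below applies exactly the paths listed in the table to a copy of main's configuration. It is an illustration, not an existing tool: the function name `derive_agent_config` and the `host` parameter are hypothetical, and it assumes main's `allowedOrigins` list already has at least three entries.

```python
import copy


def derive_agent_config(
    main_cfg: dict,
    agent_id: str,
    bot_token: str,
    port: int,
    host: str = "100.115.94.1",  # hub Tailscale IP used in this document's examples
) -> dict:
    """Return a modified deep copy of main's openclaw.json for a new agent."""
    cfg = copy.deepcopy(main_cfg)  # never mutate main's configuration
    cfg["agents"]["list"][0]["id"] = agent_id
    cfg["agents"]["defaults"]["workspace"] = (
        f"/root/.openclaw/workspace/agents/{agent_id}-workspace"
    )
    cfg["channels"]["telegram"]["botToken"] = bot_token
    cfg["gateway"]["port"] = port
    control_ui = cfg["gateway"]["controlUi"]
    # The origin must carry the same port as gateway.port.
    control_ui["allowedOrigins"][2] = f"http://{host}:{port}"
    control_ui["dangerouslyDisableDeviceAuth"] = True
    control_ui["allowInsecureAuth"] = True
    cfg["plugins"]["entries"]["mem0-integration"]["config"]["agent_id"] = agent_id
    return cfg
```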

**Fields left unchanged (inherited from main's configuration):**

- `models`: use the same model configuration
- `auth`: use the same authentication
- `memory`: use the qmd backend
- `skills`: inherit tavily, find-skills-robin, mem0-integration
- `plugins.load.paths`: keep as-is or point to the agent's own skills path

**Step 6: Start the service**

```bash
# Local agent:
export XDG_RUNTIME_DIR=/run/user/$(id -u)
systemctl --user start openclaw-gateway-<agent_id>.service

# Check status:
systemctl --user status openclaw-gateway-<agent_id>.service
```

**Step 7: Verify**

```bash
# deploy.sh lives in the workspace root, not in templates/
cd /root/.openclaw/workspace
./deploy.sh health
```

### 11.4 Completion Checklist (🔧 confirm each item, then tell the user)

```
□ onboard.sh ran successfully
□ Registered in agents.yaml
□ Registered in project_registry.yaml (with main as a member)
□ IDENTITY.md filled in with role and responsibilities
□ SOUL.md filled in with personality and behavior rules
□ openclaw.json created and fields modified:
    □ agents.list[0].id = <agent_id>
    □ agents.defaults.workspace points to the agent workspace
    □ channels.telegram.botToken uses the new token
    □ gateway.port does not conflict with another agent
    □ gateway.controlUi.allowedOrigins[2] = "http://100.115.94.1:<this agent's port>" (otherwise the Control UI reports origin not allowed)
    □ gateway.controlUi.dangerouslyDisableDeviceAuth = true (otherwise device identity required is reported)
    □ gateway.controlUi.allowInsecureAuth = true (recommended when opening the UI over HTTP from a Tailscale/LAN IP)
    □ plugins.entries.mem0-integration.config.agent_id is correct
□ systemd service started
□ deploy.sh health passes everywhere
□ Telegram Bot pairing completed (confirmed by the user)
```

### 11.5 Telegram Pairing (must be completed by the user)

When a new agent uses `dmPolicy: pairing`, **pairing must be completed by the user in Telegram and the Control UI**; the main agent cannot do it on the user's behalf.

**Standard steps (give these to the user):**

1. Search for the agent's Bot in Telegram (e.g. @xxx_bot) and send `/start`
2. Open the agent's Control UI: `http://100.115.94.1:<port>/` (the port is that agent's gateway.port)
3. If **"origin not allowed"** appears: `gateway.controlUi.allowedOrigins[2]` in the agent's `openclaw.json` is not set to `http://100.115.94.1:<port>`; the main agent must fix it and restart the agent's service
4. If **"device identity required"** appears: when the UI is opened via `http://100.115.94.1:<port>`, the browser is not in a secure context (only HTTPS or localhost qualify) and cannot generate a device key. There are two options:
   - **Option A (recommended)**: open the Control UI with a **URL that carries the token**: `http://100.115.94.1:<port>/?token=<this agent's gateway.auth.token>`, which bypasses the device check.
   - **Option B**: under `gateway.controlUi` in the agent's `openclaw.json`, set both `"dangerouslyDisableDeviceAuth": true` and `"allowInsecureAuth": true`, save, and restart the agent's service (trusted internal networks only).
5. In the Control UI, go to the Devices/pairing page and approve the **Telegram user's** pairing request (this pairs the Bot with the user and is unrelated to the browser device in the previous step)
6. Once pairing is complete, the Bot can reply to direct messages normally

What the main agent can do: provide the steps above, give the Control UI address, and, when an origin error appears, fix the configuration and ask the user to retry.

### 11.6 Sample Dialogue

```
User: Help me create a new cryptocurrency analysis agent
Eason: Sure. Let's settle the basics first:
       1. What should the Agent ID be? Lowercase English is recommended, e.g. crypto
       2. What should the display name be?
User: The ID is crypto, and the name is "加密分析师"
Eason: Got it. What exactly is this agent's role? What is it responsible for?
       For communication style, would you like it professional and rigorous, or light and lively?
User: It handles market monitoring, strategy analysis, and risk alerts. Professional but concise.
Eason: Understood. For project membership, the existing projects are:
       - advert (advertising business)
       - global (shared globally)
       Do you need a new crypto project?
User: Create a new crypto project
Eason: OK. Next I need a dedicated Telegram Bot Token.
       If you don't have one yet, the steps are:
       1. Search for @BotFather in Telegram → send /newbot
       2. Set the name and username
       3. Send me the token it returns
User: 1234567890:ABCdef...
Eason: One last thing to confirm: deploy on this server or remotely?
User: This server
Eason: OK, starting deployment...
       [runs Steps 1-7]
       Deployment complete! Every checklist item passed.
       Please send a message to @your_bot in Telegram to test it.
```

---

## 12. Playbook: Agent Maintenance & Troubleshooting (Routine Maintenance)

### 12.1 Routine Health Check (🔧)

When the user asks "how are the agents doing", or as part of a Heartbeat, run:

```bash
./deploy.sh health    # global health check
./deploy.sh status    # detailed service status
```

Interpret the results and report them to the user:
- ✓ means healthy
- ✗ means the service is not running → try a restart
- ⚠ means a resource warning → report the specific values

### 12.2 Troubleshooting an Unresponsive Agent

```
Step 1: Check whether the service is running
  systemctl --user status openclaw-gateway-<agent_id>.service

Step 2: If inactive → check the logs
  journalctl --user -u openclaw-gateway-<agent_id> -n 50 --no-pager

Step 3: Common problems and fixes:
  - "Address already in use" → port conflict; check gateway.port in openclaw.json
  - "Cannot find module" → openclaw version problem; run ./deploy.sh fix-service
  - "ECONNREFUSED" → Qdrant is not running; check docker ps | grep qdrant
  - "API key invalid" → check the API key in systemd/<agent_id>-gateway.env
  - **"origin not allowed" (Control UI does not open)** → gateway.controlUi.allowedOrigins[2] in the agent's openclaw.json must be "http://100.115.94.1:<this agent's port>"; after changing it, run systemctl --user restart openclaw-gateway-<agent_id>.service
  - **"device identity required" (Control UI asks for device pairing)** → over HTTP from a non-localhost address the browser cannot generate a device key. Fix: ① use a URL with the token: `http://100.115.94.1:<port>/?token=<gateway.auth.token>`; or ② under gateway.controlUi in the agent's openclaw.json set both `"dangerouslyDisableDeviceAuth": true` and `"allowInsecureAuth": true`, save, and restart the agent's service (trusted internal networks only).

Step 4: Restart
  systemctl --user restart openclaw-gateway-<agent_id>.service

Step 5: Still failing → collect logs for the user
  journalctl --user -u openclaw-gateway-<agent_id> -n 200 --no-pager > /tmp/agent-debug.log
```

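Step 3 above is effectively a lookup from log fragments to remedies, which can be sketched as a small table-driven helper. The error strings and remedies are taken from the list above. The name `suggest_remedies` is hypothetical, and plain substring matching is an assumption about how the messages appear in `journalctl` output.

```python
# Error fragment -> suggested remedy, in the order listed in Step 3.
REMEDIES: dict[str, str] = {
    "Address already in use": "Port conflict: check gateway.port in openclaw.json",
    "Cannot find module": "openclaw version problem: run ./deploy.sh fix-service",
    "ECONNREFUSED": "Qdrant is not running: check docker ps | grep qdrant",
    "API key invalid": "Check the API key in systemd/<agent_id>-gateway.env",
    "origin not allowed": "Set gateway.controlUi.allowedOrigins[2] to the agent's own port, then restart",
    "device identity required": "Open the Control UI with ?token=<gateway.auth.token>",
}


def suggest_remedies(log_text: str) -> list[str]:
    """Return the remedy for every known error fragment found in the log text."""
    return [remedy for fragment, remedy in REMEDIES.items() if fragment in log_text]


sample_log = "Error: listen EADDRINUSE: Address already in use :::18790"
for remedy in suggest_remedies(sample_log):
    print(remedy)
```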
### 12.3 Recovery After an OpenClaw Upgrade

After the user upgrades OpenClaw through the UI, custom configuration may be lost:

```bash
./deploy.sh fix-service    # re-inject EnvironmentFile into the systemd services
./deploy.sh restart        # restart all services so the configuration takes effect
./deploy.sh health         # confirm everything is back to normal
```

Report the result of the repair to the user.

### 12.4 Listing Agents

```bash
python3 scripts/parse_agents.py list
```

Output format: `<id>\t<type>\t<name>`. Format it as a table when showing it to the user.

### 12.5 Debug Mode

When the user needs to debug an agent:

```bash
./deploy.sh debug-stop     # stop all services (including the monitor, to prevent auto-restart)
# ... user debugs ...
./deploy.sh debug-start    # restore all services
```

---

## 13. Playbook: Memory Management

### 13.1 Publishing Shared Knowledge (🔧)

When the user says "share this best practice with all agents":

```bash
python3 skills/mem0-integration/mem0_integration.py publish \
  '{"content":"<knowledge content>", "visibility":"public", "category":"knowledge"}'
```

When the user says "share this information with a project":

```bash
python3 skills/mem0-integration/mem0_integration.py publish \
  '{"content":"<content>", "visibility":"project", "project_id":"<project>", "category":"knowledge"}'
```

### 13.2 Viewing Memory Statistics

```bash
python3 skills/mem0-integration/memory_cleanup.py --dry-run
```

Report the memory counts to the user, broken down by agent, type, and visibility.

### 13.3 Cleaning Up Expired Memories

```bash
# Dry-run first to review:
python3 skills/mem0-integration/memory_cleanup.py --dry-run --max-age-days 90

# After confirmation, run for real (--execute is what performs the deletion, see Section 15.3):
python3 skills/mem0-integration/memory_cleanup.py --execute --max-age-days 90
```

### 13.4 Preloading Knowledge for a New Agent (Cold Start)

After a new agent is created, you can preload public knowledge for it:

```bash
python3 skills/mem0-integration/mem0_integration.py cold_start \
  '{"agent_id":"<agent_id>", "user_id":"wang_yuanzhang", "top_k":10}'
```

### 13.5 Checking Memory Visibility

When the user asks "can a given agent see this memory?":

1. Determine the memory's `visibility` and `project_id`
2. Read `project_registry.yaml` to confirm whether the agent is in that project's members list
3. Visibility rules:
   - `public` → visible to all agents
   - `project` → visible only to project members
   - `private` → visible only to the agent that wrote it

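The three-step check above can be expressed as a single predicate. This is a sketch under stated assumptions, not code from `mem0_client.py`: the name `can_read` is hypothetical, the memory is assumed to expose `visibility`, `project_id`, and `agent_id` as plain dict fields, and the registry is assumed to have the `members` list shown in the Step 4 YAML of Section 11.3.

```python
def can_read(memory: dict, agent_id: str, registry: dict) -> bool:
    """Apply the public / project / private rules to one memory and one agent."""
    visibility = memory.get("visibility")
    if visibility == "public":
        return True
    if visibility == "project":
        project = registry.get(memory.get("project_id"), {})
        return agent_id in project.get("members", [])
    if visibility == "private":
        return memory.get("agent_id") == agent_id
    return False  # unknown visibility values are treated as unreadable (assumption)


# Hypothetical registry mirroring project_registry.yaml.
registry = {"crypto": {"members": ["crypto", "main"], "owner": "main"}}
memory = {"visibility": "project", "project_id": "crypto", "agent_id": "crypto"}
print(can_read(memory, "main", registry))    # True: main is a member of every project
print(can_read(memory, "hr_bot", registry))  # False: not a member of crypto
```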

---

## 14. Playbook: Interactive Offboarding (Remove an Agent)

### 14.1 Information Gathering (🗣️)

```
To collect:
1. agent_id   : ID of the agent to remove
2. keep_data  : whether to keep its data (workspace, profile, Qdrant memories)
```

Sample question: "Which agent should be removed? Do you want to keep its data? (Keeping it allows the agent to be restored later.)"

🔧 Show the current agent list first:
```bash
python3 scripts/parse_agents.py list
```

### 14.2 Safety Checks (🔧)

```
□ Confirm it is not the main agent (main cannot be removed)
□ Confirm the agent exists in agents.yaml
□ Confirm once more with the user: "Are you sure you want to remove <agent_name> (<agent_id>)? This will stop the service and delete it from the registry."
```

### 14.3 Execution (🔧)

```bash
cd /root/.openclaw/workspace/templates

# Full removal (including data):
./offboard.sh <agent_id>

# Unregister only (keep data):
./offboard.sh <agent_id> --keep-data
```

The script asks for interactive confirmation (y/N); enter y to confirm.

### 14.4 Report After Completion

Report to the user:
```
Agent <agent_name> (<agent_id>) has been removed:
  - Service: stopped and uninstalled
  - agents.yaml: removed
  - project_registry: removed
  - Workspace: <deleted / kept>
  - Qdrant memories: <deleted / kept>
  - Monitor: reloaded
```

Run `./deploy.sh health` to confirm the system is healthy.

---

## 15. Playbook: Backup & Cleanup

### 15.1 Backup Commands

| Command | Description |
|------|------|
| `./deploy.sh backup` | Full backup (workspace + Qdrant snapshot + agent profiles + docker-compose) |
| `./deploy.sh backup quick` | Quick backup (workspace files only, no Qdrant) |
| `bash scripts/10-create-backup.sh` | Standalone backup script (includes mem0 config + agents.yaml + Qdrant snapshot) |

**Backup retention policy**: the 10 most recent backups are kept automatically; older backups are deleted automatically.

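The retention policy above amounts to "sort by age, keep the newest 10, delete the rest". A minimal sketch follows. It is not the logic of `deploy.sh`: the name `backups_to_prune` is hypothetical, and it assumes the `<TIMESTAMP>` directory names sort chronologically as plain strings (for example `20260306-020000`), which the source does not confirm.

```python
def backups_to_prune(backup_names: list[str], keep: int = 10) -> list[str]:
    """Return the backup directory names that fall outside the newest `keep`."""
    # Assumption: timestamp names sort lexicographically in chronological order.
    newest_first = sorted(backup_names, reverse=True)
    return newest_first[keep:]


# Hypothetical directory names, twelve daily backups.
names = [f"202603{day:02d}-020000" for day in range(1, 13)]
print(backups_to_prune(names))  # ['20260302-020000', '20260301-020000']
```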

**Backup directory layout**:
```
/root/.openclaw/backups/<TIMESTAMP>/
├── workspace.tar.gz                    # Layer 1+2: all MD and config files
├── .openclaw__openclaw.json            # main agent profile
├── .openclaw-tongge__openclaw.json     # secondary agent profiles (if any)
├── docker-compose.yml                  # Qdrant docker configuration
├── qdrant-mem0_v4_shared.snapshot      # Layer 4 vector data (full mode)
├── qdrant-point-count.txt              # point count at backup time (for verification)
└── manifest.txt                        # backup manifest
```

### 15.2 Restore Commands

| Command | Description |
|------|------|
| `./deploy.sh restore <backup-dir>` | Restore workspace files + agent profiles |
| `./deploy.sh restore-qdrant <snapshot-file>` | Restore Qdrant vector data |

A quick backup is created automatically before restoring, and interactive confirmation (y/N) is required.

### 15.3 Memory Cleanup

Cleanup script: `skills/mem0-integration/memory_cleanup.py`

| Command | Description |
|------|------|
| `python3 memory_cleanup.py --dry-run` | Report memory statistics by dimension + count expired memories (deletes nothing) |
| `python3 memory_cleanup.py --execute --max-age-days 90` | Actually delete expired memories |

**Retention policy** (aligned with `EXPIRATION_MAP` in `mem0_client.py`):
- `session`: expires after 7 days
- `chat_summary`: expires after 30 days
- `preference`: kept permanently
- `knowledge`: kept permanently

`--max-age-days` acts as a hard cap: session/chat_summary memories older than that many days are deleted regardless of their expiration_date. preference and knowledge memories are never cleaned up automatically.

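The retention rules and the `--max-age-days` cap combine into one decision per memory, sketched below. This is an illustration, not the code of `memory_cleanup.py`: the name `should_delete` and the `TTL_DAYS` table are hypothetical stand-ins for `EXPIRATION_MAP`, and the expiration date is approximated as creation time plus the category's lifetime rather than read from a stored `expiration_date` field.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Lifetime per category in days; None means "kept permanently".
TTL_DAYS: dict[str, Optional[int]] = {
    "session": 7,
    "chat_summary": 30,
    "preference": None,
    "knowledge": None,
}


def should_delete(
    category: str,
    created_at: datetime,
    now: datetime,
    max_age_days: int = 90,
) -> bool:
    """True if the memory is expired or older than the hard cap."""
    ttl = TTL_DAYS.get(category)
    if ttl is None:
        # Permanent categories are never cleaned; unknown ones are kept too (assumption).
        return False
    limit = min(ttl, max_age_days)  # the cap can only shorten a lifetime
    return now - created_at > timedelta(days=limit)


now = datetime(2026, 3, 6, tzinfo=timezone.utc)
print(should_delete("session", now - timedelta(days=8), now))       # True
print(should_delete("knowledge", now - timedelta(days=1000), now))  # False
```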
**审计日志**: 每次清理写入 `logs/security/memory-cleanup-<date>.log`。
|
|
|
|
|
|
|
|
|
|
|
|
### 15.4 Cron Automation

Install script: `scripts/setup-cron.sh`

```bash
./scripts/setup-cron.sh          # install the scheduled jobs
./scripts/setup-cron.sh remove   # remove the scheduled jobs
./scripts/setup-cron.sh status   # show the current jobs
```

**Schedule**:

| Time | Job |
|------|-----|
| Daily 02:00 | `./deploy.sh backup`: full backup |
| Every Sunday 03:00 | `memory_cleanup.py --execute --max-age-days 90`: clean up expired memories |

Logs are written to `logs/system/cron-backup.log` and `logs/system/cron-cleanup.log`.

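
For orientation, the schedule above corresponds to crontab entries along the following lines. This is a sketch only: the exact lines written by `setup-cron.sh` are not shown in this document, so the `cd ... &&` wrapper and the `render_crontab` helper are assumptions, while the times, commands, and log paths come from the table.

```python
WORKSPACE = "/root/.openclaw/workspace"

JOBS = [
    # (cron schedule, command, log file relative to the workspace)
    ("0 2 * * *", "./deploy.sh backup", "logs/system/cron-backup.log"),
    (
        "0 3 * * 0",  # day-of-week 0 is Sunday
        "python3 skills/mem0-integration/memory_cleanup.py --execute --max-age-days 90",
        "logs/system/cron-cleanup.log",
    ),
]


def render_crontab(jobs=JOBS, workspace=WORKSPACE):
    """Render one crontab line per job, appending stdout and stderr to the log."""
    lines = []
    for schedule, command, log in jobs:
        lines.append(f"{schedule} cd {workspace} && {command} >> {log} 2>&1")
    return "\n".join(lines)
```

Running `./scripts/setup-cron.sh status` remains the authoritative way to see what is actually installed.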
### 15.5 Interactive Backup and Restore Flow (🗣️)

Conversation flow when the user asks for a backup or a restore:

**Backup**:

```
陈医生: "What type of backup do you need?"
  1. Full backup (includes Qdrant vector data, recommended)
  2. Quick backup (workspace files only)

→ Run the corresponding command; report the backup path and the Qdrant point count
→ Advice: always run a full backup before major changes
```

**Restore**:

```
陈医生: "Which backup should be restored?"
  → List the backups available under /root/.openclaw/backups/
  → Show the contents of manifest.txt for the user to confirm
  → Restore the workspace first: ./deploy.sh restore <dir>
  → If a Qdrant snapshot exists and the user confirms: ./deploy.sh restore-qdrant <file>
  → After restoring, run ./deploy.sh restart + ./deploy.sh health
  → Compare qdrant-point-count.txt with the current point count
```

---

## 16. Playbook: Server Migration

### 16.1 Pre-Migration Preparation (🗣️)

Information to collect:

```
Confirm the following:
  1. target_server: target server address (IP or Tailscale hostname)
  2. target_user: username on the target server (usually root)
  3. keep_source: whether to keep the source server's data after migration
  4. tailscale: whether the target server has already joined the Tailscale network
```

Example question: "Which server are we migrating to? Is Tailscale already installed on it? Should the source server's data be kept after the migration?"

### 16.2 Source Server: Full Backup (🔧)

```bash
cd /root/.openclaw/workspace
./deploy.sh backup
```

Verify that the backup is complete:

```bash
ls -la /root/.openclaw/backups/<TIMESTAMP>/
cat /root/.openclaw/backups/<TIMESTAMP>/manifest.txt
cat /root/.openclaw/backups/<TIMESTAMP>/qdrant-point-count.txt
```

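
The manual check above can also be scripted. The sketch below flags files that are missing or empty in a backup directory. The file list is taken from the backup directory structure in Section 15.1; the names `EXPECTED_FILES` and `missing_backup_files` are hypothetical, and treating all five files as mandatory for a migration is an assumption.

```python
from pathlib import Path

# Files a full backup is expected to contain (see the directory structure in 15.1).
# Agent profile files are left out because their names vary per agent.
EXPECTED_FILES = [
    "workspace.tar.gz",
    "manifest.txt",
    "docker-compose.yml",
    "qdrant-mem0_v4_shared.snapshot",
    "qdrant-point-count.txt",
]


def missing_backup_files(backup_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in `backup_dir`."""
    base = Path(backup_dir)
    missing = []
    for name in expected:
        path = base / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

An empty result means every expected file is present and non-empty; it does not prove that the archive or the snapshot is internally valid.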
### 16.3 Transfer to the Target Server (🔧)

```bash
BACKUP_DIR="/root/.openclaw/backups/<TIMESTAMP>"
TARGET="root@<target_server>"

# Make sure the destination directories exist before the first rsync
ssh "$TARGET" 'mkdir -p /root/.openclaw/backups /root/.openclaw/workspace'

rsync -avzP "$BACKUP_DIR" "$TARGET:/root/.openclaw/backups/"
rsync -avzP --exclude='.git' --exclude='logs' /root/.openclaw/workspace/ "$TARGET:/root/.openclaw/workspace/"
rsync -avzP /root/.openclaw/openclaw.json "$TARGET:/root/.openclaw/"
```

Secondary agent profiles (if any):

```bash
for d in /root/.openclaw-*/; do
  [ -d "$d" ] || continue   # skip the unexpanded glob when no secondary profiles exist
  agent_name=$(basename "$d")
  rsync -avzP "$d" "$TARGET:/root/$agent_name/"
done
```

### 16.4 Target Server: Install Infrastructure (🔧)

```bash
# 1. Install Node.js (v24+) and OpenClaw
curl -fsSL https://get.openclaw.com | bash

# 2. Install Docker + Qdrant
mkdir -p /opt/mem0-center && cd /opt/mem0-center
# Restore docker-compose.yml from the backup
cp /root/.openclaw/backups/<TIMESTAMP>/docker-compose.yml .
docker compose up -d

# 3. Wait for Qdrant to start
sleep 5
curl -sf http://localhost:6333/collections | python3 -c "import sys,json; print(json.dumps(json.load(sys.stdin),indent=2))"

# 4. Restore Qdrant data
cd /root/.openclaw/workspace
./deploy.sh restore-qdrant /root/.openclaw/backups/<TIMESTAMP>/qdrant-mem0_v4_shared.snapshot

# 5. Install Python dependencies
pip3 install qdrant-client mem0ai pyyaml

# 6. Install system services
./deploy.sh install
```

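
A fixed `sleep 5` in step 3 can be too short on a slow host. One alternative is to poll the same `/collections` endpoint until it answers, as sketched below. The helper names `qdrant_is_up` and `wait_until_ready`, the attempt count, and the delay are assumptions chosen for illustration; only the URL comes from the step above.

```python
import time
import urllib.error
import urllib.request

QDRANT_URL = "http://localhost:6333/collections"  # endpoint used in step 3 above


def qdrant_is_up(url=QDRANT_URL, timeout=2.0):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, HTTP error, ...


def wait_until_ready(probe=qdrant_is_up, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `probe` until it succeeds; return the attempt number, or None on failure."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

Calling `wait_until_ready()` before the restore in step 4 returns as soon as Qdrant responds, and returns `None` if it never does within the configured attempts.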
### 16.5 Verification (🔧)

```bash
# Service status
./deploy.sh health

# Qdrant data comparison
curl -sf http://localhost:6333/collections/mem0_v4_shared | python3 -c "import sys,json; d=json.load(sys.stdin); print(f'Points: {d[\"result\"][\"points_count\"]}')"
# Compare against qdrant-point-count.txt from the source server

# Memory retrieval test
cd /root/.openclaw/workspace/skills/mem0-integration
python3 mem0_integration.py search "测试查询" --agent-id main

# Telegram connectivity
# Send a test message to the bot on Telegram
```

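
The point-count comparison above is done by eye. A small sketch of automating it follows. It assumes that `qdrant-point-count.txt` contains a single integer, which this document does not state explicitly, and the function names are hypothetical; the `result.points_count` path matches the one-liner above.

```python
import json


def points_count_from_response(body):
    """Extract result.points_count from the collection-info JSON returned by Qdrant."""
    return int(json.loads(body)["result"]["points_count"])


def parse_saved_count(text):
    """Parse qdrant-point-count.txt (assumed to hold a single integer)."""
    return int(text.strip())


def compare_counts(saved_text, response_body):
    """Compare the count recorded at backup time with the live collection."""
    expected = parse_saved_count(saved_text)
    actual = points_count_from_response(response_body)
    return {"expected": expected, "actual": actual, "match": expected == actual}
```

Feed it the file contents and the body of `curl -sf http://localhost:6333/collections/mem0_v4_shared`. A mismatch is a reason to stop before pointing any agent at the new server.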
### 16.6 Post-Migration Checklist

```
□ All agent services are running normally (deploy.sh health is all green)
□ Qdrant point count matches the source server
□ Memory retrieval returns results
□ Telegram bot replies normally
□ Cron jobs are installed (./scripts/setup-cron.sh)
□ Environment variables are set (MEM0_DASHSCOPE_API_KEY, etc.)
□ Monitor service is running (systemctl status openclaw-agent-monitor)
□ Tailscale has been joined (if remote agents need to connect)
□ Source server data has been handled (kept or cleaned up)
```

### 16.7 Rollback Plan

If the migration fails:

```
1. On the source server, run ./deploy.sh debug-start to bring the services back up
2. On the target server, run ./deploy.sh debug-stop to stop all services
3. Investigate the problem, then retry
```

---

## 17. Skill/Plugin Management SOP

### 17.1 Skill vs Plugin Selection Guide

OpenClaw has two extension-loading mechanisms. Choose between them as follows:

| Type | Loading method | Config location | Use case |
|------|----------------|-----------------|----------|
| **Built-in Skill** | Auto-discovered by OpenClaw | `skills.entries.<id>` | Skills built into the Clawhub marketplace (e.g. `find-skills-robin`) |
| **Custom Plugin** | Path specified manually | `plugins.load.paths` + `plugins.entries.<id>` | In-house tools (tavily), lifecycle hooks (mem0), and any extension that needs custom code |

**Decision rules:**

- To simply toggle a built-in Clawhub feature -> `skills.entries`
- If you have your own `openclaw.plugin.json` + `index.js` -> `plugins`
- If you need a lifecycle hook (runs automatically before/after a conversation) -> must use `plugins`
- **Do not** enable the same skill in both `skills.entries` and `plugins.entries`

**Required plugin files:**

```
/root/.openclaw/workspace/skills/<id>/
├── openclaw.plugin.json    # plugin manifest (required)
├── index.js                # tool/hook implementation (required)
├── CONFIG_SUMMARY.md       # configuration notes (recommended)
└── TEST_REPORT.md          # test report (recommended)
```

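
The last decision rule (never enable the same ID in both places) lends itself to an automated check. The sketch below works on an already parsed `openclaw.json`. It assumes that entries under `skills.entries` carry an `enabled` flag in the same way `plugins.entries` does, which this document only shows for plugins; the function name is hypothetical.

```python
def duplicated_extensions(config):
    """Return the IDs that are enabled under both skills.entries and plugins.entries."""

    def enabled_ids(section):
        entries = config.get(section, {}).get("entries", {})
        return {
            ext_id
            for ext_id, entry in entries.items()
            if isinstance(entry, dict) and entry.get("enabled")
        }

    return sorted(enabled_ids("skills") & enabled_ids("plugins"))
```

An empty list means no extension is enabled twice. Load the file with `json.load` first and pass the resulting dictionary.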
### 17.2 Staged Release Workflow

Every new skill must be verified on the main agent before it is deployed to secondary agents.

**Stage 1 -- Install the code**

1. Place the skill code in `/root/.openclaw/workspace/skills/<id>/`
2. Make sure `openclaw.plugin.json` exists (with id, name, kind, main, tools/configSchema)
3. Make sure `index.js` exists (exporting `register`/`activate` and the tool definitions)

**Stage 2 -- Enable and test on Main**

1. In main's `openclaw.json`:
   - Add `"/root/.openclaw/workspace/skills/<id>"` to `plugins.load.paths`
   - Set `plugins.entries.<id>` to `{ "enabled": true }` (fill in the config as well, if there is any)
2. Restart the main gateway: `systemctl --user restart openclaw-gateway.service`
3. Check the logs to confirm that the plugin loaded: `journalctl --user -u openclaw-gateway -n 50 | grep -i <id>`
4. Send main a message over Telegram to test the feature

**Stage 3 -- Review**

Complete the review following `templates/SKILL_REVIEW_TEMPLATE.md`, covering:

| Review dimension | What to check |
|------------------|---------------|
| Security | API key management (environment variable vs hard-coded), scope of network requests, file reads/writes, privilege escalation |
| Functionality | Whether the agent can invoke it correctly, whether results are accurate, whether error handling is reasonable |
| Performance | Response time, concurrent calls, impact on the agent's overall latency |
| Best practices | Recommended parameters, use cases, and known limitations, recorded in `CONFIG_SUMMARY.md` |

**Stage 4 -- Roll out to secondary agents**

1. The skill code lives in the shared workspace, so nothing needs to be copied
2. In the secondary agent's `openclaw.json`:
   - Add the same path to `plugins.load.paths`
   - Enable `plugins.entries.<id>` (watch for agent-specific config, e.g. mem0's `agent_id` must be changed to that agent's ID)
3. Restart the secondary agent's gateway
4. Verify that the plugin loads and works

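
The config edits in Stage 2 and Stage 4 can be sketched as one function over a parsed `openclaw.json`. The key paths `plugins.load.paths` and `plugins.entries.<id>` come from the steps above; placing per-plugin settings under a `config` key inside the entry is an assumption, and `enable_plugin` is a hypothetical helper rather than part of OpenClaw.

```python
import copy


def enable_plugin(config, plugin_id, plugin_path, plugin_config=None):
    """Return a copy of `config` with the plugin path registered and the entry enabled."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    plugins = updated.setdefault("plugins", {})

    paths = plugins.setdefault("load", {}).setdefault("paths", [])
    if plugin_path not in paths:  # avoid duplicate path entries on repeated runs
        paths.append(plugin_path)

    entry = plugins.setdefault("entries", {}).setdefault(plugin_id, {})
    entry["enabled"] = True
    if plugin_config is not None:
        # Assumed location for agent-specific settings such as agent_id
        entry["config"] = dict(plugin_config)
    return updated
```

For mem0, the same call would be made once per agent with a different `agent_id` in `plugin_config`, which is the Stage 4 step that is easiest to forget.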
### 17.3 Current Skill Inventory

| Skill ID | Type | Loading method | Main | Tongge | Description |
|----------|------|----------------|------|--------|-------------|
| `find-skills-robin` | built-in | `skills.entries` | enabled | enabled | Clawhub skill discovery |
| `mem0-integration` | lifecycle | `skills.entries` + `plugins` | enabled | enabled | Memory system (agent_id must differ per agent) |
| `tavily` | tool | `plugins` | enabled | enabled | AI search (shared API key) |
| `active-learning` | built-in | `skills.entries` | -- | enabled | Active learning (tongge only) |
| `memos-cloud-openclaw-plugin` | built-in | `plugins.entries` | enabled | enabled | Memos cloud plugin |
| `qwen-portal-auth` | built-in | `plugins.entries` | enabled | enabled | Qwen Portal OAuth |

> **Maintenance requirement:** Update this table whenever a skill is added or removed.

### 17.4 Agent-Specific Configuration Notes

Some plugins need different configuration on different agents:

| Plugin | Config key to check per agent | Main | Tongge |
|--------|-------------------------------|------|--------|
| `mem0-integration` | `config.agent_id` | `"main"` | `"tongge"` |
| `mem0-integration` | `config.user_id` | `"wang院长"` | `"wang院长"` |

When deploying to a new agent, always check the config keys above.

---

## Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-03-06 | Initial version: hub-and-spoke model, templates, remote support |
| 1.1 | 2026-03-06 | Config-driven architecture: agents.yaml as single registry; automated onboard/offboard; parse_agents.py helper; life agent (张大师) removed; main is only active agent |
| 1.2 | 2026-03-06 | Code review + bug fixes (7 items): `parse_agents.py` output now shell-safe quoted; `agent-monitor.js` constructor ordering fixed (ensureLogDir before loadMonitoredServices) and fallback uses full `openclaw` path; `deploy.sh` switched `grep -qP` to `grep -qE` for portability; `offboard.sh` Qdrant delete uses `FilterSelector` wrapper; `onboard.sh`/`offboard.sh` inline Python rewritten with `sys.argv` to prevent shell injection; `openclaw-agent-monitor.service` added `/run/user/0` to `ReadWritePaths` for D-Bus access; removed corrupted trailing bytes in `offboard.sh` |
| 2.0 | 2026-03-06 | Added operational playbooks (Part B): Interactive Onboarding (Sec 11, with conversation flow, field mapping table, port allocation, checklist, dialog example), Agent Maintenance & Troubleshooting (Sec 12), Memory Management (Sec 13), Interactive Offboarding (Sec 14). Document restructured into Part A (Architecture Reference) and Part B (Operational Playbooks). |
| 2.1 | 2026-03-06 | Added Backup & Cleanup Playbook (Sec 15): backup/restore commands, memory cleanup with retention policy, cron automation, interactive dialogue flow. Added Server Migration Playbook (Sec 16): step-by-step migration with pre/post checklist, Qdrant snapshot recovery, rollback plan. |
| 2.2 | 2026-03-09 | Added Skill/Plugin Management SOP (Sec 17): skill vs plugin selection guide, staged release workflow (main-first), current skill inventory, agent-specific config notes. Unified tavily loading to plugin mode across all agents. |