diff --git a/docs/AGENT_DEPLOYMENT_BEST_PRACTICES.md b/docs/AGENT_DEPLOYMENT_BEST_PRACTICES.md
new file mode 100644
index 0000000..4c62949
--- /dev/null
+++ b/docs/AGENT_DEPLOYMENT_BEST_PRACTICES.md
@@ -0,0 +1,597 @@
+# Agent Deployment Best Practices
+
+**Version:** 1.0
+**Created:** 2026-02-23
+**Author:** Eason (陈医生, Dr. Chen) 👨‍⚕️
+**Based on:** lessons learned from deploying Master Zhang (张大师, `life`)
+
+---
+
+## 📋 Pre-Deployment Checklist
+
+### 1. Architecture Planning
+
+- [ ] **Choose the Agent type**: standalone Gateway vs. routing mode
+  - Standalone Gateway: strong isolation, but every Skill must be configured separately
+  - Routing mode: shared configuration, lower resource usage
+- [ ] **Plan ports**: make sure ports do not collide (main Gateway 18789, Master Zhang 18790)
+- [ ] **Isolate databases**: give each Agent its own Mem0 collection name (e.g. `mem0_v4_life`)
+
+### 2. Configuration File Layout
+
+```
+New Agent deployment layout:
+├── ~/.openclaw-{agent-id}/                 # Dedicated config directory
+│   ├── openclaw.json                       # Gateway config
+│   ├── agents/                             # Agent config
+│   ├── credentials/                        # Credential files
+│   └── telegram/                           # Telegram state
+├── ~/.config/systemd/user/                 # systemd services
+│   └── openclaw-gateway-{agent-id}.service
+└── ~/.openclaw/workspace/                  # Shared workspace
+    ├── agents/{agent-id}-agent.json        # Agent definition
+    ├── skills/                             # Skills (shared)
+    └── logs/agents/{agent-id}/             # Log directory
+```
+
+---
+
+## ⚠️ Common Errors and Fixes
+
+### Error 1: Wrong Skill Config Fields
+
+**Problem:**
+```json
+// ❌ Wrong - openclaw.json does not accept a description field
+"skills": {
+  "entries": {
+    "chinese-almanac": {
+      "enabled": true,
+      "description": "Almanac lookup"  // Not supported!
+    }
+  }
+}
+```
+
+**Error message:**
+```
+skills.entries.xxx: Unrecognized key: "description"
+Config invalid
+```
+
+**Correct configuration:**
+```json
+// ✅ Correct - use only supported fields
+"skills": {
+  "entries": {
+    "chinese-almanac": {
+      "enabled": true,
+      "config": {  // Skill-specific settings go under config
+        "tavily_api_key": "tvly-xxx"
+      }
+    }
+  }
+}
+```
+
+**Lessons:**
+- A Skill entry in `openclaw.json` accepts only keys the config schema recognizes: `enabled` and `config` here (the `tavily` entry in the template below also uses `apiKey`)
+- Metadata such as `description` and `name` belongs in `skill.json`
+- A failed config validation prevents the Gateway from starting
+
+---
+
+### Error 2: Calling a Python Skill from the Node.js Environment
+
+**Problem:**
+```json
+// ❌ Wrong - a Python script cannot be called directly from the Node.js Gateway
+{
+  "name": "google-calendar",  // Python implementation
+  "handler": "google_calendar.handle_calendar_command"
+}
+```
+
+**Symptoms:**
+- The Skill fails to load
+- The Agent reports "feature not configured" or "MCP connection required"
+- Command-line tests pass, but the Skill fails inside the Gateway
+
+**Solution A: Create a Node.js wrapper (recommended)**
+```
+skills/google-calendar-node/
+├── calendar.js   // Node.js interface
+│   └── spawn('python3', ['google_calendar.py', command])
+└── skill.json
+    └── "handler": "calendar.getCalendarInfo"  // Node.js module
+```
+
+**Solution B: Pure Node.js implementation**
+```javascript
+// Use the googleapis npm package
+const { google } = require('googleapis');
+```
+
+**Lessons:**
+- The OpenClaw Gateway runs on Node.js
+- Python Skills need a Node.js wrapper before they can be integrated
+- Do not test only the Python script; test the Gateway integration as well
+
+---
+
+### Error 3: Systemd Watchdog Configuration
+
+**Problem:**
+```ini
+# ❌ Wrong - OpenClaw does not support systemd watchdog notifications
+[Service]
+WatchdogSec=60s
+```
+
+**Symptoms:**
+```
+Watchdog timeout (limit 1min)!
+Killing process with signal SIGABRT
+Main process exited, code=dumped, status=6/ABRT
+```
+
+**Correct configuration:**
+```ini
+# ✅ Correct - remove WatchdogSec
+[Service]
+Restart=always
+RestartSec=10s
+MemoryMax=1G
+CPUQuota=50%
+# Do not set WatchdogSec
+```
+
+**Lessons:**
+- The OpenClaw Gateway does not send systemd watchdog notifications
+- Setting WatchdogSec makes systemd kill a healthy service once the timeout expires
+- Use `Restart=always` for automatic recovery
+
+---
+
+### Error 4: Gateway Bind Address
+
+**Problem:**
+```json
+// ❌ Wrong - binding to loopback makes Telegram pairing fail
+"gateway": {
+  "bind": "loopback"
+}
+```
+
+**Error message:**
+```
+Error: Gateway is only bound to loopback.
+Set gateway.bind=lan, enable tailscale serve,
+or configure plugins.entries.device-pair.config.publicUrl.
+```
+
+**Correct configuration:**
+```json
+// ✅ Correct - a LAN bind supports Telegram pairing
+"gateway": {
+  "bind": "lan",
+  "port": 18790,
+  "auth": {
+    "mode": "token",
+    "token": "your-token"
+  }
+}
+```
+
+**Security considerations:**
+- After binding to the LAN, make sure a firewall restricts access
+- Expose only ports 80/443 (through an Nginx reverse proxy)
+- Use token authentication
+
+---
+
+### Error 5: Agent Config and Gateway Config Out of Sync
+
+**Problem:**
+```json
+// life-agent.json
+{
+  "name": "google-calendar",  // ❌ Python version
+  "enabled": true
+}
+
+// openclaw-life.json
+{
+  "skills": {
+    "entries": {
+      "google-calendar-node": {  // ✅ Node.js version
+        "enabled": true
+      }
+    }
+  }
+}
+```
+
+**Symptoms:**
+- The Agent believes the feature is not configured
+- The system prompt does not match the tools that are actually available
+
+**Solution:**
+```json
+// ✅ Keep them consistent
+// life-agent.json
+{
+  "name": "google-calendar-node",
+  "enabled": true
+}
+
+// openclaw-life.json
+{
+  "skills": {
+    "entries": {
+      "google-calendar-node": {
+        "enabled": true
+      }
+    }
+  }
+}
+
+// State it explicitly in the system prompt
+"## Available Tools\n\n### Google Calendar\n- Use the google-calendar-node skill\n- Already configured; no MCP connection needed"
+```
+
+**Lessons:**
+- The skills list in `agent.json` must match `skills.entries` in `openclaw.json`
+- The system prompt should describe the available tools accurately
+- Restart the Gateway after changing the configuration
+
+---
+
+### Error 6: Hardcoded Data vs. Dynamic Calculation
+
+**Problem:**
+```javascript
+// ❌ Wrong - hardcoded lunar date
+const query = `2026 年 2 月 24 日 农历黄历 宜忌 正月初八`;
+```
+
+**Symptoms:**
+- The data becomes wrong as soon as the date changes
+- Different data sources return different results
+
+**Correct approach:**
+```javascript
+// ✅ Correct - compute dynamically (this offset only covers lunar days 1-10 of the first month)
+const springFestival = new Date('2026-02-17T00:00:00+08:00'); // Spring Festival = lunar 正月初一
+const targetDate = new Date('2026-02-24T00:00:00+08:00');     // the date being queried
+const lunarDay = Math.floor((targetDate - springFestival) / (1000 * 60 * 60 * 24)) + 1;
+const lunarDateStr = `农历正月初${'一二三四五六七八九十'[lunarDay - 1]}`;
+```
+
+**Lessons:**
+- Avoid hardcoding dynamic data such as dates and times
+- Prefer an authoritative data source (an API) over in-house calculation
+- Emphasize in the system prompt that the tool must be used for lookups
+
+---
+
+## 📝 Standard Deployment Procedure
+
+### Step 1: Create the Config Directories
+
+```bash
+mkdir -p ~/.openclaw-{agent-id}/{agents,credentials,telegram}
+mkdir -p ~/.openclaw/workspace/logs/agents/{agent-id}/
+```
+
+### Step 2: Copy and Edit the Gateway Config
+
+```bash
+cp ~/.openclaw/openclaw.json ~/.openclaw-{agent-id}/openclaw.json
+# Then edit:
+# - gateway.port
+# - gateway.bind (lan for Telegram)
+# - channels.telegram.botToken
+# - skills.entries (add/remove skills)
+```
+
+### Step 3: Create the systemd Service
+
+```bash
+cat > ~/.config/systemd/user/openclaw-gateway-{agent-id}.service << EOF
+[Unit]
+Description=OpenClaw Gateway - {Agent Name}
+After=network.target openclaw-gateway.service
+
+[Service]
+Type=simple
+User=root
+Environment="XDG_RUNTIME_DIR=/run/user/0"
+Environment="DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus"
+Environment="NODE_ENV=production"
+Environment="TZ=Asia/Shanghai"
+WorkingDirectory=/root/.openclaw-{agent-id}
+ExecStart=/www/server/nodejs/v24.13.1/bin/openclaw --profile {agent-id} gateway
+Restart=always
+RestartSec=10s
+MemoryMax=1G
+CPUQuota=50%
+TimeoutStopSec=30s
+StandardOutput=journal
+StandardError=journal
+SyslogIdentifier=openclaw-gateway-{agent-id}
+
+[Install]
+WantedBy=default.target
+EOF
+```
+
+**Note:** Do not set `WatchdogSec`!
+
+### Step 4: Create the Agent Definition
+
+```bash
+cat > ~/.openclaw/workspace/agents/{agent-id}-agent.json << EOF
+{
+  "id": "{agent-id}",
+  "name": "{Agent Name}",
+  "role": "{Agent Role}",
+  "system_prompt": "You are {Agent Name}, ...",
+  "skills": [
+    {
+      "name": "skill-name",
+      "enabled": true,
+      "config": { ... }
+    }
+  ]
+}
+EOF
+```
+
+### Step 5: Enable and Start the Service
+
+```bash
+# Enable linger (lets user services keep running in the background)
+loginctl enable-linger "$(whoami)"
+
+# Set environment variables (uid 0: this guide deploys as root)
+export XDG_RUNTIME_DIR=/run/user/0
+export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/0/bus"
+
+# Enable and start the service
+systemctl --user daemon-reload
+systemctl --user enable openclaw-gateway-{agent-id}.service
+systemctl --user start openclaw-gateway-{agent-id}.service
+
+# Verify status
+systemctl --user status openclaw-gateway-{agent-id}.service
+journalctl --user -u openclaw-gateway-{agent-id}.service -f
+```
+
+### Step 6: Configure Telegram Pairing
+
+```bash
+# Send the pairing command
+curl -X POST https://api.telegram.org/bot{BOT_TOKEN}/sendMessage \
+  -d "chat_id={USER_CHAT_ID}" \
+  -d "text=/pair {PAIRING_CODE}"
+
+# Verify the pairing state
+cat ~/.openclaw-{agent-id}/credentials/telegram-default-allowFrom.json
+```
+
+### Step 7: Update the Registry
+
+```bash
+# Update agents/registry.md
+# Add the new Agent's details
+```
+
+### Step 8: Commit to Git
+
+```bash
+cd ~/.openclaw/workspace
+git add agents/{agent-id}-agent.json agents/registry.md
+git commit -m "feat: deploy {Agent Name} - {agent-id}"
+```
+
+---
+
+## 🔧 Troubleshooting
+
+### Gateway Fails to Start
+
+```bash
+# Check the configuration
+openclaw --profile {agent-id} doctor
+
+# View logs
+journalctl --user -u openclaw-gateway-{agent-id}.service --since "10 minutes ago"
+
+# Check the port
+ss -tlnp | grep {port}
+
+# Check the process
+ps aux | grep openclaw | grep {agent-id}
+```
+
+### Skill Fails to Load
+
+```bash
+# Check that skill.json exists
+ls -la ~/.openclaw/workspace/skills/{skill-name}/
+
+# Check that openclaw.json is valid JSON
+python3 -m json.tool ~/.openclaw-{agent-id}/openclaw.json
+
+# View Gateway logs
+journalctl --user -u openclaw-gateway-{agent-id}.service | grep -i skill
+```
+
+### Telegram Does Not Reply
+
+```bash
+# Check the pairing state
+cat ~/.openclaw-{agent-id}/credentials/telegram-default-allowFrom.json
+
+# Check the Bot Token
+curl -X POST https://api.telegram.org/bot{BOT_TOKEN}/getMe
+
+# Check the Gateway bind setting
+grep bind ~/.openclaw-{agent-id}/openclaw.json
+```
+
+---
+
+## 📊 Configuration Templates
+
+### openclaw.json Template
+
+```json
+{
+  "meta": {
+    "lastTouchedVersion": "2026.2.22-2",
+    "lastTouchedAt": "2026-02-23T00:00:00.000Z"
+  },
+  "env": {
+    "TAVILY_API_KEY": "tvly-xxx"
+  },
+  "models": {
+    "mode": "merge",
+    "providers": {
+      "bailian": {
+        "baseUrl": "https://coding.dashscope.aliyuncs.com/v1",
+        "apiKey": "sk-sp-xxx",
+        "api": "openai-completions"
+      }
+    }
+  },
+  "agents": {
+    "defaults": {
+      "model": {
+        "primary": "bailian/qwen3.5-plus"
+      },
+      "workspace": "/root/.openclaw/workspace/agents/{agent-id}-workspace"
+    },
+    "list": [
+      {
+        "id": "{agent-id}",
+        "name": "{Agent Name}",
+        "workspace": "/root/.openclaw/workspace/agents/{agent-id}-workspace"
+      }
+    ]
+  },
+  "channels": {
+    "telegram": {
+      "enabled": true,
+      "dmPolicy": "pairing",
+      "botToken": "{BOT_TOKEN}",
+      "groupPolicy": "allowlist",
+      "streaming": "partial"
+    }
+  },
+  "gateway": {
+    "port": 18790,
+    "mode": "local",
+    "bind": "lan",
+    "auth": {
+      "mode": "token",
+      "token": "{GATEWAY_TOKEN}"
+    },
+    "trustedProxies": ["127.0.0.1", "::1"]
+  },
+  "memory": {
+    "backend": "qmd",
+    "citations": "auto",
+    "qmd": {
+      "includeDefaultMemory": true,
+      "update": {
+        "interval": "5m",
+        "debounceMs": 15000
+      }
+    }
+  },
+  "skills": {
+    "install": {
+      "nodeManager": "npm"
+    },
+    "entries": {
+      "tavily": {
+        "enabled": true,
+        "apiKey": "tvly-xxx"
+      },
+      "mem0-integration": {
+        "enabled": true,
+        "config": {
+          "agent_id": "{agent-id}",
+          "user_id": "{user-id}",
+          "collection_name": "mem0_v4_{agent-id}"
+        }
+      }
+    }
+  },
+  "plugins": {
+    "entries": {
+      "telegram": { "enabled": true }
+    }
+  }
+}
+```
+
+### systemd Service Template
+
+```ini
+[Unit]
+Description=OpenClaw Gateway - {Agent Name}
+Documentation=https://docs.openclaw.ai
+After=network.target openclaw-gateway.service
+
+[Service]
+Type=simple
+User=root
+Environment="XDG_RUNTIME_DIR=/run/user/0"
+Environment="DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus"
+Environment="NODE_ENV=production"
+Environment="TZ=Asia/Shanghai"
+WorkingDirectory=/root/.openclaw-{agent-id}
+ExecStart=/www/server/nodejs/v24.13.1/bin/openclaw --profile {agent-id} gateway
+Restart=always
+RestartSec=10s
+MemoryMax=1G
+CPUQuota=50%
+TimeoutStopSec=30s
+StandardOutput=journal
+StandardError=journal
+SyslogIdentifier=openclaw-gateway-{agent-id}
+
+[Install]
+WantedBy=default.target
+```
+
+---
+
+## 🎯 Post-Deployment Checklist
+
+- [ ] Gateway service is running (`systemctl --user status`)
+- [ ] Port is listening as expected (`ss -tlnp | grep {port}`)
+- [ ] Telegram Bot is connected (the log shows `starting provider`)
+- [ ] Telegram pairing is complete (`allowFrom` contains the user ID)
+- [ ] Skills loaded successfully (no errors in the log)
+- [ ] Mem0 collection has been created (with its own collection name)
+- [ ] Log directory has been created (`/logs/agents/{agent-id}/`)
+- [ ] Registry has been updated (`agents/registry.md`)
+- [ ] Changes are committed to Git (config backup)
+- [ ] Functional test passes (send a real message to verify)
+
+---
+
+## 📚 Related Documents
+
+- [OpenClaw official documentation](https://docs.openclaw.ai)
+- [Master Zhang deployment log](../logs/agents/life/2026-02-23-deployment-check.log)
+- [Master Zhang issue fix report](../logs/agents/life/2026-02-23-issue-fixes.md)
+- [Agent Registry](../agents/registry.md)
+
+---
+
+**Last updated:** 2026-02-23
+**Maintainer:** Eason (陈医生, Dr. Chen) 👨‍⚕️
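
The consistency rule from Error 5 (the skills listed in the agent definition must match `skills.entries` in the Gateway config) can be checked mechanically before a Gateway restart. The following is a minimal sketch, not an OpenClaw API: `findSkillMismatches` and its report fields are hypothetical names, and it assumes only the two JSON shapes shown in this guide (`skills[].name` / `skills[].enabled` in the agent definition, `skills.entries[name].enabled` in the Gateway config). Treating only `enabled: true` as active is also an assumption.

```javascript
// Hypothetical helper (not part of OpenClaw): compares the skills an agent
// definition declares with the skills a Gateway config enables.
// Assumption: only entries with `enabled: true` count as active.
function findSkillMismatches(agentDef, gatewayConfig) {
  const isEnabled = (skill) => Boolean(skill) && skill.enabled === true;

  // Skill names the agent definition claims to have.
  const agentSkills = (agentDef.skills || [])
    .filter(isEnabled)
    .map((skill) => skill.name);

  // Skill names the Gateway config enables.
  const entries = (gatewayConfig.skills && gatewayConfig.skills.entries) || {};
  const gatewaySkills = Object.keys(entries).filter((name) => isEnabled(entries[name]));

  return {
    // Declared by the agent but not enabled in the Gateway.
    missingInGateway: agentSkills.filter((name) => !gatewaySkills.includes(name)),
    // Enabled in the Gateway but not declared by the agent.
    missingInAgent: gatewaySkills.filter((name) => !agentSkills.includes(name)),
  };
}

// Usage: the exact mismatch described in Error 5.
const agentDef = { skills: [{ name: 'google-calendar', enabled: true }] };
const gatewayConfig = {
  skills: { entries: { 'google-calendar-node': { enabled: true } } },
};
const report = findSkillMismatches(agentDef, gatewayConfig);
console.log(JSON.stringify(report, null, 2));
```

For the Error 5 inputs, the report lists `google-calendar` under `missingInGateway` and `google-calendar-node` under `missingInAgent`; two empty arrays mean the two files agree. In practice the two objects would come from `JSON.parse` on the real files, which works only if those files contain no `//` comments like the ones used for annotation in this guide.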