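# Agent Deployment Best Practices
**Version:** 1.0
**Created:** 2026-02-23
**Author:** Eason (陈医生) 👨
**Based on:** Lessons learned from deploying 张大师 (life)
---
## 📋 Pre-Deployment Checklist
### 1. Architecture Planning
- [ ] **Choose the Agent type**: standalone Gateway vs. routing mode
  - Standalone Gateway: strong isolation, but every Skill must be configured separately
  - Routing mode: shared configuration, lower resource usage
- [ ] **Port planning**: make sure ports do not conflict (main Gateway 18789, 张大师 18790)
- [ ] **Database isolation**: give each Agent its own Mem0 collection name (e.g. `mem0_v4_life`)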
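The port and collection checks in this list can be run before a new Agent is added. The following is a minimal sketch, assuming a hypothetical in-memory deployment plan; `plan`, `findPortConflicts`, and `collectionName` are illustrative names and not part of OpenClaw. The ports and the `mem0_v4_{agent-id}` naming pattern follow this document.

```javascript
// Hypothetical deployment plan; ids and ports follow the examples in this document.
const plan = [
  { id: 'main', port: 18789 },
  { id: 'life', port: 18790 },
];

// Return every port that is claimed by more than one Agent.
function findPortConflicts(agents) {
  const owners = new Map();
  for (const { id, port } of agents) {
    owners.set(port, [...(owners.get(port) || []), id]);
  }
  return [...owners]
    .filter(([, ids]) => ids.length > 1)
    .map(([port, ids]) => ({ port, ids }));
}

// Derive an isolated Mem0 collection name per Agent.
const collectionName = (agentId) => `mem0_v4_${agentId}`;

console.log(findPortConflicts(plan));
console.log(collectionName('life'));
```

An empty array from `findPortConflicts` means the plan has no overlapping ports. It does not check whether a port is already in use on the host.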
### 2. Configuration File Layout
```
New Agent deployment layout:
├── ~/.openclaw-{agent-id}/                  # Dedicated config directory
│   ├── openclaw.json                        # Gateway config
│   ├── agents/                              # Agent config
│   ├── credentials/                         # Credential files
│   └── telegram/                            # Telegram state
├── ~/.config/systemd/user/                  # systemd services
│   └── openclaw-gateway-{agent-id}.service
└── ~/.openclaw/workspace/                   # Shared workspace
    ├── agents/{agent-id}-agent.json         # Agent definition
    ├── skills/                              # Skills (shared)
    └── logs/agents/{agent-id}/              # Log directory
```
---
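## ⚠ Common Errors and Solutions
### Error 1: Wrong Skill configuration field
**Problem:**
```json
// ❌ Wrong - openclaw.json does not support the description field
"skills": {
  "entries": {
    "chinese-almanac": {
      "enabled": true,
      "description": "Almanac lookup" // Not supported!
    }
  }
}
```
**Error message:**
```
skills.entries.xxx: Unrecognized key: "description"
Config invalid
```
**Correct configuration:**
```json
// ✅ Correct - use only supported fields
"skills": {
  "entries": {
    "chinese-almanac": {
      "enabled": true,
      "config": { // Skill-specific settings go inside config
        "tavily_api_key": "tvly-xxx"
      }
    }
  }
}
```
**Lessons:**
- Skill entries in `openclaw.json` accept only recognized fields such as `enabled` and `config` (the template later in this document also uses `apiKey`)
- Metadata such as `description` and `name` belongs in `skill.json`
- A config validation failure prevents the Gateway from starting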
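Because one unrecognized key stops the Gateway from starting, a quick pre-flight check of `skills.entries` can catch the mistake earlier. This is a minimal sketch. The allowed key set is an assumption taken from the fields used in this document, not from the OpenClaw schema, and `findUnknownSkillKeys` is a hypothetical helper.

```javascript
// Assumption: allowed keys are the ones this document uses, not the official schema.
const ALLOWED_SKILL_KEYS = new Set(['enabled', 'config', 'apiKey']);

// List "skills.entries.<name>: <key>" for every key outside the allowed set.
function findUnknownSkillKeys(config, allowed = ALLOWED_SKILL_KEYS) {
  const entries = (config.skills && config.skills.entries) || {};
  const problems = [];
  for (const [name, entry] of Object.entries(entries)) {
    for (const key of Object.keys(entry)) {
      if (!allowed.has(key)) problems.push(`skills.entries.${name}: ${key}`);
    }
  }
  return problems;
}

const sample = {
  skills: { entries: { 'chinese-almanac': { enabled: true, description: 'x' } } },
};
console.log(findUnknownSkillKeys(sample));
```

The Gateway's own validation remains the authority. This check only mirrors the failure described in this section.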
---
### Error 2: Python Skill invoked in the Node.js environment
**Problem:**
```json
// ❌ Wrong - a Python script cannot be called directly from the Node.js Gateway
{
  "name": "google-calendar", // Python implementation
  "handler": "google_calendar.handle_calendar_command"
}
```
**Symptoms:**
- The Skill fails to load
- The Agent reports "feature not configured" or "MCP connection required"
- The command-line test succeeds, but the same call fails inside the Gateway
**Solution A: create a Node.js wrapper (recommended)**
```
skills/google-calendar-node/
├── calendar.js        // Node.js interface
│   └── spawn('python3', ['google_calendar.py', command])
└── skill.json
    └── "handler": "calendar.getCalendarInfo"   // Node.js module
```
**Solution B: pure Node.js implementation**
```javascript
// Use the googleapis npm package
const { google } = require('googleapis');
```
**Lessons:**
- The OpenClaw Gateway is a Node.js environment
- Python Skills need a Node.js wrapper before they can be integrated
- Do not test only the Python script; test the Gateway integration as well
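The wrapper in Solution A can be sketched roughly as follows. This is a minimal sketch, not the actual `calendar.js`. `runScript` is a hypothetical helper, and the handler signature the Gateway expects is an assumption. It uses `execFile` from Node's `child_process` module so the command argument is passed without going through a shell.

```javascript
const { execFile } = require('child_process');

// Run an external script and resolve with its trimmed stdout.
// `interpreter` and `args` come from the caller, e.g. 'python3' and
// ['google_calendar.py', command] as in Solution A.
function runScript(interpreter, args, timeoutMs = 15000) {
  return new Promise((resolve, reject) => {
    execFile(interpreter, args, { timeout: timeoutMs }, (err, stdout, stderr) => {
      if (err) {
        return reject(new Error(`${interpreter} failed: ${stderr || err.message}`));
      }
      resolve(stdout.trim());
    });
  });
}

// Hypothetical handler shape; the real signature depends on the Skill API.
async function getCalendarInfo(command) {
  return runScript('python3', ['google_calendar.py', command]);
}

module.exports = { runScript, getCalendarInfo };
```

The timeout keeps a hung Python process from blocking the Agent indefinitely. The 15-second default is an arbitrary choice for this sketch.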
---
### Error 3: systemd watchdog configuration
**Problem:**
```ini
# ❌ Wrong - OpenClaw does not send systemd watchdog notifications
[Service]
WatchdogSec=60s
```
**Symptoms:**
```
Watchdog timeout (limit 1min)!
Killing process with signal SIGABRT
Main process exited, code=dumped, status=6/ABRT
```
**Correct configuration:**
```ini
# ✅ Correct - remove WatchdogSec
[Service]
Restart=always
RestartSec=10s
MemoryMax=1G
CPUQuota=50%
# Do not set WatchdogSec
```
**Lessons:**
- The OpenClaw Gateway does not send systemd watchdog notifications
- With `WatchdogSec` set, systemd treats the missing notifications as a hang and kills a healthy service
- Use `Restart=always` for automatic recovery
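A unit file can be scanned for this directive before it is installed. This is a minimal sketch that assumes the unit file is plain `key=value` text with `[Section]` headers; `findRiskyDirectives` is a hypothetical helper, not a systemd or OpenClaw tool.

```javascript
// Return the [Service] directives that this document advises against.
function findRiskyDirectives(unitText, risky = ['WatchdogSec']) {
  let section = '';
  const found = [];
  for (const raw of unitText.split('\n')) {
    const line = raw.trim();
    // Skip blank lines and comments.
    if (line === '' || line.startsWith('#') || line.startsWith(';')) continue;
    const header = line.match(/^\[(.+)\]$/);
    if (header) {
      section = header[1];
      continue;
    }
    const key = line.split('=')[0].trim();
    if (section === 'Service' && risky.includes(key)) found.push(key);
  }
  return found;
}

const unit = '[Service]\nRestart=always\nWatchdogSec=60s\n';
console.log(findRiskyDirectives(unit));
```

Commented-out lines such as `# WatchdogSec=60s` are ignored, so a note in the unit file does not raise a false alarm.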
---
### Error 4: Gateway bind address
**Problem:**
```json
// ❌ Wrong - binding to loopback makes Telegram pairing fail
"gateway": {
  "bind": "loopback"
}
```
**Error message:**
```
Error: Gateway is only bound to loopback.
Set gateway.bind=lan, enable tailscale serve,
or configure plugins.entries.device-pair.config.publicUrl.
```
**Correct configuration:**
```json
// ✅ Correct - binding to LAN supports Telegram pairing
"gateway": {
  "bind": "lan",
  "port": 18790,
  "auth": {
    "mode": "token",
    "token": "your-token"
  }
}
```
**Security considerations:**
- After binding to LAN, make sure the firewall restricts access
- Expose only ports 80/443 (through an Nginx reverse proxy)
- Use token authentication
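The bind and auth rules in this section can be checked against a parsed `gateway` block. This is a minimal sketch; `checkGatewayExposure` is a hypothetical helper, and the field names (`bind`, `auth.mode`, `auth.token`) are the ones shown in the JSON in this section. It does not inspect the firewall or Nginx.

```javascript
// Flag Gateway settings that break pairing or expose a LAN port without a token.
function checkGatewayExposure(gateway) {
  const warnings = [];
  if (gateway.bind === 'loopback') {
    warnings.push('bind=loopback: Telegram pairing will fail');
  }
  if (gateway.bind === 'lan') {
    const auth = gateway.auth || {};
    if (auth.mode !== 'token' || !auth.token) {
      warnings.push('bind=lan without token auth');
    }
  }
  return warnings;
}

console.log(checkGatewayExposure({ bind: 'lan', port: 18790 }));
```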
---
### Error 5: Agent config and Gateway config out of sync
**Problem:**
```json
// life-agent.json
{
  "name": "google-calendar", // ❌ Python version
  "enabled": true
}
// openclaw-life.json
{
  "skills": {
    "entries": {
      "google-calendar-node": { // ✅ Node.js version
        "enabled": true
      }
    }
  }
}
```
**Symptoms:**
- The Agent believes the feature is not configured
- The system prompt does not match the tools that are actually available
**Solution:**
```json
// ✅ Keep them consistent
// life-agent.json
{
  "name": "google-calendar-node",
  "enabled": true
}
// openclaw-life.json
{
  "skills": {
    "entries": {
      "google-calendar-node": {
        "enabled": true
      }
    }
  }
}
// State it explicitly in the system prompt
"## Available tools\n\n### Google Calendar\n- Use the google-calendar-node skill\n- Already configured; no MCP connection needed"
```
**Lessons:**
- The skills list in `agent.json` must match `openclaw.json`
- The system prompt should describe the available tools accurately
- Restart the Gateway after changing the configuration
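The two files can be compared mechanically. This is a minimal sketch, assuming the Agent definition carries a `skills` array of `{ name, enabled }` objects and the Gateway config uses `skills.entries`, as shown in this document. `findSkillMismatches` is a hypothetical helper.

```javascript
// Compare enabled skills in an Agent definition with skills.entries in the Gateway config.
function findSkillMismatches(agentDef, gatewayConfig) {
  const wanted = (agentDef.skills || [])
    .filter((s) => s.enabled)
    .map((s) => s.name);
  const entries = (gatewayConfig.skills && gatewayConfig.skills.entries) || {};
  const available = Object.keys(entries).filter((name) => entries[name].enabled);
  return {
    missingInGateway: wanted.filter((name) => !available.includes(name)),
    missingInAgent: available.filter((name) => !wanted.includes(name)),
  };
}

const agentDef = { skills: [{ name: 'google-calendar', enabled: true }] };
const gatewayConfig = {
  skills: { entries: { 'google-calendar-node': { enabled: true } } },
};
console.log(findSkillMismatches(agentDef, gatewayConfig));
```

For the mismatch described in this section, the result lists `google-calendar` under `missingInGateway` and `google-calendar-node` under `missingInAgent`. Two empty arrays mean the files agree.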
---
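### Error 6: Hardcoded data vs. dynamic calculation
**Problem:**
```javascript
// ❌ Wrong - the lunar date (正月初八) is hardcoded in the query
const query = `2026 年 2 月 24 日 农历黄历 宜忌 正月初八`;
```
**Symptoms:**
- The data becomes wrong as soon as the date changes
- Different data sources return different results
**Correct approach:**
```javascript
// ✅ Correct - compute the lunar day from the target date
// Only valid for days 1-10 of the first lunar month of 2026
const MS_PER_DAY = 1000 * 60 * 60 * 24;
const springFestival = new Date('2026-02-17'); // Spring Festival (lunar 1/1)
const targetDate = new Date('2026-02-24');
const lunarDay = Math.floor((targetDate - springFestival) / MS_PER_DAY) + 1;
const lunarDateStr = `农历正月初${lunarDay}`;
```
**Lessons:**
- Avoid hardcoding dynamic data such as dates and times
- Prefer an authoritative data source (an API) over in-house calculation
- Emphasize in the system prompt that the Agent must query the tool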
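One way to follow these lessons is to build the query from the current Gregorian date only and let the data source supply the lunar date. This is a minimal sketch, assuming the `Asia/Shanghai` time zone used elsewhere in this document. `buildAlmanacQuery` is a hypothetical helper, and the query wording is copied from the example in this section.

```javascript
// Build an almanac search query for a given instant in the chosen time zone.
// The lunar date is deliberately left out so the data source supplies it.
function buildAlmanacQuery(date, timeZone = 'Asia/Shanghai') {
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    year: 'numeric',
    month: 'numeric',
    day: 'numeric',
  }).formatToParts(date);
  const get = (type) => parts.find((p) => p.type === type).value;
  return `${get('year')} 年 ${get('month')} 月 ${get('day')} 日 农历黄历 宜忌`;
}

console.log(buildAlmanacQuery(new Date('2026-02-24T00:00:00+08:00')));
// prints "2026 年 2 月 24 日 农历黄历 宜忌"
```

Formatting in an explicit time zone avoids an off-by-one day when the server clock runs in UTC.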
---
## 📝 Standard Deployment Procedure
### Step 1: Create the config directories
```bash
mkdir -p ~/.openclaw-{agent-id}/{agents,credentials,telegram}
mkdir -p ~/.openclaw/workspace/logs/agents/{agent-id}/
```
### Step 2: Copy and edit the Gateway config
```bash
cp ~/.openclaw/openclaw.json ~/.openclaw-{agent-id}/openclaw.json
# Edit:
# - gateway.port
# - gateway.bind (lan for Telegram)
# - channels.telegram.botToken
# - skills.entries (add/remove skills)
```
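The edits listed in Step 2 can also be applied programmatically to the parsed JSON. This is a minimal sketch; `deriveGatewayConfig` and its option names (`port`, `botToken`, `skills`) are illustrative and not part of OpenClaw. The field paths are the ones listed in the comments of Step 2.

```javascript
// Derive a per-Agent Gateway config from the main one without mutating the input.
function deriveGatewayConfig(base, { port, botToken, skills }) {
  const next = JSON.parse(JSON.stringify(base)); // deep copy of plain JSON data
  next.gateway = { ...next.gateway, port, bind: 'lan' };
  next.channels = next.channels || {};
  next.channels.telegram = { ...next.channels.telegram, botToken };
  next.skills = next.skills || {};
  next.skills.entries = skills;
  return next;
}

const base = { gateway: { port: 18789, bind: 'loopback' } };
const life = deriveGatewayConfig(base, {
  port: 18790,
  botToken: '{BOT_TOKEN}',
  skills: { tavily: { enabled: true } },
});
console.log(JSON.stringify(life, null, 2));
```

Working on a copy leaves the main Gateway's settings untouched, which matters because both configs start from the same file.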
### Step 3: Create the systemd service
```bash
cat > ~/.config/systemd/user/openclaw-gateway-{agent-id}.service << EOF
[Unit]
Description=OpenClaw Gateway - {Agent Name}
After=network.target openclaw-gateway.service
[Service]
Type=simple
User=root
Environment="XDG_RUNTIME_DIR=/run/user/0"
Environment="DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus"
Environment="NODE_ENV=production"
Environment="TZ=Asia/Shanghai"
WorkingDirectory=/root/.openclaw-{agent-id}
ExecStart=/www/server/nodejs/v24.13.1/bin/openclaw --profile {agent-id} gateway
Restart=always
RestartSec=10s
MemoryMax=1G
CPUQuota=50%
TimeoutStopSec=30s
StandardOutput=journal
StandardError=journal
SyslogIdentifier=openclaw-gateway-{agent-id}
[Install]
WantedBy=default.target
EOF
```
**Note:** Do not set `WatchdogSec`
### Step 4: Create the Agent definition
```bash
cat > ~/.openclaw/workspace/agents/{agent-id}-agent.json << EOF
{
  "id": "{agent-id}",
  "name": "{Agent Name}",
  "role": "{Agent Role}",
  "system_prompt": "You are {Agent Name}, ...",
  "skills": [
    {
      "name": "skill-name",
      "enabled": true,
      "config": { ... }
    }
  ]
}
EOF
```
### Step 5: Enable and start the service
```bash
# Enable linger (lets user services keep running in the background)
loginctl enable-linger $(whoami)
# Set environment variables
export XDG_RUNTIME_DIR=/run/user/0
export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/0/bus"
# Enable and start the service
systemctl --user daemon-reload
systemctl --user enable openclaw-gateway-{agent-id}.service
systemctl --user start openclaw-gateway-{agent-id}.service
# Verify the status
systemctl --user status openclaw-gateway-{agent-id}.service
journalctl --user -u openclaw-gateway-{agent-id}.service -f
```
### Step 6: Configure Telegram pairing
```bash
# Send the pairing command
curl -X POST https://api.telegram.org/bot{BOT_TOKEN}/sendMessage \
  -d "chat_id={USER_CHAT_ID}" \
  -d "text=/pair {PAIRING_CODE}"
# Verify the pairing status
cat ~/.openclaw-{agent-id}/credentials/telegram-default-allowFrom.json
```
### Step 7: Update the Registry
```bash
# Update agents/registry.md
# Add the new Agent's details
```
### Step 8: Commit to Git
```bash
cd ~/.openclaw/workspace
git add agents/{agent-id}-agent.json agents/registry.md
git commit -m "feat: deploy {Agent Name} - {agent-id}"
```
---
## 🔧 Troubleshooting
### Gateway fails to start
```bash
# Check the configuration
openclaw --profile {agent-id} doctor
# View the logs
journalctl --user -u openclaw-gateway-{agent-id}.service --since "10 minutes ago"
# Check the port
ss -tlnp | grep {port}
# Check the process
ps aux | grep openclaw | grep {agent-id}
```
### Skill fails to load
```bash
# Check that skill.json exists
ls -la ~/.openclaw/workspace/skills/{skill-name}/
# Check that openclaw.json is valid JSON
cat ~/.openclaw-{agent-id}/openclaw.json | python3 -m json.tool
# View the Gateway logs
journalctl --user -u openclaw-gateway-{agent-id}.service | grep -i skill
```
### Telegram does not reply
```bash
# Check the pairing status
cat ~/.openclaw-{agent-id}/credentials/telegram-default-allowFrom.json
# Check the Bot Token
curl -X POST https://api.telegram.org/bot{BOT_TOKEN}/getMe
# Check the Gateway bind setting
cat ~/.openclaw-{agent-id}/openclaw.json | grep bind
```
---
## 📊 Configuration Templates
### openclaw.json template
```json
{
  "meta": {
    "lastTouchedVersion": "2026.2.22-2",
    "lastTouchedAt": "2026-02-23T00:00:00.000Z"
  },
  "env": {
    "TAVILY_API_KEY": "tvly-xxx"
  },
  "models": {
    "mode": "merge",
    "providers": {
      "bailian": {
        "baseUrl": "https://coding.dashscope.aliyuncs.com/v1",
        "apiKey": "sk-sp-xxx",
        "api": "openai-completions"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "bailian/qwen3.5-plus"
      },
      "workspace": "/root/.openclaw/workspace/agents/{agent-id}-workspace"
    },
    "list": [
      {
        "id": "{agent-id}",
        "name": "{Agent Name}",
        "workspace": "/root/.openclaw/workspace/agents/{agent-id}-workspace"
      }
    ]
  },
  "channels": {
    "telegram": {
      "enabled": true,
      "dmPolicy": "pairing",
      "botToken": "{BOT_TOKEN}",
      "groupPolicy": "allowlist",
      "streaming": "partial"
    }
  },
  "gateway": {
    "port": 18790,
    "mode": "local",
    "bind": "lan",
    "auth": {
      "mode": "token",
      "token": "{GATEWAY_TOKEN}"
    },
    "trustedProxies": ["127.0.0.1", "::1"]
  },
  "memory": {
    "backend": "qmd",
    "citations": "auto",
    "qmd": {
      "includeDefaultMemory": true,
      "update": {
        "interval": "5m",
        "debounceMs": 15000
      }
    }
  },
  "skills": {
    "install": {
      "nodeManager": "npm"
    },
    "entries": {
      "tavily": {
        "enabled": true,
        "apiKey": "tvly-xxx"
      },
      "mem0-integration": {
        "enabled": true,
        "config": {
          "agent_id": "{agent-id}",
          "user_id": "{user-id}",
          "collection_name": "mem0_v4_{agent-id}"
        }
      }
    }
  },
  "plugins": {
    "entries": {
      "telegram": { "enabled": true }
    }
  }
}
```
### systemd service template
```ini
[Unit]
Description=OpenClaw Gateway - {Agent Name}
Documentation=https://docs.openclaw.ai
After=network.target openclaw-gateway.service
[Service]
Type=simple
User=root
Environment="XDG_RUNTIME_DIR=/run/user/0"
Environment="DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/0/bus"
Environment="NODE_ENV=production"
Environment="TZ=Asia/Shanghai"
WorkingDirectory=/root/.openclaw-{agent-id}
ExecStart=/www/server/nodejs/v24.13.1/bin/openclaw --profile {agent-id} gateway
Restart=always
RestartSec=10s
MemoryMax=1G
CPUQuota=50%
TimeoutStopSec=30s
StandardOutput=journal
StandardError=journal
SyslogIdentifier=openclaw-gateway-{agent-id}
[Install]
WantedBy=default.target
```
---
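## ⚠ Known Risks of the QMD Memory Backend
OpenClaw uses `qmd` as the memory index backend for the agent workspace. This component has a **known installation compatibility problem** that is easily triggered by a migration or an upgrade.
### Root cause
OpenClaw downloads `qmd` from GitHub into its cache directory (`/www/server/nodejs/v24.13.1/cache/@GH@tobi-qmd-*/`). It is **not** installed globally as a standard npm package.
There are two failure modes:
| Case | Error | Cause |
|------|-------|-------|
| qmd installed by bun | `better-sqlite3 bindings.node` error | The native addon was compiled for bun and is incompatible with node v24 |
| Cache copy not built | `spawn qmd ENOENT` / `dist/qmd.js not found` | The TypeScript sources were never compiled into dist/ |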
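The two failure modes can be told apart from a single log line. This is a minimal sketch; `classifyQmdError` is a hypothetical helper, and the patterns are the error strings quoted in the table, so real log lines may be worded differently.

```javascript
// Map a Gateway log line to one of the two failure modes in the table above.
function classifyQmdError(line) {
  if (/better-sqlite3|bindings\.node/.test(line)) {
    return 'bun-built native addon';
  }
  if (/spawn qmd ENOENT|dist\/qmd\.js not found/.test(line)) {
    return 'cache copy not built';
  }
  return null; // not a known qmd failure
}

console.log(classifyQmdError('Error: spawn qmd ENOENT'));
```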
### When it is triggered
- ✓ After migrating to a new server (the cache directory has no dist/)
- ✓ After `openclaw update` (the cache hash changes and the old symlink breaks)
- ✓ After a Node.js version upgrade (the paths change)
### Quick diagnosis
```bash
# 1. Check that the symlink is intact
ls -la /www/server/nodejs/v24.13.1/bin/qmd
# 2. Run it for real (the output must contain "Usage:")
/www/server/nodejs/v24.13.1/bin/qmd --help 2>&1 | head -2
# 3. Check the gateway logs
journalctl --user -u openclaw-gateway-{agent-id} -n 20 | grep qmd
```
### Fix (standard steps after a migration or upgrade)
```bash
QMD_CACHE=$(ls -dt /www/server/nodejs/v24.13.1/cache/@GH@tobi-qmd-*/ | head -1)
cd "$QMD_CACHE" && npm install && npm run build
ln -sf "$QMD_CACHE/qmd" /www/server/nodejs/v24.13.1/bin/qmd
/www/server/nodejs/v24.13.1/bin/qmd collection list # Verify
```
> For detailed steps, see `SERVER_MIGRATION_GUIDE.md § Step 4.5`
### Model configuration note (MiniMax-M2.5)
If MiniMax-M2.5 is configured in OpenClaw with `"reasoning": true`, or if reasoning is not explicitly disabled, the model enters extended thinking mode. As a result, **the response contains only a thinking block and the user receives no reply at all**.
```json
// In openclaw.json, default_llm/MiniMax-M2.5 must include:
{ "id": "MiniMax-M2.5", ..., "reasoning": false }
```
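A check for this flag might look like the following. This is a minimal sketch, assuming the model entries have already been collected into an array of objects with an `id` field. Where that array lives inside `openclaw.json` is not shown in this document, so locating it is left to the caller. `modelsWithoutReasoningOff` is a hypothetical helper.

```javascript
// Return ids of watched models whose "reasoning" flag is not explicitly false.
function modelsWithoutReasoningOff(models, watched = ['MiniMax-M2.5']) {
  return models
    .filter((m) => watched.includes(m.id) && m.reasoning !== false)
    .map((m) => m.id);
}

console.log(modelsWithoutReasoningOff([{ id: 'MiniMax-M2.5' }]));
```

A missing flag is reported the same way as `true`, matching the note that reasoning must be explicitly disabled.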
---
## 🎯 Post-Deployment Checklist
- [ ] Gateway service is running normally (`systemctl --user status`)
- [ ] Port is listening correctly (`ss -tlnp | grep {port}`)
- [ ] Telegram Bot is connected (the log shows `starting provider`)
- [ ] Telegram pairing is complete (`allowFrom` contains the user ID)
- [ ] Skills loaded successfully (no errors in the log)
- [ ] **QMD works**: `/www/server/nodejs/v24.13.1/bin/qmd collection list` runs without errors
- [ ] **No qmd ENOENT in the Gateway log**: `journalctl --user -u ... | grep qmd`
- [ ] Mem0 collection is created (with its own collection name)
- [ ] Log directory is created (`/logs/agents/{agent-id}/`)
- [ ] Registry is updated (`agents/registry.md`)
- [ ] Changes are committed to Git (config backup)
- [ ] Functional test passes (send a real message to test)
---
## 📚 Related Documents
- [OpenClaw official documentation](https://docs.openclaw.ai)
- [张大师 deployment log](../logs/agents/life/2026-02-23-deployment-check.log)
- [张大师 issue fix report](../logs/agents/life/2026-02-23-issue-fixes.md)
- [Agent Registry](../agents/registry.md)
---
**Last updated:** 2026-02-23
**Maintainer:** Eason (陈医生) 👨