# MEMORY.md - Long-term Memory

This file contains curated long-term memories and important context.

## Memory Management Strategy
- **MEMORY.md**: Curated long-term memories, important decisions, security templates, and key configurations
- **QMD System**: Automated memory backend with semantic search, auto-updates every 5 minutes
- **Usage**: Write significant learnings to MEMORY.md; rely on QMD for daily context and automation
- **Access**: MEMORY.md loaded only in main sessions (direct chats) for security

## QMD Configuration
- Backend: qmd
- Auto-update: every 5 minutes
- Include default memory: true
- Last verified: 2026-02-20

## Server Security Hardening Template (2026-02-20)

### Environment
- **Server**: Ubuntu 24.04 LTS VPS (KVM)
- **Panel**: 宝塔面板 (BT-Panel) on port 888
- **Public IP**: 204.12.203.203

### Security Configuration Applied
1. **Port Exposure Minimization**:
   - Only ports 80 (HTTP) and 443 (HTTPS) publicly accessible
   - SSH (port 22) restricted to internal/network access only
   - OpenClaw gateway (port 18789) bound to localhost only
   - All other services (MySQL, custom apps) internal-only

2. **OpenClaw Secure Deployment**:
   - Gateway configured with `bind: "localhost"` instead of `"lan"`
   - Access exclusively through Nginx reverse proxy with HTTPS
   - Token-based authentication enabled
   - WebSocket support properly configured in Nginx

3. **Firewall Management**:
   - Use 宝塔面板 (BT-Panel) built-in firewall for port management
   - Alternative: system-level firewall (ufw/iptables) if no panel available
   - Regular external port scanning to verify exposure

4. **Critical Security Principles**:
   - Never expose sensitive services directly to public internet
   - Always use reverse proxy with TLS termination for web services
   - Implement defense in depth (firewall + service binding + authentication)
   - Regular security audits using `openclaw security audit --deep`

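The port policy above can be spot-checked from the server itself. A minimal sketch, not part of the original setup: `find_public_listeners` is a hypothetical helper that reads lines in the format printed by `ss -H -tln` (local address in the fourth column) and prints every wildcard-bound port outside an allow-list that defaults to 80 and 443. It only shows what services bind to; it cannot see firewall rules, so it does not replace the external port scan.

```bash
# Hypothetical helper: list wildcard-bound TCP ports that are not in the allow-list.
# Input: `ss -H -tln` style lines on stdin. Arguments: allowed ports (default: 80 443).
find_public_listeners() {
  local allowed=" ${*:-80 443} "
  awk -v allowed="$allowed" '
    {
      addr = $4                              # e.g. 0.0.0.0:22, [::]:443, *:3306
      n = split(addr, parts, ":")
      port = parts[n]                        # text after the last colon
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if (host == "0.0.0.0" || host == "*" || host == "[::]") {
        if (index(allowed, " " port " ") == 0) print port
      }
    }' | sort -un
}
```

Typical use would be `ss -H -tln | find_public_listeners`; empty output means no unexpected wildcard listeners.
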
### Migration Checklist for New Servers
- [ ] Install and configure 宝塔面板 or equivalent server management panel
- [ ] Set up Nginx reverse proxy with proper WebSocket support
- [ ] Configure OpenClaw with localhost binding only
- [ ] Restrict public ports to 80/443 only via firewall
- [ ] Enable automatic security updates
- [ ] Run initial security audit and document baseline
- [ ] Schedule periodic security audits via OpenClaw cron

### Lessons Learned
- Panel-based firewalls (宝塔/aapanel) must be verified with external port scans
- Direct service exposure (like OpenClaw on 0.0.0.0) creates critical security risks
- Nginx reverse proxy configuration is essential for secure OpenClaw deployment

## Agent Operations Logging Practice (2026-02-20)

### Log Directory Structure
- `/root/.openclaw/workspace/logs/operations/` - Manual operations and important changes
- `/root/.openclaw/workspace/logs/system/` - System-generated logs
- `/root/.openclaw/workspace/logs/agents/` - Individual agent logs
- `/root/.openclaw/workspace/logs/security/` - Security operations and audits

### Automatic Logging Triggers
1. **Configuration Changes**: Any modification to config files (.json, .yaml, etc.)
2. **Security Modifications**: Firewall rules, authentication changes, port modifications
3. **Agent Lifecycle**: Deployment, updates, removal of agents
4. **System Optimizations**: Performance tuning, resource allocation changes
5. **Troubleshooting**: Error diagnosis and resolution procedures
6. **Memory Updates**: Significant changes to MEMORY.md or memory management

### Log Format Standard
- **Filename**: `YYYY-MM-DD-HH-MM-SS-description.log`
- **Timestamp**: UTC time format
- **Content**: `[TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description with before/after state`

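The format above could be produced with two small shell helpers. This is a sketch under assumptions: `ops_log_name` and `ops_log_entry` are hypothetical names, and the ISO 8601 timestamp layout inside the entry is a guess, since the standard only says "UTC".

```bash
# Hypothetical helper: build a file name as YYYY-MM-DD-HH-MM-SS-description.log (UTC).
ops_log_name() {
  local description="$1"
  printf '%s-%s.log\n' "$(date -u +%Y-%m-%d-%H-%M-%S)" "$description"
}

# Hypothetical helper: format one entry as [TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description.
# The timestamp layout (ISO 8601, trailing Z) is an assumption.
ops_log_entry() {
  local op_type="$1" actor="$2" message="$3"
  printf '[%s] [%s] [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$op_type" "$actor" "$message"
}
```

For example, `ops_log_entry CONFIG eason "bind: lan -> localhost" >> "$(ops_log_name gateway-bind)"` would append one entry to a freshly named file in the current directory.
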
### Implementation Guidelines
- Always log before making changes (capture current state)
- Include rollback instructions when applicable
- Redact sensitive information (passwords, tokens, private keys)
- Reference related MEMORY.md entries for context
- Use QMD for routine operational context, MEMORY.md for strategic decisions

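For the redaction guideline, one possible approach is a filter applied before anything is written to a log. The sketch below is illustrative only: `redact_secrets` is a hypothetical name, and the key-name pattern (upper-case names containing PASSWORD, TOKEN, SECRET, or API_KEY) is an assumption that will miss secrets stored under other names.

```bash
# Hypothetical filter: mask the value of KEY=VALUE or KEY: VALUE pairs
# whose key name contains PASSWORD, TOKEN, SECRET, or API_KEY (assumed pattern).
redact_secrets() {
  sed -E 's/(([A-Za-z0-9_]*(PASSWORD|TOKEN|SECRET|API_KEY)[A-Za-z0-9_]*)[[:space:]]*[=:][[:space:]]*)[^[:space:]]+/\1<redacted>/g'
}
```

It reads stdin and writes stdout, so it can sit in a pipeline such as `cat some.env | redact_secrets >> operation.log`.
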
## Agent Health Monitoring & Alerting System (2026-02-20)

### Features Implemented
1. **Crash Detection**: Monitors uncaught exceptions and unhandled rejections
2. **Health Checks**: Periodic service health verification (every 30 seconds)
3. **Multi-Channel Notifications**: Telegram alerts for critical events
4. **Automatic Logging**: All alerts logged to `/logs/agents/health-YYYY-MM-DD.log`
5. **Extensible Design**: Easy to add new notification channels

### Components Created
- **Skill**: `agent-monitor/SKILL.md` - Documentation and usage guide
- **Monitor Script**: `agent-monitor.js` - Core monitoring logic
- **Startup Script**: `start-agent-monitor.sh` - Easy deployment
- **Log Directory**: `/logs/agents/` - Dedicated logging location

### Alert Severity Levels
- **CRITICAL**: Process crashes, uncaught exceptions
- **ERROR**: Unhandled rejections, failed operations
- **WARNING**: Health check failures, performance issues
- **INFO**: Service status updates, recovery notifications

### Integration Points
- Automatically integrated with existing Telegram channel
- Compatible with OpenClaw's agent architecture
- Works alongside existing logging and memory systems
- Can monitor any Node.js-based agent process

### Usage Instructions
1. Source the startup script: `source /root/.openclaw/workspace/start-agent-monitor.sh`
2. Call `startAgentMonitor("agent-name", healthCheckFunction)`
3. Monitor automatically sends alerts on errors/crashes
4. Check logs in `/logs/agents/` for detailed information

---

## Complete System Architecture Upgrade (2026-02-20 14:25 UTC)

### ✅ All 5 Core Requirements Implemented

#### 1. System-Level Persistence ✓
- **Systemd Services**: `openclaw-gateway.service` + `openclaw-agent-monitor.service`
- **Auto-start on Boot**: Both services enabled in multi-user.target
- **Resource Limits**: Memory (2G/512M), CPU (80%/20%), watchdog timers
- **Status**: `systemctl status openclaw-gateway` / `systemctl status openclaw-agent-monitor`

#### 2. Auto-Healing ✓
- **Crash Detection**: Monitors process exits, signals, uncaught exceptions
- **Auto-Restart**: Systemd Restart=always + monitor script restart logic
- **Restart Limits**: Max 5 restarts per 5 minutes (prevents restart loops)
- **Health Checks**: Every 30 seconds, automatic recovery on failure

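The "max 5 restarts per 5 minutes" rule amounts to a sliding-window count. The shell sketch below illustrates that logic only; the real limiter lives in the monitor script and in systemd, and `restart_allowed` with its argument layout is a hypothetical name introduced here.

```bash
# Hypothetical helper: print "allow" if fewer than 5 earlier restarts fall inside
# the last 300 seconds, otherwise "deny".
# Arguments: now (epoch seconds), followed by the epoch timestamps of earlier restarts.
restart_allowed() {
  local now="$1" max=5 window=300 recent=0 ts
  shift
  for ts in "$@"; do
    if [ $((now - ts)) -lt "$window" ]; then
      recent=$((recent + 1))
    fi
  done
  if [ "$recent" -lt "$max" ]; then echo allow; else echo deny; fi
}
```

With four restarts in the window, a fifth is still allowed; once five are recorded, further restarts are denied until the oldest one ages out.
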
#### 3. Multi-Layer Memory Architecture ✓
- **Core Memory**: `CORE_INDEX.md` - Identity, structure, file index (always loaded first)
- **Long-term Memory**: `MEMORY.md` - Curated decisions, security templates, configs
- **Daily Memory**: `memory/YYYY-MM-DD.md` - Raw conversation logs (auto-saved)
- **Passive Archive**: On-demand conversion of valuable conversations to skills/notes
- **Git Integration**: All memory files tracked with version history

#### 4. Git One-Click Rollback ✓
- **Repository**: `/root/.openclaw/workspace` (already initialized)
- **Deploy Script**: `./deploy.sh rollback` - Rollback to previous commit
- **Specific Rollback**: `./deploy.sh rollback-to <commit>` - Rollback to specific commit
- **Auto-Backup**: Backup created before rollback
- **Service Restart**: Automatic service restart after rollback

#### 5. Telegram Notifications ✓
- **Triggers**: Service stop, error, crash, restart events
- **Channels**: Telegram (via bot API) + OpenClaw message tool
- **Severity Levels**: CRITICAL, ERROR, WARNING, INFO with emoji indicators
- **Logging**: All notifications logged to `/logs/agents/health-YYYY-MM-DD.log`

### 📋 Management Commands (deploy.sh)
```bash
./deploy.sh install               # Install & start all systemd services
./deploy.sh start                 # Start all services
./deploy.sh stop                  # Stop all services
./deploy.sh restart               # Restart all services
./deploy.sh status                # Show detailed service status
./deploy.sh logs                  # Show recent logs (last 50 lines)
./deploy.sh health                # Run comprehensive health check
./deploy.sh backup                # Create timestamped backup
./deploy.sh rollback              # Rollback to previous git commit
./deploy.sh rollback-to <commit>  # Rollback to specific commit
./deploy.sh help                  # Show help message
```

### 🔧 Systemd Service Details
- **Gateway Service**: `/etc/systemd/system/openclaw-gateway.service`
  - Memory limit: 2G, CPU: 80%, Watchdog: 30s
  - Restart: always, RestartSec: 10s
  - Logs: `journalctl -u openclaw-gateway -f`

- **Monitor Service**: `/etc/systemd/system/openclaw-agent-monitor.service`
  - Memory limit: 512M, CPU: 20%
  - Restart: always, RestartSec: 5s
  - Logs: `journalctl -u openclaw-agent-monitor -f`

### 📊 Health Check Metrics
- Gateway service status (active/inactive)
- Agent monitor status (active/inactive)
- Disk usage (warning at 80%)
- Memory usage (warning at 80%)

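The two usage thresholds could be evaluated along these lines. This is a sketch, not the actual `deploy.sh health` code: the function names are hypothetical, "warning at 80%" is read as "at or above 80", and memory usage is derived from `MemTotal` and `MemAvailable` in `/proc/meminfo`, which is one of several reasonable definitions.

```bash
# Hypothetical helper: print WARNING when an integer percentage (with or without
# a trailing %) is at or above the limit (default 80), otherwise OK.
usage_level() {
  local percent="${1%[%]}" limit="${2:-80}"
  if [ "$percent" -ge "$limit" ]; then echo WARNING; else echo OK; fi
}

# Usage of the root filesystem as reported by POSIX-format df, e.g. "42%".
root_disk_percent() {
  df -P / | awk 'NR == 2 { print $5 }'
}

# Used memory as an integer percentage (Linux only, reads /proc/meminfo).
memory_percent() {
  awk '/^MemTotal:/ { t = $2 } /^MemAvailable:/ { a = $2 }
       END { printf "%d\n", (t - a) * 100 / t }' /proc/meminfo
}
```

A health check could then run `usage_level "$(root_disk_percent)"` and `usage_level "$(memory_percent)"` and alert on any `WARNING`.
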
### 🎯 Next Steps (Future Enhancements)
- [ ] Add Prometheus/Grafana monitoring dashboard
- [ ] Implement log rotation and archival
- [ ] Add email notifications as backup channel
- [ ] Create web-based admin dashboard
- [ ] Add automated security scanning in CI/CD

---

## User-Level vs System-Level Systemd Services - Critical Lesson (2026-02-20 14:35 UTC)

### Problem Discovered
Initial deployment used system-level systemd services (`/etc/systemd/system/`) for OpenClaw Gateway, but OpenClaw natively uses **user-level systemd** (`~/.config/systemd/user/`). This caused:
- Service restart loops (5 attempts then failure)
- Error: `systemctl --user unavailable: Failed to connect to bus: No medium found`
- Conflicts between system and user service definitions

### Root Cause
OpenClaw Gateway is designed as a user-level service because:
1. It runs under the user's context, not root
2. It needs access to user-specific config (`~/.openclaw/`)
3. User-level services have different environment requirements

### Solution: Hybrid Architecture

#### User-Level Service (Gateway)
- **Location**: `~/.config/systemd/user/openclaw-gateway.service`
- **Required Setup**:
```bash
# Enable linger (CRITICAL - allows user services to run without login session)
loginctl enable-linger "$(whoami)"

# Set environment variables
export XDG_RUNTIME_DIR="/run/user/$(id -u)"
export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$(id -u)/bus"
```
- **Management Commands**:
```bash
systemctl --user status openclaw-gateway
systemctl --user restart openclaw-gateway   # or: start, stop
journalctl --user -u openclaw-gateway -f
```

#### System-Level Service (Agent Monitor)
- **Location**: `/etc/systemd/system/openclaw-agent-monitor.service`
- **Purpose**: Independently monitor the gateway (survives user session issues)
- **Management Commands**:
```bash
systemctl status openclaw-agent-monitor
systemctl restart openclaw-agent-monitor   # or: start, stop
journalctl -u openclaw-agent-monitor -f
```

### Deployment Checklist for New Servers
```bash
# 1. Enable user linger (MUST DO FIRST)
loginctl enable-linger "$(whoami)"

# 2. Create runtime directory if needed
mkdir -p "/run/user/$(id -u)"
chmod 700 "/run/user/$(id -u)"

# 3. Export environment (add to ~/.bashrc for persistence)
echo 'export XDG_RUNTIME_DIR=/run/user/$(id -u)' >> ~/.bashrc
echo 'export DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$(id -u)/bus' >> ~/.bashrc

# 4. Install services
./deploy.sh install

# 5. Verify
./deploy.sh health
```

### Troubleshooting Guide

#### Error: "Failed to connect to bus: No medium found"
**Cause**: User linger not enabled or environment variables not set
**Fix**:
```bash
loginctl enable-linger "$(whoami)"
export XDG_RUNTIME_DIR="/run/user/$(id -u)"
export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$(id -u)/bus"
```

#### Error: "Start request repeated too quickly"
**Cause**: Service crashing due to misconfiguration
**Fix**: Check logs with `journalctl --user -u openclaw-gateway -f`

#### User service not starting after reboot
**Cause**: Linger not enabled
**Fix**: `loginctl enable-linger $(whoami)`

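Since most of the failures above trace back to linger or the user bus, a quick pre-check can save a restart loop. The sketch below is an assumption about how such a check might look, not the logic in `deploy.sh`; `linger_enabled` and `check_user_bus` are hypothetical names. It relies on `loginctl show-user <user> --property=Linger`, which prints `Linger=yes` or `Linger=no`.

```bash
# Hypothetical helper: succeed when given the loginctl property line "Linger=yes".
linger_enabled() {
  [ "$1" = "Linger=yes" ]
}

# Hypothetical helper: report whether the prerequisites for `systemctl --user`
# look satisfied for the current user.
check_user_bus() {
  local user runtime_dir
  user="$(id -un)"
  runtime_dir="${XDG_RUNTIME_DIR:-/run/user/$(id -u)}"
  if linger_enabled "$(loginctl show-user "$user" --property=Linger 2>/dev/null)"; then
    echo "linger: enabled"
  else
    echo "linger: disabled (run: loginctl enable-linger $user)"
  fi
  if [ -S "$runtime_dir/bus" ]; then
    echo "bus socket: present at $runtime_dir/bus"
  else
    echo "bus socket: missing at $runtime_dir/bus"
  fi
}
```

Running `check_user_bus` before `systemctl --user` commands shows at a glance which of the two prerequisites is missing.
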
### Best Practices for Multi-Agent Deployments
1. **Always enable linger** on first setup - document this in deployment guide
2. **Use hybrid architecture** - user-level for agents, system-level for monitors
3. **Set environment variables** in startup scripts, not just shell config
4. **Test after reboot** - verify services auto-start correctly
5. **Document in MEMORY.md** - share lessons across agent instances

### Updated deploy.sh Features
- Automatically enables linger during install
- Sets up XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS
- Uses `systemctl --user` for gateway, `systemctl` for monitor
- Health check verifies linger status and runtime directory
- Proper log commands for both service types

---

## Collection Name Unified as mem0_v4_shared (2026-02-27)

### Background
The configuration previously used inconsistent Collection names:
- Actually used by the code: `mem0_global_v4`
- Specified by the user and recorded in the docs: `mem0_v4_shared` (shared by 陈医生 and 张大师)

### Decision
王院长 gave an explicit instruction: every Collection must use `mem0_v4_shared`, and key configuration must not be changed arbitrarily.

### Modified Files
1. `skills/mem0-integration/mem0_client.py`
2. `skills/mem0-integration/config.yaml`
3. `skills/mem0-integration/skill.json`
4. `skills/mem0-integration/config-life.yaml`
5. `agents/life-agent.json`
6. `agents/life-workspace/skills/mem0-integration/config.yaml`
7. `docs/SYSTEM_ARCHITECTURE.md`

### Verification Results
- ✅ Gateway restarted successfully (systemd service healthy)
- ✅ Qdrant Collection `mem0_v4_shared` created
- ✅ Vector dimension: 1024 (text-embedding-v4)
- ✅ Distance metric: Cosine
- ✅ Metadata indexes: user_id, agent_id, actor_id, run_id
- ✅ Embedding billing channel: Bailian standard billing

### Operation Log
`/root/.openclaw/workspace/logs/operations/2026-02-27-08-55-00-unify-collection-name.log`

### Key Principles
- Changes to key configuration (Collection name, Embedding model, billing channel) require user confirmation
- All Agents share one Collection; logical isolation is done through `metadata.agent_id`

---

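## Eason's Working Principles (2026-03-07)

1. **Duty to think proactively**: As the maintainer of the Agent network, proactively look for security risks, optimization opportunities, and best practices, and propose improvements
2. **Major changes need approval**: Anything involving security configuration, architecture changes, or permission changes must first be raised with 王院长 and executed only after confirmation
3. **Say "we", not "you"**: We are one team working together. Say "our best practices", not "your best practices"

### Boundaries
- ✅ Do: audit proactively, find problems, propose solutions, carry out approved operations
- ❌ Don't: change key configuration without authorization, make decisions on the user's behalf, speak in an outsider's tone

---
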
## Agent Deployment Best Practices (added 2026-03-07)

### Skill/Plugin File Conventions

**Problem:** While configuring Tavily for 桐哥, a `skill.json` was created, but OpenClaw requires `openclaw.plugin.json`. The service crashed and restarted 38 times.

**Lessons:**

| File | Purpose | Required | Naming |
|------|---------|----------|--------|
| `openclaw.plugin.json` | OpenClaw plugin manifest | ✅ Required | Fixed name |
| `skill.json` | Clawhub skill metadata | ❌ Optional | Fixed name |
| `index.js` | Plugin/tool implementation | ✅ Required | Fixed name |
| `SKILL.md` | Skill documentation | ✅ Recommended | Fixed name |

**Checklist (when adding an Agent):**

1. **Plugin structure**
   - [ ] `openclaw.plugin.json` created (not `skill.json`)
   - [ ] `index.js` implements the tool/plugin logic
   - [ ] Plugin path added to `plugins.load.paths`
   - [ ] Plugin enabled in `plugins.entries`

2. **Configuration validation**
   - [ ] Run `openclaw --profile <agent> doctor` to validate the configuration
   - [ ] Run `openclaw --profile <agent> status` to check the service status
   - [ ] Check the logs with `journalctl --user -u openclaw-gateway-<agent> -n 20`

3. **Skill enablement**
   - [ ] `skills.entries.<skill>.enabled: true`
   - [ ] Environment variables configured (e.g. API key)
   - [ ] Plugin dependencies loaded

**Anti-patterns (do not do this):**
```
❌ Creating only skill.json, without openclaw.plugin.json
❌ Restarting the service without validating the configuration first
❌ Continuing to make changes after a crash without reading the logs
```

**Correct procedure:**
```
1. Create the skill files (openclaw.plugin.json + index.js)
2. Configure plugins.load.paths and plugins.entries in openclaw.json
3. Run openclaw doctor to validate the configuration
4. Restart the service and check its status
5. Read the logs to confirm the plugin loaded successfully
```

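The first checklist item lends itself to an automated pre-flight check before any restart. A minimal sketch, assuming only what the file table states (that `openclaw.plugin.json` and `index.js` are required); `check_plugin_dir` is a hypothetical helper, not an OpenClaw command.

```bash
# Hypothetical helper: print each required file that is missing from a plugin
# directory. Returns 0 when both required files exist, 1 otherwise.
check_plugin_dir() {
  local dir="$1" missing=0 file
  for file in openclaw.plugin.json index.js; do
    if [ ! -f "$dir/$file" ]; then
      echo "missing: $dir/$file"
      missing=1
    fi
  done
  return "$missing"
}
```

A directory that contains only `skill.json` and `index.js`, as in the Tavily incident, would be reported as missing `openclaw.plugin.json`.
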
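### Configuration Change Principles

- **Validate before restarting**: validate the configuration with the `doctor` command instead of restarting right away
- **Read logs before fixing**: after a crash, check the error with `journalctl` first, then apply a targeted fix
- **Iterate in small steps**: change one setting at a time and continue only after it validates

---
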
## Time Zone Configuration (2026-03-07)

**All Agents use the Hong Kong time zone (Asia/Hong_Kong, UTC+8)**

- Eason (main Agent): Hong Kong time zone
- 桐哥: Hong Kong time zone
- Schedule: work 07:00-23:00, rest 23:00-07:00 (Hong Kong time)
- Cron trigger: fires every hour; the script itself checks the Hong Kong time

**Conversions:**
- Hong Kong 07:00 = UTC 23:00 (previous day)
- Hong Kong 23:00 = UTC 15:00
- Hong Kong 13:00 = UTC 05:00

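The hourly cron script's internal check can be sketched as plain hour arithmetic, since Hong Kong is UTC+8 with no daylight saving time. The function names `hk_hour` and `is_work_hour` are hypothetical; the real script may just as well call `TZ=Asia/Hong_Kong date +%H` directly.

```bash
# Hypothetical helper: convert a UTC hour (0-23, leading zero allowed) to the Hong Kong hour.
hk_hour() {
  local h="${1#0}"                 # "07" -> "7", "0" -> ""
  echo $(( (${h:-0} + 8) % 24 ))
}

# Hypothetical helper: succeed during the 07:00-23:00 Hong Kong working window,
# fail during the 23:00-07:00 rest period.
is_work_hour() {
  local hk
  hk="$(hk_hour "$1")"
  [ "$hk" -ge 7 ] && [ "$hk" -lt 23 ]
}
```

An hourly job could start with `is_work_hour "$(date -u +%H)" || exit 0` so that it does nothing during the rest period.
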
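
---

## Security Audit False-Positive Analysis (2026-02-26)

### Background
Running `openclaw security audit --deep` reported 4 CRITICAL/WARNING findings; manual review classified them as false positives or known trade-offs.

### False Positives and Their Causes

| Audit item | Original rating | Review conclusion | Root cause |
|------------|-----------------|-------------------|------------|
| Gateway bound to `lan` | CRITICAL | False positive | The audit tool statically analyzes the config file and cannot see that the runtime bind is to Tailscale (100.115.94.1) |
| Device authentication disabled | CRITICAL | Known trade-off | Works around `isSecureContext=false` over HTTP; protected by both Tailscale and a token |
| No plugin allowlist | WARNING | Fix recommended | Confirmed as not to be fixed for now (low cost but limited benefit) |
| No rate limiting | WARNING | Threat model mismatch | Closed Tailscale network plus a strong 48-character token; brute-force risk is close to zero |
| MemoryLimit deprecated | WARNING | False positive | The audit looked at the workspace template; the actual service file does not contain this parameter |

### Key Lessons
1. **Security audits are static analysis** - they cannot replace human judgment and must be combined with runtime context
2. **Understand the threat model** - the threat scenarios the audit assumes must match the actual deployment
3. **Record known trade-offs** - document in MEMORY.md why certain "security issues" are accepted

### Detailed Documents
- Audit report: `logs/operations/2026-02-26-20-59-30-config-audit-report.md`
- Review analysis: `logs/operations/2026-02-26-21-05-00-security-audit-review.md`
- Fix script: `fix-security-config.sh` (not executed)

---
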
## Configuration Cleanup and Push (2026-02-26)

### Actions
- Deleted the obsolete `life/` directory (empty config, not referenced by any file)
- Cleaned up nested git repositories (`agents/life-workspace/.git`, `skills/openclaw-wecom/.git`)
- Removed Python caches and runtime state files
- Committed and pushed to the remote repository

### Git Commit
- Commit: `378523c chore: 配置审计和清理 - 2026-02-26`
- Remote: `gl.tigerone.tech:sw_dm/openClaw_agent_dm.git`
- Backup: `/root/.openclaw/backups/workspace-20260226-210956.tar.gz`

### Directories Kept
- `agents/life-workspace/` - Agent workspace used for testing
- `skills/openclaw-wecom/` - WeCom (企业微信) skill (TypeScript implementation)

---

## System Extension Architecture Upgrade Complete (2026-03-03 17:02 UTC)

### All 6 Core Tasks Completed

#### Task 1 - Environment Variable Persistence
- **Files**: `systemd/gateway.env`, `systemd/life-gateway.env`
- **Permissions**: chmod 600 (readable by root only)
- **Property**: independent of the .service files, so an OpenClaw UI upgrade does not overwrite them
- **Contents**:
```bash
MEM0_DASHSCOPE_API_KEY=sk-<redacted>
OPENAI_API_BASE=https://dashscope.aliyuncs.com/compatible-mode/v1
OPENAI_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
```

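The `chmod 600` requirement can be verified rather than assumed, for example before a service restart. A small sketch using GNU `stat` (as shipped on Ubuntu); `env_file_is_private` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if the file exists and its permission bits are exactly 600.
# Uses GNU stat (`-c '%a'` prints the octal mode); BSD stat needs different flags.
env_file_is_private() {
  local file="$1"
  [ -f "$file" ] && [ "$(stat -c '%a' "$file")" = "600" ]
}
```

For example, `env_file_is_private systemd/gateway.env || chmod 600 systemd/gateway.env` would tighten the mode only when needed.
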
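#### Task 2 - Agent Monitor Fixes (4 bugs)
- **Restart limit**: integrated into `monitorOpenClawService()` via `handleServiceDown()`; the infinite restart loop is fixed
- **Life Agent monitoring**: both gateway and openclaw-gateway-life.service are now checked every 30 seconds
- **Heartbeat log**: prints `gateway=OK, life=OK` every 10 minutes
- **Upgrade tolerance**: after a service is first detected as stopped, the monitor waits 60 seconds (grace period) to avoid false alarms during upgrades
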
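The grace-period rule reduces to comparing the outage age with a threshold. The real logic lives in `agent-monitor.js`; the shell sketch below only restates the decision, and `outage_action` with its arguments is a hypothetical name.

```bash
# Hypothetical helper: print "alert" once a service has been down for at least
# the grace period, otherwise "wait".
# Arguments: first_down (epoch seconds when the outage was first seen), now, grace (default 60).
outage_action() {
  local first_down="$1" now="$2" grace="${3:-60}"
  if [ $((now - first_down)) -ge "$grace" ]; then echo alert; else echo wait; fi
}
```

With 30-second health checks, an outage first seen at one check is still inside the window at the next check and crosses it at the one after, so a short upgrade restart passes silently.
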
#### Task 3 - Systemd Service Upgrade
- **Template update**: deprecated `MemoryLimit=` replaced with `MemoryMax=`
- **Monitor sync**: template synced to `/etc/systemd/system/`
- **Environment injection**: `EnvironmentFile=` added to both user-level service files
- **Legacy service**: the old system-level `openclaw-gateway.service` disabled and masked
- **Status**: all 3 services restarted and confirmed active

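The `MemoryLimit=` to `MemoryMax=` change is a one-line substitution per unit file. A sketch of how it could be scripted, not necessarily how the templates were actually edited; `migrate_memory_directive` is a hypothetical name.

```bash
# Hypothetical filter: rewrite deprecated MemoryLimit= directives to MemoryMax=.
# Reads a unit file on stdin and writes the result to stdout; other lines pass through unchanged.
migrate_memory_directive() {
  sed -E 's/^([[:space:]]*)MemoryLimit=/\1MemoryMax=/'
}
```

Writing to stdout keeps the original file untouched until the result has been reviewed; a `systemctl daemon-reload` (or `systemctl --user daemon-reload` for user units) is needed after replacing a unit file.
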
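#### Task 4 - deploy.sh Enhancements
- **New commands**:
  - `debug-stop`: safely stops the monitor to prevent automatic restarts while debugging
  - `debug-start`: restores all services after debugging
  - `fix-service`: re-injects `EnvironmentFile=` after a UI upgrade
- **Life Agent integration**: `start/stop/restart/status/logs/health/install` all support the life agent

#### Task 5 - Unified Architecture Document
- **File**: `docs/EXTENSIONS_ARCHITECTURE.md`
- **Contents**: service architecture, monitoring system, memory system cross-references, environment variables, debugging workflow, upgrade safety checklist

#### Task 6 - CORE_INDEX.md Update
- **File tree**: added .env files, .legacy renames, new documents
- **Starred reference**: EXTENSIONS_ARCHITECTURE.md listed as a key reference
- **Upgrade guide**: upgrade safety instructions added to the model usage guide
- **Management commands**: deploy.sh command list updated
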
### Current System Status (2026-03-04 03:32 UTC)

```
● openclaw-gateway.service Active: active (running) 10h ago
● openclaw-gateway-life.service Active: active (running) 10h ago
● openclaw-agent-monitor.service Active: active (running) 10h ago
```

The monitor heartbeat log is normal: it prints `gateway=OK, life=OK` every 10 minutes.

### Upgrade Safety Procedure

```bash
# Run after an OpenClaw UI upgrade
./deploy.sh fix-service   # Re-inject EnvironmentFile=
./deploy.sh restart       # Restart all services
./deploy.sh health        # Verify health status
```

### Key Documents
- **Extension architecture**: `docs/EXTENSIONS_ARCHITECTURE.md` - required reading before changing infrastructure
- **Memory system**: `docs/MEMORY_ARCHITECTURE.md` - detailed design of the four-layer memory system
- **Monitor script**: `agent-monitor.js` - health monitoring and auto-repair logic