# MEMORY.md - Long-term Memory

This file contains curated long-term memories and important context.

## Memory Management Strategy
- **MEMORY.md**: Curated long-term memories, important decisions, security templates, and key configurations
- **QMD System**: Automated memory backend with semantic search, auto-updates every 5 minutes
- **Usage**: Write significant learnings to MEMORY.md; rely on QMD for daily context and automation
- **Access**: MEMORY.md loaded only in main sessions (direct chats) for security

## QMD Configuration
- Backend: qmd
- Auto-update: every 5 minutes
- Include default memory: true
- Last verified: 2026-02-20

## Server Security Hardening Template (2026-02-20)

### Environment
- **Server**: Ubuntu 24.04 LTS VPS (KVM)
- **Panel**: 宝塔面板 (BT-Panel) on port 888
- **Public IP**: 204.12.203.203

### Security Configuration Applied
1. **Port Exposure Minimization**:
   - Only ports 80 (HTTP) and 443 (HTTPS) publicly accessible
   - SSH (port 22) restricted to internal network access only
   - OpenClaw gateway (port 18789) bound to localhost only
   - All other services (MySQL, custom apps) internal-only

2. **OpenClaw Secure Deployment**:
   - Gateway configured with `bind: "localhost"` instead of `"lan"`
   - Access exclusively through Nginx reverse proxy with HTTPS
   - Token-based authentication enabled
   - WebSocket support properly configured in Nginx

3. **Firewall Management**:
   - Use 宝塔面板 (BT-Panel) built-in firewall for port management
   - Alternative: system-level firewall (ufw/iptables) if no panel available
   - Regular external port scanning to verify exposure

4. **Critical Security Principles**:
   - Never expose sensitive services directly to public internet
   - Always use reverse proxy with TLS termination for web services
   - Implement defense in depth (firewall + service binding + authentication)
   - Regular security audits using `openclaw security audit --deep`
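Item 2's reverse-proxy requirement (HTTPS termination plus WebSocket support) can be sketched as a small generator for an Nginx site config. This is a minimal sketch, not the deployed config. The function name `render_openclaw_site`, the example domain and the certbot-style certificate paths are all hypothetical. BT-Panel normally writes its own site files and certificate locations. The sketch assumes the gateway listens on `127.0.0.1:18789`, as stated above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an Nginx config that terminates TLS and proxies
# HTTP + WebSocket traffic to the localhost-only OpenClaw gateway.
render_openclaw_site() {
  local domain="$1" upstream="${2:-127.0.0.1:18789}"
  # Unquoted heredoc: ${domain}/${upstream} expand; \$ keeps Nginx variables literal.
  cat <<EOF
server {
    listen 80;
    server_name ${domain};
    return 301 https://\$host\$request_uri;
}

server {
    listen 443 ssl;
    server_name ${domain};

    # Assumed certificate location (certbot default); adjust for BT-Panel.
    ssl_certificate     /etc/letsencrypt/live/${domain}/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/${domain}/privkey.pem;

    location / {
        proxy_pass http://${upstream};
        # WebSocket upgrade needs HTTP/1.1 plus the Upgrade/Connection headers.
        proxy_http_version 1.1;
        proxy_set_header Upgrade \$http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host \$host;
        proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto \$scheme;
        # Keep long-lived WebSocket sessions from hitting the 60s default timeout.
        proxy_read_timeout 3600s;
    }
}
EOF
}
```

Typical use: `render_openclaw_site claw.example.com > /etc/nginx/sites-available/openclaw.conf`, then `nginx -t && systemctl reload nginx`. Hard-coding `Connection "upgrade"` keeps the sketch short. The Nginx documentation instead suggests a `map $http_upgrade $connection_upgrade` block, so that plain HTTP requests are not forced to upgrade.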

### Migration Checklist for New Servers
- [ ] Install and configure 宝塔面板 or equivalent server management panel
- [ ] Set up Nginx reverse proxy with proper WebSocket support
- [ ] Configure OpenClaw with localhost binding only
- [ ] Restrict public ports to 80/443 only via firewall
- [ ] Enable automatic security updates
- [ ] Run initial security audit and document baseline
- [ ] Schedule periodic security audits via OpenClaw cron
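On a server without a panel, the "restrict public ports to 80/443" item maps onto a few `ufw` rules. This is a minimal sketch that assumes SSH should be reachable only from one internal range. `SSH_ALLOWED_CIDR` and its `10.0.0.0/8` default are placeholders. The helper only prints the commands, so they can be reviewed before anything runs. Avoid running it alongside the BT-Panel firewall, because two tools editing the same host firewall can conflict.

```bash
#!/usr/bin/env bash
# Placeholder management network for SSH; replace with the real internal range.
SSH_ALLOWED_CIDR="${SSH_ALLOWED_CIDR:-10.0.0.0/8}"

# Print (do not execute) the ufw commands for an 80/443-only public policy.
# No rule is emitted for 18789 (gateway) or MySQL: they stay internal-only.
ufw_rules() {
  cat <<EOF
ufw default deny incoming
ufw default allow outgoing
ufw allow 80/tcp
ufw allow 443/tcp
ufw allow from ${SSH_ALLOWED_CIDR} to any port 22 proto tcp
EOF
}
```

After reviewing the output, apply the rules with `ufw_rules | sudo sh -x`, then run `sudo ufw enable`. Run both from a session inside `SSH_ALLOWED_CIDR`. Otherwise, enabling the firewall can cut off the SSH connection you are using.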

### Lessons Learned
- Panel-based firewalls (宝塔/aapanel) must be verified with external port scans
- Direct service exposure (like OpenClaw on 0.0.0.0) creates critical security risks
- Nginx reverse proxy configuration is essential for secure OpenClaw deployment
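You can script the external port-scan check. Scan from a machine outside the VPS and flag any open port other than 80/443. This is a minimal sketch that assumes `nmap` is installed on the scanning host. It parses nmap's grepable output (`-oG -`). `unexpected_open_ports` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Ports the hardening template allows to be publicly open.
ALLOWED_PORTS="80 443"

# Read nmap grepable output on stdin; print each open port not in ALLOWED_PORTS.
unexpected_open_ports() {
  grep -oE '[0-9]+/open/' | cut -d/ -f1 | sort -un | while read -r port; do
    case " $ALLOWED_PORTS " in
      *" $port "*) ;;          # expected, ignore
      *) echo "$port" ;;       # unexpected public exposure
    esac
  done
}
```

Example: `nmap -Pn -p- -oG - 204.12.203.203 | unexpected_open_ports`. Empty output means only 80/443 answered as open. Run the scan from outside the server, because a scan from the VPS itself may not pass through the firewall the same way.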

## Agent Operations Logging Practice (2026-02-20)

### Log Directory Structure
- `/root/.openclaw/workspace/logs/operations/` - Manual operations and important changes
- `/root/.openclaw/workspace/logs/system/` - System-generated logs  
- `/root/.openclaw/workspace/logs/agents/` - Individual agent logs
- `/root/.openclaw/workspace/logs/security/` - Security operations and audits

### Automatic Logging Triggers
1. **Configuration Changes**: Any modification to config files (.json, .yaml, etc.)
2. **Security Modifications**: Firewall rules, authentication changes, port modifications
3. **Agent Lifecycle**: Deployment, updates, removal of agents
4. **System Optimizations**: Performance tuning, resource allocation changes
5. **Troubleshooting**: Error diagnosis and resolution procedures
6. **Memory Updates**: Significant changes to MEMORY.md or memory management

### Log Format Standard
- **Filename**: `YYYY-MM-DD-HH-MM-SS-description.log`
- **Timestamp**: UTC time format
- **Content**: `[TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description with before/after state`
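The directory layout and the format standard can be combined into one helper. This is a minimal bash sketch, and three choices in it are assumptions:

- `log_operation` and `slugify` are hypothetical names.
- An ISO-8601 `Z` timestamp is one reading of "UTC time format".
- The filename slug comes from the first 40 characters of the description.

```bash
#!/usr/bin/env bash
# Root of the log tree described above; override for testing.
LOG_ROOT="${LOG_ROOT:-/root/.openclaw/workspace/logs}"

# "Unify collection name!" -> "unify-collection-name"
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-*//; s/-*$//'
}

# Usage: log_operation CATEGORY OPERATION_TYPE ACTOR DESCRIPTION
# Appends one standard-format line and prints the log file path.
log_operation() {
  local category="$1" op_type="$2" actor="$3" description="$4" stamp iso file
  case "$category" in
    operations|system|agents|security) ;;
    *) echo "unknown log category: $category" >&2; return 1 ;;
  esac
  # One date call so the filename and the line share the same UTC instant.
  read -r stamp iso < <(date -u '+%Y-%m-%d-%H-%M-%S %Y-%m-%dT%H:%M:%SZ')
  mkdir -p "$LOG_ROOT/$category"
  file="$LOG_ROOT/$category/$stamp-$(slugify "${description:0:40}").log"
  printf '[%s] [%s] [%s] %s\n' "$iso" "$op_type" "$actor" "$description" >> "$file"
  echo "$file"
}
```

For example, `log_operation operations CONFIG_CHANGE agent "Unify collection name"` produces a file such as `operations/2026-02-27-08-55-00-unify-collection-name.log`. That matches the naming of the operation log cited later in this file.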

### Implementation Guidelines
- Always log before making changes (capture current state)
- Include rollback instructions when applicable
- Redact sensitive information (passwords, tokens, private keys)
- Reference related MEMORY.md entries for context
- Use QMD for routine operational context, MEMORY.md for strategic decisions
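For the redaction guideline, a small filter can clean text before it reaches a log. This is a minimal sketch that assumes GNU sed (the Ubuntu default) for the case-insensitive `I` flag. The key list is illustrative, not exhaustive. The filter only catches `key=value` and `key: value` forms.

```bash
#!/usr/bin/env bash
# Mask values after common secret-bearing keys, e.g. "token=abc" -> "token=[REDACTED]".
# Requires GNU sed for the case-insensitive I flag.
redact() {
  sed -E 's/((password|passwd|token|secret|api[_-]?key)[[:space:]]*[=:][[:space:]]*)[^[:space:]"]+/\1[REDACTED]/Ig'
}
```

Typical use: `some_command 2>&1 | redact >> "$logfile"`. Bearer tokens in headers and PEM private-key blocks would need additional patterns.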

## Agent Health Monitoring & Alerting System (2026-02-20)

### Features Implemented
1. **Crash Detection**: Monitors uncaught exceptions and unhandled rejections
2. **Health Checks**: Periodic service health verification (every 30 seconds)
3. **Multi-Channel Notifications**: Telegram alerts for critical events
4. **Automatic Logging**: All alerts logged to `/logs/agents/health-YYYY-MM-DD.log`
5. **Extensible Design**: Easy to add new notification channels

### Components Created
- **Skill**: `agent-monitor/SKILL.md` - Documentation and usage guide
- **Monitor Script**: `agent-monitor.js` - Core monitoring logic
- **Startup Script**: `start-agent-monitor.sh` - Easy deployment
- **Log Directory**: `/logs/agents/` - Dedicated logging location

### Alert Severity Levels
- **CRITICAL**: Process crashes, uncaught exceptions
- **ERROR**: Unhandled rejections, failed operations  
- **WARNING**: Health check failures, performance issues
- **INFO**: Service status updates, recovery notifications

### Integration Points
- Automatically integrated with existing Telegram channel
- Compatible with OpenClaw's agent architecture
- Works alongside existing logging and memory systems
- Can monitor any Node.js-based agent process

### Usage Instructions
1. Source the startup script: `source /root/.openclaw/workspace/start-agent-monitor.sh`
2. Call `startAgentMonitor("agent-name", healthCheckFunction)` 
3. Monitor automatically sends alerts on errors/crashes
4. Check logs in `/logs/agents/` for detailed information

---

## Complete System Architecture Upgrade (2026-02-20 14:25 UTC)

### ✅ All 5 Core Requirements Implemented

#### 1. System-Level Persistence ✓
- **Systemd Services**: `openclaw-gateway.service` + `openclaw-agent-monitor.service`
- **Auto-start on Boot**: Both services enabled in multi-user.target
- **Resource Limits**: Memory 2G/512M and CPU 80%/20% (gateway/monitor), plus a 30s watchdog on the gateway
- **Status**: `systemctl status openclaw-gateway` / `systemctl status openclaw-agent-monitor`

#### 2. Auto-Healing ✓
- **Crash Detection**: Monitors process exits, signals, uncaught exceptions
- **Auto-Restart**: Systemd Restart=always + monitor script restart logic
- **Restart Limits**: Max 5 restarts per 5 minutes (prevents restart loops)
- **Health Checks**: Every 30 seconds, automatic recovery on failure
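The "max 5 restarts per 5 minutes" rule is a sliding-window limit. In systemd it corresponds to `StartLimitBurst=5` with `StartLimitIntervalSec=300`. The monitor script's own restart logic can apply the same idea, shown here as a bash sketch. `allow_restart` is a hypothetical helper, not the code in `agent-monitor.js`.

```bash
#!/usr/bin/env bash
# Sliding-window restart limiter: at most MAX_RESTARTS within WINDOW_SECONDS.
MAX_RESTARTS=5
WINDOW_SECONDS=300
RESTART_TIMES=()   # epoch seconds of restarts still inside the window

# allow_restart [NOW]: return 0 and record the restart if the limit allows it;
# return 1 (without recording) if the window is already full.
allow_restart() {
  local now="${1:-$(date +%s)}" t
  local -a kept=()
  # Drop restarts that have aged out of the window.
  for t in "${RESTART_TIMES[@]}"; do
    if (( now - t < WINDOW_SECONDS )); then
      kept+=("$t")
    fi
  done
  RESTART_TIMES=("${kept[@]}")
  if (( ${#RESTART_TIMES[@]} >= MAX_RESTARTS )); then
    return 1
  fi
  RESTART_TIMES+=("$now")
}
```

In a monitor loop, restart only when `allow_restart` succeeds. Otherwise, send a CRITICAL alert and stop retrying, which is what prevents the restart loops mentioned above.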

#### 3. Multi-Layer Memory Architecture ✓
- **Core Memory**: `CORE_INDEX.md` - Identity, structure, file index (always loaded first)
- **Long-term Memory**: `MEMORY.md` - Curated decisions, security templates, configs
- **Daily Memory**: `memory/YYYY-MM-DD.md` - Raw conversation logs (auto-saved)
- **Passive Archive**: On-demand conversion of valuable conversations to skills/notes
- **Git Integration**: All memory files tracked with version history
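The layering implies a fixed read order: first the core index, then curated long-term memory, then today's raw log. This is a minimal bash sketch that lists whichever of those files exist, in that order. `memory_load_order` is hypothetical. Naming the daily file by UTC date is an assumption.

```bash
#!/usr/bin/env bash
# Print existing memory files in load order: core -> long-term -> daily.
memory_load_order() {
  local ws="${1:-/root/.openclaw/workspace}" day="${2:-$(date -u +%Y-%m-%d)}" f
  for f in "$ws/CORE_INDEX.md" "$ws/MEMORY.md" "$ws/memory/$day.md"; do
    [ -f "$f" ] && echo "$f"
  done
  return 0
}
```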

#### 4. Git One-Click Rollback ✓
- **Repository**: `/root/.openclaw/workspace` (already initialized)
- **Deploy Script**: `./deploy.sh rollback` - Rollback to previous commit
- **Specific Rollback**: `./deploy.sh rollback-to <commit>` - Rollback to specific commit
- **Auto-Backup**: Backup created before rollback
- **Service Restart**: Automatic service restart after rollback
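A rollback with an automatic backup can be sketched in a few lines of git and tar. This is a minimal sketch, not the actual `deploy.sh`, and it has three limits:

- `rollback_to` is hypothetical.
- It uses `git reset --hard`, which discards uncommitted changes. That is why the backup is taken first.
- It does not restart services.

The default backup location gives names like `/root/.openclaw/backups/workspace-YYYYmmdd-HHMMSS.tar.gz`.

```bash
#!/usr/bin/env bash
# Back up the working tree, then hard-reset REPO to TARGET (default: previous commit).
rollback_to() {
  local repo="$1" target="${2:-HEAD~1}" backup_dir="${3:-$(dirname "$1")/backups}" stamp
  # Refuse unknown refs before touching anything.
  if ! git -C "$repo" rev-parse --verify --quiet "${target}^{commit}" >/dev/null; then
    echo "unknown commit: $target" >&2
    return 1
  fi
  mkdir -p "$backup_dir"
  stamp=$(date -u +%Y%m%d-%H%M%S)
  # Full copy, including .git and uncommitted files, so the reset itself can be undone.
  tar -czf "$backup_dir/workspace-$stamp.tar.gz" -C "$repo" .
  git -C "$repo" reset --hard "$target"
}
```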

#### 5. Telegram Notifications ✓
- **Triggers**: Service stop, error, crash, restart events
- **Channels**: Telegram (via bot API) + OpenClaw message tool
- **Severity Levels**: CRITICAL, ERROR, WARNING, INFO with emoji indicators
- **Logging**: All notifications logged to `/logs/agents/health-YYYY-MM-DD.log`
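The notification path can be sketched in two steps: format an alert with a severity icon, then post it through the Telegram Bot API `sendMessage` method. The emoji mapping, the function names, and the `TELEGRAM_BOT_TOKEN` / `TELEGRAM_CHAT_ID` variables are assumptions. The note above only says "emoji indicators".

```bash
#!/usr/bin/env bash
# Hypothetical severity -> emoji mapping.
format_alert() {
  local severity="$1" agent="$2" message="$3" icon
  case "$severity" in
    CRITICAL) icon="🔴" ;;
    ERROR)    icon="🟠" ;;
    WARNING)  icon="🟡" ;;
    INFO)     icon="🟢" ;;
    *) echo "unknown severity: $severity" >&2; return 1 ;;
  esac
  printf '%s [%s] %s: %s\n' "$icon" "$severity" "$agent" "$message"
}

# Send the formatted alert via the Telegram Bot API sendMessage method.
send_telegram_alert() {
  local text
  if [ -z "${TELEGRAM_BOT_TOKEN:-}" ] || [ -z "${TELEGRAM_CHAT_ID:-}" ]; then
    echo "TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID must be set" >&2
    return 1
  fi
  text=$(format_alert "$@") || return 1
  curl -fsS "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
    --data-urlencode "text=${text}" >/dev/null
}
```

Example: `send_telegram_alert CRITICAL openclaw-gateway "process exited with code 1"`. The same formatted line can also be appended to the daily health log.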

### 📋 Management Commands (deploy.sh)
```bash
./deploy.sh install    # Install & start all systemd services
./deploy.sh start      # Start all services
./deploy.sh stop       # Stop all services
./deploy.sh restart    # Restart all services
./deploy.sh status     # Show detailed service status
./deploy.sh logs       # Show recent logs (last 50 lines)
./deploy.sh health     # Run comprehensive health check
./deploy.sh backup     # Create timestamped backup
./deploy.sh rollback   # Rollback to previous git commit
./deploy.sh rollback-to <commit>  # Rollback to specific commit
./deploy.sh help       # Show help message
```

### 🔧 Systemd Service Details
- **Gateway Service**: `/etc/systemd/system/openclaw-gateway.service`
  - Memory limit: 2G, CPU: 80%, Watchdog: 30s
  - Restart: always, RestartSec: 10s
  - Logs: `journalctl -u openclaw-gateway -f`

- **Monitor Service**: `/etc/systemd/system/openclaw-agent-monitor.service`
  - Memory limit: 512M, CPU: 20%
  - Restart: always, RestartSec: 5s
  - Logs: `journalctl -u openclaw-agent-monitor -f`
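The gateway limits above can be expressed as a unit file. This minimal sketch prints one. `render_gateway_unit` is hypothetical, and the `ExecStart` command must be supplied, since the real start command is not recorded here. The sketch uses `MemoryMax=` because `MemoryLimit=` is deprecated in current systemd. `WatchdogSec=` is safe only if the process sends `WATCHDOG=1` through `sd_notify`; a process that never does is killed at every timeout. A user-level unit, like the one in the next section, typically uses `WantedBy=default.target`.

```bash
#!/usr/bin/env bash
# Print a systemd unit carrying the gateway limits listed above.
render_gateway_unit() {
  local exec_start="$1"
  cat <<EOF
[Unit]
Description=OpenClaw Gateway
After=network-online.target
Wants=network-online.target
# Stop retrying after 5 failed starts within 300s (the auto-healing limit).
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
ExecStart=${exec_start}
Restart=always
RestartSec=10s
MemoryMax=2G
CPUQuota=80%
# Only safe if the process sends WATCHDOG=1 via sd_notify at least every 30s.
WatchdogSec=30s

[Install]
WantedBy=multi-user.target
EOF
}
```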

### 📊 Health Check Metrics
- Gateway service status (active/inactive)
- Agent monitor status (active/inactive)
- Disk usage (warning at 80%)
- Memory usage (warning at 80%)
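The two usage thresholds can be checked with standard tools. This is a minimal sketch with hypothetical helper names. `df --output` requires GNU coreutils. Counting `MemAvailable` (not `MemFree`) as free memory is an assumption; it avoids counting reclaimable page cache as used.

```bash
#!/usr/bin/env bash
# Warning threshold shared by both metrics, per the list above.
WARN_PCT=80

# Used percentage of the filesystem holding PATH (GNU df), digits only.
disk_used_pct() {
  df --output=pcent "${1:-/}" | tail -n 1 | tr -dc '0-9'
}

# Used memory percentage from a meminfo file (default /proc/meminfo).
mem_used_pct() {
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
       END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "${1:-/proc/meminfo}"
}

# Print "WARNING <name> <pct>%" at or above the threshold, else "OK <name> <pct>%".
check_metric() {
  local name="$1" pct="$2"
  if [ "$pct" -ge "$WARN_PCT" ]; then
    echo "WARNING $name ${pct}%"
  else
    echo "OK $name ${pct}%"
  fi
}
```

Example: `check_metric disk "$(disk_used_pct /)"` and `check_metric memory "$(mem_used_pct)"`.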

### 🎯 Next Steps (Future Enhancements)
- [ ] Add Prometheus/Grafana monitoring dashboard
- [ ] Implement log rotation and archival
- [ ] Add email notifications as backup channel
- [ ] Create web-based admin dashboard
- [ ] Add automated security scanning in CI/CD

---

## User-Level vs System-Level Systemd Services - Critical Lesson (2026-02-20 14:35 UTC)

### Problem Discovered
Initial deployment used system-level systemd services (`/etc/systemd/system/`) for OpenClaw Gateway, but OpenClaw natively uses **user-level systemd** (`~/.config/systemd/user/`). This caused:
- Service restart loops (5 attempts then failure)
- Error: `systemctl --user unavailable: Failed to connect to bus: No medium found`
- Conflicts between system and user service definitions

### Root Cause
OpenClaw Gateway is designed as a user-level service because:
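1. It runs under the user's own systemd manager rather than the system-wide one (on this server that user is `root`, so it runs in root's user manager)
2. It needs access to user-specific config (`~/.openclaw/`)
3. User-level services need a per-user environment (`XDG_RUNTIME_DIR` and a user D-Bus session) that system services do not provide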

### Solution: Hybrid Architecture

#### User-Level Service (Gateway)
- **Location**: `~/.config/systemd/user/openclaw-gateway.service`
- **Required Setup**:
  ```bash
  # Enable linger (CRITICAL - allows user services to run without login session)
  loginctl enable-linger $(whoami)
  
  # Set environment variables
  export XDG_RUNTIME_DIR=/run/user/$(id -u)
  export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$(id -u)/bus"
  ```
- **Management Commands**:
  ```bash
  systemctl --user status openclaw-gateway
  systemctl --user start/stop/restart openclaw-gateway
  journalctl --user -u openclaw-gateway -f
  ```

#### System-Level Service (Agent Monitor)
- **Location**: `/etc/systemd/system/openclaw-agent-monitor.service`
- **Purpose**: Independently monitor the gateway (survives user session issues)
- **Management Commands**:
  ```bash
  systemctl status openclaw-agent-monitor
  systemctl start/stop/restart openclaw-agent-monitor
  journalctl -u openclaw-agent-monitor -f
  ```

### Deployment Checklist for New Servers
```bash
# 1. Enable user linger (MUST DO FIRST)
loginctl enable-linger $(whoami)

# 2. Create runtime directory if needed
mkdir -p /run/user/$(id -u)
chmod 700 /run/user/$(id -u)

# 3. Export environment (add to ~/.bashrc for persistence)
echo 'export XDG_RUNTIME_DIR=/run/user/$(id -u)' >> ~/.bashrc
echo 'export DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/$(id -u)/bus' >> ~/.bashrc

# 4. Install services
./deploy.sh install

# 5. Verify
./deploy.sh health
```
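Before `./deploy.sh install`, the checklist's prerequisites can be verified in one pass. This is a minimal sketch, and `check_user_systemd_env` is hypothetical. It relies on logind recording linger as a file under `/var/lib/systemd/linger/`. The equivalent query is `loginctl show-user "$USER" --property=Linger`, which prints `Linger=yes`.

```bash
#!/usr/bin/env bash
# Report each user-level systemd prerequisite as OK/FAIL; non-zero status if any fail.
check_user_systemd_env() {
  local user="${1:-$(id -un)}" uid rc=0
  uid=$(id -u "$user") || return 1
  if [ -e "/var/lib/systemd/linger/$user" ]; then
    echo "OK linger enabled for $user"
  else
    echo "FAIL linger disabled (fix: loginctl enable-linger $user)"; rc=1
  fi
  if [ -d "/run/user/$uid" ]; then
    echo "OK runtime dir /run/user/$uid"
  else
    echo "FAIL missing /run/user/$uid"; rc=1
  fi
  if [ -S "/run/user/$uid/bus" ]; then
    echo "OK user D-Bus socket /run/user/$uid/bus"
  else
    echo "FAIL no user D-Bus socket at /run/user/$uid/bus"; rc=1
  fi
  if [ "${XDG_RUNTIME_DIR:-}" = "/run/user/$uid" ]; then
    echo "OK XDG_RUNTIME_DIR"
  else
    echo "FAIL XDG_RUNTIME_DIR is '${XDG_RUNTIME_DIR:-}' (expected /run/user/$uid)"; rc=1
  fi
  return "$rc"
}
```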

### Troubleshooting Guide

#### Error: "Failed to connect to bus: No medium found"
**Cause**: User linger not enabled or environment variables not set
**Fix**:
```bash
loginctl enable-linger $(whoami)
export XDG_RUNTIME_DIR=/run/user/$(id -u)
export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/$(id -u)/bus"
```

#### Error: "Start request repeated too quickly"
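**Cause**: The service crashed repeatedly and hit systemd's start-rate limit, usually because of a misconfiguration
**Fix**: Check logs with `journalctl --user -u openclaw-gateway -f`, fix the cause, then clear the failed state with `systemctl --user reset-failed openclaw-gateway` before starting the service again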

#### User service not starting after reboot
**Cause**: Linger not enabled
**Fix**: `loginctl enable-linger $(whoami)`

### Best Practices for Multi-Agent Deployments
1. **Always enable linger** on first setup - document this in deployment guide
2. **Use hybrid architecture** - user-level for agents, system-level for monitors
3. **Set environment variables** in startup scripts, not just shell config
4. **Test after reboot** - verify services auto-start correctly
5. **Document in MEMORY.md** - share lessons across agent instances

### Updated deploy.sh Features
- Automatically enables linger during install
- Sets up XDG_RUNTIME_DIR and DBUS_SESSION_BUS_ADDRESS
- Uses `systemctl --user` for gateway, `systemctl` for monitor
- Health check verifies linger status and runtime directory
- Proper log commands for both service types
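The user/system split can be captured in one lookup, so every service and log command targets the right systemd manager. These are hypothetical helpers, not the actual `deploy.sh`. The `-n 50` mirrors the "last 50 lines" behaviour of `./deploy.sh logs`.

```bash
#!/usr/bin/env bash
# Print the systemctl command for SERVICE and ACTION, using --user for the gateway.
svc_cmd() {
  local service="$1" action="$2"
  case "$service" in
    openclaw-gateway)       echo "systemctl --user $action $service" ;;
    openclaw-agent-monitor) echo "systemctl $action $service" ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
}

# Same split for journal access.
log_cmd() {
  case "$1" in
    openclaw-gateway)       echo "journalctl --user -u $1 -n 50 --no-pager" ;;
    openclaw-agent-monitor) echo "journalctl -u $1 -n 50 --no-pager" ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}
```

Example: `eval "$(svc_cmd openclaw-gateway restart)"`.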

---
## Collection Name Unified to mem0_v4_shared (2026-02-27)

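### Background
The configuration previously used inconsistent Collection names:
- Actually used in code: `mem0_global_v4`
- Specified by the user / recorded in docs: `mem0_v4_shared` (shared by 陈医生 (Dr. Chen) and 张大师 (Master Zhang))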

### Decision
王院长 (Director Wang) explicitly instructed that all Collections use `mem0_v4_shared` and that critical configuration must not be changed arbitrarily.

### Modified Files
1. `skills/mem0-integration/mem0_client.py`
2. `skills/mem0-integration/config.yaml`
3. `skills/mem0-integration/skill.json`
4. `skills/mem0-integration/config-life.yaml`
5. `agents/life-agent.json`
6. `agents/life-workspace/skills/mem0-integration/config.yaml`
7. `docs/SYSTEM_ARCHITECTURE.md`

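### Verification Results
- ✅ Gateway restarted successfully (systemd service healthy)
- ✅ Qdrant Collection `mem0_v4_shared` created
- ✅ Vector dimension: 1024 (text-embedding-v4)
- ✅ Distance metric: Cosine
- ✅ Metadata indexes: user_id, agent_id, actor_id, run_id
- ✅ Embedding billing channel: Bailian standard billing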

### Operation Log
`/root/.openclaw/workspace/logs/operations/2026-02-27-08-55-00-unify-collection-name.log`

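### Key Principles
- Changes to critical configuration (Collection name, embedding model, billing channel) require user confirmation
- All agents share a single Collection; logical isolation is achieved through `metadata.agent_id`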
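Per-agent isolation can be spot-checked directly against Qdrant with a filtered scroll request. This is a minimal sketch with several assumptions:

- Qdrant's REST API listens on the default `localhost:6333`, without an API key.
- mem0 stores the agent id as a top-level `agent_id` payload field. The metadata indexes listed above are consistent with that. Pass `metadata.agent_id` instead if the field is nested; Qdrant filters accept dotted keys.
- The helper names are hypothetical, and the agent id is not JSON-escaped.

```bash
#!/usr/bin/env bash
QDRANT_URL="${QDRANT_URL:-http://localhost:6333}"
COLLECTION="mem0_v4_shared"

# Build a scroll request body that returns only points belonging to one agent.
agent_scroll_body() {
  local agent_id="$1" key="${2:-agent_id}" limit="${3:-10}"
  printf '{"filter":{"must":[{"key":"%s","match":{"value":"%s"}}]},"limit":%d,"with_payload":true}\n' \
    "$key" "$agent_id" "$limit"
}

# POST /collections/{name}/points/scroll with the filter above.
scroll_agent_memories() {
  curl -fsS -X POST "$QDRANT_URL/collections/$COLLECTION/points/scroll" \
    -H 'Content-Type: application/json' \
    -d "$(agent_scroll_body "$@")"
}
```

To re-check the collection settings themselves, `curl -s "$QDRANT_URL/collections/mem0_v4_shared"` returns the configured vector size and distance.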

---

## Security Audit False-Positive Analysis (2026-02-26)

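### Background
Running `openclaw security audit --deep` flagged the CRITICAL/WARNING findings below. On manual review, each was classified as a false positive, a known trade-off, or a deferred fix.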

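### Findings and Root Causes

| Audit Item | Original Severity | Review Verdict | Root Cause |
|------------|-------------------|----------------|------------|
| Gateway bound to `lan` | CRITICAL | False positive | The audit tool statically analyzes the config file and cannot see that the gateway binds to Tailscale (100.115.94.1) at runtime |
| Device authentication disabled | CRITICAL | Known trade-off | Disabled to work around `isSecureContext=false` over plain HTTP; protected by both Tailscale and the token |
| No plugin allowlist | WARNING | Fix recommended | Deliberately deferred for now (low cost, but limited benefit) |
| No rate limiting | WARNING | Threat-model mismatch | Closed Tailscale network plus a strong 48-character token make brute-force risk close to zero |
| `MemoryLimit` deprecated | WARNING | False positive | The audit referenced the workspace template; the actual service file does not use this parameter |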

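### Key Lessons
1. **Security audits are static analysis** - they cannot replace human judgment and must be read together with runtime context
2. **Understand the threat model** - the threats the audit assumes must match the actual deployment environment
3. **Record known trade-offs** - document in MEMORY.md why certain "security issues" were accepted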
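Lesson 1 in practice: the audit read the static `bind` setting, but the runtime answer comes from the listening sockets. This minimal sketch extracts the addresses bound to a port from `ss -ltn` output and classifies them. The helper names are hypothetical. "tailnet" means the 100.64.0.0/10 range that Tailscale assigns addresses from.

```bash
#!/usr/bin/env bash
# Read `ss -ltn` output on stdin; print the local address of every listener on PORT.
bind_addrs() {
  local port="$1"
  awk -v suffix=":$port" 'NR > 1 {
      addr = $4
      if (length(addr) > length(suffix) && substr(addr, length(addr) - length(suffix) + 1) == suffix)
        print substr(addr, 1, length(addr) - length(suffix))
    }'
}

# Classify a bound address: loopback, tailnet (100.64.0.0/10), all-interfaces, or other.
classify_bind() {
  local addr="$1" second
  case "$addr" in
    127.*|'[::1]') echo loopback ;;
    0.0.0.0|'[::]'|'*') echo all-interfaces ;;
    100.*)
      second=$(echo "$addr" | cut -d. -f2)
      if [ "$second" -ge 64 ] && [ "$second" -le 127 ]; then echo tailnet; else echo other; fi ;;
    *) echo other ;;
  esac
}
```

Example: `ss -ltn | bind_addrs 18789 | while read -r a; do echo "$a $(classify_bind "$a")"; done`. An `all-interfaces` result means the gateway is reachable on every interface, whatever the config file says.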

### Detailed Documents
- Audit report: `logs/operations/2026-02-26-20-59-30-config-audit-report.md`
- Review analysis: `logs/operations/2026-02-26-21-05-00-security-audit-review.md`
- Fix script: `fix-security-config.sh` (not executed)

---

## Configuration Cleanup and Push (2026-02-26)

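### Actions
- Deleted the obsolete `life/` directory (empty config, not referenced by any file)
- Removed nested git repositories (`agents/life-workspace/.git`, `skills/openclaw-wecom/.git`)
- Removed Python caches and runtime state files
- Committed and pushed to the remote repository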

### Git Commit
- Commit: `378523c chore: 配置审计和清理 - 2026-02-26` (config audit and cleanup)
- Remote: `gl.tigerone.tech:sw_dm/openClaw_agent_dm.git`
- Backup: `/root/.openclaw/backups/workspace-20260226-210956.tar.gz`

### Retained Directories
- `agents/life-workspace/` - Agent workspace used for testing
- `skills/openclaw-wecom/` - WeCom (企业微信) skill (TypeScript implementation)