MEMORY.md - Long-term Memory

This file contains curated long-term memories and important context.

Memory Management Strategy

MEMORY.md: Curated long-term memories, important decisions, security templates, and key configurations
QMD System: Automated memory backend with semantic search, auto-updates every 5 minutes
Usage: Write significant learnings to MEMORY.md; rely on QMD for daily context and automation
Access: MEMORY.md loaded only in main sessions (direct chats) for security

QMD Configuration

Backend: qmd
Auto-update: every 5 minutes
Include default memory: true
Last verified: 2026-02-20

Server Security Hardening Template (2026-02-20)

Environment

Server: Ubuntu 24.04 LTS VPS (KVM)
Panel: 宝塔面板 (BT-Panel) on port 888
Public IP: 204.12.203.203

Security Configuration Applied

Port Exposure Minimization:
- Only ports 80 (HTTP) and 443 (HTTPS) publicly accessible
- SSH (port 22) restricted to internal network access only
- OpenClaw gateway (port 18789) bound to localhost only
- All other services (MySQL, custom apps) internal-only
OpenClaw Secure Deployment:
- Gateway configured with bind: "localhost" instead of "lan"
- Access exclusively through Nginx reverse proxy with HTTPS
- Token-based authentication enabled
- WebSocket support properly configured in Nginx
Firewall Management:
- Use 宝塔面板 (BT-Panel) built-in firewall for port management
- Alternative: system-level firewall (ufw/iptables) if no panel available
- Regular external port scanning to verify exposure
Critical Security Principles:
- Never expose sensitive services directly to public internet
- Always use reverse proxy with TLS termination for web services
- Implement defense in depth (firewall + service binding + authentication)
- Regular security audits using openclaw security audit --deep
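
An external port scan only helps if its result is compared against the intended exposure. A minimal sketch of that comparison, assuming the scan output has already been reduced to a list of open port numbers. The function name and input shape are hypothetical and not part of OpenClaw or BT-Panel.

```javascript
// Ports this template intends to expose publicly (from the list above).
const ALLOWED_PUBLIC_PORTS = new Set([80, 443]);

// Hypothetical helper: given the ports an external scan found open,
// return the ones that should not be reachable from the internet.
function findUnexpectedPorts(openPorts, allowed = ALLOWED_PUBLIC_PORTS) {
  return [...new Set(openPorts)]
    .filter((port) => !allowed.has(port))
    .sort((a, b) => a - b);
}

// Example: a scan that still sees SSH, the panel, and the gateway.
console.log(findUnexpectedPorts([443, 22, 80, 18789, 888]));
```

Any non-empty result is worth an entry in logs/security/ before the firewall rule is changed.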

Migration Checklist for New Servers

Install and configure 宝塔面板 or equivalent server management panel
Set up Nginx reverse proxy with proper WebSocket support
Configure OpenClaw with localhost binding only
Restrict public ports to 80/443 only via firewall
Enable automatic security updates
Run initial security audit and document baseline
Schedule periodic security audits via OpenClaw cron

Lessons Learned

Panel-based firewalls (宝塔面板/aaPanel) must be verified with external port scans
Direct service exposure (such as OpenClaw bound to 0.0.0.0) creates critical security risks
Nginx reverse proxy configuration is essential for secure OpenClaw deployment

Agent Operations Logging Practice (2026-02-20)

Log Directory Structure

/root/.openclaw/workspace/logs/operations/ - Manual operations and important changes
/root/.openclaw/workspace/logs/system/ - System-generated logs
/root/.openclaw/workspace/logs/agents/ - Individual agent logs
/root/.openclaw/workspace/logs/security/ - Security operations and audits

Automatic Logging Triggers

Configuration Changes: Any modification to config files (.json, .yaml, etc.)
Security Modifications: Firewall rules, authentication changes, port modifications
Agent Lifecycle: Deployment, updates, removal of agents
System Optimizations: Performance tuning, resource allocation changes
Troubleshooting: Error diagnosis and resolution procedures
Memory Updates: Significant changes to MEMORY.md or memory management

Log Format Standard

Filename: YYYY-MM-DD-HH-MM-SS-description.log
Timestamp: always UTC
Content: [TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description with before/after state
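
The format above can be sketched as two small helpers. This is an illustration, not the code the agents run: the function names are hypothetical, ISO 8601 is an assumed choice for the in-line timestamp (the notes only say UTC), and the SECURITY operation type is an example value.

```javascript
// Zero-pad a number to two digits.
const pad2 = (n) => String(n).padStart(2, "0");

// Build "YYYY-MM-DD-HH-MM-SS-description.log" from a Date, using UTC fields.
function logFileName(date, description) {
  const slug = description
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything unsafe into dashes
    .replace(/^-+|-+$/g, "");    // no leading or trailing dash
  const stamp = [
    date.getUTCFullYear(),
    pad2(date.getUTCMonth() + 1),
    pad2(date.getUTCDate()),
    pad2(date.getUTCHours()),
    pad2(date.getUTCMinutes()),
    pad2(date.getUTCSeconds()),
  ].join("-");
  return `${stamp}-${slug}.log`;
}

// Build one "[TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description" line.
// Assumption: the timestamp is rendered as ISO 8601 in UTC.
function logLine(date, operationType, actor, description) {
  return `[${date.toISOString()}] [${operationType}] [${actor}] ${description}`;
}

const when = new Date(Date.UTC(2026, 1, 20, 14, 25, 0));
console.log(logFileName(when, "Bind gateway to localhost"));
console.log(logLine(when, "SECURITY", "root", "gateway bind: lan -> localhost"));
```

Putting the timestamp first in the filename keeps a directory listing in chronological order.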

Implementation Guidelines

Always log before making changes (capture current state)
Include rollback instructions when applicable
Redact sensitive information (passwords, tokens, private keys)
Reference related MEMORY.md entries for context
Use QMD for routine operational context, MEMORY.md for strategic decisions
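
The redaction guideline can be applied mechanically before a line is written to disk. A rough sketch, assuming secrets appear as key=value or key: value pairs. The key list and the helper name are illustrative, and a pattern like this will miss secrets that appear in other shapes.

```javascript
// Keys whose values are treated as secrets. The list is illustrative.
const SENSITIVE_KEY = /(password|passwd|token|secret|private[_-]?key)/i;

// Replace the value of any "key=value" or "key: value" pair
// whose key looks sensitive; leave everything else untouched.
function redact(text) {
  return text.replace(
    /([A-Za-z0-9_.-]+)(\s*[=:]\s*)("[^"]*"|\S+)/g,
    (match, key, sep) =>
      SENSITIVE_KEY.test(key) ? `${key}${sep}[REDACTED]` : match
  );
}

console.log(redact('bind: "localhost" auth_token=abc123 db_password: "hunter2"'));
```

Redacting at write time, rather than cleaning logs afterwards, means a secret never reaches the git-tracked workspace.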

Agent Health Monitoring & Alerting System (2026-02-20)

Features Implemented

Crash Detection: Monitors uncaught exceptions and unhandled rejections
Health Checks: Periodic service health verification (every 30 seconds)
Multi-Channel Notifications: Telegram alerts for critical events
Automatic Logging: All alerts logged to /logs/agents/health-YYYY-MM-DD.log
Extensible Design: Easy to add new notification channels

Components Created

Skill: agent-monitor/SKILL.md - Documentation and usage guide
Monitor Script: agent-monitor.js - Core monitoring logic
Startup Script: start-agent-monitor.sh - Easy deployment
Log Directory: /logs/agents/ - Dedicated logging location

Alert Severity Levels

CRITICAL: Process crashes, uncaught exceptions
ERROR: Unhandled rejections, failed operations
WARNING: Health check failures, performance issues
INFO: Service status updates, recovery notifications
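
One way to keep these levels consistent across scripts is a single lookup table. The sketch below is hypothetical: the event names are invented labels for the cases listed above, and defaulting unknown events to WARNING is an assumption, chosen so that unclassified events are surfaced rather than dropped.

```javascript
// Event kinds are hypothetical labels; severities follow the list above.
const SEVERITY_BY_EVENT = new Map([
  ["processCrash", "CRITICAL"],
  ["uncaughtException", "CRITICAL"],
  ["unhandledRejection", "ERROR"],
  ["operationFailed", "ERROR"],
  ["healthCheckFailed", "WARNING"],
  ["performanceIssue", "WARNING"],
  ["statusUpdate", "INFO"],
  ["recovered", "INFO"],
]);

// Numeric rank so alerts can be filtered by a minimum level.
const SEVERITY_RANK = { INFO: 0, WARNING: 1, ERROR: 2, CRITICAL: 3 };

// Assumption: unknown events default to WARNING.
function severityFor(eventKind) {
  return SEVERITY_BY_EVENT.get(eventKind) ?? "WARNING";
}

// True when the event's severity is at or above the minimum level.
function shouldNotify(eventKind, minimum = "WARNING") {
  return SEVERITY_RANK[severityFor(eventKind)] >= SEVERITY_RANK[minimum];
}

console.log(severityFor("uncaughtException"), shouldNotify("statusUpdate"));
```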

Integration Points

Automatically integrated with existing Telegram channel
Compatible with OpenClaw's agent architecture
Works alongside existing logging and memory systems
Can monitor any Node.js-based agent process

Usage Instructions

Source the startup script: source /root/.openclaw/workspace/start-agent-monitor.sh
Call startAgentMonitor("agent-name", healthCheckFunction)
Monitor automatically sends alerts on errors/crashes
Check logs in /logs/agents/ for detailed information
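
For orientation, a function with the call shape above might look roughly like the following. This is a hypothetical sketch, not the contents of agent-monitor.js: the options object, the notify callback, and the returned stop function are all assumptions made for illustration.

```javascript
// Hypothetical sketch; the real agent-monitor.js may differ.
function startAgentMonitor(name, healthCheck, options = {}) {
  const {
    intervalMs = 30_000, // "every 30 seconds" from the notes
    notify = (level, msg) => console.error(`[${level}] ${name}: ${msg}`),
    proc = process,      // injectable so the sketch can be tested
  } = options;

  const onException = (err) =>
    notify("CRITICAL", `uncaught exception: ${err.message}`);
  const onRejection = (reason) =>
    notify("ERROR", `unhandled rejection: ${String(reason)}`);
  proc.on("uncaughtException", onException);
  proc.on("unhandledRejection", onRejection);

  // Periodic health check; a falsy result or a throw raises a WARNING.
  const timer = setInterval(async () => {
    try {
      if (!(await healthCheck())) notify("WARNING", "health check failed");
    } catch (err) {
      notify("WARNING", `health check threw: ${err.message}`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for the monitor

  // Return a stop function so callers can detach the monitor.
  return () => {
    clearInterval(timer);
    proc.off("uncaughtException", onException);
    proc.off("unhandledRejection", onRejection);
  };
}

// Usage (agent name and check are placeholders):
// const stop = startAgentMonitor("agent-name", async () => true);
```

Note that installing an uncaughtException handler suppresses Node's default exit, so a monitor built this way would typically exit after alerting and let systemd restart the process.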

Complete System Architecture Upgrade (2026-02-20 14:25 UTC)

✅ All 5 Core Requirements Implemented

1. System-Level Persistence ✓

Systemd Services: openclaw-gateway.service + openclaw-agent-monitor.service
Auto-start on Boot: Both services enabled in multi-user.target
Resource Limits: Memory 2G (gateway) / 512M (monitor), CPU 80% (gateway) / 20% (monitor), 30s watchdog on the gateway
Status: systemctl status openclaw-gateway / systemctl status openclaw-agent-monitor

2. Auto-Healing ✓

Crash Detection: Monitors process exits, signals, uncaught exceptions
Auto-Restart: Systemd Restart=always + monitor script restart logic
Restart Limits: Max 5 restarts per 5 minutes (prevents restart loops)
Health Checks: Every 30 seconds, automatic recovery on failure
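
The restart limit is a sliding-window check. A minimal sketch of that logic, assuming the monitor script tracks restart timestamps in memory. The function names are hypothetical.

```javascript
// Sliding-window limiter: allow at most `maxRestarts` within `windowMs`.
function createRestartLimiter(maxRestarts = 5, windowMs = 5 * 60 * 1000) {
  let recent = []; // timestamps (ms) of restarts still inside the window

  // Returns true and records the restart if it is allowed at time `now`.
  return function tryRestart(now = Date.now()) {
    recent = recent.filter((t) => now - t < windowMs);
    if (recent.length >= maxRestarts) return false;
    recent.push(now);
    return true;
  };
}

const tryRestart = createRestartLimiter();
console.log(tryRestart()); // first restart in a fresh window is allowed
```

When tryRestart returns false, the sensible reaction is to stop retrying and raise a CRITICAL alert, since the service is probably crash-looping.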

3. Multi-Layer Memory Architecture ✓

Core Memory: CORE_INDEX.md - Identity, structure, file index (always loaded first)
Long-term Memory: MEMORY.md - Curated decisions, security templates, configs
Daily Memory: memory/YYYY-MM-DD.md - Raw conversation logs (auto-saved)
Passive Archive: On-demand conversion of valuable conversations to skills/notes
Git Integration: All memory files tracked with version history
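
The load order implied by these layers can be written down as a small function. This is a sketch under assumptions: the helper name is hypothetical, the daily file is assumed to be named by UTC date, and it is assumed to be loaded in every session type.

```javascript
// Return the memory files to load, in order, for a session.
// `isMainSession` reflects the rule that MEMORY.md is loaded only in direct chats.
function memoryFilesFor(date, isMainSession) {
  const day = date.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  const files = ["CORE_INDEX.md"];             // always loaded first
  if (isMainSession) files.push("MEMORY.md");
  files.push(`memory/${day}.md`);
  return files;
}

console.log(memoryFilesFor(new Date(Date.UTC(2026, 1, 20)), true));
```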

4. Git One-Click Rollback ✓

Repository: /root/.openclaw/workspace (already initialized)
Deploy Script: ./deploy.sh rollback - Rollback to previous commit
Specific Rollback: ./deploy.sh rollback-to <commit> - Rollback to specific commit
Auto-Backup: Backup created before rollback
Service Restart: Automatic service restart after rollback
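
The rollback sequence (backup, reset, restart) can be expressed as an ordered command plan. The sketch below only builds the plan and executes nothing. It is not the contents of deploy.sh: using a git tag as the backup and git reset --hard as the rollback mechanism are assumptions, and the function name is hypothetical.

```javascript
// Build the ordered commands for a rollback. Nothing is executed here.
// Assumptions: backup = a git tag on the current HEAD; rollback = hard reset.
function rollbackPlan(commit = "HEAD~1", backupName = "pre-rollback") {
  // Reject anything that could be read as an option or shell syntax.
  if (!/^[A-Za-z0-9][A-Za-z0-9~^._\/-]*$/.test(commit)) {
    throw new Error(`refusing suspicious commit reference: ${commit}`);
  }
  return [
    ["git", "tag", "--force", backupName, "HEAD"],
    ["git", "reset", "--hard", commit],
    ["systemctl", "restart", "openclaw-gateway", "openclaw-agent-monitor"],
  ];
}

for (const step of rollbackPlan()) console.log(step.join(" "));
```

The git commands are meant to run inside /root/.openclaw/workspace. Keeping each command as an argument array avoids shell interpolation of the commit reference.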

5. Telegram Notifications ✓

Triggers: Service stop, error, crash, restart events
Channels: Telegram (via bot API) + OpenClaw message tool
Severity Levels: CRITICAL, ERROR, WARNING, INFO with emoji indicators
Logging: All notifications logged to /logs/agents/health-YYYY-MM-DD.log
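
A notification of this kind maps onto the Telegram Bot API sendMessage method. The sketch below builds the request without sending it. The function name, the message layout, and the specific emoji are assumptions, and the token and chat ID are placeholders supplied by the caller.

```javascript
// Emoji per severity. The specific emoji are illustrative choices.
const SEVERITY_EMOJI = { CRITICAL: "🚨", ERROR: "❌", WARNING: "⚠️", INFO: "ℹ️" };

// Build (but do not send) a Telegram Bot API sendMessage request.
function buildTelegramAlert(botToken, chatId, severity, service, message) {
  const emoji = SEVERITY_EMOJI[severity] ?? "";
  const text = `${emoji} [${severity}] ${service}: ${message}`.trim();
  return {
    url: `https://api.telegram.org/bot${botToken}/sendMessage`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ chat_id: chatId, text }),
    },
  };
}

// Sending would then be:
//   const { url, options } = buildTelegramAlert(token, chatId, "ERROR", "gateway", "restart failed");
//   await fetch(url, options);
```

Because the bot token is part of the URL, the URL itself should be treated as a secret and kept out of the logs.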

📋 Management Commands (deploy.sh)

./deploy.sh install    # Install & start all systemd services
./deploy.sh start      # Start all services
./deploy.sh stop       # Stop all services
./deploy.sh restart    # Restart all services
./deploy.sh status     # Show detailed service status
./deploy.sh logs       # Show recent logs (last 50 lines)
./deploy.sh health     # Run comprehensive health check
./deploy.sh backup     # Create timestamped backup
./deploy.sh rollback   # Rollback to previous git commit
./deploy.sh rollback-to <commit>  # Rollback to specific commit
./deploy.sh help       # Show help message

🔧 Systemd Service Details

Gateway Service: /etc/systemd/system/openclaw-gateway.service
- Memory limit: 2G, CPU: 80%, Watchdog: 30s
- Restart: always, RestartSec: 10s
- Logs: journalctl -u openclaw-gateway -f
Monitor Service: /etc/systemd/system/openclaw-agent-monitor.service
- Memory limit: 512M, CPU: 20%
- Restart: always, RestartSec: 5s
- Logs: journalctl -u openclaw-agent-monitor -f
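
To keep both units consistent with these numbers, the [Service] settings can be rendered from one data structure. This is a sketch, not the installed unit files: it assumes "Memory limit" corresponds to MemoryMax= and "CPU" to CPUQuota=, and it omits ExecStart and everything else the real units contain. The function name is hypothetical.

```javascript
// Render the restart and resource settings of a [Service] section.
// Assumption: memory limit -> MemoryMax=, CPU limit -> CPUQuota=.
function renderServiceSection({ memoryMax, cpuQuota, restartSec, watchdogSec }) {
  const lines = [
    "[Service]",
    "Restart=always",
    `RestartSec=${restartSec}`,
    `MemoryMax=${memoryMax}`,
    `CPUQuota=${cpuQuota}`,
  ];
  // Only the gateway has a watchdog in these notes.
  if (watchdogSec) lines.push(`WatchdogSec=${watchdogSec}`);
  return lines.join("\n");
}

// Gateway, then monitor, using the values listed above.
console.log(renderServiceSection({ memoryMax: "2G", cpuQuota: "80%", restartSec: "10s", watchdogSec: "30s" }));
console.log(renderServiceSection({ memoryMax: "512M", cpuQuota: "20%", restartSec: "5s" }));
```

Two caveats apply. WatchdogSec= only works if the service sends keep-alive notifications to systemd; otherwise systemd treats it as hung and restarts it. And if the limit of 5 restarts per 5 minutes is enforced by systemd rather than by the monitor script, the matching settings would be StartLimitBurst=5 and StartLimitIntervalSec=300 in the [Unit] section.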

📊 Health Check Metrics

Gateway service status (active/inactive)
Agent monitor status (active/inactive)
Disk usage (warning at 80%)
Memory usage (warning at 80%)
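
These four metrics reduce to a simple evaluation over one snapshot. A sketch, assuming the snapshot has already been collected (for example from systemctl is-active and disk and memory readings). The field names are hypothetical, and treating exactly 80% as a warning is an assumption.

```javascript
const WARN_THRESHOLD_PERCENT = 80; // from the notes: warning at 80%

// Evaluate a snapshot and return a list of human-readable warnings.
// An empty list means every check passed.
function evaluateHealth({ gatewayActive, monitorActive, diskUsedPercent, memoryUsedPercent }) {
  const warnings = [];
  if (!gatewayActive) warnings.push("gateway service is inactive");
  if (!monitorActive) warnings.push("agent monitor is inactive");
  if (diskUsedPercent >= WARN_THRESHOLD_PERCENT) {
    warnings.push(`disk usage at ${diskUsedPercent}%`);
  }
  if (memoryUsedPercent >= WARN_THRESHOLD_PERCENT) {
    warnings.push(`memory usage at ${memoryUsedPercent}%`);
  }
  return warnings;
}

console.log(evaluateHealth({ gatewayActive: true, monitorActive: true, diskUsedPercent: 85, memoryUsedPercent: 40 }));
```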

🎯 Next Steps (Future Enhancements)

Add Prometheus/Grafana monitoring dashboard
Implement log rotation and archival
Add email notifications as backup channel
Create web-based admin dashboard
Add automated security scanning in CI/CD
