MEMORY.md - Long-term Memory

This file contains curated long-term memories and important context.

Memory Management Strategy

  • MEMORY.md: Curated long-term memories, important decisions, security templates, and key configurations
  • QMD System: Automated memory backend with semantic search, auto-updates every 5 minutes
  • Usage: Write significant learnings to MEMORY.md; rely on QMD for daily context and automation
  • Access: MEMORY.md loaded only in main sessions (direct chats) for security

QMD Configuration

  • Backend: qmd
  • Auto-update: every 5 minutes
  • Include default memory: true
  • Last verified: 2026-02-20

Server Security Hardening Template (2026-02-20)

Environment

  • Server: Ubuntu 24.04 LTS VPS (KVM)
  • Panel: 宝塔面板 (BT-Panel) on port 888
  • Public IP: 204.12.203.203

Security Configuration Applied

  1. Port Exposure Minimization:

    • Only ports 80 (HTTP) and 443 (HTTPS) publicly accessible
    • SSH (port 22) restricted to internal network access only
    • OpenClaw gateway (port 18789) bound to localhost only
    • All other services (MySQL, custom apps) internal-only
  2. OpenClaw Secure Deployment:

    • Gateway configured with bind: "localhost" instead of "lan"
    • Access exclusively through Nginx reverse proxy with HTTPS
    • Token-based authentication enabled
    • WebSocket support properly configured in Nginx
  3. Firewall Management:

    • Use 宝塔面板 (BT-Panel) built-in firewall for port management
    • Alternative: system-level firewall (ufw/iptables) if no panel available
    • Regular external port scanning to verify exposure
  4. Critical Security Principles:

    • Never expose sensitive services directly to public internet
    • Always use reverse proxy with TLS termination for web services
    • Implement defense in depth (firewall + service binding + authentication)
    • Regular security audits using openclaw security audit --deep
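
The external port scan mentioned under Firewall Management can be approximated with a short TCP connect check. This is a minimal sketch, not the tooling actually used here: `is_port_open` and `audit_exposure` are hypothetical helper names, and a plain connect test only covers TCP.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), or unreachable: treat as not exposed.
        return False


def audit_exposure(host: str, allowed: set[int], ports_to_probe: list[int]) -> list[int]:
    """Return probed ports that are reachable but missing from the allowed set."""
    return [p for p in ports_to_probe if p not in allowed and is_port_open(host, p)]
```

Run it from a machine outside the server, because a scan from the host itself does not exercise the firewall. For example, `audit_exposure("204.12.203.203", {80, 443}, [22, 888, 3306, 18789])` should return an empty list if only 80 and 443 are public (3306 is assumed here as the MySQL port; the notes do not state it).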

Migration Checklist for New Servers

  • Install and configure 宝塔面板 (BT-Panel) or an equivalent server management panel
  • Set up Nginx reverse proxy with proper WebSocket support
  • Configure OpenClaw with localhost binding only
  • Restrict public ports to 80/443 only via firewall
  • Enable automatic security updates
  • Run initial security audit and document baseline
  • Schedule periodic security audits via OpenClaw cron

Lessons Learned

  • Panel-based firewalls (宝塔面板/aaPanel) must be verified with external port scans
  • Direct service exposure (like OpenClaw on 0.0.0.0) creates critical security risks
  • Nginx reverse proxy configuration is essential for secure OpenClaw deployment
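
The binding lesson above can be turned into a pre-deployment check. This is a sketch under assumptions: `is_local_only` is a hypothetical helper, and it treats any value that is not `localhost` or a literal loopback IP (including the `"lan"` keyword) as exposed.

```python
import ipaddress


def is_local_only(bind_host: str) -> bool:
    """Return True if bind_host refers to the loopback interface only."""
    if bind_host == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not a literal IP (a hostname, or a keyword such as "lan"):
        # be conservative and treat it as potentially exposed.
        return False
```

With this check, `0.0.0.0` and `lan` are flagged, while `localhost`, `127.0.0.1`, and `::1` pass.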

Agent Operations Logging Practice (2026-02-20)

Log Directory Structure

  • /root/.openclaw/workspace/logs/operations/ - Manual operations and important changes
  • /root/.openclaw/workspace/logs/system/ - System-generated logs
  • /root/.openclaw/workspace/logs/agents/ - Individual agent logs
  • /root/.openclaw/workspace/logs/security/ - Security operations and audits

Automatic Logging Triggers

  1. Configuration Changes: Any modification to config files (.json, .yaml, etc.)
  2. Security Modifications: Firewall rules, authentication changes, port modifications
  3. Agent Lifecycle: Deployment, updates, removal of agents
  4. System Optimizations: Performance tuning, resource allocation changes
  5. Troubleshooting: Error diagnosis and resolution procedures
  6. Memory Updates: Significant changes to MEMORY.md or memory management

Log Format Standard

  • Filename: YYYY-MM-DD-HH-MM-SS-description.log
  • Timestamp: always in UTC
  • Content: [TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description with before/after state
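
As an illustration of the format above, the filename and entry line could be built as follows. The helper names are hypothetical, and the ISO-8601-style timestamp inside the entry is an assumption, since the standard only requires UTC.

```python
import re
from datetime import datetime, timezone


def log_filename(description: str, now: datetime) -> str:
    """Build YYYY-MM-DD-HH-MM-SS-description.log from a timezone-aware datetime."""
    # Lowercase the description and collapse anything non-alphanumeric to dashes.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%d-%H-%M-%S")
    return f"{stamp}-{slug}.log"


def log_line(now: datetime, operation_type: str, actor: str, description: str) -> str:
    """Build '[TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description'."""
    # Assumed timestamp layout; the notes only say "UTC".
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"[{stamp}] [{operation_type}] [{actor}] {description}"
```

Both functions expect a timezone-aware `datetime`, for example `datetime.now(timezone.utc)`, so that the conversion to UTC is unambiguous.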

Implementation Guidelines

  • Always log before making changes (capture current state)
  • Include rollback instructions when applicable
  • Redact sensitive information (passwords, tokens, private keys)
  • Reference related MEMORY.md entries for context
  • Use QMD for routine operational context, MEMORY.md for strategic decisions
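
The redaction guideline could be applied before any entry is written. The patterns below are illustrative assumptions, not an exhaustive list; extend them to match the secrets that actually appear in the configs being logged.

```python
import re

# Hypothetical key names; matches forms like token=abc and "password": "abc".
_SECRET_KV = re.compile(
    r'(password|passwd|token|secret|api[_-]?key)("?\s*[:=]\s*)("?)[^\s",]+("?)',
    re.IGNORECASE,
)
# PEM-style private key blocks (RSA, EC, OPENSSH, or unlabelled).
_PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)


def redact(text: str) -> str:
    """Replace secret values and private key blocks with placeholders."""
    text = _PRIVATE_KEY.sub("[REDACTED PRIVATE KEY]", text)
    # Keep the key name, separator, and quotes; replace only the value.
    return _SECRET_KV.sub(r"\1\2\3[REDACTED]\4", text)
```

This sketch only handles single-token values; a quoted value containing spaces would be redacted up to the first space, so review the output before relying on it.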

Agent Health Monitoring & Alerting System (2026-02-20)

Features Implemented

  1. Crash Detection: Monitors uncaught exceptions and unhandled rejections
  2. Health Checks: Periodic service health verification (every 30 seconds)
  3. Multi-Channel Notifications: Telegram alerts for critical events
  4. Automatic Logging: All alerts logged to /logs/agents/health-YYYY-MM-DD.log
  5. Extensible Design: Easy to add new notification channels

Components Created

  • Skill: agent-monitor/SKILL.md - Documentation and usage guide
  • Monitor Script: agent-monitor.js - Core monitoring logic
  • Startup Script: start-agent-monitor.sh - Easy deployment
  • Log Directory: /logs/agents/ - Dedicated logging location

Alert Severity Levels

  • CRITICAL: Process crashes, uncaught exceptions
  • ERROR: Unhandled rejections, failed operations
  • WARNING: Health check failures, performance issues
  • INFO: Service status updates, recovery notifications
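
The severity levels above map naturally onto an ordered enum. In this sketch the event names, the default for unknown events, and the notification threshold are all hypothetical; only the four level names and their example events come from the list above.

```python
from enum import IntEnum


class Severity(IntEnum):
    INFO = 10
    WARNING = 20
    ERROR = 30
    CRITICAL = 40


# Hypothetical event names, grouped as in the severity list above.
EVENT_SEVERITY = {
    "process_crash": Severity.CRITICAL,
    "uncaught_exception": Severity.CRITICAL,
    "unhandled_rejection": Severity.ERROR,
    "operation_failed": Severity.ERROR,
    "health_check_failed": Severity.WARNING,
    "performance_issue": Severity.WARNING,
    "status_update": Severity.INFO,
    "recovered": Severity.INFO,
}


def classify(event: str) -> Severity:
    """Look up an event's severity; unknown events default to WARNING (assumption)."""
    return EVENT_SEVERITY.get(event, Severity.WARNING)


def should_notify(event: str, threshold: Severity = Severity.ERROR) -> bool:
    """Return True if the event is at or above the notification threshold."""
    return classify(event) >= threshold
```

Using an `IntEnum` keeps the levels comparable, so a single threshold decides which events reach a notification channel and which are only logged.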

Integration Points

  • Automatically integrated with existing Telegram channel
  • Compatible with OpenClaw's agent architecture
  • Works alongside existing logging and memory systems
  • Can monitor any Node.js-based agent process

Usage Instructions

  1. Source the startup script: source /root/.openclaw/workspace/start-agent-monitor.sh
  2. Call startAgentMonitor("agent-name", healthCheckFunction)
  3. Monitor automatically sends alerts on errors/crashes
  4. Check logs in /logs/agents/ for detailed information

Complete System Architecture Upgrade (2026-02-20 14:25 UTC)

All 5 Core Requirements Implemented

1. System-Level Persistence ✓

  • Systemd Services: openclaw-gateway.service + openclaw-agent-monitor.service
  • Auto-start on Boot: Both services enabled in multi-user.target
  • Resource Limits: Memory 2G (gateway) / 512M (monitor), CPU 80% (gateway) / 20% (monitor), 30s watchdog on the gateway
  • Status: systemctl status openclaw-gateway / systemctl status openclaw-agent-monitor

2. Auto-Healing ✓

  • Crash Detection: Monitors process exits, signals, uncaught exceptions
  • Auto-Restart: Systemd Restart=always + monitor script restart logic
  • Restart Limits: Max 5 restarts per 5 minutes (prevents restart loops)
  • Health Checks: Every 30 seconds, automatic recovery on failure
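
The restart limit (max 5 restarts per 5 minutes) is a sliding-window rate limit. The sketch below shows one way the monitor-side logic could work; `RestartLimiter` is a hypothetical name, and the real monitor script may implement this differently.

```python
import time
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window_seconds span."""

    def __init__(self, max_restarts: int = 5, window_seconds: float = 300.0):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._stamps = deque()  # times of recent allowed restarts

    def allow(self, now=None) -> bool:
        """Record and permit a restart, or return False if the limit is reached."""
        now = time.monotonic() if now is None else now
        # Drop restarts that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_restarts:
            return False
        self._stamps.append(now)
        return True
```

systemd offers the same kind of limit through `StartLimitIntervalSec=` and `StartLimitBurst=`; whether the unit files here set them is not recorded in these notes.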

3. Multi-Layer Memory Architecture ✓

  • Core Memory: CORE_INDEX.md - Identity, structure, file index (always loaded first)
  • Long-term Memory: MEMORY.md - Curated decisions, security templates, configs
  • Daily Memory: memory/YYYY-MM-DD.md - Raw conversation logs (auto-saved)
  • Passive Archive: On-demand conversion of valuable conversations to skills/notes
  • Git Integration: All memory files tracked with version history

4. Git One-Click Rollback ✓

  • Repository: /root/.openclaw/workspace (already initialized)
  • Deploy Script: ./deploy.sh rollback - Rollback to previous commit
  • Specific Rollback: ./deploy.sh rollback-to <commit> - Rollback to specific commit
  • Auto-Backup: Backup created before rollback
  • Service Restart: Automatic service restart after rollback

5. Telegram Notifications ✓

  • Triggers: Service stop, error, crash, restart events
  • Channels: Telegram (via bot API) + OpenClaw message tool
  • Severity Levels: CRITICAL, ERROR, WARNING, INFO with emoji indicators
  • Logging: All notifications logged to /logs/agents/health-YYYY-MM-DD.log

📋 Management Commands (deploy.sh)

./deploy.sh install    # Install & start all systemd services
./deploy.sh start      # Start all services
./deploy.sh stop       # Stop all services
./deploy.sh restart    # Restart all services
./deploy.sh status     # Show detailed service status
./deploy.sh logs       # Show recent logs (last 50 lines)
./deploy.sh health     # Run comprehensive health check
./deploy.sh backup     # Create timestamped backup
./deploy.sh rollback   # Rollback to previous git commit
./deploy.sh rollback-to <commit>  # Rollback to specific commit
./deploy.sh help       # Show help message

🔧 Systemd Service Details

  • Gateway Service: /etc/systemd/system/openclaw-gateway.service

    • Memory limit: 2G, CPU: 80%, Watchdog: 30s
    • Restart: always, RestartSec: 10s
    • Logs: journalctl -u openclaw-gateway -f
  • Monitor Service: /etc/systemd/system/openclaw-agent-monitor.service

    • Memory limit: 512M, CPU: 20%
    • Restart: always, RestartSec: 5s
    • Logs: journalctl -u openclaw-agent-monitor -f

📊 Health Check Metrics

  • Gateway service status (active/inactive)
  • Agent monitor status (active/inactive)
  • Disk usage (warning at 80%)
  • Memory usage (warning at 80%)
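
The disk and memory thresholds above can be checked with the standard library alone. This is a sketch, not the logic in deploy.sh: the function names are hypothetical, memory usage is derived from `MemTotal` and `MemAvailable` in `/proc/meminfo` (Linux only), and "warning at 80%" is read as "at or above 80%".

```python
import shutil

WARN_THRESHOLD = 80.0  # percent, from the metrics list above


def disk_usage_percent(path: str = "/") -> float:
    """Return used disk space on the filesystem containing path, in percent."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def memory_usage_percent(meminfo_text: str) -> float:
    """Compute memory usage in percent from the contents of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])  # values are in kB
    total = fields["MemTotal"]
    return (total - fields["MemAvailable"]) / total * 100


def over_threshold(disk_pct: float, mem_pct: float,
                   threshold: float = WARN_THRESHOLD) -> list[str]:
    """Return the names of metrics at or above the warning threshold."""
    metrics = {"disk": disk_pct, "memory": mem_pct}
    return [name for name, pct in metrics.items() if pct >= threshold]
```

On the server, the memory input would come from reading `/proc/meminfo`, for example `memory_usage_percent(open("/proc/meminfo").read())`.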

🎯 Next Steps (Future Enhancements)

  • Add Prometheus/Grafana monitoring dashboard
  • Implement log rotation and archival
  • Add email notifications as backup channel
  • Create web-based admin dashboard
  • Add automated security scanning in CI/CD