# MEMORY.md - Long-term Memory
This file contains curated long-term memories and important context.
## Memory Management Strategy
- MEMORY.md: Curated long-term memories, important decisions, security templates, and key configurations
- QMD System: Automated memory backend with semantic search, auto-updates every 5 minutes
- Usage: Write significant learnings to MEMORY.md; rely on QMD for daily context and automation
- Access: MEMORY.md loaded only in main sessions (direct chats) for security
## QMD Configuration
- Backend: qmd
- Auto-update: every 5 minutes
- Include default memory: true
- Last verified: 2026-02-20
## Server Security Hardening Template (2026-02-20)
### Environment
- Server: Ubuntu 24.04 LTS VPS (KVM)
- Panel: 宝塔面板 (BT-Panel) on port 888
- Public IP: 204.12.203.203
### Security Configuration Applied
- Port Exposure Minimization:
  - Only ports 80 (HTTP) and 443 (HTTPS) publicly accessible
  - SSH (port 22) restricted to internal/network access only
  - OpenClaw gateway (port 18789) bound to localhost only
  - All other services (MySQL, custom apps) internal-only
- OpenClaw Secure Deployment:
  - Gateway configured with `bind: "localhost"` instead of `"lan"`
  - Access exclusively through Nginx reverse proxy with HTTPS
  - Token-based authentication enabled
  - WebSocket support properly configured in Nginx
- Firewall Management:
  - Use 宝塔面板 (BT-Panel) built-in firewall for port management
  - Alternative: system-level firewall (ufw/iptables) if no panel available
  - Regular external port scanning to verify exposure
- Critical Security Principles:
  - Never expose sensitive services directly to public internet
  - Always use reverse proxy with TLS termination for web services
  - Implement defense in depth (firewall + service binding + authentication)
  - Regular security audits using `openclaw security audit --deep`
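Because the principles above rely on service binding as well as the firewall, a local check of what is listening on non-loopback addresses complements the external scans. This is a minimal sketch, not an OpenClaw or BT-Panel feature: it assumes the `ss -tlnH` output format from iproute2 (listening TCP sockets, no header line) and treats 80/443 as the only expected public ports. The function name `find_unexpected_listeners` is hypothetical. A flagged line is not necessarily reachable, since the firewall may still drop it, but it is worth reviewing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print TCP listeners bound to a non-loopback address
# on a port other than the expected public ones (80, 443).
# Input on stdin: output of `ss -tlnH`, one listener per line:
#   State Recv-Q Send-Q Local-Address:Port Peer-Address:Port [Process]
find_unexpected_listeners() {
  local allowed=" 80 443 "
  local state recvq sendq local_addr rest addr port
  while read -r state recvq sendq local_addr rest; do
    [ -n "$local_addr" ] || continue
    port="${local_addr##*:}"   # text after the last colon
    addr="${local_addr%:*}"    # everything before it (IPv6 stays bracketed)
    case "$addr" in
      127.*|"[::1]") continue ;;   # loopback: not reachable from outside
    esac
    case "$allowed" in
      *" $port "*) continue ;;     # expected public port
    esac
    printf '%s\n' "$local_addr"
  done
}
```

On the server this would be run as `ss -tlnH | find_unexpected_listeners`. With the setup described above, the gateway's port 18789 should not appear, because it is bound to `127.0.0.1`.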
### Migration Checklist for New Servers
- Install and configure 宝塔面板 or equivalent server management panel
- Set up Nginx reverse proxy with proper WebSocket support
- Configure OpenClaw with localhost binding only
- Restrict public ports to 80/443 only via firewall
- Enable automatic security updates
- Run initial security audit and document baseline
- Schedule periodic security audits via OpenClaw cron
### Lessons Learned
- Panel-based firewalls (宝塔/aapanel) must be verified with external port scans
- Direct service exposure (like OpenClaw on 0.0.0.0) creates critical security risks
- Nginx reverse proxy configuration is essential for secure OpenClaw deployment
## Agent Operations Logging Practice (2026-02-20)
### Log Directory Structure
- `/root/.openclaw/workspace/logs/operations/` - Manual operations and important changes
- `/root/.openclaw/workspace/logs/system/` - System-generated logs
- `/root/.openclaw/workspace/logs/agents/` - Individual agent logs
- `/root/.openclaw/workspace/logs/security/` - Security operations and audits
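A minimal way to create that layout on a fresh server, assuming plain `mkdir -p` is all that is needed (ownership and permissions are left to the operator). `LOG_ROOT` and `init_log_dirs` are names introduced here for illustration so the base path can be overridden:

```bash
#!/usr/bin/env bash
# Create the four log directories listed above.
# LOG_ROOT is a hypothetical override; it defaults to the workspace path in this file.
LOG_ROOT="${LOG_ROOT:-/root/.openclaw/workspace/logs}"

init_log_dirs() {
  local root="${1:-$LOG_ROOT}" sub
  for sub in operations system agents security; do
    mkdir -p "$root/$sub" || return 1
  done
}
```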
### Automatic Logging Triggers
- Configuration Changes: Any modification to config files (.json, .yaml, etc.)
- Security Modifications: Firewall rules, authentication changes, port modifications
- Agent Lifecycle: Deployment, updates, removal of agents
- System Optimizations: Performance tuning, resource allocation changes
- Troubleshooting: Error diagnosis and resolution procedures
- Memory Updates: Significant changes to MEMORY.md or memory management
### Log Format Standard
- Filename: `YYYY-MM-DD-HH-MM-SS-description.log`
- Timestamp: UTC time format
- Content: `[TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description with before/after state`
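The format rules above can be produced with GNU `date`. This is a sketch rather than an existing OpenClaw tool: the ISO 8601 `...Z` form for the in-line timestamp is an assumption (the file only says "UTC"), as is the example operation type `CONFIG_CHANGE`. Both hypothetical helpers accept an optional epoch so their output is reproducible.

```bash
#!/usr/bin/env bash
# Helpers for the log format standard (requires GNU date, the Ubuntu default).

# log_filename DESCRIPTION [EPOCH]
# Prints YYYY-MM-DD-HH-MM-SS-description.log in UTC; EPOCH defaults to now.
log_filename() {
  local desc="${1// /-}"               # spaces become dashes in the filename
  local epoch="${2:-$(date -u +%s)}"
  printf '%s-%s.log\n' "$(date -u -d "@$epoch" +%Y-%m-%d-%H-%M-%S)" "$desc"
}

# log_line OPERATION_TYPE AGENT_OR_USER DESCRIPTION [EPOCH]
# Prints "[TIMESTAMP] [OPERATION_TYPE] [AGENT/USER] Description" with an ISO 8601 UTC timestamp.
log_line() {
  local op="$1" actor="$2" desc="$3"
  local epoch="${4:-$(date -u +%s)}"
  printf '[%s] [%s] [%s] %s\n' \
    "$(date -u -d "@$epoch" +%Y-%m-%dT%H:%M:%SZ)" "$op" "$actor" "$desc"
}

# Example (hypothetical values):
#   f="/root/.openclaw/workspace/logs/operations/$(log_filename 'gateway bind change')"
#   log_line CONFIG_CHANGE admin 'bind: lan -> localhost' >> "$f"
```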
### Implementation Guidelines
- Always log before making changes (capture current state)
- Include rollback instructions when applicable
- Redact sensitive information (passwords, tokens, private keys)
- Reference related MEMORY.md entries for context
- Use QMD for routine operational context, MEMORY.md for strategic decisions
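For the redaction rule, a rough `sed` filter can be applied to text before it is written to a log. This sketch assumes GNU sed (for the `I` flag and the one-line `c` text form) and only catches `key=value` / `key: value` pairs for a short, hypothetical list of key names, plus PEM private-key blocks, so it complements rather than replaces manual review.

```bash
#!/usr/bin/env bash
# Rough redaction filter for log content (GNU sed). Reads stdin, writes stdout.
# - Collapses any PEM private-key block into one placeholder line.
# - Masks the value in key=value / key: value pairs for a few common secret names
#   (the key list is an assumption; extend it for your own configs).
redact() {
  sed -E \
    -e '/-----BEGIN [A-Z ]*PRIVATE KEY-----/,/-----END [A-Z ]*PRIVATE KEY-----/c [REDACTED PRIVATE KEY]' \
    -e 's/(password|passwd|token|secret|api_key)([=:][[:space:]]*)[^[:space:],]+/\1\2[REDACTED]/gI'
}

# Example: some_command 2>&1 | redact >> "$logfile"
```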
## Agent Health Monitoring & Alerting System (2026-02-20)
### Features Implemented
- Crash Detection: Monitors uncaught exceptions and unhandled rejections
- Health Checks: Periodic service health verification (every 30 seconds)
- Multi-Channel Notifications: Telegram alerts for critical events
- Automatic Logging: All alerts logged to `/logs/agents/health-YYYY-MM-DD.log`
- Extensible Design: Easy to add new notification channels
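The periodic check-and-log behaviour can be illustrated in shell. This is not the logic of `agent-monitor.js`, which this file does not reproduce; it is a stand-in that assumes the health log lives under `/root/.openclaw/workspace/logs/agents/` (the file writes the path as `/logs/agents/`) and that `systemctl is-active` is an adequate probe. `HEALTH_LOG_DIR`, `run_health_check`, and `monitor_loop` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical shell stand-in for the 30-second health check described above.
HEALTH_LOG_DIR="${HEALTH_LOG_DIR:-/root/.openclaw/workspace/logs/agents}"
CHECK_INTERVAL=30   # seconds, per the "every 30 seconds" note

# run_health_check NAME COMMAND [ARGS...]
# Runs COMMAND; on failure appends a WARNING line to health-YYYY-MM-DD.log (UTC date).
run_health_check() {
  local name="$1"; shift
  if "$@" >/dev/null 2>&1; then
    return 0
  fi
  mkdir -p "$HEALTH_LOG_DIR"
  printf '[%s] [WARNING] [%s] health check failed: %s\n' \
    "$(date -u +%FT%TZ)" "$name" "$*" >> "$HEALTH_LOG_DIR/health-$(date -u +%F).log"
  return 1
}

# Poll both services forever; stop with Ctrl-C.
monitor_loop() {
  while true; do
    run_health_check openclaw-gateway systemctl is-active --quiet openclaw-gateway
    run_health_check openclaw-agent-monitor systemctl is-active --quiet openclaw-agent-monitor
    sleep "$CHECK_INTERVAL"
  done
}
```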
### Components Created
- Skill: `agent-monitor/SKILL.md` - Documentation and usage guide
- Monitor Script: `agent-monitor.js` - Core monitoring logic
- Startup Script: `start-agent-monitor.sh` - Easy deployment
- Log Directory: `/logs/agents/` - Dedicated logging location
### Alert Severity Levels
- CRITICAL: Process crashes, uncaught exceptions
- ERROR: Unhandled rejections, failed operations
- WARNING: Health check failures, performance issues
- INFO: Service status updates, recovery notifications
### Integration Points
- Automatically integrated with existing Telegram channel
- Compatible with OpenClaw's agent architecture
- Works alongside existing logging and memory systems
- Can monitor any Node.js-based agent process
### Usage Instructions
- Source the startup script: `source /root/.openclaw/workspace/start-agent-monitor.sh`
- Call `startAgentMonitor("agent-name", healthCheckFunction)`
- Monitor automatically sends alerts on errors/crashes
- Check logs in `/logs/agents/` for detailed information
## Complete System Architecture Upgrade (2026-02-20 14:25 UTC)
### ✅ All 5 Core Requirements Implemented
#### 1. System-Level Persistence ✓
- Systemd Services: `openclaw-gateway.service` + `openclaw-agent-monitor.service`
- Auto-start on Boot: Both services enabled in `multi-user.target`
- Resource Limits: Memory (2G/512M), CPU (80%/20%), watchdog timers
- Status: `systemctl status openclaw-gateway` / `systemctl status openclaw-agent-monitor`
#### 2. Auto-Healing ✓
- Crash Detection: Monitors process exits, signals, uncaught exceptions
- Auto-Restart: Systemd Restart=always + monitor script restart logic
- Restart Limits: Max 5 restarts per 5 minutes (prevents restart loops)
- Health Checks: Every 30 seconds, automatic recovery on failure
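The "max 5 restarts per 5 minutes" rule is a sliding-window limit. If systemd enforces it, the matching `[Unit]` directives are `StartLimitIntervalSec=300` and `StartLimitBurst=5`. For the monitor-script side, a minimal shell sketch of the same window could look like the following; `may_restart` and its variables are hypothetical names, and the state lives only in the current shell process.

```bash
#!/usr/bin/env bash
# Sliding-window restart limiter: at most MAX_RESTARTS within WINDOW_SECS.
MAX_RESTARTS=5
WINDOW_SECS=300
RESTART_TIMES=()   # epoch seconds of restarts still inside the window

# may_restart [NOW]
# Returns 0 and records a restart if the limit allows it, 1 otherwise.
may_restart() {
  local now="${1:-$(date +%s)}" t
  local recent=()
  for t in "${RESTART_TIMES[@]}"; do
    if (( now - t < WINDOW_SECS )); then
      recent+=("$t")          # keep only restarts from the last WINDOW_SECS
    fi
  done
  RESTART_TIMES=("${recent[@]}")
  if (( ${#RESTART_TIMES[@]} >= MAX_RESTARTS )); then
    return 1                  # limit hit: stop restarting and escalate instead
  fi
  RESTART_TIMES+=("$now")
}
```

A caller would then guard each restart, e.g. `may_restart && systemctl restart openclaw-gateway`, and raise a CRITICAL alert when it returns 1.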
#### 3. Multi-Layer Memory Architecture ✓
- Core Memory: `CORE_INDEX.md` - Identity, structure, file index (always loaded first)
- Long-term Memory: `MEMORY.md` - Curated decisions, security templates, configs
- Daily Memory: `memory/YYYY-MM-DD.md` - Raw conversation logs (auto-saved)
- Passive Archive: On-demand conversion of valuable conversations to skills/notes
- Git Integration: All memory files tracked with version history
#### 4. Git One-Click Rollback ✓
- Repository: `/root/.openclaw/workspace` (already initialized)
- Deploy Script: `./deploy.sh rollback` - Roll back to the previous commit
- Specific Rollback: `./deploy.sh rollback-to <commit>` - Roll back to a specific commit
- Auto-Backup: Backup created before rollback
- Service Restart: Automatic service restart after rollback
#### 5. Telegram Notifications ✓
- Triggers: Service stop, error, crash, restart events
- Channels: Telegram (via bot API) + OpenClaw message tool
- Severity Levels: CRITICAL, ERROR, WARNING, INFO with emoji indicators
- Logging: All notifications logged to `/logs/agents/health-YYYY-MM-DD.log`
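A shell sketch of the notify-and-log path, assuming the Telegram Bot API `sendMessage` method called through `curl`, with credentials in two hypothetical environment variables (`TELEGRAM_BOT_TOKEN`, `TELEGRAM_CHAT_ID`). The emoji chosen per severity are placeholders, since this file does not record which ones are used, and the OpenClaw message tool channel is not covered. When the variables are unset, the alert is only logged.

```bash
#!/usr/bin/env bash
# Hypothetical alert helper: log to the daily health log, then optionally notify Telegram.
HEALTH_LOG_DIR="${HEALTH_LOG_DIR:-/root/.openclaw/workspace/logs/agents}"

# Placeholder emoji for each severity level; returns 1 for unknown levels.
severity_emoji() {
  case "$1" in
    CRITICAL) printf '🚨' ;;
    ERROR)    printf '❌' ;;
    WARNING)  printf '⚠️' ;;
    INFO)     printf 'ℹ️' ;;
    *)        return 1 ;;
  esac
}

# send_alert SEVERITY MESSAGE
send_alert() {
  local severity="$1" message="$2" emoji text
  emoji="$(severity_emoji "$severity")" || { echo "unknown severity: $severity" >&2; return 2; }
  text="$emoji [$severity] $message"
  mkdir -p "$HEALTH_LOG_DIR"
  printf '[%s] %s\n' "$(date -u +%FT%TZ)" "$text" >> "$HEALTH_LOG_DIR/health-$(date -u +%F).log"
  if [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
    # Bot API sendMessage with form-encoded chat_id and text
    curl -fsS --max-time 10 \
      "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
      --data-urlencode "chat_id=${TELEGRAM_CHAT_ID}" \
      --data-urlencode "text=${text}" >/dev/null
  fi
}
```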
### 📋 Management Commands (`deploy.sh`)
```bash
./deploy.sh install                # Install & start all systemd services
./deploy.sh start                  # Start all services
./deploy.sh stop                   # Stop all services
./deploy.sh restart                # Restart all services
./deploy.sh status                 # Show detailed service status
./deploy.sh logs                   # Show recent logs (last 50 lines)
./deploy.sh health                 # Run comprehensive health check
./deploy.sh backup                 # Create timestamped backup
./deploy.sh rollback               # Roll back to previous git commit
./deploy.sh rollback-to <commit>   # Roll back to specific commit
./deploy.sh help                   # Show help message
```
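The internals of `deploy.sh` are not recorded in this file. As a hypothetical sketch only, its service-control subcommands could map onto `systemctl` and `journalctl` roughly as below. `deploy_cmd`, `SERVICES`, and the `SYSTEMCTL` override (useful for dry runs with `SYSTEMCTL=echo`) are assumptions; `install`, `health`, `backup`, and the rollback commands are left out.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton of deploy.sh's service-control subcommands.
SERVICES=(openclaw-gateway openclaw-agent-monitor)
SYSTEMCTL="${SYSTEMCTL:-systemctl}"   # set SYSTEMCTL=echo for a dry run

deploy_cmd() {
  local cmd="${1:-help}"
  case "$cmd" in
    start|stop|restart)
      $SYSTEMCTL "$cmd" "${SERVICES[@]}" ;;
    status)
      $SYSTEMCTL status --no-pager "${SERVICES[@]}" ;;
    logs)
      # last 50 journal lines across both units
      journalctl -n 50 --no-pager -u openclaw-gateway -u openclaw-agent-monitor ;;
    help)
      echo "usage: deploy.sh {start|stop|restart|status|logs|help}" ;;
    *)
      echo "unknown command: $cmd" >&2
      return 2 ;;
  esac
}
```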
### 🔧 Systemd Service Details
- Gateway Service: `/etc/systemd/system/openclaw-gateway.service`
  - Memory limit: 2G, CPU: 80%, Watchdog: 30s
  - Restart: always, RestartSec: 10s
  - Logs: `journalctl -u openclaw-gateway -f`
- Monitor Service: `/etc/systemd/system/openclaw-agent-monitor.service`
  - Memory limit: 512M, CPU: 20%
  - Restart: always, RestartSec: 5s
  - Logs: `journalctl -u openclaw-agent-monitor -f`
### 📊 Health Check Metrics
- Gateway service status (active/inactive)
- Agent monitor status (active/inactive)
- Disk usage (warning at 80%)
- Memory usage (warning at 80%)
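Disk and memory percentages can be read with GNU `df` and `/proc/meminfo`. The sketch below treats "warning at 80%" as "at or above 80%", which is an assumption, and computes memory use as `(MemTotal - MemAvailable) / MemTotal`, one common definition among several. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the disk/memory metrics above (Linux, GNU coreutils).

# disk_usage_pct [PATH]: integer percent used on the filesystem holding PATH.
disk_usage_pct() {
  df --output=pcent "${1:-/}" | tail -n 1 | tr -dc '0-9'
}

# mem_usage_pct: integer percent of RAM in use, based on MemAvailable.
mem_usage_pct() {
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
       END {printf "%d\n", (t - a) * 100 / t}' /proc/meminfo
}

# check_threshold NAME VALUE [LIMIT]: print a warning and return 1 if VALUE >= LIMIT (default 80).
check_threshold() {
  local name="$1" value="$2" limit="${3:-80}"
  if (( value >= limit )); then
    echo "WARNING: ${name} usage ${value}% (threshold ${limit}%)"
    return 1
  fi
}

# Example:
#   check_threshold disk "$(disk_usage_pct /)"
#   check_threshold memory "$(mem_usage_pct)"
```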
### 🎯 Next Steps (Future Enhancements)
- Add Prometheus/Grafana monitoring dashboard
- Implement log rotation and archival
- Add email notifications as backup channel
- Create web-based admin dashboard
- Add automated security scanning in CI/CD