diff --git a/docs/SERVER_MIGRATION_GUIDE.md b/docs/SERVER_MIGRATION_GUIDE.md
new file mode 100644
index 0000000..9fbc00d
--- /dev/null
+++ b/docs/SERVER_MIGRATION_GUIDE.md
@@ -0,0 +1,596 @@
# Server Migration Guide

> Last updated: 2026-03-26
> Covers: `~/.openclaw`, `~/.openclaw-tongge`, `~/.mem0`, Docker stack (`qdrant-master`, `dozzle`)

---

## Overview

This guide documents how to migrate the full OpenClaw stack (main gateway, Tongge sub-gateway, and Mem0 memory service) to a new Ubuntu VPS while preserving structure, configuration, and memories.

### Systemd layout (read this before Step 3)

On the reference deployment, **two gateways run as user systemd units** (root’s `~/.config/systemd/user/`), while the **agent monitor runs as a system unit**:

| Unit | Scope | Role |
|------|--------|------|
| `openclaw-gateway.service` | **user** (`systemctl --user`) | Main gateway (port 18789) |
| `openclaw-gateway-tongge.service` | **user** (`systemctl --user`) | Tongge gateway (port 18790) |
| `openclaw-agent-monitor.service` | **system** (`systemctl`) | Health monitor |

`/etc/systemd/system/openclaw-gateway.service` may exist as a legacy or alternate template; the **running** main gateway is typically the **user** unit above. Migrating only `/etc/systemd/system/` and running `systemctl start openclaw-gateway` **does not** restore the active user-managed gateways unless you intentionally switch to system units.

For user services to start at boot (no interactive login), **root linger** must be enabled once:

```bash
loginctl enable-linger root
```

### When to stop the two gateways on the **old** server

The guide assumes **two user-scoped gateways**: `openclaw-gateway` (main) and `openclaw-gateway-tongge` (Tongge).

| Strategy | When to stop them on the old box | Trade-off |
|----------|----------------------------------|-----------|
| **A — Default (recommended)** | Only **after** the new server is verified (Telegram, health, logs).
See **Stopping the Old Server After Migration** at the end of this doc. | Old and new may both run briefly during migration → **risk of duplicate Telegram bot replies** if both are online. | +| **B — Cutover window** | Stop them **immediately before** you run **Step 6** `enable --now` / `./deploy.sh start` on the **new** server. | Short downtime, minimizes duplicate bots. | + +**Old-server commands (equivalent):** + +```bash +# Option 1 — matches this repo’s deploy script (stops agents in agents.yaml + monitor) +cd /root/.openclaw/workspace && ./deploy.sh stop + +# Option 2 — gateways only (monitor keeps running unless you stop it separately) +systemctl --user stop openclaw-gateway openclaw-gateway-tongge +``` + +Use strategy **B** if duplicate bots are unacceptable; use **A** if you want a safe rollback window while testing the new machine. + +--- + +## Step 1 — Prepare the New Server + +Install Node.js v24 (must match current version: **v24.13.1**). + +**aaPanel method** (recommended — matches `/www/server/nodejs/v24.13.1/` path): +Install Node.js 24.x from the aaPanel software store. + +**Manual method:** +```bash +curl -fsSL https://deb.nodesource.com/setup_24.x | bash - +apt install -y nodejs +``` + +Install global npm packages (must match versions on old server): +```bash +npm install -g openclaw clawhub mcporter pnpm +# Optional (present on original server): +npm install -g @steipete/oracle@0.8.6 bun@1.3.9 +``` + +Verify: +```bash +/www/server/nodejs/v24.13.1/bin/npm list -g --depth=0 +``` + +--- + +## Step 2 — Copy the Three Data Directories + +Run on the **old server**. 
Replace `NEW_SERVER_IP`:

```bash
rsync -avz --progress -e 'ssh -p 3322' \
  /root/.openclaw \
  /root/.openclaw-tongge \
  /root/.mem0 \
  root@NEW_SERVER_IP:/root/
```

Alternative (if rsync unavailable):
```bash
tar czf - /root/.openclaw /root/.openclaw-tongge /root/.mem0 \
  | ssh -p 3322 root@NEW_SERVER_IP 'tar xzf - -C /'
```

---

## Step 3 — Copy systemd Service Files

Run on the **old server**. Copy **both** system units and **user** units (gateways).

### 3a — System units (agent monitor)

```bash
scp -P 3322 /etc/systemd/system/openclaw-agent-monitor.service \
  root@NEW_SERVER_IP:/etc/systemd/system/

# Optional: legacy/alternate main gateway system unit (if you use it instead of the user unit)
scp -P 3322 /etc/systemd/system/openclaw-gateway.service \
  root@NEW_SERVER_IP:/etc/systemd/system/
```

### 3b — User units (main + Tongge gateways) — **required** for the standard layout

On the **new** server, ensure the directory exists (scp does not create parent paths):

```bash
ssh -p 3322 root@NEW_SERVER_IP 'mkdir -p /root/.config/systemd/user'
```

Then from the **old** server:

```bash
scp -P 3322 /root/.config/systemd/user/openclaw-gateway.service \
  /root/.config/systemd/user/openclaw-gateway-tongge.service \
  root@NEW_SERVER_IP:/root/.config/systemd/user/
```

Canonical copies also live under the repo (if you need to recreate units without scp):

- `/root/.openclaw/workspace/systemd/openclaw-gateway-user.service` — reference for main gateway (compare with `~/.config/systemd/user/openclaw-gateway.service`)
- `/root/.openclaw/workspace/systemd/openclaw-gateway-tongge.service` — Tongge gateway

**`openclaw-gateway.service` (user)** — main gateway
- ExecStart: typically `node` running `openclaw/dist/index.js gateway --port 18789` (see the actual unit on disk)
- `EnvironmentFile=-/root/.openclaw/workspace/systemd/gateway.env`

**`openclaw-gateway-tongge.service` (user)** — Tongge gateway
- 
`WorkingDirectory=/root/.openclaw-tongge` +- `ExecStart`: `/www/server/nodejs/v24.13.1/bin/openclaw --profile tongge gateway` +- `EnvironmentFile=-/root/.openclaw/workspace/systemd/tongge-gateway.env` (path under **`~/.openclaw`**, migrated with the `.openclaw` tree) + +**`openclaw-agent-monitor.service` (system)** — agent health monitor +- `WorkingDirectory`: `/root/.openclaw/workspace` +- `ExecStart`: `/usr/bin/node /root/.openclaw/workspace/agent-monitor.js` +- `ReadWritePaths`: `/root/.openclaw/workspace/logs` +- `MemoryMax`: 512M, `CPUQuota`: 20% + +--- + +## Step 4 — Fix Node.js Paths on the New Server + +If the new server uses a different Node.js install path, update **all** units that reference `/www/server/nodejs/v24.13.1/`: + +```bash +# Check actual paths +which node +which openclaw + +# Edit if needed +nano /root/.config/systemd/user/openclaw-gateway.service +nano /root/.config/systemd/user/openclaw-gateway-tongge.service +nano /etc/systemd/system/openclaw-agent-monitor.service +# Optional system gateway template: +nano /etc/systemd/system/openclaw-gateway.service +``` + +Key fields to verify: +- `ExecStart=` — correct paths to `openclaw` and `node` binaries +- `Environment=PATH=` (if present) — must include the Node.js `bin/` directory + +Also update **`memory.qmd.command`** in `/root/.openclaw-tongge/openclaw.json` if the absolute path to `qmd` changes (see Step 4.5). 
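
The per-file edits above can also be applied as a single substitution. A minimal sketch, run here against a scratch copy rather than the live unit; `NEW_PREFIX` is a hypothetical install location — derive the real one from `which node` before touching anything under `~/.config/systemd/user/` or `/etc/systemd/system/`:

```bash
# Rewrite the Node prefix in a unit file (demonstrated on a scratch copy).
OLD_PREFIX=/www/server/nodejs/v24.13.1
NEW_PREFIX=/usr/local/nodejs/v24.13.1   # hypothetical; substitute the real new prefix
work=$(mktemp -d)
cat > "$work/openclaw-gateway-tongge.service" <<EOF
[Service]
WorkingDirectory=/root/.openclaw-tongge
ExecStart=$OLD_PREFIX/bin/openclaw --profile tongge gateway
EOF
# Swap every occurrence of the old prefix for the new one
sed -i "s|$OLD_PREFIX|$NEW_PREFIX|g" "$work/openclaw-gateway-tongge.service"
grep '^ExecStart=' "$work/openclaw-gateway-tongge.service"
```

Apply the same `sed` to each unit listed above, then run the `daemon-reload` commands from Step 6; the `|` delimiter avoids escaping the slashes inside the paths.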
+ +--- + +## Step 4.5 — 安装 QMD 内存后端(关键) + +> **为什么需要此步骤**:OpenClaw 使用 `qmd`(Quick Markdown Database)作为 Agent 工作区内存后端。 +> qmd 必须独立通过 npm 安装,**不依赖 openclaw 缓存,不依赖 bun**。 +> openclaw 内置缓存版本的 `better-sqlite3` 是用 bun runtime 编译的,与 node v24 ABI 不兼容,会导致 `bindings.node` 错误。 + +### 安装(一条命令) + +```bash +/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd +``` + +安装完成后 npm 会自动创建全局 symlink: +``` +/www/server/nodejs/v24.13.1/bin/qmd → ../lib/node_modules/@tobilu/qmd/bin/qmd +``` + +### 验证 + +```bash +# 检查 symlink 是否指向 npm 全局包(不是 cache 目录) +ls -la /www/server/nodejs/v24.13.1/bin/qmd + +# 验证 qmd 可正常运行 +/www/server/nodejs/v24.13.1/bin/qmd --help 2>&1 | head -3 + +# 验证 collection 命令可用(实际使用前) +/www/server/nodejs/v24.13.1/bin/qmd collection list +``` + +### 常见错误排查(安装与命令行阶段) + +此时尚未启动 Gateway,**没有** systemd/journal 中的网关日志;请以 **`qmd collection list`** 与 **`qmd --help`** 的终端输出为准。 + +错误信号及原因(命令行或后续网关日志中可能出现): +- `spawn .../bin/qmd ENOENT` — symlink 断开或 npm 包未安装,重新执行 `npm install -g @tobilu/qmd` +- `bindings.node` / `better-sqlite3` errors — 使用了 openclaw 缓存中 bun 编译的版本,需覆盖安装 npm 包 +- `Cannot find module '.../dist/qmd.js'` — dist 未构建,npm 包版本过旧 + +### openclaw 升级后 + +`openclaw update` **不会影响** npm 全局安装的 qmd(两者完全独立)。升级后无需重新处理 qmd。 + +若 openclaw 升级后出现 qmd 问题,检查 symlink 是否被 openclaw 覆盖: +```bash +ls -la /www/server/nodejs/v24.13.1/bin/qmd +# 应指向 ../lib/node_modules/@tobilu/qmd/bin/qmd +# 若指向 cache/ 目录,重新执行: +/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd +``` + +### 关键信息 + +| 项目 | 值 | +|------|-----| +| 安装命令 | `/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd` | +| npm 包名 | `@tobilu/qmd` | +| 当前版本 | `2.0.1` | +| Symlink 位置 | `/www/server/nodejs/v24.13.1/bin/qmd` | +| Symlink 目标 | `../lib/node_modules/@tobilu/qmd/bin/qmd` | +| 桐哥 gateway 配置 | `memory.qmd.command: "/www/server/nodejs/v24.13.1/bin/qmd"`(绝对路径) | +| 为何不用 openclaw cache 版本 | cache 版的 `better-sqlite3` 是 bun ABI 编译,与 node v24 不兼容 | +| openclaw 升级影响 | 无(npm 全局包独立于 openclaw cache) | + +--- + +## 
Attachment (Environment Dependencies)

This appendix collects the infrastructure/environment dependencies that must be confirmed after migration and before startup, so that bringing up the gateway does not immediately turn into debugging.

### A. OneAPI LLM gateway (infrastructure, not mem0)

OneAPI gateway deployment directory:
- `/root/.openclaw/workspace/infrastructure/oneapi/`

Key points:
- the container port binding in `docker-compose.yml` is `TAILSCALE_IP:3000:3000`
- in `/root/.openclaw/workspace/infrastructure/oneapi/.env`, only `TAILSCALE_IP` needs to be changed to the new machine's real value
- after startup, the OneAPI admin console should be reachable at `http://<TAILSCALE_IP>:3000` (default `root / 123456`)

Alignment check with OpenClaw (edit/verify both places):
- Main gateway: `LLM_BASE_URL` and `LLM_API_KEY` in `/root/.openclaw/.env` (plus the default provider reference in the corresponding `openclaw.json`)
- Tongge gateway: `LLM_BASE_URL` and `LLM_API_KEY` in `/root/.openclaw-tongge/.env`

When migrating, check whether `LLM_BASE_URL` should carry the `/v1` suffix: the client appends the chat-completions path by its own convention, and a base URL that already ends in `/v1` combined with a client that appends `/v1/chat/completions` produces `/v1/v1/...` requests that 404.

### B. Control UI access dependencies (Tailscale Serve + allowedOrigins)

A common post-migration failure: the Control UI will not open, usually because the new machine's Tailscale hostname/IP no longer matches the old configuration.

Verify both `openclaw.json` files:
- `gateway.controlUi.allowedOrigins`: add the new machine's corresponding HTTPS origin (including the port, if not 443)
- `gateway.trustedProxies`: add the new machine's Tailscale IP (or whatever proxy set matches your current access path)

Reference: `/root/.openclaw/workspace/docs/CONTROL_UI_ACCESS_AND_SECURITY.md`

### C. Telegram plugin dependencies (each gateway needs its own token)

`plugins.allow` in `openclaw.json` includes `telegram`; each token comes from its own env file:
- Main gateway env: `TELEGRAM_BOT_TOKEN` in `/root/.openclaw/workspace/systemd/gateway.env`
- Tongge env: `TELEGRAM_BOT_TOKEN` in `/root/.openclaw/workspace/systemd/tongge-gateway.env`

If the old and new servers are online at the same time during the cutover window, duplicate replies may occur; see the "Stopping the Old Server" strategy at the end of this document.

### D. Python3 dependency (mem0-integration)

In Tongge's `openclaw.json`, mem0-integration uses:
- `pythonPath: /usr/bin/python3`

So when migrating to a new machine, confirm that `/usr/bin/python3` exists and is executable.

### E. 
qmd command path consistency (avoid `ENOENT` / `better-sqlite3` `bindings.node` errors)

qmd is not installed together with openclaw; instead you must:
- install it under the node prefix: `npm install -g @tobilu/qmd`
- point `memory.qmd.command` in `openclaw-tongge/openclaw.json` at the absolute path of the actual `qmd` on the new machine

Also make sure the PATH/ExecStart prefix in the systemd units can find the matching qmd and its dependencies (especially the `better-sqlite3` ABI).

---

## Step 5 — Restore the Cron Jobs

Cron jobs for the Tongge gateway (daily fortune, active learning) are stored in `/root/.openclaw-tongge/cron/jobs.json` and managed by OpenClaw's built-in scheduler. They are migrated automatically as part of the `.openclaw-tongge/` directory copy — **no manual crontab setup required**.

Verify the jobs loaded after gateway start:
```bash
# Check ~30s after the gateway starts (the gateway writes nextRunAtMs into jobs.json)
cat /root/.openclaw-tongge/cron/jobs.json | python3 -m json.tool | grep -E "name|nextRunAt|enabled"
```

Expected output shows `tongge-daily-fortune` and `tongge-active-learning` with `nextRunAtMs` values set.

### mem0 cleanup — `/etc/cron.d/mem0-cleanup`

This file is **not** under `~/.mem0`; copy it from the old server.

From the **old** server:

```bash
scp -P 3322 /etc/cron.d/mem0-cleanup root@NEW_SERVER_IP:/etc/cron.d/mem0-cleanup
```

On the **new** server, ensure ownership and mode are valid for `cron.d` (typical: root, `0644`):

```bash
chown root:root /etc/cron.d/mem0-cleanup
chmod 0644 /etc/cron.d/mem0-cleanup
```

The job runs `memory_cleanup.py` under `/root/.openclaw/workspace/skills/mem0-integration/` and logs to `/root/.openclaw/workspace/logs/security/cleanup-cron.log` — those paths come over with the `.openclaw` rsync.

---

## Step 5.5 — `deploy.sh` and `agents.yaml` (how this repo manages gateways)

The production control plane lives at **`/root/.openclaw/workspace/deploy.sh`**. 
It reads **`/root/.openclaw/workspace/agents.yaml`** (via `scripts/parse_agents.py`) to know which gateways exist: + +- **main** — `local-cli`: start/stop uses `openclaw gateway start` / `gateway status` (paths are **hardcoded in `agents.yaml`**). +- **tongge** — `local-systemd`: unit `openclaw-gateway-tongge.service` installed from **`workspace/systemd/openclaw-gateway-tongge.service`**. + +| Command | What it does | +|---------|----------------| +| `./deploy.sh install` | `loginctl` linger, copies **`openclaw-gateway-user.service` → `~/.config/systemd/user/openclaw-gateway.service`**, installs tongge unit from template, installs **system** `openclaw-agent-monitor`, runs `fix-service`, **starts** all services. | +| `./deploy.sh start` / `stop` / `restart` | Start/stop/restart all agents from `agents.yaml` + monitor. | +| `./deploy.sh health` | Health check (user units + monitor + disk/memory/linger). | +| `./deploy.sh fix-service` | Re-inject `EnvironmentFile=` into units after OpenClaw UI upgrade (see script header). | + +**Migration implications:** + +1. **If you `scp`’d live units from the old server** — they may differ from templates. Either keep using **manual** `systemctl --user enable --now` (Step 6), **or** merge your path edits into `workspace/systemd/*.service` and **`agents.yaml`**, then run `./deploy.sh install` on a **clean** tree (backup first). **`install` overwrites** `~/.config/systemd/user/openclaw-gateway.service` from `openclaw-gateway-user.service`. +2. **Always update `agents.yaml`** after changing Node prefix: fields like `service.check_cmd` and `service.start_cmd` under **`main`** must point at the same `node`/`openclaw` paths as on disk (e.g. `/www/server/nodejs/v24.14.0/bin/openclaw`). Otherwise `./deploy.sh stop` / `health` / `start` will call the wrong binaries. +3. **Tongge** does not use `start_cmd` in yaml; it uses the **systemd unit** only — edit **`ExecStart`** in the installed user unit (or template) for new Node paths. 
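
After merging path edits, a quick way to catch stragglers is to grep every file that encodes the Node prefix. A hedged sketch, demonstrated on scratch copies; in real use point the target list at the user units, `/etc/systemd/system/openclaw-agent-monitor.service`, `agents.yaml`, and `/root/.openclaw-tongge/openclaw.json`:

```bash
# Audit which migrated files still mention the old Node prefix (scratch demo).
OLD_PREFIX=/www/server/nodejs/v24.13.1
work=$(mktemp -d)
printf 'ExecStart=%s/bin/openclaw --profile tongge gateway\n' "$OLD_PREFIX" \
  > "$work/openclaw-gateway-tongge.service"
printf 'check_cmd: %s/bin/openclaw gateway status\n' "$OLD_PREFIX" \
  > "$work/agents.yaml"
TARGETS="$work/openclaw-gateway-tongge.service $work/agents.yaml"
# Each file listed still needs a path update
grep -l "$OLD_PREFIX" $TARGETS
```

An empty result means every target has been rewritten; any listed file still points at the old install.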
+ +**Suggested workflow on the new server:** finish Steps 1–5 and the **pre-flight checklist** below; then either **Step 6** (explicit `systemctl`) **or** `cd /root/.openclaw/workspace && ./deploy.sh start` (after units exist and `agents.yaml` paths match). For a full reinstall of units from repo templates, use `./deploy.sh install` **only** when you intend to replace user units with templates. + +--- + +## Step 5.9 — Pre-flight checklist (before starting gateways) + +Complete **before** `systemctl --user enable --now …` or `./deploy.sh start` / `install` to reduce boot-loop / missing-binary errors: + +**Environment & secrets** + +- [ ] `/root/.openclaw/workspace/systemd/gateway.env` and `tongge-gateway.env` exist (came with rsync). +- [ ] `/root/.openclaw/.env` and `/root/.openclaw-tongge/.env` present if your setup expects them. + +**Node / OpenClaw / qmd paths (must be consistent)** + +- [ ] `which node`, `which openclaw`, `which qmd` on the new server — **one** prefix (e.g. all under `/www/server/nodejs/v24.x.x/`). +- [ ] User systemd units: `ExecStart` paths updated (Step 4). +- [ ] `/root/.openclaw-tongge/openclaw.json` → `memory.qmd.command` matches the real `qmd` binary (Step 4.5). +- [ ] If using **`deploy.sh`**: `/root/.openclaw/workspace/agents.yaml` → **`main.service.check_cmd` / `main.service.start_cmd`** updated to the same `openclaw` path as above. + +**Network & dependencies** + +- [ ] If the new host has a **new Tailscale IP** or LLM moved: edit both **`openclaw.json`** files (Step 6.5). +- [ ] If agents use Mem0 + local Qdrant: Docker stack in `/opt/mem0-center/` is **up** and `localhost:6333` reachable **before** expecting mem0 in gateway (order: Docker section can run **before** Step 6 if needed). +- [ ] `python3` available for `parse_agents.py`, cron `memory_cleanup.py`, and optional `json.tool` checks. + +**systemd user session** + +- [ ] `loginctl enable-linger root` will be run (Step 6) so user gateways survive reboot. 
- [ ] `/root/.config/systemd/user/` contains the two gateway units (or you will run `./deploy.sh install` to generate them).

**Optional: old server**

- [ ] If using **cutover strategy** (see Overview): old server gateways stopped **now**, then proceed to Step 6 on the new server immediately.

### Pre-run confirmation (environment dependencies and config alignment)

Before actually executing Step 6 (enable/start), do one more cross-component alignment pass to confirm that the key dependencies from the Attachment (Environment Dependencies) are in place:

- OneAPI: the `openclaw-llm-gateway` container is up and the admin console is reachable at `http://<TAILSCALE_IP>:3000`; the main/Tongge `LLM_BASE_URL` and `LLM_API_KEY` match the key created in the OneAPI console
- Telegram: `TELEGRAM_BOT_TOKEN` is present in both the main gateway env (`workspace/systemd/gateway.env`) and the Tongge env (`workspace/systemd/tongge-gateway.env`)
- Control UI: if the new machine's Tailscale IP/hostname changed, both `openclaw.json` files have updated `gateway.controlUi.allowedOrigins` and `gateway.trustedProxies`
- Python3: `/usr/bin/python3` exists and is executable (required by mem0-integration)
- qmd: `memory.qmd.command` in `openclaw-tongge/openclaw.json` points at the absolute path of the actual `qmd` on the new machine

---

## Step 6 — Enable and Start Services

Complete **Step 5.9** first. Below is the **manual** `systemctl` path; the **`./deploy.sh`** equivalent: after units exist and **`agents.yaml`** paths match, run `cd /root/.openclaw/workspace && ./deploy.sh start` (or `./deploy.sh install` only when intentionally (re)installing units from templates — see Step 5.5). 

```bash
# User linger (once per machine): required for user-scoped gateways at boot
loginctl enable-linger root

systemctl daemon-reload
systemctl enable --now openclaw-agent-monitor

systemctl --user daemon-reload
systemctl --user enable --now openclaw-gateway openclaw-gateway-tongge
```

Verify:

```bash
systemctl status openclaw-agent-monitor
systemctl --user status openclaw-gateway
systemctl --user status openclaw-gateway-tongge

# Logs (pick one)
journalctl --user -u openclaw-gateway -f
journalctl --user -u openclaw-gateway-tongge -f
```

**qmd-related (only after the gateway has started)**: if the Step 4.5 command-line verification passed but you still suspect qmd problems at runtime, filter the journal (no output means no recent matching lines, which is normal):

```bash
journalctl --user -u openclaw-gateway -n 80 | grep -i qmd
journalctl --user -u openclaw-gateway-tongge -n 80 | grep -i qmd
# If you still use the system-level openclaw-gateway:
journalctl -u openclaw-gateway -n 80 | grep -i qmd
```

If you intentionally use **system** `openclaw-gateway.service` instead of the user unit, enable that unit and **avoid** running two main gateways at once.

---

## Step 6.5 — Update IPs and bind lists (if the new host differs)

`openclaw.json` in both **`~/.openclaw`** and **`~/.openclaw-tongge`** may hardcode the previous machine’s Tailscale IP (e.g. `100.115.94.1`) under:

- `models.providers.*.baseUrl` (e.g. local LLM / gateway at `:3000`)
- `gateway.bind` / allowlists (e.g. `:18789`, `:18790`)

If the new VPS gets a **different Tailscale address** or you change where the LLM API runs, search and update those URLs consistently in **both** config files. Skip this if the new server reuses the same Tailscale IP and service topology.

**Mem0 skill** (`/root/.openclaw/workspace/skills/mem0-integration/config.yaml`) uses `localhost:6333` for Qdrant; keep Qdrant on the same host as OpenClaw or change `host`/`port` to match your Docker/Qdrant deployment.

---

## Step 7 — Verify Telegram Bots

Both bots should respond within a minute of the gateways starting. 
Bot tokens are stored in `.env` files and systemd `EnvironmentFile`s — no changes needed when paths stay `/root/...`. Check: + +- Main gateway bot: `/root/.openclaw/.env` and `/root/.openclaw/workspace/systemd/gateway.env` +- Tongge bot: `/root/.openclaw-tongge/.env` and `/root/.openclaw/workspace/systemd/tongge-gateway.env` + +--- + +## What Is Portable (No Changes Needed) + +| Item | Location | +|------|----------| +| Main gateway config & agents | `/root/.openclaw/openclaw.json` | +| Tongge gateway config & agents | `/root/.openclaw-tongge/openclaw.json` | +| Main gateway secrets (systemd) | `/root/.openclaw/workspace/systemd/gateway.env` | +| Tongge gateway secrets (systemd) | `/root/.openclaw/workspace/systemd/tongge-gateway.env` | +| Bot tokens in tree | `/root/.openclaw/.env`, `/root/.openclaw-tongge/.env` | +| Mem0 SQLite memory history | `/root/.mem0/history.db` | +| Mem0 user identity | `/root/.mem0/config.json` | +| Qdrant migration metadata (if present) | `/root/.mem0/migrations_qdrant/` | +| Workspace scripts, skills, agents | `/root/.openclaw/workspace/` | +| Tongge workspace | `/root/.openclaw-tongge/workspace/` | +| Credentials & delivery queue | `credentials/`, `delivery-queue/` | + +--- + +## What May Need Updating + +| Item | Action | +|------|--------| +| Node.js binary paths in **user + system** `.service` files | Update if install path differs from `/www/server/nodejs/v24.13.1/` | +| `openclaw-gateway.service` (user or system) | Re-apply custom `EnvironmentFile` / `ExecStart` after some `openclaw update` flows; `gateway.env` survives if referenced | +| Tailscale IPs / LLM base URLs in `openclaw.json` | See Step 6.5 when the host or upstream API address changes | +| Qdrant vector store (if used) | See Docker stack section below | +| `node_modules` in `~/.openclaw-tongge/` | Re-run `npm install` in that directory if any native modules break | + +--- + +## Docker Stack Migration (Qdrant + Dozzle) + +Install Docker Engine and the Compose plugin on 
the new server before bringing the stack up (`docker compose` v2). Both services are managed by `/opt/mem0-center/docker-compose.yml`.

### Step A — Copy the compose stack

```bash
rsync -avz --progress -e 'ssh -p 3322' /opt/mem0-center/ root@NEW_SERVER_IP:/opt/mem0-center/
```

This includes `docker-compose.yml`, `qdrant_storage/` (vector data), and `snapshots/`.

### Step B — Update Dozzle port binding on the new server

Dozzle **must** bind to the new server's Tailscale IP. Edit the compose file after copying:

```bash
# Get new server's Tailscale IP
tailscale ip -4

# Update the port binding (replace the old IP with the value above)
nano /opt/mem0-center/docker-compose.yml
# Change: "100.115.94.1:9999:8080"
# To:     "<NEW_TAILSCALE_IP>:9999:8080"
```

> Binding to `127.0.0.1` breaks remote access. Binding to `0.0.0.0` exposes the port publicly.
> Always bind to the Tailscale IP specifically. See `DOZZLE_LOG_OBSERVABILITY.md` for details.

### Step C — Start the stack

```bash
cd /opt/mem0-center
docker compose up -d
```

Verify both containers are healthy:
```bash
docker ps --format "table {{.Names}}\t{{.Status}}"
# Expected: qdrant-master   Up ... (healthy)
#           dozzle          Up ... (healthy)
```

Access Dozzle at `http://<TAILSCALE_IP>:9999`.

### Healthcheck note

Both containers use non-standard healthchecks (no `wget`/`curl` in images):
- **Dozzle**: `["CMD", "/dozzle", "healthcheck"]` — built-in binary, no shell needed
- **Qdrant**: `CMD-SHELL` with bash `/dev/tcp` TCP probe

If you upgrade either image and healthcheck breaks, refer to `DOZZLE_LOG_OBSERVABILITY.md §4`. 
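
The Step B edit can be scripted instead of done by hand. A sketch on a scratch copy of the compose file; `NEW_IP` is a placeholder — in real use take it from `tailscale ip -4` and run the `sed` against `/opt/mem0-center/docker-compose.yml`:

```bash
# Rewrite the Dozzle bind address (demonstrated on a scratch copy of the compose file).
NEW_IP=100.101.102.103   # placeholder; in real use: NEW_IP=$(tailscale ip -4)
work=$(mktemp -d)
cat > "$work/docker-compose.yml" <<'EOF'
services:
  dozzle:
    ports:
      - "100.115.94.1:9999:8080"
EOF
# Substitute the old literal IP, keeping the host:container port mapping intact
sed -i "s|100\.115\.94\.1:9999:8080|${NEW_IP}:9999:8080|" "$work/docker-compose.yml"
grep '9999:8080' "$work/docker-compose.yml"
```

Follow with `docker compose up -d` so the new binding takes effect.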
+ +--- + +## Post-Migration Checklist + +- [ ] **Step 5.9 pre-flight** completed (paths, env files, optional Docker Qdrant, `agents.yaml` if using `deploy.sh`) +- [ ] Node.js v24 installed and at correct path +- [ ] Global npm packages installed (`openclaw`, `clawhub`, `mcporter`, `pnpm` — match Step 1) +- [ ] `~/.openclaw` copied and permissions intact (`chmod 700`) +- [ ] `~/.openclaw-tongge` copied and permissions intact (`chmod 700`) +- [ ] `~/.mem0` copied (including `history.db` and `migrations_qdrant/` if present) +- [ ] **`agents.yaml`** `main.service.*` paths match real `openclaw` if using `./deploy.sh` (Step 5.5) +- [ ] Systemd: `openclaw-agent-monitor.service` + user units `openclaw-gateway.service` and `openclaw-gateway-tongge.service` installed; paths verified +- [ ] `loginctl enable-linger root` (for user-scoped gateways at boot) +- [ ] `systemctl daemon-reload` and `systemctl --user daemon-reload` run +- [ ] Gateways + monitor started (**Step 6** manual **or** `./deploy.sh start` / `install` per Step 5.5) +- [ ] **QMD 内存后端已安装并验证**(Step 4.5): + - [ ] 执行 `/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd` + - [ ] `ls -la /www/server/nodejs/v24.13.1/bin/qmd` — 指向 `../lib/node_modules/@tobilu/qmd/bin/qmd`(非 cache 目录) + - [ ] `/www/server/nodejs/v24.13.1/bin/qmd collection list` — 无报错 + - [ ] 网关日志中无 `qmd ENOENT` 或 `better-sqlite3` 错误 +- [ ] Tongge cron jobs loaded (check `nextRunAtMs` in `/root/.openclaw-tongge/cron/jobs.json`) +- [ ] mem0 cleanup cron restored (`/etc/cron.d/mem0-cleanup`) +- [ ] Telegram bots responding +- [ ] `/opt/mem0-center/` copied to new server +- [ ] Dozzle port binding updated to new server's Tailscale IP +- [ ] `docker compose up -d` run in `/opt/mem0-center/` +- [ ] `qdrant-master` and `dozzle` both show `(healthy)` +- [ ] Dozzle accessible at `http://:9999` +- [ ] Old server services stopped (to avoid duplicate bot responses) + +--- + +## Stopping the Old Server After Migration + +**When:** After the **new** server is 
confirmed working (strategy **A** in Overview). If you already stopped gateways on the old box before Step 6 (strategy **B**), only disable what is still running and skip duplicate `stop` commands. + +Stop the old services to avoid duplicate Telegram bot responses: + +```bash +# On the OLD server — preferred: deploy script stops agents.yaml + monitor +cd /root/.openclaw/workspace && ./deploy.sh stop + +# Or stop user units + monitor manually: +systemctl --user stop openclaw-gateway openclaw-gateway-tongge +systemctl --user disable openclaw-gateway openclaw-gateway-tongge + +systemctl stop openclaw-agent-monitor +systemctl disable openclaw-agent-monitor + +# If you still had a system-level main gateway enabled: +systemctl stop openclaw-gateway 2>/dev/null || true +systemctl disable openclaw-gateway 2>/dev/null || true +```
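
Once everything is stopped, a quick way to confirm nothing is still listening is bash's `/dev/tcp` probe (the same mechanism the Qdrant healthcheck uses); "closed" for both gateway ports is the desired end state:

```bash
# bash-only: /dev/tcp is a bash redirection feature, not a real device
for port in 18789 18790; do
  if (echo >/dev/tcp/127.0.0.1/$port) 2>/dev/null; then
    echo "port $port still open"
  else
    echo "port $port closed"
  fi
done
```

If a port reports open, find the leftover process with `ss -tlnp` and stop the unit that owns it.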