# Server Migration Guide

> Last updated: 2026-03-26
> Covers: `~/.openclaw`, `~/.openclaw-tongge`, `~/.mem0`, Docker stack (`qdrant-master`, `dozzle`)

---
## Overview

This guide documents how to migrate the full OpenClaw stack (main gateway, Tongge sub-gateway, and Mem0 memory service) to a new Ubuntu VPS while preserving structure, configuration, and memories.

### Systemd layout (read this before Step 3)

On the reference deployment, **two gateways run as user systemd units** (root's `~/.config/systemd/user/`), while the **agent monitor runs as a system unit**:

| Unit | Scope | Role |
|------|--------|------|
| `openclaw-gateway.service` | **user** (`systemctl --user`) | Main gateway (port 18789) |
| `openclaw-gateway-tongge.service` | **user** (`systemctl --user`) | Tongge gateway (port 18790) |
| `openclaw-agent-monitor.service` | **system** (`systemctl`) | Health monitor |

`/etc/systemd/system/openclaw-gateway.service` may exist as a legacy or alternate template; the **running** main gateway is typically the **user** unit above. Migrating only `/etc/systemd/system/` and running `systemctl start openclaw-gateway` **does not** restore the active user-managed gateways unless you intentionally switch to system units.

For user services to start at boot (with no interactive login), **root linger** must be enabled once:

```bash
loginctl enable-linger root
```
### When to stop the two gateways on the **old** server

The guide assumes **two user-scoped gateways**: `openclaw-gateway` (main) and `openclaw-gateway-tongge` (Tongge).

| Strategy | When to stop them on the old box | Trade-off |
|----------|----------------------------------|-----------|
| **A — Default (recommended)** | Only **after** the new server is verified (Telegram, health, logs). See **Stopping the Old Server After Migration** at the end of this doc. | Old and new may both run briefly during migration → **risk of duplicate Telegram bot replies** if both are online. |
| **B — Cutover window** | Stop them **immediately before** you run **Step 6** `enable --now` / `./deploy.sh start` on the **new** server. | Short downtime; minimizes duplicate bots. |

**Old-server commands (equivalent):**

```bash
# Option 1 — matches this repo's deploy script (stops agents in agents.yaml + monitor)
cd /root/.openclaw/workspace && ./deploy.sh stop

# Option 2 — gateways only (monitor keeps running unless you stop it separately)
systemctl --user stop openclaw-gateway openclaw-gateway-tongge
```

Use strategy **B** if duplicate bots are unacceptable; use **A** if you want a safe rollback window while testing the new machine.

---
## Step 1 — Prepare the New Server

Install Node.js v24 (must match the current version: **v24.13.1**).

**aaPanel method** (recommended — matches the `/www/server/nodejs/v24.13.1/` path):
Install Node.js 24.x from the aaPanel software store.

**Manual method:**

```bash
curl -fsSL https://deb.nodesource.com/setup_24.x | bash -
apt install -y nodejs
```

Install the global npm packages (versions must match the old server):

```bash
npm install -g openclaw clawhub mcporter pnpm
# Optional (present on the original server):
npm install -g @steipete/oracle@0.8.6 bun@1.3.9
```

Verify:

```bash
/www/server/nodejs/v24.13.1/bin/npm list -g --depth=0
```

---
## Step 2 — Copy the Three Data Directories

Run on the **old server**. Replace `NEW_SERVER_IP`:

```bash
rsync -avz --progress -e 'ssh -p 3322' \
  /root/.openclaw \
  /root/.openclaw-tongge \
  /root/.mem0 \
  root@NEW_SERVER_IP:/root/
```

Alternative (if rsync is unavailable; note the same ssh port as above):

```bash
tar czf - /root/.openclaw /root/.openclaw-tongge /root/.mem0 \
  | ssh -p 3322 root@NEW_SERVER_IP 'tar xzf - -C /'
```
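Before moving on, it can help to sanity-check that the copy is complete. A minimal sketch, assuming GNU coreutils on both hosts: run the same report on the old and new server and compare the output (`dir_bytes` and `report` are hypothetical helpers, not part of the repo).

```bash
# Sketch: print "<dir><TAB><total bytes>" for each migrated tree, so the
# same command can be run on old and new servers and the outputs diffed.
dir_bytes() {                 # apparent size of a directory, in bytes (GNU du)
  du -sb "$1" | awk '{print $1}'
}

report() {                    # one summary line per directory argument
  for d in "$@"; do
    printf '%s\t%s\n' "$d" "$(dir_bytes "$d")"
  done
}

# On each server:
#   report /root/.openclaw /root/.openclaw-tongge /root/.mem0
```

For a stricter check, `rsync -avzc --dry-run` over the same paths reports any file whose checksum differs.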
---
## Step 3 — Copy systemd Service Files

Run on the **old server**. Copy **both** the system units and the **user** units (gateways).

### 3a — System units (agent monitor)

```bash
scp -P 3322 /etc/systemd/system/openclaw-agent-monitor.service \
  root@NEW_SERVER_IP:/etc/systemd/system/

# Optional: legacy/alternate main gateway system unit (if you use it instead of the user unit)
scp -P 3322 /etc/systemd/system/openclaw-gateway.service \
  root@NEW_SERVER_IP:/etc/systemd/system/
```

### 3b — User units (main + Tongge gateways) — **required** for the standard layout

On the **new** server, ensure the directory exists (scp does not create parent paths):

```bash
ssh -p 3322 root@NEW_SERVER_IP 'mkdir -p /root/.config/systemd/user'
```

Then from the **old** server:

```bash
scp -P 3322 /root/.config/systemd/user/openclaw-gateway.service \
  /root/.config/systemd/user/openclaw-gateway-tongge.service \
  root@NEW_SERVER_IP:/root/.config/systemd/user/
```

Canonical copies also live under the repo (if you need to recreate the units without scp):

- `/root/.openclaw/workspace/systemd/openclaw-gateway-user.service` — reference for the main gateway (compare with `~/.config/systemd/user/openclaw-gateway.service`)
- `/root/.openclaw/workspace/systemd/openclaw-gateway-tongge.service` — Tongge gateway
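Before editing anything in Step 4, it can be worth seeing how far the live units drifted from the repo templates. A small sketch (`unit_drift` is a hypothetical helper, not part of the repo):

```bash
# Sketch: show the diff between a repo template and the installed unit;
# prints "identical" when there is no drift (diff exits non-zero on drift).
unit_drift() {      # usage: unit_drift <template> <installed-unit>
  diff -u "$1" "$2" && echo "identical"
}

# unit_drift /root/.openclaw/workspace/systemd/openclaw-gateway-user.service \
#            /root/.config/systemd/user/openclaw-gateway.service
# unit_drift /root/.openclaw/workspace/systemd/openclaw-gateway-tongge.service \
#            /root/.config/systemd/user/openclaw-gateway-tongge.service
```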
**`openclaw-gateway.service` (user)** — main gateway

- `ExecStart`: typically `node` + `openclaw/dist/index.js` `gateway --port 18789` (see the actual unit on disk)
- `EnvironmentFile=-/root/.openclaw/workspace/systemd/gateway.env`

**`openclaw-gateway-tongge.service` (user)** — Tongge gateway

- `WorkingDirectory=/root/.openclaw-tongge`
- `ExecStart`: `/www/server/nodejs/v24.13.1/bin/openclaw --profile tongge gateway`
- `EnvironmentFile=-/root/.openclaw/workspace/systemd/tongge-gateway.env` (a path under **`~/.openclaw`**, migrated with the `.openclaw` tree)

**`openclaw-agent-monitor.service` (system)** — agent health monitor

- `WorkingDirectory`: `/root/.openclaw/workspace`
- `ExecStart`: `/usr/bin/node /root/.openclaw/workspace/agent-monitor.js`
- `ReadWritePaths`: `/root/.openclaw/workspace/logs`
- `MemoryMax`: 512M, `CPUQuota`: 20%

---
## Step 4 — Fix Node.js Paths on the New Server

If the new server uses a different Node.js install path, update **all** units that reference `/www/server/nodejs/v24.13.1/`:

```bash
# Check the actual paths
which node
which openclaw

# Edit if needed
nano /root/.config/systemd/user/openclaw-gateway.service
nano /root/.config/systemd/user/openclaw-gateway-tongge.service
nano /etc/systemd/system/openclaw-agent-monitor.service
# Optional system gateway template:
nano /etc/systemd/system/openclaw-gateway.service
```

Key fields to verify:

- `ExecStart=` — correct paths to the `openclaw` and `node` binaries
- `Environment=PATH=` (if present) — must include the Node.js `bin/` directory

Also update **`memory.qmd.command`** in `/root/.openclaw-tongge/openclaw.json` if the absolute path to `qmd` changes (see Step 4.5).
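The edits above can also be scripted. A hedged sketch — `rewrite_prefix` is a hypothetical helper and the example prefixes are assumptions; review the `.bak` diff before reloading units:

```bash
# Sketch: replace every occurrence of the old Node prefix with the new one
# inside a unit file, keeping a .bak copy for rollback (GNU sed).
rewrite_prefix() {   # usage: rewrite_prefix <old-prefix> <new-prefix> <file>
  sed -i.bak "s|$1|$2|g" "$3"
}

# Example (substitute your real paths):
# rewrite_prefix /www/server/nodejs/v24.13.1 /usr/local/node-v24 \
#   /root/.config/systemd/user/openclaw-gateway.service
```

Using `|` as the sed delimiter avoids escaping the slashes in the paths.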
---
## Step 4.5 — Install the QMD Memory Backend (critical)

> **Why this step:** OpenClaw uses `qmd` (Quick Markdown Database) as the agent workspace memory backend.
> qmd must be installed independently via npm; it must not come from the openclaw cache and does not depend on bun.
> The `better-sqlite3` bundled in openclaw's cache was compiled with the bun runtime; it is ABI-incompatible with Node v24 and causes `bindings.node` errors.

### Install (one command)

```bash
/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd
```

After installation, npm automatically creates the global symlink:

```
/www/server/nodejs/v24.13.1/bin/qmd → ../lib/node_modules/@tobilu/qmd/bin/qmd
```

### Verify

```bash
# Check that the symlink points at the global npm package (not a cache directory)
ls -la /www/server/nodejs/v24.13.1/bin/qmd

# Check that qmd runs
/www/server/nodejs/v24.13.1/bin/qmd --help 2>&1 | head -3

# Check the collection command works (before real use)
/www/server/nodejs/v24.13.1/bin/qmd collection list
```

### Common errors (install / command-line stage)

At this stage the gateway has not started yet, so there are **no** gateway logs in systemd/journal; go by the terminal output of **`qmd collection list`** and **`qmd --help`**.

Error signatures and causes (may appear on the command line or later in gateway logs):

- `spawn .../bin/qmd ENOENT` — broken symlink or the npm package is not installed; re-run `npm install -g @tobilu/qmd`
- `bindings.node` / `better-sqlite3` errors — a bun-compiled copy from the openclaw cache is being used; reinstall the npm package over it
- `Cannot find module '.../dist/qmd.js'` — dist was not built; the npm package version is too old

### After an openclaw upgrade

`openclaw update` **does not affect** the globally npm-installed qmd (the two are fully independent). No qmd work is needed after an upgrade.

If qmd problems do appear after an openclaw upgrade, check whether openclaw overwrote the symlink:

```bash
ls -la /www/server/nodejs/v24.13.1/bin/qmd
# Should point at ../lib/node_modules/@tobilu/qmd/bin/qmd
# If it points at a cache/ directory, re-run:
/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd
```
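The symlink check above can be made mechanical. A sketch (`qmd_is_npm_copy` is a hypothetical helper; it only inspects where the given path resolves):

```bash
# Sketch: succeed only if the given qmd resolves into the npm global
# package tree rather than an openclaw cache directory.
qmd_is_npm_copy() {   # usage: qmd_is_npm_copy <path-to-qmd>
  case "$(readlink -f "$1")" in
    */node_modules/@tobilu/qmd/*) echo "ok: npm copy" ;;
    *) echo "suspect target: $(readlink -f "$1")"; return 1 ;;
  esac
}

# qmd_is_npm_copy /www/server/nodejs/v24.13.1/bin/qmd
```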
### Key facts

| Item | Value |
|------|-----|
| Install command | `/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd` |
| npm package name | `@tobilu/qmd` |
| Current version | `2.0.1` |
| Symlink location | `/www/server/nodejs/v24.13.1/bin/qmd` |
| Symlink target | `../lib/node_modules/@tobilu/qmd/bin/qmd` |
| Tongge gateway config | `memory.qmd.command: "/www/server/nodejs/v24.13.1/bin/qmd"` (absolute path) |
| Why not the openclaw cache copy | the cached `better-sqlite3` is compiled for the bun ABI, incompatible with Node v24 |
| Effect of openclaw upgrades | none (the global npm package is independent of the openclaw cache) |

---
## Attachment — Environment Dependencies

This appendix collects the infrastructure and environment dependencies that must be confirmed after migrating and before starting anything, so that you are not debugging the moment a gateway starts.

### A. OneAPI LLM gateway (infrastructure, not mem0)

OneAPI gateway deployment directory:

- `/root/.openclaw/workspace/infrastructure/oneapi/`

Key points:

- In `docker-compose.yml` the container port is bound as `TAILSCALE_IP:3000:3000`
- In `/root/.openclaw/workspace/infrastructure/oneapi/.env`, only `TAILSCALE_IP` needs to be changed to the new machine's real value
- After startup, the OneAPI admin console should be reachable at `http://<TAILSCALE_IP>:3000` (default `root / 123456`)

Alignment checks against OpenClaw (change/verify both):

- Main gateway: `LLM_BASE_URL` and `LLM_API_KEY` in `/root/.openclaw/.env` (plus the default provider references in the corresponding `openclaw.json`)
- Tongge gateway: `LLM_BASE_URL` and `LLM_API_KEY` in `/root/.openclaw-tongge/.env`

Mind the `/v1` convention when migrating: if `LLM_BASE_URL` already ends in `/v1`, the client appends only `/chat/completions`; if it does not, the client appends `/v1/chat/completions`. Keep the URL consistent with whichever convention your client follows, or requests will 404 on a doubled `/v1/v1/...` path.
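The `/v1` joining rule above can be sketched as a tiny helper (hypothetical, for illustration only; it mirrors the common OpenAI-compatible convention):

```bash
# Sketch: join LLM_BASE_URL with the chat-completions path without
# producing a doubled /v1/v1/... segment.
chat_url() {   # usage: chat_url <base-url>
  case "${1%/}" in                       # tolerate a trailing slash
    */v1) echo "${1%/}/chat/completions" ;;
    *)    echo "${1%/}/v1/chat/completions" ;;
  esac
}

# chat_url http://100.115.94.1:3000      -> http://100.115.94.1:3000/v1/chat/completions
# chat_url http://100.115.94.1:3000/v1   -> http://100.115.94.1:3000/v1/chat/completions
```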
### B. Control UI access dependencies (Tailscale Serve + allowedOrigins)

A common migration failure: the Control UI will not open, usually because the new machine's Tailscale hostname/IP does not match the old configuration.

Check both `openclaw.json` files:

- `gateway.controlUi.allowedOrigins`: add the new machine's HTTPS origin (including the port, if not 443)
- `gateway.trustedProxies`: add the new machine's Tailscale IP (or whatever proxy set matches your current access path)

For reference, see `/root/.openclaw/workspace/docs/CONTROL_UI_ACCESS_AND_SECURITY.md`

### C. Telegram plugin dependencies (each gateway needs its own token)

`plugins.allow` in `openclaw.json` includes `telegram`; each gateway's token comes from its own env file:

- Main gateway env: `TELEGRAM_BOT_TOKEN` in `/root/.openclaw/workspace/systemd/gateway.env`
- Tongge env: `TELEGRAM_BOT_TOKEN` in `/root/.openclaw/workspace/systemd/tongge-gateway.env`

If the old and new servers are both online during the cutover window, duplicate replies are possible; see the "Stopping the Old Server" strategy at the end of this doc.

### D. Python3 dependency (mem0-integration)

Tongge's `openclaw.json` configures mem0-integration with:

- `pythonPath: /usr/bin/python3`

So when migrating, confirm that `/usr/bin/python3` exists and is executable on the new machine.

### E. qmd command path consistency (avoid ENOENT / `better-sqlite3` `bindings.node` errors)

qmd is not installed together with openclaw; you must:

- Install it under the Node prefix: `npm install -g @tobilu/qmd`
- Point `memory.qmd.command` in `openclaw-tongge/openclaw.json` at the absolute path of the real `qmd` on the new machine

Also make sure the systemd units' PATH/ExecStart prefix can find the matching qmd and its dependencies (especially the `better-sqlite3` ABI).

---
## Step 5 — Restore the Cron Jobs |
||||||
|
|
||||||
|
Cron jobs for the tongge gateway (daily fortune, active learning) are stored in `/root/.openclaw-tongge/cron/jobs.json` and managed by OpenClaw's built-in scheduler. They are migrated automatically as part of the `.openclaw-tongge/` directory copy — **no manual crontab setup required**. |
||||||
|
|
||||||
|
Verify the jobs loaded after gateway start: |
||||||
|
```bash |
||||||
|
# Gateway 启动后约 30s 检查(jobs.json 会被 Gateway 写入 nextRunAtMs) |
||||||
|
cat /root/.openclaw-tongge/cron/jobs.json | python3 -m json.tool | grep -E "name|nextRunAt|enabled" |
||||||
|
``` |
||||||
|
|
||||||
|
Expected output shows `tongge-daily-fortune` and `tongge-active-learning` with `nextRunAtMs` values set. |
||||||
|
|
||||||
|
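For a pass/fail version of the check above, a sketch — it assumes `jobs.json` holds the jobs either as a top-level array or under a `jobs` key, with the `name`/`enabled`/`nextRunAtMs` fields shown above; verify against your actual file:

```bash
# Sketch: assert both expected Tongge jobs are enabled and scheduled.
python3 - /root/.openclaw-tongge/cron/jobs.json <<'PY'
import json, sys

data = json.load(open(sys.argv[1]))
items = data["jobs"] if isinstance(data, dict) else data   # tolerate either shape
jobs = {j["name"]: j for j in items}
for name in ("tongge-daily-fortune", "tongge-active-learning"):
    job = jobs[name]                      # KeyError -> job did not migrate
    assert job.get("enabled", True), f"{name} is disabled"
    assert job.get("nextRunAtMs"), f"{name} has no nextRunAtMs"
print("cron jobs OK")
PY
```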
### mem0 cleanup — `/etc/cron.d/mem0-cleanup`

This file is **not** under `~/.mem0`; copy it from the old server.

From the **old** server:

```bash
scp -P 3322 /etc/cron.d/mem0-cleanup root@NEW_SERVER_IP:/etc/cron.d/mem0-cleanup
```

On the **new** server, ensure ownership and mode are valid for `cron.d` (typical: root, `0644`):

```bash
chown root:root /etc/cron.d/mem0-cleanup
chmod 0644 /etc/cron.d/mem0-cleanup
```

The job runs `memory_cleanup.py` under `/root/.openclaw/workspace/skills/mem0-integration/` and logs to `/root/.openclaw/workspace/logs/security/cleanup-cron.log` — those paths come over with the `.openclaw` rsync.

---
## Step 5.5 — `deploy.sh` and `agents.yaml` (how this repo manages gateways)

The production control plane lives at **`/root/.openclaw/workspace/deploy.sh`**. It reads **`/root/.openclaw/workspace/agents.yaml`** (via `scripts/parse_agents.py`) to know which gateways exist:

- **main** — `local-cli`: start/stop uses `openclaw gateway start` / `gateway status` (paths are **hardcoded in `agents.yaml`**).
- **tongge** — `local-systemd`: unit `openclaw-gateway-tongge.service`, installed from **`workspace/systemd/openclaw-gateway-tongge.service`**.

| Command | What it does |
|---------|----------------|
| `./deploy.sh install` | Enables `loginctl` linger, copies **`openclaw-gateway-user.service` → `~/.config/systemd/user/openclaw-gateway.service`**, installs the tongge unit from its template, installs the **system** `openclaw-agent-monitor`, runs `fix-service`, and **starts** all services. |
| `./deploy.sh start` / `stop` / `restart` | Start/stop/restart all agents from `agents.yaml` + the monitor. |
| `./deploy.sh health` | Health check (user units + monitor + disk/memory/linger). |
| `./deploy.sh fix-service` | Re-injects `EnvironmentFile=` into units after an OpenClaw UI upgrade (see the script header). |

**Migration implications:**

1. **If you `scp`'d live units from the old server** — they may differ from the templates. Either keep using **manual** `systemctl --user enable --now` (Step 6), **or** merge your path edits into `workspace/systemd/*.service` and **`agents.yaml`**, then run `./deploy.sh install` on a **clean** tree (backup first). **`install` overwrites** `~/.config/systemd/user/openclaw-gateway.service` from `openclaw-gateway-user.service`.
2. **Always update `agents.yaml`** after changing the Node prefix: fields like `service.check_cmd` and `service.start_cmd` under **`main`** must point at the same `node`/`openclaw` paths as on disk (e.g. `/www/server/nodejs/v24.14.0/bin/openclaw`). Otherwise `./deploy.sh stop` / `health` / `start` will call the wrong binaries.
3. **Tongge** does not use `start_cmd` in the yaml; it uses the **systemd unit** only — edit **`ExecStart`** in the installed user unit (or template) for new Node paths.

**Suggested workflow on the new server:** finish Steps 1–5 and the **pre-flight checklist** below; then either **Step 6** (explicit `systemctl`) **or** `cd /root/.openclaw/workspace && ./deploy.sh start` (after units exist and the `agents.yaml` paths match). For a full reinstall of units from the repo templates, use `./deploy.sh install` **only** when you intend to replace the user units with the templates.

---
## Step 5.9 — Pre-flight checklist (before starting gateways)

Complete this **before** `systemctl --user enable --now …` or `./deploy.sh start` / `install` to reduce boot-loop and missing-binary errors:

**Environment & secrets**

- [ ] `/root/.openclaw/workspace/systemd/gateway.env` and `tongge-gateway.env` exist (they came with the rsync).
- [ ] `/root/.openclaw/.env` and `/root/.openclaw-tongge/.env` are present if your setup expects them.

**Node / OpenClaw / qmd paths (must be consistent)**

- [ ] `which node`, `which openclaw`, `which qmd` on the new server — **one** prefix (e.g. all under `/www/server/nodejs/v24.x.x/`).
- [ ] User systemd units: `ExecStart` paths updated (Step 4).
- [ ] `/root/.openclaw-tongge/openclaw.json` → `memory.qmd.command` matches the real `qmd` binary (Step 4.5).
- [ ] If using **`deploy.sh`**: `/root/.openclaw/workspace/agents.yaml` → **`main.service.check_cmd` / `main.service.start_cmd`** updated to the same `openclaw` path as above.

**Network & dependencies**

- [ ] If the new host has a **new Tailscale IP** or the LLM moved: edit both **`openclaw.json`** files (Step 6.5).
- [ ] If agents use Mem0 + local Qdrant: the Docker stack in `/opt/mem0-center/` is **up** and `localhost:6333` is reachable **before** expecting mem0 in the gateway (order: the Docker section can run **before** Step 6 if needed).
- [ ] `python3` available for `parse_agents.py`, the cron `memory_cleanup.py`, and optional `json.tool` checks.

**systemd user session**

- [ ] `loginctl enable-linger root` will be run (Step 6) so user gateways survive reboot.
- [ ] `/root/.config/systemd/user/` contains the two gateway units (or you will run `./deploy.sh install` to generate them).

**Optional: old server**

- [ ] If using the **cutover strategy** (see Overview): stop the old server's gateways **now**, then proceed to Step 6 on the new server immediately.
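The "one prefix" item above can be checked mechanically. A sketch (`same_prefix` is a hypothetical helper):

```bash
# Sketch: succeed only when every named tool resolves into the same bin/
# directory on PATH.
same_prefix() {   # usage: same_prefix <tool>...
  local ref="" p
  for tool in "$@"; do
    p=$(command -v "$tool") || { echo "missing: $tool"; return 1; }
    if [ -z "$ref" ]; then ref=$(dirname "$p"); fi
    [ "$(dirname "$p")" = "$ref" ] || { echo "prefix mismatch: $p"; return 1; }
  done
  echo "all under $ref"
}

# same_prefix node openclaw qmd
```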
### Pre-run confirmation (environment dependencies and config alignment)

Before actually running Step 6 (enable/start), do one more cross-component alignment pass to confirm that the key dependencies from the Attachment (environment dependencies) are in place:

- OneAPI: the `openclaw-llm-gateway` container is up and the admin console is reachable at `http://<TAILSCALE_IP>:3000`; `LLM_BASE_URL` and `LLM_API_KEY` for both the main and Tongge gateways match a key created in the OneAPI console
- Telegram: `TELEGRAM_BOT_TOKEN` is present in both the main gateway env (`workspace/systemd/gateway.env`) and the Tongge env (`workspace/systemd/tongge-gateway.env`)
- Control UI: if the new machine's Tailscale IP/hostname changed, `gateway.controlUi.allowedOrigins` and `gateway.trustedProxies` have been updated in both `openclaw.json` files
- Python3: `/usr/bin/python3` exists and is executable (required by mem0-integration)
- qmd: `memory.qmd.command` in `openclaw-tongge/openclaw.json` points at the absolute path of the real `qmd` on the new machine

---
## Step 6 — Enable and Start Services

Complete **Step 5.9** first. Below is the **manual** `systemctl` path; the equivalent using **`./deploy.sh`**: after units exist and the **`agents.yaml`** paths match, run `cd /root/.openclaw/workspace && ./deploy.sh start` (or `./deploy.sh install` only when intentionally (re)installing units from templates — see Step 5.5).

```bash
# User linger (once per machine): required for user-scoped gateways at boot
loginctl enable-linger root

systemctl daemon-reload
systemctl enable --now openclaw-agent-monitor

systemctl --user daemon-reload
systemctl --user enable --now openclaw-gateway openclaw-gateway-tongge
```

Verify:

```bash
systemctl status openclaw-agent-monitor
systemctl --user status openclaw-gateway
systemctl --user status openclaw-gateway-tongge

# Logs (pick one)
journalctl --user -u openclaw-gateway -f
journalctl --user -u openclaw-gateway-tongge -f
```

**qmd-related (only after the gateway has started):** if the command-line checks in Step 4.5 passed but you still suspect a qmd problem at runtime, filter the journal (no output means no recent matching lines, which is normal):

```bash
journalctl --user -u openclaw-gateway -n 80 | grep -i qmd
journalctl --user -u openclaw-gateway-tongge -n 80 | grep -i qmd
# If you still use the system-level openclaw-gateway:
journalctl -u openclaw-gateway -n 80 | grep -i qmd
```

If you intentionally use the **system** `openclaw-gateway.service` instead of the user unit, enable that unit and **avoid** running two main gateways at once.

---
## Step 6.5 — Update IPs and bind lists (if the new host differs)

`openclaw.json` in both **`~/.openclaw`** and **`~/.openclaw-tongge`** may hardcode the previous machine's Tailscale IP (e.g. `100.115.94.1`) under:

- `models.providers.*.baseUrl` (e.g. local LLM / gateway at `:3000`)
- `gateway.bind` / allowlists (e.g. `:18789`, `:18790`)

If the new VPS gets a **different Tailscale address** or you change where the LLM API runs, search and update those URLs consistently in **both** config files. Skip this if the new server reuses the same Tailscale IP and service topology.
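One way to find every spot that still needs editing is a sketch like this (`100.115.94.1` is this guide's example old address; extend the include patterns to match your config layout):

```bash
# Sketch: list every config file still mentioning the old Tailscale IP.
OLD_IP='100.115.94.1'
grep -rn --include='*.json' --include='.env' --include='*.env' \
  --include='*.yml' --include='*.yaml' -e "$OLD_IP" \
  /root/.openclaw /root/.openclaw-tongge /opt/mem0-center 2>/dev/null \
  || echo "no stale references to $OLD_IP"
```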
**Mem0 skill** (`/root/.openclaw/workspace/skills/mem0-integration/config.yaml`) uses `localhost:6333` for Qdrant; keep Qdrant on the same host as OpenClaw or change `host`/`port` to match your Docker/Qdrant deployment.

---
## Step 7 — Verify Telegram Bots

Both bots should respond within a minute of the gateways starting. Bot tokens are stored in `.env` files and systemd `EnvironmentFile`s — no changes are needed as long as the paths stay under `/root/...`. Check:

- Main gateway bot: `/root/.openclaw/.env` and `/root/.openclaw/workspace/systemd/gateway.env`
- Tongge bot: `/root/.openclaw-tongge/.env` and `/root/.openclaw/workspace/systemd/tongge-gateway.env`
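A quick presence check for both tokens, without echoing the secrets (`token_defined` is a hypothetical helper):

```bash
# Sketch: report whether each env file defines TELEGRAM_BOT_TOKEN,
# without printing the token value.
token_defined() {   # usage: token_defined <env-file>...
  for f in "$@"; do
    if grep -q '^TELEGRAM_BOT_TOKEN=' "$f"; then
      echo "ok: $f"
    else
      echo "MISSING: $f"
    fi
  done
}

# token_defined /root/.openclaw/workspace/systemd/gateway.env \
#               /root/.openclaw/workspace/systemd/tongge-gateway.env
```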
---
## What Is Portable (No Changes Needed)

| Item | Location |
|------|----------|
| Main gateway config & agents | `/root/.openclaw/openclaw.json` |
| Tongge gateway config & agents | `/root/.openclaw-tongge/openclaw.json` |
| Main gateway secrets (systemd) | `/root/.openclaw/workspace/systemd/gateway.env` |
| Tongge gateway secrets (systemd) | `/root/.openclaw/workspace/systemd/tongge-gateway.env` |
| Bot tokens in tree | `/root/.openclaw/.env`, `/root/.openclaw-tongge/.env` |
| Mem0 SQLite memory history | `/root/.mem0/history.db` |
| Mem0 user identity | `/root/.mem0/config.json` |
| Qdrant migration metadata (if present) | `/root/.mem0/migrations_qdrant/` |
| Workspace scripts, skills, agents | `/root/.openclaw/workspace/` |
| Tongge workspace | `/root/.openclaw-tongge/workspace/` |
| Credentials & delivery queue | `credentials/`, `delivery-queue/` |

---
## What May Need Updating

| Item | Action |
|------|--------|
| Node.js binary paths in **user + system** `.service` files | Update if the install path differs from `/www/server/nodejs/v24.13.1/` |
| `openclaw-gateway.service` (user or system) | Re-apply custom `EnvironmentFile` / `ExecStart` after some `openclaw update` flows; `gateway.env` survives if referenced |
| Tailscale IPs / LLM base URLs in `openclaw.json` | See Step 6.5 when the host or upstream API address changes |
| Qdrant vector store (if used) | See the Docker stack section below |
| `node_modules` in `~/.openclaw-tongge/` | Re-run `npm install` in that directory if any native modules break |

---
## Docker Stack Migration (Qdrant + Dozzle)

Install Docker Engine and the Compose plugin (`docker compose` v2) on the new server before bringing the stack up. Both services are managed by `/opt/mem0-center/docker-compose.yml`.

### Step A — Copy the compose stack

```bash
rsync -avz --progress -e 'ssh -p 3322' /opt/mem0-center/ root@NEW_SERVER_IP:/opt/mem0-center/
```

This includes `docker-compose.yml`, `qdrant_storage/` (vector data), and `snapshots/`.

### Step B — Update the Dozzle port binding on the new server

Dozzle **must** bind to the new server's Tailscale IP. Edit the compose file after copying:

```bash
# Get the new server's Tailscale IP
tailscale ip -4

# Update the port binding (replace OLD_IP with the value above)
nano /opt/mem0-center/docker-compose.yml
# Change: "100.115.94.1:9999:8080"
# To:     "<NEW_TAILSCALE_IP>:9999:8080"
```

> Binding to `127.0.0.1` breaks remote access. Binding to `0.0.0.0` exposes the port publicly.
> Always bind to the Tailscale IP specifically. See `DOZZLE_LOG_OBSERVABILITY.md` for details.
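The manual edit can also be scripted. A sketch (`rebind_dozzle` is a hypothetical helper; `100.115.94.1` is the old IP from this guide, and a `.bak` copy is kept for rollback):

```bash
# Sketch: swap the old Tailscale IP in the Dozzle port binding for the
# new one, keeping a .bak copy (GNU sed, BRE backreference).
rebind_dozzle() {   # usage: rebind_dozzle <new-ip> <compose-file>
  sed -i.bak "s/100\.115\.94\.1\(:9999:8080\)/$1\1/" "$2"
}

# rebind_dozzle "$(tailscale ip -4)" /opt/mem0-center/docker-compose.yml
# grep -n '9999:8080' /opt/mem0-center/docker-compose.yml   # confirm
```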
### Step C — Start the stack

```bash
cd /opt/mem0-center
docker compose up -d
```

Verify both containers are healthy:

```bash
docker ps --format "table {{.Names}}\t{{.Status}}"
# Expected: qdrant-master   Up ... (healthy)
#           dozzle          Up ... (healthy)
```

Access Dozzle at `http://<NEW_TAILSCALE_IP>:9999`.

### Healthcheck note

Both containers use non-standard healthchecks (there is no `wget`/`curl` in the images):

- **Dozzle**: `["CMD", "/dozzle", "healthcheck"]` — built-in binary, no shell needed
- **Qdrant**: `CMD-SHELL` with a bash `/dev/tcp` TCP probe

If you upgrade either image and its healthcheck breaks, refer to `DOZZLE_LOG_OBSERVABILITY.md` §4.

---
## Post-Migration Checklist

- [ ] **Step 5.9 pre-flight** completed (paths, env files, optional Docker Qdrant, `agents.yaml` if using `deploy.sh`)
- [ ] Node.js v24 installed and at the correct path
- [ ] Global npm packages installed (`openclaw`, `clawhub`, `mcporter`, `pnpm` — match Step 1)
- [ ] `~/.openclaw` copied and permissions intact (`chmod 700`)
- [ ] `~/.openclaw-tongge` copied and permissions intact (`chmod 700`)
- [ ] `~/.mem0` copied (including `history.db` and `migrations_qdrant/` if present)
- [ ] **`agents.yaml`** `main.service.*` paths match the real `openclaw` if using `./deploy.sh` (Step 5.5)
- [ ] Systemd: `openclaw-agent-monitor.service` + user units `openclaw-gateway.service` and `openclaw-gateway-tongge.service` installed; paths verified
- [ ] `loginctl enable-linger root` (for user-scoped gateways at boot)
- [ ] `systemctl daemon-reload` and `systemctl --user daemon-reload` run
- [ ] Gateways + monitor started (**Step 6** manual **or** `./deploy.sh start` / `install` per Step 5.5)
- [ ] **QMD memory backend installed and verified** (Step 4.5):
  - [ ] Ran `/www/server/nodejs/v24.13.1/bin/npm install -g @tobilu/qmd`
  - [ ] `ls -la /www/server/nodejs/v24.13.1/bin/qmd` — points at `../lib/node_modules/@tobilu/qmd/bin/qmd` (not a cache directory)
  - [ ] `/www/server/nodejs/v24.13.1/bin/qmd collection list` — runs without errors
  - [ ] No `qmd ENOENT` or `better-sqlite3` errors in the gateway logs
- [ ] Tongge cron jobs loaded (check `nextRunAtMs` in `/root/.openclaw-tongge/cron/jobs.json`)
- [ ] mem0 cleanup cron restored (`/etc/cron.d/mem0-cleanup`)
- [ ] Telegram bots responding
- [ ] `/opt/mem0-center/` copied to the new server
- [ ] Dozzle port binding updated to the new server's Tailscale IP
- [ ] `docker compose up -d` run in `/opt/mem0-center/`
- [ ] `qdrant-master` and `dozzle` both show `(healthy)`
- [ ] Dozzle accessible at `http://<NEW_TAILSCALE_IP>:9999`
- [ ] Old server services stopped (to avoid duplicate bot responses)

---
## Stopping the Old Server After Migration

**When:** After the **new** server is confirmed working (strategy **A** in the Overview). If you already stopped the gateways on the old box before Step 6 (strategy **B**), only disable what is still running and skip the duplicate `stop` commands.

Stop the old services to avoid duplicate Telegram bot responses:

```bash
# On the OLD server — preferred: deploy script stops agents.yaml + monitor
cd /root/.openclaw/workspace && ./deploy.sh stop

# Or stop user units + monitor manually:
systemctl --user stop openclaw-gateway openclaw-gateway-tongge
systemctl --user disable openclaw-gateway openclaw-gateway-tongge

systemctl stop openclaw-agent-monitor
systemctl disable openclaw-agent-monitor

# If you still had a system-level main gateway enabled:
systemctl stop openclaw-gateway 2>/dev/null || true
systemctl disable openclaw-gateway 2>/dev/null || true
```