# Dozzle Container Log Observability Platform

> Document version: 2026-03-15
> Applies to: all OpenClaw nodes running Docker containers

---

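## 1. Why deploy Dozzle

In the OpenClaw multi-agent architecture, troubleshooting (Qdrant write failures, Mem0 client exceptions, API gateway timeouts) has traditionally meant logging in to the server over SSH and running `docker logs` by hand. As the number of containers grows, tracing an error chain across containers becomes significantly more expensive.

Dozzle addresses the following core pain points:

| Pain point | Dozzle approach |
|------|------------|
| Every troubleshooting session requires an SSH login | Direct browser access, no SSH needed |
| Logs are scattered across containers and hard to correlate | Single web UI with multi-container merging and split-screen comparison |
| ELK, Loki, and similar stacks are resource-heavy | In-memory streaming reads only, stateless, very small binary |
| Log search is slow | Built-in full-text search and regex filtering |
| Risk of public exposure | Bound to the Tailscale interface, no public attack surface |

---
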
## 2. Architecture

```
[Browser]
    |
    | http://<Tailscale_IP>:9999 (reachable only inside the Tailscale network)
    |
[Dozzle container]
    | mounts /var/run/docker.sock (read-only streaming)
    |
[Docker Engine]
    |-- qdrant-master
    |-- openclaw-llm-gateway
    |-- dozzle (itself)
    |-- ss
    |-- ... other Agent containers
```

Dozzle reads the stdout/stderr of every container in real time through the mounted Docker socket. It **stores no logs** and does not affect the containers' own `json-file` logging driver.

---

## 3. Current deployment (central node vps-vaym)

**Node information**
- Tailscale IP: `100.115.94.1`
- Access URL: `http://100.115.94.1:9999`
- Compose file: `/opt/mem0-center/docker-compose.yml`

**Key configuration**

```yaml
dozzle:
  image: amir20/dozzle:latest
  container_name: dozzle
  ports:
    - "100.115.94.1:9999:8080"  # bound to the Tailscale interface; unreachable from the public internet
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  environment:
    - DOZZLE_BASE=/
    - DOZZLE_LEVEL=info
    - DOZZLE_TAILSIZE=300  # show the most recent 300 lines per container by default
  healthcheck:
    test: ["CMD", "/dozzle", "healthcheck"]  # built-in healthcheck; no wget/curl needed
    interval: 30s
    timeout: 10s
    retries: 3
```

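**Port binding principle**: the port must be bound to the Tailscale interface IP, not to `0.0.0.0` or `127.0.0.1`:
- `0.0.0.0` → listens on every interface, including the public one, which violates the zero-trust policy
- `127.0.0.1` → reachable only from the host itself, so remote viewing over Tailscale is impossible
- `100.115.94.1` → reachable only by authorized devices on the Tailscale network ✓
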
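
The rule above can be checked mechanically before an address is written into a `ports:` mapping. The following is a minimal sketch, not part of the original deployment: `is_tailscale_ipv4` is a hypothetical helper name, and it assumes the tailnet uses Tailscale's default IPv4 range `100.64.0.0/10`.

```bash
# Hypothetical helper: succeeds only if $1 is an IPv4 address inside
# 100.64.0.0/10, the range assumed here for Tailscale node addresses.
is_tailscale_ipv4() {
    _ip=$1
    # Reject empty input, non-digit characters, and malformed dot placement.
    case $_ip in
        '' | *[!0-9.]* | .* | *. | *..*) return 1 ;;
    esac
    _old_ifs=$IFS
    IFS=.
    # Intentionally unquoted: split the address into its octets.
    set -- $_ip
    IFS=$_old_ifs
    [ "$#" -eq 4 ] || return 1
    for _octet in "$@"; do
        [ "$_octet" -le 255 ] || return 1
    done
    # 100.64.0.0/10 covers 100.64.0.0 through 100.127.255.255.
    [ "$1" -eq 100 ] && [ "$2" -ge 64 ] && [ "$2" -le 127 ]
}

# Usage (on a node that has joined the tailnet):
#   ip=$(tailscale ip -4) && is_tailscale_ipv4 "$ip" && echo "bind to $ip"
```
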
---

## 4. Healthcheck

The Dozzle image is built from `scratch`, so the container has **no shell, `wget`, or `curl`**, and a conventional `CMD-SHELL` check fails. The correct approach is the `healthcheck` subcommand built into Dozzle v8+:

```yaml
# Wrong (neither wget nor a shell exists in the image)
test: ["CMD", "wget", "-q", "--spider", "http://localhost:8080/"]

# Correct (built-in Dozzle command, exec form, no shell needed)
test: ["CMD", "/dozzle", "healthcheck"]
```

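
Once the healthcheck is configured this way, the status Docker records can be polled from the host. This is a minimal sketch, assuming the container is named `dozzle` as elsewhere in this document. `wait_for_healthy` and its arguments are hypothetical names; the status itself is read with `docker inspect --format '{{.State.Health.Status}}'`.

```bash
# Hypothetical helper: poll a container's health status until it is "healthy".
# Usage: wait_for_healthy [container] [attempts] [delay_seconds]
wait_for_healthy() {
    _name=${1:-dozzle}
    _attempts=${2:-10}
    _delay=${3:-3}
    _status=unknown
    _i=0
    while [ "$_i" -lt "$_attempts" ]; do
        # Docker reports starting, healthy, or unhealthy for containers
        # that define a healthcheck.
        _status=$(docker inspect --format '{{.State.Health.Status}}' "$_name" 2>/dev/null) || _status=unknown
        if [ "$_status" = "healthy" ]; then
            echo "$_name is healthy"
            return 0
        fi
        _i=$((_i + 1))
        sleep "$_delay"
    done
    echo "$_name did not become healthy (last status: $_status)" >&2
    return 1
}

# Usage: wait_for_healthy dozzle 10 3
```
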
---

## 5. Migration guide

### 5.1 Migrating to a new server

Dozzle itself is **stateless**. Migration only requires starting the container again on the new node; there is no data to back up.

**Steps:**

1. Confirm that the new node has joined Tailscale and get its Tailscale IP:
   ```bash
   tailscale ip -4
   ```

2. Add the following snippet to the new node's `docker-compose.yml`, replacing `<TAILSCALE_IP>`:
   ```yaml
   dozzle:
     image: amir20/dozzle:latest
     container_name: dozzle
     ports:
       - "<TAILSCALE_IP>:9999:8080"
     volumes:
       - /var/run/docker.sock:/var/run/docker.sock
     environment:
       - DOZZLE_BASE=/
       - DOZZLE_LEVEL=info
       - DOZZLE_TAILSIZE=300
     restart: unless-stopped
     healthcheck:
       test: ["CMD", "/dozzle", "healthcheck"]
       interval: 30s
       timeout: 10s
       retries: 3
     logging:
       driver: "json-file"
       options:
         max-size: "50m"
         max-file: "2"
   ```

3. Start the service:
   ```bash
   docker compose up -d dozzle
   ```

4. From your local machine (or any device on the Tailscale network), open in a browser:
   ```
   http://<NEW_NODE_TAILSCALE_IP>:9999
   ```

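
After the service starts, it may be worth confirming that the published port landed on the Tailscale address and not on a wildcard or loopback address. This is a minimal sketch. `check_dozzle_binding` is a hypothetical helper that parses the `HOST_IP:HOST_PORT` lines printed by `docker port`; the container name `dozzle` and container port `8080` are taken from the snippet in step 2.

```bash
# Hypothetical helper: succeed only if every host binding of the container's
# port 8080 uses the expected Tailscale IP.
# Usage: check_dozzle_binding <expected_ip> [container]
check_dozzle_binding() {
    _expected=$1
    _name=${2:-dozzle}
    # `docker port <container> 8080` prints one HOST_IP:HOST_PORT per line.
    _bindings=$(docker port "$_name" 8080) || return 1
    [ -n "$_bindings" ] || return 1
    while IFS= read -r _binding; do
        # Strip the trailing :PORT to leave the host address.
        _host=${_binding%:*}
        if [ "$_host" != "$_expected" ]; then
            echo "unexpected binding: $_binding" >&2
            return 1
        fi
    done <<EOF
$_bindings
EOF
    echo "ok: $_name is published only on $_expected"
}

# Usage: check_dozzle_binding "$(tailscale ip -4)"
```
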
### 5.2 Migrating the whole stack (together with mem0-center)

Dozzle is stateless, so it migrates along with `docker-compose.yml` and needs no additional backup. See `SERVER_MIGRATION_GUIDE.md` for details.

---

## 6. FAQ

### The browser cannot reach :9999

```bash
# Check that the container is running
docker ps | grep dozzle

# Check the port binding (it should show the Tailscale IP, not 127.0.0.1)
docker port dozzle

# Check Tailscale connectivity
tailscale ping <target-node-name>
```

### The container shows unhealthy

```bash
docker inspect dozzle --format='{{json .State.Health.Log}}' | python3 -m json.tool
```

Common cause: the healthcheck is configured with `wget` or `CMD-SHELL`. See Section 4 for the fix.

### The Dozzle UI shows an empty list (no containers)

Check whether `DOZZLE_FILTER` is set. This environment variable uses Docker's filter syntax, and `name=foo` matches only containers whose name **contains** `foo`. If no filtering is needed, remove the variable and all containers are shown.

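
Because `DOZZLE_FILTER` uses Docker's filter syntax, the same expression can be tried against `docker ps` to preview roughly which containers it would select. This is a minimal sketch: `preview_dozzle_filter` is a hypothetical helper name, and it assumes the filter is a single `key=value` expression.

```bash
# Hypothetical helper: list the names of containers that match a Docker
# filter expression such as name=foo (the syntax DOZZLE_FILTER uses).
# Usage: preview_dozzle_filter 'name=qdrant'
preview_dozzle_filter() {
    if [ -z "${1:-}" ]; then
        echo "usage: preview_dozzle_filter <key=value>" >&2
        return 2
    fi
    # -a includes stopped containers; --format prints only the container name.
    docker ps -a --filter "$1" --format '{{.Names}}'
}
```
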
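---

## 7. Extension: deploying as a sidecar on remote Agent nodes

When the 犇犇 node or other remote Agent nodes need frequent troubleshooting, Dozzle can be added as a standard sidecar to the node deployment template (`docker-compose.yml.tpl`).

Once deployed, headquarters can view a remote Agent's runtime logs directly at `http://<REMOTE_NODE_TAILSCALE_IP>:9999` without SSH, which is fully compatible with the zero-trust network policy.

Template variable: `${TAILSCALE_IP}`, injected dynamically by the node initialization script via `tailscale ip -4`.
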
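
The injection step could look like the following. This is a minimal sketch under stated assumptions: `render_compose_template` is a hypothetical function name, the template is assumed to contain the literal placeholder `${TAILSCALE_IP}`, and `sed` is used here for the substitution (the actual initialization script may use a different tool).

```bash
# Hypothetical helper: read a template on stdin and replace every literal
# ${TAILSCALE_IP} placeholder with the address given as $1.
render_compose_template() {
    _ip=$1
    # Accept only digits and dots, so the value is safe to splice into
    # the sed expression below.
    case $_ip in
        '' | *[!0-9.]*) echo "invalid IPv4 address: $_ip" >&2; return 1 ;;
    esac
    # \$ matches a literal dollar sign; { and } are literal in basic regex.
    sed 's|\${TAILSCALE_IP}|'"$_ip"'|g'
}

# Usage in a node initialization script (file locations are illustrative):
#   render_compose_template "$(tailscale ip -4)" \
#       < docker-compose.yml.tpl > docker-compose.yml
```
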