1. Add healthcheck and labels
A healthcheck periodically checks whether the service inside a Docker container is running properly.
version: '3.7'
services:
  secure_monitor:
    build:
      context: ./build
      dockerfile: ./Dockerfile
    image: secure-monitor:1.0
    container_name: secure_monitor
    command:
      - "python3"
      - "main.py"
      - "-f"
      - "/app/secure"
    user: root
    network_mode: bridge
    privileged: true
    restart: always
    volumes:
      - ./app:/app
      - ./logs:/logs
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]  # exit code 0: healthy; exit code 1: unhealthy
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    labels:
      autoheal: "true"
Reference: https://docs.docker.com/compose/compose-file/compose-file-v3/
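If the image does not ship curl, the same check can be done with the Python interpreter the container already runs. The fragment below is only a minimal sketch, assuming (as the curl example does) that the service answers HTTP on port 80 inside the container. The inline one-liner is an illustration, not part of the original setup.

```yaml
services:
  secure_monitor:
    image: secure-monitor:1.0
    healthcheck:
      # urlopen raises on a connection failure or an HTTP error status,
      # so python3 exits non-zero (unhealthy); on success it exits 0 (healthy).
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost', timeout=5)"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
```

Once the container is up, the STATUS column of `docker ps` shows (healthy) or (unhealthy) next to it.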
2. Automatically restart unhealthy services
2.1 Add the autoheal service
version: "3.7"
services:
  autoheal:
    restart: always
    image: willfarrell/autoheal
    container_name: autoheal
    environment:
      - AUTOHEAL_CONTAINER_LABEL=all
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
The autoheal documentation describes the options for choosing which containers are watched:
- a) Apply the label autoheal=true to your container to have it watched.
- b) Set ENV AUTOHEAL_CONTAINER_LABEL=all to watch all running containers.
- c) Set ENV AUTOHEAL_CONTAINER_LABEL to existing label name that has the value true.
⚠️Note: You must apply HEALTHCHECK to your docker images first.
My reading of c): if AUTOHEAL_CONTAINER_LABEL=abc, then only services whose labels include abc=true will be restarted?
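That reading matches item c) as quoted above. The fragment below sketches it under that assumption: the label name abc is purely hypothetical, and the watched service reuses the secure_monitor definition from section 1.

```yaml
version: "3.7"
services:
  autoheal:
    image: willfarrell/autoheal
    restart: always
    environment:
      # Watch only containers that carry the label abc=true (hypothetical name).
      - AUTOHEAL_CONTAINER_LABEL=abc
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

  secure_monitor:
    image: secure-monitor:1.0
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 10s
      retries: 3
    labels:
      # Must match AUTOHEAL_CONTAINER_LABEL and have the value "true".
      abc: "true"
```

A container labelled only with autoheal: "true" would not be matched in this setup, since autoheal is told to look for abc instead.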
More environment variables (https://github.com/willfarrell/docker-autoheal):
AUTOHEAL_CONTAINER_LABEL=autoheal
AUTOHEAL_INTERVAL=5 # check every 5 seconds
AUTOHEAL_START_PERIOD=0 # wait 0 seconds before first health check
AUTOHEAL_DEFAULT_STOP_TIMEOUT=10 # Docker waits max 10 seconds (the Docker default) for a container to stop before killing during restarts (container overridable via label, see below)
DOCKER_SOCK=/var/run/docker.sock # Unix socket for curl requests to Docker API
CURL_TIMEOUT=30 # --max-time seconds for curl requests to Docker API
WEBHOOK_URL="" # post message to the webhook if a container was restarted (or restart failed)
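These variables go in the environment section of the autoheal service. The fragment below is a sketch with illustrative values, not recommendations: it checks less often than the default and waits before the first check so that slow-starting services are not restarted prematurely.

```yaml
services:
  autoheal:
    image: willfarrell/autoheal
    restart: always
    environment:
      - AUTOHEAL_CONTAINER_LABEL=autoheal   # watch containers labelled autoheal=true
      - AUTOHEAL_INTERVAL=10                # check every 10 seconds (illustrative)
      - AUTOHEAL_START_PERIOD=60            # wait 60 seconds before the first check (illustrative)
      - AUTOHEAL_DEFAULT_STOP_TIMEOUT=20    # allow 20 seconds for a container to stop (illustrative)
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```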
References:
- https://wshs0713.github.io/posts/b8226bad/
- https://hub.docker.com/r/willfarrell/autoheal