I. Managing TDengine + Nginx with Docker Swarm
- Docker version: 26.
- Docker Swarm is used for cluster management.
- taosAdapter provides RESTful API access.
- Nginx proxies access to the cluster.
II. Architecture Design
Given three TDengine nodes (192.168.0.1/db1, 192.168.0.2/db2, 192.168.0.3/db3), the Swarm cluster is designed as follows:
- Swarm cluster roles
- Swarm Manager node: one server acts as Manager (here db1: 192.168.0.1), responsible for cluster orchestration (deploying services, managing replicas);
- Swarm Worker nodes: db2 (192.168.0.2) and db3 (192.168.0.3) act as Workers and run the TDengine/Nginx containers;
- Note: production environments should run 3 Manager nodes (to avoid a Manager single point of failure); a single Manager is used here to keep the demo simple.
- Service components
Service | Role | Deployment | Key configuration |
---|---|---|---|
tdengine-cluster | TDengine 3.3.6 three-node cluster | Swarm service (3 replicas, 1 per node) | Mount data volumes for persistence; set cluster parameters |
tdengine-taosAdapter | RESTful access; the bridge/adapter between Nginx and the TDengine cluster | Swarm service (3 replicas, spread across Workers) | Configure the taosKeeper address if monitoring metrics should be reported |
tdengine-taosKeeper | Exporter for TDengine 3.0 monitoring metrics; reports TDengine's runtime status | Swarm service (1 replica) | Configure the taosAdapter address |
tdengine-taosExplorer | Web service for the visual management UI | Swarm service (1 replica) | Configure the cluster address, i.e. the taosAdapter address |
nginx-proxy | Reverse proxy for TDengine's RESTful port 6041 (high availability); also proxies port 6060 for the taosExplorer service | Swarm service (2 replicas, spread across Workers) | Configure reverse-proxy rules pointing at the TDengine service addresses |
III. Deploying Docker Swarm
1. Configure hostname resolution
Add the following entries to /etc/hosts on every node so that the nodes can reach each other by hostname:
echo "192.168.0.1 db1" >> /etc/hosts
echo "192.168.0.2 db2" >> /etc/hosts
echo "192.168.0.3 db3" >> /etc/hosts
Also set the hostname of the three servers to db1, db2 and db3 respectively.
2. Initialize the Docker Swarm cluster
2.1 Initialize Swarm on the Manager node (db1: 192.168.0.1)
# Initialize Swarm, advertising the Manager's IP (must be reachable from the other nodes)
docker swarm init --advertise-addr 192.168.0.1
# To add another Manager node later, run this on an existing Manager:
docker swarm join-token manager
- On success, the init command prints the join command for Worker nodes (similar to the following; save it for later):
docker swarm join --token SWMTKN-1-xxxxxx-xxxxxx 192.168.0.1:2377
2.2 Join the other nodes (db2, db3) to the Swarm as Workers
On db2 (192.168.0.2) and db3 (192.168.0.3), run the docker swarm join command printed in the previous step:
# Same command on db2 and db3 (substitute the real token)
docker swarm join --token SWMTKN-1-xxxxxx-xxxxxx 192.168.0.1:2377
2.3 Verify the Swarm cluster status (on the Manager node)
# List the Swarm nodes and confirm all 3 are online (db1 is the Leader; db2/db3 are Workers)
docker node ls
- Example of healthy output (the * marks the node you are on):
ID             HOSTNAME   STATUS   AVAILABILITY   MANAGER STATUS   ENGINE VERSION
abc123xxxxxx * db1        Ready    Active         Leader           24.0.5
def456xxxxxx   db2        Ready    Active                          24.0.5
ghi789xxxxxx   db3        Ready    Active                          24.0.5
IV. Deploying the TDengine Cluster
1. Create local directories (to persist TDengine data)
Log in to each TDengine node (db1, db2, db3) and create a dedicated local directory per node (keep the paths distinct to avoid confusion):
# 1. On db1 (192.168.0.1)
mkdir -p /mnt/tdengine-cluster/td1/lib && mkdir -p /mnt/tdengine-cluster/td1/log && chmod -R 777 /mnt/tdengine-cluster/td1
# (777 lets the TDengine process inside the container read/write; in production, prefer tighter user/group-based permissions)
# 2. On db2 (192.168.0.2)
mkdir -p /mnt/tdengine-cluster/td2/lib && mkdir -p /mnt/tdengine-cluster/td2/log && chmod -R 777 /mnt/tdengine-cluster/td2
# 3. On db3 (192.168.0.3)
mkdir -p /mnt/tdengine-cluster/td3/lib && mkdir -p /mnt/tdengine-cluster/td3/log && chmod -R 777 /mnt/tdengine-cluster/td3
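The per-node commands above can also be written as one short loop. A minimal sketch; BASE is a placeholder scratch path so it can be tried anywhere, while on the real nodes it would be /mnt/tdengine-cluster/td1 (td2/td3 respectively):

```shell
# Create the lib/log directories for one node in a single loop.
# BASE is a placeholder; use /mnt/tdengine-cluster/tdN on the real hosts.
BASE="${BASE:-/tmp/tdengine-demo/td1}"
for d in lib log; do
  mkdir -p "$BASE/$d"
done
chmod -R 777 "$BASE"   # loosened for the demo; tighten for production
ls "$BASE"
```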
2. Write the Swarm Compose file (deploying TDengine + Nginx)
On the Manager node, create docker-compose-swarm.yml (Swarm deploys it with docker stack deploy; the file is Docker Compose syntax plus a deploy section that defines the Swarm-specific rules):
version: '3.8'

# Overlay network for Swarm-internal, cross-node service communication
networks:
  tdengine-network:
    driver: overlay      # overlay networks support cross-node container communication
    attachable: true     # allow standalone containers to attach (optional)

services:
  ###########################################################################
  # 1. TDengine services (3 replicas, one per node, pinned via constraints)
  ###########################################################################
  tdengine-db1:
    image: tdengine/tdengine:3.3.6.13
    hostname: db1
    networks:
      - tdengine-network
    environment:
      - TZ=Asia/Shanghai
      # - TAOS_NUM_OF_MNODES=3
    volumes:
      - /mnt/tdengine-cluster/td1/lib:/var/lib/taos      # persist data
      - /mnt/tdengine-cluster/td1/log:/var/log/taos      # persist logs
      - /mnt/tdengine-cluster/cfg/taos.cfg:/etc/taos/taos.cfg   # configuration
    deploy:
      placement:
        constraints: [node.hostname == db1]   # pin to node db1
      # resources:
      #   limits:
      #     cpus: '2'
      #     memory: 4G
      restart_policy:
        condition: any     # always restart (including non-zero exit codes)
        max_attempts: 3    # cap restarts to avoid endless loops
        window: 60s        # restart evaluation window
    healthcheck:
      test: ["CMD", "taos", "-s", "show dnodes;"]   # SQL probe of node status
      interval: 10s        # check every 10 seconds
      timeout: 5s          # per-check timeout
      retries: 3           # 3 consecutive failures mark the task unhealthy
      start_period: 60s    # grace period for initialization
    command: taosd

  tdengine-db2:
    image: tdengine/tdengine:3.3.6.13
    hostname: db2
    networks:
      - tdengine-network
    # environment:
    #   - TZ=Asia/Shanghai
    #   - TAOS_FQDN=db2
    volumes:
      - /mnt/tdengine-cluster/td2/lib:/var/lib/taos
      - /mnt/tdengine-cluster/td2/log:/var/log/taos
      - /mnt/tdengine-cluster/cfg/taos.cfg:/etc/taos/taos.cfg
    deploy:
      placement:
        constraints: [node.hostname == db2]   # pin to node db2
      restart_policy:
        condition: any
        max_attempts: 3
        window: 60s
    # privileged: true
    depends_on:
      - tdengine-db1
    healthcheck:
      test: ["CMD", "taos", "-s", "show dnodes;"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
    command: taosd

  tdengine-db3:
    image: tdengine/tdengine:3.3.6.13
    hostname: db3
    networks:
      - tdengine-network
    # environment:
    #   - TAOS_FQDN=db3
    volumes:
      - /mnt/tdengine-cluster/td3/lib:/var/lib/taos
      - /mnt/tdengine-cluster/td3/log:/var/log/taos
      - /mnt/tdengine-cluster/cfg/taos.cfg:/etc/taos/taos.cfg
    deploy:
      placement:
        constraints: [node.hostname == db3]   # pin to node db3
      restart_policy:
        condition: any
        max_attempts: 3
        window: 60s
    depends_on:
      - tdengine-db2
    healthcheck:
      test: ["CMD", "taos", "-s", "show dnodes;"]
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
    command: taosd

  ###########################################################################
  # 2. taosAdapter: TDengine companion tool, the bridge/adapter between the
  #    cluster and applications
  ###########################################################################
  tdengine-adapter:
    image: tdengine/tdengine:3.3.6.13
    hostname: tdengine-adapter
    networks:
      - tdengine-network
    environment:
      - TAOS_FQDN=tdengine-adapter
      - TAOS_FIRST_EP=db1
      # - UPLOADKEEPER_ENABLE=false   # optional: disable Keeper reporting to silence errors (if taosKeeper is not deployed)
      # - TAOS_SECOND_EP=db2
    volumes:
      - /mnt/tdengine-cluster/cfg/taosadapter.toml:/etc/taos/taosadapter.toml
    deploy:
      replicas: 3
      restart_policy:
        condition: any
        max_attempts: 3
        window: 60s
    depends_on:
      - tdengine-db3
    healthcheck:
      # Layered check: 1) basic liveness (no params); 2) query-path readiness (action=query)
      test: |
        curl -f -s -o /dev/null -w "%{http_code}" http://localhost:6041/-/ping | grep -q "200" && \
        curl -f -s -o /dev/null -w "%{http_code}" "http://localhost:6041/-/ping?action=query" | grep -q "200"
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
    command: taosadapter

  ###########################################################################
  # 3. taosKeeper: TDengine monitoring exporter
  ###########################################################################
  tdengine-taoskeeper:
    image: tdengine/tdengine:3.3.6.13
    hostname: tdengine-taoskeeper
    networks:
      - tdengine-network
    environment:
      - TAOS_FQDN=tdengine-taoskeeper
      - TAOS_FIRST_EP=db1
      # - TAOS_SECOND_EP=db2
    volumes:
      - /mnt/tdengine-cluster/cfg/taoskeeper.toml:/etc/taos/taoskeeper.toml
    deploy:
      # replicas: 1
      restart_policy:
        condition: any
        max_attempts: 3
        window: 60s
    depends_on:
      - tdengine-adapter
    healthcheck:
      # 1) endpoint returns HTTP 200; 2) body contains version info (endpoint really works)
      test: |
        curl -f -s -o /tmp/health.json -w "%{http_code}" http://localhost:6043/check_health | grep -q "200" && \
        cat /tmp/health.json | grep -q "\"version\":"
      interval: 10s
      timeout: 5s
      retries: 3
      start_period: 60s
    command: taoskeeper

  ###########################################################################
  # 4. taosExplorer: web service for the visual management UI
  ###########################################################################
  tdengine-explorer:
    image: tdengine/tdengine:3.3.6.13
    hostname: tdengine-explorer
    networks:
      - tdengine-network
    environment:
      - TAOS_FQDN=tdengine-explorer
      - TAOS_FIRST_EP=db1
      # - TAOS_SECOND_EP=db2
    volumes:
      - /mnt/tdengine-cluster/cfg/explorer.toml:/etc/taos/explorer.toml
    deploy:
      # replicas: 1
      restart_policy:
        condition: any
        max_attempts: 3
        window: 60s
    depends_on:
      - tdengine-adapter
    command: taos-explorer

  ###########################################################################
  # 5. nginx-proxy: load-balancing Nginx, deployed redundantly
  ###########################################################################
  nginx-proxy:
    image: nginx:alpine
    networks:
      - tdengine-network
    ports:
      - "6041:6041"   # REST endpoint (clients connect here)
      - "6060:6060"   # explorer
    volumes:
      # mount the Nginx config file
      - /mnt/tdengine-cluster/cfg/explorer-nginx.conf:/etc/nginx/conf.d/explorer.conf
    deploy:
      replicas: 2            # 2 replicas for Nginx high availability
      mode: replicated
      placement:
        # keep the 2 Nginx replicas on different nodes (spread by hostname)
        constraints:
          - node.hostname != db1   # optional: keep Nginx off the Manager node
      # resources:
      #   limits:
      #     cpus: '1'
      #     memory: 512M
      # Swarm's built-in load balancing: clients hitting port 6041 on any node
      # are forwarded to a healthy Nginx replica
      restart_policy:
        condition: any
        max_attempts: 3
        window: 60s
    depends_on:
      - tdengine-adapter
    command:
      [
        "sh",
        "-c",
        "while true; do curl -s http://tdengine-adapter:6041/-/ping >/dev/null && break; done; printf 'server{listen 6041;location /{proxy_pass http://tdengine-adapter:6041;}}' > /etc/nginx/conf.d/rest.conf; cat /etc/nginx/nginx.conf; nginx -g 'daemon off;'",
      ]
Notes:
1. The configuration combines depends_on, healthcheck (which defines the readiness criterion), and startup command scripts that poll and wait for their upstream service. Be aware that docker stack deploy ignores depends_on in Swarm mode, so the healthchecks and the polling startup commands are what actually enforce the db1 → db2 → db3 → nginx startup order.
2. Every TDengine service must set hostname, because cluster nodes communicate with each other via FQDN.
3. Configure the Nginx reverse proxy (adapted to Swarm service communication)
Nginx proxies access to the TDengine cluster, and the taosExplorer web UI is also served through Nginx.
Key point: Nginx forwards to the Swarm-internal TDengine service names (not IPs), because Swarm's built-in DNS resolves service names to the corresponding container IPs.
For the RESTful forwarding to taosAdapter, the config is generated with printf inside the container (see the nginx-proxy command above); no separate file is needed.
For taosExplorer, create a config file explorer-nginx.conf that handles the proxying and cross-origin (CORS) headers.
explorer-nginx.conf
server {
    listen 6060;
    # match all requests (the original "location ~*" lacked a pattern, which nginx rejects)
    location / {
        proxy_pass http://explorer;

        if ($request_method = 'OPTIONS') {
            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Credentials' 'true';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
            add_header 'Access-Control-Max-Age' 86400;
            add_header 'Content-Type' 'text/plain charset=UTF-8';
            add_header 'Content-Length' 0;
            return 204;
        }
        if ($request_method = 'POST') {
            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Credentials' 'true';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
        }
        if ($request_method = 'GET') {
            add_header 'Access-Control-Allow-Origin' '*';
            add_header 'Access-Control-Allow-Credentials' 'true';
            add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
            add_header 'Access-Control-Allow-Headers' 'DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type';
        }

        proxy_set_header Host $host:$server_port;
        proxy_set_header X-Real-IP $remote_addr;
        #proxy_http_version 1.1;
        proxy_read_timeout 60s;
        proxy_next_upstream error http_502 http_500 non_idempotent;
    }
}

upstream explorer {
    ip_hash;
    server tdengine-explorer:6060;
    # server db1:6060;
    # server db2:6060;
    # server db3:6060;
}
4. Other configuration files
These configuration files can be exported from a standalone (single-node) TDengine deployment and then adjusted.
taosadapter.toml
# Enable pprof debug mode. If set to true, pprof debugging is enabled.
debug = true
# The directory where TDengine's configuration file (taos.cfg) is located.
taosConfigDir = ""
# The port on which the server listens.
port = 6041
# When the server returns an error, use a non-200 HTTP status code if set to true.
httpCodeServerError = false
# Automatically create the database when writing data with the schemaless feature if set to true.
smlAutoCreateDB = false
# Instance ID of the taosAdapter.
instanceId = 32
# The maximum number of concurrent calls allowed for the C synchronized method. 0 means use CPU core count.
#maxSyncConcurrentLimit = 0
# The maximum number of concurrent calls allowed for the C asynchronous method. 0 means use CPU core count.
#maxAsyncConcurrentLimit = 0

[cors]
# If set to true, allows cross-origin requests from any origin (CORS).
allowAllOrigins = true

[pool]
# The maximum number of connections to the server. If set to 0, use cpu count * 2.
# maxConnect = 0
# The maximum number of idle connections to the server. Should match maxConnect.
# maxIdle = 0
# The maximum number of connections waiting to be established. 0 means no limit.
maxWait = 0
# Maximum time to wait for a connection. 0 means no timeout.
waitTimeout = 60

[ssl]
# Enable SSL. Applicable for the Enterprise Edition.
enable = false
certFile = ""
keyFile = ""

[log]
# The directory where log files are stored.
# path = "/var/log/taos"
# The log level. Options are: trace, debug, info, warning, error.
level = "info"
# Number of log file rotations before deletion.
rotationCount = 30
# The number of days to retain log files.
keepDays = 30
# The maximum size of a log file before rotation.
rotationSize = "1GB"
# If set to true, log files will be compressed.
compress = false
# Minimum disk space to reserve. Log files will not be written if disk space falls below this limit.
reservedDiskSize = "1GB"
# Enable logging of HTTP SQL queries.
enableRecordHttpSql = false
# Number of HTTP SQL log rotations before deletion.
sqlRotationCount = 2
# Time interval for rotating HTTP SQL logs.
sqlRotationTime = "24h"
# Maximum size of HTTP SQL log files before rotation.
sqlRotationSize = "1GB"

[monitor]
# If set to true, disables monitoring.
disable = true
# Interval for collecting metrics.
collectDuration = "3s"
# Indicates if running inside a Docker container.
incgroup = false
# When memory usage reaches this percentage, query execution will be paused.
pauseQueryMemoryThreshold = 70
# When memory usage reaches this percentage, both queries and inserts will be paused.
pauseAllMemoryThreshold = 80
# The identity of the current instance. If empty, it defaults to 'hostname:port'.
identity = ""

[uploadKeeper]
# Enable uploading of metrics to TaosKeeper.
enable = true
# URL of the TaosKeeper service to which metrics will be uploaded.
url = "http://tdengine-taoskeeper:6043/adapter_report"
# Interval for uploading metrics.
interval = "15s"
# Timeout for uploading metrics.
timeout = "5s"
# Number of retries when uploading metrics fails.
retryTimes = 3
# Interval between retries for uploading metrics.
retryInterval = "5s"

[opentsdb]
# Enable the OpenTSDB HTTP plugin.
enable = true

[influxdb]
# Enable the InfluxDB plugin.
enable = true

[statsd]
# Enable the StatsD plugin.
enable = false
# The port on which the StatsD plugin listens.
port = 6044
# The database name used by the StatsD plugin.
db = "statsd"
# The username used to connect to the TDengine database.
user = "root"
# The password used to connect to the TDengine database.
password = "taosdata"
# The number of worker threads for processing StatsD data.
worker = 10
# Interval for gathering StatsD metrics.
gatherInterval = "5s"
# The network protocol used by StatsD (e.g., udp4, tcp).
protocol = "udp4"
# Maximum number of TCP connections allowed for StatsD.
maxTCPConnections = 250
# If set to true, enables TCP keep-alive for StatsD connections.
tcpKeepAlive = false
# Maximum number of pending messages StatsD allows.
allowPendingMessages = 50000
# If set to true, deletes the counter cache after gathering metrics.
deleteCounters = true
# If set to true, deletes the gauge cache after gathering metrics.
deleteGauges = true
# If set to true, deletes the set cache after gathering metrics.
deleteSets = true
# If set to true, deletes the timing cache after gathering metrics.
deleteTimings = true

[collectd]
# Enable the Collectd plugin.
enable = false
# The port on which the Collectd plugin listens.
port = 6045
# The database name used by the Collectd plugin.
db = "collectd"
# The username used to connect to the TDengine database.
user = "root"
# The password used to connect to the TDengine database.
password = "taosdata"
# Number of worker threads for processing Collectd data.
worker = 10

[opentsdb_telnet]
# Enable the OpenTSDB Telnet plugin.
enable = false
# Maximum number of TCP connections allowed for the OpenTSDB Telnet plugin.
maxTCPConnections = 250
# If set to true, enables TCP keep-alive for OpenTSDB Telnet connections.
tcpKeepAlive = false
# List of databases to which OpenTSDB Telnet plugin writes data.
dbs = ["opentsdb_telnet", "collectd", "icinga2", "tcollector"]
# The ports on which the OpenTSDB Telnet plugin listens, corresponding to each database.
ports = [6046, 6047, 6048, 6049]
# The username used to connect to the TDengine database for OpenTSDB Telnet.
user = "root"
# The password used to connect to the TDengine database for OpenTSDB Telnet.
password = "taosdata"
# Batch size for processing OpenTSDB Telnet data.
batchSize = 1
# Interval between flushing data to the database. 0 means no interval.
flushInterval = "0s"

[node_exporter]
# Enable the Node Exporter plugin.
enable = false
# The database name used by the Node Exporter plugin.
db = "node_exporter"
# The username used to connect to the TDengine database.
user = "root"
# The password used to connect to the TDengine database.
password = "taosdata"
# List of URLs to gather Node Exporter metrics from.
urls = ["http://localhost:9100"]
# Timeout for waiting for a response from the Node Exporter plugin.
responseTimeout = "5s"
# Username for HTTP authentication, if applicable.
httpUsername = ""
# Password for HTTP authentication, if applicable.
httpPassword = ""
# Bearer token for HTTP requests, if applicable.
httpBearerTokenString = ""
# Path to the CA certificate file for SSL validation.
caCertFile = ""
# Path to the client certificate file for SSL validation.
certFile = ""
# Path to the client key file for SSL validation.
keyFile = ""
# If set to true, skips SSL certificate verification.
insecureSkipVerify = true
# Interval for gathering Node Exporter metrics.
gatherDuration = "5s"

[prometheus]
# Enable the Prometheus plugin.
enable = true
taoskeeper.toml
instanceId = 64
# Listening host, supports IPv4/IPv6, default is ""
host = ""
# Listen port, default is 6043
port = 6043
# go pool size
gopoolsize = 50000
# interval for metrics
RotationInterval = "15s"

[tdengine]
host = "tdengine-adapter"
port = 6041
username = "root"
password = "taosdata"
usessl = false

[metrics]
# metrics prefix in metrics names.
prefix = "taos"
# export some tables that are not super table
tables = []
# database for storing metrics data
[metrics.database]
name = "log"
# database options for db storing metrics data
[metrics.database.options]
vgroups = 1
buffer = 64
keep = 90
cachemodel = "both"

[environment]
# Whether running in cgroup.
incgroup = false

[log]
# The directory where log files are stored.
# path = "/var/log/taos"
level = "info"
# Number of log file rotations before deletion.
rotationCount = 30
# The number of days to retain log files.
keepDays = 30
# The maximum size of a log file before rotation.
rotationSize = "1GB"
# If set to true, log files will be compressed.
compress = false
# Minimum disk space to reserve. Log files will not be written if disk space falls below this limit.
reservedDiskSize = "1GB"
explorer.toml
# This is an automatically generated configuration file for Explorer in [TOML](https://toml.io/) format.
#
# Here is a full list of available options.

# Explorer server port to listen on.
# Default is 6060.
#
port = 6060

# IPv4 listen address.
# Default is 0.0.0.0
addr = "0.0.0.0"

# IPv6 listen address.
# ipv6 = "::1"

# explorer server instance id
#
# The instanceId of each instance is unique on the host
# instanceId = 1

# Explorer server log level.
# Default is "info"
#
# Deprecated: use log.level instead
log_level = "info"

# All data files are stored in this directory
# data_dir = "/var/lib/taos/explorer" # Default for Linux
# data_dir = "C:\\TDengine\\data\\explorer" # Default for Windows

# REST API endpoint to connect to the cluster.
# This configuration is also the target for data migration tasks.
#
# Default is "http://localhost:6041" - the default endpoint for REST API.
#
cluster = "http://tdengine-adapter:6041"

# native endpoint to connect to the cluster.
# Default is disabled. To enable it, set it to the native API URL like "taos://localhost:6030" and uncomment it.
# If you enable it, you will get more performance for data migration tasks.
# If modify this config item, you must relogin to clear the cache in browser local storage.
#
# cluster_native = "taos://localhost:6030"

# API endpoint for data replication/backup/data sources. No default option.
# Set it to API URL like "http://localhost:6050".
#
x_api = "http://localhost:6050"

# Please set it to same as the "grpc" parameter used by taosX Service;
# If "grpc" parameter is not set explicitly in taosX service, please set it to the default grpc address of taosX
grpc = "http://localhost:6055"

# CORS configuration switch, it allows cross-origin access
cors = true

# cloud open api.
# cloud_open_api = "https://pre.ali.cloud.taosdata.com/openapi"

# Enable ssl
# If the following two files exist, enable ssl protocol
#
[ssl]

# SSL certificate
#
# certificate = "/path/to/ca.file" # on linux/macOS
# certificate = "C:\\path\\to\\ca.file" # on windows

# SSL certificate key
#
# certificate_key = "/path/to/key.file" # on linux/macOS
# certificate_key = "C:\\path\\to\\key.file" # on windows

# log configuration
[log]
# All log files are stored in this directory
#
# path = "/var/log/taos" # on linux/macOS
# path = "C:\\TDengine\\log" # on windows

# log filter level
#
# level = "info"

# Compress archived log files or not
#
# compress = false

# The number of log files retained by the current explorer server instance in the `path` directory
#
# rotationCount = 30

# Rotate when the log file reaches this size
#
# rotationSize = "1GB"

# Log downgrade when the remaining disk space reaches this size, only logging `ERROR` level logs
#
# reservedDiskSize = "1GB"

# The number of days log files are retained
#
# keepDays = 30

# integrated with Grafana
[grafana]
# The token of the Grafana server, which is used to access the Grafana server.
#token = ""

# The URL of the Grafana dashboard, which is used to display the monitoring data of the TDengine cluster.
# You can configure multiple Grafana dashboards.
[grafana.dashboards]
#TDengine3 = "http://localhost:3000/d/000000001/tdengine3"
#taosX = "http://localhost:3000/d/000000002/taosx"
5. Deploy the Swarm stack (start TDengine + Nginx)
Run the following on the Manager node to deploy the whole cluster via docker stack deploy (tdengine-stack is the stack name; choose your own):
docker stack deploy -c docker-compose-swarm.yml tdengine-stack
6. Verify the Swarm deployment
6.1 Check the stack's service status (on the Manager node)
# List all services and confirm the REPLICAS column is full (e.g. 1/1 for tdengine-db1, 2/2 for nginx-proxy)
docker stack ps tdengine-stack
docker stack ps --no-trunc tdengine-stack
# Tail a service's logs (e.g. nginx-proxy)
docker service logs -f tdengine-stack_nginx-proxy
# Tail a TDengine service's logs (e.g. db1)
docker service logs -f tdengine-stack_tdengine-db1
# Tear the whole stack down when no longer needed
docker stack rm tdengine-stack
6.2 Initialize the TDengine cluster (key step)
Swarm only starts the TDengine containers; the cluster itself must still be initialized manually (same as a traditional deployment):
Add dnodes to the cluster
After the services are created, enter the td1 container and register the db2 and db3 nodes by hand. See https://docs.taosdata.com/3.3.6/operation/deployment/#6-添加-dnode
[root@db1 tdengine-cluster]# docker exec -it 81067c315bd3 /bin/bash
root@db1:~# taos
Welcome to the TDengine Command Line Interface, Native Client Version:3.3.6.13
Copyright (c) 2025 by TDengine, all rights reserved.
(tab-completion help banner omitted)
Server is TDengine Community Edition, ver:3.3.6.13 and will never expire.

taos> show dnodes;
 id | endpoint | vnodes | support_vnodes | status | create_time             | reboot_time             | note |
==============================================================================================================
  1 | db1:6030 |      0 |             69 | ready  | 2025-10-11 10:08:21.368 | 2025-10-11 10:08:21.318 |      |
Query OK, 1 row(s) in set (0.002750s)

taos> create dnode "db2:6030";
Create OK, 0 row(s) affected (0.021848s)

taos> show dnodes;
 id | endpoint | vnodes | support_vnodes | status | create_time             | reboot_time             | note |
==============================================================================================================
  1 | db1:6030 |      0 |             69 | ready  | 2025-10-11 10:08:21.368 | 2025-10-11 10:08:21.318 |      |
  2 | db2:6030 |      0 |             69 | ready  | 2025-10-11 10:16:04.585 | 2025-10-11 10:09:44.840 |      |
Query OK, 2 row(s) in set (0.001758s)

taos> create dnode "db3:6030";
Create OK, 0 row(s) affected (0.006202s)

taos> show dnodes;
 id | endpoint | vnodes | support_vnodes | status | create_time             | reboot_time             | note |
==============================================================================================================
  1 | db1:6030 |      0 |             69 | ready  | 2025-10-11 10:08:21.368 | 2025-10-11 10:08:21.318 |      |
  2 | db2:6030 |      0 |             69 | ready  | 2025-10-11 10:16:04.585 | 2025-10-11 10:09:44.840 |      |
  3 | db3:6030 |      0 |             69 | ready  | 2025-10-11 10:16:12.039 | 2025-10-11 10:08:03.301 |      |
Query OK, 3 row(s) in set (0.001927s)

All three dnodes are now shown as online.
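The interactive session above can also be scripted. A minimal sketch, assuming the tdengine-db1 service name from the compose file; the docker exec line is left commented so the loop itself has no side effects:

```shell
# Build the dnode-registration SQL for each new endpoint.
new_dnodes="db2:6030 db3:6030"
for ep in $new_dnodes; do
  sql="create dnode \"$ep\""
  echo "$sql"
  # Run it inside the db1 container (service name from the stack above):
  # docker exec "$(docker ps -q --filter name=tdengine-db1 | head -n 1)" taos -s "$sql"
done
```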
6.3 Testing cluster access
1. Send requests through the Nginx proxy
# Query TDengine through the Nginx proxy on db1's port 6041
# (auth shown: root with an empty password, Base64-encoded as cm9vdDo=)
curl -X POST \
  http://192.168.0.1:6041/rest/sql/meter_db \
  -H 'Authorization: Basic cm9vdDo=' \
  -H 'Content-Type: application/json' \
  -d '{"sql": "select * from meter_data limit 2"}'
- Success criteria: the query returns results, and the Nginx log shows the request being forwarded to the TDengine side (e.g. the tdengine-adapter:6041 upstream configured above).
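The Authorization header is just `Basic base64(user:password)`. A small sketch for computing it; root/taosdata is TDengine's stock default, so substitute your own credentials:

```shell
# Compute the HTTP Basic auth token for TDengine's REST API.
# printf (not echo) avoids a trailing newline sneaking into the encoding.
TOKEN=$(printf 'root:taosdata' | base64)
echo "Authorization: Basic $TOKEN"
# root with an empty password would be: printf 'root:' | base64  ->  cm9vdDo=
```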
2. Access the Explorer web UI
Clients can open port 6060 on any Swarm node (Swarm's built-in load balancing makes db1/db2/db3 all work). Reference:
https://docs.taosdata.com/3.3.6/reference/components/explorer/
A user account must be registered on first use.
3. Access with tools such as DBeaver
6.4 Testing high availability (simulated failure)
- Stop one TDengine container (e.g. db1's; the filter matches the stack's service name):
docker stop $(docker ps -q --filter name=tdengine-stack_tdengine-db1)
- Swarm restarts the container automatically (within about 10 seconds); watch the progress with
docker stack ps tdengine-stack
- Stop one Nginx replica (one of the nginx-proxy containers):
docker stop $(docker ps -q --filter name=tdengine-stack_nginx-proxy | head -n 1)
- Swarm starts a replacement Nginx replica on another node; client access is unaffected.
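The failover checks above can be automated with a small polling helper that retries a probe until it succeeds or a timeout expires (the commented curl target reuses the REST endpoint from the test in 6.3):

```shell
# wait_for <attempts> <command...>: retry a probe once per second.
wait_for() {
  attempts=$1; shift
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    i=$((i + 1))
    sleep 1
  done
  return 1
}
# Example: fail the HA test if the proxy is not back within 30 seconds:
# wait_for 30 curl -sf http://192.168.0.1:6041/-/ping || echo "proxy did not recover"
```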
V. Key Considerations
- Swarm Manager high availability (mandatory in production)
- A single Manager node is a single point of failure (if it goes down, the cluster can no longer be managed). Extend to 3 Manager nodes (db1, db2 and db3 all Managers):
# Generate the Manager join token on the existing Manager (db1)
docker swarm join-token manager
# Run the printed command on db2 and db3 to join them as Managers
docker swarm join --token SWMTKN-1-xxxxxx-xxxxxx 192.168.0.1:2377
- TDengine data safety
- Backups: because of how TDengine stores data, a copy of the mapped data directories is not directly usable as a backup. To guard against data loss, back up manually with the bundled taosdump tool: https://docs.taosdata.com/3.3.6/reference/tools/taosdump/;
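A hedged taosdump sketch (flag names per the taosdump docs linked above; meter_db is a placeholder database name, and taosdump ships inside the TDengine server image):

```shell
# Prepare a dated backup directory on the host.
BACKUP_DIR="${BACKUP_DIR:-/tmp/td-backup/$(date +%Y%m%d)}"
mkdir -p "$BACKUP_DIR"
echo "backup dir: $BACKUP_DIR"
# Dump one database (run where taosdump can reach the cluster, e.g. inside a TDengine container):
#   taosdump -h db1 -u root -p taosdata -D meter_db -o "$BACKUP_DIR"
# Restore from the same directory:
#   taosdump -h db1 -u root -p taosdata -i "$BACKUP_DIR"
```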
- Nginx configuration updates
- To update the Nginx configuration in Swarm, edit the config file first, then restart the Nginx service:
# Restart the Nginx service (Swarm rolls the replicas; no downtime)
docker service update --force tdengine-stack_nginx-proxy
- Performance tuning
- Swarm networking: overlay networks perform slightly worse than host networking. If TDengine nodes exchange a lot of traffic (e.g. data replication), the TDengine services can be attached to the host network instead; note that network_mode: host is ignored by docker stack deploy, so declare an external network named host for those services, and make sure host ports do not collide;
- Nginx replica count: scale the Nginx replicas with the load (e.g. 3 for high-concurrency scenarios):
docker service scale tdengine-stack_nginx-proxy=3
- Version upgrades
- Upgrade TDengine with zero downtime via Swarm rolling updates:
# Update the TDengine image (e.g. upgrade to 3.3.7)
docker service update --image tdengine/tdengine:3.3.7.0 tdengine-stack_tdengine-db1
docker service update --image tdengine/tdengine:3.3.7.0 tdengine-stack_tdengine-db2
docker service update --image tdengine/tdengine:3.3.7.0 tdengine-stack_tdengine-db3
VI. Monitoring
TDengine can be integrated with Prometheus, Grafana, and similar tools. References:
https://docs.taosdata.com/operation/monitor/
https://docs.taosdata.com/3.3.6/third-party/visual/grafana/