Official documentation

Jaegertracing

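Introduction to Jaeger

Jaeger is an open-source, end-to-end distributed tracing system for monitoring and troubleshooting transactions in complex distributed systems. The figure below compares the common open-source full-link tracing solutions. SkyWalking and Pinpoint are currently the more widely used, but Jaeger's client libraries cover more languages, C++ in particular, so I chose Jaeger for this test.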

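Problems Jaeger addresses

  • Distributed transaction monitoring
  • Performance and latency optimization
  • Root cause analysis
  • Service dependency analysis
  • Distributed context propagation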

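Jaeger architecture diagram

Jaeger components

  • Jaeger Agent talks to the clients and reports the trace data it receives to the Jaeger Collector
  • Jaeger Collector writes the collected data to a database or another storage backend
  • Jaeger Query serves queries against the trace data
  • Jaeger Ingester is a service that reads from a Kafka topic and writes to another storage backend (Cassandra, Elasticsearch)
  • Jaeger UI handles user interaction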

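Jaeger ports

Agent
  • 5775 (UDP): accepts zipkin.thrift over the compact Thrift protocol
  • 6831 (UDP): accepts jaeger.thrift over the compact Thrift protocol
  • 6832 (UDP): accepts jaeger.thrift over the binary Thrift protocol
  • 5778 (HTTP): serves configuration such as sampling strategies

Collector
  • 14267 (TCP): receives jaeger.thrift spans sent by the agent
  • 14250 (TCP): receives model.proto spans sent by the agent over gRPC
  • 14268 (HTTP): accepts spans directly from clients
  • 14269 (HTTP): health check

Query
  • 16686 (HTTP): the Jaeger front end exposed to users
  • 16687 (HTTP): health check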
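Before pointing clients at a deployment, it can help to confirm that the TCP ports listed above are reachable from the client machine. The following is a minimal sketch using only the Python standard library; the helper names `tcp_port_open` and `check_ports` are made up for this post, and only a subset of the TCP ports is included. The UDP agent ports are left out because a connectionless port cannot be probed with a plain connect.

```python
from __future__ import annotations

import socket


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# A subset of the TCP ports from the lists above (illustrative names).
JAEGER_TCP_PORTS = {
    "collector gRPC": 14250,
    "collector HTTP": 14268,
    "collector admin": 14269,
    "query UI": 16686,
    "query admin": 16687,
}


def check_ports(host: str, ports: dict[str, int] = JAEGER_TCP_PORTS) -> dict[str, bool]:
    """Probe every named port on host and map each name to its reachability."""
    return {name: tcp_port_open(host, port) for name, port in ports.items()}
```

Calling `check_ports("127.0.0.1")` after a `kubectl port-forward`, or with the address of whatever exposes the services in your cluster, returns a dictionary of port names to booleans.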

Jaeger deployment

1. Create the namespace

[root@VM-0-123-centos jaeger]# kubectl create namespace jaeger 

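2. Deploy the Jaeger Operator

The Jaeger Operator for Kubernetes simplifies deploying and running Jaeger on Kubernetes. It is an implementation of a Kubernetes operator. An operator is a piece of software that eases the operational complexity of running another piece of software; more technically, operators are a method of packaging, deploying, and managing a Kubernetes application. Each Jaeger Operator version tracks one version of the Jaeger components (Query, Collector, Agent). When a new version of the Jaeger components is released, a new version of the operator is released that knows how to upgrade running instances of the previous version to the new one.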

[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/crds/jaegertracing.io_jaegers_crd.yaml 
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/service_account.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/role_binding.yaml
[root@VM-0-123-centos jaeger]# kubectl create -n jaeger -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/master/deploy/operator.yaml

Check the status

[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME                                         READY   STATUS        RESTARTS   AGE
pod/jaeger-operator-6ff67bdd4b-4nffk         1/1     Running       0          14d
pod/simple-prod-collector-59fc47bf5c-h26mq   0/1     Terminating   0          9d

NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/jaeger-operator-metrics   ClusterIP   172.20.253.138   <none>        8383/TCP,8686/TCP   14d

NAME                              READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger-operator   1/1     1            1           14d

NAME                                         DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-operator-6ff67bdd4b   1         1         1       14d

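3. Create the Jaeger instance

Create a jaeger.yaml file that configures the Elasticsearch cluster and limits the CPU and memory available to the containers of Deployment/simple-prod-collector. The collector can scale out to at most 10 pods.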

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-prod
spec:
  strategy: production
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://10.0.16.3:9200
        index-prefix: zhjt
  collector:
    maxReplicas: 10
    resources:
      limits:
        cpu: 500m
        memory: 512Mi

Apply the manifest:

[root@VM-0-123-centos jaeger]# kubectl apply -f  jaeger.yaml  -n jaeger
jaeger.jaegertracing.io/simple-prod created
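If the same manifest has to be produced for several environments, it can be generated instead of edited by hand. The sketch below builds a dictionary with the same fields as the jaeger.yaml above and prints it as JSON, which kubectl accepts in place of YAML. The function name `jaeger_production_manifest` and its parameters are illustrative and not part of any Jaeger tooling.

```python
import json


def jaeger_production_manifest(name, es_url, index_prefix, max_replicas=10,
                               cpu_limit="500m", memory_limit="512Mi"):
    """Build a Jaeger custom resource with the same fields as jaeger.yaml."""
    return {
        "apiVersion": "jaegertracing.io/v1",
        "kind": "Jaeger",
        "metadata": {"name": name},
        "spec": {
            "strategy": "production",
            "storage": {
                "type": "elasticsearch",
                "options": {
                    "es": {"server-urls": es_url, "index-prefix": index_prefix},
                },
            },
            "collector": {
                # Upper bound for the collector's horizontal autoscaling.
                "maxReplicas": max_replicas,
                "resources": {
                    "limits": {"cpu": cpu_limit, "memory": memory_limit},
                },
            },
        },
    }


# Values taken from the jaeger.yaml shown above.
manifest = jaeger_production_manifest("simple-prod", "http://10.0.16.3:9200", "zhjt")
print(json.dumps(manifest, indent=2))
```

Redirecting the output to a file such as jaeger.json and running kubectl apply -f jaeger.json -n jaeger should have the same effect as applying the YAML version.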

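List the Jaeger objects

Note: with the all-in-one example from the official site the status appears to be Running as expected. Here the status is Failed, but that does not affect usage.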

[root@VM-0-123-centos jaeger]# kubectl get jaegers -n jaeger
NAME          STATUS   VERSION   STRATEGY     STORAGE         AGE
simple-prod   Failed   1.22.0    production   elasticsearch   9d

Get the pod names

[root@VM-0-123-centos jaeger]# kubectl get pods -l app.kubernetes.io/instance=simple-prod -n jaeger
NAME                                     READY   STATUS    RESTARTS   AGE
simple-prod-collector-59fc47bf5c-h26mq   1/1     Running   0          9d
simple-prod-query-85689b7bbd-g5jw9       2/2     Running   0          9d

Get the pod logs

[root@VM-0-123-centos jaeger]# kubectl  logs simple-prod-query-85689b7bbd-g5jw9 jaeger-agent  -n jaeger
2021/04/28 04:55:34 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585734.2081811,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585734.2082183,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585734.2083232,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585734.2083883,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":14271"}
{"level":"info","ts":1619585734.2084124,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:14271","health-status":"unavailable"}
{"level":"info","ts":1619585734.2089527,"caller":"grpc/builder.go:70","msg":"Agent requested insecure grpc connection to collector(s)"}
{"level":"info","ts":1619585734.2089992,"caller":"grpc@v1.29.1/clientconn.go:243","msg":"parsed scheme: \"dns\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.21038,"caller":"command-line-arguments/main.go:84","msg":"Starting agent"}
{"level":"info","ts":1619585734.2104166,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585734.2108943,"caller":"grpc/builder.go:108","msg":"Checking connection to collector"}
{"level":"info","ts":1619585734.210908,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"IDLE"}
{"level":"info","ts":1619585734.211061,"caller":"app/agent.go:69","msg":"Starting jaeger-agent HTTP server","http-port":5778}
{"level":"info","ts":1619585734.3344934,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.88:14250 <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345578,"caller":"grpc@v1.29.1/clientconn.go:667","msg":"ClientConn switching balancer to \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3345697,"caller":"grpc@v1.29.1/clientconn.go:682","msg":"Channel switches to new LB policy \"round_robin\"","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3346283,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.33467,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.334736,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3347983,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619585734.335669,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357751,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0002f5ea0:{{172.20.0.88:14250 <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.3357947,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619585734.335807,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}
{"level":"info","ts":1619592172.4516647,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517512,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.88:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517596,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517772,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4517884,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"warn","ts":1619592172.4523218,"caller":"grpc@v1.29.1/clientconn.go:1275","msg":"grpc: addrConn.createTransport failed to connect to {172.20.0.88:14250 <nil> 0 <nil>}. Err: connection error: desc = \"transport: Error while dialing dial tcp 172.20.0.88:14250: connect: connection refused\". Reconnecting...","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523551,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.452386,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to TRANSIENT_FAILURE","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.4523947,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"TRANSIENT_FAILURE"}
{"level":"info","ts":1619592172.6118224,"caller":"grpc@v1.29.1/resolver_conn_wrapper.go:143","msg":"ccResolverWrapper: sending update to cc: {[{172.20.0.178:14250 <nil> 0 <nil>}] <nil> <nil>}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118581,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6118758,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to SHUTDOWN","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.611892,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to CONNECTING","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6119003,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"CONNECTING"}
{"level":"info","ts":1619592172.6119049,"caller":"grpc@v1.29.1/clientconn.go:1193","msg":"Subchannel picks a new address \"172.20.0.178:14250\" to connect","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.612726,"caller":"grpc@v1.29.1/clientconn.go:1056","msg":"Subchannel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127572,"caller":"base/balancer.go:200","msg":"roundrobinPicker: newPicker called with info: {map[0xc0003df970:{{172.20.0.178:14250 <nil> 0 <nil>}}]}","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127682,"caller":"grpc@v1.29.1/clientconn.go:417","msg":"Channel Connectivity change to READY","system":"grpc","grpc_log":true}
{"level":"info","ts":1619592172.6127849,"caller":"grpc/builder.go:119","msg":"Agent collector connection state change","dialTarget":"dns:///simple-prod-collector-headless.jaeger.svc:14250","status":"READY"}

Logs of the jaeger-query container in the same pod:

[root@VM-0-123-centos jaeger]# kubectl  logs simple-prod-query-85689b7bbd-g5jw9 jaeger-query   -n jaeger
2021/04/28 04:55:29 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined
{"level":"info","ts":1619585729.8951077,"caller":"flags/service.go:117","msg":"Mounting metrics handler on admin server","route":"/metrics"}
{"level":"info","ts":1619585729.8951416,"caller":"flags/service.go:123","msg":"Mounting expvar handler on admin server","route":"/debug/vars"}
{"level":"info","ts":1619585729.8952546,"caller":"flags/admin.go:105","msg":"Mounting health check on admin server","route":"/"}
{"level":"info","ts":1619585729.8953054,"caller":"flags/admin.go:111","msg":"Starting admin HTTP server","http-addr":":16687"}
{"level":"info","ts":1619585729.8953238,"caller":"flags/admin.go:97","msg":"Admin server started","http.host-port":"[::]:16687","health-status":"unavailable"}
{"level":"info","ts":1619585729.9169888,"caller":"config/config.go:183","msg":"Elasticsearch detected","version":7}
{"level":"info","ts":1619585729.9174955,"caller":"app/static_handler.go:181","msg":"UI config path not provided, config file will not be watched"}
{"level":"info","ts":1619585729.9175768,"caller":"app/server.go:170","msg":"Query server started"}
{"level":"info","ts":1619585729.9175944,"caller":"healthcheck/handler.go:128","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1619585729.9176183,"caller":"app/server.go:249","msg":"Starting GRPC server","port":16685,"addr":":16685"}
{"level":"info","ts":1619585729.9176335,"caller":"app/server.go:230","msg":"Starting HTTP server","port":16686,"addr":":16686"}
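The agent log above is mostly gRPC noise; the lines that matter are the "Agent collector connection state change" entries, which show the agent going from READY to TRANSIENT_FAILURE and back when the collector address changed. A minimal sketch for pulling just those transitions out of the JSON log lines follows. The sample lines are abridged from the log above (the caller and dialTarget fields are dropped), and the function name `connection_states` is made up for this post.

```python
import json

STATE_MSG = "Agent collector connection state change"


def connection_states(log_lines):
    """Yield (timestamp, status) for each agent-to-collector state change."""
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip non-JSON lines such as the maxprocs banner
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed lines
        if entry.get("msg") == STATE_MSG:
            yield entry["ts"], entry["status"]


# Abridged sample taken from the jaeger-agent log above.
sample = [
    '2021/04/28 04:55:34 maxprocs: Leaving GOMAXPROCS=4: CPU quota undefined',
    '{"level":"info","ts":1619585734.21038,"msg":"Starting agent"}',
    '{"level":"info","ts":1619585734.210908,"msg":"Agent collector connection state change","status":"IDLE"}',
    '{"level":"info","ts":1619585734.3347983,"msg":"Agent collector connection state change","status":"CONNECTING"}',
    '{"level":"info","ts":1619585734.335807,"msg":"Agent collector connection state change","status":"READY"}',
    '{"level":"info","ts":1619592172.4523947,"msg":"Agent collector connection state change","status":"TRANSIENT_FAILURE"}',
]

states = [status for _, status in connection_states(sample)]
print(states)
```

To run it against a live pod, pass `sys.stdin` instead of `sample` and pipe the output of the kubectl logs command into the script.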

4. View the Jaeger resources

[root@VM-0-123-centos jaeger]# kubectl get all -n jaeger
NAME                                         READY   STATUS    RESTARTS   AGE
pod/jaeger-operator-6ff67bdd4b-4nffk         1/1     Running   0          14d
pod/simple-prod-collector-59fc47bf5c-h26mq   1/1     Running   0          8d
pod/simple-prod-query-85689b7bbd-g5jw9       2/2     Running   0          8d

NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                                  AGE
service/jaeger-operator-metrics          ClusterIP   172.20.253.138   <none>        8383/TCP,8686/TCP                        14d
service/simple-prod-collector            ClusterIP   172.20.255.184   <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   8d
service/simple-prod-collector-headless   ClusterIP   None             <none>        9411/TCP,14250/TCP,14267/TCP,14268/TCP   8d
service/simple-prod-query                ClusterIP   172.20.254.102   <none>        16686/TCP                                8d

NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/jaeger-operator         1/1     1            1           14d
deployment.apps/simple-prod-collector   1/1     1            1           8d
deployment.apps/simple-prod-query       1/1     1            1           8d

NAME                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/jaeger-operator-6ff67bdd4b         1         1         1       14d
replicaset.apps/simple-prod-collector-59fc47bf5c   1         1         1       8d
replicaset.apps/simple-prod-query-85689b7bbd       1         1         1       8d

NAME                                                        REFERENCE                          TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
horizontalpodautoscaler.autoscaling/simple-prod-collector   Deployment/simple-prod-collector   1457m/90, 137m/90   1         10        1          8d

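If traffic is heavy and the load on Elasticsearch needs to be reduced, a Kafka cluster can be placed in front of it by changing jaeger.yaml as follows: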

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simple-streaming
spec:
  strategy: streaming
  collector:
    options:
      kafka:
        producer:
          topic: jaeger-spans
          brokers: my-cluster-kafka-brokers.kafka:9092 # change to your Kafka address
  ingester:
    options:
      kafka:
        consumer:
          topic: jaeger-spans
          brokers: my-cluster-kafka-brokers.kafka:9092 # change to your Kafka address
      ingester:
        deadlockInterval: 5s
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200 # change to your Elasticsearch address
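In the streaming strategy the collector writes spans to a Kafka topic and the ingester reads them back, so the producer and consumer settings have to agree. A small check like the one below can catch a mismatch before the manifest is applied. It is only a sketch: the manifest is written as a Python dictionary mirroring the YAML above, and the helper name `streaming_kafka_mismatches` is made up for this post.

```python
def streaming_kafka_mismatches(manifest):
    """Return the Kafka keys whose producer and consumer values differ."""
    spec = manifest["spec"]
    producer = spec["collector"]["options"]["kafka"]["producer"]
    consumer = spec["ingester"]["options"]["kafka"]["consumer"]
    # The collector must write to the topic and brokers the ingester reads from.
    return [key for key in ("topic", "brokers") if producer.get(key) != consumer.get(key)]


# Dictionary form of the relevant part of the streaming manifest above.
streaming_manifest = {
    "spec": {
        "strategy": "streaming",
        "collector": {"options": {"kafka": {"producer": {
            "topic": "jaeger-spans",
            "brokers": "my-cluster-kafka-brokers.kafka:9092",
        }}}},
        "ingester": {"options": {"kafka": {"consumer": {
            "topic": "jaeger-spans",
            "brokers": "my-cluster-kafka-brokers.kafka:9092",
        }}}},
    }
}

mismatches = streaming_kafka_mismatches(streaming_manifest)
print("mismatched keys:", mismatches)
```

An empty list means the collector and the ingester point at the same topic and brokers.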

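5. Deploy the agent

The agent is a proxy for the Jaeger client: the client sends the trace data it collects to the agent, and the agent forwards it to the collector. Because the client talks to the agent over UDP, the agent is usually deployed close to the client.

The agent can be installed in several ways.

1) Docker installation

Download: jaegertracing/jaeger-agent Tags (docker.com)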

docker run -d -p 6831:6831/udp -p 6832:6832/udp -p 5778:5778/tcp jaegertracing/jaeger-agent:1.12 --reporter.grpc.host-port=xx.xx.xx.xx:14250

2) Kubernetes installation, which comes in two variants

Sidecar mode

DaemonSet mode

Reference: Operator for Kubernetes (Jaeger documentation, jaegertracing.io)

3) Binary installation

Download: Jaeger – Download Jaeger (jaegertracing.io)

nohup ./jaeger-agent --collector.host-port=xxxx:14267 1>1.log 2>2.log &