Preface

This is a very common cluster operations scenario, written up here to share.
This post walks through how to recover a failed master node in a highly available K8s cluster.
My understanding is limited, so corrections are welcome.

There is no need to dwell too much on the present or to worry too much about the future. Once you have been through certain things, the scenery in front of you is no longer what it used to be. (Haruki Murakami)

What went wrong

While running an experiment today, I found that both etcd and the apiserver on one of the cluster's master nodes had gone down.

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get nodes
    NAME                          STATUS   ROLES           AGE    VERSION
    vms100.liruilongs.github.io   Ready    control-plane   415d   v1.25.1
    vms101.liruilongs.github.io   Ready    control-plane   415d   v1.25.1
    vms102.liruilongs.github.io   Ready    control-plane   415d   v1.25.1
    vms103.liruilongs.github.io   Ready    <none>          415d   v1.25.1
    vms105.liruilongs.github.io   Ready    <none>          415d   v1.25.1
    vms106.liruilongs.github.io   Ready    <none>          415d   v1.25.1
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$

The affected components are the apiserver and etcd on node vms100.liruilongs.github.io:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep apiserver
    kube-system   kube-apiserver-vms100.liruilongs.github.io   0/1   CrashLoopBackOff   1448 (3m23s ago)   415d   192.168.26.100   vms100.liruilongs.github.io   <none>   <none>
    kube-system   kube-apiserver-vms101.liruilongs.github.io   1/1   Running            272 (3h18m ago)    415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   kube-apiserver-vms102.liruilongs.github.io   1/1   Running            246 (3h18m ago)    415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep etcd
    kube-system   etcd-vms100.liruilongs.github.io   0/1   CrashLoopBackOff   1244 (3m6s ago)   415d   192.168.26.100   vms100.liruilongs.github.io   <none>   <none>
    kube-system   etcd-vms101.liruilongs.github.io   1/1   Running            167 (3h18m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   etcd-vms102.liruilongs.github.io   1/1   Running            173 (3h18m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>

The static Pods for keepalived are running normally:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep keep
    kube-system   keepalived-vms100.liruilongs.github.io   1/1   Running   63 (3h50m ago)   415d   192.168.26.100   vms100.liruilongs.github.io   <none>   <none>
    kube-system   keepalived-vms101.liruilongs.github.io   1/1   Running   54 (3h51m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   keepalived-vms102.liruilongs.github.io   1/1   Running   60 (3h51m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$

So etcd most likely went down because its data fell out of sync, or for some similar reason. In this setup, the apiserver on each master node talks only to the etcd instance on its own node (write requests received by any etcd member are forwarded to the etcd leader), so once the local etcd is down the apiserver cannot serve requests and goes down as well.

etcdctl shows that etcd on vms100.liruilongs.github.io is completely dead:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        member list -w table
    Error: dial tcp 127.0.0.1:2379: connect: connection refused

How to investigate

Here we switch to another etcd node and run the command there to list the etcd cluster members.

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$ssh vms101.liruilongs.github.io
    Last login: Sat Mar  2 09:52:01 2024 from 192.168.26.100
    ┌──[root@vms101.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        member list -w table
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    |        ID        | STATUS  |            NAME             |         PEER ADDRS          |        CLIENT ADDRS         |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    |  ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
    | 70059e836d19883d | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
    | b8cb9f66c2e63b91 | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+

Check the status of each endpoint:

    ┌──[root@vms101.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        endpoint status --cluster -w table
    Failed to get the status of endpoint https://192.168.26.100:2379 (context deadline exceeded)
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    | https://192.168.26.101:2379 | 70059e836d19883d |   3.5.4 |   88 MB |     false |       603 |   22208417 |
    | https://192.168.26.102:2379 | b8cb9f66c2e63b91 |   3.5.4 |   88 MB |      true |       603 |   22208417 |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+

Confirm that the etcd member has failed:

    ┌──[root@vms101.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
        --cert=/etc/kubernetes/pki/etcd/server.crt \
        --key=/etc/kubernetes/pki/etcd/server.key \
        --cacert=/etc/kubernetes/pki/etcd/ca.crt \
        endpoint health --cluster -w table
    https://192.168.26.101:2379 is healthy: successfully committed proposal: took = 3.753357ms
    https://192.168.26.102:2379 is healthy: successfully committed proposal: took = 2.989943ms
    https://192.168.26.100:2379 is unhealthy: failed to connect: dial tcp 192.168.26.100:2379: connect: connection refused
    Error: unhealthy cluster

Look at the etcd container logs on the failed node:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$docker ps -a | grep etcd
    0f2f98ebf8c3   a8a176a5d5d6                                        "etcd --advertise-cl…"   4 minutes ago   Exited (2) 4 minutes ago   k8s_etcd_etcd-vms100.liruilongs.github.io_kube-system_e8c17bb99f9bd8119cdd769556041e18_1252
    a4b39d16a753   registry.aliyuncs.com/google_containers/pause:3.8   "/pause"                 4 hours ago     Up 4 hours                 k8s_POD_etcd-vms100.liruilongs.github.io_kube-system_e8c17bb99f9bd8119cdd769556041e18_54
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$docker logs 0f2f98ebf8c3
    {"level":"info","ts":"2024-03-16T14:46:54.644Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--advertise-client-urls=https://192.168.26.100:2379","--cert-file=/etc/kubernetes/pki/etcd/server.crt","--client-cert-auth=true","--data-dir=/var/lib/etcd","--experimental-initial-corrupt-check=true","--experimental-watch-progress-notify-interval=5s","--initial-advertise-peer-urls=https://192.168.26.100:2380","--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380","--key-file=/etc/kubernetes/pki/etcd/server.key","--listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379","--listen-metrics-urls=http://127.0.0.1:2381","--listen-peer-urls=https://192.168.26.100:2380","--name=vms100.liruilongs.github.io","--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt","--peer-client-cert-auth=true","--peer-key-file=/etc/kubernetes/pki/etcd/peer.key","--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt","--snapshot-count=10000","--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"]}
    {"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/etcd","dir-type":"member"}
    {"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["https://192.168.26.100:2380"]}
    {"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, client-cert=, client-key=, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
    {"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":["https://127.0.0.1:2379","https://192.168.26.100:2379"]}
    {"level":"info","ts":"2024-03-16T14:46:54.645Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"08407ff76","go-version":"go1.16.15","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"vms100.liruilongs.github.io","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://192.168.26.100:2380"],"listen-peer-urls":["https://192.168.26.100:2380"],"advertise-client-urls":["https://192.168.26.100:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://192.168.26.100:2379"],"listen-metrics-urls":["http://127.0.0.1:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
    panic: freepages: failed to get all reachable pages (page 7744: multiple references)

    goroutine 109 [running]:
    go.etcd.io/bbolt.(*DB).freepages.func2(0xc00009c480)
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1056 +0xe9
    created by go.etcd.io/bbolt.(*DB).freepages
        /go/pkg/mod/go.etcd.io/bbolt@v1.3.6/db.go:1054 +0x1cd
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$

The panic is raised by bbolt, the storage engine etcd uses for its database file, which suggests that the local data on this member is corrupted.

How to fix it

The quickest fix is to resynchronize this node's data: remove the failed member from the cluster, clean out its old data, and add it back. The steps are:

1. Clean up the data directory: move the static Pod YAML files away to stop the services on the failed node, then delete the etcd data directory.
2. Remove the failed member: use the member remove command to drop the broken member. This can be run on a healthy node.
3. Add the member: use the member add command to add the failed node back.
4. Restart: move the YAML files back on the failed node so that the Pods start again.

Note: static Pods are created by kubelet from the YAML files in a designated directory, which kubelet scans periodically. Deleting or moving a YAML file out of that directory stops the corresponding static Pod automatically; likewise, adding a YAML file creates the static Pod automatically.

Move the static Pod YAML files:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$mv /etc/kubernetes/manifests/{etcd.yaml,kube-apiserver.yaml} /tmp/

Delete the etcd data directory:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$rm -rf /var/lib/etcd/*

Confirm that etcd and the apiserver on the node have both stopped:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep apiserver
    kube-system   kube-apiserver-vms101.liruilongs.github.io   1/1   Running   272 (4h15m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   kube-apiserver-vms102.liruilongs.github.io   1/1   Running   246 (4h15m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep etcd
    kube-system   etcd-vms101.liruilongs.github.io   1/1   Running   167 (4h15m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   etcd-vms102.liruilongs.github.io   1/1   Running   173 (4h15m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$

Get the ID of the failed member. The following operations are run on a healthy etcd node; alternatively, point --endpoints at a healthy member.

    ┌──[root@vms101.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://192.168.26.101:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list -w table
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    |        ID        | STATUS  |            NAME             |         PEER ADDRS          |        CLIENT ADDRS         |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    |  ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
    | 70059e836d19883d | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
    | b8cb9f66c2e63b91 | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+

Remove the failed member:

    ┌──[root@vms101.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member remove ee392e5273e89e2
    Member ee392e5273e89e2 removed from cluster 4816f346663d82a7

Add it back:

    ┌──[root@vms101.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member add vms100.liruilongs.github.io --peer-urls=https://192.168.26.100:2380
    Member 456f71fdc1ad9917 added to cluster 4816f346663d82a7

    ETCD_NAME="vms100.liruilongs.github.io"
    ETCD_INITIAL_CLUSTER="vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380"
    ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.26.100:2380"
    ETCD_INITIAL_CLUSTER_STATE="existing"

Go back to the vms100 machine and move the YAML files back to restore the node:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$mv /tmp/{etcd.yaml,kube-apiserver.yaml} /etc/kubernetes/manifests/

Confirm the Pod status:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep etcd
    kube-system   etcd-vms100.liruilongs.github.io   1/1   Running   0                 16s    192.168.26.100   vms100.liruilongs.github.io   <none>   <none>
    kube-system   etcd-vms101.liruilongs.github.io   1/1   Running   167 (4h32m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   etcd-vms102.liruilongs.github.io   1/1   Running   173 (4h32m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep apiserver
    kube-system   kube-apiserver-vms100.liruilongs.github.io   1/1   Running   0                 24s    192.168.26.100   vms100.liruilongs.github.io   <none>   <none>
    kube-system   kube-apiserver-vms101.liruilongs.github.io   1/1   Running   272 (4h32m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   kube-apiserver-vms102.liruilongs.github.io   1/1   Running   246 (4h32m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$

Check the etcd cluster status:

    ┌──[root@vms101.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
        --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list -w table
    +------------------+-----------+-----------------------------+-----------------------------+-----------------------------+
    |        ID        |  STATUS   |            NAME             |         PEER ADDRS          |        CLIENT ADDRS         |
    +------------------+-----------+-----------------------------+-----------------------------+-----------------------------+
    | 54952f3b494c0286 | unstarted |                             | https://192.168.26.100:2380 |                             |
    | 70059e836d19883d |   started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
    | b8cb9f66c2e63b91 |   started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
    +------------------+-----------+-----------------------------+-----------------------------+-----------------------------+

Here we find that the newly added member is not healthy: it stays in the unstarted state. Running etcdctl on the failed node shows that it never joined the cluster and is instead running as a standalone single-node cluster.

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list -w table
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    |        ID        | STATUS  |            NAME             |         PEER ADDRS          |        CLIENT ADDRS         |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    |  ee392e5273e89e2 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status --cluster -w table
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    | https://192.168.26.100:2379 |  ee392e5273e89e2 |   3.5.4 |  815 kB |      true |         2 |       2261 |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$

It has not synchronized the current cluster's data either:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide --server=https://vms100.liruilongs.github.io:6443
    No resources found

In most cases this is caused by a problem in one node's etcd configuration file. In my case, the etcd manifest on the failed node contained no information about the rest of the cluster, so the cluster-related settings have to be written into it.

The original configuration file:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$cat /etc/kubernetes/manifests/etcd.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.100:2379
      creationTimestamp: null
      labels:
        component: etcd
        tier: control-plane
      name: etcd
      namespace: kube-system
    spec:
      containers:
      - command:
        - etcd
        - --advertise-client-urls=https://192.168.26.100:2379
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --client-cert-auth=true
        - --data-dir=/var/lib/etcd
        - --experimental-initial-corrupt-check=true
        - --experimental-watch-progress-notify-interval=5s
        - --initial-advertise-peer-urls=https://192.168.26.100:2380
        - --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379
        - --listen-metrics-urls=http://127.0.0.1:2381
        - --listen-peer-urls=https://192.168.26.100:2380
        - --name=vms100.liruilongs.github.io
        - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
        - --peer-client-cert-auth=true
        - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --snapshot-count=10000
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        image: registry.aliyuncs.com/google_containers/etcd:3.5.4-0
    ......

The configuration file after adding the missing cluster information:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$cat /etc/kubernetes/manifests/etcd.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.100:2379
      creationTimestamp: null
      labels:
        component: etcd
        tier: control-plane
      name: etcd
      namespace: kube-system
    spec:
      containers:
      - command:
        - etcd
        - --advertise-client-urls=https://192.168.26.100:2379
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --client-cert-auth=true
        - --data-dir=/var/lib/etcd
        - --experimental-initial-corrupt-check=true
        - --experimental-watch-progress-notify-interval=5s
        - --initial-advertise-peer-urls=https://192.168.26.100:2380
        - --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380
        - --initial-cluster-state=existing
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379
        - --listen-metrics-urls=http://127.0.0.1:2381
        - --listen-peer-urls=https://192.168.26.100:2380
        - --name=vms100.liruilongs.github.io
        - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
        - --peer-client-cert-auth=true
        - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --snapshot-count=10000
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

We then run the recovery again in exactly the same way as above, and this time the node does not come up at all:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide | grep apiserver
    kube-system   kube-apiserver-vms100.liruilongs.github.io   0/1   CrashLoopBackOff   1 (18s ago)       39s    192.168.26.100   vms100.liruilongs.github.io   <none>   <none>
    kube-system   kube-apiserver-vms101.liruilongs.github.io   1/1   Running            272 (5h29m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   kube-apiserver-vms102.liruilongs.github.io   1/1   Running            246 (5h29m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl get pod -A -o wide |
        grep etcd
    kube-system   etcd-vms100.liruilongs.github.io   0/1   CrashLoopBackOff   3 (21s ago)       53s    192.168.26.100   vms100.liruilongs.github.io   <none>   <none>
    kube-system   etcd-vms101.liruilongs.github.io   1/1   Running            167 (5h29m ago)   415d   192.168.26.101   vms101.liruilongs.github.io   <none>   <none>
    kube-system   etcd-vms102.liruilongs.github.io   1/1   Running            173 (5h29m ago)   415d   192.168.26.102   vms102.liruilongs.github.io   <none>   <none>

Check the logs:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$kubectl logs etcd-vms100.liruilongs.github.io -n kube-system
    .............................
    {"level":"fatal","ts":"2024-03-16T16:25:19.981Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:4816f346663d82a7 Members:[{ID:b8cb9f66c2e63b91 RaftAttributes:{PeerURLs:[https://192.168.26.102:2380] IsLearner:false} Attributes:{Name:vms102.liruilongs.github.io ClientURLs:[https://192.168.26.102:2379]}} {ID:3fbbbed942c51f7b RaftAttributes:{PeerURLs:[https://192.168.26.100:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} {ID:70059e836d19883d RaftAttributes:{PeerURLs:[https://192.168.26.101:2380] IsLearner:false} Attributes:{Name:vms101.liruilongs.github.io ClientURLs:[https://192.168.26.101:2379]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}

The useful part of this log is "RemovedMemberIDs:[]}: member count is unequal": the number of members does not match. Looking at more of the log:

    {"level":"info","ts":"2024-03-16T16:25:19.961Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--advertise-client-urls=https://192.168.26.100:2379","--cert-file=/etc/kubernetes/pki/etcd/server.crt","--client-cert-auth=true","--data-dir=/var/lib/etcd","--experimental-initial-corrupt-check=true","--experimental-watch-progress-notify-interval=5s","--initial-advertise-peer-urls=https://192.168.26.100:2380","--initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380","--initial-cluster-state=existing","--key-file=/etc/kubernetes/pki/etcd/server.key","--listen-client-urls=https://127.0.0.1:2379,https://192.168.26.100:2379","--listen-metrics-urls=http://127.0.0.1:2381","--listen-peer-urls=https://192.168.26.100:2380","--name=vms100.liruilongs.github.io","--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt","--peer-client-cert-auth=true","--peer-key-file=/etc/kubernetes/pki/etcd/peer.key","--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt","--snapshot-count=10000","--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt"]}
    ..............................................................................
    {"level":"warn","ts":"2024-03-16T16:25:19.981Z","caller":"etcdmain/etcd.go:146","msg":"failed to start etcd","error":"error validating peerURLs {ClusterID:4816f346663d82a7 Members:[{ID:b8cb9f66c2e63b91 RaftAttributes:{PeerURLs:[https://192.168.26.102:2380] IsLearner:false} Attributes:{Name:vms102.liruilongs.github.io ClientURLs:[https://192.168.26.102:2379]}} {ID:3fbbbed942c51f7b RaftAttributes:{PeerURLs:[https://192.168.26.100:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} {ID:70059e836d19883d RaftAttributes:{PeerURLs:[https://192.168.26.101:2380] IsLearner:false} Attributes:{Name:vms101.liruilongs.github.io ClientURLs:[https://192.168.26.101:2379]}}] RemovedMemberIDs:[]}: member count is unequal"}
    {"level":"fatal","ts":"2024-03-16T16:25:19.981Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"error validating peerURLs {ClusterID:4816f346663d82a7 Members:[{ID:b8cb9f66c2e63b91 RaftAttributes:{PeerURLs:[https://192.168.26.102:2380] IsLearner:false} Attributes:{Name:vms102.liruilongs.github.io ClientURLs:[https://192.168.26.102:2379]}} {ID:3fbbbed942c51f7b RaftAttributes:{PeerURLs:[https://192.168.26.100:2380] IsLearner:false} Attributes:{Name: ClientURLs:[]}} {ID:70059e836d19883d RaftAttributes:{PeerURLs:[https://192.168.26.101:2380] IsLearner:false} Attributes:{Name:vms101.liruilongs.github.io ClientURLs:[https://192.168.26.101:2379]}}] RemovedMemberIDs:[]}: member count is unequal","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/etcdmain/main.go:40\nmain.main\n\t/go/src/go.etcd.io/etcd/release/etcd/server/main.go:32\nruntime.main\n\t/go/gos/go1.16.15/src/runtime/proc.go:225"}

The member list in the error contains three members (vms102, the unnamed new member for 192.168.26.100, and vms101), while the --initial-cluster value passed to this node lists only two, so the error is probably related to the vms102.liruilongs.github.io member.

Next, look at the configuration file on vms102.liruilongs.github.io:

    ┌──[root@vms102.liruilongs.github.io]-[~]
    └─$cat /etc/kubernetes/manifests/etcd.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      annotations:
        kubeadm.kubernetes.io/etcd.advertise-client-urls: https://192.168.26.102:2379
      creationTimestamp: null
      labels:
        component: etcd
        tier: control-plane
      name: etcd
      namespace: kube-system
    spec:
      containers:
      - command:
        - etcd
        - --advertise-client-urls=https://192.168.26.102:2379
        - --cert-file=/etc/kubernetes/pki/etcd/server.crt
        - --client-cert-auth=true
        - --data-dir=/var/lib/etcd
        - --experimental-initial-corrupt-check=true
        - --experimental-watch-progress-notify-interval=5s
        - --initial-advertise-peer-urls=https://192.168.26.102:2380
        - --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380
        - --initial-cluster-state=existing
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --listen-client-urls=https://127.0.0.1:2379,https://192.168.26.102:2379
        - --listen-metrics-urls=http://127.0.0.1:2381
        - --listen-peer-urls=https://192.168.26.102:2380
        - --name=vms102.liruilongs.github.io
        - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
        - --peer-client-cert-auth=true
        - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
        - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
        - --snapshot-count=10000
        - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt

Comparing the two configuration files shows that the configuration written on the failed node is still wrong: it is missing the entry vms102.liruilongs.github.io=https://192.168.26.102:2380.

On the failed node (vms100):

    --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380

On vms102:

    --initial-cluster=vms100.liruilongs.github.io=https://192.168.26.100:2380,vms102.liruilongs.github.io=https://192.168.26.102:2380,vms101.liruilongs.github.io=https://192.168.26.101:2380

After fixing the configuration, recover the node again by following the same procedure as above.

Node recovered

Check with etcdctl:

    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list -w table
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    |        ID        | STATUS  |            NAME             |         PEER ADDRS          |        CLIENT ADDRS         |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    | 70059e836d19883d | started | vms101.liruilongs.github.io | https://192.168.26.101:2380 | https://192.168.26.101:2379 |
    | ac5f6045dbe477b3 | started | vms100.liruilongs.github.io | https://192.168.26.100:2380 | https://192.168.26.100:2379 |
    | b8cb9f66c2e63b91 | started | vms102.liruilongs.github.io | https://192.168.26.102:2380 | https://192.168.26.102:2379 |
    +------------------+---------+-----------------------------+-----------------------------+-----------------------------+
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status --cluster -w table
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    | https://192.168.26.101:2379 | 70059e836d19883d |   3.5.4 |   88 MB |     false |       603 |   22227327 |
    | https://192.168.26.100:2379 | ac5f6045dbe477b3 |   3.5.4 |   88 MB |     false |       603 |   22227327 |
    | https://192.168.26.102:2379 | b8cb9f66c2e63b91 |   3.5.4 |   88 MB |      true |       603 |   22227327 |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    ┌──[root@vms100.liruilongs.github.io]-[~]
    └─$

The failed node has recovered. In practice, after adding the member back, we need to confirm that the configuration file on the failed node is actually correct before restarting it.

© 2018-2024 liruilonger@gmail.com, All rights reserved. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0)
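As a small aid for that final configuration check, here is a minimal sketch of how the expected --initial-cluster value could be derived from the member list of a healthy node. The function name build_initial_cluster is hypothetical and not part of etcd or of the original procedure. It assumes the default comma-separated output of etcdctl member list (ID, status, name, peer URL, client URL, learner flag), one peer URL per member, and that the only member with an empty name is the one that was just added.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original post):
# reads `etcdctl member list` output in the default "simple" format from
# stdin and prints a value suitable for etcd's --initial-cluster flag.
#
# Expected input fields, separated by a comma and optional spaces:
#   ID, STATUS, NAME, PEER ADDRS, CLIENT ADDRS, IS LEARNER
#
# A member that was added but has not started yet has an empty NAME,
# so the name to use for it is passed as the first argument.
build_initial_cluster() {
  local new_name="$1"
  awk -F', *' -v new_name="$new_name" '
    NF >= 4 {
      # Fall back to the supplied name for the unstarted member.
      name = ($3 == "" ? new_name : $3)
      # Append "name=peer-url", comma-separated.
      entries = entries (entries == "" ? "" : ",") name "=" $4
    }
    END { print entries }
  '
}

# Example (run on a healthy node, with the same TLS flags as in the post):
#   ETCDCTL_API=3 etcdctl --endpoints https://127.0.0.1:2379 \
#     --cert=/etc/kubernetes/pki/etcd/server.crt \
#     --key=/etc/kubernetes/pki/etcd/server.key \
#     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
#     member list | build_initial_cluster vms100.liruilongs.github.io
```

The printed value can then be compared with the --initial-cluster line in /etc/kubernetes/manifests/etcd.yaml on the failed node. The ETCD_INITIAL_CLUSTER line printed by member add carries the same information and remains the more direct source.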