Copyright notice: this is an original article by the blogger. When reposting, please credit the source: https://twocups.cn/index.php/2021/06/28/40/

Foreword

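This chapter is a bit special: it is not a Zabbix installation and deployment tutorial, but a record of how I troubleshot a zabbix-server that failed to start. I searched online for a long time and found no direct solution. Only one question resembled my situation, and it had no answers. So I am writing down what happened and how I fixed it, in the hope that it helps anyone who runs into the same problem. If you are not interested, feel free to skip this chapter.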

A typo in the configuration file

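While editing the configuration file /etc/zabbix/zabbix_server.conf to connect Zabbix to elasticsearch, I accidentally typed "localhsot" instead of "localhost" in the elasticsearch address. Naturally, restarting zabbix-server and its related components afterwards failed.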

systemctl restart zabbix-server zabbix-agent httpd rh-php72-php-fpm
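
A typo like this can be caught before the restart. The following is a minimal sketch, assuming the elasticsearch address is set through the HistoryStorageURL parameter (the post does not say which parameter was edited). The helper names history_storage_host and host_resolves are hypothetical. It extracts the host name from the URL and asks the system resolver whether the name is known.

```shell
# Sketch only: helper names are hypothetical, and HistoryStorageURL is an
# assumption about which parameter held the elasticsearch address.

# Print the host part of the last HistoryStorageURL line in a config file.
history_storage_host() {
    sed -n 's/^[[:space:]]*HistoryStorageURL[[:space:]]*=[[:space:]]*//p' "$1" |
        tail -n 1 |
        sed -e 's|^[A-Za-z][A-Za-z0-9+.-]*://||' -e 's|[:/].*$||'
}

# Succeed if the system resolver (hosts file, DNS, ...) knows the name.
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Check the real config file only when it is present and readable.
conf=/etc/zabbix/zabbix_server.conf
if [ -r "$conf" ]; then
    host=$(history_storage_host "$conf")
    if [ -n "$host" ] && ! host_resolves "$host"; then
        echo "cannot resolve '$host' from $conf" >&2
    fi
fi
```

With the misspelled address, the check would print a warning for "localhsot" instead of letting the restart go ahead.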

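Since I had changed very little before this restart, I quickly spotted the typo. But when I corrected it and restarted zabbix-server again, the command hung. The shell reported no error; it just sat there with no output. So I decided to find out what had gone wrong first.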

systemctl status zabbix-server.service -l

Output:
● zabbix-server.service - Zabbix Server
  Loaded: loaded (/usr/lib/systemd/system/zabbix-server.service; enabled; vendor preset: disabled)
  Active: deactivating (stop-sigterm) since Wed 2021-06-16 16:01:21 CST; 33min ago
Process: 25082 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=0/SUCCESS)
Process: 18684 ExecStart=/usr/sbin/zabbix_server -c $CONFFILE (code=exited, status=0/SUCCESS)
Main PID: 18688 (zabbix_server)
  CGroup: /system.slice/zabbix-server.service
          ├─18688 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
          ├─18699 /usr/sbin/zabbix_server: configuration syncer #1 [terminated
          ├─18712 /usr/sbin/zabbix_server: housekeeper #1 [terminated]    
          ├─18713 /usr/sbin/zabbix_server: timer #1 [terminated]          
          ├─18715 /usr/sbin/zabbix_server: http poller #1 [terminated]    
          ├─18717 /usr/sbin/zabbix_server: discoverer #1 [terminated]      
          ├─18718 /usr/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000027 sec, syncing history
          ├─18719 /usr/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000016 sec, syncing history
          ├─18722 /usr/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000013 sec, syncing history
          ├─18723 /usr/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000016 sec, syncing history
          ├─18725 /usr/sbin/zabbix_server: escalator #1 [terminated]      
          ├─18727 /usr/sbin/zabbix_server: proxy poller #1 [terminated]    
          ├─18728 /usr/sbin/zabbix_server: self-monitoring #1 [terminated]
          ├─18729 /usr/sbin/zabbix_server: task manager #1 [terminated]    
          ├─18731 /usr/sbin/zabbix_server: poller #1 [terminated]          
          ├─18732 /usr/sbin/zabbix_server: poller #2 [terminated]          
          ├─18733 /usr/sbin/zabbix_server: poller #3 [terminated]          
          ├─18735 /usr/sbin/zabbix_server: poller #4 [terminated]          
          ├─18736 /usr/sbin/zabbix_server: poller #5 [terminated]          
          ├─18737 /usr/sbin/zabbix_server: unreachable poller #1 [terminated
          ├─18738 /usr/sbin/zabbix_server: trapper #1 [terminated]        
          ├─18739 /usr/sbin/zabbix_server: trapper #2 [terminated]        
          ├─18740 /usr/sbin/zabbix_server: trapper #3 [terminated]        
          ├─18741 /usr/sbin/zabbix_server: trapper #4 [terminated]        
          ├─18742 /usr/sbin/zabbix_server: trapper #5 [terminated]        
          ├─18743 /usr/sbin/zabbix_server: icmp pinger #1 [terminated]    
          ├─18744 /usr/sbin/zabbix_server: alert manager #1 [terminated]  
          ├─18745 /usr/sbin/zabbix_server: alerter #1 started              
          ├─18746 /usr/sbin/zabbix_server: alerter #2 started              
          ├─18747 /usr/sbin/zabbix_server: alerter #3 started              
          ├─18749 /usr/sbin/zabbix_server: preprocessing manager #1 [terminated
          ├─18750 /usr/sbin/zabbix_server: preprocessing worker #1 started
          ├─18751 /usr/sbin/zabbix_server: preprocessing worker #2 started
          ├─18752 /usr/sbin/zabbix_server: preprocessing worker #3 started
          ├─18753 /usr/sbin/zabbix_server: lld manager #1 [terminated]    
          ├─18754 /usr/sbin/zabbix_server: lld worker #1 started          
          ├─18755 /usr/sbin/zabbix_server: lld worker #2 started          
          └─18756 /usr/sbin/zabbix_server: alert syncer #1 [terminated]    

Jun 16 15:57:12 * systemd[1]: Starting Zabbix Server...
Jun 16 15:57:12 * systemd[1]: zabbix-server.service: Supervising process 18688 which is not our child. We'll most likely not notice when it exits.
Jun 16 15:57:12 * systemd[1]: Started Zabbix Server.
Jun 16 16:01:21 * systemd[1]: Stopping Zabbix Server...

I tried to stop process 18688 with the kill command, but it had no effect and the process was still there. So I looked up process 18688.

cat /proc/18688/status

Output:
Name:   zabbix_server
NStgid: 18688
NSpid: 18688
NSsid: 18685
State: S (sleeping)
Tgid:   18688
Pid:   18688
PPid:   1
TracerPid:     0
Uid:   991     991     991     991
Gid:   988     988     988     988
FDSize: 64
Groups: 988
VmPeak:   160968 kB
VmSize:   160964 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:     3952 kB
VmRSS:     3952 kB
VmData:     1084 kB
VmStk:       132 kB
VmExe:     3304 kB
VmLib:     19520 kB
VmPTE:       248 kB
VmSwap:       0 kB
Threads:       1
SigQ:   9/319773
SigPnd: 0000000000000000
ShdPnd: 0000000000004000
SigBlk: 0000000000014000
SigIgn: 0000000000001000
SigCgt: 0000000180016ecf
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
Seccomp:       0
Cpus_allowed:   ff
Cpus_allowed_list:     0-7
Mems_allowed:   00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list:     0
voluntary_ctxt_switches:       75
nonvoluntary_ctxt_switches:     3

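From the process status we can see that the parent process (PPid) is 1, so I assumed it had most likely become a zombie. (Strictly speaking, PPid 1 only shows that the daemon's parent is the init process, and a real zombie would report State: Z rather than S (sleeping).)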
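
The SigPnd, ShdPnd, SigBlk, SigIgn and SigCgt lines in the status output are hexadecimal bit masks in which bit N-1 stands for signal N. As a hedged sketch (decode_sigmask is a hypothetical helper name, not a standard tool), the following decodes such a mask into signal numbers. Applied to the values above, it suggests why a plain kill had no visible effect: signal 15 (SIGTERM) shows up as both pending and blocked.

```shell
# decode_sigmask is a hypothetical helper name, not a standard tool.
# Print the signal numbers whose bits are set in a hex mask taken from
# /proc/PID/status. Bit 0 is signal 1, bit 1 is signal 2, and so on.
decode_sigmask() {
    mask=$((0x$1))
    sig=1
    out=""
    while [ "$sig" -le 64 ]; do
        if [ $(( (mask >> (sig - 1)) & 1 )) -eq 1 ]; then
            out="$out${out:+ }$sig"
        fi
        sig=$((sig + 1))
    done
    printf '%s\n' "$out"
}

decode_sigmask 0000000000004000   # ShdPnd from the output above: prints 15
decode_sigmask 0000000000014000   # SigBlk from the output above: prints 15 17
```

On x86 Linux, signal 15 is SIGTERM and signal 17 is SIGCHLD.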

Next I looked at the Zabbix log.

tail -n 5 /var/log/zabbix/zabbix_server.log

Output:
18722:20210616:164806.736 cannot send data to elasticsearch: Could not resolve host: localhsot; Name or service not known
18719:20210616:164806.742 cannot send data to elasticsearch: Could not resolve host: localhsot; Name or service not known
18719:20210616:164806.742 cannot send data to elasticsearch: Could not resolve host: localhsot; Name or service not known
18718:20210616:164806.746 cannot send data to elasticsearch: Could not resolve host: localhsot; Name or service not known
18718:20210616:164806.746 cannot send data to elasticsearch: Could not resolve host: localhsot; Name or service not known

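I refreshed a few more times and the timestamps kept advancing, which means the error was still occurring: zabbix-server kept trying to send data to elasticsearch and kept failing. The reason is obvious from the log: my earlier misspelling of "localhost". But I had corrected it right after the first failed start, so nothing should still have been sending requests to elasticsearch via the misspelled "localhsot". Had zabbix-server failed to notice that the configuration file had changed and needed a refresh? But a restart is itself a refresh, so something else must have gone wrong. The status output above offers a hint: the instance started at 15:57:12 was still in the deactivating state, so its worker processes were presumably still running with the configuration they had loaded at startup.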
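
Each line in zabbix_server.log starts with the PID of the process that wrote it, so the errors can be tied to specific processes. A minimal sketch (es_error_pids is a hypothetical helper name) that counts the elasticsearch send errors per PID:

```shell
# es_error_pids is a hypothetical helper name.
# Count "cannot send data to elasticsearch" lines per writing PID.
# Zabbix log lines have the form PID:YYYYMMDD:HHMMSS.mmm message.
es_error_pids() {
    awk -F: '/cannot send data to elasticsearch/ {
                 pid = $1
                 gsub(/[ \t]/, "", pid)   # PIDs may be space-padded
                 count[pid]++
             }
             END { for (pid in count) print pid, count[pid] }' "$1" |
        sort -n
}

# Run against the real log only when it is present and readable.
log=/var/log/zabbix/zabbix_server.log
if [ -r "$log" ]; then
    es_error_pids "$log"
fi
```

For the five log lines shown above, this reports 18718 with 2 errors, 18719 with 2 and 18722 with 1. All three are history syncer PIDs in the earlier status output.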

So I wanted to first stop the constant requests to elasticsearch. Looking back at the zabbix-server status above, four processes are clearly still syncing history data to elasticsearch.

├─18718 /usr/sbin/zabbix_server: history syncer #1 [processed 0 values, 0 triggers in 0.000027 sec, syncing history
├─18719 /usr/sbin/zabbix_server: history syncer #2 [processed 0 values, 0 triggers in 0.000016 sec, syncing history
├─18722 /usr/sbin/zabbix_server: history syncer #3 [processed 0 values, 0 triggers in 0.000013 sec, syncing history
├─18723 /usr/sbin/zabbix_server: history syncer #4 [processed 0 values, 0 triggers in 0.000016 sec, syncing history

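I tried to end these four processes with the kill command, but they would not die. I checked their status as well and, sure enough, they looked like zombie processes too. So I force-killed all four, together with the zabbix-server main process, using "kill -9". Even so, zabbix-server still could not be restarted.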
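
Whether a process really is a zombie can be read from the State field shown earlier: a zombie reports Z, while S means sleeping and D means uninterruptible sleep. A small sketch (proc_state is a hypothetical helper name) that prints the state letter for the four history syncer PIDs from this post:

```shell
# proc_state is a hypothetical helper name.
# Print the one-letter state code (R, S, D, Z, T, ...) found in a
# /proc/PID/status-style file.
proc_state() {
    awk '$1 == "State:" { print $2; exit }' "$1"
}

# PIDs of the four history syncers from the status output in this post.
for pid in 18718 18719 18722 18723; do
    if [ -r "/proc/$pid/status" ]; then
        echo "$pid: $(proc_state "/proc/$pid/status")"
    else
        echo "$pid: no such process"
    fi
done
```

A zombie has already exited and is only waiting for its parent to collect its exit status, so signals, including kill -9, have nothing left to act on. If kill -9 makes a process disappear, it was more likely sleeping or blocking the earlier signal than a true zombie.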

At this point I suspected the problem was not limited to Zabbix itself, so I went to check the system log.

cat /var/log/messages

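In the end I found that Control Groups (cgroups for short, a kernel feature for limiting and isolating resources per group of processes) kept being restarted on the system and kept failing. The reported cause was "Device or resource busy", which prevented a piece of memory from being mounted.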
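
/var/log/messages is usually large, so it helps to filter it rather than read it whole. A sketch of such a filter follows. The search terms are an assumption about what is relevant here (the post does not quote the actual log lines), and cgroup_log_lines is a hypothetical helper name.

```shell
# cgroup_log_lines is a hypothetical helper name; the patterns are assumptions.
# Print the most recent lines of a syslog-style file that mention cgroups,
# the cgconfig service, or the "resource busy" error text.
# $1 is the log file, $2 the maximum number of lines (default 20).
cgroup_log_lines() {
    grep -iE 'cgroup|cgconfig|resource busy' "$1" | tail -n "${2:-20}"
}

# Run against the real system log only when it is present and readable.
if [ -r /var/log/messages ]; then
    cgroup_log_lines /var/log/messages 20
fi
```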

So I tried to clear cgroups, only to get another error for the same reason.

cgclear

Error:
cgclear failed with Device or resource busy

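So I decided to first unmount the memory that could not be mounted. (/MEMORY_PATH below is a placeholder for the actual path.)

umount /MEMORY_PATH

Error:
/MEMORY_PATH Device or resource busy

So I wanted to see who was actually using this memory.

fuser -cu /MEMORY_PATH

Once I had found the process, I first asked around whether anyone was using it. After confirming that nobody was, I ended the process, and the previously busy memory could then be unmounted without trouble. After that I cleared cgroups and restarted the cgconfig service. (Don't forget to mount the memory back yourself afterwards.)

umount /MEMORY_PATH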
cgclear
service cgconfig restart
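
Because the unmounted path has to be mounted again by hand, it is worth checking the mount table before and after. A minimal sketch, assuming a Linux system where /proc/mounts lists the current mounts. The name is_mounted is a hypothetical helper, and the path in the usage comment is only an example, not the one from this incident.

```shell
# is_mounted is a hypothetical helper name.
# Succeed if the given path is listed as a mount point (second field) in a
# /proc/mounts-style file. Paths containing spaces are escaped in that file
# and are not handled by this sketch.
# $1 is the path to look for, $2 the mounts file (default /proc/mounts).
is_mounted() {
    awk -v target="$1" '$2 == target { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example usage with an illustrative path:
# is_mounted /sys/fs/cgroup/memory && echo "still mounted" || echo "not mounted"
```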

After that, cgroups was back to normal. When I then started zabbix-server, it ran normally as well.

Summary and analysis

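This troubleshooting session is fairly representative, because there was no single cause: a series of errors happened to coincide and left zabbix-server unable to start, without even an error message. The original cause was not actually the mistake in the zabbix-server configuration file. Instead, a process was occupying a memory area, which caused the process-level resource management tool Control Groups to get stuck, and that is why restarting zabbix-server led to a string of abnormal behavior. When I searched online earlier, I did not find anyone who had identified a similar cause, so I wrote this post in the hope that it helps people who hit the same situation in the future.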

林皓伟
