Sunday, December 7, 2014

Configured Zabbix to keep my server cool

Recently I got myself an APC NetShelter CX Mini. It is a 12U rack with integrated fans for cooling. At the moment it is populated with some ARM boards (not rack-mounted), their PDUs, a switch and (for now) one 2U server.

Surprisingly, the fans of the NetShelter are louder than the server (apart from the switch, none of the other equipment has fans at all). However, keeping the fans turned off all the time is not an option. When the server is idle, its CPU temperatures stay somewhere between 40 and 50 Celsius. But when I start several virtual machines to test some Gluster changes, the temperature rises steadily, and the fans of the cabinet need to be turned on to prevent overheating.

Of course, turning on the fans manually is possible, but it requires me to plug in the power cable, which is not very convenient when the cabinet is normally kept closed to reduce the noise. With the PDUs and fence_netio from the fence-agents-netio package, the fans inside the cabinet can be switched on and off remotely. That was already a great step!
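To make the remote switching a bit more concrete, here is a minimal Python sketch that builds a fence_netio command line for one PDU port. It assumes the long options shared by the fence-agents family (--ip, --username, --password, --plug, --action). The PDU address, login, password and port number are hypothetical placeholders, not values from my setup.

```python
import subprocess
from typing import List


def fence_netio_cmd(host: str, user: str, password: str, port: str, action: str) -> List[str]:
    """Build a fence_netio command line that switches one PDU port.

    Assumes the standard fence-agents long options; all arguments are
    placeholders supplied by the caller.
    """
    if action not in ("on", "off"):
        raise ValueError("action must be 'on' or 'off'")
    return ["fence_netio",
            "--ip", host,
            "--username", user,
            "--password", password,
            "--plug", port,
            "--action", action]


def switch_fans(action: str) -> None:
    # Hypothetical PDU address, credentials and fan port: adjust for your own PDU.
    cmd = fence_netio_cmd("pdu.example.lan", "admin", "secret", "4", action)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if the agent fails
```

Calling `switch_fans("on")` would then power the port with the fans, and `switch_fans("off")` would cut it again.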

Well, things can be even better. I don't want to keep an eye on the temperature of my server and then decide myself when to turn on the fans. After spending some time looking for and comparing different monitoring solutions, I decided to try Zabbix. Packages for Zabbix are available for Fedora and EPEL, which makes trying it out pretty simple.

After a couple of hours of playing with the installation and configuration, I was able to monitor the basics of my server. With a little manual configuration, the Zabbix Agent on the server can also send the temperatures of the CPUs. All I had to do was set up two UserParameter entries in /etc/zabbix_agentd.conf:

UserParameter=cpu.temp.0,sensors | sed -n -r '/Physical id 0/s/^.*:[[:space:]]+\+([[:digit:]]+\.[[:digit:]]+).*$/\1/p'
UserParameter=cpu.temp.1,sensors | sed -n -r '/Physical id 1/s/^.*:[[:space:]]+\+([[:digit:]]+\.[[:digit:]]+).*$/\1/p'

The above configuration snippet tells the Zabbix Agent on the server to execute sensors (from the lm_sensors package) and filter the output through a sed command. The sed command keeps only the line for the given physical CPU and extracts its temperature. The result is captured and sent to the Zabbix Server.
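To show what the sed expression extracts, the following Python sketch applies the same regular expression to some sensors output. The sample text is hypothetical, modelled on typical coretemp lines from lm_sensors. The exact format depends on the hardware and driver, so check it against your own `sensors` output.

```python
import re
from typing import Optional

# Hypothetical sample of `sensors` output (typical coretemp format).
SAMPLE = """\
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +45.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:         +43.0°C  (high = +80.0°C, crit = +100.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Physical id 1:  +47.0°C  (high = +80.0°C, crit = +100.0°C)
"""

# Same pattern as the sed expression: anything up to a colon, whitespace,
# a literal '+', then capture the decimal temperature.
TEMP_RE = re.compile(r"^.*:\s+\+(\d+\.\d+).*$")


def cpu_temp(sensors_output: str, physical_id: int) -> Optional[str]:
    """Return the temperature string for one physical CPU, or None.

    Like `sed -n '/Physical id N/ s/.../\\1/p'`, only lines containing the
    marker are considered; unlike sed, this returns just the first match.
    """
    marker = f"Physical id {physical_id}"
    for line in sensors_output.splitlines():
        if marker in line:
            m = TEMP_RE.match(line)
            if m:
                return m.group(1)
    return None
```

With this sample, `cpu_temp(SAMPLE, 0)` yields `"45.0"`, which is the same value the agent would report for the cpu.temp.0 key.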

With the new cpu.temp.0 and cpu.temp.1 keys, the Zabbix web UI can use these temperature items to set up a trigger, which invokes an action when the temperature rises above 55 Celsius. When the trigger enters the PROBLEM state, the action calls fence_netio and turns on the PDU port that the power cable of the fans is connected to. When the trigger returns to normal (checked every 5 minutes, now changed to 10), the port is switched off again.
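The trigger and action logic can be modelled as a small state machine. The Python sketch below assumes a single trigger that is in PROBLEM whenever either CPU is above 55 Celsius. The post does not say whether one or two triggers were used, so that part is an assumption. The sketch returns the fan commands the action would issue.

```python
from typing import Iterable, List, Tuple

THRESHOLD_C = 55.0  # trigger threshold from the post


def fan_actions(samples: Iterable[Tuple[float, float]],
                threshold: float = THRESHOLD_C) -> List[str]:
    """Return the fan commands triggered by a series of checks.

    Each sample is (cpu.temp.0, cpu.temp.1) from one check interval.
    Assumption: the trigger is in PROBLEM when either CPU exceeds the
    threshold. 'on' is emitted on OK -> PROBLEM and 'off' on PROBLEM -> OK;
    nothing is emitted while the state stays the same.
    """
    problem = False
    actions: List[str] = []
    for t0, t1 in samples:
        now_problem = max(t0, t1) > threshold
        if now_problem and not problem:
            actions.append("on")
        elif problem and not now_problem:
            actions.append("off")
        problem = now_problem
    return actions
```

For the checks `[(45, 48), (52, 56), (58, 60), (50, 54), (44, 47)]` this returns `["on", "off"]`. Because the sketch uses the single threshold described above, readings that hover around 55 Celsius would toggle the port at every check interval.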

This is the first time that I have actually set up monitoring with custom actions. It was quite fun, and I'm certainly happy with the result.