Wednesday, December 11, 2013

Gluster and (not) restarting brick processes upon updates

Gluster users have different opinions on when the Gluster daemons should be restarted. This seems to be a very common discussion for a lot daemons, and pops up on the Fedora Developers mailinglist regularly.

An explanation on how and when Gluster starts its daemons is probably in order. A storage server running Gluster always has at least one process running, the management daemon (glusterd). The management daemon is responsible for building the Trusted Pool (aka cluster) of the Friends (other storage servers) that it knows. The glusterd process also handles the actions from the commandline client or other storage servers' glusterd processes.

Before a storage server provides bricks for a volume, glusterd is the only process that is running. After a volume has been created and started, each brick will have its own glusterfsd process. glusterd starts these glusterfsd processes when the volume is started (gluster volume start VOLNAME) or when booting and the volume should be in a 'started' state.

In addition to starting the brick processes, glusterd is also responsible for starting the NFS-server and the self-heal-daemon (when these are not disabled). Both of these processes are a glusterfs client process and are started once per storage server.

Client processes for mounting Gluster Volumes through FUSE are not started by the glusterd management daemon. These processes are started upon mounting and are not known to the Gluster processes that provide the storage services.

When updates are installed, it is highly recommended to restart all the binaries that had their content (either the binaries themselves, or loaded libraries) changed. When no restart is performed, the old binaries are still running and existing bugs that the update intends to fix are not applied. This add to the confusion about which version is running, because rpm -q glusterfs will return the updated version, which is different from the most recent version that has been logged when the daemons started.

Luckily, systemd makes it pretty easy to restart all processes that glusterd started. But, unfortunately there are some valid (advanced/power-user) use-cases where restarting all the processes is not needed and can cause more problems than it would prevent. To accommodate these users on Fedora, we have split the management of the daemons over two systemd services:
  • glusterd.service for starting/stopping the glusterd process
  • glusterfsd.service for stopping the glusterfsd/brick processes
On most storage servers, both services should be activated. glusterfsd.service does nothing on start, but it will kill the glusterfsd processes when it gets stopped (or restarted). The glusterd.service starts and stops the glusterd management process (which in turn starts the needed glusterfsd processes).

Those users that can not allow automatic updates restart the glusterfsd processes, can disable the glusterfsd.service and no processes that provide the bricks for the volumes should be restarted:

# systemctl disable glusterfsd.service

As long as this service is active (check with systemctl status glusterfsd.service), an update will cause a restart of the brick processes. Stopping the service and restarting the glusterd.service is required once, or a reboot will suffice too.

In order to have the glusterfsd processes stopped on shutdown, the glusterfsd.service file can be copied with a name that is unknown to the GlusterFS package. If the RPMs do not know about the service, they will not try to restart it. The following set of commands should work for these users:

# systemctl disable glusterfsd.service
# cp /usr/lib/systemd/system/glusterfsd.service /etc/systemd/system/
# systemctl daemon-reload
# systemctl start glusterfsd-shutdown-only.service

Any issues, questions, suggestions or notes can be passed to me on IRC (ndevos on Freenode in #gluster) or can be reported in a bug against the Fedora GlusterFS package.
Post a Comment