Sunday, December 7, 2014

Configured Zabbix to keep my server cool

Recently I got myself an APC NetShelter CX Mini. It is a 12U rack with integrated fans for cooling. At the moment it is populated with some ARM boards (not rack mounted), their PDUs, a switch and (for now) one 2U server.

Surprisingly, the fans of the NetShelter are louder than the server (the rest of the equipment has no fans at all, except for the switch). But keeping the cabinet fans turned off all the time is not an option. When the server is idle, its CPU temperatures stay somewhere between 40 and 50 degrees Celsius. However, when I start several virtual machines for testing some Gluster changes, the temperature rises steadily. To prevent overheating, the fans of the cabinet need to be turned on.

Of course, turning on the fans manually is possible, but it requires me to plug in the power cable. This is not very convenient when the cabinet is normally closed to reduce the noise. With the PDUs and fence_netio from the fence-agents-netio package, the fans inside the cabinet can be controlled remotely. That was a great step already!
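
For illustration, turning the port for the fans on or off by hand would look roughly like this; the address, credentials and port number are made up, and the standard fence-agents command-line options are assumed:

# fence_netio -a 192.168.1.10 -l admin -p secret -n 1 -o on
# fence_netio -a 192.168.1.10 -l admin -p secret -n 1 -o off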

Well, things can be even better. I don't want to monitor the temperature of my server myself and then decide when to turn on the fans. After spending some time looking for and comparing different monitoring solutions, I settled on trying Zabbix. Packages for Zabbix are available for Fedora and EPEL, which makes trying it out pretty simple.

After a couple of hours of playing with the installation and configuration, I was able to monitor the basics of my server. With a little manual configuration, the Zabbix Agent on the server can send the temperatures of the CPUs. All I had to do was set up a UserParameter in /etc/zabbix_agentd.conf:

UserParameter=cpu.temp.0,sensors | sed -n -r '/Physical id 0/s/^.*:[[:space:]]+\+([[:digit:]]+\.[[:digit:]]+).*$/\1/p'
UserParameter=cpu.temp.1,sensors | sed -n -r '/Physical id 1/s/^.*:[[:space:]]+\+([[:digit:]]+\.[[:digit:]]+).*$/\1/p'

The above configuration snippet tells the Zabbix Agent on the server to execute sensors (from the lm_sensors package), and filter the output through a sed command. The result is captured and sent to the Zabbix Server.
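
To check that the new keys work before adding them in the webui, the agent can evaluate them directly, or the Zabbix Server can query the (restarted) agent with zabbix_get; the hostname below is just a placeholder:

# zabbix_agentd -t cpu.temp.0
# zabbix_get -s my-server -k cpu.temp.1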

With the new cpu.temp.0 and cpu.temp.1 keys, these temperature items can be used in the Zabbix webui to set up a trigger, which then invokes an action when the temperature rises above 55 degrees Celsius. When the trigger enters the PROBLEM state, the action calls fence_netio and turns on the port that has the cable for the fans connected. When the trigger returns to normal (checked every 5 minutes, now moved to 10), the port is disabled again.
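
As a rough sketch, the trigger expression is something like the line below (Zabbix 2.x syntax, with 'myserver' as a made-up host name); the associated action then runs the fence_netio command shown earlier with -o on, and the port is switched back off again with -o off once the trigger has returned to OK:

{myserver:cpu.temp.0.last(0)}>55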

This is the first time that I have actually set up monitoring with some custom actions. It was quite fun, and I'm certainly happy with the result.

Monday, November 17, 2014

Testing GlusterFS with very fast disks on Fedora 20

In the past I used to test with RAM-disks provided by /dev/ram*. Gluster uses extended attributes on the filesystem, which makes it impossible to use tmpfs. While thinking about improving some of the GlusterFS regression tests, I noticed that Fedora 20 (and possibly earlier versions too) does not provide the /dev/ram* devices anymore. I could not quickly find the needed kernel module, so I decided to look into the newer zram module.

Getting zram working seems to be pretty simple. By default a single /dev/zram0 device is made available after loading the module. But, if needed, the module offers a num_devices parameter to create more devices. After loading the module with modprobe zram, you can do the following to create your high-performance volatile storage:

# SIZE_2GB=$(expr 1024 \* 1024 \* 1024 \* 2)
# echo ${SIZE_2GB} > /sys/class/block/zram0/disksize
# mkfs -t xfs /dev/zram0
# mkdir /bricks/fast
# mount /dev/zram0 /bricks/fast
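
If more than one RAM-disk is needed, the num_devices parameter mentioned above can be passed while loading the module (do this before creating filesystems on the devices); a quick sketch with 4 devices as an example:

# modprobe -r zram
# modprobe zram num_devices=4
# ls /dev/zram*
/dev/zram0  /dev/zram1  /dev/zram2  /dev/zram3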

With this mountpoint it is now possible to create a Gluster volume:

# gluster volume create fast ${HOSTNAME}:/bricks/fast/data
# gluster volume start fast
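
To quickly check that the volume backed by the zram brick behaves as expected, it can be mounted with the FUSE client and written to; /mnt is just an example mountpoint:

# mount -t glusterfs ${HOSTNAME}:/fast /mnt
# dd if=/dev/zero of=/mnt/testfile bs=1M count=256
# umount /mnt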

Once done with testing, stop and delete the Gluster volume, and free the zram like this:

# umount /bricks/fast
# echo 1 > /sys/class/block/zram0/reset

Of course, unloading the module with rmmod zram would free the resources too.

It is getting more important for Gluster to be prepared for very fast disks. Hardware like Fusion-io flash drives and, in the future, Persistent Memory/NVM will become more widely available in storage clouds, and of course we would like to see Gluster stay part of that!

Wednesday, November 5, 2014

Installing GlusterFS 3.4.x, 3.5.x or 3.6.0 on RHEL or CentOS 6.6

With the release of RHEL-6.6 and CentOS-6.6, there are now glusterfs packages in the standard channels/repositories. Unfortunately, these are only the client-side packages (like glusterfs-fuse and glusterfs-api). Users that want to run a Gluster Server on a current RHEL or CentOS now have difficulties installing any of today's current versions of the Gluster Community packages.

The most prominent issue is that the glusterfs package from RHEL has version 3.6.0.28, which is higher than the 3.6.0 version that was released last week. RHEL is shipping a pre-release that was created while the Gluster Community was still developing 3.6. An unfortunate packaging decision added a .28 to the version, where most other pre-releases would fall back to an (rpm-)version like 3.6.0-0.1.something.bla.el6. The difference might look minor, but the result is a major disruption in the much anticipated 3.6 community release.

To fix this in the easiest way for our community users, we have decided to release version 3.6.1 later this week (maybe on Thursday, November 6). This version is higher than the one in RHEL/CentOS, and therefore yum will prefer the package from the community repository over the one available in RHEL/CentOS. This is also the main reason why no 3.6.0 packages have been provided on the download server.

Installing an older stable release (like 3.4 or 3.5) on RHEL/CentOS 6.6 requires a different approach. At the moment we can offer two solutions. We are still working on making this easier; until that is finalized, some manual actions are required.

Let's assume you want to verify that today's announced glusterfs-3.5.3beta2 packages indeed fix that bug you reported. (These steps apply to the other versions as well; this just happens to be what I have been testing.)

Option A: use exclude in the yum repository files for RHEL/CentOS

  1. download the glusterfs-353beta2-epel.repo file and save it under /etc/yum.repos.d/

  2. edit /etc/yum.repos.d/redhat.repo or /etc/yum.repos.d/CentOS-Base.repo and, under each repository that you find, add the following line:

    exclude=glusterfs*

This prevents yum from installing the glusterfs* packages from the standard RHEL/CentOS repositories, but still allows those packages to be installed from other repositories. The Red Hat Customer Portal has an article about this configuration too.
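
As an example, the [base] section in CentOS-Base.repo would end up looking roughly like this; only the exclude line is added, the rest stays untouched:

[base]
name=CentOS-$releasever - Base
mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6
exclude=glusterfs*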

Option B: install and configure yum-plugin-priorities

Using yum-plugin-priorities is probably a more stable solution. It does not require changes to the standard RHEL/CentOS repositories. However, an additional package needs to be installed.

  1. enable the optional repository when on RHEL; CentOS users can skip this step:

    # subscription-manager repos --list | grep optional-rpms
    # subscription-manager repos --enable=*optional-rpms

  2. install the yum-plugin-priorities package:

    # yum install yum-plugin-priorities

  3. download the glusterfs-353beta2-epel.repo file and save it under /etc/yum.repos.d/

  4. edit the /etc/yum.repos.d/glusterfs-353beta2-epel.repo file and add the following option to each repository definition:

    priority=50

The default priority for repositories is 99, and the repositories with the lowest number have the highest priority. As long as the RHEL/CentOS repositories do not have the priority option set, the packages from the glusterfs-353beta2-epel.repo will be preferred by yum.
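
To illustrate, each repository definition in the downloaded file ends up with the extra line at the bottom; the repository id, name and URL below are placeholders, only priority=50 is the addition:

[glusterfs-epel]
name=GlusterFS community packages
baseurl=http://download.gluster.org/pub/gluster/glusterfs/...
enabled=1
priority=50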

When using the yum-plugin-priorities approach, we highly recommend that you check whether all your repositories have a suitable (or missing) priority option. In case some repositories already have the option set but yum-plugin-priorities was not installed yet, the preference order of the repositories might change once the plugin is installed. Because of this, we do not want to force the use of yum-plugin-priorities on all the Gluster Community users that run on RHEL/CentOS.

In case users still have issues installing the Gluster Community packages on RHEL or CentOS, we recommend getting in touch with us on the Gluster Users mailing list (archive) or in the #gluster IRC channel on Freenode.

GlusterFS 3.5.3beta2 is now available for testing

Even though GlusterFS 3.6.0 was released last week, the 3.5 stable series continues to live on! The second beta for GlusterFS 3.5.3 is now available for testing. Many bugs have been fixed since the 3.5.2 release; check the references below for details.

Bug reporters are encouraged to verify the fixes, and we invite others to test this beta to check that there are no regressions. Please see the announcement email for further information. When the release proves to be stable, a final 3.5.3 will be made available in a week or so.

Packages for different distributions can be found on the main download server.

Release Notes for GlusterFS 3.5.3

This is a bugfix release. The Release Notes for 3.5.0, 3.5.1 and 3.5.2 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.5 stable release.

Bugs Fixed:

  • 1081016: glusterd needs xfsprogs and e2fsprogs packages
  • 1100204: brick failure detection does not work for ext4 filesystems
  • 1126801: glusterfs logrotate config file pollutes global config
  • 1129527: DHT :- data loss - file is missing on renaming same file from multiple client at same time
  • 1129541: [DHT:REBALANCE]: Rebalance failures are seen with error message " remote operation failed: File exists"
  • 1132391: NFS interoperability problem: stripe-xlator removes EOF at end of READDIR
  • 1133949: Minor typo in afr logging
  • 1136221: The memories are exhausted quickly when handle the message which has multi fragments in a single record
  • 1136835: crash on fsync
  • 1138922: DHT + rebalance : rebalance process crashed + data loss + few Directories are present on sub-volumes but not visible on mount point + lookup is not healing directories
  • 1139103: DHT + Snapshot :- If snapshot is taken when Directory is created only on hashed sub-vol; On restoring that snapshot Directory is not listed on mount point and lookup on parent is not healing
  • 1139170: DHT :- rm -rf is not removing stale link file and because of that unable to create file having same name as stale link file
  • 1139245: vdsm invoked oom-killer during rebalance and Killed process 4305, UID 0, (glusterfs nfs process)
  • 1140338: rebalance is not resulting in the hash layout changes being available to nfs client
  • 1140348: Renaming file while rebalance is in progress causes data loss
  • 1140549: DHT: Rebalance process crash after add-brick and `rebalance start' operation
  • 1140556: Core: client crash while doing rename operations on the mount
  • 1141558: AFR : "gluster volume heal <volume_name> info" prints some random characters
  • 1141733: data loss when rebalance + renames are in progress and bricks from replica pairs goes down and comes back
  • 1142052: Very high memory usage during rebalance
  • 1142614: files with open fd's getting into split-brain when bricks goes offline and comes back online
  • 1144315: core: all brick processes crash when quota is enabled
  • 1145000: Spec %post server does not wait for the old glusterd to exit
  • 1147156: AFR client segmentation fault in afr_priv_destroy
  • 1147243: nfs: volume set help says the rmtab file is in "/var/lib/glusterd/rmtab"
  • 1149857: Option transport.socket.bind-address ignored
  • 1153626: Sizeof bug for allocation of memory in afr_lookup
  • 1153629: AFR : excessive logging of "Non blocking entrylks failed" in glfsheal log file.
  • 1153900: Enabling Quota on existing data won't create pgfid xattrs
  • 1153904: self heal info logs are filled with messages reporting ENOENT while self-heal is going on
  • 1155073: Excessive logging in the self-heal daemon after a replace-brick
  • 1157661: GlusterFS allows insecure SSL modes

Known Issues:

  • The following configuration changes are necessary for 'qemu' and 'samba vfs plugin' integration with libgfapi to work seamlessly:
    1. gluster volume set <volname> server.allow-insecure on
    2. restarting the volume is necessary
       gluster volume stop <volname>
       gluster volume start <volname>
      
    3. Edit /etc/glusterfs/glusterd.vol to contain this line:
       option rpc-auth-allow-insecure on
      
    4. restarting glusterd is necessary
       service glusterd restart
      
      More details are also documented in the Gluster Wiki on the Libgfapi with qemu libvirt page.
  • For Block Device translator based volumes open-behind translator at the client side needs to be disabled.
    gluster volume set <volname> performance.open-behind disabled
    
  • libgfapi clients calling glfs_fini before a successful glfs_init will cause the client to hang as reported here. The workaround is NOT to call glfs_fini for error cases encountered before a successful glfs_init. This is being tracked in Bug 1134050 for glusterfs-3.5 and Bug 1093594 for mainline.
  • If the /var/run/gluster directory does not exist enabling quota will likely fail (Bug 1117888).

Sunday, October 5, 2014

GlusterFS 3.5.3beta1 has been released for testing

The first beta for GlusterFS 3.5.3 is now available for download.

Packages for different distributions will land on the download server over the next few days. When packages become available, the package maintainers will send a notification to the gluster-users mailing list.

With this beta release, we make it possible for bug reporters and testers to check if issues have indeed been fixed. All community members are invited to test and/or comment on this release.

If a bug from the list below has not been sufficiently fixed, please open the bug report, leave a comment with details of the testing and change the status of the bug to ASSIGNED.

In case someone has successfully verified a fix for a bug, please change the status of the bug to VERIFIED.

The Release Notes for 3.5.0, 3.5.1 and 3.5.2 contain a listing of all the new features that were added and bugs fixed in the GlusterFS 3.5 stable release.

Bugs Fixed:

  • 1081016: glusterd needs xfsprogs and e2fsprogs packages
  • 1129527: DHT :- data loss - file is missing on renaming same file from multiple client at same time
  • 1129541: [DHT:REBALANCE]: Rebalance failures are seen with error message " remote operation failed: File exists"
  • 1132391: NFS interoperability problem: stripe-xlator removes EOF at end of READDIR
  • 1133949: Minor typo in afr logging
  • 1136221: The memories are exhausted quickly when handle the message which has multi fragments in a single record
  • 1136835: crash on fsync
  • 1138922: DHT + rebalance : rebalance process crashed + data loss + few Directories are present on sub-volumes but not visible on mount point + lookup is not healing directories
  • 1139103: DHT + Snapshot :- If snapshot is taken when Directory is created only on hashed sub-vol; On restoring that snapshot Directory is not listed on mount point and lookup on parent is not healing
  • 1139170: DHT :- rm -rf is not removing stale link file and because of that unable to create file having same name as stale link file
  • 1139245: vdsm invoked oom-killer during rebalance and Killed process 4305, UID 0, (glusterfs nfs process)
  • 1140338: rebalance is not resulting in the hash layout changes being available to nfs client
  • 1140348: Renaming file while rebalance is in progress causes data loss
  • 1140549: DHT: Rebalance process crash after add-brick and `rebalance start' operation
  • 1140556: Core: client crash while doing rename operations on the mount
  • 1141558: AFR : "gluster volume heal <volume_name> info" prints some random characters
  • 1141733: data loss when rebalance + renames are in progress and bricks from replica pairs goes down and comes back
  • 1142052: Very high memory usage during rebalance
  • 1142614: files with open fd's getting into split-brain when bricks goes offline and comes back online
  • 1144315: core: all brick processes crash when quota is enabled
  • 1145000: Spec %post server does not wait for the old glusterd to exit
  • 1147243: nfs: volume set help says the rmtab file is in "/var/lib/glusterd/rmtab"

Known Issues:

  • The following configuration changes are necessary for 'qemu' and 'samba vfs plugin' integration with libgfapi to work seamlessly:
    1. gluster volume set <volname> server.allow-insecure on
    2. restarting the volume is necessary
       gluster volume stop <volname>
       gluster volume start <volname>
      
    3. Edit /etc/glusterfs/glusterd.vol to contain this line:
       option rpc-auth-allow-insecure on
      
    4. restarting glusterd is necessary
       service glusterd restart
      
      More details are also documented in the Gluster Wiki on the Libgfapi with qemu libvirt page.
  • For Block Device translator based volumes open-behind translator at the client side needs to be disabled.
    gluster volume set <volname> performance.open-behind disabled
    
  • libgfapi clients calling glfs_fini before a successful glfs_init will cause the client to hang as reported here. The workaround is NOT to call glfs_fini for error cases encountered before a successful glfs_init. This is being tracked in Bug 1134050 for glusterfs-3.5 and Bug 1093594 for mainline.
  • If the /var/run/gluster directory does not exist enabling quota will likely fail (Bug 1117888).

Monday, September 22, 2014

Experimenting with Ceph support for NFS-Ganesha

NFS-Ganesha is a user-space NFS-server that is available in Fedora. It contains several plugins (FSAL, File System Abstraction Layer) for supporting different storage backends. Some of the more interesting are:

  • FSAL_VFS: exports a local filesystem that is available on the NFS-Ganesha server
  • FSAL_GLUSTER: exports Gluster volumes through libgfapi
  • FSAL_CEPH: exports the Ceph filesystem through libcephfs

Setting up a basic NFS-Ganesha server

Exporting a mounted filesystem is pretty simple. Unfortunately this failed for me when running with the standard nfs-ganesha packages on a minimal Fedora 20 installation. The following changes were needed to make NFS-Ganesha work for a basic export:

  • install rpcbind and make the nfs-ganesha.service depend on it
  • copy /etc/dbus-1/system.d/org.ganesha.nfsd.conf from the sources
  • create a /etc/sysconfig/nfs-ganesha environment file

Once these initial things have been taken care of, a configuration file needs to be created. The default configuration file mentioned in the environment file is /etc/ganesha.nfsd.conf. The sources of nfs-ganesha contain some examples; vfs.conf is quite usable as a starting point. After copying the example and modifying the paths to something more suitable, starting the NFS-server should work:

# systemctl start nfs-ganesha

In case something failed, there should be a note about it in /var/log/ganesha.log.
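
For reference, the EXPORT block that ends up in /etc/ganesha.nfsd.conf looks roughly like the sketch below, based on the vfs.conf example; the path and Export_Id are placeholders, and option names can differ between nfs-ganesha versions:

EXPORT
{
    Export_Id = 77;
    Path = "/export/vfs";
    Pseudo = "/vfs";
    Access_Type = RW;

    FSAL {
        Name = VFS;
    }
}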

Exporting the Ceph filesystem with NFS-Ganesha

This assumes you have a working Ceph cluster, which includes several MON and OSD daemons and one or more MDS daemons. The FSAL_CEPH from NFS-Ganesha uses libcephfs, which on Fedora seems to be provided by the ceph-fuse package. The easiest way to make sure that the Ceph filesystem is functional is to try and mount it with ceph-fuse.

The minimal requirements for a Ceph client system to access the Ceph cluster seem to be an /etc/ceph/ceph.conf with a [global] section and a suitable keyring. The ceph.conf can be pushed to the Fedora system from the host where ceph-deploy was run:

$ ceph-deploy config push $NFS_SERVER

In my setup I scp'd the /etc/ceph/ceph.client.admin.keyring from one of my Ceph servers to the $NFS_SERVER. There probably are better ways to create/distribute a keyring, but I'm new to Ceph and this worked sufficiently for my testing.
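
If I understand ceph-deploy correctly, it can also push the admin keyring (together with the configuration) to another host, which might be a cleaner alternative to scp; untested in my setup:

$ ceph-deploy admin $NFS_SERVER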

When the above configuration was done, it was possible to mount the Ceph filesystem on the Ceph client that is becoming the NFS-server. These commands worked without issues:

# ceph-fuse /mnt
# echo 'Hello Ceph!' > /mnt/README
# umount /mnt

The first write to the Ceph filesystem took a while. This is likely due to the initial work the MDS and OSD daemons need to do (like creating pools for the Ceph filesystem).

After confirming that the Ceph cluster and filesystem work, the configuration for NFS-Ganesha can just be taken from the sources and saved as /etc/ganesha.nfsd.conf. With this configuration in place, and after restarting the nfs-ganesha.service, the NFS export becomes available:

# showmount -e $NFS_SERVER
Export list for $NFS_SERVER:
/ (everyone)
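
The interesting part of that configuration is an EXPORT block that uses the CEPH FSAL, roughly like this sketch, where the Pseudo path matches the virtual directory structure shown below (option names may vary between nfs-ganesha versions):

EXPORT
{
    Export_Id = 1;
    Path = "/";
    Pseudo = "/nfsv4/pseudofs/ceph";
    Access_Type = RW;

    FSAL {
        Name = CEPH;
    }
}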

NFSv4 uses a 'pseudo root' as mentioned in the configuration file. This means that mounting the export over NFSv4 results in a virtual directory structure:

# mount -t nfs $NFS_SERVER:/ /mnt
# find /mnt
/mnt
/mnt/nfsv4
/mnt/nfsv4/pseudofs
/mnt/nfsv4/pseudofs/ceph
/mnt/nfsv4/pseudofs/ceph/README

Reading and writing to the mountpoint under /mnt/nfsv4/pseudofs/ceph works fine, as long as the usual permissions allow that. By default NFS-Ganesha enables 'root squashing', so the 'root' user cannot do a lot on the export. Disabling this security measure can be done by placing this option in the export section:

Squash = no_root_squash;

After modifying /etc/ganesha.nfsd.conf and restarting the nfs-ganesha.service, writing files as 'root' should work too.

Future Work

For me, this was a short "let's try it out" while learning about Ceph. At the moment, I have no intention of working on the FSAL_CEPH for NFS-Ganesha. My main interest in this experiment with exporting a Ceph filesystem through NFS-Ganesha on a plain Fedora 20 installation was to learn about the usability of a new NFS-Ganesha configuration/deployment. In order to improve the user experience with NFS-Ganesha, I'll try and fix some of the issues I ran into. Progress can be followed in Bug 1144799.

In the future, I will mainly use NFS-Ganesha for accessing Gluster volumes. My colleague Soumya posted a nice explanation on how to download, build and run NFS-Ganesha with support for Gluster. We will be working on improving the out-of-the-box support in Fedora while stabilizing the FSAL_GLUSTER in the upstream NFS-Ganesha project.

Thursday, July 31, 2014

GlusterFS 3.5.2 has been released!


GlusterFS 3.5.2 was announced a few minutes ago. These are the changes that have been included in this release. Known issues are documented below too.

Release Notes for GlusterFS 3.5.2

This is mostly a bugfix release. The Release Notes for 3.5.0 and 3.5.1 contain a listing of all the new features that were added and bugs fixed.

Bugs Fixed:

  • 1096020: NFS server crashes in _socket_read_vectored_request
  • 1100050: Can't write to quota enable folder
  • 1103050: nfs: reset command does not alter the result for nfs options earlier set
  • 1105891: features/gfid-access: stat on .gfid virtual directory return EINVAL
  • 1111454: creating symlinks generates errors on stripe volume
  • 1112111: Self-heal errors with "afr crawl failed for child 0 with ret -1" while performing rolling upgrade.
  • 1112348: [AFR] I/O fails when one of the replica nodes go down
  • 1112659: Fix inode leaks in gfid-access xlator
  • 1112980: NFS subdir authentication doesn't correctly handle multi-(homed,protocol,etc) network addresses
  • 1113007: nfs-utils should be installed as dependency while installing glusterfs-server
  • 1113403: Excessive logging in quotad.log of the kind 'null client'
  • 1113749: client_t clienttable cliententries are never expanded when all entries are used
  • 1113894: AFR : self-heal of few files not happening when a AWS EC2 Instance is back online after a restart
  • 1113959: Spec %post server does not wait for the old glusterd to exit
  • 1114501: Dist-geo-rep : deletion of files on master, geo-rep fails to propagate to slaves.
  • 1115369: Allow the usage of the wildcard character '*' to the options "nfs.rpc-auth-allow" and "nfs.rpc-auth-reject"
  • 1115950: glfsheal: Improve the way in which we check the presence of replica volumes
  • 1116672: Resource cleanup doesn't happen for clients on servers after disconnect
  • 1116997: mounting a volume over NFS (TCP) with MOUNT over UDP fails
  • 1117241: backport 'gluster volume status --xml' issues
  • 1120151: Glustershd memory usage too high
  • 1124728: SMB: CIFS mount fails with the latest glusterfs rpm's

Known Issues:

  • The following configuration changes are necessary for 'qemu' and 'samba vfs plugin' integration with libgfapi to work seamlessly:
    1. gluster volume set <volname> server.allow-insecure on
    2. restarting the volume is necessary
       gluster volume stop <volname>
       gluster volume start <volname>
      
    3. Edit /etc/glusterfs/glusterd.vol to contain this line:
       option rpc-auth-allow-insecure on
      
    4. restarting glusterd is necessary
       service glusterd restart
      
      More details are also documented in the Gluster Wiki on the Libgfapi with qemu libvirt page.
  • For Block Device translator based volumes open-behind translator at the client side needs to be disabled.
      gluster volume set <volname> performance.open-behind disabled
    
  • libgfapi clients calling glfs_fini before a successful glfs_init will cause the client to hang as reported here. The workaround is NOT to call glfs_fini for error cases encountered before a successful glfs_init.
  • If the /var/run/gluster directory does not exist enabling quota will likely fail (Bug 1117888).