Wednesday, December 11, 2013

Gluster and (not) restarting brick processes upon updates

Gluster users have different opinions on when the Gluster daemons should be restarted. This seems to be a very common discussion for a lot of daemons, and it pops up on the Fedora Developers mailing list regularly.

An explanation on how and when Gluster starts its daemons is probably in order. A storage server running Gluster always has at least one process running, the management daemon (glusterd). The management daemon is responsible for building the Trusted Pool (aka cluster) of the Friends (other storage servers) that it knows. The glusterd process also handles the actions from the commandline client or other storage servers' glusterd processes.

Before a storage server provides bricks for a volume, glusterd is the only process that is running. After a volume has been created and started, each brick will have its own glusterfsd process. glusterd starts these glusterfsd processes when the volume is started (gluster volume start VOLNAME), or at boot when the volume is in the 'started' state.

In addition to starting the brick processes, glusterd is also responsible for starting the NFS-server and the self-heal-daemon (when these are not disabled). Both are glusterfs client processes and are started once per storage server.

Client processes for mounting Gluster Volumes through FUSE are not started by the glusterd management daemon. These processes are started upon mounting and are not known to the Gluster processes that provide the storage services.

When updates are installed, it is highly recommended to restart all the processes whose contents (either the binaries themselves, or loaded libraries) have changed. When no restart is performed, the old binaries keep running and the fixes that the update provides are not applied. This adds to the confusion about which version is running, because rpm -q glusterfs will return the updated version, which is different from the version that was logged when the daemons started.
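
One way to spot daemons that still run old code after an update is to look for mapped files that have been replaced in the meantime. The sketch below assumes a Linux system, where /proc/PID/maps marks such files with "(deleted)". The function name is made up for this example, and paths that contain spaces are not handled.

```shell
# List the replaced files that a process still has mapped.
# Argument: a file in /proc/PID/maps format (hypothetical helper name).
list_deleted_mappings() {
    # The path is the 6th field; replaced files end in " (deleted)".
    awk '$NF == "(deleted)" && $6 ~ /^\// {print $6}' "$1" | sort -u
}

# Example (assumes pgrep is available and that the brick processes are
# named glusterfsd):
#   for pid in $(pgrep glusterfsd); do
#       list_deleted_mappings "/proc/${pid}/maps"
#   done
```

Any output for a glusterfsd process suggests that it was started before the update and has not been restarted since.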

Luckily, systemd makes it pretty easy to restart all processes that glusterd started. But, unfortunately there are some valid (advanced/power-user) use-cases where restarting all the processes is not needed and can cause more problems than it would prevent. To accommodate these users on Fedora, we have split the management of the daemons over two systemd services:
  • glusterd.service for starting/stopping the glusterd process
  • glusterfsd.service for stopping the glusterfsd/brick processes
On most storage servers, both services should be activated. glusterfsd.service does nothing on start, but it will kill the glusterfsd processes when it gets stopped (or restarted). The glusterd.service starts and stops the glusterd management process (which in turn starts the needed glusterfsd processes).

Users that cannot allow updates to restart the glusterfsd processes automatically can disable the glusterfsd.service, so that the processes that provide the bricks for the volumes are not restarted:

# systemctl disable glusterfsd.service

Disabling the service does not stop it. As long as the service is active (check with systemctl status glusterfsd.service), an update will still cause a restart of the brick processes. Stopping the service once and restarting the glusterd.service afterwards (so that the brick processes get started again) is required, or a reboot will suffice too.

In order to have the glusterfsd processes stopped on shutdown, the glusterfsd.service file can be copied with a name that is unknown to the GlusterFS package. If the RPMs do not know about the service, they will not try to restart it. The following set of commands should work for these users:

# systemctl disable glusterfsd.service
# cp /usr/lib/systemd/system/glusterfsd.service /etc/systemd/system/glusterfsd-shutdown-only.service
# systemctl daemon-reload
# systemctl start glusterfsd-shutdown-only.service

Any issues, questions, suggestions or notes can be passed to me on IRC (ndevos on Freenode in #gluster) or can be reported in a bug against the Fedora GlusterFS package.

Sunday, December 1, 2013

Using Gluster as Primary Storage in CloudStack

CloudStack could use a Gluster environment for different types of storage:
  1. Primary Storage: mount over the GlusterFS native client (FUSE)
    This post shows how it is working and refers to the patches that make this possible.
  2. Volumes for virtual machines: use the libgfapi integration in QEMU
    Next upcoming task, initial untested patch in the wip-branch.
  3. Secondary Storage: mount over the GlusterFS native client (FUSE)
The current work-in-progress repository on the Gluster Community Forge already has functional support for creating Primary Storage on existing Gluster environments:
  • Infrastructure -> Primary Storage -> Add Primary Storage
    [screenshot: Add Primary Storage]
  • Infrastructure -> Zones -> Add Zone - [wizard]
    [screenshot: Add Primary Storage through the Zone Wizard]
Via the Infrastructure -> Primary Storage menu, the details of the newly created storage can be displayed.
[screenshot: Primary Storage Details]

After creating a virtual machine from the standard CentOS template, it can be verified that the Primary Storage Pool on the Gluster environment is functioning. On the hypervisor that runs the VM:

[root@agent ~]# mount | grep gluster
... on /mnt/dd697445-f67c-33bc-af52-386de3ff7245 type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072)

[root@agent ~]# ps -C qemu-kvm -o command | grep i-2-3-VM
/usr/libexec/qemu-kvm -name i-2-3-VM ... -drive file=/mnt/dd697445-f67c-33bc-af52-386de3ff7245/1afd48d2-c5e1-44ce-bcb3-051cc4d59716,if=none,id=drive-virtio-disk0,format=qcow2,cache=none ...
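
The same check can be scripted. The sketch below pulls the file= value out of every -drive option of a QEMU command line, so that it can be compared with the mount point. The function name is made up for this example, and it assumes that paths contain no spaces or commas.

```shell
# Print the file= value of every -drive option in a QEMU command line.
# (hypothetical helper; assumes paths without spaces or commas)
drive_files() {
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^file=\([^,]*\).*/\1/p'
}

# The command line of the VM shown above, shortened.
cmdline='/usr/libexec/qemu-kvm -name i-2-3-VM -drive file=/mnt/dd697445-f67c-33bc-af52-386de3ff7245/1afd48d2-c5e1-44ce-bcb3-051cc4d59716,if=none,id=drive-virtio-disk0,format=qcow2,cache=none'

drive_files "$cmdline"
```

On a hypervisor, the command line could be taken from the ps -C qemu-kvm -o command output instead of a fixed string.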

The changes to CloudStack that make this possible are located on the Gluster Community Forge and have been posted for review:
  • [#15932] Add support for Primary Storage on Gluster using the libvirt backend
  • [#15933] Add Gluster to the list of protocols in the Management Server

Monday, November 25, 2013

Initial work on Gluster integration with CloudStack

Last week there was a CloudStack Conference at the Beurs van Berlage in Amsterdam. I attended the first day and joined the Hackathon. Without any prior knowledge of CloudStack, I was asked by some of the Gluster community people to have a look at adding support for Gluster in CloudStack. An interesting topic, and of course I'll happily have a go at it.
CloudStack seems quite a nice project. The conference showed an awesome part of the community, loads of workshops and a surprising number of companies that sponsor and contribute to CloudStack. Very impressive!
One of the attendees at the CloudStack Conference was Wido den Hollander. Wido has experience with integrating Ceph in CloudStack, and gave an explanation and some pointers on how storage is implemented.

Integration Notes


It seems that the most useful way to integrate Gluster with CloudStack is to make sure libvirt knows how to use a Gluster backend. Checking with some of my colleagues that are part of the group that supports libvirt quickly showed that libvirt knows about Gluster already (Add new net filesystem glusterfs).
This suggests that it should be possible to create a storage pool in libvirt that is hosted on a Gluster environment. A little trial and error shows that a command like this creates the pool:

# virsh pool-create-as --name primary_gluster --type netfs --source-host $(hostname) --source-path /primary --source-format glusterfs --target /mnt/libvirt/primary_gluster

The components that the above command uses, are:
  • primary_gluster: the name of the storage pool in libvirt
  • netfs: the type of the pool, netfs mounts the 'pool' under the given --target
  • $(hostname): one of the Gluster servers that is part of the Trusted Storage Pool that provides the Gluster volume
  • /primary: the name of the Gluster volume
  • /mnt/libvirt/primary_gluster: directory where libvirt will mount the Gluster volume
Creating a volume (a libvirt volume, which is a file on the Gluster volume) can be done through libvirt:

# virsh vol-create-as --pool primary_gluster --name virsh-created-vol.img --capacity 512M --format raw

This will create the file /mnt/libvirt/primary_gluster/virsh-created-vol.img and that file can be used as a storage backend for a virtual machine. An example of a snippet for the disk that can be attached to a VM:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='gluster' name='/primary/virsh-created-vol.img'>
        <host name='HOSTNAME' port='24007'/>
      </source>
      <target dev='vda' bus='virtio'/>
    </disk>

There are some important prerequisites that need to be applied to the Gluster volume so that libvirt can start a virtual machine with the appropriate user. After setting these options on the Gluster volume and in /etc/glusterfs/glusterd.vol, a test virtual machine can get started. The log of the vm (/var/log/libvirt/qemu/just-a-vm.log) shows the QEMU command line, and this contains the path to the storage:

... /usr/libexec/qemu-kvm -name just-a-vm ... -drive file=gluster+tcp://HOSTNAME:24007/primary/virsh-created-vol.img,if=none,id=drive-virtio-disk0,format=raw,cache=none ...
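
The path that QEMU receives packs the server, the port, the Gluster volume and the image into one URI. As a small illustration, the sketch below takes such a URI apart with plain shell parameter expansion. The function name is made up for this example, the defaults (tcp transport, port 24007) are assumptions based on the values shown above, and IPv6 addresses in brackets are not handled.

```shell
# Split a QEMU Gluster URI into its parts (hypothetical helper name).
# Expected form: gluster[+TRANSPORT]://HOST[:PORT]/VOLUME/IMAGE
parse_gluster_uri() {
    local uri="$1" scheme rest hostport host port transport volume image

    scheme=${uri%%://*}        # gluster+tcp
    rest=${uri#*://}           # HOST:PORT/VOLUME/IMAGE
    hostport=${rest%%/*}       # HOST:PORT
    rest=${rest#*/}            # VOLUME/IMAGE

    host=${hostport%%:*}
    port=${hostport#*:}
    if [ "$port" = "$hostport" ]; then
        port=24007             # assumption: no port given, use the glusterd port
    fi

    transport=${scheme#gluster}
    transport=${transport#+}
    if [ -z "$transport" ]; then
        transport=tcp          # assumption: tcp when no transport is given
    fi

    volume=${rest%%/*}
    image=${rest#*/}

    echo "transport=${transport} host=${host} port=${port} volume=${volume} image=${image}"
}

parse_gluster_uri 'gluster+tcp://HOSTNAME:24007/primary/virsh-created-vol.img'
```

The volume and image parts correspond to the name attribute of the <source> element in the disk snippet, and the host and port to its <host> element.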

Design Overview

As CloudStack uses libvirt, it should be relatively straightforward to add support for Gluster in CloudStack. A diagram that shows the main interactions and their components looks like this:

                    .--------------.
                    |  CloudStack  |
                    '-------+------'
                            |
                      .-----+-----.
                      |  libvirt  |
                      '-----+-----'
                            |
           .----------------+--------------.
           |                               |
 .---------+----------.         .----------+----------.
 |  / storage pool /  |         |   virtual machine   |
 |  image management  |         |      management     |
 '---------+----------'         | / XML description / |
           |                    '----------+----------'
           V                               |
........................                   V
:     / vfs/fuse /     :     .............................
:  mount -t glusterfs  :     :    / QEMU + libgfapi /    :
:......................:     :  qemu file=gluster://...  :
                             :...........................:

The parts that are already functioning are these:
  • libvirt mounts a Gluster volume as a netfs/fuse-filesystem
  • create an XML definition for the disk and pass gluster:// on to QEMU

The actual development work will be in teaching CloudStack to instruct libvirt to use a Storage Pool backed by a Gluster Volume and attach disks to a virtual machine with the gluster protocol.

CloudStack Storage Subsystem modifications

Wido pointed out that most of the storage changes will be needed in the LibvirtStoragePoolDef and LibvirtStorageAdapter Java classes. Also the Storage Core would need to know about the new storage backend.
After some browsing and reading the sources, the needed modifications looked straightforward. The Gluster backend compares to the NFS backend, which can be used as an example.
Changing the code is the easy part, compared to testing it. Remember that I have no CloudStack background whatsoever... Setting up a CloudStack environment to see if the modifications do anything is far from trivial. Compared to the time I spent on changing the source code, trying to get a minimal test environment functioning took most of my time. At this moment, my patches are untested and therefore I have not posted them for review yet :-/

Setting up a CloudStack environment for testing

Some pointers to set up a development environment:
  • Building CloudStack manually (non RPMs)
  • maven 3.0.4 has been deprecated, use maven 3.0.5 instead
  • Installation Guide
  • RHEL6 requires the Optional Channel for jsvc from the jakarta-commons-daemon-jsvc package
  • install the cloudstack-agent (and -common) package
  • set guid and in /etc/cloudstack/agent/

Running the CloudStack Management server is easy enough when the sources are checked out and built. A command like this works for me:

# mvn -pl :cloud-client-ui jetty:run

To deploy the changes for the cloudstack-agent, I prefer to build and install RPMs. Building these is made easy by the packaging/centos63/ script:

# cd packaging/centos63 ; ./ ; cd -

This script and the resulting packages work well on RHEL-6.5.

Upcoming work

With the test environment in place, I can now start to make changes to the Management Server. The current modifications in the JavaScript code make it possible to select Gluster as a primary storage pool. Unfortunately, I'm no web developer and changing JavaScript isn't something I'm very good at. I will be hacking on it every now and then, and hope to be able to have something suitable for review soon.
Of course, any assistance is welcome! I'm happy to share my work in progress if there is an interest. No guarantees about any working functionality though ;-)

Saturday, May 4, 2013

Fedora 18 Remix for Genesi EFIKA MX Smartbook available

After quite some delay, I have been able to try out a work-in-progress kernel-tree (3.7) from Sascha Hauer for the Genesi EFIKA MX Smartbook. The Fedora Remix image I have created uses the barebox bootloader to load the device-tree and the kernel (a zImage with concatenated initramfs).

The contents of the image are based on the Fedora 18 Generic Root Filesystem armhfp. XFCE and most of the desktop basics are included. Very few changes were needed overall. As the Smartbook has only 512MB of RAM, it is recommended to add some swap space; re-using the swap on the internal HD works for me (you need to update your /etc/fstab).

The current kernel configuration as found in Sascha's tree does not play very nicely with the Fedora rootfs. Under some load, quite a few oopses occur and this often results in a kernel panic (this happens because some 'sh' process reads /proc/meminfo, no idea what process, or why). I have been able to re-configure the kernel to the Fedora specifics, while keeping to the options from Sascha's defconfig. This kernel looks more stable, and a 'yum update' over the integrated Wifi adapter has finished successfully. So, if you have tried an earlier (not publicly announced) image, it would be to your benefit to update to this new release.

A difficulty with using this image, is that you need to configure your Smartbook to boot from SD-card, not the internal SPI-NOR. This change is done by flipping three dip-switches that are located under the keyboard, next to the flat-cable that connects the keyboard. The Genesi site contains a very good explanation on removing the keyboard so that you can access the switches.

In the default configuration, these switches make the system use the bootloader (uboot) to boot from the internal SPI-NOR. They look like this:

 | X       |
 |   X X X |
Change this to the following to boot with barebox from the SD-card:
 |     X X |
 | X X     |

For now, I was only successful with booting from the left SD-card slot. barebox should also support booting from the micro-SD-card slot that is accessible by removing the battery. You will need to modify the barebox configuration (adding variables/scripts) for this. A future release of this remix will hopefully come with a modified barebox configuration so that both card slots work (and also separating zImage from the initramdisk).

These instructions and a link for downloading are captured in the Fedora Wiki and anyone is free to improve, correct and amend that page.

Sunday, April 7, 2013

Configuring a bluetooth keyboard system-wide from the command line

Recently I bought a new keyboard, which I intend to use when my laptop is placed in its docking station. There are two external monitors connected, making the display of the laptop rather useless (only two outputs are supported at the same time). In normal circumstances the laptop lid will be closed, so the keyboard is not accessible.

My new keyboard is a Logitech K760 and is connected through bluetooth. Pairing with help from the XFCE/GNOME tools is easy enough, but this causes the keyboard to be available after login only. That is not very practical. After boot, I have to log in through GDM and prefer not to need the keyboard of the laptop itself. For this, I needed to figure out how to make the bluetooth keyboard available on system level, and not per user. Descriptions on how to do this seem to be very sparse, and mostly depend on other distributions than RHEL or Fedora. I prefer to use standard tools as much as possible; adding custom scripts for these things makes it more difficult to move configurations between systems. Furthermore, the keyboard can be paired to multiple (3) systems at the same time; the F1-3 keys can be used to select a system, similar to a KVM switch.

The most minimal and easy to use tools I could find are included in the test suite of the BlueZ package. Unfortunately, these are not packaged as far as I know, so installing or using these scripts is impractical. But, as these scripts are only needed for pairing once, I think they are a nice solution anyway. The advantage over other options is that the scripts are updated with the bluez software itself, which causes the same scripts (well, different versions) to work regardless of changes to the bluez API.

Getting the scripts from the bluez test-suite that matches the available version in Fedora or RHEL, can be done with yumdownloader from the yum-utils package (all as normal unprivileged user):

$ yumdownloader --source bluez

Extract the source RPM by installing it:

$ rpm -ivh bluez-4.66-1.el6.src.rpm

Extract the sources which include the test-suite:

$ rpmbuild --nodeps -bp ~/rpmbuild/SPECS/bluez.spec
Note the --nodeps parameter: without it, the -bp argument causes all BuildRequires dependencies to be checked, and most of them are not needed for the test-suite scripts.

After extracting the sources successfully, the test-suite is located under the BUILD directory:

$ cd ~/rpmbuild/BUILD/bluez-4.66/test/

Everything is now ready for pairing, so put the keyboard in discovery mode and scan for it:

$ sudo hcitool scan
Scanning ...
 00:1F:20:3C:A2:03 Logitech K760

The keyboard will need to authenticate to the system. simple-agent can be used for that, like this:

$ sudo ./simple-agent hci0 00:1F:20:3C:A2:03
DisplayPasskey (/org/bluez/2117/hci0/dev_00_1F_20_3C_A2_03, 716635)
New device (/org/bluez/2117/hci0/dev_00_1F_20_3C_A2_03)
The simple-agent script will wait for a response from the keyboard: type the PIN that is shown (here 716635) on the keyboard and hit enter.

Obviously the keyboard is a device that supports the input-class. Hence test-input can be used to set up the connection:

$ sudo ./test-input connect 00:1F:20:3C:A2:03

If this worked without error message, mark the keyboard as a trusted device. This will make it possible for the keyboard to connect to the system without requesting for approval:

$ sudo ./test-device trusted 00:1F:20:3C:A2:03 yes

After these steps, verify that the keyboard connects automatically after a reboot. This worked for me on my RHEL-6 laptop, and a cubieboard installed with Fedora 18 ARM.

Sunday, March 17, 2013

Use dnsmasq for separating DNS queries

Automatic network configuration with DHCP is great. But if you need to use multiple separated networks at once, it gets more difficult pretty quickly. For example, my RHEL-6 laptop

  1. connects through wifi to the network at home, which provides internet access
  2. accesses remote systems connected via a VPN
  3. and manages virtual machines that need access to any of those

Now, when NetworkManager connects to the VPN, the DNS-servers for the VPN are added to /etc/resolv.conf with a higher priority than the home network one. This is fine in a lot of circumstances, but that means all domain name service lookups will go through the VPN first. That's not optimal, and the administrator of the VPN does not need to see all the hostname lookups my laptop is doing either. Also, any lookups for the local network will go through the VPN, fail there and are retried with the next DNS-server, making queries for the LAN slower than all the others.

The solution sounds simple: Only use the DNS-servers on the VPN for lookups for resources that are on the VPN.

Unfortunately, the configuration is not that simple if it needs to work dynamically. The main configuration file that contains DNS-servers (/etc/resolv.conf) does not have any options to tell that some DNS-servers are to be used for certain domains only. A workaround for this limitation is to use a DNS-server that supports filtering and relaying queries, and have it listen on localhost. This DNS-server is configured in /etc/resolv.conf, and any new network configurations (or removed ones) should not change the configuration in /etc/resolv.conf, but the local DNS-server instead.

This means that my /etc/resolv.conf only refers to the DNS-server on localhost.
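
Such a file can be very short. The sketch below writes one, assuming that the local DNS-server listens on 127.0.0.1; the function name and the exact content are assumptions for illustration, not a copy of the original file.

```shell
# Write a resolv.conf that sends every lookup to a DNS-server on localhost.
# Argument: the file to write (hypothetical helper; the default path is
# the template file used in this post).
write_local_resolv_conf() {
    local target="${1:-/etc/resolv.conf.dnsmasq}"
    {
        echo "# all lookups go through the DNS-server on localhost"
        echo "nameserver 127.0.0.1"
    } > "$target"
}

# Example (as root):
#   write_local_resolv_conf /etc/resolv.conf
```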

The minimal /etc/resolv.conf file is also saved as /etc/resolv.conf.dnsmasq, which is used as a template for restoring the configuration when a VPN service (like OpenVPN) modified it.

The DNS-server for this setup became dnsmasq. This piece of software was already installed on my laptop as a dependency of libvirt, and offers the simple configuration that this setup can benefit from. For this setup, the libvirt configuration of dnsmasq is not touched: it works fine, and with its integrated DHCP-server I am not tempted to break my virtual machines (not now, and not when I install updates).

The configuration to let dnsmasq listen on localhost, and not interfere with the instance for libvirt that listens on virbr0, is very minimal as well. My preference is to prevent big changes in packaged configuration files as these may become difficult to merge with updates, so the only change in /etc/dnsmasq.conf that is required is this (newer versions seem to have this by default):
# Include a another lot of configuration options.

An additional file in the /etc/dnsmasq.d directory suffices,  /etc/dnsmasq.d/localhost.conf:

The default configuration file /etc/dnsmasq.conf contains a good description of these options. It is not needed to repeat them here.

Enabling dnsmasq to start at boot is a prerequisite, otherwise any lookup that uses DNS-servers will fail completely. On my RHEL-6 system, I needed to enable starting of dnsmasq with /sbin/chkconfig dnsmasq on, and start the service with /sbin/service dnsmasq start.

With this configuration, only hostnames and IP-addresses that are in /etc/hosts are resolved, which makes it difficult to create any network connections outside of the laptop. The next step is to integrate the connected networks with the dnsmasq configuration.

NetworkManager is used to configure the network on my laptop. This is convenient as it supports WLAN and can connect to the VPN. In order to teach it to write a dnsmasq configuration file for each network that gets set up, I used an event script, /etc/NetworkManager/dispatcher.d/90-update-resolv.conf:
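#!/bin/bash
# NetworkManager dispatcher script to prevent messing with DNS servers in the
# LAN.
# Author: Niels de Vos

DNSMASQ_RESOLV="/etc/dnsmasq.d/resolv-${CONNECTION_UUID}.conf"

function write_dnsmasq_header
{
    if [ ! -e "${DNSMASQ_RESOLV}" ]
    then
        echo "# ${DNSMASQ_RESOLV} generated on $(date)" > "${DNSMASQ_RESOLV}"
        echo "# Generator: ${0}" >> "${DNSMASQ_RESOLV}"
        echo "# Connection: ${CONNECTION_UUID}" >> "${DNSMASQ_RESOLV}"
    fi
}

function create_dnsmasq_config_env
{
    local NS

    write_dnsmasq_header

    # assumption: NetworkManager passes the DNS-servers in IP4_NAMESERVERS
    for NS in ${IP4_NAMESERVERS}
    do
        echo "server=${NS}" >> "${DNSMASQ_RESOLV}"
    done
}

# Use the DNS-servers that a VPN service wrote to /etc/resolv.conf.
function create_dnsmasq_config_from_resolv_conf
{
    local NS
    local DOMAIN=""

    write_dnsmasq_header

    DOMAIN=$(awk '/^domain/ {print $2}' /etc/resolv.conf)
    [ -n "${DOMAIN}" ] && DOMAIN="/${DOMAIN}/"

    for NS in $(awk '/^nameserver/ {print $2}' /etc/resolv.conf)
    do
        # make sure the NS is not from an other config
        grep -q "[=/]${NS}\$" /etc/dnsmasq.d/resolv-*.conf && continue

        echo "server=${DOMAIN}${NS}" >> "${DNSMASQ_RESOLV}"
    done
}

function remove_dnsmasq_config
{
    # assumption: removing the file of this connection is all that is needed
    rm -f "${DNSMASQ_RESOLV}"
}

# Remove the configuration files of connections that are no longer active.
function remove_stale_configs
{
    local CONF
    local UUID

    for CONF in /etc/dnsmasq.d/resolv-*.conf
    do
        # in case of a wildcard error
        [ -e "${CONF}" ] || continue

        UUID=$(awk '/^# Connection: / {print $3}' "${CONF}")
        if ! ( nmcli -t -f UUID con status | grep -q "^${UUID}\$" )
        then
            rm -f "${CONF}"
        fi
    done
}

function reload_dnsmasq
{
    cat /etc/resolv.conf.dnsmasq > /etc/resolv.conf
    [ -n "${DHCP4_DOMAIN_SEARCH}" ] && echo "search ${DHCP4_DOMAIN_SEARCH}" >> /etc/resolv.conf
    # "killall -HUP dnsmasq" is not sufficient for new files
    /sbin/service dnsmasq restart > /dev/null 2>&1
}

# assumption: the mapping of actions to functions below is a reconstruction
case "$2" in
    up)
        remove_stale_configs
        create_dnsmasq_config_env
        reload_dnsmasq
        ;;
    vpn-up)
        remove_stale_configs
        create_dnsmasq_config_from_resolv_conf
        reload_dnsmasq
        ;;
    down|vpn-down)
        remove_stale_configs
        remove_dnsmasq_config
        reload_dnsmasq
        ;;
esac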

This script will write a configuration file like /etc/dnsmasq.d/resolv-0263cda6-edbd-437e-8d36-efb86dcc9112.conf:
# /etc/dnsmasq.d/resolv-0263cda6-edbd-437e-8d36-efb86dcc9112.conf generated on Sun Mar 17 11:57:26 CET 2013
# Generator: /etc/NetworkManager/dispatcher.d/90-update-resolv.conf
# Connection: 0263cda6-edbd-437e-8d36-efb86dcc9112

The generated configuration file for dnsmasq simply lists a DNS-server that can be used for any query. When the configuration has been written, the dnsmasq service is restarted, which causes it to read its configuration files again.

After connecting to a VPN, another partial configuration file is generated. In this case /etc/dnsmasq.d/resolv-ba76186a-9923-4756-aa8a-19706a4d273c.conf:
# /etc/dnsmasq.d/resolv-ba76186a-9923-4756-aa8a-19706a4d273c.conf generated on Sun Mar 17 11:57:41 CET 2013
# Generator: /etc/NetworkManager/dispatcher.d/90-update-resolv.conf
# Connection: ba76186a-9923-4756-aa8a-19706a4d273c

Similar to the file for the main WLAN connection, this configuration lists DNS-servers (two in this case), but these are to be used for the domain of the VPN only.
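
The difference between the two generated files comes down to the form of the server= lines. As a rough sketch (the function name, the example domain and the addresses in the usage example are made up for illustration), this is how a resolv.conf with a domain line translates into domain-specific entries for dnsmasq, using the same awk expressions as the dispatcher script:

```shell
# Print dnsmasq "server=" lines for a resolv.conf-style file.
# With a "domain" line the servers are restricted to that domain,
# without one they are used for any query. (hypothetical helper name)
resolv_to_dnsmasq() {
    local file="$1" domain ns

    domain=$(awk '/^domain/ {print $2; exit}' "$file")
    if [ -n "$domain" ]; then
        domain="/${domain}/"
    fi

    for ns in $(awk '/^nameserver/ {print $2}' "$file"); do
        echo "server=${domain}${ns}"
    done
}

# Example:
#   printf 'domain vpn.example.com\nnameserver 10.0.0.1\n' > /tmp/resolv.test
#   resolv_to_dnsmasq /tmp/resolv.test
```

A line of the form server=/DOMAIN/ADDRESS tells dnsmasq to send only the queries for that domain to the given DNS-server.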

For me this works in the environments I visit, wifi at home, network cable connected docking station, and several other (non-)public wireless networks.

Wednesday, January 23, 2013

Changing vim settings depending on the git repository containing the file

Not all projects I am regularly working on use the same CodingStyle. This is very unfortunate, and sometimes makes it time consuming to provide acceptable patches. One common example is that some projects indent with a <tab>, where others expect <4-spaces>.

I could not find a vim plugin or other extension that lets me pick vim-settings per git repository. The idea that I came up with, is to set the project specific vim-settings in the git-config itself. For example:

$ git config --add vim.settings 'tabstop=4 expandtab'

Now, in my ~/.vimrc, I have the following snippet:

let git_settings = system("git config --get vim.settings")
if strlen(git_settings)
    exe "set" git_settings
endif

Editing a file in the git repository that contains the above vim.settings now replaces my <tab> by <4-spaces>. Other repositories that do not have the vim.settings will fall back to my vim defaults. If you can think of any improvements or suggestions, please share them.