Turbocharge DevStack with Raid 0 SSDs

Turbocharging DevStack

I wanted to turbocharge my development cycle of OpenStack running on Fedora 18 so I could be waiting on my brain rather than waiting on my workstation.  I decided to purchase two modern solid state drives (SSDs) and run them in RAID 0.  I chose two Intel S3500 160 GB enterprise-grade SSDs; my second choice was the Samsung 840 Pro, which may have been a bit faster, but perhaps not as reliable.

Since OpenStack and DevStack mostly use /var and /opt for their work, I decided to move only /var and /opt to the array.  Given the lower availability of RAID 0, if an SSD fails I am less likely to lose my home directory, which may contain work in progress.

The Baseline HP Z820

For a baseline, my system is a Hewlett-Packard Z820 workstation (model #B2C08UT#ABA) that I purchased from Provantage in January 2013.  Most of the machine is a beast, sporting an 8-core Intel Xeon E5-2670 @ 2.60GHz with Hyper-Threading for 16 total CPUs, an Intel C602 chipset, and 16 GB of quad-channel DDR3 ECC unbuffered RAM.

The memory is fast, as shown with ramspeed:

[sdake@bigiron ramspeed-2.6.0]$ ./ramspeed -b 3 -m 4096
RAMspeed (Linux) v2.6.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode

INTEGER   Copy:      11549.61 MB/s
INTEGER   Scale:     11550.59 MB/s
INTEGER   Add:       11885.79 MB/s
INTEGER   Triad:     11834.27 MB/s
---
INTEGER   AVERAGE:   11705.06 MB/s

Unfortunately, the disk is a pokey 1TB 7200 RPM model.  The hdparm tool shows only 118MB/sec of buffered read throughput:

[sdake@bigiron ~]$ sudo hdparm -tT /dev/sda
/dev/sda:
Timing cached reads: 20590 MB in 2.00 seconds = 10308.76 MB/sec
Timing buffered disk reads: 358 MB in 3.02 seconds = 118.69 MB/sec

The Gnome 3 disk benchmarking tool shows a lower average of 82MB per second, although this measurement also passes through the LVM driver:

[Figure: Gnome 3 disk benchmark of the 7200 RPM disk]

Warning: I didn’t run this benchmark with write enabled, as it would have destroyed the data on my disk.

Running stack.sh takes about 6 minutes:

[sdake@bigiron devstack]$ ./stack.sh
Using mysql database backend
Installing package prerequisites...done
Installing OpenStack project source...done
Starting qpid...done
Configuring and starting MySQL...done
Starting Keystone...done
Configuring Glance...done
Configuring Nova...done
Configuring Cinder...done
Configuring Nova...done
Using libvirt virtualization driver...done
Starting Glance...done
Starting Nova API...done
Starting Nova...done
Starting Cinder...done
Configuring Heat...done
Starting Heat...done
Uploading images...done
Configuring Tempest...[/]
Heat has replaced the default flavors. View by running: nova flavor-list
Keystone is serving at http://192.168.1.20:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: 123456
This is your host ip: 192.168.1.20
done
stack.sh completed in 368 seconds

I timed a heat stack-create operation at about 34 seconds.  In a typical day I may create 50 or more stacks, so the time really adds up: at 50 stacks, that is nearly half an hour spent waiting.

Turbocharged DevStack

After installing the two SSDs, I decided to use LVM RAID 0 striping.  Linux Magazine indicates mdadm is faster, but I prefer a single management solution for my disks.

The hdparm tool shows a beastly 1GB/sec of read throughput:

[sdake@bigiron ~]$ sudo hdparm -tT /dev/raid0_vg/ssd_opt

/dev/raid0_vg/ssd_opt:
Timing cached reads: 21512 MB in 2.00 seconds = 10771.51 MB/sec
Timing buffered disk reads: 3050 MB in 3.00 seconds = 1016.47 MB/sec

I also ran the Gnome 3 disk benchmarking tool, this time with writes enabled.  It showed an average of 930MB/sec read and 370MB/sec write throughput:

[Figure: Gnome 3 disk benchmark of the RAID 0 SSD array]

I ran stack.sh in a little under 3 minutes:

[sdake@bigiron devstack]$ ./stack.sh
Using mysql database backend
Installing package prerequisites...done
Installing OpenStack project source...done
Starting qpid...done
Configuring and starting MySQL...done
Starting Keystone...done
Configuring Glance...done
Configuring Nova...done
Configuring Cinder...done
Configuring Nova...done
Using libvirt virtualization driver...done
Starting Glance...done
Starting Nova API...done
Starting Nova...done
Starting Cinder...done
Configuring Heat...done
Starting Heat...done
Uploading images...done
Configuring Tempest...[|]
Heat has replaced the default flavors. View by running: nova flavor-list
Keystone is serving at http://192.168.1.20:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: 123456
This is your host ip: 192.168.1.20
done
stack.sh completed in 166 seconds

I timed a heat stack-create at 6 seconds.  Compared to the non-SSD 34 seconds, RAID 0 SSDs rock!  Overall, the system feels much faster, and the benchmarks show it.

How we use CloudInit in OpenStack Heat

Many people over the past year have asked me how exactly the Heat developers use CloudInit in OpenStack Heat.  Since CloudInit is the default virtual machine bootstrapping system on Debian, Fedora, Red Hat Enterprise Linux, Ubuntu, and likely more distros, we decided to start with CloudInit as our base bootstrapping system.  I’ll present a code walk-through of how we use CloudInit inside OpenStack Heat.

Reading the CloudInit documentation is helpful, but it lacks programming examples of how to develop software that injects data into virtual machines using CloudInit.  The OpenStack Heat project implements injection in Python for CloudInit-enabled virtual machines.  Injection occurs by passing information to the virtual machine that is decoded by CloudInit.

IaaS platforms require a method for users to pass data into the virtual machine.  OpenStack provides a metadata server, which is co-located with the rest of the OpenStack infrastructure.  When the virtual machine boots, it can make an HTTP request to a specific URI and retrieve the user data passed to the instance during creation.
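
As a rough illustration (a minimal sketch, not Heat code), a guest can retrieve its user data with a few lines of Python by querying the EC2-compatible metadata endpoint that CloudInit relies on when running on OpenStack; the 169.254.169.254 address and /latest/user-data path are the conventional EC2-style defaults:

import urllib2

# Conventional EC2-compatible metadata endpoint queried by CloudInit.
METADATA_URL = 'http://169.254.169.254/latest/user-data'

def fetch_userdata(url=METADATA_URL, timeout=5):
    # Returns the raw user data blob; for Heat this is the multipart MIME
    # message assembled by _build_userdata(), shown later in this post.
    return urllib2.urlopen(url, timeout=timeout).read()

if __name__ == '__main__':
    print fetch_userdata()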

CloudInit’s job is to contact the metadata server and bootstrap the virtual machine with desired configurations.  In OpenStack Heat, we do this with three specific files.

The first file is our CloudInit configuration file:

runcmd:
- setenforce 0 > /dev/null 2>&1 || true
user: ec2-user

cloud_config_modules:
- locale
- set_hostname
- ssh
- timezone
- update_etc_hosts
- update_hostname
- runcmd

# Capture all subprocess output into a logfile
# Useful for troubleshooting cloud-init issues
output: {all: '| tee -a /var/log/cloud-init-output.log'}

This file directs CloudInit to turn off SELinux, install SSH keys for the user ec2-user, set up the locale, hostname, SSH, and timezone, modify /etc/hosts with correct information, and capture the output of all cloud-init processing in /var/log/cloud-init-output.log.

There are many cloud-config modules, each providing different functionality.  Unfortunately, they are not well documented, so the source must be read to understand their behavior.  For a list of cloud-config modules, check the upstream repo.

Another file required by OpenStack Heat’s support for CloudInit is a part handler:

#part-handler
import datetime
import errno
import os


def list_types():
    return(["text/x-cfninitdata"])


def handle_part(data, ctype, filename, payload):
    if ctype == "__begin__":
        try:
            os.makedirs('/var/lib/heat-cfntools', 0700)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
        return

    if ctype == "__end__":
        return

    with open('/var/log/part-handler.log', 'a') as log:
        timestamp = datetime.datetime.now()
        log.write('%s filename:%s, ctype:%s\n' % (timestamp, filename, ctype))

    if ctype == 'text/x-cfninitdata':
        with open('/var/lib/heat-cfntools/%s' % filename, 'w') as f:
            f.write(payload)

The part-handler.py file is executed by CloudInit to separate the UserData provided by the OpenStack metadata server.  CloudInit calls handle_part() for each part of the multipart MIME message that CloudInit doesn’t know how to decode itself.  This is how OpenStack Heat passes unique information for each virtual machine to assist in the orchestration process.  The first ctype is always set to __begin__, which triggers handle_part() to create the directory /var/lib/heat-cfntools.

The OpenStack Heat instance launch code uses the MIME subtype x-cfninitdata.  OpenStack Heat passes several files via this subtype, each of which is decoded and stored in /var/lib/heat-cfntools.
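
To make the calling convention concrete, here is a small hypothetical harness (not part of CloudInit or Heat) that builds a two-part MIME message and feeds it to a toy handle_part() in the order CloudInit uses: __begin__, one call per part, then __end__:

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def handle_part(data, ctype, filename, payload):
    # Toy handler: just show what CloudInit would pass in.
    print '%s %s' % (ctype, filename)

def run_handler(msg):
    # Mirror the calling convention described above.
    handle_part(None, '__begin__', None, None)
    for part in msg.walk():
        if part.is_multipart():
            continue
        handle_part(None, part.get_content_type(),
                    part.get_filename(), part.get_payload())
    handle_part(None, '__end__', None, None)

if __name__ == '__main__':
    parts = []
    for i, blob in enumerate(['first config blob', 'second config blob']):
        part = MIMEText(blob, _subtype='x-cfninitdata')
        part.add_header('Content-Disposition', 'attachment',
                        filename='cfn-userdata-%d' % i)
        parts.append(part)
    run_handler(MIMEMultipart(_subparts=parts))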

The final file required is a script which runs at first boot:

#!/usr/bin/env python

import datetime
import os
import subprocess
import sys

from distutils.version import LooseVersion
import pkg_resources

path = '/var/lib/heat-cfntools'

def chk_ci_version():
    v = LooseVersion(pkg_resources.get_distribution('cloud-init').version)
    return v >= LooseVersion('0.6.0')

def create_log(path):
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0600)
    return os.fdopen(fd, 'w')

def call(args, log):
    log.write('%s\n' % ' '.join(args))
    log.flush()
    p = subprocess.Popen(args, stdout=log, stderr=log)
    p.wait()
    return p.returncode

def main(log):

    if not chk_ci_version():
        # pre 0.6.0 - user data executed via cloudinit, not this helper
        log.write('Unable to log provisioning, need a newer version of'
                  ' cloud-init\n')
        return -1

    userdata_path = os.path.join(path, 'cfn-userdata')
    os.chmod(userdata_path, 0700)

    log.write('Provision began: %s\n' % datetime.datetime.now())
    log.flush()
    returncode = call([userdata_path], log)
    log.write('Provision done: %s\n' % datetime.datetime.now())
    if returncode:
        return returncode

if __name__ == '__main__':
    with create_log('/var/log/heat-provision.log') as log:
        returncode = main(log)
        if returncode:
            log.write('Provision failed')
            sys.exit(returncode)

    userdata_path = os.path.join(path, 'provision-finished')
    with create_log(userdata_path) as log:
        log.write('%s\n' % datetime.datetime.now())

This script logs the output of the execution of /var/lib/heat-cfntools/cfn-userdata.

These files are co-located with OpenStack Heat’s engine process, which loads them and combines them, plus other Heat-specific configuration blobs, into one multipart MIME message.

OpenStack Heat’s UserData generator:

    def _build_userdata(self, userdata):
        if not self.mime_string:
            # Build mime multipart data blob for cloudinit userdata

            def make_subpart(content, filename, subtype=None):
                if subtype is None:
                    subtype = os.path.splitext(filename)[0]
                msg = MIMEText(content, _subtype=subtype)
                msg.add_header('Content-Disposition', 'attachment',
                               filename=filename)
                return msg

            def read_cloudinit_file(fn):
                return pkgutil.get_data('heat', 'cloudinit/%s' % fn)

            attachments = [(read_cloudinit_file('config'), 'cloud-config'),
                           (read_cloudinit_file('part-handler.py'),
                            'part-handler.py'),
                           (userdata, 'cfn-userdata', 'x-cfninitdata'),
                           (read_cloudinit_file('loguserdata.py'),
                            'loguserdata.py', 'x-shellscript')]

            if 'Metadata' in self.t:
                attachments.append((json.dumps(self.metadata),
                                    'cfn-init-data', 'x-cfninitdata'))

            attachments.append((cfg.CONF.heat_watch_server_url,
                                'cfn-watch-server', 'x-cfninitdata'))

            attachments.append((cfg.CONF.heat_metadata_server_url,
                                'cfn-metadata-server', 'x-cfninitdata'))

            # Create a boto config which the cfntools on the host use to know
            # where the cfn and cw API's are to be accessed
            cfn_url = urlparse(cfg.CONF.heat_metadata_server_url)
            cw_url = urlparse(cfg.CONF.heat_watch_server_url)
            is_secure = cfg.CONF.instance_connection_is_secure
            vcerts = cfg.CONF.instance_connection_https_validate_certificates
            boto_cfg = "\n".join(["[Boto]",
                                  "debug = 0",
                                  "is_secure = %s" % is_secure,
                                  "https_validate_certificates = %s" % vcerts,
                                  "cfn_region_name = heat",
                                  "cfn_region_endpoint = %s" %
                                  cfn_url.hostname,
                                  "cloudwatch_region_name = heat",
                                  "cloudwatch_region_endpoint = %s" %
                                  cw_url.hostname])

            attachments.append((boto_cfg,
                                'cfn-boto-cfg', 'x-cfninitdata'))

            subparts = [make_subpart(*args) for args in attachments]
            mime_blob = MIMEMultipart(_subparts=subparts)

            self.mime_string = mime_blob.as_string()

        return self.mime_string

This code provides two functions:

  • make_subpart: Creates a MIME subpart from a single attachment’s content, filename, and subtype
  • read_cloudinit_file: Reads one of the three OpenStack Heat CloudInit files shown above from the heat package

The rest of the function builds the attachments list containing the UserData OpenStack Heat needs.  These attachments are then turned into a multipart MIME message, which is passed to instance creation:

        server_userdata = self._build_userdata(userdata)
        server = None
        try:
            server = self.nova().servers.create(
                name=self.physical_resource_name(),
                image=image_id,
                flavor=flavor_id,
                key_name=key_name,
                security_groups=security_groups,
                userdata=server_userdata,
                meta=tags,
                scheduler_hints=scheduler_hints,
                nics=nics,
                availability_zone=availability_zone)
        finally:
            # Avoid a race condition where the thread could be cancelled
            # before the ID is stored
            if server is not None:
                self.resource_id_set(server.id)

This snippet of code creates the UserData and passes it to the nova server create operation.
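
If you want to experiment with the same multipart technique outside of Heat, a standalone sketch along these lines reproduces the shape of the blob _build_userdata() emits; the attachment contents below are placeholders, not Heat’s real cloudinit files:

import os
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def make_subpart(content, filename, subtype=None):
    # Same helper shape as the Heat code above.
    if subtype is None:
        subtype = os.path.splitext(filename)[0]
    msg = MIMEText(content, _subtype=subtype)
    msg.add_header('Content-Disposition', 'attachment', filename=filename)
    return msg

# Placeholder attachments standing in for Heat's real cloudinit files.
attachments = [('runcmd:\n - echo hello\n', 'cloud-config'),
               ('#!/bin/sh\necho my userdata script\n', 'cfn-userdata',
                'x-cfninitdata'),
               ('http://127.0.0.1:8000/', 'cfn-metadata-server',
                'x-cfninitdata')]

subparts = [make_subpart(*args) for args in attachments]
print MIMEMultipart(_subparts=subparts).as_string()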

The flow is then:

  1. Create user data
  2. Heat creates nova server instance with user data
  3. Nova creates the instance
  4. CloudInit distro initialization occurs
  5. CloudInit reads config from OpenStack metadata server UserData information
  6. CloudInit executes part-handler.py with __begin__
  7. CloudInit executes part-handler.py for each x-cfninitdata mime type
  8. part-handler.py writes the contents of each x-cfninitdata mime subpart to /var/lib/heat-cfntools on the instance
  9. CloudInit executes part-handler.py with __end__
  10. CloudInit executes the configuration operations defined by the config file
  11. CloudInit runs the x-shellscript blob which in this case is loguserdata.py
  12. loguserdata.py logs the output of /var/lib/heat-cfntools/cfn-userdata, which is the initialization script set in the OpenStack Heat template

I hope this code walk-through helps developers understand how OpenStack Heat integrates with CloudInit, and shows how to use CloudInit from your own Python applications if you roll your own bootstrapping process.

The Heat API – A template based orchestration framework

Over the last year, Angus Salkeld and I have been developing an IaaS high availability service called Pacemaker Cloud.  We learned that the problem we were really solving was orchestration.  Another development group inside Red Hat was also looking at this problem from the launching side.  We decided to take two weeks off from our existing work and see if we could join together to create a proof-of-concept implementation of AWS CloudFormation for OpenStack from scratch.  The result of that work was a proof-of-concept project that could launch a WordPress template, as had been done in our previous project.

The developers decided to take another couple of weeks to determine if we could build a more functional system that would handle composite virtual machines.  Today, we released that version, our second iteration of the Heat API.  Since we now have many more developers and a project that exceeds the previous functionality of Pacemaker Cloud, the Heat development community has decided to cease work on our previous orchestration projects and focus our efforts on Heat.

A bit about Heat:  The Heat API implements the AWS CloudFormation API.  This API provides a REST interface for creating composite VMs, called Stacks, from template files.  The goal of the software is to accurately launch AWS CloudFormation Stacks on OpenStack.  We will also enable good quality high availability based upon the technologies we created in Pacemaker Cloud, including recovery escalation.

Given that C was a poor choice of implementation language for building REST-based cloud services, Heat is implemented in Python, which is a great fit for REST services.  The Heat API also follows OpenStack design principles.  Our initial design, written after the POC, shows the basics of our architecture, and our quickstart guide can be used with the second-iteration release.

A mailing list is available for developer and user discussion.  We track milestones and issues using GitHub’s issue tracker.  Things are moving fast, so come join our project on GitHub or chat with the devs in #heat on Freenode!

Corosync 2.0.0 released!

A few short weeks after Corosync 1.0.0 was released, the developers huddled to plan the future of Corosync 2.0.0.  The major focus of that meeting was “Corosync as implemented is too complicated”.  We had threads, semaphores, mutexes, an entire protocol, plugins, a bunch of unused services, a backwards compatibility layer, and multiple cryptographic engines.

Going for us, we had a world-class group communication system implementation (if a little complicated), developed by a large community of developers, battle-hardened by thousands of field deployments, and tested by tens of thousands of community members.

As a result of that meeting, we decided to keep the good and throw out the bad, as we did in the transition from openais to corosync.  Gone are threads.  Gone are compatibility layers.  Gone are plugins.  Gone are unsupported encryption engines.  Gone is a bunch of other user-invisible junk that was crudding up the code base.

Shortly after Corosync 2.0.0 development started, Angus Salkeld had the great idea of taking the infrastructure in Corosync (IPC, logging, timers, poll loop, shared memory, etc.) and putting it into a new project called libqb.  The objective of this work was obvious: to create a world-class infrastructure library specifically focused on the needs of cluster developers, with a great built-in make-check test suite.

This brought us even closer to our goal of simplification.  As we pushed the infrastructure out of base Corosync, we could focus more on the protocols and APIs.  You might be surprised to find that implementing the infrastructure took about as much effort as the rest of the system (the APIs and Totem).

All of this herculean effort wouldn’t have been possible without our developer and user community.  I’d especially like to acknowledge Jan Friesse for his leadership in coordinating the upstream release process and driving the upstream feature set to 2.0.0 resolution.  Angus Salkeld was invaluable in his huge libqb effort, which was delivered on time and with great quality.  Finally, I want to thank Fabio Di Nitto for beating various parts of the Corosync code base into submission and for his special role in designing the votequorum API.  There are many other contributors, both developers and testers, whom I won’t mention individually but would also like to thank for their improvements to the code base.

Great job, devs!!  Now it’s up to the users of Corosync to tell us if we delivered on the objective we set out with 18 months ago: making Corosync 2.0 faster, simpler, smaller, and most importantly, higher quality.

The software can be downloaded from Corosync’s website.  Corosync 2.0, as well as the rest of the improved community-developed cluster stack, will show up in distros as they refresh their stacks.

Announcing Pacemaker Cloud 0.6.0 release

I am super pleased to announce the release of Pacemaker Cloud 0.6.0.

Pádraig Brady will be providing a live demonstration of Pacemaker Cloud integrated with OpenStack at FOSDEM.

What is pacemaker cloud?

Pacemaker Cloud is a high-scale, high-availability system for virtual machine and cloud environments.  Pacemaker Cloud uses the techniques of fault detection, fault isolation, recovery, and notification to provide a full high-availability solution tailored to cloud environments.

Pacemaker Cloud combines multiple virtual machines (called assemblies) into one application group (called a deployable).  The deployable is then managed to maintain an active and running state in the face of failures.  Recovery escalation is used to recover from repetitive failures and drive the deployable to a known good working state.

New in this release:

  • OpenStack integration
  • Division of supported infrastructures into separate packaging
  • Ubuntu assembly support
  • WordPress VM + MySQL deployable demonstration
  • Significantly more readable event notification output
  • Add ssh keys to generated assemblies
  • Recovery escalation
  • Bug fixes
  • Performance enhancements

Where to get the software:

The software is available for download on the project’s website.

Adding second monitoring method to Pacemaker Cloud – sshd

Recently, Angus Salkeld and I decided to start working on a second approach to Pacemaker Cloud monitoring.  Today we monitor with Matahari.  We would also like the ability to monitor with OpenSSH’s sshd.  In this model, sshd becomes a second monitoring agent in addition to Matahari.  Since sshd is everywhere, and everyone is comfortable with the SSH security model, we believe it makes a superb alternative monitoring solution.

To help kick off that work, I’ve started a new branch in our git repository, called topic-ssh, where this code will be located.

To summarize the work, we are taking the dped binary and making a second, libssh2-specific binary based on the work in dped.  We will also integrate directly with libdeltacloud as part of this work.  The output of this topic will be the major work in the 0.7.0 release.

We looked at Python as the language for dped, but testing showed that not to be particularly feasible without drastically complicating our operating model.  With our model of running thousands of dpe processes on one system, one dpe per deployable, we would need Python to have a small footprint.  Testing showed that Python consumes 15 times as much memory per dpe instance as a comparable C binary.

We think there are many opportunities for people without a strong C skill set, but with a strong Python skill set, to contribute tremendously to the project in the CPE component.  We plan to rework the CPE process into a Python implementation.

If you want to get involved in the project today, working on the CPE C++ to python rework would be a great place to start!

Release schedule for Corosync Needle (2.0)

Over the last 18 months, the Corosync development community has been hard at work making Corosync Needle (version 2.0.0) a reality.  This release is an evolutionary step for Corosync, adding several community-requested features, removing the troubling threads and plugins, and tidying up the quorum code base.

I would like to point out the diligent work of Jan Friesse (Honza) in tackling the 15 or so items on our feature backlog.  Angus Salkeld has taken the architectural step of moving the infrastructure of Corosync (IPC, logging, and other infrastructure components) into a separate project (http://www.libqb.org).  Finally, I’d like to point out the excellent work of Fabio Di Nitto and his cabal in tackling the quorum code base to make it truly usable for bare metal clusters.

The release schedule is as follows:

Alpha           January 17, 2012     version 1.99.0
Beta            January 31, 2012     version 1.99.1
RC1             February 7, 2012     version 1.99.2
RC2             February 14, 2012    version 1.99.3
RC3             February 20, 2012    version 1.99.4
RC4             February 27, 2012    version 1.99.5
RC5             March 6, 2012        version 1.99.6
Release 2.0.0   March 13, 2012       version 2.0.0