Thursday, December 2, 2010

Automating EBS Snapshot validation with @fog - Part 2

This is part 2 in a series of posts I'm doing - You can read part 1 here

Getting started

I'm not going to go into too much detail on how to get started with Fog. There's plenty of documentation on the github repo (protip: read the test cases) and Wesley a.k.a @geemus has done some awesome screencasts. I'm going to assume at this point that you've at least got Fog installed, have an AWS account set up and have Fog talking to it. The best way to verify is to create your .fog yaml file, start the fog command line tool and start looking at some of the collections available to you.

For the purpose of this series of posts, I've actually created a small script that you can use to spin up two ec2 instances (m1.small) running CentOS 5.5, create four (4) 5GB EBS volumes and attach them to the first instance. In addition to the fog gem, I also have awesome_print installed and use it in place of prettyprint. This is, of course, optional but you should be aware.

WARNING: The stuff I'm about to show you will cost you money. I tried to stick to minimal resource usage but please be aware you need to clean up after yourself. If, at any time, you feel like you can't follow along with the code or something isn't working - terminate your instances/volumes/resources using the control panel or command-line tools. PLEASE DO NOT JUST SIMPLY RUN THESE SCRIPTS WITHOUT UNDERSTANDING THEM.

The setup script

The full setup script is available as gist on github - https://gist.github.com/724912#file_fog_ebs_demo_setup.rb

Things to note:

  • Change the key_name to a valid key pair you have registered with EC2
  • There's a stopping point halfway down after the EBS volumes are created. You should actually stop there and read the comments.
  • You can run everything inside of an irb session if you like.

The first part of the setup script does some basic work for you - it reads in your fog configuration file (~/.fog) and creates an object you can work with (AWS). As I mentioned earlier, we're creating two servers - hdb and tdb. HDB is the master server - say your production MySQL database. TDB is the box which will be running as the validation of the snapshots.

In the Fog world, there are two big concepts - models and collections. Regardless of cloud provider, there are typically at least two models available - Compute and Storage. Collections are data objects under a given model. For instance in the AWS world, you might have under the Compute model - servers, volumes, snapshots or addresses. One thing that's nice about Fog is that, once you establish your connection to your given cloud, most of your interactions are the same across cloud providers. In the example above, I've created a connection with Amazon using my credentials and have used that Compute connection to create two new servers - hdb and tdb. Notice the options I pass in when I instantiate those servers.

  • image_id
  • key_name

If I wanted to make these boxes bigger, I might also pass in 'flavor_id'. If you're running the above code in an irb session, you might see something like the following when you instantiate those servers: Not all of the fields may be available depending on how long it takes Amazon to spin up the instance. The above shot is after the instance was up and running. For instance, when you first created 'tdb', you'll probably see "state" as pending for quite some time. Fog has a nice helper method for all models call 'wait_for'. In my case I could do:

tdb.wait_for { print "."; ready?}

And it would print dots across the screen until the instance is ready for me to log in. At the end, it will tell you the amount of time you spent waiting. Very handy. You have direct access to all of the attributes above via the instance 'tdb' or 'hdb'. You can use 'tdb.dns_name' to get the dns name for use in other parts of your script for example. In my case, after the server 'hdb' is up and running, I now want to create the four 5GB EBS volumes and attach them to the instance:

I've provided four device names (sdi through sdl) and I'm using the "volumes" collection to create them (AWS.volumes.new). As I mentioned earlier, all of the attributes for 'hdb' and 'tdb' are accessible by name. In this case, I have to create my volumes in the same availability zone as the hdb instance. Since I didn't specify where to create it when I started it, Amazon has graciously chosen 'us-east-1d' for me. As you can see, I can easily access that as 'hdb.availability_zone' and pass it to the volume creation section. I've also specified that the volume should be 5GB in size.

At the point where I've created the volume with '.new' it hasn't actually been created. I want to bind it to a server first so I simply set the volume.server attribute equal to my server object. Then I 'save' it. If I were to log into my running instance, I'd probably see something like this in the 'dmesg' output now:

sdj: unknown partition table

sdk: unknown partition table

sdl: unknown partition table

sdi: unknown partition table

As you can see from the comments in the full file, you should stop at this point and setup the volumes on your instance. In my case, I used mdadm and created a RAID0 array using those four volumes. I then formatted them, made a directory and mounted the md0 device to that directory. If you look, you should now have an additional 20GB of free space mounted on /data. Here I might make this the data directory for mysql (which is the case in our production environment). Let's just pretend you've done all that. I simulated it with a few text files and a quick 1GB dd. We'll consider that the point-in-time that we want to snapshot from. Since there's no actual constant data stream going to the volumes, I can assume for this exercise that we've just locked mysql, flushed everything and frozen the XFS filesystem. Let's make our snapshots. In this case I'm going to be using Fog to do the snapshots but in our real environment we're using the ec2-consistent-snapshot script from Aelastic. First let's take a look at the state of the hdb object:

Notice that the 'block_device_mapping' attribute now consist of an array of hashes. Each hash is a subset of the data about the volume attached to it. If you aren't seeing this, you might have to run 'hdb.reload' to refresh the state of the object. To create our snapshots, we're going to iterate over the block_device_mapping attribute and use the 'snapshots' collection to make those snapshots:

One thing you'll notice is that I'm being fairly explicity here. I could shorthand and chain many of these method calls but for clarity, I'm not.

And now we have 4 snapshots available to us. The process is fairly instant but sometimes it can lag. As always, you should check the status via the .state attribute of an object to verify that it's ready for the next step. Here's a shot of our snapshots right now:

That's the end of Part 2. In the next part, we'll have a full fledged script that does the work of making the snapshots usable on the 'tdb' instance.

No comments: