Laborantin

A slave to run all your measurements.

Cheaper than an intern!

Introduction

Laborantin is a framework written in Ruby to quickly define and run long measurements with many parameters. It helps you organize all the results in a neat hierarchy. It stores results in flat files, and as such does not require any database set-up or migration sorcery. An ORM may be supported in the future, but it is not a pressing need.

Paradigms

Laborantin lets you define two main things: environments and scenarii.

A scenario is a measurement that is actually performed. It has parameters that you are interested in varying, and it produces results.

The other paradigm is the Environment, which is more or less what cannot be changed easily in your measurements (e.g. a given hardware part).

On one hand, the Environment holds all the data that does not change much across measurements. On the other hand, each scenario has different parameters, and scenarii should be independent of each other (in the sense that the order in which you perform them does not matter).

Installation

You need to have Ruby and Rubygems installed. Usually, on a UNIX operating system you will issue the following command:

$ sudo gem install laborantin

Then you can simply run

$ labor

to get an information message. If you get a "command not found" error instead, something went wrong when installing Laborantin from the gem.

Workflow tutorial

Laborantin can be used as a simple set of libraries, but its power comes from the 'labor' script. It helps you create measurement environments and scenarii, export the results, scan a directory of measurements that you ran two weeks ago to see what is in it, and much more.

In our example, we will create a measurement set for RTTs from a Linux machine (A) to another machine (B). We will change the Ethernet cable between them by hand. Since a scenario cannot do that automatically, we will have two environments, e10mb and e100mb, one for each cable speed.

We will simply measure the RTT between A and B, from A, for various ping sizes. Thus we will have one scenario with one parameter.

Initialization

Let's create a measurement directory for Laborantin, with the environments and the scenario we need.

$ labor create network_measurements \
	 -e e10mb,e100mb \
	 -s rtt

The output shows the generated files; there are a bunch of them. Some are explained in the generated README file, so you should read it.

You can now change your directory to the created one.

$ cd network_measurements

Editing

Environments

There is not much to edit in the environments, but let's edit environments/e10mb.rb. Once it is opened in your favourite text editor, you will notice that a subclass of Laborantin::Environment has been created. You can change the description by replacing the string. Let's write something like "A is connected to B via a 10Mbps Ethernet cable". We leave the other commented lines untouched.

Now, one thing that will not change much is the IP address of B. Thus you add an attribute to the class and set it at initialization. Don't forget to call super in your initialize method.

describe "A is connected to B via a 10Mbps Ethernet cable"

attr_accessor :target_ip

def initialize
  super
  @target_ip = '192.168.1.2'
end

Do the same with the other environment file, environments/e100mb.rb, adapting the description to the 100Mbps cable.
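The reminder about super is plain Ruby rather than anything specific to Laborantin. The sketch below uses a hypothetical BaseEnvironment class as a stand-in for Laborantin::Environment (its started_at attribute is invented for the illustration) to show what gets lost when initialize does not call super.

```ruby
# Stand-in for Laborantin::Environment, invented for this illustration only.
class BaseEnvironment
  attr_reader :started_at

  def initialize
    @started_at = Time.now # state set up by the parent class
  end
end

class WithSuper < BaseEnvironment
  attr_accessor :target_ip

  def initialize
    super # runs BaseEnvironment#initialize first
    @target_ip = '192.168.1.2'
  end
end

class WithoutSuper < BaseEnvironment
  attr_accessor :target_ip

  def initialize
    @target_ip = '192.168.1.2' # the parent set-up is silently skipped
  end
end

puts WithSuper.new.started_at.nil?    # false
puts WithoutSuper.new.started_at.nil? # true
```

Whatever the real parent class sets up in its own initialize would be missing in the same way, usually without any error at the point of the mistake.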

Scenario

Now let's edit the scenario/rtt.rb file. Like the environment files, it has some generated lines, and you can change the description to something meaningful for the RTT estimation.

describe "An RTT estimation with the regular ping command" 

Let's add a parameter with the parameter method. It needs a name, then the set of values that the parameter will take across the various instances of this Scenario class. You should also describe what you expect to observe when this parameter changes. It is not mandatory, but it is useful when you come back to the same measurements two months later.

parameter(:size) do
  values 10, 50, 100, 500, 1000
  describe "We expect the RTT to increase linearly with the size"
end 
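To get a feeling for how many scenario instances a parameter definition leads to, here is a small standalone sketch. It assumes that one instance is run per combination of parameter values; parameter_sets is a hypothetical helper, not a Laborantin method, and the :count parameter is invented to show how the number of runs grows.

```ruby
# Hypothetical helper, not part of Laborantin: expand a hash of
# parameter name => candidate values into one hash per combination.
def parameter_sets(params)
  names = params.keys
  return [{}] if names.empty?

  first, *rest = params.values
  first.product(*rest).map { |combo| names.zip(combo).to_h }
end

one = parameter_sets({ size: [10, 50, 100, 500, 1000] })
puts one.size # 5

# :count is an invented second parameter, only to show the growth.
two = parameter_sets({ size: [10, 50, 100, 500, 1000], count: [10, 100] })
puts two.size # 10
```

With a single parameter the number of runs equals the number of values; each additional parameter multiplies it, which is worth keeping in mind for long measurements.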

Now you need to define the run method. This method is called by labor to actually perform the measurements. It should yield strings that will be appended to the raw result file. You can also yield the whole output of an external command at once, but it is recommended to yield line by line when the result file is very large (to avoid storing everything in memory). In our case we have fewer than 1000 lines, so it does not matter much.

def run
  cmd = "ping -c 100 -i 0.2 -s #{params[:size]} #{environment.target_ip}"
  log cmd
  ret = %x{#{cmd}}
  yield ret
end 
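The line-by-line variant can be sketched with Ruby's standard IO.popen, which streams the output of a command instead of collecting it in one string as %x{} does. each_output_line is a hypothetical helper, not a Laborantin method, and the echo command merely stands in for ping.

```ruby
# Hypothetical helper, not part of Laborantin: run an external command
# and pass its standard output to the block one line at a time.
def each_output_line(*cmd)
  return enum_for(:each_output_line, *cmd) unless block_given?

  # The array form of IO.popen bypasses the shell, so arguments need no quoting.
  IO.popen(cmd) do |io|
    io.each_line { |line| yield line }
  end
end

each_output_line('echo', 'hello') { |line| puts line }
```

Inside run, the body could then become something like each_output_line('ping', '-c', '100', '-i', '0.2', '-s', params[:size].to_s, environment.target_ip) { |line| yield line }, assuming the block given to run accepts one line per call as described above.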

Finally, the command-line output is not very convenient: we would like to parse it and keep only the RTT value of each ping in a text file. We would also like to plot the CDF later, so we should store the data in a file that Gnuplot understands. Let's compute it here. As with the run method, you should yield, line by line, what has to be appended to the result files.

produces :parsed, :cdf

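# Captures the RTT in ms; the decimal part is optional because ping may print integer times (e.g. "time=123 ms").
PING_REGEXP = /time=(\d+(?:\.\d+)?)/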

def parsed
  raw_result_file do |f|
    f.each_line do |l|
      if l =~ PING_REGEXP
        yield Regexp.last_match(1)
      end
    end
  end
end

def cdf
  yield "X Y"
  product_file(:parsed) do |f|
    values = f.map { |l| l.to_f }
    next if values.empty?        # nothing was parsed, nothing to plot
    min = values.min
    max = values.max
    step = (max - min) / values.size
    step = 1.0 if step.zero?     # all samples equal: avoid an endless loop
    th = min - step
    while th < (max + step)
      cnt = values.count { |v| v <= th }
      yield "#{th} #{(cnt.to_f / values.size) * 100}"
      th += step
    end
  end
end
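To check the parsing and CDF logic without running labor, the same steps can be tried on a few hard-coded lines. The sketch below is standalone Ruby: the sample lines imitate the usual Linux ping output and are made up, not measured, and extract_rtts and empirical_cdf are hypothetical helper names. The CDF computed here is the plain empirical one (one point per distinct sample), a simpler variant than the fixed-step scan above.

```ruby
# Made-up lines in the usual Linux ping format, not real measurements.
SAMPLE = <<~OUT
  PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
  64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.40 ms
  64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.20 ms
  64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.30 ms
  64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.20 ms
OUT

# Accepts both "time=0.40" and "time=123".
RTT_REGEXP = /time=(\d+(?:\.\d+)?)/

# Hypothetical helper: RTT values (in ms) found in ping output.
def extract_rtts(text)
  text.each_line.map { |l| l[RTT_REGEXP, 1] }.compact.map(&:to_f)
end

# Hypothetical helper: empirical CDF as [value, percent of samples <= value].
def empirical_cdf(values)
  sorted = values.sort
  sorted.uniq.map do |v|
    [v, 100.0 * sorted.count { |x| x <= v } / sorted.size]
  end
end

rtts = extract_rtts(SAMPLE)
puts 'X Y'
empirical_cdf(rtts).each { |x, y| puts "#{x} #{y}" }
# X Y
# 0.2 50.0
# 0.3 75.0
# 0.4 100.0
```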

OK, that's enough code. Close your text editor. You can now display the measurement plan with the describe command of the labor script.

$ labor describe
E10mb:
	A is connected to B via a 10Mbps Ethernet cable
E100mb:
	A is connected to B via a 100Mbps Ethernet cable
Rtt:
	An RTT estimation with the regular ping command
	- size: [10, 50, 100, 500, 1000]
		We expect the RTT to increase linearly with the size.

You now understand why it was worth describing all this stuff.

Running the experiments

Let's say you have plugged in the 100Mbps Ethernet cable and now want to run only this environment.

$ labor run -e e100mb

Now grab a coffee and wait for the measurements to complete. You can follow the debug information on the terminal and see that Laborantin iteratively performs every possible set of measurements and produces the result files. All these results are stored in the results/ directory.

Results are stored as follows. There is one directory per environment, named after the environment. Under it, there is one directory per date at which the environment was instantiated. This is useful when you run the measurements several times; each dated directory is more like an instance of an environment. It holds the environment.log file, whose content was also printed in the terminal. Inside it, there is one directory per scenario class and, within that, one timestamped directory per scenario instance. In each scenario instance directory, you will find the parameter configuration in YAML format, the raw results and all the product results.

You can use the "tree" command, if it is installed on your system, to see the directory tree easily.

[...]
|-- results
|   `-- e100mb
|       `-- 2009-Jul-28_12-08-36
|           |-- environment.log
|           `-- rtt
|               |-- 2009-Jul-28_12-08-36
|               |   |-- config.yaml
|               |   |-- result.cdf.txt
|               |   |-- result.parsed.txt
|               |   `-- result.raw
|               |-- 2009-Jul-28_12-08-56
|               |   |-- config.yaml
|               |   |-- result.cdf.txt
|               |   |-- result.parsed.txt
|               |   `-- result.raw
|               |-- 2009-Jul-28_12-09-16
|               |   |-- config.yaml
|               |   |-- result.cdf.txt
|               |   |-- result.parsed.txt
|               |   `-- result.raw
[...]
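Because the layout is regular, the results can also be collected with a few lines of plain Ruby. The sketch below is only an illustration: scenario_runs is a hypothetical helper, not a labor feature, and it relies solely on the directory pattern shown above (results/ENVIRONMENT/DATE/SCENARIO/DATE/config.yaml).

```ruby
# Hypothetical helper, not part of Laborantin: list every scenario run
# found under a results directory laid out as
#   results/<environment>/<date>/<scenario>/<date>/config.yaml
def scenario_runs(results_dir)
  pattern = File.join(results_dir, '*', '*', '*', '*', 'config.yaml')
  Dir.glob(pattern).sort.map do |config|
    parts = File.dirname(config).split(File::SEPARATOR).last(4)
    env, env_date, scenario, run_date = parts
    { environment: env, environment_date: env_date,
      scenario: scenario, run_date: run_date, config: config }
  end
end

# Prints one line per run; prints nothing if ./results does not exist.
scenario_runs('results').each do |run|
  puts "#{run[:environment]} / #{run[:scenario]} / #{run[:run_date]}"
end
```

Each returned hash carries the path of the config.yaml file, so the parameter values of a run can then be loaded with Ruby's standard YAML library if needed.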

You can also get a simple summary of the various environments that were run before with:

$ labor scan

More to come

FTP upload of results, etc.

Planned for the future

I plan to enhance Laborantin a bit by adding some facilities to compare several scenarii, maybe integrating tightly with a database or some statistics modules, etc. I would also like to add "roles" to the environments to allow nice cooperation when more than one computer is involved.

Hacking

You can hack on the development version using the RubyForge Git repository for Laborantin.

git://rubyforge.org/laborantin.git

Contacts

lucas dot dicioccio arobaze frihd dot net