550 regression tests in 4 minutes with Joyent Manta

November 01, 2013 - by TJ Fontaine

To find a performance regression between Node v0.10 and v0.11, I used Joyent Manta instead of git-bisect to find the offending commit(s) among all 550 commits of our development branch in under 4 minutes. Thankfully the result was bimodal and pointed to a single commit. Here's how you can use Manta to parallelize your development process to quickly identify a regression in your project.

Software inevitably has bugs. We're only human. Hopefully you have a test suite with plenty of coverage, so that when you introduce a new bug you'll notice it pretty quickly. Of course, you need to run your test suite regularly to notice when a regression is introduced.

However, even with the most diligent and dutiful of engineers and test suites, there may be times when a bug or regression finds its way into your code base. When this happens there are a few ways you can solve it:

  • You are very aware of all changes coming into the system and can therefore easily identify which change introduced the regression and back it out
  • Use a brute-force, hunt-and-peck strategy and revert commits that may or may not be related until you find the offending commit
  • Use git-bisect to do a binary search between a known good and a known bad commit and find the exact commit where the regression was introduced


It's almost always to your benefit to use git-bisect, as you will only be checking a subset of the commits that might have introduced the regression. As you may remember from your algorithms class, a binary search is O(log n), because each step cuts the number of remaining candidates in half. To work with git-bisect, you tell it the last commit sha you knew was good and the first commit sha you know is bad. Git then checks out a commit halfway between them and asks you whether it is good or bad. If you say it's good, git splits the commits between it and the last bad commit in half; if you say it's bad, git splits the commits between the last good commit and it in half. Either way, git moves the checkout to the new midpoint and asks you again, and so on until you find your offending commit. You can even supply a test script to be executed: if its exit code is 0, git-bisect assumes the commit was good, and if the exit code is non-zero, it assumes the commit was bad.
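As a minimal sketch of that workflow, the following builds a throwaway repository and lets `git bisect run` find a planted regression. The repository, the file names, and the choice of 16 commits with the change landing in commit 11 are all made up for illustration; only the `git bisect` subcommands themselves are real.

```shell
#!/bin/sh
set -e

# Throwaway repository: 16 commits, with a "regression" planted in commit 11.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "bisect-demo@example.com"
git config user.name "Bisect Demo"

i=1
while [ "$i" -le 16 ]; do
  echo "$i" > counter.txt
  if [ "$i" -ge 11 ]; then echo slow > state.txt; else echo ok > state.txt; fi
  git add counter.txt state.txt
  git commit -q -m "commit $i"
  i=$((i + 1))
done

# Give git a known-bad commit (HEAD) and a known-good one (the root commit).
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" > /dev/null

# The test command: exit 0 means "good", non-zero means "bad".
git bisect run grep -q ok state.txt > /dev/null

# When the run finishes, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"
git bisect reset > /dev/null 2>&1
```

In a real project the `grep` would be replaced by whatever script reproduces the regression.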

There are a few problems with using git-bisect, though. You may mistakenly give the wrong answer, or you may find that the build was broken for other reasons at that commit. Either way, you waste time and never find the offending commit or, worse, believe you have found it only to be chasing a ghost.
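For the broken-build case, `git bisect run` treats exit code 125 as "this commit cannot be tested, skip it". A small wrapper can use that to keep build failures from being mistaken for the regression. The sketch below is one way to write such a wrapper; `bisect_step` and its two arguments are hypothetical names, and the `true`/`false` commands stand in for a real build and test.

```shell
#!/bin/sh
# bisect_step BUILD_CMD TEST_CMD
# Hypothetical helper meant to be called from a `git bisect run` script:
#   125 -> the build failed, so the commit cannot be judged (bisect skips it)
#   1   -> the build worked but the test failed (bad commit)
#   0   -> build and test both passed (good commit)
bisect_step() {
  sh -c "$1" > /dev/null 2>&1 || return 125
  sh -c "$2" > /dev/null 2>&1 || return 1
  return 0
}

# Stand-in commands; a real run might pass "make -j8" and a test script.
rc=0; bisect_step "true" "true" || rc=$?
echo "build ok, test ok    -> $rc"
rc=0; bisect_step "false" "true" || rc=$?
echo "build broken         -> $rc"
rc=0; bisect_step "true" "false" || rc=$?
echo "build ok, test fails -> $rc"
```

Skipping helps, but if a long run of commits is unbuildable, bisect can only narrow the answer down to that run.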

Ultimately, though, every step requires prep work, i.e. building your project. In the case of Node, that means recompiling the source tree at each commit git-bisect visits. So even on a beefy workstation, it can still take a lot of time to find your answer.
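To put rough numbers on that, here is the arithmetic for a range of 550 commits. The per-build time is an assumption chosen only to make the sum concrete; it is not a measured Node build time.

```shell
#!/bin/sh
commits=550        # size of the commit range, from this article
build_minutes=5    # ASSUMPTION: placeholder, not a measured Node build time

# Halve the candidate range until one commit is left, counting the steps.
steps=0
remaining=$commits
while [ "$remaining" -gt 1 ]; do
  remaining=$(( (remaining + 1) / 2 ))
  steps=$((steps + 1))
done

serial_minutes=$((steps * build_minutes))
echo "$commits commits: about $steps bisect steps, roughly $serial_minutes minutes of serial builds"
```

The steps cannot overlap, because each one depends on the answer to the previous one, so the builds are strictly serial.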

Enter Joyent Manta. Manta is a parallel compute service that is backed by an object storage service. Compute jobs are expressed with Unix commands and pipeline semantics where input is passed in via stdin and chained to the next phase via stdout. A job may have many phases of which there are two kinds: map and reduce. Parallelization is primarily introduced in map phases.
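If the map and reduce terminology is unfamiliar, the shape can be imitated on a single machine with plain shell. This is only a local analogy using background jobs and a pipe, not Manta's API: the "map" command runs once per input with that input on stdin, and the "reduce" command sees all of the map outputs on its stdin. The file names and contents are invented.

```shell
#!/bin/sh
work=$(mktemp -d)
printf 'a\nb\n'    > "$work/one.txt"
printf 'c\n'       > "$work/two.txt"
printf 'd\ne\nf\n' > "$work/three.txt"

total=$(
  {
    # "map": the same command, once per input object, all running concurrently
    for obj in "$work"/*.txt; do
      wc -l < "$obj" &
    done
    wait
  } |
  # "reduce": a single command that consumes every map output
  awk '{ sum += $1 } END { print sum }'
)
echo "total lines: $total"
rm -rf "$work"
```

Manta applies the same idea across many machines, with stored objects as the inputs.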

Cache your build artifacts

As part of the continuous integration that we do for Node.js, we cache the build artifacts (the results of make install) for every commit on the master branch. Note that Jenkins, when triggered by a push to GitHub, will only build what's changed since the last time a job was run. So if you want per-commit granularity, you'll need to manage that on your own.

I use a cron job that launches this script. It keeps track of which commits have already been scheduled to be built, works out which new commits have come in since then, and schedules builds for those. The commands that start with 'm' are Manta CLI commands (e.g. mput = put a file into Manta):

## We only care about the commits since we branched off v0.10,
## so START_COMMIT must already be set to that branch point.

cd /var/tmp/node
git fetch -q

## Sometimes people force-push, don't let that ruin your day.
git reset -q --hard origin/master

## get the list of commits for this branch and preserve the ordering
git log --first-parent --pretty="%H" $START_COMMIT..HEAD > order
mput -qf order /NodeCore/public/builds/node/order

## sort these so comparing what we've already built to what's outstanding is useful
sort < order > known

## find the commits that have been previously scheduled
mfind --type d --maxdepth=1 /NodeCore/public/builds/node | \
  xargs -I{} sh -c 'basename {}' | sort > built

## we now know the commits that we haven't yet built
comm -13 built known > tobuild


## trigger a jenkins job for each commit
xargs -I{} sh -c 'mmkdir -p /NodeCore/public/builds/node/{} &&
  curl -sS "$JENKINS_URL&GIT_COMMIT={}"' < tobuild

If you don't already have your builds cached in Manta, you could (of course) use Manta itself to build all the commits with something like:

git clone git://github.com/joyent/node
cd node
git log --first-parent --pretty="%H" $START_COMMIT..HEAD > order
mmkdir -p ~~/public/builds/node
## seed one placeholder object per commit to act as the job inputs
xargs -I{} sh -c 'echo | mput ~~/public/builds/node/{}' < order
mfind -t o ~~/public/builds/node | mjob create \
 --init 'git clone git://github.com/joyent/node' \
 -m 'COMMIT=$(basename $MANTA_INPUT_OBJECT);
     cd node &&
     git checkout $COMMIT &&
     ./configure --prefix=/build &&
     make -j8 &&
     make install &&
     mmkdir -p ~~/public/builds/node/$COMMIT &&
     tar cj /build | mput ~~/public/builds/node/$COMMIT/build.tar.bz2'

Parallel Regression Tests

A recent regression we had in Node involved figuring out when the following code snippet started to take longer to run on master than on v0.10.

mput ~~/public/test.js <

In this example, it's just creating 10K vm contexts. So for every commit we have previously built, let's find out how long it takes to run that script.

## collect the phases and assets in a bash array so each one can carry its own comment
job=(
## first phase, ask manta to find all of our builds, lower latency than us doing it
## only grab our builds for ia32 smartos and use that name as a key to the next phase (mcat)
  -r 'mfind /NodeCore/public/builds/node -n build.tar.bz2 |
      grep smartos | grep ia32 | xargs mcat'
## include the test script as an asset
  -s '~~/public/test.js'
## extract build, set path to that node, grab wall time to run script
  -m 'tar xjf ${MANTA_INPUT_FILE}; export PATH=$PWD/build/bin:$PATH;
      ECODE=$(ptime -m ctrun -i core,signal -l child \
         node /assets/$MANTA_USER/public/test.js 2>&1 |
         grep real | awk "{ print \$2 }");
      echo "$(echo ${MANTA_INPUT_FILE} | cut -f7 -d/) ${ECODE}"'
## include the original commit order
  -s /NodeCore/public/builds/node/order
## include a script to sort the results back into commit order
  -s /NodeCore/public/builds/shasort.js
  --init 'npm install lstream'
## shasort reads on stdin and sorts it into the given order
  -r 'node /assets/NodeCore/public/builds/shasort.js /assets/NodeCore/public/builds/node/order'
)
mjob create -o "${job[@]}" < /dev/null

Now, this isn't very scientific. Generally when doing a benchmark (especially in a virtualized environment) you're going to want to run your script multiple times in a row and do some statistical analysis to make sure you're getting significant and accurate results. But just as a first pass, let's see if we can identify any anomalies.
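For a second pass, a minimal sketch of the "run it several times" idea might summarize the samples for each build before comparing builds. The `summarize` function and the five timings below are hypothetical; min, median, and max are just one simple choice of summary, not a full statistical treatment.

```shell
#!/bin/sh
# summarize: read one timing per line on stdin, print min, median and max.
summarize() {
  sort -n | awk '
    { t[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) median = t[(NR + 1) / 2]
      else        median = (t[NR / 2] + t[NR / 2 + 1]) / 2
      print "min=" t[1], "median=" median, "max=" t[NR]
    }'
}

# Hypothetical wall times (seconds) from five runs against a single build.
summary=$(printf '%s\n' 11.7 10.3 12.3 11.6 10.9 | summarize)
echo "$summary"
```

A median is less sensitive than a mean to the occasional slow run on a noisy, shared machine.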

To reiterate, we've run this test script across every build we have (550 commits at the time I ran this), and it completed in a total time of just under 4 minutes.

58e4edaf6855025099d400ccc1ac23291b109a41 11.688058687
82ff891e226ecadde68d000a12e7eb1fd0a17d13 10.348681261
fe176929c2963d1e48d34e8f2cb367d9801395a0 12.325443068
0181fee411e217236c4ec0bf22c61466df5a56b5 11.555078847
7684e0b554c7d7ee007959e250700473f64c9fa6 -
d2d07421cad4a20778bf591e279358dd0442382e -
0693d22f86de01b179343cc568a5609726bef9bb -
c56a96c25cabf40801a800c76b44f08d94ac839b -
8985bb8bfd0c4b9fa8dcf001306f1cf7e6c886b4 -
110a9cd8db515c4d1a9ac5cd8837291da7c6c5ea -
9b3de60d3537df657e75887436a5b1df5ed80c2d -
588040d20d87adc1dced78a3c7243b0a27ae8ec5 -
704fd8f3745527fc080f96e54e5ec1857c505399 -
eec43351c44c0bec31a83e1a28be15e30722936a 7.490435929
f0a05e4bc39beb1a15b34dfe906fed3be37c5ac8 5.234022189
28609d17790215ae1b7c7c59e8157ea92cd7cf2f 8.670257433
71ade1c212365099dccb16ee7a9094261629c35a 5.993378213

That output is significantly abbreviated. The first column is the commit sha1 and the second is the time it took to run the test script. You'll notice that for some of these commits there's a - in the timing field. That means we don't have a build for that commit (because the build was broken).
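The shasort.js script that produces this ordering isn't shown here. As a stand-in, and not the script actually used for this job, the same idea can be sketched in awk: load the unordered results, then walk the commit order and print a - for any commit without a timing. The ids and timings are made up.

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work"

# Made-up ids and timings; "bbb" has no result because its build was broken.
printf '%s\n' aaa bbb ccc > order
printf '%s\n' 'ccc 5.2' 'aaa 11.6' > results

sorted=$(awk '
  # first file (results): remember the timing for each commit
  NR == FNR { time[$1] = $2; next }
  # second file (order): emit one line per commit, in commit order
  {
    if ($1 in time) print $1, time[$1]
    else            print $1, "-"
  }
' results order)
echo "$sorted"
```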

So we can see a series of commits where the build was broken, and on one side of the branch history we see the test script taking 5 to 8 seconds to run, and on the other side 10 to 12 seconds. It's not always this easy, but if you look at the log for commit 704fd8f you can see that this is when we upgraded to V8 3.20. That's pretty damning evidence. And sure enough, if I check out the commits before and after it on my local machine, I can see that it is indeed the offending commit.
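With a longer listing, eyeballing the boundary gets harder. One crude way to point at it, sketched below, is to skip the commits without a timing and report the adjacent pair with the largest difference. The data is the abbreviated listing above with shortened SHAs; the "largest single jump" heuristic is an assumption and would be easily fooled by noisier results.

```shell
#!/bin/sh
# Shortened SHAs and timings from the listing above; "-" marks a broken build.
data='58e4eda 11.688058687
82ff891 10.348681261
fe17692 12.325443068
0181fee 11.555078847
7684e0b -
704fd8f -
eec4335 7.490435929
f0a05e4 5.234022189
28609d1 8.670257433
71ade1c 5.993378213'

jump=$(printf '%s\n' "$data" | awk '
  $2 == "-" { next }     # no timing for this commit, so skip it
  have_prev {
    d = $2 - prev_time
    if (d < 0) d = -d
    if (d > best) { best = d; pair = prev_sha " -> " $1 }
  }
  { prev_sha = $1; prev_time = $2; have_prev = 1 }
  END { print pair }
')
echo "largest jump between: $jump"
```

The regression then lies somewhere between those two commits, a range that includes the commits with broken builds.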

Other uses

One of the fun parts about having Manta at your disposal is that you never know just what use case will pop up next. Going forward, the Node team will be using more of these cached builds for regression testing and benchmarking. That will translate into a better Node experience for everyone.

Tangentially, I've been working on a node-bisect script, which will allow core developers and community members alike to do more traditional bisect work over our commit history without having to build Node for each commit. Say you're on Linux and want to find out which commit in our development branch broke your application. No problem: the script will just download the binaries for the commits being tested, run your test case, and reduce to the answer you need. Stay tuned.

Like many things that are painfully manual in testing, finding a regression is generally simple but time-consuming to execute. Using a parallel, automated process makes sense no matter the size of your project.