Solemn's Site
Buggy software since 200X
IPXWrapper testing infrastructure

Posted in programming on 06 Mar 2018 at 21:39 UTC

In 2014 I wrote a fairly comprehensive test suite for IPXWrapper, which tests it end-to-end, from the APIs through to the network traffic they generate and process. It depends on a meticulously configured set of Windows and Linux machines, which I had duplicated using several different versions of Windows.

Eventually bit-rot set in and some of the Windows VMs became unusable for quick testing: they sat installing updates whenever I booted them, broke themselves in odd ways, etc. Also, my workstation doesn't have enough RAM for Chrome and several Windows VMs at the same time. No machine does.

This was not ideal.

I wanted the test environments to run on a dedicated host, using fixed VM base images each time. And I wanted them to be self-contained so they could theoretically be parallelised. And I wanted it to be easy to deploy.

Initially I attempted to use Vagrant to automate provisioning of the test environment, because I use it to deploy various (Linux) test environments for my day job, which it's pretty good at. Sadly, its Windows support is mediocre and WinRM itself is very hit and miss (in my experience), so I gave up and started making my own wheels.

VirtualBox is easy to drive from the command line, and fairly quickly I had a glorified shell script setting up VMs, running the tests and tearing them down.
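
For illustration, here is a rough Python sketch of the kind of VBoxManage invocations involved, limited to the ones visible in the log excerpt further down. It is not the actual ipxtester code: the function names are hypothetical, the teardown commands (controlvm poweroff, unregistervm --delete) are an assumption about what sits behind the "Powering off" and "Deleting" messages, and whatever the log snips between creating and starting a VM is left out.

```python
def vm_create_commands(vm_name, disk_path, os_type):
    """Argv lists for the creation steps that appear in the log excerpt."""
    return [
        # Give the copied disk image a fresh UUID so VirtualBox does not
        # reject it as a duplicate of the master image.
        ["VBoxManage", "internalcommands", "sethduuid", disk_path],
        ["VBoxManage", "createvm", "--register", "--name", vm_name,
         "--ostype", os_type],
        # The steps snipped from the log (disk attachment, networking and
        # so on) would go here; they are deliberately not guessed at.
    ]


def vm_start_command(vm_name):
    """Argv list for booting a VM without a GUI window."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def vm_teardown_commands(vm_name):
    """Assumed teardown: hard power-off, then unregister and delete files."""
    return [
        ["VBoxManage", "controlvm", vm_name, "poweroff"],
        ["VBoxManage", "unregistervm", vm_name, "--delete"],
    ]


def format_command(argv):
    """Render an argv list in the bracketed style used by the log."""
    return "Running " + " ".join("[" + arg + "]" for arg in argv) + "..."
```

Keeping the commands as argv lists rather than shell strings means each one can be passed straight to subprocess.run() without worrying about quoting paths.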

I quickly encountered a race condition, supposedly present in all versions of Windows, that can corrupt a registry key necessary to log in. Once that happens, the system will not recover by itself. At times, the Windows systems would break themselves over the hundreds of SSH logins in a single run of the test suite. Enabling SSH connection multiplexing on the Linux system seemed to make the bug occur very rarely and greatly sped up the tests too.
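
As a minimal sketch of what connection multiplexing looks like, the Python helper below builds an ssh command line using OpenSSH's ControlMaster, ControlPath and ControlPersist options. The helper name, the socket directory, the persist time and the host are all hypothetical; the same options could equally live in an ssh_config file, and the post does not say which approach the test suite uses.

```python
import os


def multiplexed_ssh_argv(host, remote_command, control_dir="~/.ssh",
                         persist_seconds=60):
    """Build an ssh argv that shares one connection between invocations."""
    # %r, %h and %p are expanded by ssh itself to the remote user, host and
    # port, giving each destination its own control socket.
    control_path = os.path.join(os.path.expanduser(control_dir),
                                "cm-%r@%h:%p")
    return [
        "ssh",
        # Reuse an existing master connection, or become the master.
        "-o", "ControlMaster=auto",
        "-o", "ControlPath=" + control_path,
        # Keep the master alive in the background between commands.
        "-o", "ControlPersist=" + str(persist_seconds),
        host,
        remote_command,
    ]
```

With this in place, only the first command to a given host performs a full connection and authentication; later ones are carried over the existing connection.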

The VM images are still all built manually, but now I have a documented process for configuring them and a collection of disk images which won't be changed by running the tests (or by Windows installing updates, since each working copy is short-lived).

The test script evolved to read the build tree as a tarball from standard input, unpack it within a temporary instance of the test environment, run the tests and then clean up, passing along the output and success/failure via exit status. This allows running it from a remote host in a single step, with no separate uploading of code, etc.
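
The core of that flow can be sketched in a few lines of Python. This is a simplified stand-in, not the real script: it unpacks into a local temporary directory rather than into a VM, the function names are hypothetical, and the test command is whatever the caller passes in.

```python
import subprocess
import sys
import tarfile
import tempfile


def run_tests_from_tarball(stream, test_command):
    """Unpack a tar stream into a temp dir, run a command there, clean up.

    Returns the command's exit status so the caller can pass it along.
    """
    with tempfile.TemporaryDirectory(prefix="ipxtest-") as workdir:
        # Mode "r|" reads the archive sequentially, which is required for
        # a non-seekable source such as a pipe on standard input.
        # Note: this trusts the tarball, as the sender is the developer.
        with tarfile.open(fileobj=stream, mode="r|") as archive:
            archive.extractall(workdir)
        # Output is inherited, so test results flow back to the caller.
        result = subprocess.run(test_command, cwd=workdir)
    # The temporary directory is removed when the with block exits.
    return result.returncode


def main():
    """Read the tree from stdin; the test command comes from argv."""
    return run_tests_from_tarball(sys.stdin.buffer, sys.argv[1:])
```

Ending the script with sys.exit(main()) would make the exit status visible to the ssh client on the other end, which is what lets the remote invocation succeed or fail as a single step.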

Something cool I found during all this was that btrfs can create Copy-on-Write (CoW) copies of files. Creating the working (temporary) VM disk images from the master copies using cp --reflink massively sped up creating (and deleting) the VMs.
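
A minimal Python sketch of that copy step follows, assuming GNU cp. The function names are hypothetical. A bare --reflink, as seen in the log, is equivalent to --reflink=always, which fails if the filesystem cannot share the data; --reflink=auto falls back to an ordinary copy instead.

```python
import subprocess


def reflink_copy_argv(master_image, working_image, mode="always"):
    """Build the cp argv for a copy-on-write clone of a disk image."""
    if mode not in ("always", "auto"):
        raise ValueError("mode must be 'always' or 'auto'")
    return ["cp", "--reflink=" + mode, master_image, working_image]


def clone_image(master_image, working_image, mode="always"):
    """Create the working image; raises CalledProcessError if cp fails."""
    subprocess.run(reflink_copy_argv(master_image, working_image, mode),
                   check=True)
```

Using "always" for the test environment makes a misconfigured (non-CoW) storage volume fail loudly instead of silently falling back to slow full copies.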

Below is an excerpt from running ipxtester remotely on a local tree:

$ tar -cf - ./ | ssh ipxtest@glados ./ipxtester test winXPx86
Waiting for 4 resource units...
--- Setting up environment
Running [cp] [--reflink] [/mnt/vmstore/ipxtest/images/ipxtest-director.vdi] [/mnt/vmstore/ipxtest/tmp/ipxtest-1016-director.vdi]...
Running [VBoxManage] [internalcommands] [sethduuid] [/mnt/vmstore/ipxtest/tmp/ipxtest-1016-director.vdi]...
UUID changed to: 99f02521-c9ac-4c18-84d1-3ac4e099bef2
Running [VBoxManage] [createvm] [--register] [--name] [ipxtest-1016-director] [--ostype] [Debian_64]...
<snip>
Running [VBoxManage] [startvm] [ipxtest-1016-dp2] [--type] [headless]...
Waiting for VM "ipxtest-1016-dp2" to power on...
VM "ipxtest-1016-dp2" has been successfully started.
Waiting for VM to boot...
<snip>
--- Running tests
tests/05-addr.t ........ 
ok 1 - addr32_in(00:00:00:00) => addr32_out()...
ok 2 - addr32_in(00:00:00:00) => addr32_string()... 
ok 3 - addr32_in(12:34:56:78) => addr32_out()...
<snip>
ok 27 - Concurrent DirectPlay clients can send a message to all players
1..27
ok
All tests successful.

Test Summary Report
-------------------
tests/30-eth-ipx.t   (Wstat: 0 Tests: 75 Failed: 0)
  TODO passed:   6, 31, 56
tests/30-ip-ipx.t    (Wstat: 0 Tests: 27 Failed: 0)
  TODO passed:   12
Files=9, Tests=455, 453 wallclock secs ( 0.64 usr  0.23 sys +  7.87 cusr 12.08 csys = 20.82 CPU)
Result: PASS
Connection to localhost closed.
--- Destroying environment
Powering off VM ipxtest-1016-director...
Powering off VM ipxtest-1016-main...
Powering off VM ipxtest-1016-dp1...
Powering off VM ipxtest-1016-dp2...
Deleting VM ipxtest-1016-director...
Deleting VM ipxtest-1016-main...
Deleting VM ipxtest-1016-dp1...
Deleting VM ipxtest-1016-dp2...

This also forms part of the Buildkite CI deployment for IPXWrapper, which I will probably write about sometime...

