It turns out that having lots of independent microservices built by independent teams changes the way you need to build and run your system tests. Configuration management becomes crucial when you have so many moving parts to test and deploy.
We’ve actually been through a couple of iterations of our deployment pipeline structure. To start with, we ran all the new system tests against the whole system every time anything changed in any service anywhere.
Figure 1: First build pipeline structure we tried (foo, bar and baz are placeholder names for our actual service names, and of course, in reality, we had more like 20 or so different services).
If you look at Figure 1, you can see the first way we arranged our build pipeline. When anyone pushes to any of the services (foo, bar or baz service) it triggers a new build of ‘update-release-candidates’, which updates the list of the latest versions of all the services. Then we test these versions of the services running together, in the all-system-tests pipeline. If they all pass, the ‘deploy-to-staging’ pipeline becomes available to run (we actually trigger it manually so we don’t disturb other work going on there).
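The fan-in logic of that first structure can be sketched in a few lines of Python. This is a hypothetical model for illustration, not our actual CI configuration; the service names and function names are placeholders.

```python
# Sketch of the Figure 1 pipeline: every push to any service updates one
# shared list of latest versions, and only a fully green all-system-tests
# run unlocks the (manually triggered) deploy-to-staging step.

latest_versions = {}  # service name -> most recently built version


def on_push(service, version):
    """Models 'update-release-candidates': any push anywhere updates
    the shared list and hands a snapshot to all-system-tests."""
    latest_versions[service] = version
    return dict(latest_versions)


def all_system_tests(versions, tests_pass):
    """Run every system test against the latest version of every service.
    A single red test blocks deployment for everyone."""
    return {"versions": versions, "staging_available": tests_pass}


snapshot = on_push("foo-service", "1.4.2")
snapshot = on_push("bar-service", "2.0.1")
result = all_system_tests(snapshot, tests_pass=True)
```

The important property (and, as we found, the weakness) is that every team's snapshot is shared: one failing test anywhere makes `staging_available` false for all services at once.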
“Having lots of independent microservices changes the way you need to build and run your system tests.”
There were several problems with this structure. Firstly, the tests couldn’t keep up with the number of changes being generated by our developers. Quite often a batch of ten or more changes would go into the all-system-tests pipeline together, and it would be hard to work out which one had caused a test breakage. While the tests were broken, we got no new release candidates of the system at all, so everyone was blocked from deploying their changes.
One solution we tried for this problem was to improve our analysis tools. We had a couple of bright young summer students help us to improve our ‘dashboard’ web application to analyse failing tests (it’s free and open source by the way – available here).
When the tests fail, it shows you all the commits that could possibly have caused it, going all the way upstream in the pipeline. It then lets you assign the task of fixing the tests to someone appropriate. This did help quite a bit: it became much easier to work out what was happening, which service had a problem, and who should fix it.
Figure 2: Second pipeline structure we tried. The system tests are now divided up by feature area (“FeatureA”, “FeatureB” are placeholders for the actual feature names, and of course, in reality, we had a lot more than two of them).
We also divided up the system tests into several suites by feature. Mostly this was because the number of microservices was so large that you couldn’t run all of them on the same machine any more. So each featureX-system-tests run will start a subset of the microservices, and run tests against one functional area that can be handled by only those services. This helps us to pinpoint which functional area is broken, and therefore which team should fix it.
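The split can be modelled as a mapping from each feature suite to the subset of services it needs to start. A minimal sketch, with made-up suite and service names standing in for our real ones:

```python
# Sketch of the Figure 2 split: each feature suite declares which
# microservices it depends on, and a run starts only that subset.
# The suite/service names here are illustrative placeholders.

FEATURE_SUITES = {
    "featureA-system-tests": ["foo-service", "bar-service"],
    "featureB-system-tests": ["bar-service", "baz-service"],
}


def services_for(suite):
    """Only the services one feature area needs are started, so a single
    machine can host the run and a red suite points at one area."""
    return FEATURE_SUITES[suite]


def suites_affected_by(service):
    """A failure traced to one service implicates only the suites
    (and therefore teams) that actually use it."""
    return sorted(s for s, deps in FEATURE_SUITES.items() if service in deps)
```

This keeps each test run small enough for one machine and makes a failure point at one functional area rather than at the whole system.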
The trouble was, it didn’t seem to increase the number of release candidates very much. People still broke tests, and even though we could quickly discover who should fix them, failing tests were still blocking our pipeline too often. It becomes easier to see why if I draw the same pipeline diagram again, highlighting which teams owned which parts (Figure 3).
Figure 3: Same diagram as figure 2, but with team ownership added (in reality our teams have much better names than ‘A’ and ‘B’ and we have a lot more of them).
“It’s all about understanding that we live in a microservices architecture now.”
It’s clear from Figure 3 that if Team B make a mistake in the ‘baz-service’ that breaks the ‘featureB-system-tests’ then this will block Team A from getting any new versions of their services into staging.
Independent Teams with shared tests
We had split up our functionality across many services. We had split up our developers into independent feature teams. The tests were formally owned by different teams. It wasn’t enough! The problem was that a mistake by one team in one service would block all the other teams from deploying their services. So we changed to a pipeline structure as in Figure 4.
Figure 4: Latest pipeline structure we’re using.
This change wasn’t as simple as it looks in the diagram. In system tests, because you’re testing whole features, you’ll necessarily be running several microservices. Team A also needs to use the services owned by Team B when they run their featureA-system-tests. Crucially though, they don’t need the very latest version.
We created a new definition of the ‘release candidate’ version of a service. This is the version that has passed all the system tests owned by the team that develops it. In future, we might expand that definition to be more stringent, perhaps even that it is a version that is already released in production, but for the moment, a ‘release candidate’ has passed its own team’s portion of the system tests.
So when Team A is going to run the system tests, they use these ‘release candidate’ versions of all the other teams’ services, together with the development versions of their own. If the tests pass, then these development versions of Team A’s services get promoted to ‘release candidate’ status, and the other teams start using them in their system tests.
Crucially, if Team A introduces a change in one of their services that breaks their system tests, it should only stop Team A from deploying their service. All the other teams keep running with the ‘release candidate’ version.
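The promotion rule behind Figure 4 can be sketched as a small Python model. Again this is an illustrative sketch of the idea, not our real tooling; version strings and names are invented.

```python
# Sketch of the Figure 4 rule: a team tests its own development versions
# together with the other teams' release-candidate versions. A green run
# promotes the dev versions to release candidates; a red run blocks only
# the owning team.

release_candidates = {"baz-service": "3.1.0"}                 # e.g. Team B's RC
team_a_dev = {"foo-service": "1.5.0-dev", "bar-service": "2.1.0-dev"}


def versions_under_test(own_dev, others_rc):
    """Team A's suite runs its own dev versions plus everyone else's
    release candidates (never their untested latest builds)."""
    return {**others_rc, **own_dev}


def promote_on_green(own_dev, rc, tests_pass):
    """Only a green run updates the shared release-candidate list,
    making the new versions visible to the other teams' suites."""
    if tests_pass:
        rc = {**rc, **own_dev}
    return rc
```

The key design choice is that a red run leaves the shared `release_candidates` untouched, so a mistake by one team never removes the known-good versions the other teams are testing against.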
Of course, it can still happen that a release candidate version of Team A’s services can break the tests owned by Team B, and block them, but we think it should happen a lot less often. Team A’s services have at least been shown to pass some system tests. It also ought to be pretty straightforward for Team B to go back to using the previous release candidate version and get their tests passing again. In the meantime, Team A can improve their tests to catch this problem discovered by Team B.
“We’ve landed with a pipeline structure that is more in tune with a microservices architecture.”
Generally, it’s all about understanding that we live in a microservices architecture now, and microservices need to be compatible with both older and newer versions of the other services.
This move to decentralized system tests has been implemented quite recently, and I’m sure there are going to be further tweaks and improvements to this pipeline structure. As I mentioned before, the definition of ‘release candidate’ may be changed to ‘version in production’. We may also find it useful to run all the system tests again before release, against staging. Alternatively, we may discover that it finds no new bugs and isn’t worth it. In any case, I’m confident the change I’ve just described is a useful one. What we have now is much more in tune with our microservices architecture and our feature-team organization.
Conclusions and lessons learned
Compared to when we had a monolith, our pyramid of tests is still the same shape, but each team has a separate slice of it to care for. The initial structure for system tests was actually pretty similar to how it would have looked in the days when we had a monolith to test. We ran all the tests against the latest versions of all the services. This didn’t work too well: one team could break some tests and block all the other teams from deploying.
After a few iterations, we’ve landed with a pipeline structure that is instead more in tune with a microservices architecture. The system tests are divided by team, and each team’s changes are tested together with the release candidate versions of all the other teams’ services. Generally, we feel confident that we’re moving in the right direction, and the automated test strategy we’ve chosen for our microservices architecture helps us achieve our aim to continuously deliver a constantly improving Pagero Online to our customers.