Move fast, and test, test, test things

By Danny Bradbury

November 12, 2018

Mark Zuckerberg famously advised people to move fast and break things. It sounds good on paper, until you move so fast that one of your software bugs shares 14 million users’ most private thoughts.

Real pros move fast, typically by using a DevOps pipeline. They also break things, but they spot the breakage before it reaches deployment. That requires close attention to testing. How can DevOps teams build a robust testing environment?

Speed needs testing

Ideally, DevOps teams will go through several phases of maturity. Continuous Integration gets all developers merging their working code to a shared mainline repository several times a day, with an automated process running unit and integration tests on the code and identifying problems that are then fed back to developers.

Then, the next stage is Continuous Delivery. This goes one step further, running acceptance tests on the code before allowing DevOps teams to deploy manually to production.

The ultimate promise of DevOps is Continuous Deployment, which makes this last step automatic. Code runs through a battery of automated tests, and assuming that it passes everything, the system automatically deploys. In theory, this enables companies to constantly update and deploy the code base as frequently as dozens of times each day, introducing new features, fixing bugs, and paying down technical debt. That’s the dream, but it takes serious cojones. Who is brave enough to trust their entire development process so much that they let the pipeline deploy it automatically?
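The difference between the last two stages comes down to who pulls the trigger once the tests are green. Here is a minimal Python sketch of that decision. The names `Mode` and `release_decision` are made up for illustration and do not come from any real CI tool.

```python
from enum import Enum


class Mode(Enum):
    CONTINUOUS_DELIVERY = "delivery"      # a human triggers the final deploy
    CONTINUOUS_DEPLOYMENT = "deployment"  # the pipeline deploys on its own


def release_decision(test_results: dict[str, bool], mode: Mode) -> str:
    """Return what the pipeline should do with a build (hypothetical helper)."""
    failed = sorted(name for name, passed in test_results.items() if not passed)
    if failed:
        # Any failing suite blocks the release in both modes.
        return "rejected: " + ", ".join(failed)
    if mode is Mode.CONTINUOUS_DEPLOYMENT:
        return "deploy automatically"
    return "ready: awaiting manual deploy"


results = {"unit": True, "integration": True, "acceptance": True}
print(release_decision(results, Mode.CONTINUOUS_DELIVERY))
print(release_decision(results, Mode.CONTINUOUS_DEPLOYMENT))
```

The only line that changes between the two modes is the last step, which is exactly why the quality of the tests in front of it matters so much.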

“The faster you go, the more likely you are to make mistakes,” says Stephen Reid. He is head of education at Sparta Global, a training company set up by software testing service provider Testing Circle. The consultancy was having trouble finding professional testers, so it created the training branch to slurp up university grads and turn them into testing ninjas.

“Testing is integral to what you do in DevOps,” Reid says. “Without a strong testing ethos at every stage, you either have to slow everything down by going back to manual testing or you risk throwing rubbish into production.”

Sam White, tech lead at digital transformation consulting firm Red Badger, told us about one well-known SaaS firm that is pulling back from Continuous Deployment, moving to a less frequent release cycle until it can nail its automated DevOps testing properly.

“They are stopping Continuous Delivery of new code soon because they were delivering too fast,” he says. “They had automated testing in their pipeline but it isn’t comprehensive enough to stop some bugs getting through.”

So, how can DevOps teams enhance and automate their testing to get them further down the road to Continuous Deployment, even if they don’t end up making everything automatic?

The testing tools themselves will be the same language-specific ones used in any development process. For unit testing, Ruby coders might use RSpec, while Java shops might use tools like Jtest and JUnit. Python testers might use unittest or nose.

It’s the automation part that changes testing in a DevOps environment, and that begins with a mature pipeline. You’re looking for tools that manage the entire software development flow from development to post-deployment testing. The likes of Jenkins, CircleCI and Travis CI handle this part.

These tools must be good at automating potentially hundreds of thousands of unit tests, and executing them as quickly as possible. Dave Konopka, site reliability engineer at DevOps services firm ReactiveOps, says that building concurrent testing into DevOps pipelines is a useful way of speeding up a process that can take hours or even days to complete.

“You’ll want to run those all at the same time, so that segment happens faster,” he says. “It’s easily defined in a YAML file. Deploy that to GitHub and then Circle knows how to do that.”
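To see why concurrency helps, here is a rough Python sketch rather than a CircleCI configuration. It assumes three made-up suites, each simulated with a short sleep, and runs them at the same time with the standard library's thread pool. A real pipeline would launch separate processes or containers instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_suite(name: str, seconds: float) -> tuple[str, bool]:
    """Stand-in for one test suite; the sleep simulates its run time."""
    time.sleep(seconds)
    return name, True


def run_concurrently(suites: dict[str, float]) -> dict[str, bool]:
    """Start every suite at once and collect pass/fail results by name."""
    with ThreadPoolExecutor(max_workers=len(suites)) as pool:
        futures = [pool.submit(run_suite, name, secs) for name, secs in suites.items()]
        return dict(f.result() for f in futures)


# Hypothetical suites and durations, chosen only for the demonstration.
suites = {"unit": 0.2, "api": 0.2, "ui": 0.2}
start = time.perf_counter()
results = run_concurrently(suites)
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # wall time tracks the slowest suite, not the sum
```

Run one after another, these suites would take about 0.6 seconds. Run together, the segment takes roughly as long as its slowest member.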

One of the big differences lies in whether you want your tests run on your own servers or not. Konopka uses CircleCI because it is a cloud-based system and can run the pipeline and testing system without a dedicated in-house server.

Project gating – ensuring that tests pass before allowing code to move further along the pipeline – is an important part of the Continuous Integration process. Zuul, a tool just adopted as a top-level project by the OpenStack Foundation, also addresses this issue. It, too, allows for concurrent testing, managing the order of tests and co-ordinating testing to manage dependencies across different projects’ pipelines.
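Stripped to its essentials, a gate is a check that must pass before the next stage is allowed to start. The Python sketch below illustrates that idea only. It is not Zuul's API, and the stage names and the `run_gated` helper are invented for the example.

```python
from typing import Callable

# A stage is a name plus a check that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_gated(stages: list[Stage]) -> list[str]:
    """Run stages in order; stop at the first failure so later stages never run."""
    log = []
    for name, check in stages:
        passed = check()
        log.append(f"{name}: {'pass' if passed else 'FAIL'}")
        if not passed:
            break  # the gate closes: nothing downstream is attempted
    return log


pipeline = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: False),  # simulated failure
    ("deploy to staging", lambda: True),   # never reached
]
print(run_gated(pipeline))
```

Because the integration stage fails, the deploy stage never appears in the log. Tools such as Zuul add a lot on top of this, including the cross-project ordering mentioned above.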

What to test

So much for how to run tests with your DevOps pipeline. The other important piece of the puzzle is working out what kind of tests to run. Unit tests – tests for function or class method-level code – are the bedrock of any testing process. They are fast, producing clear results that make root cause analysis easier.
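For readers who have not written one, a unit test at this level might look like the sketch below, which uses Python's built-in unittest module. The function under test, `apply_discount`, is a hypothetical example and not taken from any project mentioned here.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(80.0, 25), 60.0)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(80.0, 150)


# Load and run the test case explicitly so the script reports a result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real project these tests would normally live in their own file and be picked up by `python -m unittest`. When one fails, the test name points straight at the function at fault, which is what makes root cause analysis quick.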

There are higher-level tests such as integration tests (testing the end-to-end functionality of collected components) and user acceptance tests (UAT). Often, testers will use browser automation tools such as Selenium, frequently driving a headless browser, to simulate users accessing a web interface.

“A while back these tools to emulate end users became popular but because they’re so slow they have fallen out of favour,” says Reid. “People had written heavily UAT-based suites, but now we’re finding that you get faster feedback if you switch that and have more unit tests and fewer UAT tests.”

This aligns with Martin Fowler’s comments on the testing pyramid, where he calls high-level tests like these a second line of defence.

We shouldn’t abandon these high-level tests altogether though, warns Red Badger’s White. “There are disagreements in the industry about the balance of testing. A lot of people still have the opinion that the best way to achieve good quality assurance is to have some level of manual delivery in your testing process.”

One example of this is exploratory testing, in which manual testers think outside the box, doing unexpected things with the product to try and find edge cases. Automating that would be tough. One innovative company in the manual testing space, RainforestQA, takes a crowdsourced approach to this and other types of higher-level test. It uses machine learning to work out how many human testers should work on a product and then farms the tests out concurrently, delivering fast results back to the testing team that integrate into their pipeline.

None of the tests mentioned thus far are specific to DevOps, but one class of test lends itself particularly well to DevOps processes that tie production and development together. Reid calls it the operational acceptance test (OAT), and it tests the production environment to ensure that it can support the application under stress. If a server goes down, will another server pick the application up? He cites Netflix’s Chaos Monkey, which goes around breaking VMs at random, as a good example of a tool in this class.
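The question an OAT asks can be shown with a toy model. The Python sketch below is not Chaos Monkey and touches no real infrastructure. It assumes a pretend pool of servers, kills one at random, and checks whether a request can still be served. The `Cluster` class and the server names are invented for the example.

```python
import random


class Cluster:
    """Toy model of a server pool; a real OAT targets real infrastructure."""

    def __init__(self, servers: list[str]):
        self.healthy = set(servers)

    def kill_random(self, rng: random.Random) -> str:
        """Remove one healthy server, chosen at random, and return its name."""
        victim = rng.choice(sorted(self.healthy))
        self.healthy.remove(victim)
        return victim

    def handle_request(self) -> str:
        """Serve from any remaining healthy server, or fail if none are left."""
        if not self.healthy:
            raise RuntimeError("no healthy servers left")
        return f"served by {min(self.healthy)}"


def survives_one_failure(servers: list[str], seed: int = 0) -> bool:
    """Kill one server and report whether the application is still reachable."""
    cluster = Cluster(servers)
    cluster.kill_random(random.Random(seed))
    try:
        cluster.handle_request()
        return True
    except RuntimeError:
        return False


print(survives_one_failure(["web-1", "web-2", "web-3"]))  # redundant pool
print(survives_one_failure(["web-1"]))                    # single point of failure
```

The pool of three survives the loss of a server. The lone server does not, and that is the kind of weakness this class of test is meant to expose before customers find it.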

Monitoring

There’s one last element of testing that no DevOps team should leave out, and that’s monitoring. Some tests are difficult to automate, Reid points out, recalling one job where a team only spotted an error after deployment.

Someone had changed the colour of an ecommerce checkout button to be the same as the page background, making it practically invisible. The team caught it only because its monitoring system included a ‘revenue per hour’ metric, which dropped substantially and got flagged in the pipeline.

An effective monitoring system that tracks KPIs like these can help DevOps teams to spot bugs quickly and then either roll back the release or fix it fast. “There are tools like New Relic which are great for that,” Reid says.
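A KPI check of this kind can be very simple. The Python sketch below assumes hourly revenue figures are already being collected and flags any hour that falls below half the average of the hours before it. The figures, the 50 per cent threshold and the `revenue_alert` name are all illustrative, and this is not how New Relic or any other product works internally.

```python
from statistics import mean


def revenue_alert(hourly_revenue: list[float], drop_threshold: float = 0.5) -> bool:
    """Return True when the latest hour falls below a fraction of the prior average."""
    if len(hourly_revenue) < 2:
        return False  # not enough history to compare against
    *history, latest = hourly_revenue
    baseline = mean(history)
    return latest < baseline * drop_threshold


# Made-up figures: four ordinary hours before a release.
before_release = [1020.0, 980.0, 1005.0, 995.0]
print(revenue_alert(before_release + [990.0]))  # an ordinary hour
print(revenue_alert(before_release + [120.0]))  # the hour after the invisible button ships
```

No functional test would have objected to a button that still worked when clicked. A crude business metric like this one catches the problem anyway.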

Testing has often been a trouble spot for software developers. Many is the team that has left testing until the last minute and then made the junior member of staff do it. In DevOps, that’s no longer an option.

By integrating testing properly into your DevOps pipeline, you can help speed up your development cycle while keeping the number of bugs relatively low. That isn’t the only step needed to produce fast, low-error software development and deployment, though. The other is reining in the hubris and being very careful before you remove all the manual checks.