Debugging microservices? Beyond logging and into the woods

Microservices live and breathe the Unix philosophy: they do one thing and they do it well.

The benefits gained from this focused functionality include, but are certainly not limited to, quicker execution times, faster and easier deployment processes, smaller code bases to manage and maintain, and components that are easier to debug.

We know microservices are powerful and we know how to build them, but how do we debug them? What can we do when things don’t go according to plan?

Logging

Trusty old log files are often the first port of call when trouble strikes any system or network. They’ve been around since the beginning of time and they’ve not gone away or been replaced, even in the microservices world.

In most cases logging is handled for you, automatically, out of the box. With Google’s AppEngine you get automatic logs for HTTP requests, standard out, standard error, and everything else you’d expect. It’s the same situation on AWS Lambda too, with logs reaching CloudWatch Logs automatically.

If anything, logging in microservices is extremely simple and handled transparently. By writing to standard output, which is like printing to the console, logs from your microservice are pushed to the provider’s logging platform automatically.
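To make that concrete, here’s a minimal sketch in Python (the service name and fields are purely illustrative): writing one JSON object per line to standard output is all it takes for platforms like AppEngine or Lambda to collect and index your logs.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""
    def format(self, record):
        return json.dumps({
            "severity": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("orders-service")  # hypothetical service name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("order %s created", "1234")  # picked up by the platform automatically
```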

Another option is using a third-party library for your logging needs, many of which now support logging “over the wire”, as opposed to simply writing to a static log file or standard out. This means your microservice can push directly to AWS S3 for archiving, AWS Kinesis for analysis of log streams, an Elasticsearch cluster for indexing, or simply to a black hole if you don’t care (you should!).
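As a sketch of what “over the wire” logging can look like, here’s a hypothetical logging handler that pushes each record to a Kinesis stream using boto3; the stream name, service name, and payload shape are all made up for illustration.

```python
import json
import logging

import boto3  # AWS SDK for Python

class KinesisHandler(logging.Handler):
    """Ship each log record to a Kinesis stream instead of a local file."""
    def __init__(self, stream_name):
        super().__init__()
        self.stream_name = stream_name
        self.client = boto3.client("kinesis")

    def emit(self, record):
        self.client.put_record(
            StreamName=self.stream_name,
            Data=json.dumps({
                "level": record.levelname,
                "message": record.getMessage(),
            }).encode("utf-8"),
            PartitionKey=record.name,
        )

logger = logging.getLogger("payments-service")   # hypothetical service name
logger.addHandler(KinesisHandler("app-logs"))    # "app-logs" is a made-up stream
logger.error("payment declined")
```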

In the event of a failure or problem, you want to be able to replicate issues based on known user input, and this is where logging, tracing, and error reporting come into play as a collective.

Tracing

This is the meat and potatoes of observability when it comes to monitoring and debugging an application written using microservices.

Tracing is like observing a set of transactions on a bank account, except you also learn who made each transaction, how long it took to execute, and why it happened.

Tracing requests in this manner, through to their conclusion, is a powerful way of observing what’s happening in a microservice or an entire system.

On an application level, across everything, you’ll be able to track and observe requests between microservices, databases, caches, and even external HTTP API calls.

On an individual microservice level it might not seem so useful at first, but being able to trace a request down the call stack enables you to determine if the happy path is being taken by your customers.

This means being able to determine which parts of your application are being called the most; which are the slowest and introduce bottlenecks; how the cache is helping to save time by reducing round trips to the database; and also how a customer’s request managed to end up down a not-so-happy path.

Tools like New Relic and AWS X-Ray will give you this kind of visibility. They present the information like a bank statement, with each transaction following the next. All the default information that’s collected about the call is presented in the transaction for you to work with. If that information and metadata isn’t enough, you can add your own to the transaction from within the microservices.
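For example, with the AWS X-Ray SDK for Python you can decorate a function to record a subsegment and attach your own annotations and metadata. This sketch assumes a segment is already open (the SDK’s middleware normally handles that), and the function and key names are illustrative.

```python
from aws_xray_sdk.core import xray_recorder

@xray_recorder.capture("lookup_session")  # records a subsegment for this call
def lookup_session(token):
    # Attach extra context when the default metadata isn't enough.
    # Annotations are indexed and searchable; metadata is free-form.
    subsegment = xray_recorder.current_subsegment()
    subsegment.put_annotation("cache_hit", False)
    subsegment.put_metadata("token_length", len(token))
    return token  # placeholder for the real lookup
```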

Tracing can also be used to monitor behaviour whilst trying to replicate a problem. In the event a customer does have a bad experience, repeating the customer’s journey and tracing the results through the system is an excellent way of finding and squashing bugs.

Tracing solutions come in many forms, such as New Relic, Google’s Stackdriver, and AWS X-Ray, and they’re integrated directly into the code base.

Be warned that integrating tracing solutions into your application does mean tightly coupling whichever solution you select into your microservice’s code. Most solutions require you to write code to implement their feature sets, which means calls to their services sit right alongside your own business logic.
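One way to soften that coupling is to hide the vendor SDK behind a thin interface of your own, so business logic never imports it directly. A sketch (the names here are mine, not from any particular SDK):

```python
from typing import Protocol

class Tracer(Protocol):
    """The only tracing surface business logic is allowed to see."""
    def annotate(self, key: str, value: str) -> None: ...

class XRayTracer:
    """Adapter forwarding annotations to the AWS X-Ray SDK."""
    def annotate(self, key: str, value: str) -> None:
        from aws_xray_sdk.core import xray_recorder
        xray_recorder.put_annotation(key, value)

class NullTracer:
    """Drop-in for tests and local runs; no vendor code in sight."""
    def annotate(self, key: str, value: str) -> None:
        pass

def handle_login(tracer: Tracer, user_id: str) -> None:
    tracer.annotate("user_id", user_id)  # no vendor import here
```

Swapping tracing vendors then means writing one new adapter, not touching every service.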

Error Reporting

Error reporting platforms, such as New Relic, Bugsnag, and Google’s Stackdriver Error Reporting, can be used to capture and alert when an application error is triggered. This differs from logging and tracing in that the output from the microservice is explicitly an error, such as an exception.

The key difference is that errors can be very serious situations, whereas logs and traces are simply forms of observability. When an error occurs, you use logs and traces to find the source. When an exception is thrown, an error reporting framework can be used to categorise, visualise, and alert on the problem.
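As a sketch using Bugsnag’s Python library (the API key is a placeholder, and the function is illustrative): configure the client once, then explicitly notify when an exception is caught.

```python
import bugsnag

# One-time setup; the API key here is a placeholder.
bugsnag.configure(api_key="YOUR-API-KEY", release_stage="production")

def charge_card(order):
    try:
        ...  # business logic that might blow up
    except Exception as exc:
        bugsnag.notify(exc)  # ships the exception, stack trace and all
        raise                # reporting isn't handling; still fail loudly
```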

These systems can be very helpful when you separate them from the logging mechanics listed above – only errors are reported here, reducing the amount you need to dig through.

A good use of error reporting systems is in microservices that don’t produce any logs. They do exist, and rightly so. It’s common to find situations where the log files of a microservice are simply uninteresting and wouldn’t be helpful at all, so instead only metrics and errors are captured.

This means when an alert for a microservice is triggered, it’s going to be an actual error.

Localised Dependencies

Good local development practices enable developers to discover bugs and problems ahead of time. This means working and testing locally, versus constantly pushing to a CI/CD stack. It also means being smart with technology and utilising everything you can to ensure you’re producing solid work.

As you introduce new microservices to your application, it will become clear that some services will have others as a dependency. An example of this might be a login session management service that relies on a caching service to look up session tokens. The latter is a dependency of the former. This kind of dependency is easy to resolve locally whilst developing using existing tools.

Using Docker and Docker Compose enables developers to deploy and update components locally. You can run any version of any component, composing the entire application and using your composition to test local changes for behavioural differences.
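Here’s an illustrative composition for the session/cache example above (the build path, ports, and environment variables are made up):

```yaml
# docker-compose.yml
version: "3"
services:
  sessions:
    build: ./sessions        # the service you're actively changing
    ports:
      - "8080:8080"
    environment:
      CACHE_HOST: cache      # resolved via Compose's internal network
    depends_on:
      - cache
  cache:
    image: redis:6           # the dependency, pinned to a known version
```

Bumping `redis:6` to another tag is all it takes to test your service against a different version of its dependency.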

I’m a big fan of Google AppEngine, which enables developers to run a lightweight, local copy of the AppEngine stack for development needs. This greatly reduces the overall risk of writing buggy code and also prevents having to push code to CI to discover bugs in the real environment.

Combined with all your dependencies running locally, you can replicate production on your laptop and eventually push solid code that’ll (much more likely) pass review.

Conclusion

Overall the development process for microservices is different, but many of the methodologies around testing and debugging are the same. Microservices do make it easier, however. Having smaller components that can be tested in isolation or only alongside what they need to operate, such as other services, means that development iteration is overall a leaner and faster process.

And the services you’ll find in AppEngine, AWS Lambda, Heroku, and many other platforms will enable you to observe and debug problematic applications just as easily, if not more easily, than their monolithic cousins.