Carnegie Mellon University


April 26, 2022

Meiklejohn and DoorDash Use Fault Injection to Improve Reliability

By Joshua Quicksall

From Amazon to Netflix to eBay, some of the largest and most complex software platforms in the world are possible only because of microservice architecture.

But this architectural choice, often adopted to improve feature delivery at scale, comes at a considerable cost, one that often shows up in a platform's reliability and resilience. When a single request relies on possibly hundreds of disparate services, each of which can fail on its own, the odds of a system-wide outage become a serious concern.

This is the challenge that Christopher Meiklejohn, a Software Engineering Ph.D. student in the Institute for Software Research, set out to address during his 2021 internship with the online food delivery service DoorDash.

With millions of users and one of the most sophisticated microservice-driven platforms in use today, DoorDash was the perfect testing ground for an automated resilience testing tool developed by Meiklejohn and members of the Composable Systems Lab as the subject of his Ph.D. thesis. That tool is called Filibuster.

Sound interesting? Head on over to the DoorDash blog to learn more about Filibuster and Chris’ work.