Plan for Unplanned Work: Game Days with Chaos Engineering

About the Talk and Speaker(s)

Mandi Walls

Developer Advocate

PagerDuty

Jan 24, 2024 4:45 PM (GMT)

Flipkart is the largest e-commerce platform in India. The Flipkart infrastructure follows a hybrid cloud strategy in which our internal cloud platform powers a significant part of our workloads alongside public clouds. This cloud platform runs on our multi-DC infrastructure, which provides virtual machines (VMs) and bare-metal servers (BMs). While a large part of our serverless workloads runs on Kubernetes, large fleets of VMs and BMs power our stateful data platforms and clusters. As part of our drive to achieve better resilience through chaos experiments, we explored several chaos tools. We discovered that most of the tools, while working well with cloud-native workloads, were not suitable for running chaos experiments against stateful workloads running on our servers. This led us to invest in building a Chaos Platform that can run chaos experiments against our server fleets. Our strategy was to rely entirely on existing open-source tools, including a few existing chaos products, and to mix, match, and integrate them to build the features we needed for our chaos drills. The talk will open with an introduction to chaos practices and requirements, and to how we evaluated various open-source products for our chaos needs. We will then move on to our VM chaos solutions and how we brought together different open-source tools to build our own Chaos Platform.

Building a Chaos Platform for Virtual Machines with OpenSource Tools
Chaos on Serverless Computing
Chaos Carnival
Jan 24 - 25, 2024, Virtual