The Chaos Engineering Maturity Model: A Roadmap to Operational Resilience

Sayan Mondal

from Harness (India)

About speaker

Senior Software Engineer II at Harness, Maintainer & LFX Mentor of LitmusChaos

Sayan Mondal is a Senior Software Engineer II at Harness, building the company's Chaos Engineering platform and helping it shape the customer experience market.

About speaker's company

Harness is a modern software delivery platform that focuses on automating and simplifying Continuous Integration (CI) and Continuous Delivery (CD) pipelines, with a strong emphasis on reliability, efficiency, and ease of use. Known for its powerful suite of DevOps tools, Harness helps engineering teams streamline deployments, manage costs, and improve overall developer productivity. The platform includes modules for feature flagging, cloud cost management, chaos engineering, and more, making it a comprehensive solution for organizations looking to enhance their software delivery practices with automation and intelligence.

Abstracts

broad

Is your organization effectively measuring and improving resilience over time, or is chaos engineering just a set of experiments on your checklist? As technology stacks grow in complexity, so does the need for a structured approach to chaos engineering that moves beyond isolated tests. How do you ensure that chaos engineering is not just practiced but actually drives continuous resilience improvements across teams and infrastructure?

In this talk, we’ll explore a maturity model framework for chaos engineering that turns ad-hoc chaos experiments into a progressive journey toward organizational resilience. Attendees will learn how to evaluate, track, and level up their chaos engineering practices. This framework is designed to guide organizations from basic resilience tests to advanced fault-tolerant systems, essential for those striving for high reliability in complex environments.


As chaos engineering matures, there is a growing need for a structured approach to measure and advance its impact across organizations. Moving beyond a series of random failure simulations, chaos engineering can, and should, progress through identifiable levels of sophistication, resulting in resilience improvements that align with organizational goals. But how do you measure chaos maturity, and what roadmap should guide your journey?

This session introduces a Chaos Engineering Maturity Model (CEMM), a framework developed to assess and improve chaos engineering practices at every level of the organization. We’ll dive into how this model outlines progressive stages, from initial chaos test adoption to advanced system-wide fault tolerance. Using LitmusChaos as our primary tool, we’ll discuss how to establish measurable benchmarks at each stage, ensuring that chaos engineering is continually driving tangible improvements.

By adopting a structured maturity model, organizations can identify resilience gaps, improve cross-team collaboration, and build a culture of reliability at scale. Whether your chaos engineering journey is just starting or you’re looking to enhance an existing program, this session will provide a comprehensive roadmap for evolving chaos engineering into a core practice.

The Program Committee has not yet taken a decision on this talk

Other talks on this topic

Photo
Pentesting Kubernetes Services in the Cloud

Sergey Chubarov

Independent consultant

specific
Photo
Behind the curtain of PowerShell cmdlets

Sergey Chubarov

Independent consultant

specific
Photo
The Balancing Act of Reliability

Yusuf Aytas

Workday

broad
Photo
DevOps done right: RBAC

Daniel Drack

FullStackS GmbH

specific
Photo
How do we deliver Agile Service Management?

Cristan Massey

Pearson Education

specific
Photo
CRaCing Java Snapshots

Pasha Finkelshteyn

BellSoft

specific
Photo
Empowering Developers: Building an Application Catalogue with Crossplane

Aarno Aukia

VSHN - The DevOps Company

specific
Photo
AI for Next-Gen Security: OpenAI and Copilot for Security Synergy

Sergey Chubarov

Independent consultant

specific
Photo
Autonomous Agents and Their Role in Incident Management

Yoseph Reuveni

Not Affiliated

specific
Photo
Reduce Alert Fatigue with AIOps

Birol Yildiz

ilert GmbH

broad
Photo
Platform Engineering for a Greener Future

Pini Reznik

re:cinq

broad
Photo
Delivering SaaS on-prem with Cloud-native tools

George Hantzaras

MongoDB

specific
Photo
CNCF sandbox project k8up under the hood

Aarno Aukia

VSHN - The DevOps Company

specific
Photo
An Intro to Kubernetes Hardening

Ayesha Kaleem

MBition GmbH

broad
Photo
Knowledge Discovery Efficiency: The FeedHenry Case Study

Benjamin Igna

Stellar Work GmbH

specific
Photo
Actionable Observability

Lesley Cordero

The New York Times

broad
Photo
Securing K8s: back and forth to RBAC Enforce

Roman Levkin

Exness

specific
Photo
Guarding the ML Galaxy: Beyond Accuracy to Privacy and Security

Rishabh Misra

Attentive Mobile Inc

broad
Photo
How to Measure PromQL/MetricsQL Expression Complexity

Roman Khavronenko

VictoriaMetrics

specific
Photo
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow

Aarno Aukia

VSHN - The DevOps Company

specific
Photo
K8s load testing at scale with k6-operator

Ant(on) Weiss

PerfectScale

specific