
Operational Excellence in Large-Scale Systems: Ensuring Performance and Stability in High-Load Env

Vamsi Krishna Rao

from Salesforce (USA)

About speaker

Principal Storage Engineer, Salesforce

Vamsi is a highly experienced enterprise infrastructure architect with over 20 years specializing in SAN, NAS, cloud, and distributed storage technologies.


Abstracts

broad

Ensuring performance and stability in large-scale, high-load systems requires more than reactive measures: it demands a proactive approach grounded in Site Reliability Engineering (SRE) and operational excellence. This session will provide key insights into maintaining large-scale systems, with a focus on multi-petabyte storage, observability, and automation to prevent downtime and improve system reliability.

Running large-scale systems in production is a balancing act between performance, stability, and operational efficiency. In this session, I will explore the key principles of Site Reliability Engineering (SRE) and DevOps that ensure the smooth operation of high-load systems, focusing on the management of multi-petabyte storage and cloud environments. From managing large data pipelines to automating incident response, this talk will provide insights into how to create reliable systems that minimize downtime and improve performance through observability and automation. Attendees will learn how to implement best practices in monitoring, alerting, and stress testing, ensuring that their systems remain resilient under heavy loads. Real-world examples will highlight the importance of proactive problem-solving and the lessons learned from addressing operational bottlenecks in distributed systems.

The Program Committee has not yet taken a decision on this talk

Other talks on this topic

Securing K8s: back and forth to RBAC Enforce

Roman Levkin

Exness

specific
Actionable Observability

Lesley Cordero

The New York Times

broad
Autonomous Agents and Their Role in Incident Management

Yoseph Reuveni

Not Affiliated

specific
Troubleshooting Microservice Architectures

Peter Zaitsev

Percona, Coroot

specific
Empowering Developers: Building an Application Catalogue with Crossplane

Aarno Aukia

VSHN - The DevOps Company

specific
An Intro to Kubernetes Hardening

Ayesha Kaleem

MBition GmbH

broad
CNCF sandbox project k8up under the hood

Aarno Aukia

VSHN - The DevOps Company

specific
Reduce Alert Fatigue with AIOps

Birol Yildiz

ilert GmbH

broad
How to Measure PromQL/MetricsQL Expression Complexity

Roman Khavronenko

VictoriaMetrics

specific
The Balancing Act of Reliability

Yusuf Aytas

Workday

broad
CRaCing Java Snapshots

Pasha Finkelshteyn

BellSoft

specific
Knowledge Discovery Efficiency: The FeedHenry Case Study

Benjamin Igna

Stellar Work GmbH

specific
Zero-instrumentation observability based on eBPF

Peter Zaitsev

Percona, Coroot

specific
Platform Engineering for a Greener Future

Pini Reznik

re:cinq

broad
DevOps done right: RBAC

Daniel Drack

FullStackS GmbH

specific
How do we deliver Agile Service Management?

Cristan Massey

Pearson Education

specific
DevOps for AI: running LLMs in production with Kubernetes and KubeFlow

Aarno Aukia

VSHN - The DevOps Company

specific
Guarding the ML Galaxy: Beyond Accuracy to Privacy and Security

Rishabh Misra

Attentive Mobile Inc

broad