Tech Internals Conf is the largest conference for developers of complex and high-load systems

Operational Excellence in Large-Scale Systems: Ensuring Performance and Stability in High-Load Env

from Salesforce (USA)

About speaker

Principal Storage Engineer, Salesforce

Vamsi is an enterprise infrastructure architect with over 20 years of experience in SAN, NAS, cloud, and distributed storage technologies.

Abstracts

broad

Ensuring performance and stability in large-scale, high-load systems requires more than reactive measures. It demands a proactive approach built on Site Reliability Engineering (SRE) and operational excellence. This session offers key insights into maintaining large-scale systems, focusing on multi-petabyte storage, observability, and automation to prevent downtime and improve system reliability.

Running large-scale systems in production is a balancing act between performance, stability, and operational efficiency. In this session, I will explore the key principles of Site Reliability Engineering (SRE) and DevOps that ensure the smooth operation of high-load systems, focusing on the management of multi-petabyte storage and cloud environments. From managing large data pipelines to automating incident response, this talk will provide insights into how to create reliable systems that minimize downtime and improve performance through observability and automation. Attendees will learn how to implement best practices in monitoring, alerting, and stress testing, ensuring that their systems remain resilient under heavy loads. Real-world examples will highlight the importance of proactive problem-solving and the lessons learned from addressing operational bottlenecks in distributed systems.

The Program Committee has not yet made a decision on this talk.