About the speaker
Principal software developer at Yandex Infrastructure.
I've had a passion for programming and computer systems since my teenage years. Now, holding a Master's degree in Computer Science, I have been fortunate throughout my career to amass a breadth of experience.
About the speaker's company
At Yandex, thousands of team members develop hundreds of products, producing tens of thousands of commits and pull requests every day. Building user-friendly infrastructure for developing and operating products at that scale is a serious challenge. We build the systems, services, and tools that Yandex developers rely on.
Our solutions aim to ensure that every product Yandex delivers has ready-to-use infrastructure at every stage. We have our own version control system for storing source code; distributed build and continuous integration systems for C++, Java, Python, and Go, capable of processing hundreds of builds a minute; a distributed task management system; rollout systems; and application monitoring systems. We also develop products that support development processes, resource planning, and much more.
Modern distributed databases scale horizontally with great efficiency, which makes their capacity almost limitless. As a result, a benchmark must be able to run on multiple machines and be very efficient, so that the number of machines needed to saturate the database stays small. This talk focuses on benchmarking high-performance databases, with a particular emphasis on YDB and our implementation of the TPC-C benchmark, the de facto gold standard in the database field.
First, we'll discuss benchmarking strategies from a user's perspective. We'll dive into key details of benchmark implementation that could be useful when you create a custom benchmark to mirror your production scenarios. Throughout our performance journey, we have identified numerous anti-patterns: things you should unequivocally avoid in your benchmark implementations. We'll highlight these "bad" and "ugly" practices with illustrative examples.
Next, we'll briefly discuss YCSB, a popular key-value benchmark, since we believe that strong key-value performance is a prerequisite for robust performance in distributed transactions. Following this, we'll explore the TPC-C benchmark in greater detail, sharing insights gained from our own implementation.
We'll conclude by presenting TPC-C results that compare YDB and CockroachDB with PostgreSQL, to illustrate situations where PostgreSQL might not be enough and when you might want to consider a distributed DBMS instead.