About speaker
Programming-as-a-Passion, Architecture-as-a-Job
Data Solutions Architect and Tech Lead with a passion for programming and well-architected, highly available, and scalable applications. 12+ years of production experience in IT. Certified Databricks/Azure/AWS Solutions Architect Expert.
About speaker's company
EPAM Systems, Inc. is an American company that specializes in software engineering services, digital platform engineering, and digital product design. Since 1993, EPAM has helped customers digitally transform their businesses through a unique blend of world-class software engineering, design and consulting services. EPAM is a founding member of the MACH Alliance.
Need to continuously ingest data from numerous disparate and/or non-overlapping data sources and then merge them together into one huge knowledge graph to deliver insights to your end users?
Pretty cool, huh? And what about multi-tenancy, mirroring access policies and data provenance? Perhaps, incremental loading of data? Or monitoring the current state of ingestion in a highly-decoupled distributed microservices-based architecture?
In my talk I will tell you our story: it all started with a simple idea of building connectors, and we ended up building fully configurable and massively scalable data ingestion pipelines that deliver disparate pieces of data into a single data lake for later decomposition and digestion in a multi-tenant environment. All this while allowing customers and business analysts to create and configure their own ingestion pipelines in a bespoke, user-friendly pipeline designer, with each pipeline building block being a separate decoupled microservice (think Airflow, AWS Step Functions and Azure Logic Apps). Furthermore, we'll touch on such aspects as choreography vs. orchestration, incremental loading strategies, ingestion of access control policies (ABAC, RBAC, ACLs), parallel data processing, how frameworks can help in implementing cross-cutting concerns, and even briefly talk about the benefits of knowledge graphs.
Building a distributed, highly available system is a huge undertaking that requires a lot of upfront design and consideration.
An example of one such system that will be considered in this talk is an ingestion platform which:
a) allows ingesting data from a set of external data sources
b) enables end users to configure managed data pipelines to ingest data of arbitrary type and shape
c) is microservice-based
d) allows data to be transferred securely and reliably from on-premises environments or another cloud
The discussion consists of two major parts: why we set out to build such a system and how we accomplished it.
The first part revolves around the key business requirements that drove the majority of our decisions. Our clients were primarily from the engineering domain, so the data that we had to ingest can be described as technical requirements and specifications for building complex machinery or equipment (e.g. Part A is contained within Part B, which is contained within Part C, can tolerate temperatures in the range [-30°; +40°], is of size X x Y x Z and of weight N kg).
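To make the shape of such data concrete, here is a minimal sketch of how a part specification with a containment relation might be modelled. The `Part` class, its field names, the units, and the `containment_chain` helper are all assumptions made for illustration; they are not taken from the platform described in the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Part:
    """Hypothetical record for one engineering part (field names are assumed)."""
    name: str
    min_temp_c: float                   # lower bound of the tolerated temperature range
    max_temp_c: float                   # upper bound of the tolerated temperature range
    size: Tuple[float, float, float]    # X x Y x Z, units assumed
    weight_kg: float
    contained_in: Optional[str] = None  # name of the enclosing part, if any


def containment_chain(parts: Dict[str, Part], name: str) -> List[str]:
    """Follow 'is contained within' edges upward from the given part."""
    chain: List[str] = []
    seen = {name}
    current = parts[name].contained_in
    # Stop at the outermost part, or if the data contains a cycle.
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parts[current].contained_in
    return chain
```

Each `contained_in` reference is effectively one edge of a graph, which hints at why a knowledge graph is a natural target for this kind of data.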
What stands out the most is that the system had to be entirely configuration-driven, meaning that end users with very little knowledge of the platform had to be able to configure the behavior of the data pipelines themselves in a user-friendly UI. While this requirement alone significantly complicates the design, it was crucial in our case because we had a number of customers with their own processes, requirements and policies that dictated how their data had to be processed and managed. This led us to the second requirement: extensibility. We had to make sure that the system could be extended to support new use cases and mostly proprietary data sources. And because data is often considered to be one of the most valuable assets of any data-driven company, we had to ensure that we could securely access and transfer the data from one cloud to another.
Now, the second part of the talk is going to shed light on the implementation of these requirements. Having three types of components (connector, operator and uploader) allowed us to build a system that can be extended with new implementations of these components. Implementing these components as relatively small microservices allowed us to develop and maintain them independently in a very scalable and agile fashion. Microservices also allowed us to scale the platform in and out at arbitrary points in time, so that it can process data with unknown churn patterns and is ready to ingest potentially huge volumes of data. These are just a few examples of the architectural decisions that we had to make.
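The three component types can be sketched roughly as follows. This is a simplified in-process illustration, not the platform's actual design: in the talk each component is a separate microservice, whereas here they are plain Python classes, and every name (`Connector`, `Operator`, `Uploader`, `run_pipeline`, and the example implementations) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, Any]  # one ingested item of arbitrary shape


class Connector(Protocol):
    """Reads records from an external data source."""
    def read(self) -> Iterator[Record]: ...


class Operator(Protocol):
    """Transforms a stream of records."""
    def apply(self, records: Iterable[Record]) -> Iterator[Record]: ...


class Uploader(Protocol):
    """Writes records to the target store and returns how many were written."""
    def write(self, records: Iterable[Record]) -> int: ...


@dataclass
class ListConnector:
    """Example connector: serves records from an in-memory list."""
    items: List[Record]

    def read(self) -> Iterator[Record]:
        yield from self.items


@dataclass
class TagTenant:
    """Example operator: stamps every record with a tenant identifier."""
    tenant: str

    def apply(self, records: Iterable[Record]) -> Iterator[Record]:
        for record in records:
            yield {**record, "tenant": self.tenant}


@dataclass
class InMemoryUploader:
    """Example uploader: collects records in a list instead of a data lake."""
    stored: List[Record] = field(default_factory=list)

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.stored)
        self.stored.extend(records)
        return len(self.stored) - before


def run_pipeline(connector: Connector,
                 operators: List[Operator],
                 uploader: Uploader) -> int:
    """Chain connector -> operators -> uploader and return the record count."""
    stream: Iterable[Record] = connector.read()
    for operator in operators:
        stream = operator.apply(stream)
    return uploader.write(stream)
```

Because `run_pipeline` depends only on the three interfaces, a new data source or transformation can be added by writing one more implementation, which mirrors the extensibility argument made above.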
At the end of the talk, I am going to say a bit about what we had to do with all the ingested data and which alternatives we had considered before deciding to build such a system ourselves.
The talk was accepted to the conference program
Tech Internals Conf is the leading conference for developers of complex and highly loaded systems