Welcome to Percona Live Online 2021
Online Open Source Database Conference
Wednesday, May 12 • 15:30 - 16:00
Massive Data Processing in Adobe Using Delta Lake

At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is the complex ingestion of a mix of normalized and denormalized data, with various linkage scenarios powered by a central Identity Linking Graph. This helps power various marketing scenarios that are activated across multiple platforms and channels such as email and advertisements. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake, and share our experiences.

* What are we storing?
* Multi Source - Multi Channel Problem
* Access Pattern to optimize for
* Custom High Performance Query engine
* Data Representation and Nested Schema Evolution
* Performance Trade-offs with Various Formats
* Go over anti-patterns used
* (String FTW)
* Data Manipulation using UDFs
* Writer Worries and How to Wipe them Away
* Gotchas
* Concurrency
* Column size
* Update frequency
* Transaction Management for A Healthy State
* Staging Tables FTW
* Why we can't live without them
* Datalake Replication Lag Tracking
* Instrumentation of the data pipeline gives more confidence to the reader
* Downstream Data Pipelines
* Showcase easy building of incremental versions of applications
* Maintenance Jobs
* Go over essentials of compaction and vacuuming
* Performance Time!
* What scale are we operating at?
* Settings like autoCompact and optimizeWrite
* Timings With and Without Delta
* Cost


Yeshwanth Vijayakumar

Sr. Engineering Manager/Architect, Adobe Systems Inc
I am a Sr. Engineering Manager/Architect on the Unified Profile Team in the Adobe Experience Platform; it’s a PB-scale store with a strong focus on millisecond latencies and analytical abilities, and easily one of Adobe’s most challenging SaaS projects in terms of scale. I am actively...

Wednesday May 12, 2021 15:30 - 16:00 EDT
Room #6