Welcome to Percona Live Online 2021
Online Open Source Database Conference

Wednesday, May 12
 

06:00 EDT

Performance Comparison of MySQL and PostgreSQL Based on Kernel Level Analysis [30Min]
In this talk, the author will present the results of a performance comparison between MySQL and PostgreSQL using well-known benchmarks such as sysbench and TPC-C. He will then analyze the causes of the differences in terms of the kernel design and key algorithms of the two systems, across kernel modules and functionality including transaction management, access methods, and query processing.

Following that, the author will show why the Kunlun distributed DBMS can combine the strengths of MySQL and PostgreSQL and avoid their weaknesses to achieve outstanding performance.

Speakers

David Zhao

database systems expert, zettadb
Zhao Wei (David Zhao) has been working on database kernels throughout his career. He worked on Oracle Berkeley DB and MySQL in two outstanding global teams at Oracle between 2007 and 2015, and he worked at Tencent to upgrade its TDSQL from a table sharding solution to a full-fledged...


Wednesday May 12, 2021 06:00 - 06:30 EDT
Room #1

06:00 EDT

How We Built a Geo-Distributed Database With Low Latency [30min]
In modern cloud architectures, the holy grail is to be able to operate across multiple geographic regions concurrently, and tolerate the failure of either an availability zone or an entire region. This is a challenging problem for databases, which typically requires a tradeoff, such as higher latency for client connections, or a weaker level of consistency. For example, in MySQL it is common to use asynchronous replication in order to keep latency low.

In this talk, Ming Zhang will cover TiDB's approach to geographically distributing data. It includes features such as the ability to schedule data placement to different locations in order to provide low latency, as well as 'stale reads' which can read a local copy of data that may be slightly delayed from the primary copy.

Speakers

Ming Zhang

Research and Development Engineer, PingCAP
Ming Zhang is an infrastructure engineer at PingCAP, responsible for the development of TiDB SQL infrastructure.


Wednesday May 12, 2021 06:00 - 06:30 EDT
Room #2

06:00 EDT

PostgreSQL on Arm: Ecosystem, Optimization & Tuning [30min]
2020 was a year of rapid growth for Arm data center hardware: tech companies such as AWS, Ampere, and Apple released powerful Arm products. Arm is known for a lower total cost of ownership, delivering more TPS for the same cost and thereby generating cost savings. And yet the newly released hardware is also quite powerful.

PostgreSQL has been releasing packages for Arm for quite some time. So how does PostgreSQL perform on Arm? In this session, we will introduce the work we have done upstream in PostgreSQL for Arm and why, and share real-world experience of tuning PostgreSQL on the Arm platform.

However, having only the database kernel available on Arm is not enough. There are still gaps between PoC and production, so we also need to focus on the wider PostgreSQL ecosystem and other aspects. This session will also give an overview of them.

Speakers

Bo Zhao

Senior Software Engineer, Huawei Technologies
Bo Zhao has been actively working in open source communities for over 6 years. He is currently introducing and expanding the general Arm ecosystem in upstream database communities.

Amit Khandekar

Huawei Technologies
Amit Khandekar has been working on PostgreSQL server internals for more than 10 years and is a PostgreSQL community contributor.


Wednesday May 12, 2021 06:00 - 06:30 EDT
Room #3

06:30 EDT

Logical mariadb-dump --system Migration [30min]
Ever had some intricate permission structure you needed to move from one database to a database of a different version? Ever looked at a mysqldump of user tables and mapped up columns to see how a user was actually defined? Ever had a hierarchy of roles to move from MySQL to MariaDB, and back?

Logical dumps of system tables as INSERTs aren't logical enough; that's what CREATE USER, CREATE ROLE, etc. are for. MariaDB added "mariadb-dump --system=all" to allow migrating forwards, backwards and sideways between MySQL/MariaDB versions all those global things that exist in the mysql database.

Let me show you how.

Speakers

Daniel Black

Chief Innovation Officer, MariaDB Foundation
After a reasonable amount of time doing development and IT security work, Daniel landed a DBA Consultant job and loved it. After writing a few too many bug fixes without a client to bill them to, he joined IBM to make MariaDB and MySQL scale on IBM POWER. The love for the community...


Wednesday May 12, 2021 06:30 - 07:00 EDT
Room #2

06:30 EDT

Percona Server for MySQL in the Enterprise
Large organisations have always been hesitant to migrate to open source technologies. Percona Server for MySQL is an open source database solution offering the features organisations need to meet their security, compliance, availability and DR requirements. This talk will go deeper into why using Percona Server can be a benefit for your organisation, and will compare it with the alternatives.

Speakers

Dimitri Vanoverbeke

Senior Solutions Engineer, Percona
As a young padawan, Dimitri was drawn into IT by early exposure to BASIC, DOS, the early Windows era, and early Linux. This made him aware of the numerous pitfalls, challenges and actual fun of working in IT. In his career, Dim0 has worked for emerging technology organisations...


Wednesday May 12, 2021 06:30 - 07:00 EDT
Room #1

06:30 EDT

Building and Scaling a Robust Zero-Code Data Pipeline With Open Source Technologies [30min]
With the rapid onset of the global Covid-19 pandemic in 2020, the USA Centers for Disease Control and Prevention (CDC) quickly implemented a new Covid-19 pipeline to collect testing data from all of the USA’s states and territories, and produce multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka.

We built a similar (but simpler) demonstration pipeline for ingesting, indexing, and visualizing some publicly available tidal data using multiple open source technologies including Apache Kafka, Apache Kafka Connect, Apache Camel Kafka Connectors, Open Distro for Elasticsearch and Kibana, Prometheus and Grafana.

In this talk, we introduce each technology and the pipeline architecture, and walk through the steps, challenges and solutions involved in building an initial integration pipeline to consume USA National Oceanic and Atmospheric Administration (NOAA) tidal data, map and index the data types in Elasticsearch, and add missing data with an ingest pipeline. The goal is to visualize the results with Kibana, where we’ll see the period of the “Lunar” day, and the size and location of some small and large tidal ranges.

But what can go wrong? The initial pipeline only worked briefly, failing when it encountered exceptions. To make the pipeline more robust, we investigated Apache Kafka Connect exception handling, and evaluated the benefits of using Apache Camel Kafka Connectors, and Elasticsearch schema validation.

With a sufficiently robust pipeline in place, it’s time to scale it up. The first step is to select and monitor the most relevant metrics, across multiple technologies. We configured Prometheus to collect the metrics, and Kibana to produce a dashboard. With the monitoring in place we were able to systematically increase the pipeline throughput by increasing Kafka connector tasks, while watching out for potential bottlenecks. We discovered, and fixed, two bottlenecks in the pipeline, proving the value of this approach to pipeline scaling.

We conclude the presentation with lessons learned so far, and some potential future challenges.

Speakers

Paul Brebner

Open Source Technology Evangelist, Instaclustr.com
Since learning to program on a VAX 11/780, Paul has gained extensive R&D and consulting experience in distributed systems, technology innovation, software architecture and engineering, software performance and scalability, grid and cloud computing, and data analytics and machine learning. Paul...


Wednesday May 12, 2021 06:30 - 07:00 EDT
Room #4

07:00 EDT

Pandemic - A Tale of 25x Growth in Three Weeks [30min]
Edmodo is an educational website that takes the ideas of a social network and refines them to make them appropriate for the classroom. Our traditional peak period is back-to-school season, when students (and teachers) realize they have to go back to the classroom and suddenly rush to the site to prepare their online alias for the school year.

Last year was different: in March, quickly and without much warning, people were suddenly sent home from school to do all their tasks completely remotely.

That's how our adventure started, and this is when the site began doubling its traffic daily, depending on how many countries went into lockdown on a given day. Database traffic went up from 200K QPS to 5 million QPS over the course of 3 weeks. We hit several limitations when scaling our databases, and sometimes we had no choice but to make risky decisions, but the site stayed up and running.

Please join me on this retrospective journey.

Speakers

Natarajan Chidhambharam

MySQL DBA, Edmodo
Natarajan Chidhambharam is an infrastructure engineer at Edmodo with a focus on database scalability and reliability. Relational databases and database infrastructure solutions for large-scale websites are his main interests. Edmodo successfully handled 25x database traffic growth during the...

Miklos Szel

Senior MySQL Architect, Edmodo
Miklos Mukka Szel is a Senior DB Architect at Edmodo. With more than 20 years’ experience in system and network administration, he has also worked for Walt Disney International as its main International MySQL DBA. Miklos specializes in MySQL-based high availability solutions, performance...


Wednesday May 12, 2021 07:00 - 07:30 EDT
Room #3

07:00 EDT

Unified Point in Time Recovery in the Cloud [60min]
Meet WAL-G, a disaster recovery tool for PostgreSQL, MySQL, MS SQL, MongoDB and other databases. WAL-G was designed for cloud deployments of PostgreSQL HA clusters, but its approach scaled well not only to petabytes of data and thousands of PG instances, but to various database engines as well.
In this talk, we will present the architecture of point-in-time recovery with WAL-G, along with the commonalities and differences among many OLTP databases with respect to online backup and change data capture.

WAL-G is free and open source, led by a community of developers.

Speakers

Andrey Borodin

Team lead of opensource RDBMS development, Yandex.Cloud
Software engineer, computer scientist, developer at Yandex, Ph.D., associate professor at Ural Federal University, co-founder of Octonica company. Researching data indexing since 2008. Teaching at Yandex School for Data Analysis and UrFU. Interested in backup technologies and data...

Dmitry Smal

Team lead of Managed MySQL and SQL Server Development, Yandex.Cloud


Wednesday May 12, 2021 07:00 - 08:00 EDT
Room #2

07:00 EDT

Next Generation Databases
Over the past twelve years, we've seen a "third revolution" in database systems. The one-size-fits-all RDBMS has given way to an explosion of diverse data management technologies. In this talk, we'll explore these new database technologies, consider their utility in leveraging existing data assets and speculate on how they will evolve to meet tomorrow's data needs.

The relational model dominated for a generation of computer professionals and represents a triumph of software architecture. However, today we are clearly in the midst of the third database revolution, as the demands of an increasingly information-centric economy and global always-on applications have led to the emergence of new database architectures. In this third wave, multiple and diverse database technologies co-operate to meet the disparate business challenges posed by the migration to cloud computing, web-scale applications with social and mobile contexts, the promise of big data analytics, and the emerging challenge presented by the Internet of Things.

In this presentation, we’ll look at the key technologies and imperatives driving the third wave of databases and dive into specific facets of the revolution such as Big Data technologies, NoSQL, NewSQL, and graph technologies. In particular, we'll review important new database systems such as MongoDB, Cassandra, CockroachDB, SnowflakeDB and Neo4J. Finally, we’ll contemplate what might be in store for database management systems of the future.

Speakers

Guy Harrison

CTO, Southbank Software
Guy Harrison is CTO at Southbank Software, a database and blockchain tools company. He is the author of *MongoDB Performance Tuning*, *Next Generation Databases*, *MySQL Stored Procedure Programming* and many other books, articles and presentations on database technology. He writes...


Wednesday May 12, 2021 07:00 - 08:00 EDT
Room #4

07:30 EDT

PostgreSQL-As-A-Service: Comparison of Cloud Providers
Hosting databases in general, and PostgreSQL in particular, as a managed service is becoming more and more common.
The main offerings are Amazon RDS, Google Cloud SQL and Microsoft Azure Postgres.
They all provide a managed environment, handling backups (and optionally high availability) for their customers.
On the flip side, the customer no longer has superuser access to their Postgres instances, which makes some things more difficult.

This talk will present the similarities and differences between the managed Postgres offerings from Amazon RDS, Google Cloud SQL and Microsoft Azure.
It will also try to present a quantitative look at their patching cycle and how long it takes for new major versions to be supported.

Speakers

Michael Banck

Senior Consultant, credativ GmbH
Michael Banck is a senior consultant at credativ GmbH. He joined the company in 2009 and has been a Debian Developer since 2001, besides being active in several other open source projects such as PostgreSQL. As a member of the Database Team in credativ's PostgreSQL Competence Center, he has...


Wednesday May 12, 2021 07:30 - 08:00 EDT
Room #3

08:00 EDT

Projections in ClickHouse
Projections are collections of table columns with a different physical layout, used to speed up queries. They are the main constructs of Vertica. At kuaishou.com we have implemented PROJECTION support for ClickHouse, which boosts our OLAP analytic capabilities by an order of magnitude. This talk will highlight the design considerations, some implementation details, and how the feature is integrated into ClickHouse's design. Examples and a demo of the feature will be shown.

Speakers

Amos Bird

software engineer, kuaishou.com
Amos Bird (郑天祺) is a software engineer at KuaiShou Technologies in China. He graduated from the Institute of Computing Technology, Chinese Academy of Sciences, with a doctorate in Database Systems. He has been an active ClickHouse contributor for more than three years, and...


Wednesday May 12, 2021 08:00 - 08:30 EDT
Room #7

08:00 EDT

Running PostgreSQL on Kubernetes
Running databases in Kubernetes attracts a lot of attention today. Orchestrating PostgreSQL on Kubernetes is by no means a straightforward process. Kubernetes is known as a platform built around the idea that everything can fail at any time. In this talk, we will review approaches that guarantee consistency on failover, automatically promote a new primary, switch over traffic, perform updates, automate backups and simplify monitoring. So let’s take a fair look at the current state of the PostgreSQL toolset for the Kubernetes reality.

Speakers

Mykola Marzhan

Director of Server Engineering, Percona
Mykola is a Kubernetes and cloud lover, and his current goal is to bring databases into the Kubernetes world. Since 2004, most of his career has focused on the development of monitoring, update and deployment systems.


Wednesday May 12, 2021 08:00 - 08:30 EDT
Room #3

08:00 EDT

Revertible, Recoverable Schema Migrations in Vitess
Recent developments in Vitess take online schema migrations to a new level, allowing us to think of DDLs as little more than a transaction, giving both developers and DBAs new super powers and peace of mind.

- Have you ever completed a schema migration only to realize a column should not have been dropped, an index should not have changed?
- Have you ever dropped a table only to find it was a big mistake?
- Have you ever waited a week for a schema migration, only to see it go down the drain due to a failover? Or have you postponed important maintenance work due to a running migration?

Vitess online schema migrations now utilize _VReplication_, a core component in Vitess that empowers Vitess’s live resharding, materialized views, live imports, and more.

In this session we present VReplication and illustrate how it works. We follow up by explaining how online schema changes employ VReplication for both revertible and recoverable lossless schema migrations. We will discuss some of the internal logic, and present a demo of these new super powers.

Vitess is a CNCF open source database clustering system for horizontal scaling of MySQL.

Speakers

Shlomi Noach

Engineer, PlanetScale
Engineer and database geek, works at PlanetScale as a maintainer for open source Vitess. Previously at GitHub. Interested in database infrastructure solutions such as high availability, reliability, enablement, automation and testing. Shlomi is an active MySQL community member, authors...


Wednesday May 12, 2021 08:00 - 08:30 EDT
Room #1

08:00 EDT

openGauss: A Fast Growing Open Source RDBMS Community
openGauss is an enterprise-grade open source relational database with high performance, high security, and high reliability. Since it was open sourced in Q3 2020, the community has grown rapidly, attracting 10k+ users and 500+ contributors.

In this session, we will introduce the basics of openGauss and the openGauss community, including goals, technical details, and roadmaps. We will also compare openGauss with other top RDBMSs to show its pros and cons and the benefits of using openGauss.

Speakers

Zhenyu Zheng

Senior Software Engineer, Huawei Technologies
6+ years of experience in various open source projects and communities; currently focused on promoting and optimizing the Arm platform in various open source communities, including DB, big data, cloud, etc.

Xinyong Xiang

openGauss Community Manager, Huawei Technologies
openGauss Community Manager, openGauss Maintainer. Has been engaged in open source community work, including OpenStack, Kubernetes, SODA, KubeEdge, openEuler, openGauss, etc., since 2015.

Bo Zhao

Senior Software Engineer, Huawei Technologies
Bo Zhao has been actively working in open source communities for over 6 years. He is currently introducing and expanding the general Arm ecosystem in upstream database communities.


Wednesday May 12, 2021 08:00 - 08:30 EDT
Room #4

08:00 EDT

Dr. XtraBackup or: How I Learned to Stop Worrying and Love Backups I
This is the first session of two.
In this session we will discuss the fundamentals of backups and how to perform basic operations with Percona XtraBackup 8.

Intro
- Why is your backup strategy (probably) wrong?
- The Schrödinger Backup.

Percona XtraBackup 8
- The Swiss Army Knife of MySQL Backups.
- The Inconsistent Backup Made Consistent.
- Backup, Prepare and Restore.
- Incremental Backups.
- Compression.

Speakers

Pep Pla

Consultant, Professional Services, Percona
Pep has been working with databases all his life. Born in a small village by the Mediterranean, he currently lives in Barcelona. He loves tech, traveling, good food, music and, all things NASA. He hates talking about himself in the third person and has a particular sense of humor...


Wednesday May 12, 2021 08:00 - 09:00 EDT
Room #5

08:00 EDT

Optimizing and Troubleshooting MySQL with PMM
Two of the most critical and challenging tasks for MySQL DBAs are optimizing MySQL performance and troubleshooting MySQL problems. The databases powering your applications need to be able to handle heavy loads while remaining responsive and stable, so that you can deliver an excellent user experience. DBAs are expected to have plans in place to solve these issues. In this presentation, we will briefly talk about best practices for troubleshooting and optimization, and spend time showing how Percona Monitoring and Management (PMM) approaches typical performance optimization and troubleshooting tasks. We will look into how to spot a bad query that needs an index and inefficient queries that may need to be reworked, as well as how to spot when a system that is not sized correctly for its current load becomes saturated.


Speakers

Peter Zaitsev

CEO & Co-founder, Percona
Peter Zaitsev is CEO and co-founder of Percona. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the...


Wednesday May 12, 2021 08:00 - 09:00 EDT
Room #6

08:00 EDT

What is OpenSearch?
You may have heard of OpenSearch, but what is it exactly and how did it come about? In this session, you’ll learn about the components of OpenSearch, what they do, and what problems they can solve. No prerequisites for this session, but it prepares you for other OpenSearch sessions.

Speakers

Kyle Davis

Senior Developer Advocate, AWS
Kyle Davis is the Senior Developer Advocate with Open Distro for Elasticsearch at AWS. While being a relative newcomer to AWS, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing, and getting his hands dirty in his Edmonton, Alberta-based...


Wednesday May 12, 2021 08:00 - 09:00 EDT
Room #8

08:30 EDT

Practical Database Automation with Ansible
Automation has been a marketing buzzword in the industry for a long time.

But that doesn't make it irrelevant.

One area that needs attention is how automation affects the database.

Should we automate databases? And if so, should we automate all the things? Where can I start? What are the dangers of automation?

This presentation will explore those questions and provide a good starting place for anyone who wants to get going with automation.

Specifically, we will look at Ansible, a popular automation tool. We will get to know Ansible concepts by using database-centric examples.

The examples provided will be based on MySQL, but the concepts can be used in any database environment.

By the end of this presentation, everyone should be ready to automate their databases responsibly.

Speakers

Derek Downey

Founder/Trainer, DistributedDBA
As a technologist excited about open source database systems and the businesses that they power, Derek is enthusiastic about efficiency through automation, operational visibility, and the adoption of cloud technologies. Derek has helped many customers implement automation for...


Wednesday May 12, 2021 08:30 - 09:00 EDT
Room #4

08:30 EDT

MongoDB surviving after unclean shutdowns
How MongoDB recovers from an unclean shutdown: how the recovery process works internally with the WiredTiger journal, and how data is protected throughout the whole process. It may be internal stuff, but I think the way MongoDB and WiredTiger implement the write-ahead log is wonderful, and I love to teach it in a simple way.

Speakers

Alexandre Araujo

Senior Database Engineer, DBAcorp Brazil
Specialist Database Administrator with 20 years of experience in Brazil, working on projects for the main Brazilian fintech and stockbroker companies in the Brazilian financial market.


Wednesday May 12, 2021 08:30 - 09:00 EDT
Room #2

08:30 EDT

Creating MySQL User-Defined Functions in C++ Has Never Been Easier
In this session I will show how to use **C++ UDF wrappers** from Percona Server 8.0.22+ to add new custom functionality to MySQL.
Forget about *func_init()* / *func()* / *func_deinit()* functions, individually defined context structures for passing data between them, manual memory allocations and ugly casts from *void \** to extract function parameter values: now you have nice **C++14** wrappers that do all the dirty work behind the scenes with minimal overhead.

Speakers

Yura Sorokin

Principal Software Engineer, Percona
Yura is a Principal Software Engineer at Percona, mostly working on Percona Server Core. He is the primary developer who implemented "Compressed Columns with Dictionaries" and "SEQUENCE_TABLE()". Before joining in July 2015 he was leading a cloud file service backend dev team which...


Wednesday May 12, 2021 08:30 - 09:00 EDT
Room #1

08:30 EDT

ClickHouse 2021: New Features and Roadmap
In 2021 the ClickHouse community is shipping the features that you probably always dreamed of. We are eliminating previously known limitations of ClickHouse. I will talk about, and demo: replication without ZooKeeper; semistructured data; processing of frequent small INSERTs; support for transactions; window functions and projections; and much more. We're really excited about these features and hope they will make ClickHouse even better for your analytic applications.

Speakers

Alexey Milovidov

Lead ClickHouse Engineer, Yandex
Alexey Milovidov was the original designer of ClickHouse, starting from its inception in 2008. He is an expert on high-performance C++, analytic applications, and SQL databases. Alexey is the lead committer of the ClickHouse open source project on GitHub. He leads the ClickHouse development...


Wednesday May 12, 2021 08:30 - 09:30 EDT
Room #7

09:00 EDT

The 10 Open Source Database Trends That Are Transforming Your Database Infrastructure Forever
Open source software is the de facto standard for many new applications, and this is especially true in the database industry. Currently, MySQL, PostgreSQL, MariaDB, MongoDB, Elastic, and others have shown up in every industry and organization in the world in some form or another. People are no longer choosing a single database for the company; they are letting developers and architects choose the best database for the job.

This has led to an increase in the number of technologies operations teams have to support. Couple that increase in technologies with a growing microservice (or cloud-native) development paradigm, where every service has its own database and where all the data is valuable.

Companies are now faced with dozens of technologies, hundreds or even thousands of individual database instances, and petabytes of data. Managing the complexity of such an environment is changing the way we look at systems and operations.

Let’s talk about the trends and tell you what you need to know about how to manage the new multi-verse of data.

Speakers

Matt Yonkovit

HOSS (Head of Open Source Strategy), Percona
Matt Yonkovit has been in the Open Source Database Community for over 15 years working for MySQL AB, Sun Microsystems, Mattermost, and Percona. Matt has held technical, management, and executive roles serving the open source community. He is currently serving as Percona's...


Wednesday May 12, 2021 09:00 - 09:30 EDT
Room #1

09:00 EDT

A Large Scale MongoDB Migration From MMAPv1 to WiredTiger
The MMAPv1 storage engine was deprecated in MongoDB 4.0 and removed in MongoDB 4.2, which makes WiredTiger the only available storage engine for MongoDB from version 4.2 onwards. Even though WiredTiger became available in MongoDB 3.0, most users delayed the migration between the storage engines for various reasons, such as bugs or the uncertainty of being an early adopter. As a matter of fact, some users delayed the unavoidable storage engine migration until MongoDB 4.0.

In this presentation, we are going to describe a large-scale migration of an MMAPv1 environment to WiredTiger. For the scope of this presentation, we define a large-scale environment as an installation with more than 100 terabytes of data, more than 1000 shards involved, and more than 100 different workloads. We are going to focus mainly on the preparation steps, as we think they are the most important part of this type of migration. We will also analyze the actual migration steps, the rollback procedure, and most importantly the lessons learned.

Speakers

Antonios Giannopoulos

Senior Database Administrator, Rackspace Technology
I am working as Senior NoSQL Database Administrator at Rackspace supporting thousands of MongoDB installations over the past 7 years. I have 18 years experience in databases and system engineering. I really enjoy challenges in sharding and schema design and love migrations from Relational...


Wednesday May 12, 2021 09:00 - 09:30 EDT
Room #2

09:00 EDT

Scaling MySQL @LinkedIn with Vitess
MySQL serves as the datastore for many of the important internal tools at LinkedIn. A typical MySQL cluster at LinkedIn has 1 primary and 2 replicas for read-scaling and High Availability. To scale the reads for all these tools, more replicas are added to the cluster. But, what about write-scaling which still goes to a single primary?

We started looking for an answer to this question about a year and a half ago, when one of the tools ramped up quickly and was struggling with writes. When we decided on sharding as the solution, there was a choice between writing the sharding logic in the application itself or choosing Vitess.

Vitess stood out for us. Although we had to tweak the schema design, there were minimal changes to the application code, as Vitess supports almost all SQL queries as well as connection pooling.

This talk will focus on our journey with Vitess. We will take the attendees through the ‘Why’ of our journey: why we chose Vitess and not any other sharding method, how we migrated the platform in real time, and what tremendous metrics we achieved after the successful migration.

The talk will consist of:
- Introduction
- Why Vitess?
- Brief Introduction to Vitess
- Challenges while moving to Vitess
- What it looks like at the Infra-level
- Key Achievements

Along with talking about our journey with Vitess, we will touch upon what is next for Vitess@LinkedIn.

Key Takeaways
- Vitess as a sharding solution to MySQL
- Insights from our learnings

Speakers

Apoorv Purohit

SRE, LinkedIn
I started my career as an application developer in 2016 handling C++ telephony module and report generation from MySQL. With my growing interest in databases, I switched to become a full time database engineer in 2017 at OneDirect. I was a part of the team that handled the entire...

Karthik Appigatla

Staff SRE, LinkedIn
Karthik Appigatla has been working on various large scale data stores for a decade primarily focused on MySQL. Currently, he has been working for LinkedIn for the last 5 years. Prior to LinkedIn, he worked for Yahoo, Pythian and Percona where he was responsible for helping clients...


Wednesday May 12, 2021 09:00 - 09:30 EDT
Room #6

09:00 EDT

Open Source Database Architectures: Shifting From Capture-First to Query-First
It’s like picking a flavor of ice cream. You have your go-to databases, your favorites, the ones you know and love. But are they always the right choice for every task? And the reality is that you’re going to be working across five or six anyway. So how do you choose? Instead of fitting the workload to the database, you fit the database to the workload. Is your workload mostly reads? Writes? Or a mixture? And what kinds of reads: linear scans or random reads? Do you need a database for transactions or for durability? We’ll walk through the many open source options, project by project, and map workloads to the right database, a choice that can make or break the success of your next project.

Speakers

Rob Dickinson

CTO, Resurface Labs
Co-founder and CTO at Resurface Labs, Rob lives and breathes databases. Years at Intel, Dell, and Quest Software honed his database design skills, and his immersion in open source databases was forged by the need to architect and build a scalable solution to solve for customer escalation...


Wednesday May 12, 2021 09:00 - 10:00 EDT
Room #3

09:00 EDT

Dr. XtraBackup or: How I Learned to Stop Worrying and Love Backups II
This is the second session of two.
In this session we will discuss how to perform advanced operations with Percona XtraBackup 8.

Contents:
- Set up a replica.
- Streaming Backups: Multicast.
- Partial restore. Migration.
- Partial restore. Fix catalog corruption.
- Throttling.
- Point in time recovery.
- Tales from the Encrypt: Backup encryption and backup of encrypted databases.
- Back to the Cloud: store your backups in the cloud.
- XtraBackup for Windows: use a network share to back up your Windows database.

Speakers

Pep Pla

Consultant, Professional Services, Percona
Pep has been working with databases all his life. Born in a small village by the Mediterranean, he currently lives in Barcelona. He loves tech, traveling, good food, music, and all things NASA. He hates talking about himself in the third person and has a particular sense of humor...


Wednesday May 12, 2021 09:00 - 10:00 EDT
Room #5

09:00 EDT

The Essentials of Search
The term ‘search engine’ is badly overloaded and can cause confusion for many folks. Is it a database? Is it a service to find recipes for lasagna? Is it something else entirely? This session will teach you not only how to answer these questions but also what makes search distinct and so darn useful across so many domains.

Speakers

Kyle Davis

Senior Developer Advocate, AWS
Kyle Davis is the Senior Developer Advocate with Open Distro for Elasticsearch at AWS. While a relative newcomer to AWS, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing and getting his hands dirty in his Edmonton, Alberta-based...


Wednesday May 12, 2021 09:00 - 10:00 EDT
Room #8

09:00 EDT

Reducing Costs and Improving Performance With Data Modeling in Postgres
During this talk, we will explain how PostgreSQL organizes data internally, how the Free Space Map (FSM) works, and how we can reorganize the data model to take advantage of data alignment inside blocks, reducing the size of the data on disk and consequently in memory, which can save money and improve performance.
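To make the alignment point concrete, here is a minimal sketch that computes how many bytes a row occupies for a given column order. It assumes a typical 64-bit build (MAXALIGN of 8), fixed-width columns only, and no NULLs (so no null bitmap); the size and alignment table mirrors PostgreSQL's `typlen`/`typalign` values for these types, but the helper functions are illustrative and not part of PostgreSQL.

```python
# (size in bytes, alignment in bytes) for some fixed-width PostgreSQL types,
# matching typlen / typalign ('c'=1, 's'=2, 'i'=4, 'd'=8) on a 64-bit build.
TYPES = {
    "boolean": (1, 1),
    "smallint": (2, 2),
    "integer": (4, 4),
    "bigint": (8, 8),
    "timestamp": (8, 8),
}

HEADER = 24    # 23-byte tuple header padded to 24 (assumes no NULL bitmap)
MAXALIGN = 8   # assumes a typical 64-bit build


def align_up(offset: int, align: int) -> int:
    """Round offset up to the next multiple of align."""
    return (offset + align - 1) // align * align


def data_size(columns):
    """Bytes in the data area, including padding before aligned columns."""
    offset = 0
    for col in columns:
        size, align = TYPES[col]
        offset = align_up(offset, align) + size
    return offset


def tuple_size(columns):
    """Header plus data, padded to MAXALIGN as the tuple is stored."""
    return align_up(HEADER + data_size(columns), MAXALIGN)


careless = ["boolean", "bigint", "smallint", "timestamp", "integer"]
# Widest alignment first leaves no gaps between these columns.
reordered = sorted(careless, key=lambda c: TYPES[c][1], reverse=True)

print(data_size(careless), tuple_size(careless))    # 36 64
print(data_size(reordered), tuple_size(reordered))  # 23 48
```

Under these assumptions, the same five columns cost 16 fewer bytes per row simply by reordering them; the 4-byte line pointer each row also needs is not counted, and variable-length types such as `text` or `numeric` are left out of this sketch.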

Speakers

Charly Batista

Senior Support Engineer, Percona
A Brazilian living in China... Charly is passionate about new cultures, their languages and traditions. Charly has been working with databases and development for more than 12 years and has participated in small and large projects in Brazil, the US, China and other countries.


Wednesday May 12, 2021 09:00 - 10:00 EDT
Room #4

09:30 EDT

MongoDB : To Shard or Not To Shard?
Sharding is one of the most difficult aspects of MongoDB to get right. Shard too early and your costs go up with little gain; shard too late and you and your customers will be feeling the pain. This talk aims to help you decide when to shard, recognize when sharding may not be the right thing for your database, and understand what to consider when you move forward with sharding your MongoDB database.

Speakers

Mike Grayson

MongoDB Database Engineer, Percona
Mike Grayson is a MongoDB Database Engineer at Percona, the unbiased open source database experts. Mike has been involved in many aspects of the MongoDB community since he started using the database in 2014. Heading the Western NY MongoDB User Group, blogging and being involved in...


Wednesday May 12, 2021 09:30 - 10:00 EDT
Room #2

09:30 EDT

The Lost Art of Database Design
The scalability of your application and your database is only as good as the database design you put behind it. Designing your schema and database structures and planning for the future need to happen early and continue to evolve over time. In today's rapid-paced development cycles, database design is often overlooked or even dismissed entirely. Database vendors and marketing teams tout "Schemaless Designs", database as a service, and new tech that makes caring about databases a thing of the past. I will explain why database design is as important as ever, and I will give you the 8 things you need to design in every application, regardless of which database or service you use.

Speakers

Matt Yonkovit

HOSS (Head of Open Source Strategy), Percona
Matt Yonkovit has been in the Open Source Database Community for over 15 years, working for MySQL AB, Sun Microsystems, Mattermost, and Percona. Matt has held technical, management, and executive roles serving the open source community. He is currently serving as Percona's...


Wednesday May 12, 2021 09:30 - 10:00 EDT
Room #6

09:30 EDT

ClickHouse Developer Tutorial, Part 1 - Intro to ClickHouse
Heard about ClickHouse and been itching to try it out? This talk is for you! It's a tutorial to get new ClickHouse developers up and running quickly. We'll begin by summarizing practical differences between ClickHouse and row stores like MySQL or PostgreSQL. Next, we'll show how to install ClickHouse and connect to it with popular client tools. We'll then teach the basics of ClickHouse SQL, focusing on commands to build reports and dashboards. Continue from this talk to the Tutorial Lab for some real ClickHouse exercises and Q&A with our experts.

Speakers

Robert Hodges

CEO, Altinity
Robert Hodges has worked on database systems since 1983. During that time he has loved, worked on, and occasionally despised 20 different DBMS types. His technical interests include data, distributed systems, virtualization technology, and security. He is currently CEO of Altinity, which...

Alexander Zaitsev

CTO, Altinity
Alexander is CTO and a founder of Altinity, which operates and supports ClickHouse for enterprises. After a career building large analytic apps on Vertica and ClickHouse, he turned to making ClickHouse itself work better. Alexander has helped over one hundred companies...


Wednesday May 12, 2021 09:30 - 10:30 EDT
Room #7

09:30 EDT

MySQL Performance for DevOps
MySQL performance can be improved by tuning queries, server options, and hardware. Traditionally, this was the responsibility of three different roles: developers, DBAs, and system administrators. Now DevOps handles all of them. But there is a gap: knowledge that MySQL DBAs gained after years of focus on a single product is hard to acquire when you focus on more than one. This is why I am doing this session. I will show a minimal, but most effective, set of options that will improve MySQL performance. For illustration, I will use real user stories from my Support experience and the Percona Kubernetes Operator for PXC.

Speakers

Sveta Smirnova

Principal Support Escalation Specialist, Percona
Sveta Smirnova is a MySQL Support Engineer with over 10 years of experience. She currently works at Percona. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns that can solve typical issues quicker, teaching others how to deal with...


Wednesday May 12, 2021 09:30 - 10:30 EDT
Room #1

10:00 EDT

Backup, DR, and Migration of Data-Rich Applications in Kubernetes

Managing application state in Kubernetes requires handling not only the persistent data requirements of the application, but also the associated Kubernetes objects and declarative configuration specified by application developers. This expands the boundaries of data protection beyond the traditional application sense. In this session, we will discuss and demo how NetApp Astra simplifies application data lifecycle management and makes it consistent end to end for modern applications running on Kubernetes clusters.

Speakers

Diane Patton

Technical Marketing Engineer, NetApp
Diane is a Technical Marketing Engineer with NetApp Cloud Services supporting Astra, application-aware data management for Kubernetes. She works with product management, marketing, and development to evangelize, support, and help drive new technologies and features into Astra. Diane...

Rajeev Chawla

Sr. Director, Product Management and Strategic Partnerships, NetApp, Inc.
Rajeev brings over twenty years of experience as an entrepreneur and product and engineering leader in the areas of hybrid cloud software, system software, storage and security. Prior to NetApp, Rajeev was at VMware, where he led the integration of CloudVelox application migration technology...


Wednesday May 12, 2021 10:00 - 10:30 EDT
Room #4

10:00 EDT

Percona Server for MongoDB as an Alternative for MongoDB Enterprise
In this talk we will evaluate MongoDB Enterprise Advanced and Percona Server for MongoDB side by side to better inform the decision-making process. Percona Server for MongoDB features enterprise-grade functionality and runs with tools like Percona Monitoring and Management to monitor MongoDB. What's more, it is 100% free and open source. We will go through the features of each database in side-by-side comparisons.

Speakers

Barrett Chambers

Senior Solutions Engineer, Percona
Barrett is a Senior Solutions Engineer at Percona focused on assisting Percona's customers and the community in adopting open source software solutions to meet enterprise needs. Barrett has been at Percona for over 4 years and is well-versed in technology solutions in the MySQL, MongoDB...


Wednesday May 12, 2021 10:00 - 10:30 EDT
Room #2

10:00 EDT

Flexible Data Transfer, Uldra-Replicator
Uldra-Replicator is a binlog parser that was originally built to do stream processing while reading data, in order to solve complex requirements.
In this session, I'll explain how to transfer data flexibly using the binary log. Unlike existing CDC tools, it can handle various requirements on a per-DB-event basis, and I will explain this case by case. I want to share ideas that can efficiently handle flowing data and solve complex requirements in a fun way.

The following subjects will be covered:

• Binary log description and structure
• Data transfer technique and limitation
• Uldra-Replicator concepts
- Easy operation
- Performance
- Consistency
- Transaction
• Use case
- Data transfer to Shard DB in real time
- Merge or separate data for delivery
- Change contents while passing data

I'll open-source the code I personally implemented as a Git project and show test results based on it.

Speakers

Dongchan Sung

MySQL DBA, KakaoBank
Dongchan is a DBA at KakaoBank, the Korean internet bank. He is very interested in efficient data processing and automation. His ultimate goal is building an unbreakable data system. He is interested in all the elements (monitoring, automation) of a stable system configuration.


Wednesday May 12, 2021 10:00 - 10:30 EDT
Room #5

10:00 EDT

Comparing Highly Available Solutions With Percona XtraDB Cluster and Percona Server With Group Replication.
Percona XtraDB Cluster (PXC) is currently the most popular solution for HA in the MySQL ecosystem, and Galera-based solutions such as PXC have been the only viable option when looking for a high grade of HA using synchronous replication.

But Oracle has worked intensively on making Group Replication more solid and easier to use.

It is time to determine whether Group Replication and related solutions, like InnoDB Cluster, can compete with or even replace solutions based on Galera.

This presentation will focus on comparing the two solutions and how they behave when handling basic HA problems.

Attendees will be able to get a clearer understanding of which solutions will serve them better, and in which cases.

Speakers

Marco Tusa

MySQL Tech Lead, Percona
Marco Tusa has had his own international practice for the past twenty-eight years. His experience and expertise span a wide variety of information technology and information management fields, covering research, development, analysis, quality control, project management and team management...


Wednesday May 12, 2021 10:00 - 11:00 EDT
Room #1

10:00 EDT

Securing OpenSearch
OpenSearch has a strong set of security features, but not everyone takes advantage of them. This session will provide an overview of the features as well as show ways to apply these features to real-world problems. The session will include demos of data masking, authentication, and authorization, as well as index-, document-, and field-level security.

Speakers

Kyle Davis

Senior Developer Advocate, AWS
Kyle Davis is the Senior Developer Advocate with Open Distro for Elasticsearch at AWS. While a relative newcomer to AWS, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing and getting his hands dirty in his Edmonton, Alberta-based...


Wednesday May 12, 2021 10:00 - 11:00 EDT
Room #8

10:00 EDT

Power Use of Indexes in PostgreSQL - A User Perspective.
There have been many presentations about the different indexes in PostgreSQL (B-Tree, Hash, GIN, GiST, etc.), especially from the PostgreSQL architecture perspective.

But these talks have always lacked details, from the user's perspective, on selecting indexes.

It is common to see architects and developers fail to select the right type of index and use it the way it should be used. An overview of all index types alone won't help much in decision-making. In this talk, I will also cover the following points:

1. Indexing when partitioning is not an option.
2. Inverted indexes and their usefulness in the real world.
3. Tips and techniques for efficient index usage.
4. Why index usage monitoring is important and how to do it.

This talk leans toward practical examples and demonstrations, which are intended to guide users toward the right selection of indexes and better index usage.
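As background for the inverted-index point, the idea behind PostgreSQL's GIN indexes can be sketched in a few lines: map each element (a word, an array element, a JSON key) to the set of rows that contain it, so a containment query intersects small posting sets instead of scanning every row. This toy model is illustrative only; it does not reflect GIN's actual on-disk structure, and the `InvertedIndex` class is hypothetical. The query at the end is roughly what `tags @> ARRAY['postgres', 'index']` asks of a GIN-indexed array column.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: element -> set of row ids containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, row_id, elements):
        for element in set(elements):
            self.postings[element].add(row_id)

    def contains_all(self, elements):
        """Row ids containing every queried element (empty query -> empty set here)."""
        sets = [self.postings.get(e, set()) for e in elements]
        if not sets:
            return set()
        # Intersect starting from the smallest posting set.
        sets.sort(key=len)
        result = set(sets[0])
        for s in sets[1:]:
            result &= s
        return result


rows = {
    1: ["postgres", "index", "gin"],
    2: ["postgres", "btree"],
    3: ["mysql", "index"],
}
idx = InvertedIndex()
for row_id, tags in rows.items():
    idx.add(row_id, tags)

print(idx.contains_all(["postgres", "index"]))  # {1}
```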

Speakers

Sergey Kuzmichev

Support Engineer, Percona
Sergey is a support engineer at Percona. Interested in all things databases, he's currently working mainly with MySQL and PostgreSQL. He started his career working as an Oracle DBA, later moving to a DevOps engineer role supporting a Java-based trading platform running on PostgreSQL...

Jobin Augustine

PostgreSQL Escalation Specialist, Percona
Jobin Augustine is a PostgreSQL expert and Open Source advocate with more than 19 years of working experience as a consultant, architect, administrator, writer, and trainer in PostgreSQL, Oracle and other database technologies. He has always been an active participant in the Open...


Wednesday May 12, 2021 10:00 - 11:00 EDT
Room #3

10:30 EDT

ClickHouse Developer Tutorial, Part 2 - Lab Exercises
This talk consists of live lab exercises for the ClickHouse Developer Tutorial, Part 1. Please attend that talk and then join us for fun using ClickHouse. We have some puzzles for you to try that will test your ClickHouse knowledge. You can run all queries straight from your web browser, so there's no preparation required. Join us for the fun!

Speakers

Robert Hodges

CEO, Altinity
Robert Hodges has worked on database systems since 1983. During that time he has loved, worked on, and occasionally despised 20 different DBMS types. His technical interests include data, distributed systems, virtualization technology, and security. He is currently CEO of Altinity, which...

Alexander Zaitsev

CTO, Altinity
Alexander is CTO and a founder of Altinity, which operates and supports ClickHouse for enterprises. After a career building large analytic apps on Vertica and ClickHouse, he turned to making ClickHouse itself work better. Alexander has helped over one hundred companies...


Wednesday May 12, 2021 10:30 - 11:00 EDT
Room #7

10:30 EDT

Hyperscaling Workloads With Amazon RDS MySQL and Amazon Aurora MySQL
When the right factors align, businesses can grow at a sudden, unanticipated and exponential rate. That's a good thing, but an unexpected and dramatic spike in users and transactions on your MySQL database can strain resources, potentially cause database failure, and, as a result, lead to a bad user or customer experience. In this session, learn strategies and best practices for quickly growing your MySQL deployment to ensure high performance and an exceptional user experience while maintaining ACID compliance and rock-solid security.

Speakers

Sai Kondapalli

Database Specialist SA, AWS
Sai Kondapalli is a Database Specialist Solutions Architect at Amazon Web Services who specializes in MySQL services and assists customers in migrating and optimizing their workloads on RDS MySQL, MariaDB and Aurora MySQL.


Wednesday May 12, 2021 10:30 - 11:00 EDT
Room #10

10:30 EDT

Wide Rows NoSQL vs SQL Data Modeling
Some NoSQL databases popularized the notion of “loose schema”, often misunderstood as “schemaless”, but there is always a data model: in the DB, in the application, or in the mind of the developer. However, NoSQL schemas are designed with very different goals in mind than SQL schemas: where SQL normalizes, NoSQL denormalizes; where SQL joins ad hoc, NoSQL pre-joins; where SQL tries to push performance to the runtime, NoSQL bakes performance into the schema. Adding to the confusion, various NoSQL databases have different ideas about what schemas should enforce. This talk aims to introduce the core concepts of NoSQL schema design, using Scylla as an example to explain tradeoffs and rationale.
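The contrast between joining ad hoc and pre-joining can be sketched with plain Python data structures. This is a toy model under stated assumptions, not Scylla's API or storage engine: the normalized version joins two "tables" at read time, while the wide-row version copies the user's columns into a partition keyed by `user_id` at write time and keeps rows sorted by a clustering key (`ts`), so a read becomes a single partition lookup. All table names, columns, and helpers here are hypothetical.

```python
import bisect
from collections import defaultdict

# Normalized (SQL-style): two tables, joined at query time.
users = {1: {"name": "Ada"}, 2: {"name": "Lin"}}
orders = [  # (order_id, user_id, ts, total)
    (10, 1, 300, 25.0),
    (11, 2, 100, 10.0),
    (12, 1, 200, 40.0),
]


def orders_for_user_joined(user_id):
    """Ad hoc join: scan orders, attach user columns, then sort by ts."""
    name = users[user_id]["name"]
    rows = [(ts, oid, total, name) for oid, uid, ts, total in orders if uid == user_id]
    return sorted(rows)


# Denormalized wide rows (NoSQL-style): one partition per user_id,
# rows kept sorted by the clustering key (ts), user columns copied in.
partitions = defaultdict(list)


def insert_wide(user_id, ts, order_id, total):
    name = users[user_id]["name"]  # duplicated at write time ("pre-join")
    bisect.insort(partitions[user_id], (ts, order_id, total, name))


for oid, uid, ts, total in orders:
    insert_wide(uid, ts, oid, total)


def orders_for_user_wide(user_id):
    """Single partition lookup; rows are already in clustering order."""
    return list(partitions[user_id])


print(orders_for_user_wide(1) == orders_for_user_joined(1))  # True
```

The trade-off shows even at this size: the wide-row read does no join work, but renaming a user now means rewriting every copied `name` in that partition.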

Speakers

Tzach Livyatan

VP Product, Scylla
Tzach Livyatan has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15-year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier-grade systems, signalling, policy and charging...


Wednesday May 12, 2021 10:30 - 11:00 EDT
Room #5

10:30 EDT

Integrating Best of Breed Open Source Tools to Vitess: Orchestrator
Orchestrator is a MySQL high availability and replication management tool that runs as a service and provides command-line access, an HTTP API, and a web interface. Orchestrator supports services that include discovery, refactoring, and recovery with pre/post hooks for a MySQL cluster.

Vitess is a database clustering framework for managing and maintaining a healthy MySQL system with the ability to scale horizontally.

Integrating open source utilities into the Vitess ecosystem has been a mission for its maintainers; hence Vtorc was born.

In this talk, we will discuss failovers managed by Vtorc within the Vitess cluster with a demo of Orchestrator failover. The conversation will also include a short demo of how Vitess and Orchestrator can work together, bringing the Open Source world's best into one.

Speakers

Alkin Tezuysal

Sr. Technical Manager, Planetscale, Inc.
Alkin Tezuysal has extensive experience in enterprise relational databases, working in various sectors for large corporations. With more than 20 years of industry experience, he has acquired skills for managing large projects from the ground up to production. For the past decade...


Wednesday May 12, 2021 10:30 - 11:00 EDT
Room #4

10:30 EDT

The Real Costs and Benefits of Open Source Database Adoption
One of the perceived benefits of using open source software is cost. It’s free to obtain, and users can get technical support for free through the community. We’re (hopefully) long past the days of worrying that open source can’t handle mission-critical workloads, and it’s increasingly clear that open source doesn’t pose particular security risks. Instead, the biggest risk for open source is that we keep getting confused about its costs, especially in production environments.

The question is: how do you calculate whether it makes financial sense? Many decision-makers still underestimate the real cost (and benefits!) of the move from proprietary software to OSS. In this talk, Michal will discuss different factors that influence the TCO of adopting open source databases, including hidden costs that you should always factor in. He’ll also try to debunk some of the myths associated with open source databases. Last but not least, he’ll share some tips on how to make the move from proprietary to open source successful.

After this talk, you’ll be better positioned to argue why using open source databases at your organization makes sense, or to make an informed decision yourself!

Speakers

Michal Nosek

Enterprise Architect, Percona
Over his ten-year career, Michal has taken on different roles, from software engineer and business analyst to technical sales consultant, always staying close to the technology. He has hands-on experience with a broad range of programming languages and database technologies in different...


Wednesday May 12, 2021 10:30 - 11:00 EDT
Room #6

11:00 EDT

Scaling Applications With Amazon RDS Proxy
Many applications, including those built on modern serverless architectures (https://aws.amazon.com/serverless/), can have a large number of open connections to the database server, and may open and close database connections at a high rate, exhausting database memory and compute resources. Amazon RDS Proxy allows applications to pool and share connections established with the database, improving database efficiency and application scalability. In this session, learn how RDS Proxy can help reduce MySQL failover times by up to 66% and how to manage database credentials, authentication, and access through integration with AWS Secrets Manager and AWS Identity and Access Management (IAM).
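The pooling idea itself, many short-lived callers sharing a small, fixed set of long-lived database connections, can be sketched generically. This is not RDS Proxy's implementation or API; `ConnectionPool` and the `connect` factory are hypothetical stand-ins, and a real driver connection would replace the placeholder object.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: callers borrow an open connection and return it."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # connections are opened once, up front

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to timeout) while every connection is in use.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


opened = []


def connect():
    """Placeholder factory; a real one would open a database connection."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(connect, size=5)
results = []


def handle_request():
    with pool.connection(timeout=5) as conn:
        results.append(id(conn))


threads = [threading.Thread(target=handle_request) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results), len(opened))  # 100 5
```

The property that matters for the scenario in the abstract is that the number of server-side connections stays at `size` no matter how many callers arrive; excess callers wait for a free connection instead of opening new ones.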

Speakers

Surendar Munimohan

Senior Database Specialist Solutions Architect, AWS
Surendar Munimohan is a Senior Database Specialist Solutions Architect at Amazon Web Services. He works with customers to provide guidance and technical assistance on database projects. He holds a Master of Information Technology from Central Queensland University, Australia.


Wednesday May 12, 2021 11:00 - 11:30 EDT
Room #10

11:00 EDT

MongoDB Security Features
When we speak about security, the reality is that companies need to comply with multiple frameworks and regulations, and assessing which rules apply to each organization is no easy feat.

In this talk, we will revisit the security features we can implement in a MongoDB environment. The aim is to provide further information on what you can use to help your company with future security implementations.

The topics presented will be:
* Authentication
* Authorization
* TLS/SSL
* External Authentication
* Auditing
* Log Redaction
* Encryption - Data at Rest and Client Field Encryption.

Speakers

Jean da Silva

Support Engineer, Percona
Jean joined Percona as a Support Engineer in 2020. Before joining the team, he worked in a mission-critical environment for 4 years, helping administer databases such as MySQL, MongoDB, and Oracle DB. Specializing in database engineering and big data, he likes to watch Formula 1 in...


Wednesday May 12, 2021 11:00 - 11:30 EDT
Room #2

11:00 EDT

Convergence of Different Dimensions within BangDB - A High Performance Modern NoSQL Database
If we look at data trends and how data generation, processing and consumption are changing, we see a convergence of different problem spaces happening at the core. For example, to do even a simple job of monitoring an ongoing operation, we need various data to be structured, ingested, integrated and processed in a real-time (or near-real-time, streaming) manner. Further, models must be trained and predictions made on streaming data for monitoring to be predictive in nature, both locally (at the edge or within the device) and at the cloud level. At the speed and scale at which this takes place, it becomes almost infeasible to use a siloed or “stitched together” kind of platform, which simply doesn’t seem to scale anymore.

As a philosophical shift, we must also converge all participating dimensions of the solution space in order to counter this fusion of problems and challenges that we face at the moment, which will only grow bigger and become tougher to handle. We must break the silos and create a converged architectural space that can then grow linearly to tackle the velocity, variety, and volume of data.

This fusion of different dimensions from the solution space would provide ways to natively integrate and support different flavors of data without having to structure the data up front. The convergence of streaming and AI will allow continuous processing of data in both an absolute and a predictive manner. Stream processing will ensure continuous aggregation, running statistics, complex event processing, predictions and relevant actions on a real-time basis.

Native integration at the buffer pool or IO layer will give the user full control of every single byte ingested and processed by the system, reducing latency and allowing high-speed precision processing. Further, a siloed (or semi-siloed) architecture forces too many network hops along with too many copies of data. In this scenario, even with very high processing efficiency, low latency (or high speed) is not possible. We need to minimize network hops and data copies as much as possible. With convergence, we minimize both, thereby improving performance.

This convergence-first approach would also allow true linear scaling of the system. With a siloed architecture, we always find it extremely hard to scale different verticals together, and complete utilization of resources is not possible either. With convergence, we only need to worry about scaling a single dimension, and high resource utilization is definitely a by-product.

Therefore, a NoSQL database that converges different entities such as ML and streaming, and that works within a device connected to local or cloud instances of itself, could offer some relief by reducing the pain of operation and maintenance.

BangDB is a converged NoSQL Database, designed to handle the emerging use cases with ease at scale.

Speakers

Sachin Sinha

Author of BangDB, Founder of IQLECT
Sachin has over 20 years of experience building software products in the database, e-commerce and distributed computing areas. He previously worked at Microsoft in the SQL org, developing a key-value store for devices. At Amazon he led the engineering team for the sponsored link platform...


Wednesday May 12, 2021 11:00 - 11:30 EDT
Room #6

11:00 EDT

A Tale of Two Communities: How Open Source, ClickHouse and Superset Help Visualize Your Data
Databases have data. Business intelligence tools visualize it. This talk walks through how we are building a polished integration between the ClickHouse database and Superset BI using 100% open source techniques. We'll introduce ClickHouse and Superset, then describe the connectivity problems facing us at the beginning. Next, we'll show how we worked at the community level using GitHub and Slack workspaces to solve them quickly. We'll end with a demo of the working result. This case study shows the power of open source communities to serve our shared developers.

Speakers

Robert Hodges

CEO, Altinity
Robert Hodges has worked on database systems since 1983. During that time he has loved, worked on, and occasionally despised 20 different DBMS types. His technical interests include data, distributed systems, virtualization technology, and security. He is currently CEO of Altinity, which...

Srini Kadamati

Senior Developer Advocate and Apache Superset Committer, Preset.io
I'm a Senior Data Scientist on a mission to enable more people to work with data effectively. I spent 5 years building an online learning platform specifically to help people learn existing data tools before turning my attention to improving the data tools themselves. I now...


Wednesday May 12, 2021 11:00 - 12:00 EDT
Room #7

11:00 EDT

Organize the Migration of a Hundred Database Clusters to the Cloud
At the end of 2018, I told you the story of one of the BlaBlaCar Foundation teams' greatest successes, "100% Containers Powered Carpooling". That conference was about our migration into containers, and described a paradigm that became key for us: every component of the infrastructure can be restarted at any time without impacting the applications using them.

In 2019, a new chapter had to be written: BlaBlaCar signed a contract with Google Cloud Platform, and like many companies, we were starting our journey to the cloud… Eight open source database technologies, one hundred clusters, and a new infrastructure with new components, patterns, and tons of possibilities.

Building the team, finding the right focus, choosing tradeoffs, making or buying systems, changing our communication, proposing migration paths… Wearing my Engineering Manager's hat, I will share with you the ingredients we had to look for to turn this massive project into a success story.

Speakers

Maxime Fouilleul

Engineering Manager for team Database Reliability Engineering, BlaBlaCar
Max joined BlaBlaCar as a MySQL specialist in 2014 to help the team manage the main production databases. He now leads the Database Reliability Engineering (DBRE) team, working on two missions: package and support the database catalog (8 OSDB) for BlaBlaCar application...


Wednesday May 12, 2021 11:00 - 12:00 EDT
Room #5

11:00 EDT

Percona XtraDB Cluster Operator - Architecture Decisions
Percona XtraDB Cluster Operator is a drop-in replacement for MySQL Enterprise with sync replication running on Kubernetes. It automates the creation, alteration, or deletion of members in your Percona XtraDB Cluster environment. It can be used to instantiate a new Percona XtraDB Cluster replica set, or to scale an existing environment.

In this talk we will cover various architecture decisions we made when building the PXC Operator. There are many differences between how this can be done on regular VMs and in k8s: PITR implementation, autorecovery, retention policies, HAProxy/ProxySQL and the proxy protocol.

Speakers

Sergey Pronin

Product Owner, Percona
Sergey is a passionate technology "driver". After graduation, he worked in various fields: an internet service provider, the financial sector and M&A business. His main focal points were infrastructure and the products around it. At Percona, as a Product Owner, he drives forward Kubernetes and Cloud databases...


Wednesday May 12, 2021 11:00 - 12:00 EDT
Room #4

11:00 EDT

Monitoring and Tracing MySQL or MariaDB Server With Bpftrace
Bpftrace is a relatively new eBPF-based open source tracer for modern Linux versions (kernels 5.x.y) that is useful for analyzing production performance problems and troubleshooting software. Basic usage of the tool is presented, along with bpftrace-based one-liners and small scripts useful for DBAs working with MySQL, MariaDB, Percona Server, or any open source database (and even for developers). Problems with dynamic tracing of MySQL server using bpftrace are also discussed.

Speakers
Valerii Kravchuk

Principal Support Engineer, MariaDB Corporation
Valerii Kravchuk has been helping MySQL and MariaDB users and DBAs resolve their problems since 2005. He has worked at MySQL AB, Sun, Oracle, Percona and, since 2016, MariaDB Corporation. He was MySQL Community Contributor of the Year 2019.


Wednesday May 12, 2021 11:00 - 12:00 EDT
Room #1

11:00 EDT

Getting Your Mind Around OpenSearch Geospatial Data
Flat earthers need not attend! This session will go through the basics of geospatial data in general, then go deeper on how it works in (and inside of) OpenSearch. The session will explain how to get geospatial information into OpenSearch, then how to query and visualize the data.

Speakers
Kyle Davis

Senior Developer Advocate, AWS
Kyle Davis is the Senior Developer Advocate with Open Distro for Elasticsearch at AWS. While a relative newcomer to AWS, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing and getting his hands dirty in his Edmonton, Alberta-based...


Wednesday May 12, 2021 11:00 - 12:00 EDT
Room #8

11:00 EDT

High-Performance PostgreSQL
PostgreSQL is one of the leading open-source databases. Out of the box, the default PostgreSQL configuration is not tuned for any particular workload. The default configuration is designed so that PostgreSQL can run on any system using minimal resources. Consequently, a default installation of PostgreSQL does not give optimal performance on a high-performance machine, because it is not set up to use all available resources.

PostgreSQL provides mechanisms that allow you to tune your database according to your workload and machine specification. Beyond PostgreSQL itself, we can also tune the Linux kernel so that the database workload runs optimally. In this talk, we will learn how to tune some of PostgreSQL’s parameters and see the effect of that tuning, but we will focus on demonstrating how to tune Linux for better Postgres performance. Since so many Linux kernel parameters can be tuned to improve PostgreSQL performance, I will also share benchmark results obtained when tuning some of them.

Speakers
Ibrar Ahmed

Sr. Software Architect, Percona
Ibrar Ahmed is a Software Architect at Percona LLC. Before moving to open source development, he had extensive experience in software design and development, with a main focus on system-level embedded development. After joining EnterpriseDB in 2006, an enterprise PostgreSQL company...


Wednesday May 12, 2021 11:00 - 12:00 EDT
Room #3

11:30 EDT

Supporting Global Scale MySQL Applications With Amazon Aurora Global Database
Critical MySQL workloads with a global footprint, such as financial, travel, or gaming applications, have strict availability requirements and may need to tolerate a region-wide outage. Traditionally this required difficult tradeoffs between performance, availability, cost, and data integrity. In this session, learn how Amazon Aurora Global Database, which allows a single Amazon Aurora MySQL database to span multiple AWS regions, replicates your data with no impact on database performance, enables fast local reads with low latency in each region, and provides fast disaster recovery from region-wide outages.

Speakers
Vijay Karumajj

Sr. Specialist Database Solutions Architect, AWS
Vijay Karumajj is a Senior Database Specialist Solutions Architect at Amazon Web Services. He has worked as a SQL Server DBA at Blue Cross Blue Shield, Verizon, and Southwest Airlines. He holds a Bachelor’s degree in computer science from Andhra University and a Master’s degree...


Wednesday May 12, 2021 11:30 - 12:00 EDT
Room #10

11:30 EDT

MongoDB on Kubernetes in Large Scale E-Commerce Environments
Allegro is growing fast, and nobody sees it better than the database administrators. Environments with hundreds of users and their unique microservices, demanding over a dozen new databases every day and a new cluster every week, all with maximum performance and resource isolation, make our job nontrivial. I would like to tell you the story of how we changed our architecture from bare-metal physical servers hosting MongoDB clusters to a fully automated, resource-isolated architecture of micro clusters based on StatefulSets: how we automated our processes, what obstacles we had to avoid, where we are today, and what we plan to do next to make MongoDB on Kubernetes run even more smoothly.

Speakers
Krzysztof Grzempa

Systems Engineer, allegro.pl
Systems engineer and DevOps practitioner with more than 15 years of experience. Passionate about automating things and making architectures even more scalable. Open source enthusiast. Member of a team responsible for NoSQL database deployment and management processes in a large e-commerce company...


Wednesday May 12, 2021 11:30 - 12:00 EDT
Room #2

11:30 EDT

Postgres HA in the Hybrid Cloud, a Look Under the Hood of Implementing Patroni on Multiple Clouds
Patroni has quickly become recognized as the state-of-the-art high-availability standard for running Postgres in mission-critical enterprises and public clouds. However, Patroni is not a fully architected software system; it is a template of best practices for implementing a highly available architecture for running Postgres.

In this presentation, we will share some of our early experiences building on the original Patroni open source project, along with its key architecture and design advantages over alternative approaches. By integrating closely with the core Nutanix storage platform, we were able to achieve significant performance improvements and satisfy very stringent requirements for failover time and data protection.

Additionally, we will share some of our recent experiences adopting the recent release of Patroni 2.0 and implementing more advanced capabilities.

Speakers
Manish Pratap Singh

Staff Software Engineer @ Nutanix Era, Nutanix Inc.
Manish leads the open-source database team in the Nutanix Era product group, managing the implementations of Postgres, MySQL and MariaDB. Before joining Nutanix, Manish was a Senior MTS at Oracle, working on the Parallel Query framework in the Oracle database engine.
Mehboob Alam

Sr. Solutions Architect, Nutanix, Inc.
Mehboob is a long-time open-source advocate and evangelist in the Postgres community, co-organizer of various community meetups and the annual global Postgres US conference. At Nutanix, he guides the development and support of Postgres in the Era DBaaS platform and helps customers...


Wednesday May 12, 2021 11:30 - 12:00 EDT
Room #6

12:00 EDT

The Changing Face of Open Source Database Software Adoption. How the Market Changed in the Last 12 Months.
In this wide-ranging state-of-the-market keynote, Peter will discuss a range of recent themes and developments. These include: 1) the overall growth of open source and how it might have been impacted by Covid-19; 2) the role of the public cloud, and whether it is good or bad for open source; 3) the top reasons companies choose open source software; 4) whether licensing changes could alter the direction of open source software; and 5) why your company should be actively contributing to open source.

Speakers
Peter Zaitsev

CEO & Co-founder, Percona
Peter Zaitsev is CEO and co-founder of Percona. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the...


Wednesday May 12, 2021 12:00 - 12:20 EDT
Keynote

12:20 EDT

Driving Open Source Forward in a Commercial Environment - The Challenges and Opportunities for Open Source Software
Amanda will walk through her extensive research into the business of open source, including its commercial and revenue models, as well as her personal experience of over a decade working in open source and on its legal issues. She will look at the events of the last decade of open source commercialisation and discuss the future of a healthy open source ecosystem.

Amanda is a chapter author and the editor of the forthcoming book "Open Source Law, Policy and Practice", to be published by Oxford University Press in October. The contents will also be available under Open Access thanks to sponsorship from the Vietsch Foundation.

Speakers
Amanda Brock

CEO, OpenUK
Amanda is CEO of OpenUK, the UK body for open technology (open source software, open hardware and open data); European Representative of the Open Invention Network; OASIS Open Projects' Advisory Council Member (open source and open standards); Advisory Board Member KDE; Charity...


Wednesday May 12, 2021 12:20 - 12:40 EDT
Keynote

12:40 EDT

Licenses: Ethical Framework, Business Model, Neither, Both?
Almost 25 years ago, the "open for business" and "free as in freedom" software camps split apart. Since then, the two groups have used largely identical copyright licenses to reach largely divergent ethical and business goals.

That tension has ebbed and flowed ever since, and it has now peaked again with two new groups of licenses that break the old rules, trying either to encourage investment or to protect human rights.

In this talk, we'll review the history of these conflicts, and give the audience some tools to help understand what these new licenses mean for our industry and our future.

Speakers
Luis Villa

General Counsel, Tidelift
Luis Villa is co-founder and general counsel at Tidelift. Previously he was a top open source lawyer advising clients, from Fortune 50 companies to leading startups, on product development, open source licensing, and other matters. Luis is also an experienced open source community...


Wednesday May 12, 2021 12:40 - 13:00 EDT
Keynote

13:00 EDT

WarpSQL - a distribution of MySQL 8 with columnar storage, bitmap indexing, and parallel query execution
WarpSQL is a distribution of MySQL 8 that includes the WARP storage engine and provides parallel query execution for queries that use WARP tables. The WARP storage engine automatically creates compressed bitmap indexes, which can be used to efficiently execute complex queries that cannot be executed efficiently using traditional B-tree indexes, even over very large amounts of data. Columnar storage improves IO efficiency for queries that access a subset of columns in a table, because only the columns accessed by a query are read from disk. WarpSQL also features parallel query execution, which can increase query performance substantially by using multiple cores to execute queries. Performance improvements of 100x or more compared to the InnoDB storage engine are possible for queries in standard benchmarks, such as the star schema benchmark. This talk will demonstrate how the WARP storage engine works, including a discussion of bitmap indexes as compared to traditional B-tree indexes, and will offer benchmark comparisons between regular MySQL 8 and WarpSQL.

Speakers
Justin Swanhart

CEO, LeapDB, LLC
Justin Swanhart has over 20 years of experience working with database technologies. He is the author of WarpSQL, ShardQuery, Flexviews, and PHP-SQL-Parser. He is currently the CEO of LeapDB, LLC, a company developing software-as-a-service MySQL and MariaDB solutions for materialized views...


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #7

13:00 EDT

Getting Started With Stateful Applications in Kubernetes
Kubernetes and the cloud-native ecosystem have simplified operations by providing a set of robust, open-source abstractions over traditional infrastructure. These abstractions let you consume multiple layers of your application stack. For stateful applications, this includes raw storage like volumes, file systems, and object storage, but also includes higher level primitives like databases and other data services.

To run a stateful application in Kubernetes, you'll need to compose and potentially expand on these primitives. For those new to the space, it can be overwhelming.

In this example-driven session, I'll discuss the current state of state in Kubernetes, giving you all the building blocks and pointers you'll need to successfully develop and operate stateful, cloud-native applications. This talk is targeted towards anyone interested in stateful, containerized applications.

Speakers
Tom Manville

Director of Engineering, Kasten by Veeam
Tom graduated with an M.S.E. from the University of Michigan in 2013. His first job was on the server team at Maginatics, a cloud-based file system company that was acquired by EMC in late 2014. After the acquisition, he joined Dropbox, where he focused on improving the efficiency...


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #4

13:00 EDT

MariaDB 10.6 - What's New?
MariaDB 10.6 comes with many improvements, centered specifically on performance. Although the contents of this talk will depend on exactly what makes it into MariaDB 10.6 GA, the following topics will be covered:
  • Atomic DDL in MariaDB
  • Optimizer changes
  • InnoDB changes
  • Oracle compatibility changes and parser changes

Speakers
Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation
Vicențiu works at the MariaDB Foundation as a Software Engineer and Team Lead. He focuses on optimizer development, but has also worked on other parts of the MariaDB Server. Vicențiu has been part of the MariaDB ecosystem since 2013, when he first contributed Roles to MariaDB. Over...


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #9

13:00 EDT

Using Percona Audit Plugin in Daily Operation
For some, audit logs have become very important for various reasons. Perhaps you need to document who had access to your database, or perhaps you need to investigate a data breach. Many other scenarios may apply, and for these purposes Percona provides a plugin for MySQL that does just that!
In this session, we'll look at how to enable the audit plugin, how to configure it, and, most importantly, how to collect the logs and import them into a ClickHouse database for storage and analysis. We'll also look at some of the possibilities this provides.

Speakers
Lenny Andersen

DBA, Norlys
I have been using MySQL since 2003 and have been a MySQL DBA since 2013 with a constant focus on security and performance.


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #1

13:00 EDT

Something Went Wrong: Understanding Alerting and Anomaly Detection
Being informed of something going wrong after it has happened is never ideal. Setting up a system to monitor logs and metrics allows you to be proactive rather than reactive. This session will cover the differences between alerting and anomaly detection, how each works, and where they are best employed.

Speakers
Kyle Davis

Senior Developer Advocate, AWS
Kyle Davis is the Senior Developer Advocate with Open Distro for Elasticsearch at AWS. While a relative newcomer to AWS, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing and getting his hands dirty in his Edmonton, Alberta-based...


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #8

13:00 EDT

Sharding All The Way Down: Building Fast and Highly Concurrent Databases on Modern Hardware
Over the last 20 years, our data systems have grown in both data volume and throughput. However, common database designs are based on architectures dating back 30 years and have not kept up with the changes in modern hardware. The open source Seastar framework has been used by Scylla and other projects to squeeze every last bit of performance from modern hardware, unlocking unprecedented vertical scalability. This talk showcases the unique architecture used and its implications for modern database design.

Speakers
Avishai Ish-Shalom

Developer Advocate, ScyllaDB
"In a world where anything has an API, everything is a software problem": this insight has guided Avishai Ish-Shalom throughout his diverse career working on improving the complex socio-technical systems that create and operate modern software and promoting the use of Mathematics in...


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #5

13:00 EDT

Arduino Direct Connection to Percona Server.
I will show how to set up an Arduino Uno/Leonardo to communicate directly with Percona Server. This approach requires an Ethernet shield or an Arduino with built-in WiFi. Many current approaches use phpMyAdmin as a middleman between the Arduino and Percona Server; my approach removes the need for phpMyAdmin. A weather station that records data directly to Percona Server is one great use case, but other types of sensors could be used to record data as well. I will explain the Arduino sketch, the database setup, and how to set up the Arduino and the sensor. This is a great way to learn about MySQL, Arduino, and connecting sensors for data collection.

Speakers
Walter W Leutwyler

Principal Engineer, Optum
When I'm not working with MySQL or other open source software packages, I like woodworking, 3D design and printing, listening to all forms of metal music, and electronics projects with Raspberry Pi and Arduino. I live in Powell, Ohio, with my wife, daughter, 6 cats and 2 dogs.


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #6

13:00 EDT

PostgreSQL HA With Patroni: Looking at Failure Scenarios and How the Cluster Recovers From Them
Over the last few years, Patroni has established itself as the main reference for PostgreSQL high availability, and we have seen its adoption among our customers grow along with its popularity. However, it has not been a straightforward journey for many of them: Patroni's robustness is built on top of solid layers provided by different components, and it is necessary to understand how they connect with each other and what happens when one of them "breaks".

In this talk, we are going to review the most common failure scenarios and how Patroni recovers the PostgreSQL cluster in each of these cases. We'll also discuss the most common mistakes we have observed and how you can avoid these.

Speakers
Fernando Laudares Camargos

Senior Support Engineer, Percona
Fernando joined Percona in early 2013 after 8 years working for a Canadian company specializing in Linux and open source technologies. As a member of Percona's Support team, Fernando works closely with customers helping them troubleshoot issues with MySQL, PostgreSQL, and MongoDB servers...
Jobin Augustine

PostgreSQL Escalation Specialist, Percona
Jobin Augustine is a PostgreSQL expert and open source advocate with more than 19 years of experience as a consultant, architect, administrator, writer, and trainer in PostgreSQL, Oracle and other database technologies. He has always been an active participant in the Open...


Wednesday May 12, 2021 13:00 - 13:30 EDT
Room #3

13:00 EDT

Push-button deploy MongoDB with Ansible
Installing a large number of MongoDB servers can be quite challenging, especially for the newcomer.
In this talk, we will walk you through the details of our project to create a push-button approach for deploying sharded MongoDB clusters and replica sets using Ansible.
We will start with a quick introduction to Ansible, and show you the most interesting parts of the actual playbook code, as well as the benefits of this approach.
You will leave this session with some ideas you can reuse in your particular environment.

Speakers
Ivan Groenewold

Architect, Percona
Ivan has been supporting mission-critical environments for top-of-the-line companies for over 15 years. Starting as a system administrator, he eventually became deeply involved with the open source ecosystem. Ivan has experience using diverse database technologies like Oracle, MySQL...
Kim Thomas

AppOps - OpenSource Database Team, FISERV
I am a Database Architect at FISERV, primarily responsible for delivering open source database solutions across the enterprise. I am knowledgeable in DBaaS, relational, columnar, Big Data, OLAP, OLTP, NoSQL, and database operator technologies. I have been working with various database technologies...


Wednesday May 12, 2021 13:00 - 14:00 EDT
Room #2

13:30 EDT

DuckDB: Embedded Analytics with Parallel/Vector/Columnar Performance
DuckDB is a project coming out of CWI in the Netherlands that combines vector, columnar, and parallel capabilities.
Highlights:
  • Great performance for analytic queries
  • Fast batch load
  • Near-linear scaling
  • Near-zero administration
DuckDB is a SQLite replacement for analytics:
                         TRANSACTIONS               | ANALYTICS
                         ---------------------------------------------
  EMBEDDED/SERVERLESS    SQLite                     | DuckDB
                         ---------------------------------------------
  SERVER PROCESS         MySQL/MariaDB, Postgres    | Columnstore
In this session you will learn about:
  • What is different about Embedded/Serverless data engines?
  • Current performance and features of DuckDB
  • Best practices using DuckDB

Speakers
Jim Tommaney

Staff Performance Engineer, Databricks
Currently doing performance tuning at Databricks. Previously the architect of InfiniDB (now available as MariaDB ColumnStore). Hands-on experience delivering solutions across verticals including telco, mobile marketing, CRM, genomics, fraud analytics and more.


Wednesday May 12, 2021 13:30 - 14:00 EDT
Room #7

13:30 EDT

Creating Chaos in Databases
In this talk, I will discuss tools for advanced testing of distributed databases. Kubernetes offers interesting new ways to emulate failures, or even chains of catastrophic events such as disk or network errors.
Using Chaos Mesh (https://chaos-mesh.org/), I will emulate failures in the Percona XtraDB Cluster Kubernetes Operator and show how the Operator recovers (or not) from them. This benefits not only database kernel developers; it also helps you understand how your application behaves under failures.

Speakers
Vadim Tkachenko

CTO, Percona
Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. He leads Percona CTO Labs, which focuses on technology research and performance evaluations of Percona and third-party products, designing hardware, filesystems, storage engines, and databases that surpass...


Wednesday May 12, 2021 13:30 - 14:00 EDT
Room #1

13:30 EDT

Databases in the Microservices World
Web technologies have advanced by leaps and bounds. But are you still using the tired old database from the last generation? Let's look at the methodology of microservices, compare it to bounded contexts, and look at ops tasks for micro-databases. Let's tour all the flavors of databases, understand their pros and cons, and see when you would choose each one. You'll leave with a roadmap for moving from a data monolith to micro-databases.

Speakers
Rob Richardson

Developer Advocate, Cyral
Rob Richardson is a software craftsman building web properties in ASP.NET and Node, React and Vue. He’s a Microsoft MVP, published author, frequent speaker at conferences, user groups, and community events, and a diligent teacher and student of high quality software development...


Wednesday May 12, 2021 13:30 - 14:00 EDT
Room #5

13:30 EDT

PostgreSQL and Monitoring Boosted With Infrastructure as Code
PostgreSQL is probably one of the most powerful open source RDBMSs.

Working in the cloud, you may wonder how to deploy PostgreSQL efficiently.

Once deployed, how about monitoring and performance? Another headache?

What if we could build everything at once using Infrastructure as Code technology, and a few minutes later everything were up and running, properly configured and ready to use?

If this sounds like a dream for you then please allow me to make your dream come true!

In this talk, I’ll show you how, using only Infrastructure as Code technology, we can deploy a PostgreSQL instance efficiently and blazingly fast: configured, with the database loaded, and with monitoring and performance built in! All with just a single tool, nothing more.

After all, Infrastructure as Code technology isn't only meant for DevOps; DBAs can (and should) leverage it :)

Speakers
Rachid Zarouali

Cloud Architect, sevensphere
Rachid Zarouali is a Microsoft MVP and Docker Captain, international speaker and trainer. In his previous roles as head of the infrastructure team for the French registry and CIO of a worldwide-recognized CRM and e-commerce agency, he recognized the need to bring the latest technology...


Wednesday May 12, 2021 13:30 - 14:00 EDT
Room #3

13:30 EDT

PostgreSQL Network Filter For EnvoyProxy
How do you monitor Postgres? What information can you get out of it, and to what degree does this information help to troubleshoot operational issues? What if you want or need to log all the queries? That may bring heavily trafficked databases down.

At OnGres we’re obsessed with improving PostgreSQL’s observability. So we worked with the folks at Tetrate on an Envoy Network Filter extension for PostgreSQL, to provide and extend observability of the traffic in and out of a cluster infrastructure. This extension is public and open source, and you can use it anywhere you use Envoy. It allows you to capture metrics automatically and to debug network traffic. This talk will be a technical deep dive into PostgreSQL protocol decoding and Envoy proxy filters, covering all the capabilities of the tool as well as its usage and deployment in any environment.

Envoy [1] is a high performance C++ distributed proxy designed for single services and applications, as well as a communication bus and “universal data plane” designed for large microservice “service mesh” architectures. Built on the learnings of solutions such as NGINX, HAProxy, hardware load balancers, and cloud load balancers, Envoy runs alongside every application and abstracts the network by providing common features in a platform-agnostic manner. When all service traffic in an infrastructure flows via an Envoy mesh, it becomes easy to visualize problem areas via consistent observability, tune overall performance, and add substrate features in a single place.

Envoy can be used to proxy connections to PostgreSQL instances. In this talk, we’ll see how we improve PostgreSQL observability without impacting database performance, and without needing to install and/or configure a bunch of things such as logging or pg_stat_statements, by using a Network Filter [2] for PostgreSQL that we developed. The filter decodes the frontend and backend protocol to transparently obtain metrics and metadata about its operation.

Even over an encrypted connection we can collect the metrics, because the Postgres Network Filter can terminate SSL on Envoy [3]. This is a cool new feature in the upcoming 1.18 release of Envoy Proxy, which is expected on March 31 [4].

Roadmap:
* Integrate Postgres parser to improve dynamic metadata and per-query tracking
* Individual (per-query) tracking of query performance
* Traffic mirroring for Postgres major upgrade testing and validations

[1] https://www.envoyproxy.io/
[2] https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/other_protocols/postgres#arch-overview-postgres
[3] https://github.com/envoyproxy/envoy/commit/1aa31dd9ca07f88029101bdecca12173930cf342
[4] https://github.com/envoyproxy/envoy/blob/main/RELEASES.md#release-schedule

Speakers
Fabrízio de Royes Mello

PostgreSQL Developer, OnGres Inc
Currently helps people and teams realize the full potential of relational databases, especially PostgreSQL, helping them with database design (modeling), physical architecture (database schema), programming (procedural languages), SQL (usage, tuning, best practices...
Álvaro Hernández

CEO, OnGres Inc
Álvaro is a passionate database and software developer. He is the founder and CEO of OnGres (https://ongres.com) and has been dedicated to PostgreSQL and R&D in databases for two decades. Website: https://aht.es. An open source advocate and developer at heart, Álvaro...


Wednesday May 12, 2021 13:30 - 14:00 EDT
Room #6

13:30 EDT

Collaboration in Open Source: A Q&A on Github, Jira, Zulip and Knowledgebase
This is an extended, interactive panel / Q&A version of the 20-minute talk on Collaboration in Open Source. Whereas the 20-minute talk sets expectations, this session in the MariaDB Community Room goes in depth and solicits input from the audience. How can the MariaDB Foundation improve its level of interactivity and collaboration with its developer community?

Speakers
Kaj Arnö

CEO, MariaDB Foundation
Kaj Arnö is CEO of the MariaDB Foundation. He is a software industry generalist, having served as VP Professional Services, VP Engineering, CIO and VP Community Relations of MySQL AB prior to the acquisition by Sun Microsystems. At Sun, Kaj served as MySQL Ambassador to Sun and Sun...
Ian Gilfillan

Principal technical writer: documentation, MariaDB Foundation
Ian first came across MySQL in the 90s, upgrading from mSQL while developing South Africa's first online grocery store, and teaching and developing internet programming courses. He was lead developer for South Africa’s largest media company from 2000, and wrote the book Mastering...
Robert Bindar

Server Developer, MariaDB Foundation
Robert started working for the MariaDB Foundation in 2018 as a server developer. His main focus is divided between server development and helping the community contribute faster and more efficiently to the MariaDB codebase. Robert is based mostly in Brasov, Romania.
Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation
Vicențiu works at the MariaDB Foundation as a Software Engineer and Team Lead. He focuses on optimizer development, but has also worked on other parts of the MariaDB Server. Vicențiu has been part of the MariaDB ecosystem since 2013, when he first contributed Roles to MariaDB. Over...
Anna Widenius

Chief of Staff, MariaDB Foundation
Anna Widenius is Chief of Staff at the MariaDB Foundation.


Wednesday May 12, 2021 13:30 - 14:30 EDT
Room #9

13:30 EDT

Tricks Of The Trade
TRICKS OF THE TRADE: A COLLECTION OF TECHNIQUES ADDRESSING COMMON ADMINISTRATION MISSTEPS AND MISTAKES

PostgreSQL is not only the most sophisticated open source database management system in the world; it's also among the most reliable and easiest to set up. But even under the best of circumstances, things can just plain go wrong because of a wrong assumption. The purpose of this talk is to review the most common missteps and mistakes made when administering a Postgres data cluster, and how to prevent them from escalating into production-level issues.

For the purposes of this presentation, we will not cover query tuning per se.

We'll first start with the most common issues and gradually review some of the more esoteric challenges a DBA can encounter.

Here's a breakdown of the topics that will be covered:

- Host Based Authentication Rules
  - rules that are never reached
  - METHOD mangling
  - appreciating "peer"
  - too much "trust"
  - the much maligned "reject"
  - about password hashing: password vs md5 vs scram-sha-256
  - SSL laxity, i.e. host vs hostssl
- Overusing the superuser
- SSL
  - Certificates
    - CA signed vs self-signed
    - life span: too long vs too short
  - About Ciphers: weak (performance) vs strong (security)
- Replication
- postgres logging
  - where to put it
  - too much vs too little
  - log rotation
- Over-Allocation Of System Resources
  - swap vs noswap
  - Linux's OOM Process Killer
- some runtime parameters of interest
  - max_connections
  - effective_cache_size
  - work_mem
  - maintenance_work_mem
- Good autovacuuming hygiene
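The first two HBA items rest on one rule: pg_hba.conf is read top to bottom, and the first line whose connection type, database, user, and address all match decides the authentication method; later lines are never consulted, even if that authentication then fails. The Python sketch below models only that matching step, for a hypothetical subset of record types (`local`, `host`, `hostssl`) and without special keywords such as `sameuser` or `replication`. It is an illustration, not PostgreSQL's parser.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional, Tuple

@dataclass
class HbaRule:
    conn_type: str          # "local", "host" or "hostssl" (subset of pg_hba types)
    database: str           # a database name or "all"
    user: str               # a user name or "all"
    address: Optional[str]  # CIDR block for host rules, None for local
    method: str             # e.g. "peer", "scram-sha-256", "md5", "reject"

def type_matches(rule_type: str, conn: str) -> bool:
    """conn is "local" (Unix socket), "tcp" (plain TCP) or "tcp+ssl"."""
    if rule_type == "local":
        return conn == "local"
    if rule_type == "host":        # matches both SSL and non-SSL TCP
        return conn in ("tcp", "tcp+ssl")
    if rule_type == "hostssl":     # matches only SSL TCP
        return conn == "tcp+ssl"
    return False

def first_match(rules: List[HbaRule], conn: str, database: str, user: str,
                client_ip: Optional[str] = None) -> Optional[Tuple[int, HbaRule]]:
    """Return (index, rule) of the first matching rule, or None (rejected)."""
    for i, rule in enumerate(rules):
        if not type_matches(rule.conn_type, conn):
            continue
        if rule.database not in ("all", database) or rule.user not in ("all", user):
            continue
        if rule.address is not None:
            if client_ip is None or ip_address(client_ip) not in ip_network(rule.address):
                continue
        return i, rule  # first match wins; later rules are never consulted
    return None

rules = [
    HbaRule("local", "all", "postgres", None, "peer"),
    HbaRule("host", "all", "all", "0.0.0.0/0", "md5"),                # too broad, too early
    HbaRule("hostssl", "app", "app", "10.0.0.0/8", "scram-sha-256"),  # never reached
]
idx, rule = first_match(rules, "tcp+ssl", "app", "app", "10.1.2.3")
print(idx, rule.method)  # 1 md5
```

Moving the specific `hostssl` line above the broad `host` line is what makes it reachable; reordering, not adding lines, is usually the fix.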

Speakers

Robert Bernier

PostgreSQL Consultant, Percona
Robert's experience extends several decades. His first experience was playing hangman on a DECwriter shortly after man first landed on the moon. His foray into commercial applications was programming Fortran, via punchcards, on an IBM 360 which in those days had 4MB RAM. Over the...


Wednesday May 12, 2021 13:30 - 14:30 EDT
Room #4

14:00 EDT

Change Data Capture (CDC) on Top of Statement-Based Replication (SBR)
Yes, that's right, here at Box, we were able to successfully create a Change Data Capture stream on top of Statement-based Replication. Although normally impossible, some quirks in our data access layer coupled with some unique usages of MySQL query comments, PGTID and Kafka have enabled us to successfully provide an at-least-once delivery event stream of changes in our sharded MySQL infrastructure. The sharded MySQL infrastructure lies at the heart of Box.com, made up of 100s of shards, 1000s of servers and billions of records. In this talk, I will take you through the implementation details as well as the challenges involved in building out our change stream.
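As a rough illustration of the idea, and assuming a hypothetical comment format (the abstract does not describe Box's actual format; PGTID handling and Kafka publishing are left out), a consumer could pull change metadata out of statement-based binlog text and tolerate redelivery like this:

```python
import json
import re
from typing import Any, Dict, Iterable, Iterator

# Hypothetical format: the data access layer appends /* cdc:{...flat JSON...} */
CDC_COMMENT = re.compile(r"/\*\s*cdc:(\{.*?\})\s*\*/")

def extract_events(statements: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one change event per replicated statement that carries a CDC comment."""
    for sql in statements:
        m = CDC_COMMENT.search(sql)
        if m is None:
            continue  # no metadata: the statement cannot be turned into an event
        event = json.loads(m.group(1))
        event["statement"] = sql
        yield event

def dedupe(events: Iterable[Dict[str, Any]], seen: set) -> Iterator[Dict[str, Any]]:
    """Skip events already processed; at-least-once delivery can repeat them."""
    for e in events:
        if e["event_id"] in seen:
            continue
        seen.add(e["event_id"])
        yield e

binlog = [
    'UPDATE files SET name = "a.txt" WHERE id = 42 /* cdc:{"event_id": "s1-1001", "table": "files", "op": "update", "pk": 42} */',
    'UPDATE files SET name = "a.txt" WHERE id = 42 /* cdc:{"event_id": "s1-1001", "table": "files", "op": "update", "pk": 42} */',
    'DELETE FROM sessions WHERE expired = 1',
]
events = list(dedupe(extract_events(binlog), set()))
print([(e["table"], e["op"], e["pk"]) for e in events])  # [('files', 'update', 42)]
```

The second statement simulates a redelivered event; downstream consumers still need to be idempotent, since the `seen` set here lives only in memory.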

Speakers

Venkat Morampudi

Sr. Software Engineer, Box, Inc.
Venkat Morampudi is a Sr. Software Engineer on Database & Cache Infrastructure team at Box.


Wednesday May 12, 2021 14:00 - 14:30 EDT
Room #1

14:00 EDT

I’ve Got a Fever and the Only Prescription is Apache Druid
Digital transformation initiatives have unlocked large and fast-moving data sets including clickstreams, network telemetry, application monitoring and IoT devices. Analytics architectures have not kept pace, with most data still being run through existing “cold analytics” systems and tools designed for smaller and less time-sensitive workloads. “Hot analytics” denotes workloads where the responsiveness of the system is instantaneous and can support self-service data exploration, and where the data is extremely fresh, allowing for more informed decision-making.

The breadth of analytical systems in the world today demands a clear approach to selecting the right one for a given workload. In this talk, we’ll discuss a temperature-based way of thinking, where workloads get “hotter” as they become more interactive, more concurrent, and more likely to need up-to-the-second data.
Apache Druid is a modern cloud-native, stream-native, analytics database designed for workflows where fast queries and instant ingest are important. Druid excels at instant data visibility, ad-hoc queries, operational analytics, and handling high concurrency. It is a strong candidate for being the workhorse system for hot analytics.
In this session Rachel will discuss:
- How to categorize your analytics workloads based on temperature.
- The distinctive attributes of Apache Druid that recommend it for hot analytics, where query speed and data freshness are paramount.
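To make the temperature idea concrete, here is a toy classifier in Python. The three inputs mirror the abstract's axes (interactivity, concurrency, freshness), but the thresholds and the scoring are purely hypothetical and are not taken from the talk or from Druid:

```python
from dataclasses import dataclass

@dataclass
class Workload:
    latency_target_s: float  # how quickly users expect an answer
    concurrent_queries: int  # typical number of simultaneous queries
    freshness_s: float       # acceptable delay between event and queryability

def temperature(w: Workload) -> str:
    """Return "cold", "warm" or "hot" (hypothetical thresholds)."""
    score = 0
    if w.latency_target_s <= 1:     # interactive, sub-second answers
        score += 1
    if w.concurrent_queries >= 100:  # many users exploring at once
        score += 1
    if w.freshness_s <= 60:          # close to up-to-the-second data
        score += 1
    return ["cold", "warm", "warm", "hot"][score]

print(temperature(Workload(0.5, 500, 5)))         # hot
print(temperature(Workload(3600, 2, 86400)))      # cold
```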

Speakers

Rachel Pedreschi

Vice President, Community, Imply
A “Big Data Geek-ette” who is no stranger to the world of high-performance databases and data warehouses, Rachel has more than 20 years of business intelligence and data engineering experience, and is a Cassandra, Vertica, Informix and Redbrick certified DBA on top of her work with...


Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #7

14:00 EDT

Venmo's Aurora Upgrades With Open Source Tools
Venmo's Aurora database clusters are the centerpiece of its success, but they are not without operational challenges, such as upgrading from one major version to another. To improve performance efficiency and reduce operational costs, we had to rely on a number of open source tools to make the upgrade a successful non-event.

- Percona Monitoring and Management to measure query performance.
- pt-upgrade to validate queries between versions.
- ProxySQL for a virtually no downtime switchover/rollback.
- In-house tools to bridge some gaps in testing.

During this talk, we will piece together these tools and the process we followed to not let Venmo users down.
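The core of a pt-upgrade-style check is running the same read-only queries against the old and new versions and comparing both results and timings. The sketch below is not pt-upgrade: it uses Python's DB-API with two in-memory `sqlite3` databases as stand-ins for the two Aurora versions, and a deliberately injected row plays the role of a behavior change.

```python
import sqlite3
import time
from typing import Dict, List

def run(conn, query: str):
    """Execute a read-only query through the DB-API; return (rows, seconds)."""
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(query)
    rows = cur.fetchall()
    elapsed = time.perf_counter() - start
    cur.close()
    return rows, elapsed

def compare_queries(old_conn, new_conn, queries: List[str]) -> List[Dict]:
    """Run each query on both versions; report result mismatches and timings."""
    report = []
    for q in queries:
        old_rows, old_s = run(old_conn, q)
        new_rows, new_s = run(new_conn, q)
        report.append({
            "query": q,
            # sort by repr so row order (undefined without ORDER BY) is ignored
            "same_rows": sorted(old_rows, key=repr) == sorted(new_rows, key=repr),
            "old_s": old_s,
            "new_s": new_s,
        })
    return report

old, new = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn in (old, new):
    conn.execute("CREATE TABLE t (id INTEGER, v TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
new.execute("INSERT INTO t VALUES (3, 'c')")  # simulated divergence

report = compare_queries(old, new, ["SELECT * FROM t", "SELECT COUNT(*) FROM t WHERE id < 3"])
for r in report:
    print(r["query"], "->", "OK" if r["same_rows"] else "MISMATCH")
```

In a real upgrade the query list would come from captured production traffic, and timing differences would be aggregated over many runs rather than read from a single execution.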

Speakers

Ashwin Nellore

Manager, Software Developer 3, Venmo (Paypal Inc)

Kushal Shah

MTS 1, Database Engineer, Paypal/Venmo
Kushal is MTS 1, Database Engineer at Venmo (PayPal), with a focus on database scalability and reliability. He has 7 years of experience working on relational as well as NoSQL data stores like MySQL, MongoDB, DocumentDB, DynamoDB. Prior to Venmo, he worked as Database Engineer at...


Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #2

14:00 EDT

The Top 5 Things You Should Know About Databases on Kubernetes vs VMs
Kubernetes is becoming the default infrastructure for deploying a variety of stateless and stateful services. So what are the pros and cons of bringing databases into a DevOps-style management strategy based on Kubernetes? Data systems are very performance sensitive and moving to new virtualization strategies is potentially hazardous. Moreover, Kubernetes doesn’t supply all the capabilities needed to run stateful data services -- you need to understand how to partition responsibilities between Kubernetes and the database itself. This means you can’t just stuff a database into a container and let it fly – YOLO! Instead, Kubernetes requires an entirely different approach to the curation of databases. VMware has deep expertise with virtual machines and Kubernetes. Our team also has committers and contributors to the PostgreSQL project. Our talk will focus on the different techniques of running databases in long-running VMs versus short-running containers. In particular, we’ll cover all of the lifecycle steps, not only the initial deployment experience, including:
  • How quickly can a database be provisioned?
  • What are the real differences in performance?
  • What is it like to scale up and scale down in each?
  • How does recovery compare when there's a failure?
This is an intermediate-level talk, so we'll skip the mechanics of deploying databases (helm charts, operators, etc). Bonus: lots of data and comparison analysis!

Speakers

Marco Nicosia

Product Manager, Tanzu SQL, VMware
Marco Nicosia is Product Lead for VMware Tanzu SQL Software, which includes MySQL and Postgres. He has been leading Cloud Foundry’s MySQL project since 2015 and Postgres since 2019. He was an early technical leader in the PaaS industry while at Engine Yard, serving clients such...

Rachel Heaton

Software Engineer, Tanzu SQL, VMware
Rachel has worked at consultancies, startups, and larger companies in a variety of roles, like head of engineering, manager, and tech lead. She’s currently working at VMware on our Postgres products.

Adam Berlin

Software Engineer, Tanzu SQL, VMware
Adam Berlin is a jack-of-all-trades software engineer with experience coaching and consulting. He is currently a member of the VMware Tanzu SQL with Postgres for Kubernetes engineering team.


Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #6

14:00 EDT

Scaling Large Tables in PostgreSQL With Declarative Table Partitioning
When a table gets too large, performance and maintenance are heavily affected. Splitting the table into multiple partitions can help achieve the desired performance.

Table partitioning has been supported in PostgreSQL for many years as a design pattern based on table inheritance, which is complex to use correctly and didn't benefit from any parallelism. Since PostgreSQL 10, there is support for declarative table partitioning, with new features and improvements in later versions. Table partitioning is now much easier to use, and more use cases are covered. In this talk, we will review with concrete examples how you can benefit from table partitioning, how to use declarative partitioning, and the implications of the decisions you make when designing the schema.
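Declarative range partitioning routes each row by its partition key, and the planner can skip partitions whose bounds cannot match a query's range. Here is a minimal Python model of both steps, using PostgreSQL's convention for range partitions (lower bound inclusive, upper bound exclusive, as in `FOR VALUES FROM (...) TO (...)`). The table and partition names are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

# (partition name, lower bound inclusive, upper bound exclusive)
Partition = Tuple[str, date, date]

def route(partitions: List[Partition], key: date) -> Optional[str]:
    """Name of the partition that stores a row with this key, if any."""
    for name, lower, upper in partitions:
        if lower <= key < upper:
            return name
    return None  # PostgreSQL raises an error here unless a DEFAULT partition exists

def prune(partitions: List[Partition], lo: date, hi: date) -> List[str]:
    """Partitions that can hold keys in [lo, hi); all others are skipped."""
    return [name for name, lower, upper in partitions if lower < hi and lo < upper]

parts = [
    ("measurements_2021_01", date(2021, 1, 1), date(2021, 2, 1)),
    ("measurements_2021_02", date(2021, 2, 1), date(2021, 3, 1)),
    ("measurements_2021_03", date(2021, 3, 1), date(2021, 4, 1)),
]
print(route(parts, date(2021, 2, 1)))                      # measurements_2021_02
print(prune(parts, date(2021, 1, 15), date(2021, 2, 15)))  # first two partitions only
```

A boundary date such as 2021-02-01 belongs to the partition that starts on it, which is why adjacent bounds can share a value without overlapping.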

Speakers

Boriss Mejias

Solution Architect, EDB
I'm a holistic system software engineer (officially known as Solution Architect), PostgreSQL consultant and trainer, free software activist, and headbanger. I have been working with PostgreSQL since version 9.1. First, as part of my job related to other projects, and with full dedication...


Wednesday May 12, 2021 14:00 - 15:00 EDT
Room #3

14:30 EDT

SQL Without the Database? Stream Instead of Storing
Why store when you can stream? Modern open source streaming platforms like Apache Kafka provide means of storing and processing event-driven data. When combined with streaming analytics projects like Apache Flink, many business applications and event-driven microservices may not even need a database storage layer. With the rise of streaming SQL engines, in particular FlinkSQL, developers with SQL and database skills can use these to build complex event processing and advanced analytics applications. There are performance advantages to being able to run queries before data lands in a database. However, full solutions often need long-term storage as well, for regulatory or forensic purposes, or to train machine learning models to work better in the stream. This talk will cover new approaches to data architecture and how they fit into existing event-driven and traditional database applications, and will provide practical examples of, and approaches to, operating streaming-based systems in the cloud using open source.
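Running a query before data lands in a database typically means computing results over windows of the stream. The Python sketch below computes per-key counts over fixed, non-overlapping (tumbling) windows, roughly what a streaming SQL engine evaluates for a tumbling-window GROUP BY. It assumes events arrive in timestamp order, a simplification that real engines such as Flink do not require.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str]  # (epoch seconds, key)

def tumbling_counts(events: Iterable[Event], window_s: int) -> Iterator[Tuple[int, Dict[str, int]]]:
    """Emit (window_start, counts per key) each time a window closes."""
    current_start: Optional[int] = None
    counts: Dict[str, int] = defaultdict(int)
    for ts, key in events:
        start = ts - ts % window_s
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # window closed: emit its result
            counts = defaultdict(int)
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)      # flush the last open window

stream = [(0, "click"), (10, "view"), (59, "click"), (60, "click"), (130, "view")]
for start, counts in tumbling_counts(stream, 60):
    print(start, counts)
# 0 {'click': 2, 'view': 1}
# 60 {'click': 1}
# 120 {'view': 1}
```

Results are available as soon as each window closes, without the events ever being written to a table; windows with no events are simply not emitted here.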

Speakers

Simon Elliston Ball

Senior Product Manager for MSK, AWS
Simon has been working on streaming data analytics for many years, at Hortonworks, then Cloudera, and now with Amazon Web Services, where he is the Product Manager for Amazon Managed Streaming for Apache Kafka. He also managed contributions to Flink and Kafka for Cloudera, and is an...


Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #5

14:30 EDT

Database Backup and Import With Kanister
Backup and migration of database systems running in Kubernetes can be complicated. Kanister, an open source tool from Kasten, simplifies the process. Kanister offers tools for exporting databases to, and importing them from, cloud storage, greatly simplifying backup and migration of databases running in a cluster.

This talk will introduce Kanister and discuss its basic use. Examples will be given for migrating a Postgres database between clusters. The talk will also introduce its use for disaster recovery and for setting up development clusters.

Speakers

Aaron H Alpar

Member Technical Staff, Kasten by Veeam
Aaron has extensive experience designing high-performance, domain-specific database systems and transaction systems for sparsely distributed databases. He currently works at Kasten (a unit of Veeam) as a Member of Technical Staff.


Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #4

14:30 EDT

JSON Additions in MariaDB - Featuring JSON_TABLE
MariaDB has had JSON support for a while now. Since JSON support was first released in 10.2, MariaDB has tried to follow the SQL standard as closely as possible. One of the new additions coming to MariaDB 10.6 is support for JSON_TABLE. In this talk we will go through the details of this new feature, as well as use cases and interactions with JSON path. We will also compare MariaDB's implementation to other databases, so that you are aware of pitfalls if a migration is due. On the topic of migration, MariaDB has also introduced a data type plugin that understands MySQL's binary JSON format and converts it to MariaDB's text-based representation, without needing to do a full dump and restore.
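JSON_TABLE turns a JSON document into relational rows with one column per path, roughly `SELECT * FROM JSON_TABLE(@doc, '$[*]' COLUMNS (name VARCHAR(50) PATH '$.name')) AS jt`. The Python sketch below imitates that behavior for a small subset of JSON path (object keys only, plus a trailing `[*]` on the row path) to show what the feature computes. It is not MariaDB's implementation; a missing path simply yields `None`, analogous to SQL NULL.

```python
import json
from typing import Any, Dict, List, Optional

def get_path(doc: Any, path: str) -> Optional[Any]:
    """Resolve '$' or '$.a.b' (object keys only); None if any key is missing."""
    if path == "$":
        return doc
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    cur = doc
    for key in path[2:].split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None
        cur = cur[key]
    return cur

def json_table(text: str, row_path: str, columns: Dict[str, str]) -> List[Dict[str, Any]]:
    """One output row per element of the array at row_path (must end in '[*]').

    Column paths are relative to each element; 'ordinality' numbers rows from 1.
    """
    if not row_path.endswith("[*]"):
        raise ValueError("row path must end with [*]")
    items = get_path(json.loads(text), row_path[:-3])
    if not isinstance(items, list):
        return []
    rows = []
    for ordinal, item in enumerate(items, start=1):
        row: Dict[str, Any] = {"ordinality": ordinal}
        for name, path in columns.items():
            row[name] = get_path(item, path)
        rows.append(row)
    return rows

doc = '[{"name": "Alice", "address": {"city": "Sofia"}}, {"name": "Bob"}]'
rows = json_table(doc, "$[*]", {"name": "$.name", "city": "$.address.city"})
for row in rows:
    print(row)
# {'ordinality': 1, 'name': 'Alice', 'city': 'Sofia'}
# {'ordinality': 2, 'name': 'Bob', 'city': None}
```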

Speakers

Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation
Vicentiu works at the MariaDB Foundation as a Software Engineer and Team Lead. He focuses on optimizer development, but has also worked on other parts of the MariaDB Server. Vicențiu has been part of the MariaDB ecosystem since 2013, where he first contributed Roles to MariaDB. Over...


Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #9

14:30 EDT

Data Protection for Rapid Recovery at Scale
Some things can’t scale in the cloud. When you are trying to get all the performance out of your systems for SaaS and IaaS instances, RAID 0 seems like a good option. But what do you do when 60 servers go down due to an SSD failure? In this session you will learn about new breakthrough data protection technology for SSDs that gives you better performance during a drive rebuild than you can get from RAID 0, in a smaller footprint.

Speakers

Steve Fingerhut

President & CBO, Pliops
Steve has built multiple new technology businesses to over a billion in annual revenue. His experience includes SVP/GM Toshiba Memory America’s SSD Business Unit, VP Marketing SanDisk’s Enterprise SSD division, Co-founder and VP Marketing LSI’s Accelerated Solutions Division...


Wednesday May 12, 2021 14:30 - 15:00 EDT
Room #1

15:00 EDT

Fun and Games: Why We Picked ClickHouse To Drive Gaming Analytics at GiG
With such a saturated market in the iGaming industry, many technology providers are reviewing their architecture to provide a leaner, more sustainable platform. When GiG kicked off R&D on which technology to put at the heart of GiG Data, two of its rules were to avoid complex architectures and to avoid heavy license and high maintenance costs. Combined with the need for good governance whilst empowering stakeholders who need real-time data, these rules made the list of candidate databases shorter and shorter.

With no vendor lock-in and no costly licenses, ClickHouse came out the winner as the best candidate for any business looking to host an on-premises real-time database for analytics. Stephen and Matthew will elaborate on how ClickHouse became the database of choice.

Speakers

Stephen Borg

Director of Data, Gaming Innovation Group
Stephen has had a career in technology for online and retail gambling, having worked close to the business for a number of B2C and B2B providers. He has a sound background in technology, delivering affiliate and gambling platforms using both .NET and Java frameworks. For the past 8 years Stephen...

Matthew Formosa

Enterprise Data Architect, Gaming Innovation Group
Matthew is an experienced Data Engineer having worked on several Big Data applications within various domains, including Gaming and FinTech. Being an AWS certified solutions architect, he is naturally highly comfortable working with AWS solutions, as well as Apache Spark, Apache Kafka...


Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #7

15:00 EDT

Running and Scaling MongoDB on Kubernetes
Percona is committed to delivering its software on various platforms and operating systems, including Kubernetes. Percona Kubernetes Operator for Percona Server for MongoDB allows you to deploy, manage, and scale MongoDB clusters on Kubernetes with ease. We are going to demonstrate how to do it and share some tips and tricks for managing MongoDB on Kubernetes.

Speakers

Sergey Pronin

Product Owner, Percona
Sergey is a passionate technology "driver". After graduation, he worked in various fields: internet service provider, financial sector and M&A business. His main focal points were infrastructure and the products around it. At Percona, as a Product Owner, he drives forward Kubernetes and Cloud databases...


Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #3

15:00 EDT

Virtual Work and Leadership in the Time of Pandemics
The MariaDB Foundation was a virtual organization long before the pandemic struck. That gave us a head start in finding ways to adapt team leadership and management to the new times. Team leadership and management are different in a virtual organization, and in this session, we will talk about best practices within development teams. Meeting practices, chat tools, emails, and Zoom meetings are all affected now that developers spend not only their business lives but also their private lives in isolation. How do we create humane working conditions for developers and DBAs?

Speakers

Anna Widenius

Chief of Staff, MariaDB Foundation
Anna Widenius is Chief of Staff at the MariaDB Foundation.


Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #9

15:00 EDT

Why We Chose Trino. Choosing, Using, and Extending Trino (fka PrestoSQL) For a Primary Datastore
There are a lot of capture-first pipelines out there, which are very good at squirreling data away, but are relatively slow or cumbersome for queries. For example, Kafka and Pulsar are great for write performance, but horribly slow at scanning all the data in the queue.

For a query-first architecture, a different mindset and approach are required. You can’t build the whole data pipeline and then hope to tune the queries after the fact. Instead, you model the query behaviors you hope to achieve first, and then work backward to define ingestion and indexing requirements. Build it fast, keep it fast. Enter Trino, a distributed query engine that is an ideal starting point for a query-first data architecture like ours. To make this work, we built a custom memory connector so that Trino can serve as a primary memory store. An unusual, but fun, use for Trino.

We’ve developed Trino connectors that are optimized to work with local data, so that there is no network hop between the query engine and the data being computed. This gives a 5-20X improvement for our workloads compared with running against even the fastest remote datastores. I’ll walk through the discovery process to get to Trino, and how we built a new Trino connector.

Speakers

Rob Dickinson

CTO, Resurface Labs
Co-founder and CTO at Resurface Labs, Rob lives and breathes databases. Years at Intel, Dell, and Quest Software honed his database design skills, and his immersion into open source databases was forged by the need to architect and build a scalable solution to solve for customer escalation...


Wednesday May 12, 2021 15:00 - 15:30 EDT
Room #6

15:00 EDT

Deploying a Sharded Vitess Sandbox Cluster in Public Cloud Kubernetes in 10 Minutes
Learning about Vitess ("A database clustering system for horizontal scaling of MySQL" - https://vitess.io) is straightforward enough, as is running the Get Started demo on your computer. But once you want to start scaling out a sandbox cluster, or want to run realistic benchmarks against your schema design (both of which are hard to do on a personal computer), setting up a full cluster in a pinch seems daunting... or is it?

I'm here to show you, with a live demo/tutorial, that deploying and evaluating a Vitess sandbox cluster, into a public cloud environment, can be done super easily. In fact, my aim is to bootstrap a fully functioning cluster within 10 minutes of starting the demo.

With the remaining demo time, I will demonstrate other Vitess operations, such as:
* Scaling the cluster up and down
* Increasing and decreasing the number of shards without losing data
* Configuring zonal SSDs for MySQL
* Backup and restore (so you can shut down the cluster to save money or discard an experiment, then bring it back up again with the original data)
* Deploying the experimental Vitess orchestrator component
* Planned and unplanned failovers
* Automatic rolling upgrades, and controlled rolling upgrades
* Metrics, dashboards
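Resharding without data loss works because Vitess names shards by hexadecimal keyspace-ID ranges, such as `-80` and `80-`, and each row lives in the shard whose range contains its keyspace ID; splitting `-80` into `-40` and `40-80` only narrows those ranges. The Python sketch below shows that lookup. It assumes the keyspace ID has already been computed (the vindex that derives it from a column value is not modeled), and the ID used is a hypothetical value.

```python
from typing import List, Optional, Tuple

def parse_shard(name: str) -> Tuple[bytes, Optional[bytes]]:
    """'40-80' -> (b'\\x40', b'\\x80'); an empty side means unbounded."""
    start_hex, end_hex = name.split("-")
    start = bytes.fromhex(start_hex)                  # b'' when the start is open
    end = bytes.fromhex(end_hex) if end_hex else None  # None when the end is open
    return start, end

def shard_for(keyspace_id: bytes, shards: List[str]) -> str:
    """Return the shard whose [start, end) range contains keyspace_id."""
    for name in shards:
        start, end = parse_shard(name)
        # bytes compare lexicographically, like keyspace IDs in Vitess
        if keyspace_id >= start and (end is None or keyspace_id < end):
            return name
    raise LookupError("shard ranges do not cover this keyspace id")

ksid = bytes.fromhex("5a3f0c9e12ab44d0")  # hypothetical 8-byte keyspace id
print(shard_for(ksid, ["-80", "80-"]))                     # -80
print(shard_for(ksid, ["-40", "40-80", "80-c0", "c0-"]))   # 40-80
```

After a split, the row's new home is always one of the child ranges of its old shard, which is why data can be copied range by range and then traffic switched over.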

Even with the best possible documentation (and the Vitess documentation is quite good!), getting a fully working cluster, experimenting with it, and getting everything configured the way you want can involve a bunch of trial and error. I hope that my demo can help you bypass some of the more boring trial-and-error, and get running more quickly with your Vitess evaluation.

For this demo, I will be using the excellent open-source Vitess-operator for Kubernetes, provided by PlanetScale. Even if you aren't considering deploying Vitess on Kubernetes in production, I still highly recommend it for sandbox use. Deploying an arbitrary number of components is super trivial with the operator, and everything is wired together automatically. No need to delay your evaluation by manually bootstrapping a cluster one node at a time or writing your own deployment tools.

Speakers

Jordan Moldow

Staff Software Engineer, Box, Inc.
Jordan Moldow is a Staff Software Engineer on Box’s Database Tools and Automations team. After earning MIT BS degrees in CSE and mathematics in 2014, Jordan moved to California to join Box. Jordan and his teammates focus on backend database infrastructure, providing the tools, intermediate...


Wednesday May 12, 2021 15:00 - 16:00 EDT
Room #4

15:00 EDT

Should You Run Databases Natively in Kubernetes?
Kubernetes has hit a home run for stateless workloads, but can it do the same for stateful services such as distributed databases? Before we can answer that question, we need to understand the challenges of running stateful workloads on, well, anything. In this talk, we will first look at which stateful workloads, specifically databases, are ideal for running inside Kubernetes. Secondly, we will explore the various concerns around running databases in Kubernetes for production environments, such as:
  • The production-readiness of Kubernetes for stateful workloads in general
  • The pros and cons of the various deployment architectures
  • How much performance may be lost when performing IO inside containers
  • The failure characteristics of a distributed database inside containers
In this session we will demonstrate what Kubernetes brings to the table for stateful workloads and what database servers must provide to fit the Kubernetes model. This talk will also highlight some of the modern databases that take full advantage of Kubernetes and offer a peek into what's possible if stateful services can meet Kubernetes halfway. We will go into the details of deployment choices and how the managed container offerings of different cloud vendors differ, as well as compare the performance and failure characteristics of a Kubernetes-based deployment with an equivalent VM-based deployment.

Speakers

Karthik Ranganathan

Founder and CTO, Yugabyte
Karthik was one of the original database engineers at Facebook responsible for building distributed databases including Cassandra and HBase. He is an Apache HBase committer, and also an early contributor to Cassandra, before it was open-sourced by Facebook. He is currently the co-founder...


Wednesday May 12, 2021 15:00 - 16:00 EDT
Room #5

15:00 EDT

ARM Power! Comparing MySQL x86 vs ARM Performance
With the recent launch of Apple M1 chips and Amazon Graviton processors, the discussion about ARM performance compared to x86 has gained a lot of traction, not only because ARM shows promising performance results, but also because its costs are generally lower than those of comparable x86 instances.

In this talk we are going to discuss some scenarios where we compare the performance between instances that have the same cost and instances that have similar hardware capacities.

Lastly, we will check whether the ARM MySQL ecosystem (backup tools, monitoring and others) is mature enough to support production workloads.

By the end of the session, DBAs, sysadmins and managers should have a clearer idea of ARM's capabilities compared to x86.
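Comparing instances with the same cost and instances with similar hardware both come down to normalizing throughput by price. Here is a small Python helper for that arithmetic. The field names are hypothetical, and the numbers in the usage lines are placeholders, not measurements from the talk.

```python
from dataclasses import dataclass

@dataclass
class Run:
    instance: str        # instance type label
    qps: float           # measured throughput, e.g. from a sysbench run
    usd_per_hour: float  # hourly price of the instance in the same region

def qps_per_dollar(run: Run) -> float:
    """Throughput normalized by hourly cost."""
    return run.qps / run.usd_per_hour

def relative_value(candidate: Run, baseline: Run) -> float:
    """Above 1.0 means the candidate delivers more throughput per dollar."""
    return qps_per_dollar(candidate) / qps_per_dollar(baseline)

# Placeholder numbers only: replace them with your own measurements and prices.
arm = Run("arm-placeholder", qps=1000.0, usd_per_hour=0.5)
x86 = Run("x86-placeholder", qps=1000.0, usd_per_hour=1.0)
print(relative_value(arm, x86))  # 2.0
```

With same-cost pairs, `usd_per_hour` cancels out and raw QPS decides; with similar-hardware pairs, the price difference is what the ratio exposes.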

Speakers

Vinicius Grippa

Senior Support Engineer, Percona
Vinicius Grippa is a Percona Senior Support Engineer. Vinicius has a Bachelor's degree in Computer Science and has been working with databases for 12 years. He has experience in designing databases for mission-critical applications and in the last few years has become a specialist...


Wednesday May 12, 2021 15:00 - 16:00 EDT
Room #1

15:30 EDT

MariaDB ColumnStore – A Columnar Storage Engine, First Class Citizen in MariaDB
MariaDB has had ColumnStore (a columnar storage engine) available for a while now. The problem was that ColumnStore (formerly known as InfiniDB) was coded in a way that required a custom version of MariaDB to function. The installation was also non-trivial, with quite a few dependencies needed.
After a significant amount of work, both within MariaDB's codebase and ColumnStore's codebase, it is now possible with MariaDB 10.5 to simply load the ColumnStore plugin and run CREATE TABLE ... ENGINE=ColumnStore.
In this talk we will give an overview of the state of ColumnStore in MariaDB, discuss use cases, and cover some implementation details to better understand the performance implications of using ColumnStore.
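Many of the performance implications of a columnar engine follow from data layout. The generic Python sketch below (not ColumnStore's internals) pivots rows into per-column lists and counts how many values an aggregate such as SUM(amount) touches in each layout. The table and its columns are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]  # (id, country, amount)

def to_columns(rows: List[Row]) -> Dict[str, list]:
    """Pivot row-oriented tuples into one list per column (columnar layout)."""
    cols: Dict[str, list] = {"id": [], "country": [], "amount": []}
    for rid, country, amount in rows:
        cols["id"].append(rid)
        cols["country"].append(country)
        cols["amount"].append(amount)
    return cols

def values_touched_row_store(rows: List[Row]) -> int:
    # a row store reads whole rows, so every field comes along for SUM(amount)
    return sum(len(r) for r in rows)

def values_touched_column_store(cols: Dict[str, list]) -> int:
    # a column store reads only the column the aggregate needs
    return len(cols["amount"])

rows = [(1, "RO", 10.0), (2, "FI", 5.5), (3, "RO", 2.5)]
cols = to_columns(rows)
print(sum(cols["amount"]))                # 18.0
print(values_touched_row_store(rows))     # 9
print(values_touched_column_store(cols))  # 3
```

The gap grows with the number of columns, which is why analytic scans over a few columns of wide tables favor columnar storage, while single-row lookups and updates favor row storage.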

Speakers

Vicențiu Ciorbaru

Team Lead, Senior Developer, MariaDB Foundation
Vicentiu works at the MariaDB Foundation as a Software Engineer and Team Lead. He focuses on optimizer development, but has also worked on other parts of the MariaDB Server. Vicențiu has been part of the MariaDB ecosystem since 2013, where he first contributed Roles to MariaDB. Over...


Wednesday May 12, 2021 15:30 - 16:00 EDT
Room #9

15:30 EDT

Optimizing and Troubleshooting MongoDB with PMM
In this presentation, we will show how you can use PMM (Percona Monitoring and Management) to monitor MongoDB and diagnose various issues you may face running MongoDB, whether standalone or in a sharded cluster.
We will look at:
  • Identifying slow queries
  • Troubleshooting performance degradation
  • Looking for possible optimizations
After attending this presentation, you should be comfortable with how PMM can be used with MongoDB and how it can help DBAs and developers in their daily work with MongoDB.
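Identifying slow queries usually means grouping executions by query shape and ranking by total time, since a fast query that runs very often can cost more than a rare slow one. Here is a Python sketch of that aggregation, in the spirit of PMM's query analytics but not its implementation. `millis` mirrors the duration field in MongoDB profiler documents, while `fingerprint` (a normalized query shape) is a hypothetical, pre-computed field.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def top_queries(entries: Iterable[Dict], n: int = 3) -> List[Tuple[str, int, int, float]]:
    """Return (fingerprint, count, total_ms, avg_ms), largest total time first."""
    total: Dict[str, int] = defaultdict(int)
    count: Dict[str, int] = defaultdict(int)
    for e in entries:
        total[e["fingerprint"]] += e["millis"]
        count[e["fingerprint"]] += 1
    ranked = sorted(total, key=lambda fp: total[fp], reverse=True)
    return [(fp, count[fp], total[fp], total[fp] / count[fp]) for fp in ranked[:n]]

entries = [
    {"fingerprint": "orders.find({status: ?})", "millis": 120},
    {"fingerprint": "orders.find({status: ?})", "millis": 80},
    {"fingerprint": "users.find({_id: ?})", "millis": 1},
    {"fingerprint": "orders.aggregate([$group])", "millis": 150},
]
for row in top_queries(entries):
    print(row)
# ('orders.find({status: ?})', 2, 200, 100.0)
# ('orders.aggregate([$group])', 1, 150, 150.0)
# ('users.find({_id: ?})', 1, 1, 1.0)
```

Ranking by total time surfaces the `orders.find` shape first, even though the aggregation is slower per call.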

Speakers

Mike Grayson

MongoDB Database Engineer, Percona
Mike Grayson is a MongoDB Database Engineer at Percona, the unbiased open source database experts. Mike has been involved in many aspects of the MongoDB community since he started using the database in 2014. Heading the Western NY MongoDB User Group, blogging and being involved in...


Wednesday May 12, 2021 15:30 - 16:00 EDT
Room #2

15:30 EDT

Trino on Ice: Using Iceberg To Replace the Hive Table Format
Trino (formerly PrestoSQL) is a ludicrously fast query engine that evolved from the need to replace the slow query turnaround of the Hive engine. Trino grew in popularity under the Presto name for years as an interactive query engine that lives over your data lake. While this was certainly a step away from the early big data days of waiting hours to days for queries to complete, there were still many tedious rules engineers had to follow in order to correctly create, manage, and use their data in the data lake due to the Hive table format.

Apache Iceberg, a table format created at Netflix, aims to address many of these issues. Iceberg simplifies the life of the engineer by decoupling the logical view of the data from its physical layout using techniques like hidden partitioning, and it allows for in-place schema migration of your tables. Iceberg also increases the speed at which you can query your system by tracking files at the file level rather than the partition level, and so much more.
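Hidden partitioning means the table partitions by a transform of a column, such as day(ts), so writers never supply a partition value and readers filter on `ts` directly while whole partitions are still pruned. Below is a minimal Python model of that idea, not the Iceberg API. The day transform counts days since 1970-01-01, matching the Iceberg spec's definition, and the rows are hypothetical.

```python
from datetime import date, datetime, timezone
from typing import Dict, List

EPOCH = date(1970, 1, 1)

def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01 (UTC), the value a day() partition stores."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days

def write(files: Dict[int, List[dict]], row: dict) -> None:
    """The writer derives the partition from ts; users never supply it."""
    files.setdefault(day_transform(row["ts"]), []).append(row)

def scan(files: Dict[int, List[dict]], start: datetime, end: datetime) -> List[dict]:
    """Query on ts in [start, end); the transform lets whole partitions be skipped."""
    lo, hi = day_transform(start), day_transform(end)
    out = []
    for day, rows in files.items():
        if lo <= day <= hi:  # partition pruning
            out.extend(r for r in rows if start <= r["ts"] < end)
    return out

utc = timezone.utc
files: Dict[int, List[dict]] = {}
write(files, {"id": 1, "ts": datetime(2021, 5, 12, 9, 0, tzinfo=utc)})
write(files, {"id": 2, "ts": datetime(2021, 5, 13, 9, 0, tzinfo=utc)})
write(files, {"id": 3, "ts": datetime(2021, 6, 1, 9, 0, tzinfo=utc)})
print(sorted(files))  # [18759, 18760, 18779]
hits = scan(files, datetime(2021, 5, 12, tzinfo=utc), datetime(2021, 5, 13, 12, tzinfo=utc))
print([r["id"] for r in hits])  # [1, 2]
```

With the Hive table format, the query would also have had to filter on an explicit partition column to get the same pruning.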

With the marriage of Trino and Iceberg, companies can fully replace the big data systems of old and move into the next generation of data lakes, which reduce the mental load on their data engineers and let them focus on building out the business logic and other tasks. In this talk I will cover some examples of the issues Iceberg solves, seen through the lens of Trino.

Speakers

Brian Olsen

Developer Advocate, Starburst
Brian is a U.S. Marine turned software engineer and developer advocate working to foster the open-source Trino community. Brian spent four years as a data engineer at a cybersecurity company working on pipeline maintenance and query optimization. While in this role, Brian was responsible...


Wednesday May 12, 2021 15:30 - 16:30 EDT
Room #7

15:30 EDT

Massive Data Processing in Adobe Using Delta Lake
At Adobe Experience Platform, we ingest TBs of data every day and manage PBs of data for our customers as part of the Unified Profile Offering. At the heart of this is complex ingestion of a mix of normalized and denormalized data with various linkage scenarios, powered by a central Identity Linking Graph. This helps power various marketing scenarios that are activated in multiple platforms and channels like email, advertisements, etc. We will go over how we built a cost-effective and scalable data pipeline using Apache Spark and Delta Lake and share our experiences.

* What are we storing?
  * Multi Source - Multi Channel Problem
* Access Pattern to optimize for
  * Custom High Performance Query engine
* Data Representation and Nested Schema Evolution
  * Performance Trade-Offs with Various formats
  * Go over anti-patterns used
    * (String FTW)
  * Data Manipulation using UDFs
* Writer Worries and How to Wipe them Away
  * Gotchas
    * Concurrency
    * Column size
    * Update frequency
* Transaction Management for A Healthy State
  * Staging Tables FTW
    * Why we can't live without them
* Datalake Replication Lag Tracking
  * Instrumentation of the data pipeline gives more confidence to the reader
* Downstream Data Pipelines
  * Showcase easy building of incremental versions of applications
* Maintenance Jobs
  * Go over essentials of compaction and vacuuming
* Performance Time!
  * What scale are we operating at?
  * Settings like autoCompact and optimizeWrite
  * Timings With and Without Delta
  * Cost
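Compaction rewrites many small files into fewer, larger ones. The Python sketch below plans such a rewrite with first-fit decreasing bin packing. The thresholds `small_mb` and `target_mb` are hypothetical parameters, and this is not how Delta Lake's OPTIMIZE or autoCompact actually choose which files to combine.

```python
from typing import List

def plan_compaction(file_sizes_mb: List[float], target_mb: float = 1024.0,
                    small_mb: float = 128.0) -> List[List[float]]:
    """Group files smaller than small_mb into batches of at most target_mb.

    First-fit decreasing: place each file (largest first) into the first
    batch that still has room, opening a new batch when none does.
    """
    small = sorted((s for s in file_sizes_mb if s < small_mb), reverse=True)
    bins: List[List[float]] = []
    for size in small:
        for b in bins:
            if sum(b) + size <= target_mb:
                b.append(size)
                break
        else:
            bins.append([size])
    # a batch holding a single file gains nothing from being rewritten
    return [b for b in bins if len(b) > 1]

sizes = [5, 900, 60, 40, 30, 120, 2000, 10]
print(plan_compaction(sizes, target_mb=200))  # [[120, 60, 10, 5], [40, 30]]
```

Files already above `small_mb` (900 and 2000 here) are left untouched, which keeps the rewrite cost proportional to the small-file problem rather than to table size.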

Speakers

Yeshwanth Vijayakumar

Sr. Engineering Manager/Architect, Adobe Systems Inc
I am a Sr. Engineering Manager/Architect on the Unified Profile Team in the Adobe Experience Platform; it’s a PB scale store with a strong focus on millisecond latencies and Analytical abilities and easily one of Adobe’s most challenging SaaS projects in terms of scale. I am actively...


Wednesday May 12, 2021 15:30 - 16:30 EDT
Room #6

16:00 EDT

Comparing Geospatial Implementation in MongoDB, Postgres, and Elastic
For a considerable set of applications, querying geographical data is a critical operation. Fast responses combined with a high level of accuracy are often required when an application user interacts with functions/operations of the type “Give me near me” or “Find me in area XYZ”. Additional complexity is usually added when the points of interest are constantly on the move, like a public transportation vehicle or a taxi.

For applications that frequently access geographical data and rely on both speed and accuracy, both application and database design are crucial. In this presentation, we are going to focus on the database side. More specifically, we are going to evaluate three of the most popular open source databases, MongoDB, Postgres, and Elastic, against geospatial workloads. For each of these databases, we are going to examine the implementation and the performance of geo-queries. We are going to discuss best practices and design patterns for each database and try to find a winner among the three.
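A “near me” query reduces to a distance computation plus a filter and a sort. The brute-force Python sketch below uses the haversine great-circle formula on a spherical Earth (mean radius 6371 km). The databases compared in the talk answer such queries through spatial indexes instead of a linear scan, and the taxi positions are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def near_me(me: Tuple[float, float], points: List[Tuple[str, float, float]],
            radius_km: float) -> List[Tuple[str, float]]:
    """Points within radius_km of me, closest first, distances rounded to 0.01 km."""
    hits = []
    for name, lat, lon in points:
        d = haversine_km(me[0], me[1], lat, lon)
        if d <= radius_km:
            hits.append((d, name))
    return [(name, round(d, 2)) for d, name in sorted(hits)]

taxis = [("taxi-1", 0.0, 0.01), ("taxi-2", 0.0, 1.0), ("taxi-3", 0.05, 0.0)]
print(near_me((0.0, 0.0), taxis, radius_km=10))  # [('taxi-1', 1.11), ('taxi-3', 5.56)]
```

For moving points, the cost that matters is keeping the index current as positions change, which a linear scan like this sidesteps but cannot scale.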

Speakers

Alex Cercel

Senior Database Engineer, Palantir Technologies
A Red Hat Certified Architect who also enjoys Windows and network administration and has an itch for databases. Loves the cloud. Currently working as an SRE for a team that supports a variety of datastores, mostly NoSQL. All-round geek with a genuine passion for anything...

Antonios Giannopoulos

Senior Database Administrator, Rackspace Technology
I have been working as a Senior NoSQL Database Administrator at Rackspace for the past 7 years, supporting thousands of MongoDB installations. I have 18 years of experience in databases and system engineering. I really enjoy challenges in sharding and schema design and love migrations from Relational...

Pedro Albuquerque

Staff Database Engineer, Wise (former TransferWise)
I have many years of experience working with various database technologies, including relational and NoSQL platforms. I am currently focused on MariaDB, MongoDB and PostgreSQL datastores at Wise. Before Wise, I focused on MongoDB at ObjectRocket by Rackspace, supporting customers...


Wednesday May 12, 2021 16:00 - 16:30 EDT
Room #2

16:00 EDT

Debug a Kubernetes Operator
The goal of this live debugging session is to better understand how to work with a failing Kubernetes Operator and get used to some helpful Kubernetes commands.

Each of the three examples follows the same structure:

* Apply an invalid YAML manifest.
* Figure out what is wrong and how to fix it.
* Hints that may help solve the problem.
* A detailed walkthrough to understand and solve the problem.

Speakers

Philipp Krenn

Developer Advocate, Elastic
Philipp lives to demo interesting technology. Having worked as a web, infrastructure, and database engineer for over ten years, Philipp is now a developer advocate and community team lead in EMEA at Elastic, the company behind the Elastic Stack consisting of Elasticsearch, Kibana...


Wednesday May 12, 2021 16:00 - 16:30 EDT
Room #3

16:00 EDT

Going the distance
In every DBA's life, there is a point where data needs to be copied over great distances: to the other coast, or even to another continent. The reason may be implementing a disaster recovery system, or the application may need distant read replicas.
In this talk, we use examples with MySQL and xtrabackup to discuss the issues of long-distance copies and the potential performance tuning opportunities. We will:
  • Examine the characteristics of transferring a large amount of data over WAN links.
  • Discuss compression and encryption options.
  • Check out options for copying an already existing backup.
  • Check out options for streaming backups on the fly.

Speakers

Peter Boros

Principal Architect, Percona
Peter is a Principal Architect on Percona's consulting team. He has been using and working with open source software since the early 2000s. Peter's first and foremost professional interest is performance tuning and large-scale automation. Before rejoining Percona, Peter worked on large scale...


Wednesday May 12, 2021 16:00 - 16:30 EDT
Room #6

16:00 EDT

Multi-colo Async Replication at LinkedIn
LinkedIn is a global site served from multiple data centers (a.k.a. colos). Member data written at each data center is globally replicated to the other data centers. To avoid write latency, we chose asynchronous replication, which has led to a lot of problems related to conflicts. This talk covers why global replication is needed, how we leverage multi-colo replication for site-up using [traffic shift](https://engineering.linkedin.com/blog/2017/05/trafficshift--load-testing-at-scale), how we use Kafka for multi-colo replication, how we architected our applications and schemas to minimize conflicts, and finally how to handle conflicts if they arise.

So far LinkedIn has been using [Espresso](https://engineering.linkedin.com/espresso/introducing-espresso-linkedins-hot-new-distributed-document-store) and Oracle as primary data stores. Tools have already been developed to handle multi-colo replication for these stores, and they are covered in this talk. MySQL is growing very rapidly at LinkedIn, and we are in search of a reliable, open source async multi-colo replication solution. I hope this talk highlights the need for it and encourages the open source community to come up with good solutions.

Speakers

Karthik Appigatla

Staff SRE, LinkedIn
Karthik Appigatla has been working on various large-scale data stores for a decade, primarily focused on MySQL. He has been working at LinkedIn for the last 5 years. Prior to LinkedIn, he worked for Yahoo, Pythian and Percona, where he was responsible for helping clients...


Wednesday May 12, 2021 16:00 - 16:30 EDT
Room #1

16:00 EDT

The Last Mile: Delivering the Last 10% of a Four-Year Migration
In complex projects, the last 10% of the project is often the most difficult part. In this talk, I will share a case study of Box's 4-year effort to get rid of our legacy mapping DB and move the last piece of our legacy monolith MySQL traffic to our data access layer. This talk will cover how to manage technical risk and optimize team execution in a technically complex and operationally distributed environment. This talk will share reflections on useful tactics that led to the successful completion of this four-year migration project for others to learn from and leverage.

As a relatively new Staff engineer, I learned and experimented with building and maintaining a long time-horizon project plan, identifying unknown unknowns, and continually finding ways to de-risk the project at every stage of development.

As the project progressed, I found that successful execution depended not only on these technical strategies, but even more so on how the team operated. In the spirit of Agile and of mitigating the isolation of the pandemic, we experimented with almost every aspect of how we worked: how/when we worked together, how our sprints ran, how we evolved designs, and even the minutiae of how we retrospected.

In this session, we will have a candid discussion on the technical and organizational strategies that I believe were important to our success, or that were promising enough to warrant more experimentation in the future.

Participants will leave with a few ideas that they should be able to try out within their own teams. Additionally, there are some deeper ideas about team leadership and effectiveness that I hope participants will be able to reflect on going forwards.

Speakers

Jordan Moldow

Staff Software Engineer, Box, Inc.
Jordan Moldow is a Staff Software Engineer on Box’s Database Tools and Automations team. After earning MIT BS degrees in CSE and mathematics in 2014, Jordan moved to California to join Box. Jordan and his teammates focus on backend database infrastructure, providing the tools, intermediate...


Wednesday May 12, 2021 16:00 - 16:30 EDT
Room #4

16:00 EDT

Kubernetes-on-Rails?! KateSQL: A Shopify-Scale Cloud-Hosted MySQL Platform
Running MySQL in *The Cloud* is a challenge. Replicated MySQL loves stability; the cloud eschews stability. Replicated MySQL loves low-latency high-performance hardware; the cloud is mainly composed of lower-performance virtualized hardware. For reliability reasons, it's necessary to run across multiple regions. Add in Kubernetes (which loves statelessness) on top of all of this and it can be difficult to keep anything safely running at all.

We needed a Shopify-scale system for managing MySQL for our internal users, in a modern and automated way, with a minimum amount of toil. We needed to support safe and seamless migration between regions, zones, and Kubernetes clusters. We needed automatic and robust configuration and user/grant management.

Shopify developed ***KateSQL***, a Shopify-scale cloud-hosted MySQL platform, over the past year to solve all of these problems and provide a MySQL platform that we will be able to grow with in the future.

Speakers

Jeremy Cole

Sr. Staff Production Engineer, Shopify, Inc.
Jeremy is a long-time MySQL Geek and a pioneer of MySQL scalability, having been in the MySQL world for more than 20 years. He's operated and supported MySQL for MySQL itself, many startups, Twitter, Google, and more. Jeremy currently works in the Database Platform team at Shopify...

Akshay Suryawanshi

Staff Production Engineer, Shopify, Inc.
Akshay has over a decade of experience in the field of Database Reliability Engineering, which he uses to solve MySQL-related scaling and performance problems in the cloud. Akshay has a deep interest in distributed systems like Kubernetes and has leveraged it to build highly scalable...


Wednesday May 12, 2021 16:00 - 17:00 EDT
Room #5

16:00 EDT

Automating MariaDB Deployments With Ansible
This demo shows how to deploy MariaDB with Ansible. We'll look into the Ansible playbook and the Ansible commands to use to install MariaDB, upgrade MariaDB, and modify MariaDB configuration on any number of servers.
The playbook we'll use reflects a realistic production installation: it includes scheduled backups and a PMM agent to monitor server performance.
We'll discuss how using Ansible saves long hours of work, makes operations virtually error-free, and provides complete documentation of the servers' configuration.

Speakers

Federico Razzoli

Director & Consultant, Vettabase
Passionate about databases, automation and open source. Federico has been active in the MariaDB and MySQL communities for years. He started Vettabase, a database automation and consulting company.


Wednesday May 12, 2021 16:00 - 17:00 EDT
Room #9

16:30 EDT

Open Source DBaaS with PMM
Major clouds all offer Database as a Service, or DBaaS (RDS, Cloud SQL, etc.), but this often comes with a sizable (and constantly growing) cost and typically requires you to give up significant control over both your data and your success. In this session, we will explore alternatives to an ever-increasing cloud bill with self-hosted and hybrid DBaaS models using Percona's own PMM. Do away with "tuning by credit card" and take back control of tuning for optimal performance and balanced cost, while enjoying the simplicity of serverless and maintaining control of YOUR data, all with Op...

Speakers

Steve Hoffman

VP, Engineering, Percona
A lifelong technologist, Steve got his start in high performance, high availability computing while in college at Penn State. As his career in technology progressed through programming and systems engineering, he made the transition into leadership and has never looked back. His successes...


Wednesday May 12, 2021 16:30 - 17:00 EDT
Room #6

16:30 EDT

Improving Security in MySQL
MySQL is a powerful but very complicated beast.

In this talk we will go through what we can do to improve the security of MySQL while keeping it capable of performing at its best. We will cover the security measures you can take to secure your MySQL instances, including access control (users, privileges, accounts and their security, roles), password management, MySQL security plugins and a couple of other things. This talk will be useful for developers and DBAs alike.

Speakers

Lukas Vileikis

Marketing Evangelist, Severalnines
Lukas is an ethical hacker, a MySQL DBA and a frequent conference speaker. Since 2014 Lukas has found and responsibly disclosed security flaws in some of the most visited websites in Lithuania and abroad including advertising, gift-buying, gaming, hosting websites as well as some...


Wednesday May 12, 2021 16:30 - 17:00 EDT
Room #1

16:30 EDT

Steps to Repair a Corrupted MongoDB Shard
This talk demonstrates the steps to repair a corrupted MongoDB Shard.

The main benefit of repairing only the corrupted shard, as opposed to rebuilding the entire sharded cluster, is obvious: any required downtime is limited to reads and writes on the corrupted shard and should be shorter.

For the purpose of this talk, shard corruption is defined as a shard having a collection with a different UUID compared to the other shards, and/or different from the UUID in the config.collections document for the namespace. A corrupted UUID could be the result of someone dropping the collection directly on the shard and then restoring the collection to that shard.

A simple Google search for “mongodb invalid uuid” shows that this problem has been encountered by many MongoDB installations worldwide.
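
The corruption check defined above boils down to comparing each shard's collection UUID with the authoritative one recorded in `config.collections`. The Python sketch below covers only that comparison, not the repair procedure presented in the talk. It assumes the UUIDs have already been gathered, for example from the `uuid` field of the `config.collections` document and from the `info.uuid` field that `listCollections` returns on each shard's primary, and it treats a shard with no local copy of the collection as healthy, since a shard that owns no chunks may legitimately lack it. The function name and shard names are hypothetical.

```python
from typing import Dict, List, Optional
from uuid import UUID


def find_uuid_mismatches(config_uuid: UUID, shard_uuids: Dict[str, Optional[UUID]]) -> List[str]:
    """Return the shards whose local collection UUID differs from config.collections.

    shard_uuids maps a shard name to the UUID of its local copy of the collection,
    or None if the collection does not exist on that shard (treated as not corrupt).
    """
    return sorted(
        shard
        for shard, local_uuid in shard_uuids.items()
        if local_uuid is not None and local_uuid != config_uuid
    )


if __name__ == "__main__":
    expected = UUID("11111111-1111-1111-1111-111111111111")
    observed = {
        "shard01": expected,
        "shard02": UUID("22222222-2222-2222-2222-222222222222"),  # e.g. dropped and restored directly
        "shard03": None,  # no local copy of the collection
    }
    print(find_uuid_mismatches(expected, observed))  # prints ['shard02']
```

Using `config.collections` as the reference, rather than a majority vote across shards, assumes the config server metadata itself is intact; if it is not, the comparison would flag the healthy shards instead.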

Speakers

Alex Leong

Database Engineer, Indeed
Database Engineer @ indeed.com. 20+ years in database administration: MongoDB ~6 years (Certified Mongo DBA), Oracle ~15 years (Certified Oracle DBA), Elasticsearch ~1 year, SQL Server ~5 years, MySQL ~1 year. Certified Java Programmer. LinkedIn: https://www.linkedin.com/in/alex-leong-07068b31/ Email...


Wednesday May 12, 2021 16:30 - 17:00 EDT
Room #3

16:30 EDT

Percona Backup for MongoDB - Developer and User Joint Use Case Session
This session takes a look at what both sides of Percona’s backup solution for MongoDB have to offer, providing developer-side information as well as end-user information to help you understand the workings of PBM and learn the tips, tricks, common solutions, and troubleshooting around this popular open-source tool for backing up MongoDB.

Percona Backup for MongoDB is an open-source, distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets. In one part of this talk, we will take a look under the hood of PBM from the development side, covering the architectural decisions and techniques behind distributed backups and PITR. On the user side, we will cover general-use questions, ranging from the simple to the complex, about both backing up and restoring with PBM. The user-side goal is to explain how to set up backups and handle restores for both replica sets and sharded clusters. We will also discuss some of the main problems our support customers encounter in their production environments and provide tips to help you navigate and avoid them proactively.

Speakers

Kimberly Wilkins

MongoDB Technical Lead, Percona
Kimberly Wilkins, MongoDB Technical Lead, has over 20 years of experience managing and architecting database systems using both relational and NoSQL technologies to help customers across a wide variety of industry verticals, including vehicle inventory management and auctions for now...

Andrew Pogrebnoi

Principal Software Engineer, Percona
Andrew Pogrebnoi is the primary developer on the Percona Engineering Team behind the current updates to Percona Backup for MongoDB.

Rafael Galinari

MongoDB Support Engineer, Percona
Rafa is a Support Engineer at Percona who has spent quite a bit of time learning about MongoDB as well as working with a wide variety of customer issues. Whenever MongoDB is not consuming his time, he likes to go to the beach and practice CrossFit. He also enjoys mountain biking...


Wednesday May 12, 2021 16:30 - 17:30 EDT
Room #5

16:30 EDT

Securing PostgreSQL From External Attack
This talk explores the ways attackers with no authorized database access can steal Postgres passwords, see database queries and results, and even intercept database sessions and return false data. Postgres supports features to eliminate all of these threats, but administrators must understand the attack vulnerabilities to protect against them. This talk covers all known Postgres external attack methods.

Speakers

Bruce Momjian

Postgres core team member, EDB VP and Postgres Evangelist, EDB
Bruce Momjian is co-founder and core team member of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996. He has been employed by EDB since 2006. He has spoken at many international open-source conferences and is the author of PostgreSQL: Introduction and...


Wednesday May 12, 2021 16:30 - 17:30 EDT
Room #2

17:00 EDT

Flame Graphs for MySQL DBAs
Flame graphs are a visualization of profiled software, allowing the most frequent code paths to be identified quickly and accurately. They can be generated using Brendan Gregg's open source programs at github.com/brendangregg/FlameGraph, which create interactive SVG files that can be viewed in a browser.

Different types of flame graphs (CPU, off-CPU, memory, differential, etc.) are presented, along with various tools and approaches for collecting profile information about different aspects of MySQL or MariaDB server internals. Several real-life use cases where flame graphs helped to understand and solve problems are discussed.
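
The FlameGraph scripts consume "folded" stacks: one line per unique call stack, with frames joined by semicolons from the outermost frame down, followed by the number of samples in which that stack was seen. As a minimal sketch of that collapsing step (not a replacement for the repository's own `stackcollapse-*` scripts, which parse real profiler output), the Python below folds a list of already-parsed sampled stacks; the frame names in the example are invented for illustration.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> List[str]:
    """Collapse sampled call stacks into folded lines: 'root;child;leaf count'.

    Each sample is a sequence of frame names ordered from the outermost frame
    (root) to the innermost one (leaf). Empty samples are ignored.
    """
    counts = Counter(";".join(stack) for stack in samples if stack)
    # Sorting only makes the output deterministic.
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical samples; the frame names are made up for illustration.
    samples = [
        ["main", "handle_query", "parse"],
        ["main", "handle_query", "execute"],
        ["main", "handle_query", "execute"],
    ]
    for line in fold_stacks(samples):
        print(line)
    # main;handle_query;execute 2
    # main;handle_query;parse 1
```

Saving these lines to a file and running `./flamegraph.pl out.folded > out.svg` renders the interactive SVG, where the width of each frame is proportional to its sample count.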

Speakers

Valerii Kravchuk

Principal Support Engineer, MariaDB Corporation
Valerii Kravchuk has been helping MySQL and MariaDB users and DBAs resolve their problems since 2005. He has worked at MySQL AB, Sun, Oracle, Percona and, since 2016, MariaDB Corporation. MySQL Community Contributor of the Year 2019.


Wednesday May 12, 2021 17:00 - 18:00 EDT
Room #1

17:30 EDT

MyRocks - The 30,000 Foot View
This talk aims to introduce the MyRocks storage engine at a high level. This is not a deep dive into the technology, a slide deck of benchmark results, or a how-to on migrating your data into the MyRocks engine. Rather, it will be a discussion of the fundamental differences between MyRocks and InnoDB. We'll look at the pros and cons of the engine and discuss some use cases where it may be a reasonable fit. The main takeaway from this talk is that MyRocks isn't a silver bullet that should be blindly used as a drop-in replacement for InnoDB. Rather, MyRocks is definitely an alternative for coping with exploding data volumes that should be evaluated in certain use cases.

Speakers

Mike Benshoof

Technical Account Manager, Percona
Michael is currently a US-based Technical Account Manager at Percona, originally joining the team as a consultant in 2012. Prior to Percona, Michael spent several years in a DevOps role in a company that developed and maintained a SaaS application specializing in social networking...


Wednesday May 12, 2021 17:30 - 18:00 EDT
Room #2
 
Thursday, May 13
 

07:00 EDT

Hybrid TP/AP With MySQL and ClickHouse
OLAP and OLTP databases provide different trade-offs for database users, but what if you could have the best of both worlds?

In this talk we describe how to use ClickHouse as an OLAP replication slave of MySQL databases, including support for transactional consistency (MVCC, which ClickHouse does not support). The talk will go into how the MaterializeMySQL database engine in ClickHouse works, the modifications we made to support transactional consistency, and the options users will have for choosing between freshness and OLAP performance.

Speakers

Stig Bakken

Senior Database Architect, Huawei
Stig is a Senior Database Architect on Huawei's Cloud Databases team in Trondheim, Norway. Before Huawei, Stig built a diverse background across many disciplines, including data engineering at TietoEVRY financial services, DevOps/GitOps, mobile apps and cloud infrastructure at Zedge...


Thursday May 13, 2021 07:00 - 07:30 EDT
Room #3

07:00 EDT

OSINT - Do You Really Know What Data You're Leaking?
Open source intelligence. That almost sounds like a technique described in a movie plot – purely fictional, right?

As it turns out, it is alarmingly simpler than you may think, and in some cases we walk the fine line between intelligence and creepy stalker-like activity.

In this talk we'll look at some examples, and discuss practical applications from an adversarial point of view.

Hopefully you'll leave with an increased appreciation for the data you may be leaking to the world.

Speakers

David Busby

Information Security Architect, Percona LLC
David has been a Linux systems admin for more than 20 years, in a range of roles: development, network admin, support, DBA, and more. Contributor to the EPEL packages for OpenStack. CISSP, and the textbook "tin foil hat" / "paranoid security guy".


Thursday May 13, 2021 07:00 - 07:30 EDT
Room #4

07:00 EDT

Top 10 Tips For MongoDB Performance
MongoDB is highly tuneable with many options for optimizing performance. However, the sheer quantity of tuning options can be overwhelming, and you can waste precious time unless you know which tuning activities are most likely to provide a return on your time investment. In this presentation, we’ll review ten of the fundamental MongoDB performance tuning practices and see how to use these in a systematic way to improve MongoDB performance.

Topics will include document design, workload and query optimization, use and misuse of transactions, configuring memory to avoid physical IO, disk IO optimization, and MongoDB cluster optimization.

The following subjects will be covered:

• Adopting a systematic tuning methodology
• MongoDB schema design
• MongoDB indexing
• Tuning tools included in the MongoDB core
• Tips for optimizing find() and aggregate() statements
• Tuning updates, inserts and deletes
• Transaction performance management
• Memory Tuning
• Disk tuning
• Replica set tuning

Speakers

Guy Harrison

CTO, Southbank Software
Guy Harrison is CTO at Southbank Software, a database and blockchain tools company. He is the author of *MongoDB Performance Tuning*, *Next Generation Databases*, *MySQL Stored Procedure Programming* and many other books, articles and presentations on database technology. He writes...


Thursday May 13, 2021 07:00 - 08:00 EDT
Room #5

07:00 EDT

Building Cost-Based Query Optimizers With Apache Calcite
Query optimization is one of the most challenging problems in database systems. For many years, creating a query optimizer was considered a black art, available only to a limited number of companies and products.
Not anymore. Apache Calcite is an open-source framework that allows you to build query engines, and query optimizers in particular, at a significantly lower engineering cost. In this talk, I will present the query optimization capabilities of Apache Calcite, including cost-based and heuristic optimization drivers and an extensive library of optimization rules. I will also present several examples of production-grade optimizers based on Apache Calcite.

Speakers

Vladimir Ozerov

Co-founder, Querify Labs
Vladimir Ozerov is a co-founder of Querify Labs, where he manages the research and development of query engines for technology companies. Before that, Vladimir worked on the distributed systems Apache Ignite and Hazelcast for more than eight years, focusing on distributed data processing...


Thursday May 13, 2021 07:00 - 08:00 EDT
Room #2

07:00 EDT

How Machine Learning Inside Databases Solves Significant Data-Science Challenges
Machine Learning inside databases is becoming a hot trend. Last time at Percona Live 2020, our team presented AI Tables - an open-source solution that enables automated machine learning capabilities inside databases. The main idea of AI Tables is to allow anyone who works with databases to implement ML projects in a matter of hours without requiring data science skills.

It is as simple as using SQL queries!

On the journey of bringing AI Tables to the community, we have discovered and solved machine learning problems that are hard even for ML engineers but are common for data inside databases.

For example:
Forecasting inventory for all products in all stores (**GROUP BY store, product_id**), given a table that contains all inventory updates over time (**ORDER BY time**).

This problem is complex even for experienced ML engineering teams. In a traditional ML approach, you would need to train one model for each product at each store, which can mean thousands or hundreds of thousands of models, not to mention the logistical nightmare of bringing so many models to production.

Another example of a challenge solved is creating views that do **joins between data tables and ML models**. It significantly streamlines using machine learning inside BI tools to forecast data trends. Also, it opens broader possibilities for anomaly detection and much more!

We have made significant progress in solving those problems automatically through AI Tables, and we would like to share our approach with you and discuss some interesting insights we gained in the process.

**Agenda:**
- 5 min | Advantages of ML inside a database over the traditional approach
- 15 min | Machine learning workflows inside databases
- 15 min | Automated multivariate time-series forecasting
- 15 min | Joining tables with ML models
- 10 min | Q&A

Speakers

Jorge Torres

CEO, MindsDB
Jorge Torres is the Co-founder & CEO of MindsDB. He is also a visiting scholar at UC Berkeley researching machine learning automation and explainability. Before founding MindsDB, he worked for a number of data-intensive start-ups, most recently working with Aneesh Chopra (the first...

Patricio Cerda-Mardini

Machine Learning Research Engineer, MindsDB
Patricio Cerda-Mardini is a Machine Learning Research Engineer. As a master's student at PUC Chile, he focused on machine learning methods for human-robot interaction and recommendation systems, areas in which he has a couple of academic publications. Prior to joining MindsDB, he...


Thursday May 13, 2021 07:00 - 08:00 EDT
Room #1

07:30 EDT

Open Source Databases and ARM
ARM is gaining a lot of traction, especially in high-performance computing software.

Open source databases are no exception, and most of the leading open source databases are now available on ARM (MySQL, MariaDB, PostgreSQL, MongoDB, ClickHouse, etc.).

Let's explore the state of different open source databases and their supporting ecosystems/tools, covering performance, functionality, community activity, etc.

Whatever your use case, it can quite likely be ported to ARM, and this comes with a lot of advantages.

So let's unpack this completely new VERTICAL of running open source DBs on ARM.

Speakers

Krunal Bauskar

Engineer, Huawei
Krunal Bauskar has been actively working in the MySQL space for over a decade. He is currently driving the adoption of the ARM ecosystem for MySQL/MariaDB/Percona through his #mysqlonarm initiative working at Huawei. In the past he has worked on multiple MySQL projects viz. undo log...


Thursday May 13, 2021 07:30 - 08:00 EDT
Room #3

07:30 EDT

Set up and manage alerts for databases with integrated alerting in Percona Monitoring and Management
Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance and improve the security of your business-critical database environments, no matter where they are located or deployed.
In this talk, we will show how to set up integrated alerting (sending alerts to external channels) in PMM. In this session we will:
  • Show how the alerts are sent to the external channels.
  • Examine the architecture of the alerting system in PMM.
  • Define a custom alert and watch it show up on the external channels as well.

Speakers

Peter Boros

Principal Architect, Percona
Peter is a Principal Architect on Percona's consulting team. He has been using and working with open source software since the early 2000s. Peter's first and foremost professional interest is performance tuning and large-scale automation. Before rejoining Percona, Peter worked on large scale...

Zoriana Stefanyshyn

QA Analyst, Percona
Zoriana joined Percona a year ago as a QA Analyst on the Percona Platform team. Her previous QA experience was in different domains: document management, automotive, car navigation systems, and SDKs. She is new to open source, but she is really inspired by Percona's products. The...


Thursday May 13, 2021 07:30 - 08:00 EDT
Room #6

07:30 EDT

Build a Scale-Out Real-Time Data Warehouse for Analytics Within Seconds by Combining Apache Flink + TiDB
There is a growing demand from data-driven companies for real-time data warehouses to implement real-time Online Analytical Processing (OLAP), real-time data panels, and real-time application monitoring. However, the architecture of real-time data warehouses has long been thought to be complex and difficult to operate and maintain.

As an open source and distributed Hybrid Transactional/Analytical Processing (HTAP) database, TiDB can serve as backbone storage for a real-time data warehouse in multiple roles: as the business data source, as the dimension table data source, and as the analytical database for summarized data. The combination of stream processing systems (e.g. Apache Flink) and TiDB can form an efficient, easy-to-use, real-time data warehouse that features horizontal scalability and high availability.

In this talk, Qi Zhi will deep dive into what a real-time data warehouse is, how TiDB powers real-world real-time data warehouses, and patterns for combining stream processing systems and TiDB.

Speakers

Zhi Qi

Realtime Analytics R & D Engineer, PingCAP
Zhi Qi is a software engineer at PingCAP, working on real-time analytics and the big data ecosystem of TiDB. He gave a talk about building a real-time data warehouse with Flink and TiDB at Flink Forward Asia 2020.


Thursday May 13, 2021 07:30 - 08:00 EDT
Room #4

08:00 EDT

Introducing ProxyWeb - The Open Source Web Interface For ProxySQL
Introducing ProxyWeb, the first open source web user interface for ProxySQL. It proved extremely useful during Edmodo's 25x traffic growth last March, and it is now available under GPLv3. It can be installed as a Docker container or as a system service in 10 seconds.

It has a responsive design, supports administering multiple ProxySQL servers, generating ad hoc traffic reports, and hiding unnecessary tables on a per-server basis, and it comes with detailed documentation.

To make evaluation easier, it comes with an extensive docker-compose-based test environment that gives the user a fully working 'infrastructure' consisting of a MySQL cluster, ProxySQL, ProxyWeb, Orchestrator, Health Check, and Sysbench.

The environment can be fully operated through a web browser after the initial start, which takes less than 45 seconds.

In the presentation, the audience will be walked through the installation and configuration of ProxyWeb, and the docker-compose-based test environment will be used to set up a ProxySQL cluster from scratch. Once the setup is complete, we will generate traffic with Sysbench and perform a failover with Orchestrator.

The codebase and the documentation can be accessed at http://proxyweb.org

Speakers

Miklos Szel

Senior MySQL Architect, Edmodo
Miklos Mukka Szel is a Senior DB Architect at Edmodo. With more than 20 years’ experience in system and network administration, he has also worked for Walt Disney International as its main International MySQL DBA. Miklos specializes in MySQL-based high availability solutions, performance...


Thursday May 13, 2021 08:00 - 08:30 EDT
Room #5

08:00 EDT

MySQL Server Component Manifest Files
MySQL configuration has traditionally been done via system variables, with values coming from the command line, config files, or SET commands. This can be a security issue, since it doesn't support a trust model rooted in a well-known trusted state that cannot be modified by less trusted actors.

This is the problem the manifest file security model aims to solve.
It roots server security in a well-known and trusted source (the server's OS file permissions) and builds on top of it to allow secure configuration of components.

In this talk we will review how manifest files work and also check some of the early adopter components of the new secure configuration model.

Speakers

Georgi Kodinov

MySQL SrvGen team lead, Oracle MySQL
Georgi "Joro" Kodinov has been working on MySQL for more than 10 years. He leads the Server General team, which deals with security, performance monitoring and the MySQL client/server protocol. Before working on databases, Joro served as an IT manager for a Bulgarian bank...


Thursday May 13, 2021 08:00 - 08:30 EDT
Room #7

08:00 EDT

Overview of MySQL Server plugins and what is new in MySQL 8
Plugins are pieces of software that provide additional services to the server. MySQL has long supported plugins, and the plugin infrastructure matured considerably in MySQL 8. In this talk I will cover the features of MySQL Server plugins, how to install and uninstall them, and how to retrieve plugin information.

**My Agenda:**

1. What is the scope of plugins in MySQL?

- Will explain the role of plugins in MySQL

2. How to install/uninstall plugins and obtain plugin information?

- Will explain plugin installation
- Will explain plugin uninstallation
- Will explain how to obtain plugin information (plugin directory, whether a plugin is active, the information_schema.plugins table, the SHOW PLUGINS command)

3. Different types of MySQL plugins

- Query rewriter
- DDL rewriter (MySQL 8)
- Version token
- Clone plugin (MySQL 8)
- MySQL Enterprise Thread Pool

4. Plugin services:

- Locking services
- Keyring services

5. Q/A


Thursday May 13, 2021 08:00 - 08:30 EDT
Room #4

08:00 EDT

Default to Open: Steps and Traps
Sharing is caring. In an ideal world, everyone has bought into transparency and there is no problem with communication. Now you pick the right tools that let you collaborate and share information easily and go - you're open.

Unfortunately, the biggest challenge in technology is that people have differing opinions. Lenz and Sanja talk about their experiences from SUSE, Red Hat, and Percona.

They will cover everything from licensing and open sourcing the products of newly acquired companies, to motivating engineering teams both to accept contributions and to contribute back to other open source projects, to documentation and branding.

"Default to open" is a work style that is worth striving for, even if your product code is not open source. Let's talk about it.

Speakers

Sanja Bonic

Head of Open Source Programs Office, Percona

Lenz Grimmer

Sr. Director, Server Engineering, Percona
Lenz Grimmer supports and leads the engineering teams at Percona that work on server products like Percona Server for MongoDB, MySQL, PostgreSQL and related components. He's been involved in Linux and Open Source technologies in various roles and capacities since the mid-90s and has...


Thursday May 13, 2021 08:00 - 08:30 EDT
Room #2

08:00 EDT

Native Chaos Engineering in Databases
Chaos Engineering is revolutionizing how systems are tested, and with the huge paradigm shift toward Kubernetes resiliency in today's rapidly changing world, the cloud-native way is the best way to practice it. Karthik S, one of the maintainers of LitmusChaos, will introduce how to carry out Chaos Engineering the cloud-native way. He will then show how Chaos Engineering is carried out on cloud-native databases with LitmusChaos, and touch on observability considerations for chaos engineering and the hooks Litmus provides for them.

Speakers

Karthik Satchitanand

LitmusChaos Maintainer, ChaosNative
Karthik Satchitanand is one of the maintainers of the CNCF sandbox project LitmusChaos. He is passionate about all things Kubernetes, and is generally interested in DevOps, storage performance/benchmarking & chaos engineering.


Thursday May 13, 2021 08:00 - 09:00 EDT
Room #9

08:00 EDT

Performance Optimization - How to Get the Best Out of Your Indexes on Postgres and MySQL
In this talk we will discuss how indexes work in Postgres and MySQL, how the implementations differ, and which choices are more appropriate for different scenarios. We will cover general B+-tree indexes as well as GIN and GiST, using examples to show where each is best suited, and explain why some database migrations fail because of differences in implementation or the lack of a specific index type.

Speakers

Charly Batista

Senior Support Engineer, Percona
A Brazilian living in China... Charly is passionate about new cultures, their languages and traditions. Charly has been working with database and development for more than 12 years and has participated in small and large projects in Brazil, the US, China and other countries.


Thursday May 13, 2021 08:00 - 09:00 EDT
Room #3

08:00 EDT

MySQL Backup Solutions in 2021
Backups are important! Everyone makes mistakes, bugs are easily overlooked, and hardware will fail eventually. If you don't want to lose data when disaster strikes, your backups will be your savior. In this talk I will guide you through some of the most common backup techniques for MySQL in use in 2021. I will explain the strengths and weaknesses of each solution and go into detail about its impact on your recovery time objective (RTO) and recovery point objective (RPO), how to achieve these objectives, and what they mean for your environment.

Speakers

Matthias Crauwels

Lead Database Consultant, Pythian
Since the age of 10 I’ve always been passionate about computers, and I’ve been working with them ever since. In 2005 I got my degree in computer science. I used to work at a major Belgian university, where I developed e-learning applications. In that position, I was the one...


Thursday May 13, 2021 08:00 - 09:00 EDT
Room #1

08:30 EDT

MariaDB High Availability in a Cocktail Mix with Envoy and Orchestrator
For a considerable set of critical applications at Wise (formerly TransferWise), database high availability is a must to ensure that we don't let our customers down.

At Wise, we offload most of the operational toil of our relational databases to the AWS RDS managed service. However, for some specific use cases with different availability requirements, we run clusters on EC2.

In this presentation, I will show how we implemented high availability for our MariaDB clusters running on EC2 by integrating them with Envoy and Orchestrator, reducing the duration of failovers and maintenance operations from a few minutes to just a few seconds.

Speakers

Pedro Albuquerque

Staff Database Engineer, Wise (former TransferWise)
I have many years of experience working with various database technologies, including relational and NoSQL platforms. I am currently focused on MariaDB, MongoDB and PostgreSQL datastores at Wise. Before Wise, I was focused on MongoDB at ObjectRocket by Rackspace, supporting customers...


Thursday May 13, 2021 08:30 - 09:00 EDT
Room #4

08:30 EDT

Zoned Namespaces for the Next Era in Application Performance
Learn how ZNS SSDs (Zoned Namespaces) may be leveraged to help you achieve scalable MySQL™ performance for your next wave in growth, whether that's measured in end-user count, supported IoT devices, or data volumes.

ZNS SSDs give applications more direct control over physical data placement, bypassing the internal architecture of conventional SSDs, to achieve the next progression of application performance and scale for the digital transformation era.

User experience depends on application responsiveness which directly ties to MySQL performance and the latency of underlying storage resources. ZNS SSDs eliminate some bottlenecks of conventional SSD architecture and may deliver more predictable data response times and higher MySQL transaction rates for workloads that involve concurrent read and write operations.

Please join Percona CEO Peter Zaitsev and Wim De Wispelaere, Western Digital VP Corporate Strategic Initiatives, for a 30-minute presentation giving an overview of Zoned Namespaces technology and how to handle your next wave of application growth by using Percona Server® for MySQL with Ultrastar® ZNS SSDs. We’ll explain how the ZNS zone block interface, the MyRocks pluggable ZenFS file system, and Linux® support will help you push the limits of database performance at scale.

Speakers

Peter Zaitsev

CEO & Co-founder, Percona
Peter Zaitsev is CEO and co-founder of Percona. As one of the foremost experts on MySQL strategy and optimization, Peter leveraged both his technical vision and entrepreneurial skills to grow Percona from a two-person shop to one of the most respected open source companies in the...

Wim De Wispelaere

VP Corporate Strategic Initiatives, Western Digital


Thursday May 13, 2021 08:30 - 09:00 EDT
Room #6

08:30 EDT

Insights Into the New Oracle MySQL Database Service
Oracle MySQL Database Cloud Service 2021: where we are today...

In early 2020, Oracle introduced the MySQL Database Service (MDS), built on Oracle Gen 2 Cloud Infrastructure and MySQL 8. It is 100% developed, managed, and supported by the Oracle MySQL team.
The service is available as a fully managed service in all commercial regions of the Oracle Cloud. Customers do not need to deploy, patch, update, back up or restore a MySQL instance: all of these processes are covered as part of the cloud operation.
Administrative tasks are performed through the OCI web console and/or the OCI command line.
In this presentation we look at how to set up a MySQL instance and how to work with MySQL instances using the GUI and the OCI command line.
As part of this, we look at the specifics of the available my.cnf configuration, CPU/memory shapes, and disk and network options.
Furthermore, we look at typical usage scenarios such as how to migrate MySQL data to MDS, high availability scenarios, and how to configure replication from anywhere to MDS.
The presentation includes a couple of small demos to showcase the usage of MySQL at an introductory level.

Speakers

Carsten Thalheimer

Senior Principal Cloud Solution Engineer, Oracle MySQL GBU
Carsten Thalheimer has worked for over 20 years at world-leading IT companies, focusing on many aspects of the technology business. He worked for Integrata AG, SCO Group Inc., Tarantella Inc. and Sun Microsystems Inc. His long-standing passion for open-source technology...


Thursday May 13, 2021 08:30 - 09:00 EDT
Room #7

08:30 EDT

Test Applications' Storage Stability by Injecting Storage Errors
Storage is always a critical concern for cloud applications: the stability of the whole cluster depends heavily on the availability of storage, and storage is more fragile and less reliable than other parts of a server.

Therefore, emulating a storage fault (e.g. a broken disk or filesystem corruption) or degradation (e.g. a slow distributed file system) is very helpful for making sure applications are able to survive failure scenarios. Injecting these errors helps developers understand and predict how applications behave when a volume doesn't work perfectly, so that the applications can be prepared for such disasters.

In this talk, Keao Yang will introduce IOChaos, a custom resource developed by the Chaos Mesh team, and explain how it makes emulating storage errors for applications running on Kubernetes easy and painless. He will also illustrate how it could be used with other applications.

Speakers

Keao Yang

Engineer, Research & Development Engineer
Keao Yang is an engineer at PingCAP, who is mainly responsible for the controller framework, the Network and IO related fault injection in Chaos Mesh. Also, he is a maintainer of Chaos Mesh.


Thursday May 13, 2021 08:30 - 09:00 EDT
Room #2

09:00 EDT

Everything You Ever Wanted To Know About Databases but Were Too Afraid To Ask
The talk does exactly what it says on the tin - everything you want to know about databases from somebody with decades of experience implementing, architecting, and building databases. Gavin will talk through the way he currently thinks about implementation and the problems he most cares about, and then will open up to your questions! In the absence of Stonebraker, nobody is better placed to answer database Q&A.

Speakers

Dr Gavin Mendel

CTO, TerminusDb
Dr Gavin Mendel-Gleason is CTO of TerminusDB. He is a former research fellow at Trinity College Dublin in the School of Statistics and Computer Science. His research focuses on databases, logic and verification in software engineering. His work includes contributing to the Seshat...


Thursday May 13, 2021 09:00 - 09:30 EDT
Room #9

09:00 EDT

MariaDB Notebooks in JupyterHub
The MariaDB Jupyter kernel project helps you use MariaDB from within the Jupyter notebook ecosystem.
You can display the results of your favourite queries in a notebook, plot result sets using %magic commands, or export data from MariaDB to Python notebooks, unleashing the full power of these technologies for data analytics.

This talk covers the current state of the MariaDB kernel, its existing features, and how to install and use it, and demonstrates the simplest way to deploy JupyterHub in your organization so that people can use MariaDB in individual notebook workspaces backed by shared MariaDB Server deployments.

No background knowledge is required to follow this talk: if you've ever used a Jupyter notebook, MariaDB, or both, or you'd just love to hear about these technologies, you're more than welcome to attend.

Speakers

Robert Bindar

Server Developer, MariaDB Foundation
Robert started working for the MariaDB Foundation in 2018 as a server developer. His main focus is divided between server development and helping the community contribute faster and more efficiently to the MariaDB codebase. Robert is based mostly in Brasov, Romania.


Thursday May 13, 2021 09:00 - 09:30 EDT
Room #2

09:00 EDT

The Many Ways to Copy Your Database
Everyone needs to copy the data in their database for backups and to clone more database instances. This talk will describe and compare many ways to do this - everything from logical data dump and file copying through native cloning and backup tools to advanced scale-out techniques for large-scale copying. Whether you are using a cloud or servers in your data centres, this talk will tell you how to choose the best way to copy your database in every circumstance with performance comparisons and lessons from real-life experience.

Speakers

Nicolai Plum

Database Engineer, Booking.com Ltd
Nicolai Plum works in the Database Engineering team of Booking.com managing database product features and service design. His previous roles at Booking.com have ranged widely from Linux systems administration team lead through storage and systems architecture to regulatory compliance...


Thursday May 13, 2021 09:00 - 09:30 EDT
Room #3

09:00 EDT

Oracle MySQL Database Service with HeatWave for Real-Time Analytics
MySQL HeatWave - Extreme Performance, Cloud Scale, Significant Cost Savings.

Since 2020 the MySQL development team has been offering a fully managed database service of MySQL Enterprise Edition. Traditionally, MySQL InnoDB is designed for online transaction processing (OLTP) workloads. MySQL can handle online analytical processing (OLAP) workloads, but doing so is often rather slow and tricky.
At the beginning of December 2020, the MySQL team launched a second MySQL cloud offering, MySQL HeatWave for Real-time Analytics, which is based on a new in-memory analytics accelerator designed for extreme performance and cloud scale. This service provides a single, unified platform for both OLTP and OLAP workloads. It can scale to several hundred cores, provides around 400x speedup over MySQL for analytic workloads, and enables scalable analysis over tens of terabytes of MySQL data. Customers can now run all their OLTP and analytics workloads with MySQL without moving their data out of MySQL and without any changes to their applications.
In this presentation we give an overview of what is going on under the hood and back up our slides with a demo of the technology.

Speakers

Carsten Thalheimer

Senior Principal Cloud Solution Engineer, Oracle MySQL GBU
Carsten Thalheimer has worked for over 20 years at world-leading IT companies, focusing on many aspects of the technology business. He worked for Integrata AG, SCO Group Inc., Tarantella Inc. and Sun Microsystems Inc. His long-standing passion for open-source technology...


Thursday May 13, 2021 09:00 - 09:30 EDT
Room #7

09:00 EDT

Production Grade ProxySQL in 2021
Widespread adoption of ProxySQL, the high-performance, high-availability, protocol-aware proxy for MySQL, has led to a plethora of different and highly innovative solutions for the inevitable scalability issues that MySQL DBAs run into with highly demanding workloads and the fast-paced data growth of this modern era.

This talk aims to provide insights into real-world MySQL scalability solutions implemented with ProxySQL by deep diving into the key areas to focus on when rolling out ProxySQL in production, with examples of how to bulletproof the failover process and many other pertinent tuning recommendations.

- Where should I deploy ProxySQL?
- What hardware does ProxySQL need to run on?
- What does a typical production grade deployment look like?
- How should I design my ProxySQL query rules to ensure an efficient yet granularly controlled ruleset?
- What are the steps in planning a failover process?
- What should I do to achieve a transparent failover?
- What are the key variables I must tune on my production deployment?

Speakers

René Cannaò

CEO, ProxySQL
René founded ProxySQL in 2016, having developed it since 2013. He has over 18 years of experience as a database administrator, mostly on MySQL, working as Senior MySQL Support Engineer at Sun/Oracle, Senior Operational DBA at Blackbird, and consulting for small and large companies like...

Nick Vyzas

CTO, ProxySQL
Nick focuses on maximizing the scalability, availability, and performance of MySQL environments of all shapes and sizes with ProxySQL. Over the last 15 years his focus has been on MySQL database administration and open source software projects at various companies around the world...


Thursday May 13, 2021 09:00 - 10:00 EDT
Room #1

09:00 EDT

Everything a DBA Should Know About Kubernetes
What is this Kubernetes thing? Why should you care? What happens to a database deployed on Kubernetes? Is it even possible?

Looking from the outside Kubernetes can be frightening, especially when stateful applications are concerned, but fear not! This session will take you through the steps of deploying an application on Kubernetes.

Speakers

Janos Pasztor

Senior Software Engineer, Red Hat
Janos is a Senior Software Engineer at Red Hat and enjoys coding in his free time as well. Sometimes he comes up with ideas that occupy his evenings and weekends.


Thursday May 13, 2021 09:00 - 10:00 EDT
Room #4

09:00 EDT

Deconstructing Postgres into a Cloud Native Platform
Is deploying Postgres in Kubernetes just repackaging it into a container? Can’t Postgres leverage the wide range of cloud-native software and integrate well with K8s? Join us on a journey that covers the following, with demos running StackGres on OpenShift:

* How to structure Postgres into an init-less container, plus several sidecar containers for connection pooling, backups, agents, etc.
* Defining high level CRDs as the single API to interact with the Postgres operator.
* Using K8s RBAC for user authentication of a web UI management interface.
* Using Prometheus for monitoring, bundling the node, Postgres and PgBouncer exporters together.
* Proxying Postgres traffic through Envoy, terminating Postgres SSL with an Envoy plugin that also exports wire protocol metrics to Prometheus.
* Using Fluentbit to capture Postgres logs and forward them to Fluentd, which stores them in a centralized Postgres database.
* Automating Day 2 operations (backups, minor and major version upgrades, pg_repacks, benchmarks and others) in simple YAML files.

Speakers

Álvaro Hernández

CEO, OnGres Inc
Álvaro is a passionate database and software developer. He is the founder and CEO of OnGres (https://ongres.com) and has been dedicated to PostgreSQL and database R&D for two decades. Website: https://aht.es. An open source advocate and developer at heart, Álvaro...


Thursday May 13, 2021 09:00 - 10:00 EDT
Room #5

09:00 EDT

Joining Heterogeneous Databases is a reality, not a Myth (PostgreSQL FDW)
PostgreSQL provides a way to communicate with external data sources. This could be another PostgreSQL instance or any other database. The other database might be a relational database such as ClickHouse, MySQL, or Oracle, or a NoSQL database such as MongoDB or Hadoop. To achieve this, PostgreSQL implements the ISO standard called SQL/MED in the form of Foreign Data Wrappers (FDW). This presentation will explain in detail how PostgreSQL FDWs work. It will include a detailed explanation of the simple features and will introduce more advanced features that were added in recent versions of PostgreSQL, such as aggregate pushdown and join pushdown. The talk will include working examples of these advanced features and demonstrate their use with different databases. These examples show how data from different database flavors can be used by PostgreSQL, including data from heterogeneous relational databases, and show NoSQL joins.

Speakers

Ibrar Ahmed

Sr. Software Architect, Percona
Ibrar Ahmed is a Software Architect at Percona LLC. Before coming to open source development, he had extensive experience in software design and development, with a main focus on system-level embedded development. After joining EnterpriseDB in 2006, an Enterprise PostgreSQL company...


Thursday May 13, 2021 09:00 - 10:00 EDT
Room #6

09:30 EDT

Wrangling Data to Multiple Places With Fluent Bit
As more and more users move to Kubernetes, they may also start using multiple backends and analytics tools. How do you collect once and send everywhere? In this talk, Anurag will talk about Fluent Bit, a Cloud Native Computing Foundation (CNCF) graduated project, and how you can collect once and send to all the backends you want. Additionally, Anurag will discuss some of Fluent Bit's advanced capabilities, such as enrichment, parsing, and data reduction, that help users get the most out of their backends.
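The "collect once, send everywhere" idea can be illustrated with a small conceptual sketch in Python. It is not Fluent Bit's API or configuration format: it only loosely mirrors the tag-based routing that Fluent Bit configures through each output's `Match` pattern, using Python's `fnmatch` wildcards as a stand-in. The `Router` class and the two in-memory "backends" are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Output = Callable[[str, Record], None]


class Router:
    """Hypothetical router: each record is collected once and fanned out to every matching output."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, Output]] = []

    def add_output(self, match: str, output: Output) -> None:
        # `match` is a wildcard pattern on the record's tag, e.g. "*" or "db.*".
        self._routes.append((match, output))

    def emit(self, tag: str, record: Record) -> int:
        delivered = 0
        for pattern, output in self._routes:
            if fnmatchcase(tag, pattern):
                output(tag, dict(record))  # each backend gets its own shallow copy
                delivered += 1
        return delivered  # number of outputs the record was sent to


# Hypothetical in-memory stand-ins for real backends (e.g. a log store and an analytics tool).
log_store: List[Record] = []
analytics: List[Record] = []

router = Router()
router.add_output("*", lambda tag, rec: log_store.append(rec))     # everything goes here
router.add_output("db.*", lambda tag, rec: analytics.append(rec))  # only database records

router.emit("db.mysql", {"msg": "slow query", "ms": 1200})  # delivered to both outputs
router.emit("app.web", {"msg": "GET /"})                    # delivered to log_store only
```

The sources never need to know how many destinations exist: adding a backend is one more `add_output` call, which is the property the talk's "collect once" framing is about.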

Speakers

Anurag Gupta

Product Manager, Calyptia


Thursday May 13, 2021 09:30 - 10:00 EDT
Room #9

09:30 EDT

What Do We Want to Monitor? All the Databases!
Your databases and monitoring are all set up, and you've got your MySQL, PostgreSQL and MongoDB databases figured out: you're monitoring them and everything is fine. But now you've been tasked with keeping tabs on that new Cassandra cluster your company has. We'll show you how to add it to Percona Monitoring and Management (PMM) and which features enable you to get the best out of any new and existing database you're incorporating. Database problems? Not on your watch.

Speakers

Agustín Gallego

Support Engineer, Percona
Agustín joined Percona's Support team in December 2013. He has previously worked as a Cambridge IT examinations Supervisor and as a Junior BI, SQL & C# developer. He is studying to get a Computer Systems Engineer degree at the Universidad de la República, in Uruguay.


Thursday May 13, 2021 09:30 - 10:00 EDT
Room #2

09:30 EDT

MySQL Shell for DBAs
Get an overview of the possibilities MySQL Shell offers DBAs: how to deploy infrastructures (overview only) and how to dump and load tables, schemas, instances, etc. It's time to get more familiar with the util object. Finally, see what is possible with user-defined reports and plugins for MySQL Shell.

During the session we will cover the most useful plugins for general DBA tasks.

Speakers

Frédéric Descamps

MySQL Community Manager, Oracle
"@lefred" has been consulting on open source and MySQL for almost 20 years. After graduating in Management Information Technology, Frédéric Descamps started his career as a developer for an ERP under HP-UX. He then opted for a career in the world of open source by joining one of the...


Thursday May 13, 2021 09:30 - 10:30 EDT
Room #7

10:00 EDT

How to Develop BPF Tools with libbpf + BPF CO-RE
Distributed clusters might encounter performance problems or unpredictable failures, especially when they are running in the cloud. Of all the kinds of failures, kernel failures may be the most difficult to analyze and simulate.

Based on Berkeley Packet Filter (BPF), BCC (BPF Compiler Collection) offers many useful resources for building effective kernel tracing and manipulation programs, but it can be inconvenient for developers in certain situations. Compared with BCC, libbpf + BPF CO-RE seems to be a better solution: it greatly reduces storage space and runtime overhead, which lets BPF tools run in more hardware environments, and it improves the development experience.

In this talk, Wenbo Zhang will share his BPF practices on how to develop BPF tools with libbpf + BPF CO-RE. He will introduce the advantages of this development method, how to use this method to develop tools, and some tips and tricks for writing Linux BPF applications with libbpf.

Speakers

Wenbo Zhang

R&D, PingCAP
Wenbo Zhang is a PingCAP Development Engineer, focusing on performance analysis and diagnosis of Linux kernel. He talked about BPF for chaos and tracing in Kubernetes at Cloud Native + Open Source Summit China 2020.


Thursday May 13, 2021 10:00 - 10:30 EDT
Room #4

10:00 EDT

PMM: Migration From Prometheus to VictoriaMetrics
Recently, PMM replaced Prometheus with VictoriaMetrics. In the talk we want to cover the motivation behind this transition, the architecture and internals of PMM and technical details of the replacement.
The talk will be given by members of both organizations that took part in the migration: Percona and VictoriaMetrics. We expect the talk to be divided into the following parts:
1. The evolution path of PMM and decision to replace Prometheus (by Percona members)
2. PMM architecture and technical details of the transition to VM (by VM members)
3. The summary and results of collaboration

Some key slide titles from the talk:
1. Architecture of PMM
2. Why we decided to replace Prometheus
3. Transition period - what to do with historical data
4. Mutually beneficial collaboration for PMM and VM

The talk will also include the following observability topics:
1. Push vs Pull metrics collection approaches
2. Efficiency of monitoring systems

Speakers

Aliaksandr Valialkin

Founder and CTO at VictoriaMetrics, VictoriaMetrics
VictoriaMetrics founder and core developer. Go contributor and author of popular libraries fasthttp, fastcache, quicktemplate

Roma Novikov

Technical Director, Percona Monitoring and Management at Percona, Percona
Roma Novikov joined Percona at the beginning of 2017 as Director of Platform Engineering. He started programming in 6th grade and has more than 15 years of commercial experience in web development. He previously worked as CTO of one of the biggest web development/web design e-commerce...


Thursday May 13, 2021 10:00 - 10:30 EDT
Room #5

10:00 EDT

A Change-Data-Capture Use-Case: Designing an Evergreen Cache
When one’s app suffers from poor performance, it’s easy to set up a cache in front of one’s SQL database. It doesn’t fix the root cause (e.g. bad schema design, bad SQL queries, etc.), but it gets the job done. If the app is the only component that writes to the underlying database, it’s a no-brainer to update the cache accordingly, so the cache is always up to date with the data in the database.

Things start to go sour when the app is not the only component writing to the DB. Other sources of writes include batch jobs and other apps (shared databases exist, unfortunately). There are a couple of ways one might keep the data in sync, e.g. polling the DB every now and then or using DB triggers. Unfortunately, they all have issues that make them unreliable and/or fragile.

You might have read about Change-Data-Capture before. Martin Kleppmann has described it as turning the database inside out: the DB sends change events (INSERT, UPDATE and DELETE) that one can subscribe to. It is the opposite of Event Sourcing: Event Sourcing aggregates events to produce state, whereas CDC gets events out of state changes. Once CDC is in place, one can subscribe to its events and update the cache accordingly. However, CDC is still at an early stage, and implementations are quite database-specific.

In this talk, I’ll describe an easy-to-setup architecture that leverages CDC to have an evergreen cache.
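The core idea of a CDC-fed cache can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the architecture presented in the talk and not the API of Hazelcast or any CDC tool: events are assumed to arrive in commit order, to carry the full row after the change, and to use the hypothetical operation names `insert`, `update` and `delete`. The `ChangeEvent` and `EvergreenCache` names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ChangeEvent:
    # Hypothetical, simplified event shape; real CDC tools each define their own format.
    op: str                          # "insert", "update" or "delete"
    key: Any                         # primary key of the changed row
    after: Optional[Dict[str, Any]]  # full row state after the change; None for deletes


class EvergreenCache:
    """In-memory cache kept in sync by applying CDC events in commit order."""

    def __init__(self) -> None:
        self._rows: Dict[Any, Dict[str, Any]] = {}

    def apply(self, event: ChangeEvent) -> None:
        if event.op in ("insert", "update"):
            if event.after is None:
                raise ValueError(f"{event.op} event without row data")
            # Upsert a copy of the full row, so later mutation of the event cannot leak in.
            self._rows[event.key] = dict(event.after)
        elif event.op == "delete":
            self._rows.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")

    def get(self, key: Any) -> Optional[Dict[str, Any]]:
        return self._rows.get(key)


# The DB emits events whoever wrote the change: the app, a batch job, or another app.
cache = EvergreenCache()
for event in [
    ChangeEvent("insert", 1, {"name": "Alice"}),
    ChangeEvent("update", 1, {"name": "Alicia"}),  # e.g. written by a batch job, not the app
    ChangeEvent("insert", 2, {"name": "Bob"}),
    ChangeEvent("delete", 2, None),
]:
    cache.apply(event)

print(cache.get(1))  # {'name': 'Alicia'}
print(cache.get(2))  # None
```

Storing the full row on every insert/update (rather than applying diffs) means re-applying the same event is harmless; a real deployment would still need to deal with event ordering, consumer restarts and the initial load of existing data.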

Speakers

Nicolas Fränkel

Developer Advocate, Hazelcast
Developer Advocate with 15+ years of experience consulting for many different customers in a wide range of contexts (such as telecoms, banking, insurance, large retail and the public sector). Usually working on Java/Java EE and Spring technologies, but with focused interests like Rich...


Thursday May 13, 2021 10:00 - 10:30 EDT
Room #6

10:00 EDT

Towards a K8s Native Streaming Application
Starting from a simple application that can be deployed on any machine running Docker, we will go through all the steps required to transform it into a Kubernetes-native streaming application. We will explain the theory and then illustrate the concepts with examples to define a recipe for running streaming applications on Kubernetes. We will focus on both cultural and technical tricks to help you successfully adopt streaming applications at scale.

At the end of the talk, you will have a comprehensive view regarding all platform building blocks and application requirements needed to successfully run a streaming application on Kubernetes.

Spoiler: you will hear several times the words Apache Kafka, Kafka Streams and Strimzi.

Speakers

Jérémy Frénay

Engineering Manager - Data Operations, Babylon Health
Jeremy Frenay is an Engineering Manager at Babylon Health. He has been leading Babylon’s Data Operations efforts since late 2017, building the Kafka-based data infrastructure, the automation and the tooling required to support teams of software and data engineers working on data...

Francesco Nobilia

Principal Engineer, Nutmeg
Francesco is an enthusiastic engineer focused on building the next generation of a self-service, cost-effective streaming data platform. Event-driven addict. Apache Kafka fan. Kafka Summit and Meetup speaker. Currently, he is Principal Engineer at Nutmeg.


Thursday May 13, 2021 10:00 - 11:00 EDT
Room #9

10:00 EDT

Almost Perfect Service Discovery and Failover with ProxySQL and Orchestrator
Of course, there is no such thing as perfect service discovery, and we will see why in the talk. However, the way ProxySQL is deployed in this case minimizes the risk of split-brain, which is why I call it almost perfect. But let’s step back a little...

MySQL alone is not a high availability solution. To provide resilience to primary failure, other components need to be integrated with MySQL. At MessageBird, these additional components are ProxySQL and Orchestrator. In this talk, we describe how ProxySQL is architected to provide close-to-perfect service discovery and how this, combined with Orchestrator, allows for automatic failover. The talk presents the details of the integration of MySQL, ProxySQL and Orchestrator in Google Cloud (a similar architecture would be easy to re-implement at other cloud vendors or on-premises). We will also cover lessons learned during the 2 years this architecture has been in production. Come to this talk to learn more about MySQL high availability, ProxySQL and Orchestrator.
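One building block behind this kind of setup is that ProxySQL can place backends into writer and reader hostgroups based on each server's `read_only` value, so when a replica is promoted and its `read_only` is turned off (for example by Orchestrator), write traffic follows it. The sketch below models only that routing decision and is not MessageBird's configuration: the hostgroup IDs, the `Backend` type and `assign_hostgroups` are hypothetical, and as a simplification the writer is not also placed in the reader hostgroup.

```python
from dataclasses import dataclass
from typing import Dict, List

WRITER_HG = 10  # hypothetical hostgroup IDs
READER_HG = 20


@dataclass
class Backend:
    host: str
    read_only: bool  # @@read_only as last seen by a health check


def assign_hostgroups(backends: List[Backend]) -> Dict[int, List[str]]:
    """Route writable servers to the writer hostgroup and read-only ones to readers."""
    groups: Dict[int, List[str]] = {WRITER_HG: [], READER_HG: []}
    for b in backends:
        groups[READER_HG if b.read_only else WRITER_HG].append(b.host)
    # More than one writable server means two primaries could accept writes.
    if len(groups[WRITER_HG]) > 1:
        raise RuntimeError(f"split-brain risk, several writable servers: {groups[WRITER_HG]}")
    return groups


if __name__ == "__main__":
    print(assign_hostgroups([Backend("db1", False), Backend("db2", True), Backend("db3", True)]))
    # {10: ['db1'], 20: ['db2', 'db3']}
    # db1 has failed; db2 was promoted and its read_only flag turned off.
    print(assign_hostgroups([Backend("db2", False), Backend("db3", True)]))
    # {10: ['db2'], 20: ['db3']}
```

Refusing to route when two servers are writable is one simple way to surface the split-brain risk the abstract mentions, rather than silently sending writes to both.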

Speakers

Jean-François Gagné

System and MySQL Expert, HubSpot
Jean-François is a System / Infrastructure Engineer and MySQL Expert. He currently works at HubSpot. Before that, J-F's missions were scaling the MySQL and MariaDB infrastructure at MessageBird and Booking.com. He was also involved in projects related to systems, storage, network and...

Art van Scheppingen

Senior Database Engineer, MessageBird
Art van Scheppingen is a Senior Database Engineer at MessageBird with a focus on database scalability and reliability. He's a pragmatic MySQL and Database expert with over 20 years of experience in web development. He previously worked in various database architectural roles and as Senior...


Thursday May 13, 2021 10:00 - 11:00 EDT
Room #3

10:00 EDT

Low-Latency and High-Concurrency Analytical APIs for All Databases
The more companies strive to make sense of their (big) data and use it to generate insights or enhance their products, the more developers face the inability of popular databases, query engines, and developer tools to provide low latency and high concurrency for analytical queries. In this talk, you'll learn how to overcome the limitations and architectural nuances of said tools and get operational analytics capabilities for your applications regardless of the underlying data store and data volume. We'll explore how you can bootstrap performant APIs with [Cube.js](https://cube.dev?ref=percona-live), an [open-source](https://github.com/cube-js/cube.js) analytical API platform which works with any SQL-enabled database or query engine. In the end, we'll use reproducible performance testing to prove the viability of the suggested approach and reassure you that your analytical APIs can perform with sub-second latency under heavy load.

Speakers

Igor Lukanin

Developer Advocate, Cube Dev
Igor is a developer advocate from the Cube.js team that provides developers with tools to build modern analytical applications. He's obsessed with data visualization and storytelling and feels equally comfortable writing SQL and ECMAScript.


Thursday May 13, 2021 10:00 - 11:00 EDT
Room #2

10:00 EDT

Evolution of Partitioning Features in PostgreSQL - A Super-Charged Elephant
The partitioning feature in PostgreSQL is not new, but it has matured release after release, especially over the last 3 releases. Because it evolved through small changes in each version, users often overlook its progress. Together, these small changes have built up a powerful partitioning feature in recent PostgreSQL versions, which is now considered capable of replacing even the world's biggest database software by market share, even in data warehouses. Gradual evolution produces no big announcements, so, surprisingly, very few users have moved to native partitioning yet.

This talk gives a detailed look at:
1. The birth of native partitioning and its features.
2. How PostgreSQL 11 addressed some of the missing pieces.
3. How PostgreSQL 12 made a dramatic improvement in usability.
4. How PostgreSQL 13 is ready to take on the giants.

This presentation aims to build enthusiasm in the audience for one of the most powerful features in PostgreSQL.

Speakers

Jobin Augustine

PostgreSQL Escalation Specialist, Percona
Jobin Augustine is a PostgreSQL expert and Open Source advocate and has more than 19 years of working experience as consultant, architect, administrator, writer, and trainer in PostgreSQL, Oracle and other database technologies. He has always been an active participant in the Open...


Thursday May 13, 2021 10:00 - 11:00 EDT
Room #1

10:30 EDT

Migration From 5.6 to 8.xx.xx
Intro

I started working at a Belgian company that uses MySQL 5.6.
We run 6 production servers, and we decided to migrate from this old version to MySQL 8.0.19.
All the servers run CentOS 7 (so RPM and YUM are available).

The first test was done by the DevOps team using mysqldump, but as each server holds about 4 TB, the downtime was over 30 hours, which was not acceptable for the customer.
That is when I took over this task and started reading the docs. I saw that there is an IN-PLACE upgrade that just changes the binaries and the catalog, without moving a single byte of the DB data.

I In TEST

First, I joined the MySQL Slack channel and started talking with Fred. He gave me very good documentation and advice to achieve this migration with a high probability of success.

I created a copy of the main DB in the TEST environment and applied the procedure with yum. The migration completed after 2 hours, which is an acceptable downtime.

II Issues found

The migration process is simple, but depending on what you have in your DB, you can face some strange situations. For example, we use Puppet on our servers, and Puppet manages the /etc/my.cnf file, so in TEST the first migration CRASHED because of Puppet. We simply commented out Puppet in the crontab to avoid this issue.

Another issue was a set of warnings reported by the MySQL Shell util.checkForServerUpgrade() check:
  • 'NO_ZERO_DATE', 'NO_ZERO_IN_DATE'
  • The syntax 'expire-logs-days' is deprecated
  • character-set-server: 'utf8'

These warnings were easy to resolve.
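Two of the warnings listed above correspond to well-known MySQL 8.0 option changes: `expire_logs_days` is deprecated in favor of `binlog_expire_logs_seconds`, and `utf8` is an alias for the deprecated `utf8mb3` character set, with `utf8mb4` as the recommended replacement. The helper below is a hypothetical sketch that rewrites a dict of `[mysqld]` options accordingly. It does not parse real option files, and it leaves the `sql_mode` warning to be reviewed by hand.

```python
from typing import Dict, List, Tuple

SECONDS_PER_DAY = 86_400


def modernize_mysqld_options(opts: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Hypothetical helper: return updated [mysqld] options plus a note per change."""
    updated: Dict[str, str] = {}
    notes: List[str] = []
    for name, value in opts.items():
        canonical = name.replace("-", "_")  # option files accept dashes or underscores
        if canonical == "expire_logs_days":
            seconds = int(value) * SECONDS_PER_DAY
            updated["binlog_expire_logs_seconds"] = str(seconds)
            notes.append(f"{name}={value} -> binlog_expire_logs_seconds={seconds}")
        elif canonical == "character_set_server" and value.lower() in ("utf8", "utf8mb3"):
            updated[name] = "utf8mb4"
            notes.append(f"{name}={value} -> {name}=utf8mb4")
        else:
            updated[name] = value  # everything else is kept unchanged
    return updated, notes


if __name__ == "__main__":
    old = {"expire-logs-days": "7", "character-set-server": "utf8", "max_connections": "500"}
    new, changes = modernize_mysqld_options(old)
    for change in changes:
        print(change)
    print(new)
```

If Puppet manages `/etc/my.cnf`, as in the case above, the same changes have to go into the Puppet-managed template, or Puppet will revert them. A matching `collation_server` setting, if present, needs the same treatment, and `character_set_server` only sets the default for newly created databases; existing tables keep their character set.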

III Timings

For 4 TB of data, the migration took about 2 hours for the whole process.
Upgrading the catalog is the step that takes the most time.
Use screen or tmux for this step :)

IV Why MySQL 8

We stayed with MySQL because for us it's a very good product: the performance is fine, and the new features introduced in version 8.xx are amazing. These are the most important ones for us:

- The shell dump
I ran some comparative tests: the same server instance can be dumped in 2 hours with the shell dump, whereas mysqldump took over 9 hours to complete. I also checked that the shell dump does not create locks.
The fact that the shell dump does not create locks is important because the CLONE procedure uses the shell dump to create the replicas, and this performance, without locks, allows us to create replicas during working hours.

- ReplicaSet
I started working 15 years ago as an Oracle DBA, and I used the Oracle Data Guard broker to manage switchovers, for example. With ReplicaSet I found a very nice set of commands to manage the source and replicas very easily. I used it to move servers from the old CentOS 6 to CentOS 7 without downtime, just by switching the Master role.

- Plugins
I was looking for easy ways to catch locks, bad queries or metadata stats from the MySQL catalog, and the plugins are the answer. Many of them already exist, and people can create their own plugins.

Speakers

Luis Dias

DBA, Oracle
Oracle DBA for the last 15 years, MySQL DBA since... the pandemic started


Thursday May 13, 2021 10:30 - 11:00 EDT
Room #7

10:30 EDT

Build your team, build your product, pay technical debt... REPEAT
Technology is moving faster than it used to; teams and products need to adapt and react quickly to the pace, and technical debt is part of the journey.
I've spent 7 years building the engineering team at MedTrainer using talent in Mexico, and our team is our strength. We are now paying off the technical debt by leveraging knowledge from experts like Percona, and learning in the process. The talk will include some of the decisions made and lessons learned along the way.

Speakers

Mariano Rentería

Director of Engineering, MedTrainer
Passionate about building software products, he is the Director of Engineering of MedTrainer. He has a podcast in Spanish, Chile, Mole & Tech, in which he and guests explain and discuss technology topics from Agile to NFT. He is also part of the board of the PHP Mexico community, which hosts...


Thursday May 13, 2021 10:30 - 11:00 EDT
Room #5

10:30 EDT

How to Contribute to a Big, Complex Open Source Project
Don’t be intimidated by contributing to a complex open source project! In this session we’ll go over how to approach an open source project and get started contributing. The session will use Open Distro for Elasticsearch as an example but the information provided will be applicable to a variety of open source projects.

Speakers

Kyle Davis

Senior Developer Advocate, AWS
Kyle Davis is the Senior Developer Advocate with Open Distro for Elasticsearch at AWS. While being a relative newcomer to AWS, Kyle has a long history with software development and databases. When not working, Kyle enjoys 3D printing and getting his hands dirty in his Edmonton, Alberta-based...


Thursday May 13, 2021 10:30 - 11:00 EDT
Room #4

10:30 EDT

Don’t Feed Me Dog Food and Call it a 5 Star Meal. How the Open Source Landscape is Being Hijacked.
The changing landscape of the open-source industry has taken a potentially dark turn in the last few years. Instead of focusing on inclusion, innovation, and collaboration, a new generation of so-called open source driven companies has emerged, flush with investor money and looking to maximize the returns for their investors and shareholders at all costs. In an effort to accelerate “revenue” and “profits”, these companies are looking to rewrite the definition of what they consider open source. We are in a battle not only for the hearts and minds of the FOSS community but for our collective future. As new developers start open source projects, more will be compelled to choose more restrictive licensing models (e.g., SSPL), invest less in the community, and “control” as much of the code and product as possible. I will talk about the trend, talk about the common business models, and offer a few alternatives.

Speakers

Matt Yonkovit

HOSS (Head of Open Source Strategy), Percona
Matt Yonkovit has been in the Open Source Database Community for over 15 years working for MySQL AB, Sun Microsystems, Mattermost, and Percona. Matt has held technical roles, management, and executive roles serving the open source community. He is currently serving as Percona's...


Thursday May 13, 2021 10:30 - 11:30 EDT
Room #6

11:00 EDT

A First Look at Aurora Serverless v2
During the latest re:Invent, AWS announced the next version of Amazon Aurora Serverless. The new version for the MySQL 5.7-compatible edition of Amazon Aurora scales in a fraction of a second and introduces multi-AZ support, global databases, and read replicas. What are the differences between v1 and v2? Why did AWS introduce an entirely separate new product? Is Aurora Serverless v1 already a service to be forgotten? A journey through the latest changes and a few tests running serverless databases on AWS.

Speakers

Renato Losio

Principal Cloud Architect, Funambol
Renato is the Principal Cloud Architect at Funambol and an AWS Data Hero. He has over 15 years of experience as a software engineer, tech lead, and cloud architect across Italy, the UK, Portugal, and Germany. His main working interests include location-based services, relational...


Thursday May 13, 2021 11:00 - 11:30 EDT
Room #1

11:00 EDT

How We Processed 12 Trillion Rows During Black Friday
One of our clients in the retail space wanted to run real-time analytics on all the sales data being generated during Black Friday.

In this talk, we explain how we set up the infrastructure and the data model, and how we created the API endpoints feeding their dashboards, to process 12 trillion rows during Black Friday night.

Speakers

Javi Santana

Tech, Tinybird
Co-founder of Tinybird, former CARTO CTO, works mostly on designing data products


Thursday May 13, 2021 11:00 - 11:30 EDT
Room #2

11:00 EDT

How Open Source Powers the Modern Data Stack
In this talk, we’ll describe how you can leverage 3 open-source standards (workflow management with Airflow, EL (extract and load) with Airbyte, and transformation with dbt) to build your organization's modern data stack.

We’ll explain how to configure your Airflow DAG to trigger Airbyte’s data replication jobs and dbt’s transformations with a concrete use case.

Speakers

Michel Tricot

Co-Founder & CEO, Airbyte
Michel has been working in data engineering for the past 15 years. As head of integrations and engineering director at Liveramp (NYSE: RAMP), he grew the team responsible for building and scaling the data ingestion and data distribution connectors, syncing hundreds of TB every day. In 2020...


Thursday May 13, 2021 11:00 - 12:00 EDT
Room #9

11:00 EDT

Brand New Development Announced at PL
Brand New Development announced at Percona Live !!

This talk is about something we cannot announce publicly now and will be exclusively announced during Percona Live.

This is something new that will be released after the conference; don't miss this session to discover the new trendy stuff.

Speakers

Johannes Schlüter

Software Engineering Manager, Oracle MySQL
Johannes Schlüter is a Software Engineering Manager in Oracle's MySQL Team. After development and management for different MySQL Connectors, he is now leading a new team, working on improving the MySQL experience in Cloud environments. Johannes is a long-term Open Source contributor...

Kenny Gryp

MySQL Product Manager, Oracle MySQL
MySQL Product Manager focussing on InnoDB, Replication and all things High Availability.


Thursday May 13, 2021 11:00 - 12:00 EDT
Room #7

11:00 EDT

Scaling Out Distributed Storage Fabric with RocksDB
Engineers at Nutanix have been working on the challenge of building a next-generation architecture for its distributed storage fabric. Scaling this architecture to the needs of the future required three primary objectives: significant improvements in sustained random write performance, support for large-capacity deep storage nodes at multi-petabyte scale, and a significant reduction in storage latency.

These goals required re-imagining the core approach to how metadata is stored in the fabric management system and moving the metadata closer to where the data is stored.

After extensive research and testing, RocksDB was chosen as the core component for this project, based on its open-source pedigree, proven reliability and industry adoption. Within a few months, the engineering team was able to ramp up expertise, build confidence with the open-source technology and eventually grow its adoption into several core products at Nutanix.

In this technical talk, we will share the new architecture, the deployment model and some of the early lessons learned in adopting RocksDB, and discuss some innovative enhancements we were able to make to meet our performance goals and objectives.

One of the significant improvements has been the addition of async read/write support to RocksDB. Currently, open source RocksDB exposes blocking I/O APIs, which can limit overall system throughput under resource constraints. We developed a fiber/coroutine-based non-blocking I/O solution for RocksDB.

In addition, we plan to talk about topics and projects that have been built on this enhanced RocksDB implementation.
These projects will become the foundation for future Nutanix products.
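The benefit of non-blocking reads can be illustrated with Python's `asyncio`, with coroutines standing in for the fibers mentioned above. This is a conceptual sketch only: `AsyncKVStore` is a hypothetical in-memory store with simulated device latency, not RocksDB's API or Nutanix's implementation. While one read waits on I/O, the event loop can issue others, so many reads overlap instead of running back to back.

```python
import asyncio
import time
from typing import Dict, List, Optional


class AsyncKVStore:
    """Hypothetical key-value store whose reads are coroutines (not RocksDB's API)."""

    def __init__(self, data: Dict[str, str], io_latency_s: float = 0.05) -> None:
        self._data = dict(data)
        self._latency = io_latency_s

    async def get(self, key: str) -> Optional[str]:
        # Simulated device read: awaiting yields control to the event loop instead
        # of blocking the thread, so other reads can be in flight at the same time.
        await asyncio.sleep(self._latency)
        return self._data.get(key)


async def multi_get(store: AsyncKVStore, keys: List[str]) -> List[Optional[str]]:
    # Issue all reads concurrently; gather returns results in the order of `keys`.
    return await asyncio.gather(*(store.get(k) for k in keys))


def main() -> None:
    store = AsyncKVStore({f"k{i}": f"v{i}" for i in range(100)})
    keys = [f"k{i}" for i in range(100)]
    start = time.perf_counter()
    values = asyncio.run(multi_get(store, keys))
    elapsed = time.perf_counter() - start
    # 100 blocking reads in sequence would need about 100 * 0.05 s = 5 s;
    # overlapped reads finish in roughly one latency period.
    print(f"{len(values)} reads in {elapsed:.2f}s")


if __name__ == "__main__":
    main()
```

A fiber-based design in C++ pursues the same goal (keeping threads busy while I/O is outstanding) without an explicit event loop in user code; the asyncio model is used here only because it is compact and runnable.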

Speakers

Yasaswi Kishore

Senior member of Technical Staff, Nutanix
Yasaswi is a senior member of technical staff in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Yasaswi completed his undergraduate program in Computer Science at PES University, Bangalore, India.

Sandeep Madanala

Nutanix
Sandeep is a Senior Technical Manager in the metadata subsystem for Nutanix distributed filesystem. He leads and manages the ChakrDB team, a scale-out KV store built on top of RocksDB. Prior to Nutanix, Sandeep worked at VMware and graduated from Indian Institute of Technology, M...

Raghav Tulshibagwale

Staff Engineer, Core Data Path, Nutanix Inc.
Raghav is a Staff engineer and technical lead in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Raghav worked on Database Kernels and filesystems. Raghav completed his Masters in Computer Science from University of Southern California, Los Angeles.

Pulkit Kapoor

MTS, Core Data Path, Nutanix, Inc.
Pulkit is a member of technical staff in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Pulkit completed his Masters in Computer Science at University of Wisconsin, Madison.


Thursday May 13, 2021 11:00 - 12:00 EDT
Room #4

11:00 EDT

Data Access Control in PostgreSQL with Row-Level Security
Row-Level Security has been around in PostgreSQL since version 9.5, but very few projects are using it. This is probably because applications have historically controlled access to data at the application level. Delegating data access control to the database is not common practice, but Row-Level Security (RLS) now makes it possible. In this talk, we will cover everything from the basics of RLS to some design patterns for moving data access control into PostgreSQL, simplifying parts of the application's implementation.
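The core RLS semantics can be emulated in a few lines of Python to show what moves from the application into the database. This is a simplified model, not PostgreSQL code: the `Table` class is hypothetical, only permissive policies are modeled (combined with OR), and table owners and superusers, which bypass RLS by default in PostgreSQL, are ignored. The comment shows the roughly equivalent SQL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]
Policy = Callable[[Row, str], bool]  # (row, current_user) -> is the row visible?

# Roughly equivalent PostgreSQL statements for the example at the bottom:
#   ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
#   CREATE POLICY owner_only ON invoices USING (owner = current_user);


@dataclass
class Table:
    rows: List[Row]
    policies: List[Policy]
    rls_enabled: bool = False

    def select_all(self, current_user: str) -> List[Row]:
        """Emulate `SELECT * FROM table` as seen by current_user."""
        if not self.rls_enabled:
            return list(self.rows)
        # Permissive policies are OR-ed together; with RLS enabled and no
        # policies at all, nothing is visible (default deny).
        return [r for r in self.rows if any(p(r, current_user) for p in self.policies)]


if __name__ == "__main__":
    invoices = Table(
        rows=[{"id": 1, "owner": "alice"}, {"id": 2, "owner": "bob"}],
        policies=[lambda row, user: row["owner"] == user],
        rls_enabled=True,
    )
    # The application no longer appends "WHERE owner = ..." itself.
    print(invoices.select_all("alice"))  # [{'id': 1, 'owner': 'alice'}]
```

The default-deny behavior is the important design property: enabling RLS without writing a policy hides rows rather than exposing them.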

Speakers

Boriss Mejias

Solution Architect, EDB
I'm a holistic system software engineer (officially known as Solution Architect), PostgreSQL consultant and trainer, free software activist, and headbanger. I have been working with PostgreSQL since version 9.1. First, as part of my job related to other projects, and with full dedication...


Thursday May 13, 2021 11:00 - 12:00 EDT
Room #3

11:00 EDT

Interesting Features of PostgreSQL 13
The latest major version of PostgreSQL (13) was released last autumn.

Summer is generally considered the starting point for mass migrations to new versions.
Before migrating to the new version, it is worth knowing what the new features in PostgreSQL 13 are.

This talk includes:

1. New features and functionalities introduced
2. Indexing Improvements
3. HA / Standby Improvements
4. Interesting features for DBAs and Operations team
5. Optimizer Improvements
6. Monitoring Improvements
7. Security / Authentication improvements
8. Server Configuration
9. General performance and optimization
10. Partitioning improvements
11. Client library improvements
12. Deprecated/obsolete features
13. Improvements and changes to tools: psql, pgbench, vacuumdb, pg_ctl, pg_upgrade, pg_checksums, pg_rewind, pg_dump, pg_dumpall, pg_restore

Small live demonstrations will be included.
This presentation aims to spearhead upgrades to PostgreSQL 13.

Speakers

Jobin Augustine

PostgreSQL Escalation Specialist, Percona
Jobin Augustine is a PostgreSQL expert and Open Source advocate and has more than 19 years of working experience as consultant, architect, administrator, writer, and trainer in PostgreSQL, Oracle and other database technologies. He has always been an active participant in the Open...


Thursday May 13, 2021 11:00 - 12:00 EDT
Room #5

11:30 EDT

Implementing a Hybrid Column Level Encryption in MySQL
Many databases have column (or field) level encryption, a special type of encryption to protect sensitive data like credit card numbers or social security numbers. MySQL has transparent table-level encryption but does not have more granular, application-centric column-level encryption. In this talk, I will demonstrate how one can implement column-level encryption that combines an application key and a server key. This design requires no or minimal changes to the application and provides a higher level of protection for sensitive data.
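One way to combine an application key and a server key is to derive each column's encryption key from both, so that neither key alone is enough. The sketch below does this with HKDF (RFC 5869) built on Python's standard `hmac` module. It is a hypothetical illustration, not the design presented in the talk: `column_key`, the salt string and the `table.column` context are assumptions, and the derived key would still have to be used with an authenticated cipher such as AES-GCM from a vetted library.

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256, using only the standard library."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key length too large for HKDF-SHA256")
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()  # extract step
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand step: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def column_key(app_key: bytes, server_key: bytes, table: str, column: str) -> bytes:
    """Hypothetical hybrid scheme: the column key depends on BOTH secrets."""
    if len(app_key) != 32 or len(server_key) != 32:
        raise ValueError("expected two 32-byte keys")
    # Fixed-length keys make the concatenation unambiguous.
    return hkdf_sha256(
        ikm=app_key + server_key,
        salt=b"column-encryption-v1",
        info=f"{table}.{column}".encode(),
    )


if __name__ == "__main__":
    app_key, server_key = bytes(32), bytes(range(32))  # demo keys only, never use in production
    print(column_key(app_key, server_key, "customers", "card_number").hex())
```

Binding the table and column name into `info` gives each column a distinct key, so ciphertext copied from one column cannot be decrypted with another column's key.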

Speakers

Alexander Rubin

Senior Database Engineer, Amazon Web Services
Alexander has been doing MySQL consulting since 2006 at MySQL AB, Sun, Oracle and then Percona. At AWS, Alexander works on RDS MySQL and MariaDB.


Thursday May 13, 2021 11:30 - 12:00 EDT
Room #1

11:30 EDT

How SQLAlchemy and Python DB-API 2.0 Let Superset Support Hundreds of Databases
Apache Superset is a modern, open source BI platform that can talk to just about any SQL speaking database. While many BI tools opted for the technical strategy of building highly native connector libraries to support each database, the Superset project decided to bet on the SQLAlchemy ORM and the Python DB-API 2.0 spec to support many databases without much custom code.

I provide an overview of these ideas in this blog post (https://preset.io/blog/building-database-connector/) but I'd like to go much, much deeper in this conference talk and hopefully convince more database creators out there to bet on these same platforms and specs. To really drive the point home, I will also live code and add in support for a new database in Superset "on stage".
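The value of the DB-API 2.0 spec (PEP 249) is that every compliant driver exposes the same `connect()` / `cursor()` / `execute()` / `fetchall()` / `description` surface. The function below, a hypothetical helper for illustration, only touches that surface, so it should work with any compliant driver; the demo uses the standard library's `sqlite3` module, which implements the spec. One caveat: the placeholder style varies by driver (`sqlite3.paramstyle` is `'qmark'`, i.e. `?`), which is one of the differences SQLAlchemy dialects smooth over.

```python
import sqlite3
from typing import Any, Callable, List, Sequence, Tuple


def run_query(connect: Callable[[], Any], sql: str,
              params: Sequence[Any] = ()) -> Tuple[List[str], List[tuple]]:
    """Run one query using only the PEP 249 (DB-API 2.0) connection/cursor interface."""
    conn = connect()
    try:
        cur = conn.cursor()
        cur.execute(sql, params)
        # cursor.description holds one 7-item sequence per result column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        return columns, cur.fetchall()
    finally:
        conn.close()


def demo_connect() -> sqlite3.Connection:
    """Hypothetical demo database: an in-memory SQLite DB with one small table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE talks (title TEXT, room INTEGER);
        INSERT INTO talks VALUES
            ('When and Why to Use MariaDB', 6),
            ('Kubernetes Operator for Presto', 10);
        """
    )
    return conn


if __name__ == "__main__":
    print(sqlite3.apilevel, sqlite3.paramstyle)  # 2.0 qmark
    columns, rows = run_query(demo_connect, "SELECT title FROM talks WHERE room = ?", (10,))
    print(columns, rows)  # ['title'] [('Kubernetes Operator for Presto',)]
```

Swapping in another driver means passing a different `connect` callable (and adjusting placeholders to that driver's `paramstyle`); `run_query` itself stays unchanged.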

Speakers

Srini Kadamati

Senior Developer Advocate and Apache Superset Committer, Preset.io
I'm a Senior Data Scientist that's on a mission to enable more people to work with data effectively. I spent 5 years building an online learning platform specifically to help people learn existing data tools before turning my attention to improving the data tools themselves. I now...


Thursday May 13, 2021 11:30 - 12:00 EDT
Room #2

11:30 EDT

When and Why to Use MariaDB: Key Features in 10.0 to 10.5
MariaDB sees about one new release each year, and each adds new features that may go unnoticed. This session will be a high-level overview of new and underused features included in recent MariaDB releases. There are far too many to cover in detail, but we will go through some you should know about if you are developing on MariaDB.

Some examples include:
* The CONNECT storage engine in 10.0
* IF EXISTS, IF NOT EXISTS, and OR REPLACE clauses in 10.1
* The MyRocks storage engine in 10.2
* System-versioned tables in 10.3
* Temporal tables in 10.4
* INSERT and REPLACE ... RETURNING in 10.5
and much more!

Speakers

Ian Gilfillan

Principal technical writer: documentation, MariaDB Foundation
Ian first came across MySQL in the 90s, upgrading from mSQL while developing South Africa's first online grocery store, and teaching and developing internet programming courses. He was lead developer for South Africa’s largest media company from 2000, and wrote the book Mastering...


Thursday May 13, 2021 11:30 - 12:00 EDT
Room #6

11:30 EDT

Kubernetes Operator for Presto
Presto Operator for Kubernetes is used to manage Presto clusters that are deployed as custom resources. In short, a Presto Operator makes configuring, creating and managing Presto cluster(s) in a Kubernetes environment simple, easy and quick.

This session will be a walkthrough of provisioning Presto in K8s using Presto Operator. The advanced features of Presto Operator like HTTPS support, additional volumes support, autoscaling and graceful shutdown of workers will also be discussed. Future work and areas where the community can contribute will be discussed too.

Speakers

Hemant Bhanawat

Engineering, Yugabyte
Founded Falarica Analytics which was acquired by YugabyteDB. Founding engineer with SnappyData which was acquired by TIBCO. More than 18 years of experience in building and delivering products like distributed databases, distributed query processing engines and distributed KV sto...


Thursday May 13, 2021 11:30 - 12:00 EDT
Room #10

12:00 EDT

Collaboration in Open Source: A Jungle That Needs Structure
Open Source is more than software, and contributions don't just happen at the code level.

Open Source is a development model and a mentality, which differs from legacy, closed, commercial models. Sure, the code is open, but so is communication and collaboration. Or rather, so they should be. Open collaboration is easier said than done, but it's a key aspect of the overall usability and productivity of Open Source software usage.

With MariaDB as the case in point, we look at the use of open tools to collaborate within the community of developers using MariaDB Server. What is the right way to report bugs and request features (over Jira, in our case)? And to use and improve documentation (over the Knowledge Base, for MariaDB)? Yes, GitHub hosts the MariaDB source code, but doesn’t it overlap with Jira? How do you get to chat with fellow developers (over Zulip, in our case)? What is the proper use of blogs? Of YouTube videos? Of conferences? Of Stack Overflow and other tools?

MariaDB does not claim to have the perfect Open Source development and collaboration model, but we have put thought into it. This presentation will hopefully give the audience new ideas for structuring their own communication, and give MariaDB ideas for improving its own processes.

Speakers

Kaj Arnö

CEO, MariaDB Foundation
Kaj Arnö is CEO of the MariaDB Foundation. He is a software industry generalist, having served as VP Professional Services, VP Engineering, CIO and VP Community Relations of MySQL AB prior to the acquisition by Sun Microsystems. At Sun, Kaj served as MySQL Ambassador to Sun and Sun...


Thursday May 13, 2021 12:00 - 12:15 EDT
Keynote

12:15 EDT

State of the Dolphin
Frederic will talk about the latest improvements in MySQL 8.0 (MySQL 8.0.24 is just out!) and the MySQL Engineering Team's steady progress with it. These include solutions like Document Store, InnoDB Cluster, and InnoDB ReplicaSet, where MySQL Router and MySQL Shell play an important role, and of course what we have been busy with in the past 6 months. Fred will also announce something new developed by the MySQL Team. And of course, all of these Oracle solutions are completely open source.

Speakers

Frédéric Descamps

MySQL Community Manager, Oracle
"@lefred" has been consulting in open source and MySQL for almost 20 years. After graduating in Management Information Technology, Frédéric Descamps started his career as a developer for an ERP under HPUX. He then opted for a career in the world of open source by joining one of the...


Thursday May 13, 2021 12:15 - 12:30 EDT
Keynote

12:30 EDT

Database Administrators, Your Skills Are Needed in the Cloud-Native Future
As a database administrator, you have been in the middle of some amazing stories of scale and digital transformation. Now our industry is moving quickly into cloud-native architectures, and the need for skills to get there has never been greater. The role of Site Reliability Engineer (SRE) is one of the fastest-growing job fields in IT, and DBAs have the right combination of background and skills to make the transition. I’m here to make the case: it's time to make the move from DBA to SRE.

Speakers

Patrick McFadin

VP Developer Relations, Datastax


Thursday May 13, 2021 12:30 - 12:45 EDT
Keynote

12:45 EDT

ScyllaDB, Beyond Cassandra
More than 6 years ago, the ScyllaDB team began a journey of implementing a better, C++ version of Cassandra with the same API and a promise of 10x the performance. In 2020, the mission of a complete, battle-tested API was finally accomplished, and we even added a compatible DynamoDB API.

All along, the team improved not just performance but other aspects too, such as shard-aware drivers, heat-based load balancing, and seedless nodes. We’re proud of the result, but our work is far from done. This year, we are introducing Raft, which allows us to leap beyond the eventual consistency of Cassandra. The talk will cover why we introduced Raft and how it comes into play with regard to the consistency, elasticity, and ease of operation of Scylla.
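The step that takes Raft beyond eventual consistency is that the leader treats a log entry as committed only once a majority of replicas hold it. The sketch below shows that rule as described in the Raft paper, including the restriction that a leader only commits entries from its own term by counting replicas. It is generic Raft, not Scylla's implementation, and `advance_commit_index` is a hypothetical name.

```python
from typing import List


def majority(cluster_size: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return cluster_size // 2 + 1


def advance_commit_index(prev_commit: int, match_index: List[int],
                         log_terms: List[int], current_term: int) -> int:
    """Return the leader's new commit index.

    match_index: highest log index known to be stored on each replica (leader included).
    log_terms:   log_terms[i - 1] is the term of the leader's log entry at index i.
    """
    # After sorting in descending order, the entry at position majority - 1 is an index
    # that at least a majority of replicas have stored.
    candidate = sorted(match_index, reverse=True)[majority(len(match_index)) - 1]
    # Only an entry from the leader's current term may be committed by counting replicas;
    # earlier entries then become committed indirectly along with it.
    if candidate > prev_commit and log_terms[candidate - 1] == current_term:
        return candidate
    return prev_commit


if __name__ == "__main__":
    terms = [1, 1, 2, 2, 3, 3, 3]  # terms of log entries 1..7
    print(advance_commit_index(4, [7, 7, 6, 4, 3], terms, current_term=3))  # 6
    print(advance_commit_index(4, [7, 7, 6, 4, 3], terms, current_term=4))  # 4
```

With five replicas whose highest stored indexes are `[7, 7, 6, 4, 3]`, three of them hold index 6, so a leader in term 3 can advance its commit index to 6. A newly elected leader in term 4 cannot, because entry 6 comes from an earlier term.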

Speakers

Dor Laor

CEO, ScyllaDB
Dor Laor is the CEO of ScyllaDB. Previously, Dor was part of the founding team of the KVM hypervisor under Qumranet that was acquired by Red Hat. At Red Hat Dor was managing the KVM and Xen development for several years. Dor holds an MSc from the Technion and a PhD in snowboardin...


Thursday May 13, 2021 12:45 - 13:00 EDT
Keynote

13:00 EDT

Understanding AWS RDS Aurora Capabilities
The RDS Aurora MySQL/PostgreSQL capabilities of AWS extend the HA capabilities of RDS read replicas and Multi-AZ.

In this presentation we will discuss the different capabilities and HA configurations with RDS Aurora including:

* RDS Cluster single instance
* RDS Cluster multiple instances (writer + 1 or more readers)
* RDS Cluster multi-master
* RDS Global Cluster
* RDS Cluster options for multi-regions

Each option has its relative merits and limitations, and the right choice depends on your business requirements, global needs and budget.

This presentation will include setup, monitoring and failover evaluations for the attendee with the goal to provide a feature matrix of when/how to consider each option as well as provide some details of the subtle differences Aurora provides.

This presentation is not going to go into the technical details of RDS Aurora's underlying infrastructure or a feature by feature comparison of AWS RDS to AWS RDS Aurora.

Speakers

Ronald Bradford

Lead Database Engineer/Architect, Lifion by ADP
A seasoned professional in the RDBMS industry, Ronald brings over a decade of AWS experience with using MySQL and supporting technologies in the Cloud. His broad architectural expertise across a variety of different industry sectors helps organizations tackle the needs of ensuring...


Thursday May 13, 2021 13:00 - 13:30 EDT
Room #6

13:00 EDT

Working Effectively with Cloud SQL on Kubernetes
There's a lot to like about Kubernetes and Cloud SQL, but it's not always clear how to use them together. This talk will be a deep-dive into Cloud SQL and how to work with it effectively on Kubernetes. The talk will be developer-centric and will start with the basics of connecting to Cloud SQL from Kubernetes. It will include how to run load tests and then size an instance. And, it will end with how to enable connection pooling to get the most out of Cloud SQL while having some fun along the way.
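Connection pooling caps how many connections an application opens against the Cloud SQL instance and reuses them across requests. The class below is a minimal, hypothetical fixed-size pool built on the thread-safe `queue.Queue`. It is meant to show the mechanism, not to replace a production pool such as SQLAlchemy's (`create_engine(..., pool_size=..., max_overflow=...)`) or your driver's. `sqlite3` is used only so the example runs without a database server.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Generic, Iterator, TypeVar

C = TypeVar("C")  # connection type produced by the factory


class ConnectionPool(Generic[C]):
    """Minimal fixed-size connection pool (a sketch, not production-grade)."""

    def __init__(self, factory: Callable[[], C], size: int, timeout_s: float = 5.0) -> None:
        self._idle: "queue.Queue[C]" = queue.Queue(maxsize=size)
        self._timeout = timeout_s
        for _ in range(size):
            self._idle.put(factory())  # open all connections up front

    @contextmanager
    def connection(self) -> Iterator[C]:
        # Blocks for up to timeout_s when every connection is checked out,
        # then raises queue.Empty; this caps concurrent database connections.
        conn = self._idle.get(timeout=self._timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


if __name__ == "__main__":
    pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
    with pool.connection() as conn:
        print(conn.execute("SELECT 1").fetchone())  # (1,)
```

When all connections are checked out, `connection()` waits up to `timeout_s` seconds and then raises `queue.Empty`, which gives callers a clear back-pressure signal instead of letting every pod open ever more connections.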

Speakers

Eno Compton

Developer Relations Engineer, Google
Eno is a Developer Relations Engineer at Google working on Cloud SQL. He is one of the maintainers of the Cloud SQL Auth proxy. He is also a total language nerd with a Ph.D. in Classical Chinese and Japanese and a decade of experience in nearly a dozen computer languages. Nowadays...


Thursday May 13, 2021 13:00 - 13:30 EDT
Room #8

13:00 EDT

Optimizing and Troubleshooting PostgreSQL with PMM
In this presentation, we will show how you can use PMM (Percona Monitoring and Management) to efficiently monitor PostgreSQL systems and diagnose various issues you may face when running PostgreSQL.
We will look at the following, among other topics:
  • Identifying pending queries
  • Troubleshooting performance degradation
  • Looking for possible optimizations
After attending this presentation, you should be comfortable with how PMM can be used with PostgreSQL to help in the daily lives of DBAs or anyone else dealing with that database.

Speakers

Agustín Gallego

Support Engineer, Percona
Agustín joined Percona's Support team in December 2013. He has previously worked as a Cambridge IT examinations Supervisor and as a Junior BI, SQL & C# developer. He is studying to get a Computer Systems Engineer degree at the Universidad de la República, in Uruguay.

Sergey Kuzmichev

Support Engineer, Percona
Sergey is a support engineer at Percona. Interested in all things databases, he's currently working mainly with MySQL and PostgreSQL. He started his career working as an Oracle DBA, later moving to a DevOps engineer role supporting a Java-based trading platform running on PostgreSQL...


Thursday May 13, 2021 13:00 - 13:30 EDT
Room #1

13:00 EDT

Successfully Run Your MySQL NDB Cluster in Kubernetes
Fortunately, MySQL NDB Cluster already has auto-healing, data distribution, instant scaling and many other features built in, making it a perfect fit for Cloud Native. This session walks through the few steps necessary to deploy a distributed NDB setup in a Kubernetes cluster, either manually or with an operator.

NDB runs in Kubernetes, serving mission-critical microservices at the heart of Cloud Native production systems. The experience from these adventures, combined with knowledge gained from building an NDB operator from scratch, is boiled down to a few tips and tricks that will hopefully help you steer around the usual traps of running NDB, or any database, in Kubernetes.

Speakers

Tiago Alves

Oracle MySQL
Tiago is a software engineering manager for MySQL NDB Cluster. Tiago is passionate about software engineering, with a focus on software quality.


Thursday May 13, 2021 13:00 - 13:30 EDT
Room #7

13:00 EDT

GraphJin - The Automagical GraphQL to SQL Compiler
In 2015, Facebook introduced GraphQL, a front-end query framework designed to shield users from the intricacies of the various backends one would find in modern data stacks. As with ORMs before it, most of the backend code ended up simply wrapping databases, adding layers of abstraction that proved inefficient and required immense investment to scale out for performance.

Additionally, most app developers are not very familiar with SQL and go to great lengths to avoid learning it. This has created several problems, such as N+1 queries, inefficient queries and minimal use of database features. While Postgres is growing in popularity with this audience, the large majority of its advanced features, such as JSON support, recursive CTEs and window functions, are never used. Developers are often unfamiliar with even simple features like the various types of JOINs and choose inefficient solutions, such as multiple queries, instead.
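To make the N+1 problem concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres or MySQL. The authors/posts tables and the helper functions are hypothetical and are not part of GraphJin. The first function issues one query for the parent rows plus one per parent; the second fetches the same data with a single LEFT JOIN.

```python
import sqlite3
from typing import Dict, List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical schema: authors and their posts, in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author_id INTEGER NOT NULL REFERENCES authors(id),
            title TEXT NOT NULL
        );
        INSERT INTO authors VALUES (1, 'Ana'), (2, 'Bo'), (3, 'Cy');
        INSERT INTO posts VALUES (1, 1, 'Hello'), (2, 1, 'Again'), (3, 2, 'Hi');
        """
    )
    return conn


def titles_n_plus_one(conn: sqlite3.Connection) -> Tuple[Dict[str, List[str]], int]:
    """One query for the authors, then one more query per author: N+1 round trips."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries = 1
    result: Dict[str, List[str]] = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn: sqlite3.Connection) -> Tuple[Dict[str, List[str]], int]:
    """The same data in one round trip; LEFT JOIN keeps authors without posts."""
    rows = conn.execute(
        """
        SELECT a.name, p.title
        FROM authors AS a
        LEFT JOIN posts AS p ON p.author_id = a.id
        ORDER BY a.id, p.id
        """
    ).fetchall()
    result: Dict[str, List[str]] = {}
    for name, title in rows:
        titles = result.setdefault(name, [])
        if title is not None:  # a NULL title means the author has no posts
            titles.append(title)
    return result, 1


conn = make_demo_db()
print(titles_n_plus_one(conn))
print(titles_single_join(conn))
```

Both functions return the same mapping of author names to post titles, but the query count grows with the number of authors in the first and stays at one in the second.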

GraphJin, an open-source project, was developed to solve this disconnect by putting all the power in the hands of the UI/UX developer and freeing up the backend developer to focus on the truly hard problems and optimize the queries to take full advantage of the advanced features of Postgres.

GraphJin is a compiler written in Go that converts the GraphQL describing the data needed into a single, efficient SQL query optimized for Postgres or MySQL. It discovers the schema and relationship graph of the database to help it build efficient queries, and it provides the frontend developer with an autocomplete-enabled GraphQL query builder to quickly fetch the data needed.

https://github.com/dosco/graphjin

Speakers

Vikram Rangnekar

Founder, 42papers.com
Vikram Rangnekar grew up in Bombay and studied computer science at the University of Delaware. He founded Socialwok, a TechCrunch50 startup that was early in the enterprise collaboration space. This led him to LinkedIn in early 2010, where he worked on various things from the API platform...


Thursday May 13, 2021 13:00 - 13:30 EDT
Room #4

13:00 EDT

Introducing Transit Nodes: A Sparse Data Structure for Recording (Sharding) Denormalizations
At Box, we have a fairly uncommon combination of business requirements that, taken together, mean that our relational data access layer must implement cross-shard move operations and orchestration. These moves can be large and often need to be split across multiple asynchronous transactions. In the middle of this asynchronous orchestration, objects that would ordinarily live on the same shard may be split across two shards. Our mapping database must faithfully record where each object currently resides, as well as its intended destination.

Viewed more generally, we have a system described by the following:
* A sharded data store;
* With a tree of relationships between object types that can be traversed upwards and downwards;
* With denormalized data that is propagated through the graph (in our case, the target shard id);
* Where the denormalized data is mutable, and might need to be updated in response to a move operation higher up in the tree;
* Where the application needs to control when and how the denormalized data is updated;
* And the application does not need to use the denormalized data in a relational fashion (it doesn't need to be indexed, used in a WHERE clause, etc.)

We recently finished developing and deploying an enhancement to our mapping system that stores the denormalized data in a sparse data structure with high read performance. When moves are not in progress, no additional data storage is needed besides the graph itself, and reads of the denormalized data are made efficient via caching. When moves are in progress, "transit node" rows are inserted into the mapping database to precisely record the new state of objects that have already moved, while retaining the state of objects that haven't moved yet. After the moves, the transit node rows can be garbage collected.

The transit node concept was carefully designed around a number of invariants, which make it very safe to cache values without worrying about cache corruption or cache invalidation. We designed the concept to store shard IDs, but it can theoretically be used for other kinds of denormalizations that match the above generalization.

We will briefly cover the context of sharding at Box, to provide the motivation for the transit node concept. The rest of the talk will present the semantics, invariants, and behaviors of transit nodes, and some results from our deployment. My hope is that the concept can be more broadly useful beyond what we originally designed it for.
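As a rough illustration of the general idea only (not Box's implementation, whose schema, invariants and caching are not described here), the following Python sketch models a tree in which each node inherits a value, such as a target shard id, from its nearest ancestor with an explicit entry. The class and method names are hypothetical. Outside of moves, only the root entries are stored; during a move, extra entries record which subtrees have already moved, and they are garbage-collected once the whole move finishes.

```python
from typing import Dict, Optional


class SparseInheritedValue:
    """Toy model: a node's value is inherited from its nearest ancestor that has an
    explicit entry. Only roots and in-flight overrides (hypothetical "transit"
    entries) are stored, so the map stays sparse."""

    def __init__(self, parent: Dict[str, Optional[str]], initial: Dict[str, int]) -> None:
        self.parent = parent          # child -> parent; None marks a root
        self.values = dict(initial)   # sparse map: node -> explicitly recorded value

    def lookup(self, node: str) -> int:
        current: Optional[str] = node
        while current is not None:    # walk upwards until an explicit value is found
            if current in self.values:
                return self.values[current]
            current = self.parent[current]
        raise KeyError(f"no value recorded on the path from {node!r} to its root")

    def record_move(self, node: str, value: int) -> None:
        """Mark a subtree as already moved while its siblings keep the old value."""
        self.values[node] = value

    def finish_move(self, subtree_root: str, value: int) -> None:
        """Once everything under `subtree_root` has moved, store the value once at
        the top and garbage-collect the now-redundant entries beneath it."""
        self.values[subtree_root] = value
        for node in list(self.values):
            if node != subtree_root and self._is_descendant(node, subtree_root):
                del self.values[node]

    def _is_descendant(self, node: str, ancestor: str) -> bool:
        current = self.parent[node]
        while current is not None:
            if current == ancestor:
                return True
            current = self.parent[current]
        return False


tree = SparseInheritedValue(
    parent={"folder": None, "a": "folder", "b": "folder", "a1": "a"},
    initial={"folder": 1},          # everything starts on shard 1
)
tree.record_move("a", 2)            # subtree "a" has moved to shard 2; "b" has not yet
print(tree.lookup("a1"), tree.lookup("b"))
tree.finish_move("folder", 2)       # "b" moved last; one entry at the top again
print(tree.lookup("b"), tree.values)
```

Reads stay a short upward walk, and storage grows only while a move is in flight; persistence, concurrency and cache invalidation are deliberately left out of this sketch.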

Speakers

Jordan Moldow

Staff Software Engineer, Box, Inc.
Jordan Moldow is a Staff Software Engineer on Box’s Database Tools and Automations team. After earning MIT BS degrees in CSE and mathematics in 2014, Jordan moved to California to join Box. Jordan and his teammates focus on backend database infrastructure, providing the tools, intermediate...


Thursday May 13, 2021 13:00 - 13:30 EDT
Room #3

13:00 EDT

Introduction to Presto: The SQL Engine for Data Platform Teams
Presto is an open source, high-performance, distributed SQL query engine. Born at Facebook in 2012, Presto was built to run interactive queries on large Hadoop-based clusters. Today it has grown to support many users and use cases, including ad hoc queries, data lake analytics, and federated querying. In this session, we will give an overview of Presto, including its architecture and how it works, the problems it solves, and its most common use cases. We'll also share the latest innovations in the project as well as what's on the roadmap.

Speakers

Tim Meehan

Software Engineer, Facebook
Tim is a Software Engineer at Facebook working on the core Presto engine. He is also the Chairperson of the Technical Steering Committee of the Presto Foundation, which hosts Presto under the Linux Foundation. As the chair and a Presto committer, he works with other foundation members to...

Dipti Borkar

Cofounder & Chief Product Officer, Ahana
Dipti Borkar is the Cofounder, Chief Product Officer & Chief Evangelist at Ahana, the Presto company. She is responsible for all things strategy, product and community. She is also the Chairperson of the Presto Foundation, Outreach team. She has over 15 years of experience in data...


Thursday May 13, 2021 13:00 - 13:30 EDT
Room #10

13:30 EDT

A QLDB Cheatsheet for MySQL Users
Amazon's new ledger database (QLDB) is an auditor's best friend and lives up to the stated description of "Amazon QLDB can be used to track each and every application data change and maintains a complete and verifiable history of changes over time."

This presentation will go over how a MySQL application that provided auditing of activity changes for key data is being migrated to QLDB.

QLDB uses a SQL format for DML, so you can perform the traditional INSERT/UPDATE/DELETE/SELECT. In addition, these statements are extended to manipulate Amazon Ion data (a superset of JSON), which gives you improved data manipulation, for example through the FROM statement.

Get a blow-by-blow comparison of MySQL structures (multiple tables and lots of columns) and SQL converted into a single QLDB table with an immutable, cryptographically verifiable transaction log. No more triggers, duplicated tables, or extra auditing for abuse of binary log activity.

We also cover the simplicity of using X Protocol and JSON output for data migration, and the complexity of AWS RDS not supporting X Protocol.

Speakers

Ronald Bradford

Lead Database Engineer/Architect, Lifion by ADP
A seasoned professional in the RDBMS industry, Ronald brings over a decade of AWS experience using MySQL and supporting technologies in the cloud. His broad architectural expertise across a variety of industry sectors helps organizations tackle the needs of ensuring...


Thursday May 13, 2021 13:30 - 14:00 EDT
Room #6

13:30 EDT

Demystifying Database Performance Issues With SQL Commenter and Cloud SQL Insights
Have you ever tried to troubleshoot a database performance issue in an application that was built using an ORM? ORMs can simplify the development of applications that communicate with databases, but because the ORM generates the SQL statements, it can be difficult to determine which application code is responsible for slow queries.

SQL Commenter is an open source library that enables ORMs to augment SQL statements with comments about the code that caused their execution, making it easier to correlate your application code with the SQL statements the ORM generated.

In this session, we will demonstrate how to set up and use SQL Commenter with an application that uses Sequelize.js to diagnose query performance. We'll also touch on the other frameworks and ORMs that sqlcommenter supports, as well as how you can view this data in database logs and observability tools, including Cloud SQL Insights.
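The core idea can be sketched in a few lines of Python. The add_sql_comment helper below is hypothetical and is not part of the sqlcommenter libraries, which hook into the ORM or driver for you; it only mimics the general shape of the comment they append, assuming the format described in the sqlcommenter specification (key='value' pairs sorted by key, with URL-encoded values). The tag names route and framework are illustrative assumptions.

```python
from urllib.parse import quote


def add_sql_comment(sql: str, **tags: str) -> str:
    """Append tags to a SQL statement as a trailing /*key='value',...*/ comment."""
    if not tags or "/*" in sql or "--" in sql:
        # crude heuristic: leave statements that may already carry a comment alone
        return sql
    pairs = []
    for key in sorted(tags):  # sorted keys give a deterministic comment
        # quote(..., safe="") URL-encodes "'", "/", "*" and spaces, so a value
        # can neither break out of its quotes nor close the comment early
        pairs.append(f"{quote(key, safe='')}='{quote(str(tags[key]), safe='')}'")
    return f"{sql.rstrip()} /*{','.join(pairs)}*/"


print(add_sql_comment("SELECT * FROM users WHERE id = $1",
                      route="/users/:id", framework="sequelize"))
```

Because the tags travel with the statement itself, anything that records the SQL text (slow query logs, observability tools) can attribute a slow query back to the route or controller that issued it.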

Speakers

Jan Kleinert

Developer Advocate, Google
Jan Kleinert leads a team of Developer Advocates as part of Google Cloud, focusing on Compute and Databases. Prior to joining Google, she worked in a variety of roles ranging from developer relations to web analytics and conversion optimization.


Thursday May 13, 2021 13:30 - 14:00 EDT
Room #8

13:30 EDT

Percona Monitoring and Management Customization for Greater Visibility
PMM (Percona Monitoring and Management) delivers a rich set of MySQL, PostgreSQL, MongoDB, ProxySQL, and HAProxy service metrics out of the box, along with OS resource metrics, providing deep visibility for the thousands of PMM installations worldwide. Did you know that PMM can also be extended to integrate metrics about your data? Once you teach PMM about your data, you'll be able to bring alerting and other customizations to your environment. Come to this talk to learn how to:

  • Leverage custom queries in MySQL to generate new graphs of your data
  • Compose and interact with PMM's Integrated Alerting
  • Import and design custom dashboards - build rich visualizations that map your data alongside your system performance
  • Develop alerts based on query performance variations
  • Contribute to PMM

Speakers

Michael Coburn

Principal Architect, Percona
Michael joined Percona as a Consultant in 2012 after having worked with high-volume stock photography websites and email service provider platforms. With a foundation in Systems Administration, Michael acted as Product Manager responsible for Percona Monitoring and Management (PMM...


Thursday May 13, 2021 13:30 - 14:00 EDT
Room #1

13:30 EDT

Crave for Speed? Accelerating Open-Source Project Builds
Joining and contributing to an open-source project often involves significant effort and a steep learning curve, and can end up being a challenging experience.

Helping newcomers, whether junior or experienced developers, get up to speed quickly with minimal changes to their local development environment is a desirable path and can help bring in a new generation of developers and hobbyists to sustain and grow open source communities.

Crave Cloud is a free service that allows developers to submit pull requests to their favorite open-source projects and receive a private build for testing and download in a fraction of the time it usually takes to build the entire project, all with minimal setup and changes to your local development environment and without taking up precious cycles on your local machine. Using the elastic capacity of the cloud, Crave can automatically submit your changes, accelerate the build by 6-10x and return a private binary for your testing.

With Crave OSS, we hope to encourage developers to join open source projects, learn by tinkering with code and effortlessly submit changes without making changes to the core code. Join us for a fast-paced introduction to Crave Developer Cloud and get access to a free cloud, sponsored by EquinixMetal and Nutanix, to support CNCF projects.

Speakers

Yuvraaj Kelkar

CEO/Co-founder, Crave.io
Yuvraaj is a systems engineer who wrote code in his previous jobs that took hours to build and test. He then cofounded Crave.io to improve developer productivity using a remote task execution platform called Crave that reduces the time required to clone, build and test code.

Mehboob Alam

Sr. Solutions Architect, Nutanix, Inc.
Mehboob is a long-time open-source advocate and evangelist in the Postgres community, co-organizer of various community meetups and the annual global Postgres US conference. At Nutanix, he guides the development and support of Postgres in the Era DBaaS platform and helps customers...


Thursday May 13, 2021 13:30 - 14:00 EDT
Room #2

13:30 EDT

PrestoDB Administration Fundamentals – Why, What and How
The session will discuss how to set up, run, and scale Presto at your organization. You will learn about configuring data sources, memory requirements and monitoring, as well as how to access metadata and obtain runtime information through queries and through the admin UI. If you need to see the live plan, several methods for doing so will be shown and explained. We'll also cover some of Presto's tuning mechanisms. By the end of this session, you'll be able to handle PrestoDB in large deployments.

Speakers

Ravi Shankar

Chief Consultant, PassionBytes
Chief Consultant at PassionBytes, providing big data consultancy and services in the United States, and an early adopter of Presto since it was open sourced by Facebook. Ravi’s passion is to combine/merge various systems/components into a more reusable and innovative platform, reducing...


Thursday May 13, 2021 13:30 - 14:00 EDT
Room #10

13:30 EDT

HammerDB: A Better Way to Benchmark Your Open Source Database
HammerDB is the leading open source database benchmarking software for commercial and open source databases. Hosted by the TPC, the industry-standard benchmarking body, HammerDB supports workloads derived from the transactional TPC-C and analytic TPC-H benchmark specifications.
In this session, the lead developer of HammerDB will explain what it does, how it works, and how it has been designed to avoid the pitfalls common in other database benchmarking software in order to deliver high performance and scalability.
Using PostgreSQL and MySQL, we will walk through practical transactional benchmarking scenarios, looking at operating system and database configuration, tuning and analysis, and giving insights into benchmarking skills that can be deployed in your own environment.
Finally, we will look at where HammerDB is going with future development and features planned for 2021 and beyond and how you can get involved in the HammerDB community to help make comparing and contrasting database performance open to all.

Speakers

Steve Shaw

open source database lead, Intel
Steve Shaw is the open source database lead for Intel and lead developer of the open source database benchmarking tool HammerDB. With more than 20 years of experience in commercial databases, he is also the author of 2 books on Oracle on Linux. He now focuses on levelling up open source...


Thursday May 13, 2021 13:30 - 14:30 EDT
Room #5

13:30 EDT

MySQL High Availability Options in the Cloud - Compared
High availability is one of the most important characteristics of a mission-critical database environment, no matter where it runs. Running MySQL in the public cloud is really easy these days: pick a cloud provider and a MySQL service, and start using it. Each service is different, though. AWS RDS, AWS Aurora, Google Cloud SQL, Azure Database for MySQL, Oracle MySQL Database Service - each service provides different high availability options and guarantees. Navigating through the options may be difficult, and wrong choices may have a real business impact! Do you know how many “nines” you can’t count on?

In this session, Michal will discuss the true characteristics of MySQL cloud services' high availability options, their cost impact, and DIY on IaaS alternatives - so the next time you’re choosing how to run your MySQL in the cloud, you’ll be able to make a well-informed decision. Additionally, you’ll learn about some of the typical misconceptions concerning the most popular MySQL cloud services.

If you’re given a number of “nines” to guarantee but the myriad of choices and the true cost still mystify you, or (what’s worse!) you think that you don’t need to care because the cloud is “always-on”, this talk is for you.
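For a quick sense of what each additional “nine” buys, the downtime budget is simply the unavailable fraction of a period: (1 - availability) x period. The Python sketch below computes it per year (ignoring leap years) and, assuming components fail independently, shows how chaining components that must all be up multiplies their availabilities. The function names are hypothetical helpers, not part of any cloud provider's tooling.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years


def downtime_budget_minutes(nines: int, period_hours: float = HOURS_PER_YEAR) -> float:
    """Downtime (in minutes) allowed per period by an availability of `nines` nines.

    nines=3 means 99.9% availability, i.e. an unavailable fraction of 10**-3.
    """
    unavailable_fraction = 10.0 ** -nines
    return unavailable_fraction * period_hours * 60


def serial_availability(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up,
    assuming the components fail independently."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total


for n in range(2, 6):
    print(f"{n} nines: {downtime_budget_minutes(n):9.2f} minutes/year")

print(f"two 99.95% components in series: {serial_availability(0.9995, 0.9995):.4%}")
```

For example, three nines allow about 525.6 minutes (8.76 hours) of downtime per year, and two components at 99.95% each, in series, yield about 99.90% overall.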

Speakers

Michal Nosek

Enterprise Architect, Percona
Over the ten years of his career, Michal has taken on different roles, from software engineer and business analyst to technical sales consultant, always staying close to the technology. He has hands-on experience with a broad range of programming languages and database technologies in different...


Thursday May 13, 2021 13:30 - 14:30 EDT
Room #4

13:30 EDT

How Adobe Does Millions of Records Per Second Using Apache Spark
Adobe's Unified Profile System is the heart of its Experience Platform. It ingests TBs of data a day and is PBs in size. With this massive growth, we have faced multiple challenges in our Apache Spark deployment, which is used from ingestion to processing. We want to share some of the learnings and hard-earned lessons from reaching this scale.

  • Repeated Queries Optimization - or How I Learned to Cache My Physical Plans. SQL interfaces expose prepared statements; how do we apply the same analogy to batch processing?
  • Know Thy Join - Joins and GROUP BYs are unavoidable when you don't have much control over the data model, but one must know exactly what happens underneath, given the deadly shuffle one might encounter.
  • Structured Streaming - Know Thy Lag - While consuming from a Kafka topic that sees sporadic loads, it's very important to monitor consumer lag. It also makes you respect what a beast backpressure is.
  • Skew! Phew! - Skewed data causes so many uncertainties, especially at runtime. Configs that applied on day zero no longer apply on day 100. The code must be made resilient to skewed datasets.
  • Sample Sample Sample - Sometimes the best way to approach a large problem is to eat a small part of it first.
  • Redis - Sometimes the best tool for the job is actually outside your JVM. Pipelining + Redis is a powerful combination to supercharge your data pipeline.
We will present our war stories and lessons for each of the above, which we hope will benefit the broader community.

Speakers

Yeshwanth Vijayakumar

Sr. Engineering Manager/Architect, Adobe Systems Inc
I am a Sr. Engineering Manager/Architect on the Unified Profile Team in the Adobe Experience Platform; it’s a PB-scale store with a strong focus on millisecond latencies and analytical abilities, and easily one of Adobe’s most challenging SaaS projects in terms of scale. I am actively...


Thursday May 13, 2021 13:30 - 14:30 EDT
Room #3

14:00 EDT

How To Choose the Right Solution When Lifting and Shifting Your Database to the Cloud
Database migrations can be complex, time-consuming, and costly, increasing the risk of moving to the cloud. Enterprises need effective and reliable database migration tools that help automate this process for seamless cloud adoption. In this session, we will discuss what to consider when choosing database migration methods and technology, and the aspects that can affect risk, such as migration fidelity and downtime. You’ll leave knowing how to account for critical aspects of the migration journey, such as preparation, connectivity, and downtime, and how to use Database Migration Service for Google Cloud for a successful migration.

Speakers

Shachar Guz

Product Manager, Google
Shachar is a product manager at Google Cloud, where he works on the Cloud Database Migration Service. Shachar has worked in various product and engineering roles and has a true passion for data and for helping customers get the most out of their data. Shachar was formerly a product manager...


Thursday May 13, 2021 14:00 - 14:30 EDT
Room #8

14:00 EDT

Validating JSON
JSON, or JavaScript Object Notation, has become the data interchange format of choice. Most relational databases have added a JSON data type (Oracle, PostgreSQL, MySQL) or some accommodation for JSON data (SQL Server, MariaDB). But the free-form nature of JSON is problematic for relational databases, resulting in compromises in speed, in the handling of key-value pairs, and in a general lack of ability to validate data. RDBMSs have long been able to check for missing values, data types and ranges, but that ability is lacking in the JSON sphere. However, JSON-Schema.org has developed a vocabulary to annotate and validate JSON documents that describes your data formats, documents your implementation, and provides a way to validate data for both automated testing and data quality assurance.

The work of JSON-Schema.org is heading towards RFC status and could very well remove many of the objections to using JSON data in a relational system. We will look at who is starting to use their methods and at the progress toward standardization.
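To illustrate the kind of checks the JSON Schema vocabulary expresses (types, required properties and numeric ranges), here is a deliberately tiny Python validator that supports only the type, required, properties, minimum and maximum keywords. It is a toy sketch, not a conforming JSON Schema implementation, and a maintained validator library would be the safer choice in practice. The order schema is a hypothetical example.

```python
from typing import Any, Dict, List

# Toy subset: map a few JSON Schema type names to Python types.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate(instance: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a list of error messages; an empty list means the instance is valid."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so exclude it for numeric types
        is_bool_as_number = expected in ("integer", "number") and isinstance(instance, bool)
        if not isinstance(instance, _TYPES[expected]) or is_bool_as_number:
            return [f"{path}: expected {expected}, got {type(instance).__name__}"]

    errors: List[str] = []
    if isinstance(instance, dict):
        for key in schema.get("required", []):      # missing-value check
            if key not in instance:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:                     # recurse into present properties
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))

    if isinstance(instance, (int, float)) and not isinstance(instance, bool):
        if "minimum" in schema and instance < schema["minimum"]:   # range checks
            errors.append(f"{path}: {instance} is less than minimum {schema['minimum']}")
        if "maximum" in schema and instance > schema["maximum"]:
            errors.append(f"{path}: {instance} is greater than maximum {schema['maximum']}")
    return errors


order_schema = {
    "type": "object",
    "required": ["id", "qty"],
    "properties": {
        "id": {"type": "integer"},
        "qty": {"type": "integer", "minimum": 1, "maximum": 100},
        "sku": {"type": "string"},
    },
}

print(validate({"id": 7, "qty": 3, "sku": "A-1"}, order_schema))
print(validate({"id": "7", "qty": 0}, order_schema))
```

Each error carries a JSON-path-like location, mirroring the missing-value, type and range checks that relational databases have long applied to ordinary columns.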

Speakers

Dave Stokes

MySQL Community Manager, Oracle
Dave Stokes is a MySQL Community Manager for Oracle Corporation and travels extensively to promote MySQL, speaking over thirty times each year for the past several years. He is also the author of MySQL & JSON - A Practical Programming Guide, which is a guide for those wishing to take...


Thursday May 13, 2021 14:00 - 14:30 EDT
Room #7

14:00 EDT

Databases: The Anchor in Your CI/CD Process
DevOps is about improving processes to develop and deliver quality software with both speed and stability. At face value, it's a simple concept. However, fear of instability and the desire to control databases are preventing many organizations from updating their process.
 
The fear of changing processes and automating is understandable. Databases have never been more important because data has never been more important. Every disaster scenario a DBA, compliance officer, or PR team can imagine revolves around keeping the database safe.

In this talk, Kristyl Gomes and Robert Reeves will demonstrate why implementing standardization and automation is necessary to achieve what every team wants—speed and stability, with control.

Speakers

Kristyl Gomes

Director of Quality Engineering, Liquibase
Kristyl has over 15 years of experience in software quality assurance that spans mainframe, desktop, mobile & web applications. At Liquibase, Kristyl is responsible for ensuring the technical quality of all Liquibase products. Kristyl holds a BE degree in Electronics Engineering from...

Robert Reeves

CTO, Liquibase
As chief technical officer, Robert Reeves advocates for Datical's customers and provides technical architecture leadership. Prior to co-founding Liquibase, Robert was a Director at the Austin Technology Incubator. At ATI, he provided real world entrepreneurial expertise to ATI member...


Thursday May 13, 2021 14:00 - 14:30 EDT
Room #1

14:00 EDT

GraphQL as Analytical Language for Data Warehouses
GraphQL is a perfect language for querying OLAP databases and performing BI analytics on top of data warehouses (DWH). At Bitquery, we built an API based on GraphQL that allows users to easily query a DWH without knowledge of underlying low-level details such as servers, databases, cubes and metrics.

We will share our approach, our experience, the tools we used, and the pros and cons of this approach. Our experience will be useful for developers and users of OLAP, DWH and BI solutions.

Speakers

Aleksey Studnev

CTO, Bitquery LLC
Aleksey is CTO and founder of Bitquery LLC. Before that, he was chief architect and founder of successful start-ups in the AdTech industry, focused on data analytics and optimisation. Aleksey is passionate about applying mathematical approaches to software development.


Thursday May 13, 2021 14:00 - 14:30 EDT
Room #6

14:00 EDT

5 Ways Facebook’s Ludicrous Usage Drives Presto Innovation
Presto at Facebook has evolved significantly since its inception in 2012. In this session, Ariel will discuss this evolution including fundamental architectural improvements on scale and efficiency, the business cases that drive improvements like these, how the Presto use cases at Facebook have grown (and what they are), and how this all is balanced with features that the Presto community looks for. You'll learn how a company like Facebook thinks about Presto, how it can be used, and where it's going.

Speakers

Ariel Weisberg

Software Engineer, Facebook
Currently working on Presto @ Facebook. Previously an Apache Cassandra committer and PMC member. Before that, I was the third engineer to start working on VoltDB, back when it was incubating in Vertica and we called it Horizontica. I enjoy developing scalable, reliable, consistently...


Thursday May 13, 2021 14:00 - 14:30 EDT
Room #10

14:00 EDT

Introduction into MySQL Query Tuning for Dev[Op]s
In this talk, I will show how to get started with MySQL query tuning. I will give a short introduction to physical table structure and demonstrate how it can influence query execution time. Then we will discuss basic query tuning instruments and techniques, mainly the EXPLAIN command and its latest variations. You will learn how to understand its output and how to rewrite queries or change table structure to achieve better performance.

Speakers

Sveta Smirnova

Principal Support Escalation Specialist, Percona
Sveta Smirnova is a MySQL Support Engineer with over 10 years of experience. She currently works at Percona. Her main professional interests are problem-solving, working with tricky issues, bugs, finding patterns that can solve typical issues quicker, teaching others how to deal with...


Thursday May 13, 2021 14:00 - 15:00 EDT
Room #2

14:00 EDT

Extending PostgreSQL to a Google Spanner Architecture
PostgreSQL is an open source RDBMS that is widely adopted for its powerful set of features while being fully extensible. However, it is hard to run PostgreSQL as a cloud-native database: one that inherently survives failures, is highly available, scales horizontally, and can be deployed in geo-distributed configurations. Google Spanner is a distributed SQL database that has these features but does not offer the power of PostgreSQL.

Combining the best of these two databases would result in a very compelling database. YugabyteDB is a fully open source distributed SQL database aimed at achieving exactly this goal. In this talk, we will look at the architecture of YugabyteDB that enables it to support all PostgreSQL features along with distributed transactions, resilience, scalability and geo-distribution of data.

Speakers

Karthik Ranganathan

Founder and CTO, Yugabyte
Karthik was one of the original database engineers at Facebook responsible for building distributed databases including Cassandra and HBase. He is an Apache HBase committer, and also an early contributor to Cassandra, before it was open-sourced by Facebook. He is currently the co-founder...


Thursday May 13, 2021 14:00 - 15:00 EDT
Room #3

14:30 EDT

Deploying Highly Available PostgreSQL With GKE
When you have an application running in Google Kubernetes Engine (GKE), there are multiple options and considerations for deploying a database. In this session, you'll learn some of the architectural considerations for choosing a database deployment option in GKE. We will demonstrate one of these options as you learn how to configure PostgreSQL as a container in GKE based on regional persistent disks and PersistentVolumeClaims. Running PostgreSQL on regional persistent disks provides a recovery point objective (RPO) of zero in case of a zone outage, and we will demonstrate how a failover takes place.

Speakers

Shashank Agarwal

Database Migrations Engineer, Google LLC

Christoph Bussler

Solutions Architect, Google
Chris has always been fascinated by systems and data integration between on-premises systems, clouds, and their combination. As a Solutions Architect at Google Cloud (Google, Inc.), he focuses on databases, data migration, multi-cloud database deployments, and data integration in enterprise...


Thursday May 13, 2021 14:30 - 15:00 EDT
Room #8

14:30 EDT

Building A Customer Journey Using Domain Driven Design and GraphQL
Customers expect to have nuanced journeys in their interaction with several aspects of sales, including ordering, shipping and payments. For example, a customer may want to order using a voice channel, send a pinned location on a map as a delivery location and rely on self-service for returns and payments. These ordering journeys are characterized by a reliance on a mesh of API-driven apps. With Dgraph, you get out-of-the-box support for GraphQL APIs.

Another unique aspect of these journeys is the iterative style of development involved. Developers rely directly on feedback from active users, and constantly update the coding artifacts involved. A major impediment to rapid iterations is the time taken by developers to implement changes made to the data model. Developers tend to make changes to the database, and then refactor the API to accommodate the changes across the Create, Read, Update and Delete (CRUD) actions. In this talk, you will learn modeling techniques using GraphQL and Dgraph that support these rapid iteration needs.

Speakers

Anand Chandrashekar

Principal Engineer, Dgraph Labs
Anand Chandrashekar is an experienced Solution Architect in areas of Master Data Management, Data Streaming and Microservices. In his free time, he likes to spend time with his family or play football / cricket.


Thursday May 13, 2021 14:30 - 15:00 EDT
Room #5

14:30 EDT

How to Cope With (Unexpected) Millupling of Your Workload?
At MessageBird we love APIs and our customers love them even more! One of our APIs allows our customers to send messages in bulk, and we were able to cope with that for many years. As the number of large customers increases, so does the volume of bulk messages they send. How do you scale a system that's receiving 300 messages per second and, unexpectedly, receives 4 million messages? As the API team at MessageBird started to improve the performance of their APIs, it became apparent that the database also required an overhaul.

Our ultimate goal was to move this workload to a sharded system (Vitess or DIY shards), but that required an extensive overhaul that could take months to complete. The focus of this talk is the steps we took before our move towards sharding. How can you extend the life of an existing system and buy time to work on the sharding step? We will cover RFCs, read offloading, and parallel replication, but also topics such as the implications of UUIDs on the normal workload when a customer starts pushing 4 million messages through.

Speakers

Art van Scheppingen

Senior Database Engineer, MessageBird
Art van Scheppingen is a Senior Database Engineer at MessageBird with a focus on database scalability and reliability. He's a pragmatic MySQL and database expert with over 20 years of experience in web development. He previously worked in various database architectural roles and as Senior...


Thursday May 13, 2021 14:30 - 15:00 EDT
Room #6

14:30 EDT

Database Hardware Selection Guidelines
Database servers have hardware requirements different from other infrastructure software, specifically unique demands on I/O and memory. This presentation covers these differences and various I/O options and their benefits. Topics include solid-state drives (SSD), battery-backed RAID, controllers, and caching. Though it references Postgres, the concepts apply to all relational databases.

Speakers

Bruce Momjian

Postgres core team member, EDB VP and Postgres Evangelist, EDB
Bruce Momjian is co-founder and core team member of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996. He has been employed by EDB since 2006. He has spoken at many international open-source conferences and is the author of PostgreSQL: Introduction and...


Thursday May 13, 2021 14:30 - 15:00 EDT
Room #4

14:30 EDT

Dbdeployer in Action - Optimised MySQL Sandboxes
The Data Charmer will show how to use dbdeployer in the wild by answering questions from lefred. This session will mimic an Ask Me Anything session, where each chapter answers a specific question. Through this list of questions, the audience will learn how to get started with dbdeployer and also discover more advanced features.
Join this session to see and learn how to use dbdeployer in the wild.

Speakers

Giuseppe Maxia

Software Explorer and creator of tools, vmware
Formerly at MySQL AB, and then, through acquisitions, at Sun Microsystems and Oracle; currently at VMware through a merger. I am an active member of the MySQL community and long timer...

Frédéric Descamps

MySQL Community Manager, Oracle
"@lefred" has been consulting on open source and MySQL for almost 20 years. After graduating in Management Information Technology, Frédéric Descamps started his career as a developer for an ERP under HP-UX. He then opted for a career in the world of open source by joining one of the...


Thursday May 13, 2021 14:30 - 15:30 EDT
Room #7

14:30 EDT

Running Presto on AWS With Ahana Cloud
Presto, the fast-growing open source SQL query engine, disaggregates storage and compute and leverages all data within an organization for data-driven decision making. It is driving the rise of Amazon S3-based data lakes and on-demand cloud computing. In this session you'll learn how Ahana Cloud, the only managed service for Presto, simplifies Presto deployment and management on AWS using Kubernetes, so data platform teams of any size can use it.

Speakers

Gary Stafford

Solutions Architect, AWS
Gary is a solutions architect at AWS where he works with some of the world's largest Enterprise customers to understand their business drivers, assess application portfolios, and design reliable and cost-effective cloud native architectures. Previously he was an enterprise architect...

Dipti Borkar

Cofounder & Chief Product Officer, Ahana
Dipti Borkar is the Cofounder, Chief Product Officer & Chief Evangelist at Ahana, the Presto company. She is responsible for all things strategy, product and community. She is also the Chairperson of the Presto Foundation, Outreach team. She has over 15 years of experience in data...


Thursday May 13, 2021 14:30 - 15:30 EDT
Room #10

15:00 EDT

Docstore - Uber’s Highly Scalable Distributed SQL Database
Uber had 93 million monthly active platform consumers in Q4 2020 and there were more than 5 billion trips on our platform in 2020 alone. No wonder we have to deal with a massive volume of data. The real-time nature of the Uber platform also imposes certain restrictions related to availability and consistency.

This is exactly why we built Docstore. Docstore is a general-purpose multi-model database that provides a strict serializability consistency model at the partition level and can scale horizontally to serve high-volume workloads. It is currently in production and is serving business-critical use cases.

In this session we will take an in-depth look at the architecture of Docstore.

Speakers

Ovais Tariq

Senior Manager, Uber Technologies
Ovais is a Sr. Manager in the Core Storage team at Uber. He leads the Operational Storage Platform group with a focus on providing a world-class platform that powers all the critical business functions and lines of business at Uber. The platform serves tens of millions of QPS with...

Himank Chaudhary

Staff Software Engineer, Uber Technologies
Himank is the Tech Lead of Docstore at Uber. His primary focus area is building distributed databases that scale along with Uber's hyper-growth. Prior to Uber, he worked at Yahoo in the mail backend team to build a metadata store. Himank holds a master's degree in Computer Science...


Thursday May 13, 2021 15:00 - 15:30 EDT
Room #5

15:00 EDT

Monitoring Hundreds of RDS PostgreSQL Instances with PMM: The Rappi Case
The popularity of DBaaS cannot be denied. These services are incredibly helpful for growth, not only because they assist with operational tasks but also because they provide monitoring and visibility into database internals. In the case of RDS, Amazon provides pretty cool features beyond the well-known CloudWatch: things like Enhanced Monitoring or Performance Insights are fantastic... but they come at an (unusually high) cost.

Enter PMM: Percona Monitoring and Management. Being highly customizable and based on well-known open source tools, PMM is a great alternative, especially for DBA teams that require a deep understanding of what is going on inside their databases.

However, PMM also required a considerable amount of time and effort to get it working the way we wanted, especially for PostgreSQL.

Our journey involved work on several aspects, such as:
- PMM server capacity
- Limitations of running on a DBaaS
- Several dashboard customizations
- Additional data sources via custom queries and textfile collectors
- Grafana tuning
- Prometheus magic
- And some hacking...

Speakers

Daniel Guzman Burgos

Performance & Scalability DBA, Rappi Inc.
Daniel studied Electronic Engineering, but quickly became interested in all things data. He has worked as a DBA since 2007 for several companies, including a 7-year journey at Percona as the MySQL Tech Lead for the Managed Services department. He is currently a member of the Performance...

Rodrigo Cadaval

Database Engineer Lead, Rappi Inc.
Rodrigo studies Information Systems Engineering. He started working in 2014 as a Full Stack Developer (PHP - Laravel); data was always his main focus and interest, to the point that he replaced multiple backend processes with stored procedures. In 2016 he became a PostgreSQL DBA, incorporating...


Thursday May 13, 2021 15:00 - 16:00 EDT
Room #4

15:00 EDT

OtterTune: Using Machine Learning to Automatically Optimize Database Configurations
Database management systems (DBMS) expose dozens of configurable knobs that control their runtime behavior. Setting these knobs correctly for an application's workload can improve the performance and efficiency of the DBMS. But such tuning requires considerable effort from experienced administrators, which does not scale to large DBMS fleets. This problem has led to research on using machine learning (ML) to devise strategies that automatically optimize DBMS knobs for any application. The OtterTune database tuning service from Carnegie Mellon uses ML to generate and install optimized DBMS configurations. OtterTune observes the DBMS's workload through its metrics and then trains recommendation models that select better knob values. It then reuses these models to tune other DBMSs more quickly.

In this talk, I will present an overview of OtterTune and discuss the challenges one must overcome to deploy an ML-based service for DBMSs. I will also highlight the insights we learned from real-world installations of OtterTune to tune MySQL, PostgreSQL, and Oracle.

Speakers

Andy Pavlo

Associate Professor (CMU), Co-Founder (OtterTune), Carnegie Mellon University
[Andy Pavlo](http://www.cs.cmu.edu/~pavlo/) is an Associate Professor of Databaseology in the Computer Science Department at Carnegie Mellon University. He is also the co-founder of [OtterTune](https://ottertune.com).


Thursday May 13, 2021 15:00 - 16:00 EDT
Room #1

15:00 EDT

Prepping Kubernetes for Stateful Workloads Pt.1
Data and Kubernetes have historically had an oil and water relationship. Keeping data alive in an ecosystem where everything is ephemeral is a messy proposition at best. Many solutions opt to simply use external services for anything that needs to survive past the typical life cycle of a pod.

In this 2-part hands-on session, I will cover an open-source solution and some best practices to get whatever stateful workloads you have up and running and keep them around long after the pods of today are a distant memory.

Speakers

Eric Zietlow

Director of Developer Relations, MayaData
Eric has been everything from a full-stack developer to a distributed systems solutions architect. He brings his varied experience to his current role in developer relations at MayaData and as an ambassador for the Data on Kubernetes Community.


Thursday May 13, 2021 15:00 - 16:00 EDT
Room #2

15:30 EDT

Efficiently Deploying PostgreSQL Instances
In this talk we will review how to deploy PostgreSQL environments, to be able to have any version of PostgreSQL running within minutes... or even seconds! We will show you what cool tools we use in the Percona Support team to efficiently deploy from standalone servers to more complex replication and HA topologies. After attending, you will have all the knowledge you need to start testing your applications against fully functional PostgreSQL instances... fast!

Speakers

Agustín Gallego

Support Engineer, Percona
Agustín joined Percona's Support team in December 2013. He has previously worked as a Cambridge IT examinations Supervisor and as a Junior BI, SQL & C# developer. He is studying to get a Computer Systems Engineer degree at the Universidad de la República, in Uruguay.


Thursday May 13, 2021 15:30 - 16:00 EDT
Room #6

15:30 EDT

Presto and Apache Iceberg
Apache Iceberg is an open table format for huge analytic datasets. At Twitter, engineers are working on the Presto-Iceberg connector, aiming to bring high-performance data analytics on Iceberg to the Presto ecosystem. In this session, Chunxu will share what they have learned during development, as well as future work on interactive queries.

Speakers

Chunxu Tang

Software Engineer, Twitter
Chunxu is a software engineer in Twitter's Interactive Query team where he works on developing and maintaining Presto and Druid services. He received his doctoral degree from Syracuse University, where he did research on machine learning and distributed collaboration systems.


Thursday May 13, 2021 15:30 - 16:00 EDT
Room #10

15:30 EDT

MySQL Architectures in a Nutshell
Following MySQL InnoDB Cluster, our first fully integrated MySQL High Availability solution based on Group Replication, MySQL Shell 8.0.19 includes MySQL InnoDB ReplicaSet, which delivers another complete solution, this time based on MySQL Replication.

The basic idea for InnoDB ReplicaSet is to do the same for classic MySQL Replication as InnoDB Cluster did for Group Replication. We take a strong technology that is very powerful but can be complex, and provide an easy-to-use AdminAPI for it in the MySQL Shell.

With just a few easy-to-use Shell commands, a MySQL Replication database architecture can be configured from scratch, including:

  • Data provisioning using MySQL CLONE
  • Setting up replication
  • Performing manual switchover/failover.

And we keep improving: join the session to discover the latest developments related to MySQL database architectures.

Speakers

Kenny Gryp

MySQL Product Manager, Oracle MySQL
MySQL Product Manager focussing on InnoDB, Replication and all things High Availability.


Thursday May 13, 2021 15:30 - 16:30 EDT
Room #7

16:00 EDT

How Twitter Runs Presto at Scale in the Cloud
Presto is a widely adopted SQL engine for federated querying across multiple data sources. With Presto, you can perform ad hoc querying of data in place.

In this session, Twitter engineer Beinan Wang will share how Twitter runs Presto at scale, with over 3K Presto workers and 10 million queries, as well as a highly scalable query predictor service. At Twitter, this service helps improve the performance of Presto clusters and provides expected execution statistics on Business Intelligence dashboards.

Speakers

Beinan Wang, Ph.D.

Sr. Software Engineer, Twitter
Beinan builds large-scale distributed SQL systems (Presto and Hive) for Twitter's data platform team.


Thursday May 13, 2021 16:00 - 16:30 EDT
Room #10

16:00 EDT

MySQL & PostgreSQL Migration to AWS at Groupon
Strategies and challenges when migrating MySQL 5.6 and PostgreSQL 9.4 from on-premises into the AWS world:
- Database Topologies On-premise/Cloud
- Database Versions
- Migration Methods
- Prerequisites for the migration
- Migration Paths
- Multi-tenants
- What is different in AWS
- Checklist for onboarding
- Cutover Process

Speakers

Mani Subramanian

Sr. Manager, Global Database Services, Groupon


Thursday May 13, 2021 16:00 - 17:00 EDT
Room #1

16:00 EDT

Inspecting MySQL servers: The Percona Support Way
You are handed a MySQL, MariaDB, or Percona Server database server and asked to have a look at it, to check whether the server is "well-tuned" and whether there is anything obviously wrong with it. Where do you start?

This may sound like an overwhelming task for a beginner DBA, and MySQL doesn't really facilitate much on this front with its dozens of customizable variables. There are, however, multiple ways to approach this challenge.

At Percona Support, we have a method for such an initial assessment, one that has been crafted in the early years of the company and improved by all the Support Engineers, remote DBAs, and Consultants that have worked and still work in the Services team. During this talk, I will walk you through this method, explaining how you can make use of a handful of tools available in the Percona Toolkit to extract the diagnostics data we need from the server and how to interpret the main points.

Speakers

Fernando Laudares Camargos

Senior Support Engineer, Percona
Fernando joined Percona in early 2013 after 8 years working for a Canadian company specialized in Linux and Open Source technologies. As a member of Percona's Support team, Fernando works closely with customers helping them troubleshoot issues with MySQL, PostgreSQL, and MongoDB servers...


Thursday May 13, 2021 16:00 - 17:00 EDT
Room #3

16:00 EDT

A Sharding Tale: Then, Now, There, and Back Again
Sharding in MongoDB is used to horizontally scale databases by distributing large data sets across multiple machines. In the past, this usually involved using lower-cost commodity hardware in your data center or infrastructure. More recently, it often involves adding more storage and compute nodes to your cloud, hybrid, or on-prem environment. Either way, sharding is one of the most important features for scaling out your MongoDB environment. This presentation will cover how sharding has changed from the earlier versions of MongoDB to now, focusing on the changes from the immutable shard keys of 3.6 to the refinable shard keys of the latest version, 4.4. We will discuss proposed changes coming in 5.0. We will look at sharding feature enhancements, but also at their real-life, practical impacts on performance, storage, and other aspects of your overall application. Finally, we will cover shard key selection tips and how to implement sharding in the most effective ways.

Speakers

Kimberly Wilkins

MongoDB Technical Lead, Percona
Kimberly Wilkins, MongoDB Technical Lead, has over 20 years of experience managing and architecting database systems using both relational and NoSQL technologies to help customers across a wide variety of industry verticals including vehicle inventory management and auctions for now...


Thursday May 13, 2021 16:00 - 17:00 EDT
Room #6

16:00 EDT

Prepping Kubernetes for Stateful Workloads Pt.2
Data and Kubernetes have historically had an oil and water relationship. Keeping data alive in an ecosystem where everything is ephemeral is a messy proposition at best. Many solutions opt to simply use external services for anything that needs to survive past the typical life cycle of a pod.

In this 2-part hands-on session, I will cover an open-source solution and some best practices to get whatever stateful workloads you have up and running and keep them around long after the pods of today are a distant memory.

Speakers

Eric Zietlow

Director of Developer Relations, MayaData
Eric has been everything from a full-stack developer to a distributed systems solutions architect. He brings his varied experience to his current role in developer relations at MayaData and as an ambassador for the Data on Kubernetes Community.


Thursday May 13, 2021 16:00 - 17:00 EDT
Room #2

16:00 EDT

The Many Flavors of Replication
This talk presents an overview of the many forms of replication currently supported in Postgres.

Here's a breakdown of the topics that will be covered:
- We start with replication configurations, which include:
  - Multi-node PRIMARY-STANDBY replication clusters
  - Cascading replication
  - Other:
    - Replicating to another READ-WRITE host
      - active-active
      - analytics
    - Replicating to a READ-ONLY host using a time delay
    - Working with detached READ-ONLY systems

- There are three (3) forms of replication technology that have been developed for database systems:
  - statement replication
  - trigger replication
  - binary replication (the most commonly used solution)

- There are two types, or classifications, of replication used by Postgres:
  - asynchronous replication
  - synchronous replication

Of course, this is by no means a complete list, as many more methods and variations are possible in Postgres.

Speakers

Robert Bernier

PostgreSQL Consultant, Percona
Robert's experience spans several decades. His first experience was playing hangman on a DECwriter shortly after man first landed on the moon. His foray into commercial applications was programming Fortran, via punchcards, on an IBM 360, which in those days had 4MB RAM. Over the...


Thursday May 13, 2021 16:00 - 17:00 EDT
Room #5
 
Percona Live Online, May 12-13, 2021