Percona Live Online: Full Schedule

Welcome to Percona Live Online 2021
Online Open Source Database Conference

06:30 EDT

Building and Scaling a Robust Zero-Code Data Pipeline With Open Source Technologies [30min]

With the rapid onset of the global COVID-19 pandemic in 2020, the US Centers for Disease Control and Prevention (CDC) quickly implemented a new COVID-19 pipeline to collect testing data from all US states and territories and to produce multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka.

We built a similar (but simpler) demonstration pipeline for ingesting, indexing, and visualizing some publicly available tidal data using multiple open source technologies including Apache Kafka, Apache Kafka Connect, Apache Camel Kafka Connectors, Open Distro for Elasticsearch and Kibana, Prometheus and Grafana.

In this talk, we introduce each technology and the pipeline architecture, then walk through the steps, challenges, and solutions involved in building an initial integration pipeline that consumes USA National Oceanic and Atmospheric Administration (NOAA) tidal data, maps and indexes the data types in Elasticsearch, and adds missing data with an ingest pipeline. The goal is to visualize the results with Kibana, where we’ll see the period of the “Lunar” day, and the size and location of some small and large tidal ranges.

But what can go wrong? The initial pipeline only worked briefly, failing when it encountered exceptions. To make the pipeline more robust, we investigated Apache Kafka Connect exception handling, and evaluated the benefits of using Apache Camel Kafka Connectors, and Elasticsearch schema validation.
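As a rough illustration of the Kafka Connect exception handling mentioned above, the sketch below assembles the error-handling properties a sink connector could be given so that bad records are logged and routed to a dead letter queue instead of stopping the task. The `errors.*` keys are standard Kafka Connect properties; the helper name and the topic name `tides-dlq` are hypothetical and are not taken from the talk.

```python
import json


def sink_error_handling_config(dlq_topic: str, tolerate_all: bool = True) -> dict:
    """Build Kafka Connect error-handling properties for a sink connector.

    Hypothetical helper for illustration only. With tolerance "none"
    (the Kafka Connect default) a failing record stops the task.
    """
    return {
        # "all" skips records that fail in converters or transforms
        "errors.tolerance": "all" if tolerate_all else "none",
        # Log each failure, including the offending record contents
        "errors.log.enable": "true",
        "errors.log.include.messages": "true",
        # Route failed records to a dead letter queue topic (sink connectors only)
        "errors.deadletterqueue.topic.name": dlq_topic,
        # Attach failure context (for example, the exception) as record headers
        "errors.deadletterqueue.context.headers.enable": "true",
    }


if __name__ == "__main__":
    # "tides-dlq" is an assumed topic name
    print(json.dumps(sink_error_handling_config("tides-dlq"), indent=2))
```

These properties would be merged into the connector configuration submitted to Kafka Connect; whether tolerating all errors is appropriate depends on how much data loss the pipeline can accept.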

With a sufficiently robust pipeline in place, it’s time to scale it up. The first step is to select and monitor the most relevant metrics across multiple technologies. We configured Prometheus to collect the metrics, and Grafana to produce a dashboard. With the monitoring in place, we were able to systematically increase the pipeline throughput by increasing the number of Kafka connector tasks, while watching out for potential bottlenecks. We discovered, and fixed, two bottlenecks in the pipeline, proving the value of this approach to pipeline scaling.

We conclude the presentation with lessons learned so far, and some potential future challenges.

Speakers

Paul Brebner

Open Source Technology Evangelist, Instaclustr by NetApp

Open Source Technology Evangelist at Instaclustr by NetApp. For the last 5 years, Paul has been learning new open source technologies, building realistic demonstration applications, writing blogs, and presenting at international conferences including FOSSASIA, All Things Open and...

Wednesday May 12, 2021 06:30 - 07:00 EDT
Room #4

Other Cloud, Cloud Technologies

11:30 EDT

Postgres HA in the Hybrid Cloud, a Look Under the Hood of Implementing Patroni on Multiple Clouds

Patroni has quickly become recognized as the state-of-the-art "high-availability" standard for running Postgres in mission-critical enterprises and public clouds. However, Patroni is not a fully architected software system; it is a template of best practices for implementing a highly available architecture for running Postgres.

In this presentation, we will share some of our early experiences building on the original Patroni open-source project, along with its key architecture and design advantages over alternative approaches. By building a close integration with the core Nutanix storage platform, we were able to gain significant performance improvements and satisfy very stringent requirements for failover time and data protection.

Additionally, we will share some of our experiences adopting the recent release of Patroni 2.0 and implementing more advanced capabilities.

Speakers

Manish Pratap Singh

Staff Software Engineer @ Nutanix Era, Nutanix Inc.

Manish leads the open-source database team in the Nutanix Era product group, managing the implementations of Postgres, MySQL and MariaDB. Prior to joining Nutanix, Manish was a Senior MTS at Oracle, working on the Parallel Query framework in the Oracle database engine.

Mehboob Alam

Sr. Solutions Architect, Nutanix, Inc.

Mehboob is a long-time open-source advocate and evangelist in the Postgres community, co-organizer of various community meetups and the annual global Postgres US conference. At Nutanix, he guides the development and support of Postgres in the Era DBaaS platform and helps customers...

Wednesday May 12, 2021 11:30 - 12:00 EDT
Room #6

Other Cloud, Cloud Technologies

11:00 EDT

Scaling Out Distributed Storage Fabric with RocksDB

Engineers at Nutanix have been working on the challenge of building a next-generation architecture for its distributed storage fabric. Scaling this architecture to meet future needs involved three primary objectives: significantly improving sustained random write performance, supporting large-capacity deep storage nodes at multi-petabyte scale, and significantly reducing storage latency.

These goals required re-imagining the core approach to how metadata is stored in the fabric management system and moving the metadata closer to where the data is stored.

After extensive research and testing, RocksDB was chosen as the core component for this project, based on its open-source pedigree, proven reliability, and industry adoption. Within a few months, the engineering team was able to ramp up expertise, build confidence with the open-source technology, and eventually grow its adoption into several core products at Nutanix.

In this technical talk, we will share the new architecture, deployment mode and some of the early lessons learned in adopting RocksDB and discuss some innovative enhancements we were able to make to fit our performance goals and objectives.

One of the significant improvements has been the addition of async read/write support to RocksDB. Currently, open-source RocksDB exposes blocking I/O APIs, which can limit overall system throughput under resource constraints. We developed a fiber/coroutine-based non-blocking I/O solution for RocksDB.

In addition to this, we plan to talk about topics and projects that have been built on this enhanced RocksDB implementation. These projects will become the foundation for future Nutanix products.

Speakers

Yasaswi Kishore

Senior member of Technical Staff, Nutanix

Yasaswi is a senior member of technical staff in the metadata subsystem for the Nutanix distributed filesystem. Prior to Nutanix, Yasaswi completed his undergraduate program in Computer Science at PES University, Bangalore, India.

Sandeep Madanala

Nutanix

Sandeep is a senior technical manager in the metadata subsystem for the Nutanix distributed filesystem. He leads and manages the team behind ChakrDB, a scale-out KV store built on top of RocksDB. Prior to Nutanix, Sandeep worked at VMware and graduated from Indian Institute of Technology, M...

Raghav Tulshibagwale

Staff Engineer, Core Data Path, Nutanix Inc.

Raghav is a staff engineer and technical lead in the metadata subsystem for the Nutanix distributed filesystem. Prior to Nutanix, Raghav worked on database kernels and filesystems. Raghav completed his Master's in Computer Science at the University of Southern California, Los Angeles.

Pulkit Kapoor

MTS, Core Data Path, Nutanix, Inc.

Pulkit is a member of technical staff in the metadata subsystem for the Nutanix distributed filesystem. Prior to Nutanix, Pulkit completed his Master's in Computer Science at the University of Wisconsin, Madison.

Thursday May 13, 2021 11:00 - 12:00 EDT
Room #4

Other Cloud, Cloud Technologies

13:30 EDT

MySQL High Availability Options in the Cloud - Compared

High availability is one of the most important characteristics of a mission-critical database environment, no matter where it runs. Running MySQL in the public cloud is really easy these days: pick a cloud provider and a MySQL service, and start using it. Each service is different, though. AWS RDS, AWS Aurora, Google Cloud SQL, Azure Database for MySQL, Oracle MySQL Database Service - each service provides different high availability options and guarantees. Navigating through the options may be difficult, and wrong choices may have a real business impact! Do you know how many “nines” you can count on?

In this session, Michal will discuss the true characteristics of MySQL cloud services' high availability options, their cost impact, and DIY on IaaS alternatives - so the next time you’re choosing how to run your MySQL in the cloud, you’ll be able to make a well-informed decision. Additionally, you’ll learn about some of the typical misconceptions concerning the most popular MySQL cloud services.

If you’re given a number of “nines” to guarantee but the myriad of choices and their true cost still mystify you, or (what’s worse!) you think that you don’t need to care because the cloud is “always-on” - this talk is for you.
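For readers unfamiliar with the “nines” shorthand, the following sketch shows the standard arithmetic behind it: N nines means an availability of 1 - 10^-N, which translates directly into an allowed downtime per year. The function names are illustrative, and the figures ignore leap years and any provider-specific SLA definitions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def availability_from_nines(nines: int) -> float:
    """Convert a count of nines (3 means 99.9%) to a fraction between 0 and 1."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum downtime per year, in minutes, permitted by an availability."""
    return (1 - availability) * MINUTES_PER_YEAR


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up at once."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        a = availability_from_nines(n)
        print(f"{n} nines: {downtime_minutes_per_year(a):.2f} minutes of downtime per year")
    # Two dependent 99.9% components together deliver less than 99.9%
    print(f"two 99.9% components in series: {serial_availability(0.999, 0.999):.6f}")
```

Three nines leaves roughly 8.8 hours of downtime per year, while five nines leaves a little over five minutes. Chaining dependent components lowers the combined figure, which is one reason a headline number for a single service says little about a whole stack.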

Speakers

Michal Nosek

Enterprise Architect, Percona

Over his ten-year career, Michal has held roles ranging from software engineer and business analyst to technical sales consultant, always staying close to the technology. He has hands-on experience with a broad range of programming languages and database technologies in different...

Thursday May 13, 2021 13:30 - 14:30 EDT
Room #4

Other Cloud, Cloud Technologies