Welcome to Percona Live Online 2021
Online Open Source Database Conference
Other Cloud
Wednesday, May 12
 

06:30 EDT

Building and Scaling a Robust Zero-Code Data Pipeline With Open Source Technologies [30min]
With the rapid onset of the global Covid-19 pandemic in 2020, the USA Centers for Disease Control and Prevention (CDC) quickly implemented a new Covid-19 pipeline to collect testing data from all of the USA’s states and territories, and produce multiple consumable results for federal and public agencies. They did this in under 30 days, using Apache Kafka.

We built a similar (but simpler) demonstration pipeline for ingesting, indexing, and visualizing some publicly available tidal data using multiple open source technologies including Apache Kafka, Apache Kafka Connect, Apache Camel Kafka Connectors, Open Distro for Elasticsearch and Kibana, Prometheus and Grafana.

In this talk, we introduce each technology and the pipeline architecture, and walk through the steps, challenges, and solutions involved in building an initial integration pipeline that consumes USA National Oceanic and Atmospheric Administration (NOAA) tidal data, maps and indexes the data types in Elasticsearch, and adds missing data with an ingest pipeline. The goal is to visualize the results with Kibana, where we’ll see the period of the “Lunar” day, and the size and location of some small and large tidal ranges.

But what can go wrong? The initial pipeline only worked briefly, failing when it encountered exceptions. To make the pipeline more robust, we investigated Apache Kafka Connect exception handling, and evaluated the benefits of using Apache Camel Kafka Connectors, and Elasticsearch schema validation.

With a sufficiently robust pipeline in place, it’s time to scale it up. The first step is to select and monitor the most relevant metrics, across multiple technologies. We configured Prometheus to collect the metrics, and Kibana to produce a dashboard. With the monitoring in place we were able to systematically increase the pipeline throughput by increasing Kafka connector tasks, while watching out for potential bottlenecks. We discovered, and fixed, two bottlenecks in the pipeline, proving the value of this approach to pipeline scaling.

We conclude the presentation with lessons learned so far, and some potential future challenges.

Speakers
Paul Brebner

Open Source Technology Evangelist, Instaclustr by NetApp
Open Source Technology Evangelist at Instaclustr by NetApp. For the last 5 years, Paul has been learning new open source technologies, building realistic demonstration applications, writing blogs, and presenting at international conferences including FOSSASIA, All Things Open and...


Wednesday May 12, 2021 06:30 - 07:00 EDT
Room #4

11:30 EDT

Postgres HA in the Hybrid Cloud, a Look Under the Hood of Implementing Patroni on Multiple Clouds
Patroni has quickly become recognized as the state-of-the-art "high-availability" standard for running Postgres in mission-critical enterprises and public clouds. Patroni, however, is not a fully architected software system but a template of best practices for implementing a highly available architecture for running Postgres.

In this presentation, we will share some of our early experiences building on the original Patroni open-source project, along with its key architecture and design advantages over alternative approaches. By building a close relationship with the core Nutanix storage platform, we were able to gain some significant performance improvements and satisfy very stringent requirements for failover time and data protection.

Additionally, we will share some of our experiences adopting the recent Patroni 2.0 release and implementing more advanced capabilities.

Speakers
Manish Pratap Singh

Staff Software Engineer @ Nutanix Era, Nutanix Inc.
Manish leads the open-source database team in the Nutanix Era product group, managing the implementations of Postgres, MySQL and MariaDB. Prior to joining Nutanix, Manish was a Senior MTS at Oracle, working on the Parallel Query framework in the Oracle database engine.
Mehboob Alam

Sr. Solutions Architect, Nutanix, Inc.
Mehboob is a long-time open-source advocate and evangelist in the Postgres community, co-organizer of various community meetups and the annual global Postgres US conference. At Nutanix, he guides the development and support of Postgres in the Era DBaaS platform and helps customers...


Wednesday May 12, 2021 11:30 - 12:00 EDT
Room #6
 
Thursday, May 13
 

11:00 EDT

Scaling Out Distributed Storage Fabric with RocksDB
Engineers at Nutanix have been working on the challenge of building a next-generation architecture for its distributed storage fabric. Scaling this architecture to meet future needs involved three primary objectives: significantly improving sustained random write performance, supporting large-capacity deep storage nodes at multi-petabyte scale, and significantly reducing storage latency.
 
These goals required re-imagining the core approach to how metadata is stored in the fabric management system and moving the metadata closer to where the data is stored.
 
After extensive research and testing, RocksDB was chosen as the core component for this project, based on its open-source pedigree, proven reliability, and industry adoption. Within a few months, the engineering team was able to ramp up expertise, build confidence with the open-source technology, and eventually grow its adoption into several core products at Nutanix.
 
In this technical talk, we will share the new architecture, the deployment mode, and some of the early lessons learned in adopting RocksDB, and discuss some innovative enhancements we were able to make to meet our performance goals and objectives.
 
One of the significant improvements has been the addition of async read/write support to RocksDB. Currently, open source RocksDB exposes blocking I/O APIs, which can limit overall system throughput under resource constraints. We developed a fiber/coroutine-based non-blocking I/O solution for RocksDB.
 
In addition, we plan to talk about projects that have been built on this enhanced RocksDB implementation. These projects will become the foundation for future Nutanix products.

Speakers
Yasaswi Kishore

Senior member of Technical Staff, Nutanix
Yasaswi is a senior member of technical staff in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Yasaswi completed his undergraduate program in Computer Science at PES University, Bangalore, India.
Sandeep Madanala

Nutanix
Sandeep is a senior technical manager in the metadata subsystem for Nutanix distributed filesystem. He leads and manages the ChakrDB team, which builds a scale-out KV store on top of RocksDB. Prior to Nutanix, Sandeep worked at VMware and graduated from Indian Institute of Technology, M...
Raghav Tulshibagwale

Staff Engineer, Core Data Path, Nutanix Inc.
Raghav is a Staff engineer and technical lead in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Raghav worked on Database Kernels and filesystems. Raghav completed his Masters in Computer Science from University of Southern California, Los Angeles.
Pulkit Kapoor

MTS, Core Data Path, Nutanix, Inc.
Pulkit is a member of technical staff in the metadata subsystem for Nutanix distributed filesystem. Prior to Nutanix, Pulkit completed his Masters in Computer Science at University of Wisconsin, Madison.


Thursday May 13, 2021 11:00 - 12:00 EDT
Room #4

13:30 EDT

MySQL High Availability Options in the Cloud - Compared
High availability is one of the most important characteristics of a mission-critical database environment, no matter where it runs. Running MySQL in the public cloud is really easy these days. Pick a cloud provider, MySQL service, and start using it. Each service is different, though. AWS RDS, AWS Aurora, Google Cloud SQL, Azure Database for MySQL, Oracle MySQL Database Service - each service provides different high availability options and guarantees. Navigating through the options may be difficult and wrong choices may have a real business impact! Do you know how many “nines” you can’t count on?

In this session, Michal will discuss the true characteristics of MySQL cloud services' high availability options, their cost impact, and DIY on IaaS alternatives - so the next time you’re choosing how to run your MySQL in the cloud, you’ll be able to make a well-informed decision. Additionally, you’ll learn about some of the typical misconceptions concerning the most popular MySQL cloud services.

If you’ve been given a number of “nines” to guarantee but the myriad of choices and their true cost still mystify you, or (what’s worse!) you think that you don’t need to care because the cloud is “always-on” - this talk is for you.
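
For readers unfamiliar with the “nines” shorthand, the arithmetic behind it can be sketched as follows. This is an illustrative calculation only and is not taken from the talk; the function names are hypothetical, and the figures assume a 365-day year and a simple chain in which every component must be up.

```python
# Illustrative only: convert an availability target ("nines") into a downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Return the yearly downtime budget, in minutes, for a given number of nines."""
    unavailability = 10 ** (-nines)  # e.g. 3 nines -> 99.9% -> 0.001
    return MINUTES_PER_YEAR * unavailability


def serial_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up at once."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


if __name__ == "__main__":
    for n in (2, 3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes/year")
    # Two 99.9% components in series deliver slightly less than 99.9% together.
    print(f"two 99.9% components in series: {serial_availability(0.999, 0.999):.6f}")
```

Each extra nine cuts the downtime budget by a factor of ten, and chaining components lowers the combined figure, which is one reason a service's advertised availability and the availability of a whole deployment can differ.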

Speakers
Michal Nosek

Enterprise Architect, Percona
Over his ten-year career, Michal has held roles ranging from software engineer and business analyst to technical sales consultant, always staying close to the technology. He has hands-on experience with a broad range of programming languages and database technologies in different...


Thursday May 13, 2021 13:30 - 14:30 EDT
Room #4
 